This skill enables text-to-speech and speech-to-text with fal.ai audio models, streamlining voice-enabled workflows and accessibility in applications.
Install with `npx playbooks add skill sickn33/antigravity-awesome-skills --skill fal-audio`.
---
name: fal-audio
description: "Text-to-speech and speech-to-text using fal.ai audio models"
source: "https://github.com/fal-ai-community/skills/blob/main/skills/claude.ai/fal-audio/SKILL.md"
risk: safe
---
# Fal Audio
## Overview
This skill provides text-to-speech and speech-to-text using fal.ai audio models.
## When to Use This Skill
Use this skill when you need to convert text into spoken audio or transcribe recorded speech into text using fal.ai audio models.
## Instructions
This skill provides guidance and patterns for text-to-speech and speech-to-text using fal.ai audio models.
For more information, see the [source repository](https://github.com/fal-ai-community/skills/blob/main/skills/claude.ai/fal-audio/SKILL.md).
This skill integrates text-to-speech and speech-to-text using fal.ai audio models. It provides ready-made patterns and helper code to convert text into natural-sounding audio and to transcribe speech back into text, with a focus on practical integration for conversational agents and automation workflows.
The skill wraps fal.ai audio model endpoints with concise client code and examples for both synthesis and transcription. It handles audio encoding/decoding, streaming or file-based I/O, and basic request/response orchestration so agents can send text and receive audio, or send audio and receive transcribed text. Error handling and parameter controls for voice, language, and quality are included to support production usage.
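As a minimal sketch of that request/response orchestration, the example below uses fal.ai's official Python client (`fal_client`, installed with `pip install fal-client` and authenticated via the `FAL_KEY` environment variable). The endpoint ID `fal-ai/example-tts`, the argument names, and the response shape are illustrative assumptions; check the fal.ai model catalog for the exact IDs and schemas of the models you call.

```python
def build_tts_arguments(text: str, voice: str = "default", language: str = "en") -> dict:
    """Assemble the argument payload for a TTS request.

    The parameter names here ("text", "voice", "language") are assumptions;
    real fal.ai endpoints each document their own schema.
    """
    if not text.strip():
        raise ValueError("text must be non-empty")
    return {"text": text, "voice": voice, "language": language}


def synthesize(text: str, **options) -> str:
    """Send text to a hypothetical TTS endpoint and return the audio URL."""
    import fal_client  # requires FAL_KEY to be set in the environment

    result = fal_client.subscribe(
        "fal-ai/example-tts",  # hypothetical endpoint ID, not a real model
        arguments=build_tts_arguments(text, **options),
    )
    return result["audio"]["url"]  # assumed response shape
```

In a real workflow, `synthesize` would be paired with a matching transcription call against a speech-to-text endpoint, with the returned audio URL fed back as input.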
## FAQ

**Which audio formats are supported?**

Common formats such as WAV and MP3 are supported; choose the format and sample rate that match your playback requirements and bandwidth constraints.
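One way to keep those choices consistent is a small table of presets keyed by delivery scenario. The preset names, parameter keys (`format`, `sample_rate`), and values below are illustrative assumptions, not fal.ai API fields; map them onto whatever the target model's schema actually accepts.

```python
# Hypothetical presets: narrowband WAV for telephony, compressed MP3 for
# web playback, full-rate WAV when quality matters more than bandwidth.
AUDIO_PRESETS = {
    "telephony": {"format": "wav", "sample_rate": 8000},
    "web": {"format": "mp3", "sample_rate": 24000},
    "studio": {"format": "wav", "sample_rate": 48000},
}


def audio_settings(scenario: str) -> dict:
    """Return a copy of the preset for a scenario, or raise on unknown names."""
    try:
        return dict(AUDIO_PRESETS[scenario])
    except KeyError:
        raise ValueError(f"unknown scenario: {scenario!r}") from None
```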
**How do I handle long recordings?**

Split long recordings into chunks, transcribe each chunk sequentially, and recombine the results. Chunking reduces memory pressure and the risk of API timeouts, and keeps each request within the model's input limits.
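The chunking step can be sketched as pure span arithmetic, independent of any audio library: split the recording into fixed-length windows with a small overlap so that a word cut at a boundary appears in both neighboring chunks and can be reconciled when the transcripts are merged. The default lengths below are assumptions; tune them to the model's input limit.

```python
def chunk_spans(
    total_seconds: float,
    chunk_seconds: float = 300.0,
    overlap_seconds: float = 2.0,
) -> list[tuple[float, float]]:
    """Split a recording into (start, end) spans with a small overlap.

    Each span after the first starts `overlap_seconds` before the previous
    span ended, so boundary words are captured twice and can be deduplicated
    when the per-chunk transcripts are recombined.
    """
    if chunk_seconds <= overlap_seconds:
        raise ValueError("chunk length must exceed the overlap")
    spans: list[tuple[float, float]] = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        if end >= total_seconds:
            break
        start = end - overlap_seconds
    return spans
```

Each span would then be cut from the source file (e.g. with ffmpeg), sent to the transcription endpoint, and the texts joined in order.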
**Can I customize voices and languages?**

Yes. The skill exposes model parameters for voice selection and language settings; consult the model's documentation for the available voices and locale support.