
fal-audio skill


This skill enables text-to-speech and speech-to-text with fal.ai audio models, streamlining voice-enabled workflows and accessibility in applications.

This skill is most likely a fork of the fal-audio skill from xfstudio.

npx playbooks add skill sickn33/antigravity-awesome-skills --skill fal-audio

Copy the command above to add this skill to your agents, or review the SKILL.md file below.

SKILL.md
---
name: fal-audio
description: "Text-to-speech and speech-to-text using fal.ai audio models"
source: "https://github.com/fal-ai-community/skills/blob/main/skills/claude.ai/fal-audio/SKILL.md"
risk: safe
---

# Fal Audio

## Overview

Text-to-speech and speech-to-text using fal.ai audio models.

## When to Use This Skill

Use this skill when you need to work with text-to-speech and speech-to-text using fal.ai audio models.

## Instructions

This skill provides guidance and patterns for text-to-speech and speech-to-text using fal.ai audio models.

For more information, see the [source repository](https://github.com/fal-ai-community/skills/blob/main/skills/claude.ai/fal-audio/SKILL.md).

## Overview

This skill integrates text-to-speech and speech-to-text using fal.ai audio models. It provides ready-made patterns and helper code to convert text into natural-sounding audio and to transcribe speech back into text, with a focus on practical integration into conversational agents and automation workflows.

## How this skill works

The skill wraps fal.ai audio model endpoints with concise client code and examples for both synthesis and transcription. It handles audio encoding/decoding, streaming or file-based I/O, and basic request/response orchestration so agents can send text and receive audio, or send audio and receive transcribed text. Error handling and parameter controls for voice, language, and quality are included to support production usage.
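As a concrete sketch of that request/response flow: the endpoint IDs and the `voice`/`audio_url` argument names below are assumptions to verify against the docs of the specific fal.ai models you use; only `fal_client.subscribe` is the client library's real entry point.

```python
# Sketch of the synthesis/transcription flow. The endpoint IDs and the
# "voice" / "audio_url" argument names are ASSUMPTIONS -- check the docs
# for the specific fal.ai models you deploy.

TTS_ENDPOINT = "fal-ai/some-tts-model"  # placeholder endpoint ID
STT_ENDPOINT = "fal-ai/whisper"         # assumed speech-to-text endpoint

def build_tts_request(text: str, voice: str = "default") -> dict:
    """Assemble the arguments for a text-to-speech call."""
    return {"input": text.strip(), "voice": voice}

def build_stt_request(audio_url: str, language: str = "") -> dict:
    """Assemble the arguments for a speech-to-text call."""
    args = {"audio_url": audio_url}
    if language:
        args["language"] = language
    return args

def synthesize(text: str, voice: str = "default") -> dict:
    """Send text to the TTS endpoint and return the model's response dict."""
    import fal_client  # requires `pip install fal-client` and a FAL_KEY env var
    return fal_client.subscribe(TTS_ENDPOINT, arguments=build_tts_request(text, voice))
```

`synthesize` imports `fal_client` lazily so the payload builders stay testable without credentials; the actual call needs the `fal-client` package installed and a `FAL_KEY` environment variable set.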

## When to use it

- Add natural voice output to chatbots, virtual assistants, or accessibility tools.
- Transcribe user audio, call recordings, or voice notes for logging and analysis.
- Prototype multimodal agents that accept spoken input and respond with speech.
- Automate voice-based workflows such as IVR, voice forms, or voice-activated controls.
- Create audio content, narration, or read-aloud features in apps and services.

## Best practices

- Normalize and sanitize input text before synthesis to avoid unexpected pronunciation.
- Use short audio chunks for streaming or long recordings to improve reliability and latency.
- Select voice and sampling settings appropriate to the client device and bandwidth.
- Cache generated audio for repeated responses to reduce API usage and cost.
- Include confidence thresholds and fallback strategies for low-quality transcriptions.
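The caching practice above can be sketched as a keyed lookup in front of the synthesis call. The cache layout and the `synthesize` callback here are illustrative, not part of the skill itself:

```python
import hashlib

def cache_key(text: str, voice: str, fmt: str = "mp3") -> str:
    """Deterministic key: the same text/voice/format always maps to one entry."""
    digest = hashlib.sha256(f"{voice}|{fmt}|{text}".encode("utf-8")).hexdigest()
    return f"{digest}.{fmt}"

class AudioCache:
    """In-memory cache; swap the dict for disk or object storage in production."""

    def __init__(self):
        self._store = {}

    def get_or_synthesize(self, text, voice, synthesize):
        """Return cached audio, invoking the (API-backed) synthesize only on a miss."""
        key = cache_key(text, voice)
        if key not in self._store:
            self._store[key] = synthesize(text, voice)
        return self._store[key]
```

Hashing the text rather than using it directly keeps keys short and filesystem-safe when the cache is backed by files or object storage.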

## Example use cases

- A support bot that speaks answers aloud and records customer voice notes for later analysis.
- An accessibility mode that reads UI text and transcribes user voice commands.
- Automated podcast narration: convert article text into a voiced episode with minimal editing.
- Call-center tools that transcribe calls in real time and surface action items to agents.

## FAQ

### Which audio formats are supported?

Common formats like WAV and MP3 are supported; choose the format and sampling rate that match your playback requirements and bandwidth constraints.

### How do I handle long recordings?

Split long recordings into chunks, transcribe them sequentially, and recombine the results. Chunking reduces memory usage and the risk of API timeouts, and can improve accuracy on long inputs.
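A minimal sketch of the chunking step, using time offsets with a small overlap so words straddling a boundary are not lost. The 30 s / 2 s defaults are assumptions to tune per model:

```python
def chunk_spans(total_s: float, chunk_s: float = 30.0, overlap_s: float = 2.0):
    """Return (start, end) offsets in seconds covering the whole recording,
    with a small overlap between consecutive chunks."""
    if overlap_s >= chunk_s:
        raise ValueError("overlap must be shorter than the chunk length")
    spans, start = [], 0.0
    while start < total_s:
        spans.append((start, min(start + chunk_s, total_s)))
        start += chunk_s - overlap_s  # advance by the non-overlapping portion
    return spans
```

Each span would then be cut from the audio, transcribed, and the transcripts merged, dropping duplicated words in the overlap regions.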

### Can I customize voices and languages?

Yes. The skill exposes model parameters for voice selection and language settings; consult model docs for available voices and locale support.
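One common pattern is mapping client locale to a voice parameter with a safe fallback. The voice IDs below are placeholders, not real fal.ai voices; consult the chosen model's docs for actual values:

```python
# Placeholder voice IDs -- the real values depend on the chosen fal.ai model.
VOICE_BY_LOCALE = {
    "en-US": "en_speaker_1",
    "de-DE": "de_speaker_1",
}

def pick_voice(locale: str, default: str = "en_speaker_1") -> str:
    """Fall back to a default voice when a locale has no dedicated entry."""
    return VOICE_BY_LOCALE.get(locale, default)
```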