elevenlabs-tts skill

This skill helps you generate high-quality audio from text using Eleven Labs, enabling podcasts, narration, and voice-overs with customizable voices.

npx playbooks add skill krishagel/geoffrey --skill elevenlabs-tts

Run the command above to add this skill to your agents.

---
name: elevenlabs-tts
description: Generate high-quality audio from text using Eleven Labs API. Use for podcasts, narration, voice-overs, and audio summaries.
triggers:
  - "generate audio"
  - "text to speech"
  - "create podcast"
  - "eleven labs"
  - "make audio"
  - "narrate"
  - "voice over"
allowed-tools: Read, Write, Bash
version: 1.0.0
---

# Eleven Labs Text-to-Speech

Generate high-quality audio from text using the Eleven Labs API. Ideal for podcast-style summaries, narration, and voice-overs.

## Quick Start

```bash
# Generate audio with default voice (Rachel)
uv run scripts/generate_audio.py --text "Hello, this is a test."

# Generate from a text file
uv run scripts/generate_audio.py --file report.txt --output ~/Desktop/report.mp3

# Use a specific voice
uv run scripts/generate_audio.py --text "Breaking news..." --voice Josh

# List available voices
uv run scripts/list_voices.py
```

## Scripts

### generate_audio.py

Main script for TTS generation.

**Arguments:**
| Argument | Description | Default |
|----------|-------------|---------|
| `--text` | Text content to convert | - |
| `--file` | Path to text file | - |
| `--voice` | Voice name or ID | Rachel |
| `--model` | Model ID | eleven_multilingual_v2 |
| `--output` | Output file path | ~/Desktop/audio_TIMESTAMP.mp3 |

**Features:**
- Auto-chunks text >10k characters at sentence boundaries
- Concatenates chunks into single MP3
- Returns JSON with metadata (file path, voice, model, char count, chunks)
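
The chunking behaviour listed above can be illustrated with a minimal sketch. This is not the code in `generate_audio.py`; the function name `chunk_text`, the sentence-splitting regex, and the hard split for oversized sentences are all assumptions made for illustration.

```python
import re


def chunk_text(text: str, limit: int = 10_000) -> list[str]:
    """Split text into chunks of at most `limit` characters, breaking after sentences.

    Hypothetical helper: the real script may detect sentence boundaries differently.
    """
    # Split on whitespace that follows ., ! or ? so punctuation stays with its sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split (an assumption).
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be synthesized separately and the resulting audio joined in order.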

### list_voices.py

Fetch and display available voices.

**Arguments:**
| Argument | Description | Default |
|----------|-------------|---------|
| `--all` | Show all voices (not just premade) | false |
| `--json` | Output as JSON | false |

## Curated Voices

Six voices selected for variety (see `references/voices.md` for details):

| Voice | Style | Best For |
|-------|-------|----------|
| **Rachel** | Calm, clear | Narration, podcasts (default) |
| **Bella** | Soft, gentle | Storytelling, meditation |
| **Elli** | Young, expressive | Casual content, tutorials |
| **Josh** | Deep, authoritative | News, professional content |
| **Adam** | Middle-aged, clear | Business, documentaries |
| **Antoni** | Warm, versatile | General purpose |

Use `list_voices.py` to discover additional voices.

## Models

| Model | ID | Best For | Char Limit |
|-------|-----|----------|------------|
| **Multilingual v2** | `eleven_multilingual_v2` | Long-form (default) | 10,000 |
| Flash v2.5 | `eleven_flash_v2_5` | Quick, real-time | 40,000 |
| Turbo v2.5 | `eleven_turbo_v2_5` | Balanced | 40,000 |
| Eleven v3 | `eleven_v3` | Maximum expression | 3,000 |
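
As a rough illustration of how these limits affect long inputs, the sketch below maps each model ID to the character limit from the table and computes a lower bound on the number of chunks. The names `MODEL_CHAR_LIMITS` and `min_chunks` are hypothetical, and it is an assumption that the script chunks by per-model limit rather than by a fixed threshold.

```python
import math

# Character limits copied from the table above.
MODEL_CHAR_LIMITS = {
    "eleven_multilingual_v2": 10_000,
    "eleven_flash_v2_5": 40_000,
    "eleven_turbo_v2_5": 40_000,
    "eleven_v3": 3_000,
}


def min_chunks(char_count: int, model: str = "eleven_multilingual_v2") -> int:
    """Lower bound on chunk count; splitting at sentence boundaries can need more."""
    try:
        limit = MODEL_CHAR_LIMITS[model]
    except KeyError:
        raise ValueError(f"Unknown model ID: {model}") from None
    return math.ceil(char_count / limit) if char_count > 0 else 0
```

For example, a 25,000-character report needs at least three chunks with the default model but fits in one request with `eleven_turbo_v2_5`.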

## Environment Setup

Requires `ELEVENLABS_API_KEY` to be set in this `.env` file:
```
~/Library/Mobile Documents/com~apple~CloudDocs/Geoffrey/secrets/.env
```
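
One way a script could pick up the key is sketched below, using only the standard library. The helper names, the simple `KEY=value` parsing, and the choice to let an already-exported environment variable take precedence are assumptions, not a description of how the skill's scripts actually load secrets.

```python
import os
from pathlib import Path
from typing import Optional

# Path taken from the section above.
ENV_PATH = Path.home() / "Library/Mobile Documents/com~apple~CloudDocs/Geoffrey/secrets/.env"


def read_env_value(path: Path, key: str) -> Optional[str]:
    """Return the value for `key` from a KEY=value style file, or None if absent."""
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip("\"'")
    return None


def get_api_key(path: Path = ENV_PATH) -> str:
    """Prefer the process environment, then fall back to the .env file (assumed order)."""
    key = os.environ.get("ELEVENLABS_API_KEY") or read_env_value(path, "ELEVENLABS_API_KEY")
    if not key:
        raise RuntimeError(f"ELEVENLABS_API_KEY not found in environment or {path}")
    return key
```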

## Examples

### Daily Summary Podcast
```bash
# Generate individual segments
uv run scripts/generate_audio.py --file weather.txt --voice Rachel --output ~/Desktop/weather.mp3
uv run scripts/generate_audio.py --file news.txt --voice Josh --output ~/Desktop/news.mp3
uv run scripts/generate_audio.py --file calendar.txt --voice Bella --output ~/Desktop/calendar.mp3
```

### Long Report
```bash
# Auto-chunks long text, concatenates into single file
uv run scripts/generate_audio.py --file quarterly_report.txt --model eleven_turbo_v2_5 --output report.mp3
```

### Quick Test
```bash
uv run scripts/generate_audio.py --text "Testing one two three" --voice Adam
```

## Output

Default: `~/Desktop/audio_YYYY-MM-DD_HHMMSS.mp3`
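
The timestamp pattern above corresponds to the `strftime` format `%Y-%m-%d_%H%M%S`. A minimal sketch of building such a path follows; `default_output_path` is a hypothetical name, not a function from the script.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def default_output_path(now: Optional[datetime] = None) -> Path:
    """Build ~/Desktop/audio_YYYY-MM-DD_HHMMSS.mp3 for the given (or current) time."""
    now = now or datetime.now()
    return Path.home() / "Desktop" / f"audio_{now:%Y-%m-%d_%H%M%S}.mp3"
```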

Script returns JSON:
```json
{
  "success": true,
  "file": "/Users/user/Desktop/audio_2026-01-23_143022.mp3",
  "voice": "Rachel",
  "model": "eleven_multilingual_v2",
  "characters": 1234,
  "chunks": 1
}
```
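
A caller can consume this JSON from another Python program. The sketch below assumes the script prints only the JSON object to stdout and that a failed run reports `"success": false`; neither is confirmed here, and `parse_result` and `generate` are hypothetical wrapper names.

```python
import json
import subprocess


def parse_result(stdout: str) -> dict:
    """Decode the script's JSON output and raise if it does not report success."""
    result = json.loads(stdout)
    if not result.get("success"):
        raise RuntimeError(f"Generation failed: {result}")
    return result


def generate(text: str, voice: str = "Rachel") -> dict:
    """Run generate_audio.py with flags from the Arguments table and return its metadata."""
    proc = subprocess.run(
        ["uv", "run", "scripts/generate_audio.py", "--text", text, "--voice", voice],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    return parse_result(proc.stdout)
```

With this in place, `generate("Testing one two three", voice="Adam")["file"]` would give the path of the generated MP3.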

## Limitations

- Character limits vary by model (see table above)
- One voice per generation (multi-voice requires separate files)
- Output format: MP3 only

Overview

This skill generates high-quality audio from text using the Eleven Labs API for podcasts, narration, voice-overs, and audio summaries. It provides ready-to-use scripts that produce MP3 files, auto-chunk long texts, and return metadata about each generation. The default voice (Rachel) and model (`eleven_multilingual_v2`) suit long-form narration, and both can be overridden per run with `--voice` and `--model`.

How this skill works

Run the provided CLI scripts to convert either inline text or text files into MP3 audio. The generator selects a voice and model, auto-chunks input exceeding model character limits at sentence boundaries, synthesizes each chunk, and concatenates chunks into a single MP3. The script returns JSON metadata including file path, voice, model, character count, and chunk count.

When to use it

- Create podcast segments or daily audio newsletters from written summaries
- Produce narration for reports, training modules, or explainer videos
- Generate voice-overs for marketing videos and social media content
- Quickly test voice options and output from the CLI for small scripts or demos
- Convert long documents into a single, continuous MP3 via automatic chunking

Best practices

- Choose a model that matches length and expression needs (Multilingual v2 for long-form, Flash or Turbo v2.5 for larger limits, Eleven v3 for expressive short clips)
- Respect per-model character limits; rely on auto-chunking for long inputs, but split semantically for best pacing
- Pick curated voices by use case (Rachel for narration, Josh for news, Bella for gentle storytelling)
- Provide clean, edited text to avoid awkward pauses or mispronunciations
- Run `list_voices.py` (add `--all` to include non-premade voices) to check which voices are available before bulk generation

Example use cases

- Daily summary podcast: generate separate segments (weather, news, calendar), then concatenate them into an episode with an external tool
- Long report narration: auto-chunk a quarterly report and output a single MP3 for stakeholders
- Tutorial voice-over: create step-by-step audio for instructional videos using an expressive voice
- Quick QA: run a short text snippet locally to verify voice, pacing, and model selection

FAQ

Which audio formats are supported?

Output is MP3 only; use external tools to convert formats if needed.

Can I use multiple voices in a single output?

Not in one run. The tool supports one voice per generated file, so generate a separate file per voice and concatenate them externally if required.
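
For the external concatenation step, one common option is ffmpeg's concat demuxer. The sketch below only builds the list file contents and the command; it is not part of this skill, the helper names are hypothetical, and it assumes ffmpeg is installed and that the segments share the same encoding parameters so stream copy (`-c copy`) is valid.

```python
from pathlib import Path


def concat_list_text(files: list[Path]) -> str:
    """Render the list file for ffmpeg's concat demuxer, one `file '...'` line per input."""
    lines = []
    for f in files:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = str(f).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def ffmpeg_concat_command(list_path: Path, output: Path) -> list[str]:
    """Build the ffmpeg invocation that joins the listed files without re-encoding."""
    return [
        "ffmpeg", "-f", "concat", "-safe", "0",
        "-i", str(list_path),
        "-c", "copy",
        str(output),
    ]
```

To use it, write `concat_list_text([...])` to a file such as `segments.txt` and pass the result of `ffmpeg_concat_command` to `subprocess.run`. Relative paths in the list are resolved against the list file's directory.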