
This skill generates high-quality speech audio from text using ElevenLabs API, enabling realistic voice synthesis for applications.

npx playbooks add skill benchflow-ai/skillsbench --skill elevenlabs-tts

Review the files below or copy the command above to add this skill to your agents.

---
name: elevenlabs-tts
description: "ElevenLabs Text-to-Speech API for high-quality speech synthesis."
---

# ElevenLabs Text-to-Speech

Generate high-quality speech audio from text using the ElevenLabs API.

## Authentication

The API key is available as an environment variable:
```bash
ELEVENLABS_API_KEY
```

Include it in the request header:
```
xi-api-key: $ELEVENLABS_API_KEY
```
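As a minimal sketch of the step above, the header can be built in Python from the environment variable. The helper name `build_auth_headers` and its `env` parameter are illustrative assumptions, not part of any ElevenLabs library; the function simply fails early when the key is missing instead of sending an unauthenticated request.

```python
import os


def build_auth_headers(env=os.environ):
    """Return request headers carrying the ElevenLabs API key.

    `env` is any mapping; it defaults to the process environment and is
    a parameter only so the helper can be exercised without a real key.
    """
    api_key = env.get("ELEVENLABS_API_KEY")
    if not api_key:
        # Fail before any network call is attempted.
        raise RuntimeError("ELEVENLABS_API_KEY is not set")
    return {
        "xi-api-key": api_key,
        "Content-Type": "application/json",
    }
```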

## API Endpoints

### Text-to-Speech
```
POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}
```

**Request Body:**
```json
{
  "text": "Your text here",
  "model_id": "eleven_turbo_v2_5",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.5
  }
}
```

**Response:** Audio bytes (MP3 format by default)
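The endpoint and request body above could be assembled for any voice with a small helper like the following. This is a sketch: `build_tts_request` and `API_BASE` are hypothetical names, and the check that both voice settings lie between 0.0 and 1.0 is an assumption about the accepted range rather than something stated in this document.

```python
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(voice_id, text, model_id="eleven_turbo_v2_5",
                      stability=0.5, similarity_boost=0.5):
    """Return (url, body) for a text-to-speech POST request."""
    settings = (("stability", stability),
                ("similarity_boost", similarity_boost))
    for name, value in settings:
        # Assumption: both settings are fractions between 0.0 and 1.0.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")

    url = f"{API_BASE}/{voice_id}"
    body = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return url, body
```

The returned pair can be passed to an HTTP client, for example `requests.post(url, json=body, headers=...)`.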

## Popular Voice IDs

- `21m00Tcm4TlvDq8ikWAM` - Rachel (female, calm)
- `EXAVITQu4vr4xnSDxMaL` - Bella (female, soft)
- `ErXwobaYiN019PkySvjV` - Antoni (male, warm)
- `TxGEqnHWrfWFTfGW9XjX` - Josh (male, deep)

## Python Example

```python
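import os
import requests

ELEVENLABS_API_KEY = os.environ.get("ELEVENLABS_API_KEY")
VOICE_ID = "21m00Tcm4TlvDq8ikWAM"  # Rachel


def text_to_speech(text, output_path):
    """Synthesize `text` and write the returned audio to `output_path`.

    Returns True on success and False on an HTTP error.
    """
    # os.environ.get returns None when the variable is unset.
    if not ELEVENLABS_API_KEY:
        raise RuntimeError("ELEVENLABS_API_KEY is not set")

    url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"

    headers = {
        "xi-api-key": ELEVENLABS_API_KEY,
        "Content-Type": "application/json"
    }

    data = {
        "text": text,
        "model_id": "eleven_turbo_v2_5",
        "voice_settings": {
            "stability": 0.5,
            "similarity_boost": 0.75
        }
    }

    # A timeout keeps the call from hanging indefinitely on network problems.
    response = requests.post(url, json=data, headers=headers, timeout=60)

    if response.status_code == 200:
        # The response body is the raw audio (MP3 by default).
        with open(output_path, "wb") as f:
            f.write(response.content)
        return True
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return False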
```

## Handling Long Text

ElevenLabs limits each request to roughly 5,000 characters (the exact limit may depend on the model and plan). For long documents:

1. Split text into chunks at sentence boundaries
2. Generate audio for each chunk
3. Concatenate using ffmpeg or pydub:

```bash
# Create file list
echo "file 'chunk1.mp3'" > files.txt
echo "file 'chunk2.mp3'" >> files.txt

# Concatenate
ffmpeg -f concat -safe 0 -i files.txt -c copy output.mp3
```
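Step 1 above could be sketched in Python as follows. The function name `split_into_chunks` is hypothetical, and the sentence detection is a simple heuristic (a break after `.`, `!`, or `?` followed by whitespace) that will mis-split abbreviations such as "Dr. Smith". A sentence longer than the limit is hard-split as a fallback.

```python
import re


def split_into_chunks(text, max_chars=5000):
    """Split text into chunks of at most max_chars, breaking after sentences."""
    # Heuristic: a sentence ends at ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # Fallback: a single over-long sentence is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would exceed the limit; start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request and saved as `chunk1.mp3`, `chunk2.mp3`, and so on, in order.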

## Best Practices

- Split at sentence boundaries to avoid cutting words
- Check rate limits (varies by subscription tier)
- Cache generated audio to avoid redundant API calls
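One way to cache generated audio is to key each file on a hash of everything that affects the output. The sketch below assumes a caller-supplied `synthesize` function that returns audio bytes; `cache_path`, `cached_tts`, and the `tts_cache` directory are hypothetical names chosen for illustration.

```python
import hashlib
import json
from pathlib import Path


def cache_path(text, voice_id, model_id, voice_settings, cache_dir="tts_cache"):
    """Return the cache file path for one combination of inputs."""
    key = json.dumps(
        {"text": text, "voice_id": voice_id, "model_id": model_id,
         "voice_settings": voice_settings},
        sort_keys=True,
    )
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.mp3"


def cached_tts(text, voice_id, synthesize, model_id="eleven_turbo_v2_5",
               voice_settings=None, cache_dir="tts_cache"):
    """Return the path to cached audio, calling synthesize() only on a miss."""
    path = cache_path(text, voice_id, model_id, voice_settings, cache_dir)
    if path.exists():
        return path
    # synthesize is assumed to return the raw audio bytes.
    audio = synthesize(text, voice_id, model_id, voice_settings)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio)
    return path
```

Including the voice, model, and settings in the key means a changed setting produces a new file rather than silently reusing stale audio.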

Overview

This skill connects to the ElevenLabs Text-to-Speech API to produce high-quality spoken audio from text. It supports configurable voices, model selection, and voice settings like stability and similarity boost. The output is audio bytes (MP3 by default) suitable for streaming or saving to files.

How this skill works

The skill sends POST requests to the ElevenLabs endpoint /v1/text-to-speech/{voice_id} with your text, a model_id, and optional voice_settings. Requests are authenticated with an xi-api-key header whose value comes from the ELEVENLABS_API_KEY environment variable. The API returns audio bytes, which can be written to disk or returned as a stream; long texts must be split into chunks and the resulting audio concatenated afterward.

When to use it

  • Create spoken narration for apps, videos, or podcasts
  • Add voice output to accessibility features and screen readers
  • Generate voice prompts for IVR, chatbots, or guided tutorials
  • Produce localized or different-voice variations for the same content
  • Batch-convert articles or documentation into audio files

Best practices

  • Store ELEVENLABS_API_KEY securely as an environment variable and never embed it in client code
  • Keep each request under the ~5000-character limit; split longer text at sentence boundaries
  • Cache generated audio files to reduce API calls and stay within rate limits
  • Tune voice_settings (stability, similarity_boost) incrementally to avoid synthetic artifacts
  • Concatenate chunked outputs with ffmpeg or pydub rather than re-requesting large combined text
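Staying within rate limits often also means backing off when a request is rejected. The following is a sketch only: it assumes the API signals rate limiting with HTTP status 429, and `post_with_retry` with its `send` callable is a hypothetical helper, not part of any ElevenLabs client.

```python
import time


def post_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    `send` is a zero-argument callable returning an object with a
    `status_code` attribute, such as a requests.Response.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        # Assumption: 429 means the rate limit was hit.
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Exponential backoff: 1s, 2s, 4s, ... with the default base_delay.
            sleep(base_delay * (2 ** attempt))
    return response
```

A call might look like `post_with_retry(lambda: requests.post(url, json=data, headers=headers, timeout=60))`.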

Example use cases

  • Generate chaptered audiobooks by splitting long text into sentence-boundary chunks and concatenating MP3s
  • Produce consistent voiceovers for marketing videos using a selected voice_id
  • Add on-demand narration to an e-learning platform using cached audio snippets
  • Create multi-voice dialogues by requesting different voice_ids for each character
  • Automate voicemail and IVR prompts with stable, pre-rendered audio files

FAQ

How do I authenticate requests?

Set ELEVENLABS_API_KEY as an environment variable and include it in the xi-api-key header for each request.

What if my text is longer than the limit?

Split text into chunks at sentence boundaries, generate audio for each chunk, then concatenate the resulting MP3s with ffmpeg or pydub.
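The ffmpeg route can be driven from Python, as in this sketch. It writes the concat list file and then runs the same `ffmpeg -f concat` command used in this skill; the helper names are hypothetical, the `-y` flag (overwrite the output file) is an addition, ffmpeg must be on the PATH, and paths containing single quotes are not handled.

```python
import subprocess
from pathlib import Path


def write_concat_list(chunk_paths, list_path="files.txt"):
    """Write an ffmpeg concat list with one `file '...'` line per chunk."""
    lines = [f"file '{Path(p).as_posix()}'" for p in chunk_paths]
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return list_path


def concat_mp3(chunk_paths, output_path, list_path="files.txt"):
    """Join MP3 chunks in order without re-encoding them."""
    write_concat_list(chunk_paths, list_path)
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", str(list_path), "-c", "copy", str(output_path)],
        check=True,  # raise CalledProcessError if ffmpeg fails
    )
```

For example, `concat_mp3(["chunk1.mp3", "chunk2.mp3"], "output.mp3")` would produce a single file from the two chunks.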