
elevenlabs skill

/elevenlabs

This skill enables text-to-speech generation using the ElevenLabs API via curl to render realistic voices for apps and media.

npx playbooks add skill vm0-ai/vm0-skills --skill elevenlabs

Review the files below or copy the command above to add this skill to your agents.

Files (1): SKILL.md (5.4 KB)
---
name: elevenlabs
description: ElevenLabs AI voice generation API via curl. Use this skill to convert text to speech with realistic AI voices.
vm0_secrets:
  - ELEVENLABS_API_KEY
---

# ElevenLabs API

Use the ElevenLabs API via direct `curl` calls to generate **realistic AI speech** from text.

> Official docs: `https://elevenlabs.io/docs/api-reference`

---

## When to Use

Use this skill when you need to:

- **Convert text to speech** with high-quality AI voices
- **List available voices** to find the right voice for your use case
- **Stream audio output** for real-time playback
- **Generate voiceovers** for videos, podcasts, or accessibility

---

## Prerequisites

1. Create an account at [ElevenLabs](https://elevenlabs.io/)
2. Go to your profile settings and generate an API key
3. Store your API key in the environment variable `ELEVENLABS_API_KEY`

```bash
export ELEVENLABS_API_KEY="your-api-key"
```

### API Limits

- Free tier: includes a limited number of characters per month; check the subscription endpoint (section 6 below) for your exact quota
- The API key is passed via the `xi-api-key` header on every request (a quick validity check is sketched below)
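
To confirm the key works before generating audio, you can call the user endpoint; a minimal smoke test (assuming `jq` is installed):

```bash
# A valid key returns your account details; an invalid key returns an error object
bash -c 'curl -s "https://api.elevenlabs.io/v1/user" --header "xi-api-key: ${ELEVENLABS_API_KEY}"' | jq .
```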

---


> **Important:** When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.
> ```bash
> bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"' | jq .
> ```

## How to Use

All examples below assume you have `ELEVENLABS_API_KEY` set.

The base URL for the ElevenLabs API is:

- `https://api.elevenlabs.io/v1`

---

### 1. List Available Voices

Get all voices available to your account:

```bash
bash -c 'curl -s -X GET "https://api.elevenlabs.io/v1/voices" --header "xi-api-key: ${ELEVENLABS_API_KEY}"' | jq '.voices[] | {voice_id, name, category}'
```

This returns voice IDs needed for text-to-speech. Common voice categories:
- `premade`: ElevenLabs default voices
- `cloned`: Your cloned voices
- `generated`: AI-designed voices
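
To grab a single `voice_id` for later commands, filter the response by name. A minimal sketch (the name `Rachel` is only an example; substitute any name from your own voice list):

```bash
# Look up the voice_id for a voice by name and keep it in a shell variable
VOICE_ID=$(bash -c 'curl -s "https://api.elevenlabs.io/v1/voices" --header "xi-api-key: ${ELEVENLABS_API_KEY}"' \
  | jq -r '.voices[] | select(.name == "Rachel") | .voice_id')
echo "$VOICE_ID"
```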

---

### 2. Get Voice Details

Get detailed information about a specific voice. Replace `<your-voice-id>` with an actual voice ID:

```bash
bash -c 'curl -s -X GET "https://api.elevenlabs.io/v1/voices/<your-voice-id>" --header "xi-api-key: ${ELEVENLABS_API_KEY}"'
```
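
To pull just the fields you usually care about from the response, pipe through `jq` (wrapped in `bash -c` because of the pipe):

```bash
# Show only the name, category, and labels of the voice
bash -c 'curl -s "https://api.elevenlabs.io/v1/voices/<your-voice-id>" --header "xi-api-key: ${ELEVENLABS_API_KEY}"' | jq '{name, category, labels}'
```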

---

### 3. List Available Models

Get all available TTS models:

```bash
bash -c 'curl -s -X GET "https://api.elevenlabs.io/v1/models" --header "xi-api-key: ${ELEVENLABS_API_KEY}"' | jq '.[] | {model_id, name, can_do_text_to_speech}'
```

Common models:
- `eleven_multilingual_v2`: Best quality, supports 29 languages
- `eleven_flash_v2_5`: Low latency, good for real-time
- `eleven_v3`: Latest model (alpha)
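
If you only want models that can actually synthesize speech, filter on the `can_do_text_to_speech` flag:

```bash
# Print only the IDs of models that support text-to-speech
bash -c 'curl -s "https://api.elevenlabs.io/v1/models" --header "xi-api-key: ${ELEVENLABS_API_KEY}"' | jq -r '.[] | select(.can_do_text_to_speech) | .model_id'
```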

---

### 4. Text to Speech (Save to File)

Convert text to speech and save as MP3. Replace `<your-voice-id>` with an actual voice ID from the list voices endpoint:

Write to `/tmp/elevenlabs_request.json`:

```json
{
  "text": "Hello! This is a test of the ElevenLabs text to speech API.",
  "model_id": "eleven_multilingual_v2",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.75
  }
}
```

Then run:

```bash
curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/<your-voice-id>" --header "xi-api-key: ${ELEVENLABS_API_KEY}" --header "Content-Type: application/json" --header "Accept: audio/mpeg" -d @/tmp/elevenlabs_request.json --output speech.mp3
```

**Voice settings:**

- `stability` (0.0-1.0): Higher = more consistent, lower = more expressive
- `similarity_boost` (0.0-1.0): Higher = closer to original voice
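
As a rough illustration of how these knobs interact, the sketch below regenerates the same line with lower stability for a more expressive delivery; the values are arbitrary starting points, not recommendations:

```bash
# Write a request with more expressive settings via a heredoc, then synthesize it
cat > /tmp/elevenlabs_request.json <<'EOF'
{
  "text": "Hello! This is a test of the ElevenLabs text to speech API.",
  "model_id": "eleven_multilingual_v2",
  "voice_settings": {
    "stability": 0.3,
    "similarity_boost": 0.8
  }
}
EOF
curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/<your-voice-id>" --header "xi-api-key: ${ELEVENLABS_API_KEY}" --header "Content-Type: application/json" --header "Accept: audio/mpeg" -d @/tmp/elevenlabs_request.json --output speech_expressive.mp3
```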

---

### 5. Text to Speech with Streaming

Stream audio for real-time playback. Replace `<your-voice-id>` with an actual voice ID:

Write to `/tmp/elevenlabs_request.json`:

```json
{
  "text": "This audio is being streamed in real-time.",
  "model_id": "eleven_flash_v2_5"
}
```

Then run:

```bash
curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/<your-voice-id>/stream" --header "xi-api-key: ${ELEVENLABS_API_KEY}" --header "Content-Type: application/json" --header "Accept: audio/mpeg" -d @/tmp/elevenlabs_request.json --output streamed.mp3
```
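
For true real-time playback rather than saving to disk, you can pipe the stream straight into an audio player; a sketch assuming `ffplay` (part of FFmpeg) is installed:

```bash
# Play the streamed audio as it arrives instead of writing it to a file
bash -c 'curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/<your-voice-id>/stream" --header "xi-api-key: ${ELEVENLABS_API_KEY}" --header "Content-Type: application/json" --header "Accept: audio/mpeg" -d @/tmp/elevenlabs_request.json' | ffplay -autoexit -nodisp -loglevel quiet -
```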

---

### 6. Get User Subscription Info

Check your usage and character limits:

```bash
bash -c 'curl -s -X GET "https://api.elevenlabs.io/v1/user/subscription" --header "xi-api-key: ${ELEVENLABS_API_KEY}"' | jq '{character_count, character_limit, tier}'
```
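
A small follow-up that computes how many characters remain in the current billing period:

```bash
# Remaining characters = limit minus what has already been used
bash -c 'curl -s "https://api.elevenlabs.io/v1/user/subscription" --header "xi-api-key: ${ELEVENLABS_API_KEY}"' | jq '.character_limit - .character_count'
```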

---

## Output Formats

You can specify different output formats via the `output_format` query parameter. Replace `<your-voice-id>` with an actual voice ID:

Write to `/tmp/elevenlabs_request.json`:

```json
{
  "text": "Hello world",
  "model_id": "eleven_multilingual_v2"
}
```

Then run:

```bash
curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/<your-voice-id>?output_format=pcm_16000" --header "xi-api-key: ${ELEVENLABS_API_KEY}" --header "Content-Type: application/json" -d @/tmp/elevenlabs_request.json --output speech.pcm
```

Available formats:
- `mp3_44100_128` (default): MP3 at 44.1kHz, 128kbps
- `mp3_44100_192`: MP3 at 44.1kHz, 192kbps (available on higher subscription tiers)
- `pcm_16000`: PCM at 16kHz
- `pcm_22050`: PCM at 22.05kHz
- `pcm_24000`: PCM at 24kHz
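
The PCM formats are raw, headerless audio, so most players will not open the file directly. A sketch that wraps it in a WAV container, assuming `ffmpeg` is installed and that the data is 16-bit signed little-endian mono, matching the `pcm_16000` request above:

```bash
# Convert the raw PCM (16 kHz, 16-bit signed little-endian, mono) into a standard WAV file
ffmpeg -f s16le -ar 16000 -ac 1 -i speech.pcm speech.wav
```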

---

## Guidelines

1. **Choose the right model**: Use `eleven_flash_v2_5` for low latency, `eleven_multilingual_v2` for best quality
2. **Monitor usage**: Check subscription endpoint to avoid exceeding character limits
3. **Experiment with voice settings**: Adjust stability and similarity_boost for different effects
4. **Use streaming for long text**: Stream endpoint is better for real-time applications
5. **Cache voice IDs**: Store frequently used voice IDs to avoid repeated API calls (see the sketch below)
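
A minimal caching sketch for guideline 5, assuming a local temp file is an acceptable cache and refreshing it once a day:

```bash
# Cache the voice list in /tmp and refresh it only when it is older than 24 hours
CACHE=/tmp/elevenlabs_voices.json
if [ ! -s "$CACHE" ] || [ -n "$(find "$CACHE" -mmin +1440 2>/dev/null)" ]; then
  bash -c 'curl -s "https://api.elevenlabs.io/v1/voices" --header "xi-api-key: ${ELEVENLABS_API_KEY}"' > "$CACHE"
fi
jq '.voices[] | {voice_id, name}' "$CACHE"
```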

Overview

This skill provides ready-to-use curl examples for ElevenLabs AI text-to-speech. It helps you list voices, inspect models, generate high-quality audio files, and stream real-time audio using the ElevenLabs API. Examples assume the ELEVENLABS_API_KEY environment variable is set.

How this skill works

The skill uses direct HTTP calls to the ElevenLabs v1 API via curl, passing the API key in the xi-api-key header. It shows endpoints for listing voices and models, fetching voice details, submitting text-to-speech requests (file or streamed), and checking subscription usage. Request payloads and common response handling with jq are demonstrated.

When to use it

  • Generate realistic voiceovers for videos, podcasts, or accessibility features
  • Stream audio for live playback or low-latency applications
  • Explore and select voices or models available to your account
  • Save synthesized speech to files in MP3 or PCM formats
  • Monitor subscription usage and character limits

Best practices

  • Store your API key in ELEVENLABS_API_KEY and use bash -c when piping to avoid environment variable clearing
  • Pick the appropriate model: eleven_multilingual_v2 for quality, eleven_flash_v2_5 for low latency
  • Cache frequently used voice IDs to reduce repeated list calls
  • Tune voice_settings (stability and similarity_boost) to balance consistency and expressiveness
  • Use the streaming endpoint for long text or real-time UX to reduce latency

Example use cases

  • Batch-generate narration MP3s for an e-learning course using a chosen voice ID
  • Stream short prompts in a chatbot for instant audible replies with eleven_flash_v2_5
  • Clone and test a custom voice, then adjust similarity_boost for final output
  • Export PCM audio for further processing in audio pipelines or ASR systems
  • Automate usage checks by calling the subscription endpoint to avoid overruns

FAQ

How do I authenticate requests?

Set ELEVENLABS_API_KEY and include it as xi-api-key in curl headers.

Which model should I use for best quality?

Use eleven_multilingual_v2 for top quality; use eleven_flash_v2_5 when you need lower latency.