
elevenlabs skill

/elevenlabs

This skill enables text-to-speech generation using the ElevenLabs API via curl to render realistic voices for apps and media.

npx playbooks add skill vm0-ai/vm0-skills --skill elevenlabs

Review the files below or copy the command above to add this skill to your agents.

Files (1): SKILL.md (5.4 KB)
---
name: elevenlabs
description: ElevenLabs AI voice generation API via curl. Use this skill to convert text to speech with realistic AI voices.
vm0_secrets:
  - ELEVENLABS_API_KEY
---

# ElevenLabs API

Use the ElevenLabs API via direct `curl` calls to generate **realistic AI speech** from text.

> Official docs: `https://elevenlabs.io/docs/api-reference`

---

## When to Use

Use this skill when you need to:

- **Convert text to speech** with high-quality AI voices
- **List available voices** to find the right voice for your use case
- **Stream audio output** for real-time playback
- **Generate voiceovers** for videos, podcasts, or accessibility

---

## Prerequisites

1. Create an account at [ElevenLabs](https://elevenlabs.io/)
2. Go to your profile settings and generate an API key
3. Store your API key in the environment variable `ELEVENLABS_API_KEY`

```bash
export ELEVENLABS_API_KEY="your-api-key"
```

### API Limits

- Free tier: includes a limited number of characters per month; check the subscription endpoint (section 6 below) for your exact quota
- The API key is passed via the `xi-api-key` header on every request (a quick validity check is sketched below)
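
To confirm the key works before generating audio, you can call the user endpoint; a minimal smoke test (assuming `jq` is installed):

```bash
# A valid key returns your account details; an invalid key returns an error object
bash -c 'curl -s "https://api.elevenlabs.io/v1/user" --header "xi-api-key: ${ELEVENLABS_API_KEY}"' | jq .
```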

---


> **Important:** When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.
> ```bash
> bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"' | jq .
> ```

## How to Use

All examples below assume you have `ELEVENLABS_API_KEY` set.

The base URL for the ElevenLabs API is:

- `https://api.elevenlabs.io/v1`

---

### 1. List Available Voices

Get all voices available to your account:

```bash
bash -c 'curl -s -X GET "https://api.elevenlabs.io/v1/voices" --header "xi-api-key: ${ELEVENLABS_API_KEY}"' | jq '.voices[] | {voice_id, name, category}'
```

This returns voice IDs needed for text-to-speech. Common voice categories:
- `premade`: ElevenLabs default voices
- `cloned`: Your cloned voices
- `generated`: AI-designed voices
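
To grab a single `voice_id` for later commands, filter the response by name. A minimal sketch (the name `Rachel` is only an example; substitute any name from your own voice list):

```bash
# Look up the voice_id for a voice by name and keep it in a shell variable
VOICE_ID=$(bash -c 'curl -s "https://api.elevenlabs.io/v1/voices" --header "xi-api-key: ${ELEVENLABS_API_KEY}"' \
  | jq -r '.voices[] | select(.name == "Rachel") | .voice_id')
echo "$VOICE_ID"
```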

---

### 2. Get Voice Details

Get detailed information about a specific voice. Replace `<your-voice-id>` with an actual voice ID:

```bash
bash -c 'curl -s -X GET "https://api.elevenlabs.io/v1/voices/<your-voice-id>" --header "xi-api-key: ${ELEVENLABS_API_KEY}"'
```
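
To pull just the fields you usually care about from the response, pipe through `jq` (wrapped in `bash -c` because of the pipe):

```bash
# Show only the name, category, and labels of the voice
bash -c 'curl -s "https://api.elevenlabs.io/v1/voices/<your-voice-id>" --header "xi-api-key: ${ELEVENLABS_API_KEY}"' | jq '{name, category, labels}'
```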

---

### 3. List Available Models

Get all available TTS models:

```bash
bash -c 'curl -s -X GET "https://api.elevenlabs.io/v1/models" --header "xi-api-key: ${ELEVENLABS_API_KEY}"' | jq '.[] | {model_id, name, can_do_text_to_speech}'
```

Common models:
- `eleven_multilingual_v2`: Best quality, supports 29 languages
- `eleven_flash_v2_5`: Low latency, good for real-time
- `eleven_v3`: Latest model (alpha)
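
If you only want models that can actually synthesize speech, filter on the `can_do_text_to_speech` flag:

```bash
# Print only the IDs of models that support text-to-speech
bash -c 'curl -s "https://api.elevenlabs.io/v1/models" --header "xi-api-key: ${ELEVENLABS_API_KEY}"' | jq -r '.[] | select(.can_do_text_to_speech) | .model_id'
```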

---

### 4. Text to Speech (Save to File)

Convert text to speech and save as MP3. Replace `<your-voice-id>` with an actual voice ID from the list voices endpoint:

Write to `/tmp/elevenlabs_request.json`:

```json
{
  "text": "Hello! This is a test of the ElevenLabs text to speech API.",
  "model_id": "eleven_multilingual_v2",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.75
  }
}
```

Then run:

```bash
curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/<your-voice-id>" --header "xi-api-key: ${ELEVENLABS_API_KEY}" --header "Content-Type: application/json" --header "Accept: audio/mpeg" -d @/tmp/elevenlabs_request.json --output speech.mp3
```

**Voice settings:**

- `stability` (0.0-1.0): Higher = more consistent, lower = more expressive
- `similarity_boost` (0.0-1.0): Higher = closer to original voice
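
As a rough illustration of how these knobs interact, the sketch below regenerates the same line with lower stability for a more expressive delivery; the values are arbitrary starting points, not recommendations:

```bash
# Write a request with more expressive settings via a heredoc, then synthesize it
cat > /tmp/elevenlabs_request.json <<'EOF'
{
  "text": "Hello! This is a test of the ElevenLabs text to speech API.",
  "model_id": "eleven_multilingual_v2",
  "voice_settings": {
    "stability": 0.3,
    "similarity_boost": 0.8
  }
}
EOF
curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/<your-voice-id>" --header "xi-api-key: ${ELEVENLABS_API_KEY}" --header "Content-Type: application/json" --header "Accept: audio/mpeg" -d @/tmp/elevenlabs_request.json --output speech_expressive.mp3
```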

---

### 5. Text to Speech with Streaming

Stream audio for real-time playback. Replace `<your-voice-id>` with an actual voice ID:

Write to `/tmp/elevenlabs_request.json`:

```json
{
  "text": "This audio is being streamed in real-time.",
  "model_id": "eleven_flash_v2_5"
}
```

Then run:

```bash
curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/<your-voice-id>/stream" --header "xi-api-key: ${ELEVENLABS_API_KEY}" --header "Content-Type: application/json" --header "Accept: audio/mpeg" -d @/tmp/elevenlabs_request.json --output streamed.mp3
```
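
For true real-time playback rather than saving to disk, you can pipe the stream straight into an audio player; a sketch assuming `ffplay` (part of FFmpeg) is installed:

```bash
# Play the streamed audio as it arrives instead of writing it to a file
bash -c 'curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/<your-voice-id>/stream" --header "xi-api-key: ${ELEVENLABS_API_KEY}" --header "Content-Type: application/json" --header "Accept: audio/mpeg" -d @/tmp/elevenlabs_request.json' | ffplay -autoexit -nodisp -loglevel quiet -
```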

---

### 6. Get User Subscription Info

Check your usage and character limits:

```bash
bash -c 'curl -s -X GET "https://api.elevenlabs.io/v1/user/subscription" --header "xi-api-key: ${ELEVENLABS_API_KEY}"' | jq '{character_count, character_limit, tier}'
```
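
A small follow-up that computes how many characters remain in the current billing period:

```bash
# Remaining characters = limit minus what has already been used
bash -c 'curl -s "https://api.elevenlabs.io/v1/user/subscription" --header "xi-api-key: ${ELEVENLABS_API_KEY}"' | jq '.character_limit - .character_count'
```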

---

## Output Formats

You can specify different output formats via the `output_format` query parameter. Replace `<your-voice-id>` with an actual voice ID:

Write to `/tmp/elevenlabs_request.json`:

```json
{
  "text": "Hello world",
  "model_id": "eleven_multilingual_v2"
}
```

Then run:

```bash
curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/<your-voice-id>?output_format=pcm_16000" --header "xi-api-key: ${ELEVENLABS_API_KEY}" --header "Content-Type: application/json" -d @/tmp/elevenlabs_request.json --output speech.pcm
```

Available formats:
- `mp3_44100_128` (default): MP3 at 44.1kHz, 128kbps
- `mp3_44100_192`: MP3 at 44.1kHz, 192kbps (available on higher subscription tiers)
- `pcm_16000`: PCM at 16kHz
- `pcm_22050`: PCM at 22.05kHz
- `pcm_24000`: PCM at 24kHz
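
The PCM formats are raw, headerless audio, so most players will not open the file directly. A sketch that wraps it in a WAV container, assuming `ffmpeg` is installed and that the data is 16-bit signed little-endian mono, matching the `pcm_16000` request above:

```bash
# Convert the raw PCM (16 kHz, 16-bit signed little-endian, mono) into a standard WAV file
ffmpeg -f s16le -ar 16000 -ac 1 -i speech.pcm speech.wav
```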

---

## Guidelines

1. **Choose the right model**: Use `eleven_flash_v2_5` for low latency, `eleven_multilingual_v2` for best quality
2. **Monitor usage**: Check subscription endpoint to avoid exceeding character limits
3. **Experiment with voice settings**: Adjust stability and similarity_boost for different effects
4. **Use streaming for long text**: Stream endpoint is better for real-time applications
5. **Cache voice IDs**: Store frequently used voice IDs to avoid repeated API calls (see the sketch below)
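
A minimal caching sketch for guideline 5, assuming a local temp file is an acceptable cache and refreshing it once a day:

```bash
# Cache the voice list in /tmp and refresh it only when it is older than 24 hours
CACHE=/tmp/elevenlabs_voices.json
if [ ! -s "$CACHE" ] || [ -n "$(find "$CACHE" -mmin +1440 2>/dev/null)" ]; then
  bash -c 'curl -s "https://api.elevenlabs.io/v1/voices" --header "xi-api-key: ${ELEVENLABS_API_KEY}"' > "$CACHE"
fi
jq '.voices[] | {voice_id, name}' "$CACHE"
```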

Overview

This skill provides ready-to-use curl examples for ElevenLabs AI text-to-speech. It helps you list voices, inspect models, generate high-quality audio files, and stream real-time audio using the ElevenLabs API. Examples assume the ELEVENLABS_API_KEY environment variable is set.

How this skill works

The skill uses direct HTTP calls to the ElevenLabs v1 API via curl, passing the API key in the xi-api-key header. It shows endpoints for listing voices and models, fetching voice details, submitting text-to-speech requests (file or streamed), and checking subscription usage. Request payloads and common response handling with jq are demonstrated.

When to use it

  • Generate realistic voiceovers for videos, podcasts, or accessibility features
  • Stream audio for live playback or low-latency applications
  • Explore and select voices or models available to your account
  • Save synthesized speech to files in MP3 or PCM formats
  • Monitor subscription usage and character limits

Best practices

  • Store your API key in ELEVENLABS_API_KEY and use bash -c when piping to avoid environment variable clearing
  • Pick the appropriate model: eleven_multilingual_v2 for quality, eleven_flash_v2_5 for low latency
  • Cache frequently used voice IDs to reduce repeated list calls
  • Tune voice_settings (stability and similarity_boost) to balance consistency and expressiveness
  • Use the streaming endpoint for long text or real-time UX to reduce latency

Example use cases

  • Batch-generate narration MP3s for an e-learning course using a chosen voice ID
  • Stream short prompts in a chatbot for instant audible replies with eleven_flash_v2_5
  • Clone and test a custom voice, then adjust similarity_boost for final output
  • Export PCM audio for further processing in audio pipelines or ASR systems
  • Automate usage checks by calling the subscription endpoint to avoid overruns

FAQ

How do I authenticate requests?

Set ELEVENLABS_API_KEY and include it as xi-api-key in curl headers.

Which model should I use for best quality?

Use eleven_multilingual_v2 for top quality; use eleven_flash_v2_5 when you need lower latency.