elevenlabs-voices skill

This skill synthesizes high-quality ElevenLabs voices across 32 languages, enabling fast, customizable TTS with batch processing and voice design.

npx playbooks add skill openclaw/skills --skill elevenlabs-voices

---
name: elevenlabs-voices
version: 2.1.6
description: High-quality voice synthesis with 18 personas, 32 languages, sound effects, batch processing, and voice design using ElevenLabs API.
tags: [tts, voice, speech, elevenlabs, audio, sound-effects, voice-design, multilingual]
metadata: {"openclaw":{"requires":{"bins":["python3"],"env":{"ELEVEN_API_KEY":"required","ELEVENLABS_API_KEY":"optional"},"note":"Set ELEVEN_API_KEY. ELEVENLABS_API_KEY is an accepted alias."}}}
---

# ElevenLabs Voice Personas v2.1

A comprehensive voice synthesis toolkit built on the ElevenLabs API.

## 🚀 First Run - Setup Wizard

When you first use this skill (no `config.json` exists), run the interactive setup wizard:

```bash
python3 scripts/setup.py
```

The wizard will guide you through:
1. **API Key** - Enter your ElevenLabs API key (required)
2. **Default Voice** - Choose from popular voices (Rachel, Adam, Bella, etc.)
3. **Language** - Set your preferred language (32 supported)
4. **Audio Quality** - Standard or high quality output
5. **Cost Tracking** - Enable usage and cost monitoring
6. **Budget Limit** - Optional monthly spending cap

**🔒 Privacy:** Your API key is stored locally in `config.json` only. It never leaves your machine and is automatically excluded from git via `.gitignore`.

To reconfigure at any time, simply run the setup wizard again.

---

## ✨ Features

- **18 Voice Personas** - Carefully curated voices for different use cases
- **32 Languages** - Multi-language synthesis with the multilingual v2 model
- **Streaming Mode** - Real-time audio output as it generates
- **Sound Effects (SFX)** - AI-generated sound effects from text prompts
- **Batch Processing** - Process multiple texts in one go
- **Cost Tracking** - Monitor character usage and estimated costs
- **Voice Design** - Create custom voices from descriptions
- **Pronunciation Dictionary** - Custom word pronunciation rules
- **OpenClaw Integration** - Works with OpenClaw's built-in TTS

---

## 🎙 Available Voices

| Voice | Accent | Gender | Persona | Best For |
|-------|--------|--------|---------|----------|
| rachel | 🇺🇸 US | female | warm | Conversations, tutorials |
| adam | 🇺🇸 US | male | narrator | Documentaries, audiobooks |
| bella | 🇺🇸 US | female | professional | Business, presentations |
| brian | 🇺🇸 US | male | comforting | Meditation, calm content |
| george | 🇬🇧 UK | male | storyteller | Audiobooks, storytelling |
| alice | 🇬🇧 UK | female | educator | Tutorials, explanations |
| callum | 🇺🇸 US | male | trickster | Playful, gaming |
| charlie | 🇦🇺 AU | male | energetic | Sports, motivation |
| jessica | 🇺🇸 US | female | playful | Social media, casual |
| lily | 🇬🇧 UK | female | actress | Drama, elegant content |
| matilda | 🇺🇸 US | female | professional | Corporate, news |
| river | 🇺🇸 US | neutral | neutral | Inclusive, informative |
| roger | 🇺🇸 US | male | casual | Podcasts, relaxed |
| daniel | 🇬🇧 UK | male | broadcaster | News, announcements |
| eric | 🇺🇸 US | male | trustworthy | Business, corporate |
| chris | 🇺🇸 US | male | friendly | Tutorials, approachable |
| will | 🇺🇸 US | male | optimist | Motivation, uplifting |
| liam | 🇺🇸 US | male | social | YouTube, social media |

## 🎯 Quick Presets

- `default` β†’ rachel (warm, friendly)
- `narrator` β†’ adam (documentaries)
- `professional` β†’ matilda (corporate)
- `storyteller` β†’ george (audiobooks)
- `educator` β†’ alice (tutorials)
- `calm` β†’ brian (meditation)
- `energetic` β†’ liam (social media)
- `trustworthy` β†’ eric (business)
- `neutral` β†’ river (inclusive)
- `british` β†’ george
- `australian` β†’ charlie
- `broadcaster` β†’ daniel (news)
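
The preset list above behaves like an alias lookup. A minimal sketch of that idea, assuming a preset is resolved to a voice name before synthesis; `PRESETS` and `resolve_voice` are hypothetical names for illustration and are not taken from `tts.py`:

```python
# Hypothetical alias table mirroring the preset list in this document.
PRESETS = {
    "default": "rachel",
    "narrator": "adam",
    "professional": "matilda",
    "storyteller": "george",
    "educator": "alice",
    "calm": "brian",
    "energetic": "liam",
    "trustworthy": "eric",
    "neutral": "river",
    "british": "george",
    "australian": "charlie",
    "broadcaster": "daniel",
}


def resolve_voice(name: str) -> str:
    """Return the voice a preset points to, or the name itself if it is not a preset."""
    key = name.strip().lower()
    return PRESETS.get(key, key)


print(resolve_voice("broadcaster"))  # daniel
print(resolve_voice("Rachel"))       # rachel
```

Falling back to the input name lets the same `--voice` argument accept either a preset or a voice, which matches how `--voice broadcaster` is used in the CLI examples.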

---

## 🌍 Supported Languages (32)

The multilingual v2 model supports these languages:

| Code | Language | Code | Language |
|------|----------|------|----------|
| en | English | pl | Polish |
| de | German | nl | Dutch |
| es | Spanish | sv | Swedish |
| fr | French | da | Danish |
| it | Italian | fi | Finnish |
| pt | Portuguese | no | Norwegian |
| ru | Russian | tr | Turkish |
| uk | Ukrainian | cs | Czech |
| ja | Japanese | sk | Slovak |
| ko | Korean | hu | Hungarian |
| zh | Chinese | ro | Romanian |
| ar | Arabic | bg | Bulgarian |
| hi | Hindi | hr | Croatian |
| ta | Tamil | el | Greek |
| id | Indonesian | ms | Malay |
| vi | Vietnamese | th | Thai |

```bash
# Synthesize in German
python3 scripts/tts.py --text "Guten Tag!" --voice rachel --lang de

# Synthesize in French
python3 scripts/tts.py --text "Bonjour le monde!" --voice adam --lang fr

# List all languages
python3 scripts/tts.py --languages
```
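
To produce the same message in several languages, the invocation can be generated in a loop. The sketch below only builds argument lists and runs nothing. It uses the flags shown in this document (`--text`, `--voice`, `--lang`, `--output`) and assumes the script lives at `scripts/tts.py`; `MESSAGES` and `build_tts_command` are hypothetical names.

```python
import shlex

# Hypothetical sample messages keyed by language code from the table above.
MESSAGES = {
    "en": "Hello world!",
    "de": "Guten Tag!",
    "fr": "Bonjour le monde!",
}


def build_tts_command(text, lang, voice="rachel", script="scripts/tts.py"):
    """Build an argv list for one synthesis call; nothing is executed here."""
    return [
        "python3", script,
        "--text", text,
        "--voice", voice,
        "--lang", lang,
        "--output", f"hello_{lang}.mp3",
    ]


commands = [build_tts_command(text, lang) for lang, text in MESSAGES.items()]
for argv in commands:
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` to run the synthesis for real.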

---

## 💻 CLI Usage

### Basic Text-to-Speech

```bash
# List all voices
python3 scripts/tts.py --list

# Generate speech
python3 scripts/tts.py --text "Hello world" --voice rachel --output hello.mp3

# Use a preset
python3 scripts/tts.py --text "Breaking news..." --voice broadcaster --output news.mp3

# Multi-language
python3 scripts/tts.py --text "Bonjour!" --voice rachel --lang fr --output french.mp3
```

### Streaming Mode

Generate audio with real-time streaming (good for long texts):

```bash
# Stream audio as it generates
python3 scripts/tts.py --text "This is a long story..." --voice adam --stream

# Streaming with custom output
python3 scripts/tts.py --text "Chapter one..." --voice george --stream --output chapter1.mp3
```

### Batch Processing

Process multiple texts from a file:

```bash
# From newline-separated text file
python3 scripts/tts.py --batch texts.txt --voice rachel --output-dir ./audio

# From JSON file
python3 scripts/tts.py --batch batch.json --output-dir ./output
```

**JSON batch format:**
```json
[
  {"text": "First line", "voice": "rachel", "output": "line1.mp3"},
  {"text": "Second line", "voice": "adam", "output": "line2.mp3"},
  {"text": "Third line"}
]
```

**Simple text format (one per line):**
```
Hello, this is the first sentence.
This is the second sentence.
And this is the third.
```
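
Both batch formats can be normalized into one list of jobs. The following is a sketch of how such a loader might look, not the skill's actual code: the fallback voice and the `batch_NNN.mp3` naming for items without an `output` field are assumptions, and `load_batch` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_batch(path, default_voice="rachel"):
    """Read a batch file into a list of {"text", "voice", "output"} dicts."""
    raw = Path(path).read_text(encoding="utf-8")
    if str(path).lower().endswith(".json"):
        # JSON format: an array of objects, each with at least a "text" field.
        items = json.loads(raw)
    else:
        # Plain text format: one item per non-empty line.
        items = [{"text": line.strip()} for line in raw.splitlines() if line.strip()]

    jobs = []
    for index, item in enumerate(items, start=1):
        jobs.append({
            "text": item["text"],
            # Assumed fallback when an item does not name a voice.
            "voice": item.get("voice", default_voice),
            # Assumed naming scheme for items without an explicit output file.
            "output": item.get("output", f"batch_{index:03d}.mp3"),
        })
    return jobs
```

Normalizing first means the synthesis loop does not need to know which input format was used.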

### Usage Statistics

```bash
# Show usage stats and cost estimates
python3 scripts/tts.py --stats

# Reset statistics
python3 scripts/tts.py --reset-stats
```

---

## 🎡 Sound Effects (SFX)

Generate AI-powered sound effects from text descriptions:

```bash
# Generate a sound effect
python3 scripts/sfx.py --prompt "Thunder rumbling in the distance"

# With specific duration (0.5-22 seconds)
python3 scripts/sfx.py --prompt "Cat meowing" --duration 3 --output cat.mp3

# Adjust prompt influence (0.0-1.0)
python3 scripts/sfx.py --prompt "Footsteps on gravel" --influence 0.5

# Batch SFX generation
python3 scripts/sfx.py --batch sounds.json --output-dir ./sfx

# Show prompt examples
python3 scripts/sfx.py --examples
```

**Example prompts:**
- "Thunder rumbling in the distance"
- "Cat purring contentedly"
- "Typing on a mechanical keyboard"
- "Spaceship engine humming"
- "Coffee shop background chatter"
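
The `--duration` and `--influence` ranges map onto a request body. A hedged sketch, assuming the script sends a JSON body with `text`, `duration_seconds`, and `prompt_influence` fields to the ElevenLabs sound-generation endpoint; those field names reflect the public API as I understand it and are not confirmed from `sfx.py`. `build_sfx_payload` is a hypothetical helper.

```python
def build_sfx_payload(prompt, duration=None, influence=None):
    """Validate the documented ranges and return a JSON-ready dict."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    payload = {"text": prompt}
    if duration is not None:
        # Range taken from the CLI comment in this document: 0.5-22 seconds.
        if not 0.5 <= duration <= 22:
            raise ValueError("duration must be between 0.5 and 22 seconds")
        payload["duration_seconds"] = duration
    if influence is not None:
        # Range taken from the CLI comment in this document: 0.0-1.0.
        if not 0.0 <= influence <= 1.0:
            raise ValueError("influence must be between 0.0 and 1.0")
        payload["prompt_influence"] = influence
    return payload


print(build_sfx_payload("Cat meowing", duration=3))
```

Optional fields are omitted rather than defaulted, so the service's own defaults apply when a flag is not given.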

---

## 🎨 Voice Design

Create custom voices from text descriptions:

```bash
# Basic voice design
python3 scripts/voice-design.py --gender female --age middle_aged --accent american \
  --description "A warm, motherly voice"

# With custom preview text
python3 scripts/voice-design.py --gender male --age young --accent british \
  --text "Welcome to the adventure!" --output preview.mp3

# Save to your ElevenLabs library
python3 scripts/voice-design.py --gender female --age young --accent american \
  --description "Energetic podcast host" --save "MyHost"

# List all design options
python3 scripts/voice-design.py --options
```

**Voice Design Options:**

| Option | Values |
|--------|--------|
| Gender | male, female, neutral |
| Age | young, middle_aged, old |
| Accent | american, british, african, australian, indian, latin, middle_eastern, scandinavian, eastern_european |
| Accent Strength | 0.3-2.0 (subtle to strong) |

---

## 📖 Pronunciation Dictionary

Customize how words are pronounced by editing `pronunciations.json`:
```json
{
  "rules": [
    {
      "word": "OpenClaw",
      "replacement": "Open Claw",
      "comment": "Pronounce as two words"
    },
    {
      "word": "API",
      "replacement": "A P I",
      "comment": "Spell out acronym"
    }
  ]
}
```

Usage:
```bash
# Pronunciations are applied automatically
python3 scripts/tts.py --text "The OpenClaw API is great" --voice rachel

# Disable pronunciations
python3 scripts/tts.py --text "The API is great" --voice rachel --no-pronunciations
```
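
Conceptually, the rules are a text substitution applied before the text is sent for synthesis. A minimal sketch of one way to do this, assuming case-sensitive whole-word matching (the document does not say how `tts.py` matches words); `apply_pronunciations` is a hypothetical helper.

```python
import json
import re


def apply_pronunciations(text, rules):
    """Replace each whole-word match of rule["word"] with rule["replacement"]."""
    for rule in rules:
        # \b anchors keep "API" from matching inside a longer word such as "APIs".
        pattern = r"\b" + re.escape(rule["word"]) + r"\b"
        # A function replacement avoids backslash escapes being interpreted.
        text = re.sub(pattern, lambda _match: rule["replacement"], text)
    return text


# Same structure as pronunciations.json, inlined here to stay self-contained.
RULES = json.loads("""
{"rules": [
  {"word": "OpenClaw", "replacement": "Open Claw"},
  {"word": "API", "replacement": "A P I"}
]}
""")["rules"]

print(apply_pronunciations("The OpenClaw API is great", RULES))
# The Open Claw A P I is great
```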

---

## 💰 Cost Tracking

The skill tracks your character usage and estimates costs:

```bash
python3 scripts/tts.py --stats
```

**Output:**
```
📊 ElevenLabs Usage Statistics

  Total Characters: 15,230
  Total Requests:   42
  Since:            2024-01-15

💰 Estimated Costs:
  Starter    $4.57 ($0.30/1k chars)
  Creator    $3.66 ($0.24/1k chars)
  Pro        $2.74 ($0.18/1k chars)
  Scale      $1.68 ($0.11/1k chars)
```
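
The estimates in that sample are characters divided by 1,000, multiplied by a per-plan rate. The sketch below reproduces the arithmetic using the rates printed in the sample output; actual ElevenLabs pricing may differ, and `RATES_PER_1K` and `estimate_costs` are hypothetical names.

```python
# Per-1,000-character rates copied from the sample output; real plan pricing may differ.
RATES_PER_1K = {"Starter": 0.30, "Creator": 0.24, "Pro": 0.18, "Scale": 0.11}


def estimate_costs(total_chars):
    """Return the estimated cost per plan, rounded to cents."""
    return {
        plan: round(total_chars / 1000 * rate, 2)
        for plan, rate in RATES_PER_1K.items()
    }


for plan, cost in estimate_costs(15230).items():
    print(f"{plan:<10} ${cost:.2f}")
```

For 15,230 characters this yields $4.57, $3.66, $2.74, and $1.68, matching the sample.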

---

## 🤖 OpenClaw TTS Integration

### Using with OpenClaw's Built-in TTS

OpenClaw has built-in TTS support that can use ElevenLabs. Configure in `~/.openclaw/openclaw.json`:

```json
{
  "tts": {
    "enabled": true,
    "provider": "elevenlabs",
    "elevenlabs": {
      "apiKey": "your-api-key-here",
      "voice": "rachel",
      "model": "eleven_multilingual_v2"
    }
  }
}
```

### Triggering TTS in Chat

In OpenClaw conversations:
- Use `/tts on` to enable automatic TTS
- Use the `tts` tool directly for one-off speech
- Request "read this aloud" or "speak this"

### Using Skill Scripts from OpenClaw

```bash
# OpenClaw can run these scripts directly
exec python3 /path/to/skills/elevenlabs-voices/scripts/tts.py --text "Hello" --voice rachel
```

---

## ⚙ Configuration

The scripts look for the API key in this order:

1. `ELEVEN_API_KEY` or `ELEVENLABS_API_KEY` environment variable
2. Skill-local `.env` file (in the skill directory)

**Create .env file:**
```bash
echo 'ELEVEN_API_KEY=your-key-here' > .env
```

> **Note:** The skill no longer reads from `~/.openclaw/openclaw.json`. Use environment variables or the skill-local `.env` file.
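
The lookup order can be sketched as follows. This is an illustration under stated assumptions, not the skill's code: `resolve_api_key` and `parse_env_file` are hypothetical helpers, the `.env` parser handles only simple `NAME=value` lines, and checking `ELEVEN_API_KEY` before its alias is a guess based on the metadata note that it is the primary name.

```python
import os
from pathlib import Path

# Assumed precedence between the two names: primary first, alias second.
KEY_NAMES = ("ELEVEN_API_KEY", "ELEVENLABS_API_KEY")


def parse_env_file(path):
    """Parse simple NAME=value lines; returns {} if the file does not exist."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(environ=None, env_file=".env"):
    """Check environment variables first, then the skill-local .env file."""
    environ = os.environ if environ is None else environ
    for source in (environ, parse_env_file(env_file)):
        for name in KEY_NAMES:
            if source.get(name):
                return source[name]
    return None
```

Returning `None` instead of raising leaves it to the caller to print a helpful message or suggest running the setup wizard.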

---

## 🎛 Voice Settings

Each voice has tuned settings for optimal output:

| Setting | Range | Description |
|---------|-------|-------------|
| stability | 0.0-1.0 | Higher = consistent, lower = expressive |
| similarity_boost | 0.0-1.0 | How closely to match original voice |
| style | 0.0-1.0 | Exaggeration of speaking style |
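
These three settings are typically sent as a `voice_settings` object in the text-to-speech request body. A sketch that validates the ranges from the table and builds such a body without sending it: the default values are placeholders rather than the skill's tuned per-voice values, `build_tts_payload` is a hypothetical helper, and the `model_id` and `voice_settings` field names follow the public ElevenLabs API as I understand it.

```python
def build_tts_payload(text, stability=0.5, similarity_boost=0.75, style=0.0,
                      model_id="eleven_multilingual_v2"):
    """Return a JSON-ready request body after checking each setting's range."""
    settings = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
    }
    for name, value in settings.items():
        # All three settings share the 0.0-1.0 range listed in the table.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return {"text": text, "model_id": model_id, "voice_settings": settings}


payload = build_tts_payload("Hello world", stability=0.3, style=0.6)
print(payload["voice_settings"])
```

Lower `stability` with higher `style`, as in the usage line, leans toward a more expressive read.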

---

## 📝 Triggers

- "use {voice_name} voice"
- "speak as {persona}"
- "list voices"
- "voice settings"
- "generate sound effect"
- "design a voice"

---

## 📁 Files

```
elevenlabs-voices/
├── SKILL.md              # This documentation
├── README.md             # Quick start guide
├── config.json           # Your local config (created by setup, in .gitignore)
├── voices.json           # Voice definitions & settings
├── pronunciations.json   # Custom pronunciation rules
├── examples.md           # Detailed usage examples
├── scripts/
│   ├── setup.py          # Interactive setup wizard
│   ├── tts.py            # Main TTS script
│   ├── sfx.py            # Sound effects generator
│   └── voice-design.py   # Voice design tool
└── references/
    └── voice-guide.md    # Voice selection guide
```

---

## 🔗 Links

- [ElevenLabs](https://elevenlabs.io)
- [API Documentation](https://docs.elevenlabs.io)
- [Voice Library](https://elevenlabs.io/voice-library)
- [Sound Effects API](https://elevenlabs.io/docs/api-reference/sound-generation)
- [Voice Design API](https://elevenlabs.io/docs/api-reference/voice-generation)

---

## 📋 Changelog

### v2.1.0
- Added interactive setup wizard (`scripts/setup.py`)
- Onboarding guides through API key, voice, language, quality, and budget settings
- Config stored locally in `config.json` (added to `.gitignore`)
- Professional, privacy-focused setup experience

### v2.0.0
- Added 32 language support with `--lang` parameter
- Added streaming mode with `--stream` flag
- Added sound effects generation (`sfx.py`)
- Added batch processing with `--batch` flag
- Added cost tracking with `--stats` flag
- Added voice design tool (`voice-design.py`)
- Added pronunciation dictionary support
- Added OpenClaw TTS integration documentation
- Improved error handling and progress output

Overview

This skill provides high-quality voice synthesis using the ElevenLabs API, packaged as CLI scripts and utilities. It offers 18 curated voice personas, support for 32 languages, streaming output, batch processing, SFX generation, and a voice design tool. A guided setup wizard stores your API key locally and helps configure default voice, language, quality, and cost tracking. The tooling is designed to integrate with OpenClaw or run standalone from the command line.

How this skill works

The skill exposes small Python scripts that call ElevenLabs endpoints to synthesize speech, generate sound effects, and create custom voices. It reads your API key from environment variables or a skill-local .env file (it no longer reads the OpenClaw config) and applies voice presets, pronunciation rules, and per-voice tuning parameters. Streaming mode emits audio in real time for long text, while batch mode reads a newline-separated text file or a JSON array to produce many files in one run. Usage statistics and character-based cost estimates are collected locally for budgeting.

When to use it

  • Create narrated content quickly from scripts or articles
  • Produce multilingual TTS for tutorials, announcements, or accessibility
  • Generate short AI sound effects for games, videos, or prototypes
  • Design and preview custom voice personas before saving them to ElevenLabs
  • Batch-convert many lines of text into audio files for podcasts or e-learning

Best practices

  • Run the interactive setup on first use to securely store your API key and set defaults
  • Use presets for consistent voice selection across projects and to simplify CI automation
  • Enable streaming for very long texts to reduce latency and memory footprint
  • Keep pronunciation rules in pronunciations.json for brand names and acronyms
  • Monitor the --stats output regularly and set a monthly budget cap to avoid unexpected costs

Example use cases

  • Generate a narrated chapter: scripts/tts.py --text-file chapter.txt --voice storyteller --output chapter.mp3
  • Produce product announcements in multiple languages by looping lang codes and voices
  • Create background SFX for a scene: scripts/sfx.py --prompt "Thunder rumbling" --duration 4
  • Design and preview a new host voice, then save it to your ElevenLabs library: scripts/voice-design.py --description "Warm podcast host" --save MyHost
  • Integrate with OpenClaw by enabling the elevenlabs provider in ~/.openclaw/openclaw.json

FAQ

How do I set the API key?

Run the setup wizard (python3 scripts/setup.py), export ELEVEN_API_KEY (or its alias ELEVENLABS_API_KEY), or create a .env file in the skill directory.

Can I synthesize in other languages?

Yes. The multilingual v2 model supports 32 languages; pass --lang with the language code.

How do batch jobs accept input?

Batch accepts a newline-separated text file or a JSON array with text, voice, and output fields.