elevenlabs skill

This skill converts text to audio using ElevenLabs TTS, enabling single-voice narration or dual-host podcast generation from documents.

npx playbooks add skill sanjay3290/ai-skills --skill elevenlabs

---
name: elevenlabs
description: |
  Convert documents and text to audio using ElevenLabs text-to-speech.
  Use this skill when the user wants to create a podcast, narrate a document,
  read aloud text, generate audio from a file, or convert text to speech.
license: Apache-2.0
metadata:
  author: sanjay3290
  version: "1.0"
---

# ElevenLabs - Text-to-Speech & Podcast Skill

## Overview

This skill converts text and documents into high-quality audio using the ElevenLabs TTS API. It supports two modes: single-voice narration and two-host conversational podcast generation.

## When to Use This Skill

Activate when the user mentions:
- "create podcast", "generate podcast", "podcast from document"
- "narrate document", "narrate this file", "read aloud"
- "text to speech", "TTS", "convert to audio"
- "audio from document", "audio version of"

## Setup

Config at `skills/elevenlabs/config.json`:
```json
{
  "api_key": "your-elevenlabs-api-key",
  "default_voice": "JBFqnCBsd6RMkjVDRZzb",
  "default_model": "eleven_multilingual_v2",
  "podcast_voice1": "JBFqnCBsd6RMkjVDRZzb",
  "podcast_voice2": "EXAVITQu4vr4xnSDxMaL"
}
```

Only `api_key` is required. Alternatively, set the `ELEVENLABS_API_KEY` environment variable instead of using the config file.
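
As a rough illustration of the two key sources, the sketch below resolves a key from the config file and falls back to the environment variable. The function name `resolve_api_key` and the lookup order (config first, then environment) are assumptions for illustration; the actual script may resolve them differently.

```python
import json
import os
from pathlib import Path


def resolve_api_key(config_path="skills/elevenlabs/config.json"):
    """Return an API key from config.json, else from ELEVENLABS_API_KEY.

    Assumption: the config file is checked first. The real script may
    use the opposite order.
    """
    path = Path(config_path)
    if path.is_file():
        with path.open(encoding="utf-8") as f:
            key = json.load(f).get("api_key", "")
        # Ignore the placeholder value shown in the sample config.
        if key and key != "your-elevenlabs-api-key":
            return key
    key = os.environ.get("ELEVENLABS_API_KEY", "")
    if not key:
        raise RuntimeError(
            "No API key: set api_key in config.json or ELEVENLABS_API_KEY"
        )
    return key
```

Calling `resolve_api_key()` with no arguments reads the default config path given above and raises a `RuntimeError` when neither source provides a key.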

Dependencies: `pip install PyPDF2 python-docx` (only needed for PDF/DOCX files).

Requires `ffmpeg` for multi-chunk narration and podcasts.

## Commands

### List Voices

```bash
python skills/elevenlabs/scripts/elevenlabs.py voices
python skills/elevenlabs/scripts/elevenlabs.py voices --json
```

Use this to find voice IDs for the user.

### Single-Voice TTS

```bash
# From text
python skills/elevenlabs/scripts/elevenlabs.py tts --text "Hello world" --output ~/Downloads/hello.mp3

# From document
python skills/elevenlabs/scripts/elevenlabs.py tts --file /path/to/doc.pdf --output ~/Downloads/narration.mp3

# With specific voice
python skills/elevenlabs/scripts/elevenlabs.py tts --file doc.md --voice VOICE_ID --output out.mp3
```

The script handles text extraction, chunking at sentence boundaries (~4000 chars), TTS per chunk with voice continuity, and ffmpeg concatenation automatically.
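
The sentence-boundary chunking described above can be sketched as follows. This is a minimal approximation, not the script's actual implementation: the function name `chunk_text`, the regex used to detect sentence ends, and the hard split of over-long sentences are all assumptions.

```python
import re


def chunk_text(text, max_chars=4000):
    """Split text into chunks of at most max_chars, breaking after sentences."""
    # Split after ., !, or ? followed by whitespace (a simple heuristic).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split a single sentence that exceeds the limit on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The next sentence does not fit; close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be sent to the TTS API separately, which is why multi-chunk narration needs ffmpeg to join the resulting audio files.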

### Podcast Generation

Podcast mode requires a JSON script file with conversation segments:

```json
[
  {"speaker": "host1", "text": "Welcome to our podcast! Today we're diving into..."},
  {"speaker": "host2", "text": "That's right! I found the section on..."},
  {"speaker": "host1", "text": "Let's break that down..."}
]
```

```bash
python skills/elevenlabs/scripts/elevenlabs.py podcast --script /tmp/script.json --voice1 ID1 --voice2 ID2 --output ~/Downloads/podcast.mp3
```
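
To make the relationship between speakers and voices concrete, here is a small sketch that pairs each segment with a voice ID. It assumes `host1` maps to `--voice1` (or `podcast_voice1`) and `host2` to `--voice2` (or `podcast_voice2`); the function name `assign_voices` is hypothetical and the real script's mapping may differ.

```python
# Defaults copied from the sample config shown in Setup.
DEFAULT_VOICES = {
    "host1": "JBFqnCBsd6RMkjVDRZzb",  # podcast_voice1
    "host2": "EXAVITQu4vr4xnSDxMaL",  # podcast_voice2
}


def assign_voices(segments, voice1=None, voice2=None):
    """Return (voice_id, text) pairs, one per script segment, in order."""
    voices = {
        "host1": voice1 or DEFAULT_VOICES["host1"],
        "host2": voice2 or DEFAULT_VOICES["host2"],
    }
    plan = []
    for seg in segments:
        speaker = seg["speaker"]
        if speaker not in voices:
            raise ValueError(f"unknown speaker: {speaker!r}")
        plan.append((voices[speaker], seg["text"]))
    return plan
```

The resulting list preserves turn order, so rendering each pair in sequence and concatenating the audio yields the conversation as written.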

## Podcast Workflow (for Claude)

When the user asks to create a podcast from a document:

1. **Extract the document text**:
   ```bash
   python skills/elevenlabs/scripts/extract.py /path/to/document.pdf
   ```

2. **Generate a two-host conversation script** from the extracted text. Follow these guidelines:
   - Write as a natural, engaging discussion between two hosts
   - Host 1 typically leads/introduces topics, Host 2 adds analysis and reactions
   - Start with a brief intro welcoming listeners and stating the topic
   - End with a summary/outro
   - Keep each turn under 3000 characters
   - Vary turn lengths - mix short reactions with longer explanations
   - Use conversational language: "That's a great point", "What I found interesting was..."
   - Reference specific details from the source document
   - Avoid reading the document verbatim - discuss and interpret it

3. **Write the script** as a JSON array to a temp file:
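   ```python
   import json

   # Conversation segments generated in step 2
   script = [
       {"speaker": "host1", "text": "Welcome to today's episode..."},
       {"speaker": "host2", "text": "Thanks for having me..."},
       # ... remaining turns
   ]

   # Write to /tmp/podcast_script.json
   with open("/tmp/podcast_script.json", "w", encoding="utf-8") as f:
       json.dump(script, f, ensure_ascii=False, indent=2)
   ```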

4. **Generate the podcast**:
   ```bash
   python skills/elevenlabs/scripts/elevenlabs.py podcast --script /tmp/podcast_script.json --output ~/Downloads/podcast.mp3
   ```

5. **Clean up** the temp script file.
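
Before running step 4, it can help to check the generated script against the format and the 3000-character guideline. The sketch below is one possible check; `validate_script` is a hypothetical helper, not part of the skill's scripts, and it assumes `host1` and `host2` are the only valid speaker names.

```python
def validate_script(segments, max_turn_chars=3000):
    """Return a list of problems; an empty list means the script looks fine."""
    if not isinstance(segments, list) or not segments:
        return ["script must be a non-empty list of segments"]
    problems = []
    for i, seg in enumerate(segments):
        if not isinstance(seg, dict):
            problems.append(f"segment {i}: not an object")
            continue
        if seg.get("speaker") not in ("host1", "host2"):
            problems.append(f"segment {i}: speaker must be 'host1' or 'host2'")
        text = seg.get("text")
        if not isinstance(text, str) or not text.strip():
            problems.append(f"segment {i}: text is missing or empty")
        elif len(text) > max_turn_chars:
            problems.append(
                f"segment {i}: {len(text)} chars exceeds {max_turn_chars}"
            )
    return problems
```

Summing `len(seg["text"])` over a valid script also gives a quick estimate of the characters the run will consume on the user's ElevenLabs plan.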

## Tips

- Run `voices` first to let the user pick voices they like
- For podcasts, suggest voice pairs with contrasting qualities (e.g., one deep, one bright)
- Default output to `~/Downloads/` unless the user specifies otherwise
- For large documents, warn the user about character usage on their ElevenLabs plan

Overview

This skill converts documents and plain text into high-quality audio using the ElevenLabs text-to-speech API. It supports single-voice narration for straightforward reading and a two-host conversational mode for podcast-style productions. The skill handles text extraction, chunking, voice continuity, and final audio assembly so you get a ready-to-use MP3.

How this skill works

The skill accepts raw text or common document files (PDF, DOCX, Markdown). It extracts and chunks text at sentence boundaries, sends chunks to ElevenLabs TTS with the selected voice(s), and concatenates resulting audio segments with ffmpeg when needed. For podcasts, you supply or generate a JSON conversation script; the skill renders each speaker turn with a distinct voice and outputs a single assembled file.
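
The concatenation step can be illustrated with ffmpeg's concat demuxer. This is only a sketch: whether the skill uses the concat demuxer, stream copy, or some other ffmpeg invocation is not stated here, and `build_concat_command` is a hypothetical name.

```python
import os


def build_concat_command(chunk_paths, output_path, list_path):
    """Write an ffmpeg concat list file and return the command to run."""
    lines = []
    for p in chunk_paths:
        # Inside single quotes, a literal ' is written as '\'' in the list file.
        escaped = os.path.abspath(p).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    with open(list_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    # -safe 0 permits absolute paths; -c copy joins without re-encoding,
    # which assumes every chunk shares the same codec settings.
    return [
        "ffmpeg", "-y", "-f", "concat", "-safe", "0",
        "-i", str(list_path), "-c", "copy", str(output_path),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once all chunk files exist.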

When to use it

  • You want an audio version of an article, report, or ebook for listening.
  • You need a narrated recording of a PDF or DOCX file for accessibility or review.
  • You want to produce a short conversational podcast from a document or script.
  • You need fast text-to-speech for demos, prototypes, or voice previews.
  • You want to generate separate host voices for multi-speaker audio.
  • You need batch narration of large documents with automatic chunking.

Best practices

  • Run the voices command first so the user can pick voices they like and you have valid voice IDs.
  • Keep each podcast turn under 3000 characters and vary turn length for natural flow.
  • Use conversational rewriting when making podcasts; don’t read the source verbatim.
  • Install ffmpeg for multi-chunk narration and podcasts, and the optional PyPDF2/python-docx dependencies if you plan to process PDF or DOCX files.
  • Warn users about ElevenLabs character usage on large documents or frequent runs.

Example use cases

  • Convert a research paper PDF into an MP3 for commuting listening.
  • Narrate a product manual into a single audio file for onboarding.
  • Generate a two-host podcast discussing a long-form article with host reactions.
  • Create short voice clips from marketing copy for prototyping voice UI.
  • Produce audiobook-style narration from Markdown or plain text files.

FAQ

What input formats are supported?

Plain text, Markdown, PDF, and DOCX are supported; PDF/DOCX require optional dependencies.

Do I need an ElevenLabs API key?

Yes. You must set an api_key in the config or the ELEVENLABS_API_KEY environment variable.

Is ffmpeg required?

ffmpeg is required for multi-chunk concatenation and podcast assembly; single small clips may work without it.