This skill generates high-quality voiceover audio with ElevenLabs, returning audio and optional word-level timestamps for precise video synchronization.
To add this skill to your agents, run `npx playbooks add skill phrazzld/claude-config --skill voiceover`.
---
name: voiceover
description: |
Generate high-quality voiceover audio with ElevenLabs. Includes word-level
timestamps for video sync. Use when: creating demo narration, video voiceover,
podcast intros, or any TTS need. Keywords: voiceover, TTS, text to speech,
ElevenLabs, narration, audio, timestamps.
argument-hint: "[script text or file path]"
effort: high
---
# Voiceover (ElevenLabs)
## What This Does
- Accept script text or file path.
- Preprocess for TTS: expand acronyms, normalize numbers.
- Generate via ElevenLabs API.
- Return audio + optional word timestamps.
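
The preprocessing step above could look something like the following minimal sketch. The source does not show the skill's actual rules, so the `ACRONYMS` table, the 0 to 999 range, and the function names are illustrative assumptions.

```python
import re

# Hypothetical expansion table; the skill's real acronym list is not shown in the source.
ACRONYMS = {
    "API": "A P I",
    "TTS": "text to speech",
    "CLI": "command line interface",
}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def preprocess(text: str) -> str:
    """Expand known acronyms and spell out small standalone integers."""
    # Replace whole-word runs of capitals that appear in the table.
    text = re.sub(r"\b[A-Z]{2,}\b",
                  lambda m: ACRONYMS.get(m.group(0), m.group(0)), text)

    # Spell out integers below 1000; decimals and comma-grouped numbers
    # (3.5, 1,000) are skipped by the lookarounds and left untouched.
    def spell(match: re.Match) -> str:
        value = int(match.group(0))
        return number_to_words(value) if value < 1000 else match.group(0)

    return re.sub(r"(?<![\d.,])\b\d+\b(?![.,]\d)", spell, text)
```

For example, `preprocess("The API handles 42 requests")` yields `"The A P I handles forty-two requests"`. Dates, currencies, and ordinals would need their own rules.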
## Prerequisites
- `ELEVENLABS_API_KEY` env var set.
- A paid ElevenLabs plan; the Creator tier is quoted here at roughly $5/mo for about 100k characters, so check current pricing.
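
A quick way to fail early when the key is missing might look like this sketch. The helper name `elevenlabs_headers` is hypothetical; the `xi-api-key` header is the one the ElevenLabs HTTP API uses for authentication.

```python
import os
from typing import Dict, Mapping


def elevenlabs_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build request headers, raising a clear error if the key is absent."""
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ELEVENLABS_API_KEY is not set; export it before running /voiceover."
        )
    return {"xi-api-key": key, "Content-Type": "application/json"}
```

Passing the environment as a parameter keeps the check easy to test without touching the real process environment.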
## Usage
- `/voiceover "Welcome to Heartbeat..."`
- `/voiceover demo-script.md --timestamps --voice adam`
## Voices
- See `skills/voiceover/references/elevenlabs-voices.md`.
- Default: `adam` (clear, professional).
## Output
- `voiceover.mp3`
- `timestamps.json` (word-level timing when requested)
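
The source does not document the layout of `timestamps.json`. As a sketch under that caveat: the ElevenLabs with-timestamps endpoint returns character-level alignment (`characters`, `character_start_times_seconds`, `character_end_times_seconds`), and word-level entries can be derived by grouping characters between whitespace. The output shape (`word`, `start`, `end` in seconds) is an assumption, not the skill's confirmed format.

```python
from typing import Dict, List


def words_from_alignment(alignment: Dict) -> List[Dict]:
    """Group character-level timings into word-level entries."""
    chars = alignment["characters"]
    starts = alignment["character_start_times_seconds"]
    ends = alignment["character_end_times_seconds"]

    words: List[Dict] = []
    current = None
    for ch, start, end in zip(chars, starts, ends):
        if ch.isspace():
            # Whitespace closes the word in progress, if any.
            if current is not None:
                words.append(current)
                current = None
            continue
        if current is None:
            current = {"word": ch, "start": start, "end": end}
        else:
            current["word"] += ch
            current["end"] = end
    if current is not None:
        words.append(current)
    return words
```

Punctuation stays attached to its neighbouring word here, which is usually fine for captions but may need stripping for other uses.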
## Integration
- Used by `/demo-video` for narration sync.
This skill generates high-quality voiceover audio using the ElevenLabs text-to-speech API and can produce word-level timestamps for precise video sync. It accepts raw script text or a file path, preprocesses the text for natural speech, and exports an MP3 plus optional timestamp JSON. Default voice is "adam," optimized for clear, professional narration.
The skill preprocesses input by expanding acronyms and normalizing numbers to improve pronunciation. It then sends the cleaned script to the ElevenLabs API and writes the generated audio to `voiceover.mp3`. If timestamps are enabled, it also returns word-level timing data in `timestamps.json` for frame-accurate alignment with video.
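
The request itself might be sketched as below, using only the standard library. This is not the skill's actual implementation: the function names, the `model_id` default, and the choice of the with-timestamps endpoint are assumptions, and `voice_id` must be a real ElevenLabs voice ID (the skill's voice list lives in its references file).

```python
import base64
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_request(text: str, voice_id: str, api_key: str,
                  model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Prepare a POST to the text-to-speech with-timestamps endpoint."""
    url = f"{API_BASE}/text-to-speech/{voice_id}/with-timestamps"
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def synthesize(text: str, voice_id: str, api_key: str,
               audio_path: str = "voiceover.mp3") -> dict:
    """Call the API, write the MP3, and return the character alignment."""
    with urllib.request.urlopen(build_request(text, voice_id, api_key)) as resp:
        payload = json.load(resp)
    # The audio arrives base64-encoded alongside the timing data.
    with open(audio_path, "wb") as f:
        f.write(base64.b64decode(payload["audio_base64"]))
    return payload["alignment"]
```

Keeping `build_request` separate from `synthesize` lets the URL, headers, and body be inspected without spending any character quota.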
What credentials are required?
Set `ELEVENLABS_API_KEY` in your environment; the Creator plan is sufficient for typical usage.
How do I get word-level timestamps?
Pass the `--timestamps` flag when running the skill; it writes `timestamps.json` with word-level timings for video sync.
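
Once word timings are available, mapping them onto video frames is a small calculation. The sketch below assumes each entry has `word`, `start`, and `end` fields in seconds, which is a guess at the `timestamps.json` layout, not a documented format; `to_frames` is a hypothetical helper.

```python
from typing import Dict, List


def to_frames(words: List[Dict], fps: float = 30.0) -> List[Dict]:
    """Convert word timings in seconds to the nearest frame indices."""
    cues = []
    for w in words:
        cues.append({
            "word": w["word"],
            "start_frame": round(w["start"] * fps),
            "end_frame": round(w["end"] * fps),
        })
    return cues
```

Rounding to the nearest frame keeps the error within half a frame, about 17 ms at 30 fps.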