
voiceover skill


This skill generates high-quality voiceover audio with ElevenLabs, returning audio and optional word-level timestamps for precise video synchronization.

npx playbooks add skill phrazzld/claude-config --skill voiceover

Review the files below or copy the command above to add this skill to your agents.

Files (3)
SKILL.md
1.0 KB
---
name: voiceover
description: |
  Generate high-quality voiceover audio with ElevenLabs. Includes word-level
  timestamps for video sync. Use when: creating demo narration, video voiceover,
  podcast intros, or any TTS need. Keywords: voiceover, TTS, text to speech,
  ElevenLabs, narration, audio, timestamps.
argument-hint: "[script text or file path]"
effort: high
---

# Voiceover (ElevenLabs)

## What This Does
- Accept script text or file path.
- Preprocess for TTS: expand acronyms, normalize numbers.
- Generate via ElevenLabs API.
- Return audio + optional word timestamps.

## Prerequisites
- `ELEVENLABS_API_KEY` env var set.
- ElevenLabs paid plan (the Creator tier covers ~100k chars/month; check current pricing).

## Usage
- `/voiceover "Welcome to Heartbeat..."`
- `/voiceover demo-script.md --timestamps --voice adam`

## Voices
- See `skills/voiceover/references/elevenlabs-voices.md`.
- Default: `adam` (clear, professional).

## Output
- `voiceover.mp3`
- `timestamps.json` (word-level timing when requested)

## Integration
- Used by `/demo-video` for narration sync.

Overview

This skill generates high-quality voiceover audio using the ElevenLabs text-to-speech API and can produce word-level timestamps for precise video sync. It accepts raw script text or a file path, preprocesses the text for natural speech, and exports an MP3 plus optional timestamp JSON. Default voice is "adam," optimized for clear, professional narration.
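The preprocessing pass can be sketched roughly as below. The acronym table and the spell-out-small-numbers rule are illustrative assumptions for this sketch, not the skill's actual rules:

```python
import re

# Hypothetical acronym table -- the skill's real list is not documented here.
ACRONYMS = {"API": "A P I", "TTS": "T T S", "URL": "U R L"}

SMALL_NUMBERS = ["zero", "one", "two", "three", "four", "five",
                 "six", "seven", "eight", "nine", "ten"]

def expand_acronyms(text: str) -> str:
    """Replace known acronyms with spaced letters for clearer pronunciation."""
    for acro, spoken in ACRONYMS.items():
        text = re.sub(rf"\b{acro}\b", spoken, text)
    return text

def normalize_numbers(text: str) -> str:
    """Spell out small integers so the TTS engine reads them naturally."""
    def repl(m: re.Match) -> str:
        n = int(m.group())
        return SMALL_NUMBERS[n] if n <= 10 else m.group()
    return re.sub(r"\b\d+\b", repl, text)

def preprocess(script: str) -> str:
    return normalize_numbers(expand_acronyms(script))

print(preprocess("The TTS API supports 3 voices."))
# -> The T T S A P I supports three voices.
```

A real implementation would carry a much larger acronym table and handle ordinals, dates, and currency as well.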

How this skill works

The skill preprocesses input by expanding acronyms and normalizing numbers to improve pronunciation. It sends the cleaned script to the ElevenLabs API, requests audio generation, and returns a downloadable voiceover file. If timestamps are enabled, it also extracts and returns word-level timing data for frame-accurate alignment with video.
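In outline, the API request looks something like the following. The endpoint path and `model_id` follow the public ElevenLabs REST API, but treat them as assumptions and verify against the current docs before depending on this sketch:

```python
import json
import os
import urllib.request

API_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}/with-timestamps"

def build_request(text: str, voice_id: str) -> urllib.request.Request:
    """Build a TTS request against the with-timestamps endpoint.

    The endpoint returns base64-encoded audio plus a character-level
    alignment when it succeeds; the plain /v1/text-to-speech/{voice_id}
    endpoint returns raw audio only.
    """
    body = json.dumps({
        "text": text,
        "model_id": "eleven_multilingual_v2",  # assumed model choice
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL.format(voice_id=voice_id),
        data=body,
        headers={
            "xi-api-key": os.environ.get("ELEVENLABS_API_KEY", ""),
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request (e.g. with `urllib.request.urlopen`) and decoding the audio are omitted here, since error handling and streaming depend on the caller's pipeline.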

When to use it

  • Creating narration tracks for demo videos and product walkthroughs
  • Producing podcast intros, bumpers, or short ads
  • Generating voiceovers for training or e-learning modules
  • Syncing spoken lines to on-screen captions or animations
  • Quickly iterating script versions without studio recording

Best practices

  • Provide plain, edited script text to minimize pronunciation surprises
  • Enable timestamps when you need word-level sync for subtitles or animations
  • Set the ELEVENLABS_API_KEY environment variable before use
  • Prefer the default voice for general narration; test other voices for tone fit
  • Keep long scripts in smaller chunks to simplify edits and re-renders
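The chunking advice above can be automated with a small helper that splits at sentence boundaries. The 2,500-character budget is an arbitrary assumption for this sketch, not an ElevenLabs limit:

```python
import re

def chunk_script(script: str, max_chars: int = 2500) -> list[str]:
    """Split a script into chunks under max_chars, breaking at sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        elif current:
            current = f"{current} {sentence}"
        else:
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Rendering each chunk separately means a script edit only forces a re-render of the chunk it touches.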

Example use cases

  • Render a 90-second product demo narration and export timestamps for subtitle alignment
  • Produce a podcast episode intro using the default voice and save as voiceover.mp3
  • Convert a marketing script file into polished audio while normalizing numbers and acronyms
  • Generate multiple voice variations for A/B testing of ad copy
  • Integrate into a video pipeline to automatically replace placeholder narration

FAQ

What credentials are required?

Set ELEVENLABS_API_KEY in your environment; the Creator plan is sufficient for typical usage.

How do I get word-level timestamps?

Pass the timestamps option when running the skill; it returns a timestamps.json with word timings for sync.
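If you are post-processing raw alignment data rather than the ready-made timestamps.json, character timings can be grouped into words along these lines. The parallel-list shape (characters plus start/end times) mirrors the with-timestamps response, but the exact field names are assumptions, not a documented schema:

```python
def words_from_alignment(chars: list[str],
                         starts: list[float],
                         ends: list[float]) -> list[dict]:
    """Group character-level timings into word-level entries.

    Accumulates non-space characters into a word buffer, recording the
    start time of the first character and the end time of the last.
    """
    words: list[dict] = []
    buf, w_start, w_end = "", 0.0, 0.0
    for ch, s, e in zip(chars, starts, ends):
        if ch.isspace():
            if buf:
                words.append({"word": buf, "start": w_start, "end": w_end})
                buf = ""
        else:
            if not buf:
                w_start = s
            buf += ch
            w_end = e
    if buf:
        words.append({"word": buf, "start": w_start, "end": w_end})
    return words
```

Each entry then maps directly onto a subtitle cue or an animation keyframe.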