read-aloud skill

This skill generates a standalone HTML reader from markdown with Kokoro TTS audio and word-synced highlighting for easy listening and proofreading.

npx playbooks add skill leegonzales/aiskills --skill read-aloud

SKILL.md

---
name: read-aloud
description: Generate a standalone HTML reader with Kokoro TTS audio and word-synced highlighting from any markdown file. Use when the user wants to listen to an essay, proofread by ear, or create an audio reader.
---

# Read Aloud

Generate a standalone HTML reader with high-quality TTS audio and real-time word highlighting from any markdown file.

## When to Use

Invoke when the user:
- Asks to "read this aloud" or "generate audio" for a markdown file
- Wants to proofread by ear (catch awkward phrasing, rhythm issues)
- Needs a shareable audio reader (single HTML file, no server needed)
- Uses `/read-aloud` command

## Prerequisites

- macOS with Apple Silicon (M1+)
- Python 3.13+ (required for MLX)
- ffmpeg (for MP3 compression)
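
The prerequisites above can be checked before the first run. The following is a minimal preflight sketch, not part of the skill itself: the function names `version_at_least` and `check_prereqs` are hypothetical, and it assumes the skill uses whichever `python3` is first on `PATH`.

```bash
# Preflight sketch for the prerequisites listed above.
# Both function names are hypothetical helpers, not part of the skill.

# Succeeds if version string $1 (e.g. "3.13.1") is at least $2.$3.
version_at_least() {
  local major rest minor
  major="${1%%.*}"
  rest="${1#*.}"
  minor="${rest%%.*}"
  # Reject empty or non-numeric components instead of guessing.
  case "$major" in ''|*[!0-9]*) return 1 ;; esac
  case "$minor" in ''|*[!0-9]*) return 1 ;; esac
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Prints a warning for each unmet prerequisite; returns non-zero if any failed.
check_prereqs() {
  local status=0 pyver
  if [ "$(uname -s)" != "Darwin" ] || [ "$(uname -m)" != "arm64" ]; then
    echo "warn: expected macOS on Apple Silicon, found $(uname -s)/$(uname -m)" >&2
    status=1
  fi
  # Assumption: the skill picks up the python3 found on PATH.
  pyver="$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null || true)"
  if ! version_at_least "$pyver" 3 13; then
    echo "warn: Python 3.13+ required, found ${pyver:-none}" >&2
    status=1
  fi
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "warn: ffmpeg not found on PATH" >&2
    status=1
  fi
  return "$status"
}
```

Calling `check_prereqs && echo "ready"` prints `ready` only when all three checks pass.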

## Core Command

```bash
SKILL_DIR="path/to/read-aloud"
$SKILL_DIR/scripts/read-aloud.sh <markdown-file> [options]
```

First run auto-installs dependencies to `~/.read-aloud/venv/`.
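
The first-run behavior can be pictured with a short sketch. This is an assumption about how such a bootstrap typically works, not the skill's actual script: `venv_ready` and `ensure_venv` are hypothetical names, and the dependency install step is left out because the package list is not shown here.

```bash
# Location taken from the text above; override VENV_DIR to experiment safely.
VENV_DIR="${VENV_DIR:-$HOME/.read-aloud/venv}"

# Succeeds if the virtual environment already has a Python interpreter.
venv_ready() {
  [ -x "$VENV_DIR/bin/python" ]
}

# Creates the virtual environment only when it is missing.
ensure_venv() {
  if ! venv_ready; then
    echo "First run: creating virtual environment at $VENV_DIR" >&2
    python3 -m venv "$VENV_DIR"
    # Dependency installation would follow here. The skill's real
    # package list is not documented in this file, so it is omitted.
  fi
}
```

Because the check is idempotent, later runs skip straight to generation.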

## Options

| Flag | Default | Description |
|------|---------|-------------|
| `--voice` | `af_heart` | Kokoro voice variant |
| `--speed` | `1.0` | Speech rate (0.5-2.0) |
| `--output-dir` | `~/.read-aloud/output/<slug>/` | Custom output directory |
| `--strip-sections` | *(none)* | Comma-separated heading names to skip |
| `--no-open` | *(off)* | Don't open browser after generation |
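
Since `--speed` accepts values from 0.5 to 2.0, a calling script may want to validate input before starting a multi-minute run. A minimal sketch follows; `valid_speed` is a hypothetical helper, and how the skill itself reacts to out-of-range values is not documented here.

```bash
# Hypothetical helper: succeeds if $1 is a plain decimal number in 0.5-2.0.
valid_speed() {
  # Allow only digits and at most one dot; reject empty input and a lone ".".
  case "$1" in
    ''|.|*[!0-9.]*|*.*.*) return 1 ;;
  esac
  # The shell has no float comparison, so delegate the range check to awk.
  awk -v s="$1" 'BEGIN { exit !(s + 0 >= 0.5 && s + 0 <= 2.0) }'
}
```

For example, `valid_speed "$speed" || { echo "speed must be 0.5-2.0" >&2; exit 1; }` stops early on bad input.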

## Pipeline

Three-stage process (~2-5 min for a typical essay):

1. **Generate Audio** — Kokoro TTS produces per-paragraph WAV chunks
2. **Align Words** — Whisper transcribes each chunk for drift-free word timestamps
3. **Build Reader** — Assembles standalone HTML with embedded MP3 + word sync

## Output

Single `reader.html` file with:
- Base64-embedded MP3 audio (fully standalone, no server needed)
- Per-word highlighting synced to audio playback
- Sticky playback controls (play/pause, stop, speed, progress bar)
- Keyboard shortcuts (Space, Escape, Arrow keys)
- Click any word to seek
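
The "Base64-embedded MP3" item is what makes the file standalone: the audio travels inside the HTML as a data URI. The sketch below illustrates the general technique only. `audio_data_tag` is a hypothetical helper, and the skill's real reader template is not shown in this document.

```bash
# Hypothetical helper: prints an <audio> tag with the MP3 inlined as a data URI.
# Assumes the `base64` utility is on PATH.
audio_data_tag() {
  local mp3="$1" payload
  # Strip newlines so the payload is one continuous attribute value.
  payload="$(base64 < "$mp3" | tr -d '\n')"
  printf '<audio controls src="data:audio/mpeg;base64,%s"></audio>\n' "$payload"
}
```

Base64 inflates the audio by roughly a third, which is one reason to compress WAV to MP3 first, for instance with `ffmpeg -i full.wav -codec:a libmp3lame -qscale:a 4 full.mp3`. The encoder settings the skill actually uses are not documented here.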

## Available Voices

| Voice | Description |
|-------|-------------|
| `af_heart` | American female, warm (default) |
| `af_bella` | American female, clear |
| `am_adam` | American male, deep |
| `am_michael` | American male, neutral |
| `bf_emma` | British female, elegant |
| `bm_george` | British male, distinguished |

## Examples

```bash
# Basic usage — generates reader at ~/.read-aloud/output/my-essay/
$SKILL_DIR/scripts/read-aloud.sh ~/Documents/my-essay.md

# Custom voice and speed
$SKILL_DIR/scripts/read-aloud.sh post.md --voice bm_george --speed 1.1

# Output to specific directory
$SKILL_DIR/scripts/read-aloud.sh draft.md --output-dir ./preview/

# Strip specific sections before generating
$SKILL_DIR/scripts/read-aloud.sh post.md --strip-sections "Brief,Links & Resources"
```
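
To process a whole folder, the same command can be wrapped in a loop. This is a sketch using only the flags documented above. `batch_read_aloud` and the `RUN` switch are hypothetical, and the function defaults to a dry run so nothing is generated by accident.

```bash
SKILL_DIR="path/to/read-aloud"

# Hypothetical helper: one reader per markdown file in directory $1.
# Prints the commands by default; set RUN=1 to execute them.
batch_read_aloud() {
  local dir="$1" f
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    if [ "${RUN:-0}" = 1 ]; then
      # --no-open avoids opening a browser tab for every file.
      "$SKILL_DIR/scripts/read-aloud.sh" "$f" --no-open
    else
      printf '%s\n' "$SKILL_DIR/scripts/read-aloud.sh $f --no-open"
    fi
  done
}
```

With the default output directory, each file should land in its own `~/.read-aloud/output/<slug>/` folder.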

## References

- Load `references/voice-guide.md` for detailed voice descriptions and speed tuning
- Load `references/examples.md` for worked examples with expected output
- Load `references/troubleshooting.md` when errors occur

Overview

This skill generates a standalone HTML reader with Kokoro TTS audio and word-synced highlighting from any markdown file. It produces a single self-contained `reader.html` that embeds compressed audio and real-time word highlighting for listening, proofreading, or sharing. The tool runs locally and outputs a static file you can open in any browser.

How this skill works

The pipeline creates per-paragraph WAV chunks with Kokoro TTS, transcribes each chunk with Whisper to get drift-free word timestamps, and then assembles a single HTML file embedding an MP3 and synchronized per-word timing. The reader includes playback controls, keyboard shortcuts, and click-to-seek behavior so users can jump to any word. A small wrapper script auto-creates a virtual environment and installs dependencies on first run.

When to use it

- You want to listen to a markdown essay, article, or notes with synced highlighting.
- You need to proofread by ear to catch rhythm, tone, or awkward phrasing.
- You want a shareable, serverless audio reader (single HTML file) to distribute.
- You prefer generating lightweight, offline audio readers for accessibility testing.
- You need to convert chapters or sections into per-word-synced playback for review.

Best practices

- Run on a macOS machine with Apple Silicon (M1+) for best compatibility and performance.
- Use Python 3.13+ and install ffmpeg to produce compressed MP3 output.
- Keep paragraphs reasonably sized to improve alignment accuracy during transcription.
- Adjust `--speed` and `--voice` to match the desired tone before generating the final reader.
- Use `--strip-sections` to exclude non-narrative headings (e.g., References) to shorten audio.
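
To preview what a stripped document might look like before generating audio, the effect of `--strip-sections` can be approximated with a small filter. This is only an illustration of the idea: `strip_sections` is a hypothetical helper, and the skill's own matching rules (heading levels, case, nested sections) are not documented here. The sketch drops a matching heading and everything up to the next heading of any level.

```bash
# Hypothetical helper: strip_sections "Name1,Name2" file.md
# Prints the file without the named sections.
strip_sections() {
  awk -v names="$1" '
    BEGIN { n = split(names, want, ",") }
    /^#+ / {
      # Compare the heading text, without its leading hashes, to each name.
      title = $0
      sub(/^#+ +/, "", title)
      skip = 0
      for (i = 1; i <= n; i++) if (title == want[i]) skip = 1
    }
    !skip { print }
  ' "$2"
}
```

Running `strip_sections "Brief,Links & Resources" post.md | less` shows the text that would remain under these assumptions.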

Example use cases

- Generate an audio reader for a long-form essay to proofread pacing and clarity by ear.
- Create an offline HTML reader to share a narrated version of documentation or tutorials.
- Produce a highlighted playback for accessibility reviews or user testing sessions.
- Export chapter previews from markdown files to embed in a portfolio or preview page.
- Quickly generate narrated drafts to evaluate voice, speed, and emphasis choices.

FAQ

What platforms are supported?

The tool is designed for macOS on Apple Silicon and requires Python 3.13+. Other platforms may not be supported.

How long does generation take?

Typical essays take about 2–5 minutes; time depends on length and system performance.

Can I change voice or speed?

Yes. Use the `--voice` and `--speed` flags to pick a Kokoro voice variant and speech rate (0.5-2.0).