read-aloud skill

This skill generates a standalone HTML reader from markdown with Kokoro TTS audio and word-synced highlighting for easy listening and proofreading.

npx playbooks add skill leegonzales/aiskills --skill read-aloud

Run the command above to add this skill to your agents.

SKILL.md
---
name: read-aloud
description: Generate a standalone HTML reader with Kokoro TTS audio and word-synced highlighting from any markdown file. Use when user wants to listen to an essay, proofread by ear, or create an audio reader.
---

# Read Aloud

Generate a standalone HTML reader with high-quality TTS audio and real-time word highlighting from any markdown file.

## When to Use

Invoke when the user:
- Asks to "read this aloud" or "generate audio" for a markdown file
- Wants to proofread by ear (catch awkward phrasing, rhythm issues)
- Needs a shareable audio reader (single HTML file, no server needed)
- Uses the `/read-aloud` command

## Prerequisites

- macOS with Apple Silicon (M1+)
- Python 3.13+ (required for MLX)
- ffmpeg (for MP3 compression)
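
A quick way to confirm these prerequisites before the first run is sketched below. It is a minimal illustration, not part of the skill: `version_ge` and `check_prereqs` are hypothetical names, and the check assumes that the `python3` on `PATH` is the interpreter the wrapper will use and that Apple Silicon reports `arm64` from `uname -m` (true in a native, non-Rosetta shell).

```bash
# Hypothetical prerequisite check; not shipped with read-aloud.
# Source this file, then run: check_prereqs && echo "ready"

# version_ge A B: succeed if dotted version A >= B, compared numerically
# component by component (so 3.9 < 3.13).
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0   # missing components count as 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi != yi) exit !(xi > yi)   # exit 0 (success) when A is larger
    }
    exit 0                            # all components equal
  }'
}

# check_prereqs: print each missing prerequisite; non-zero status if any.
check_prereqs() {
  local status=0 py
  if [ "$(uname -s)" != Darwin ] || [ "$(uname -m)" != arm64 ]; then
    echo "missing: macOS on Apple Silicon (found $(uname -s) $(uname -m))"
    status=1
  fi
  py=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null)
  if ! version_ge "${py:-0}" 3.13; then
    echo "missing: Python 3.13+ (found ${py:-none})"
    status=1
  fi
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "missing: ffmpeg"
    status=1
  fi
  return "$status"
}
```

The version comparison is numeric per component because a plain string comparison would rank `3.9` above `3.13`.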

## Core Command

```bash
SKILL_DIR="path/to/read-aloud"
$SKILL_DIR/scripts/read-aloud.sh <markdown-file> [options]
```

The first run auto-installs dependencies into a virtual environment at `~/.read-aloud/venv/`.

## Options

| Flag | Default | Description |
|------|---------|-------------|
| `--voice` | `af_heart` | Kokoro voice variant |
| `--speed` | `1.0` | Speech rate (0.5-2.0) |
| `--output-dir` | `~/.read-aloud/output/<slug>/` | Custom output directory |
| `--strip-sections` | *(none)* | Comma-separated heading names to skip |
| `--no-open` | *(off)* | Don't open browser after generation |
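
The two options with the most logic behind their values, the default output directory and the speed range, can be mirrored in a short sketch. It assumes the `<slug>` is the markdown filename without `.md`, lowercased, with other runs of characters turned into hyphens; that rule is inferred from the `my-essay.md` case in Examples, not read from the script. `slugify`, `default_output_dir`, and `valid_speed` are hypothetical helpers.

```bash
# Hypothetical helpers mirroring the Options table; not the skill's own code.

# slugify FILE: filename without .md, lowercased, non-alphanumeric runs -> "-".
slugify() {
  basename "$1" .md |
    tr '[:upper:]' '[:lower:]' |
    sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# default_output_dir FILE: where reader.html lands when --output-dir is omitted.
default_output_dir() {
  printf '%s/.read-aloud/output/%s/\n' "$HOME" "$(slugify "$1")"
}

# valid_speed RATE: succeed if RATE is a plain number within 0.5-2.0.
valid_speed() {
  case $1 in
    '' | *[!0-9.]* | *.*.*) return 1 ;;  # empty, non-numeric, or two dots
  esac
  awk -v s="$1" 'BEGIN { s += 0; exit !(s >= 0.5 && s <= 2.0) }'
}
```

For example, `default_output_dir ~/Documents/my-essay.md` yields `$HOME/.read-aloud/output/my-essay/`, matching the basic-usage example.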

## Pipeline

Three-stage process (~2-5 min for a typical essay):

1. **Generate Audio** — Kokoro TTS produces per-paragraph WAV chunks
2. **Align Words** — Whisper transcribes each chunk for drift-free word timestamps
3. **Build Reader** — Assembles standalone HTML with embedded MP3 + word sync
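
Because each chunk is transcribed separately, Whisper's word times restart at zero for every chunk. One plausible way to place them on the full audio timeline, sketched here, is to shift each word by the summed durations of the chunks before it. The input formats are hypothetical: a durations file with `chunk seconds` lines in playback order, and a words file with `chunk start end word` lines whose times are relative to their chunk. `offset_words` is an illustrative name, not the skill's API.

```bash
# Hypothetical sketch: shift chunk-relative word timestamps onto the
# timeline of the concatenated audio. Not the skill's actual code.
#
#   durations file: "<chunk> <seconds>"             one line per chunk, in order
#   words file:     "<chunk> <start> <end> <word>"  times relative to the chunk
offset_words() {
  awk '
    BEGIN { total = 0 }
    # First file: remember where each chunk starts, then advance the total.
    NR == FNR { start[$1] = total; total += $2; next }
    # Second file: add the chunk start to both word timestamps.
    { printf "%.3f %.3f %s\n", $2 + start[$1], $3 + start[$1], $4 }
  ' "$1" "$2"
}
```

In this sketch every offset comes from measured chunk durations, so a timing error inside one chunk stays inside that chunk instead of accumulating across the essay.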

## Output

Single `reader.html` file with:
- Base64-embedded MP3 audio (fully standalone, no server needed)
- Per-word highlighting synced to audio playback
- Sticky playback controls (play/pause, stop, speed, progress bar)
- Keyboard shortcuts (Space, Escape, Arrow keys)
- Click any word to seek
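
The "no server needed" property comes from carrying the MP3 inside the HTML as base64 text. The sketch below shows that technique in isolation using a data URI; the `<audio>` markup and the `embed_audio_tag` name are assumptions, not taken from the generated reader, and the ffmpeg settings in the comment are only an example.

```bash
# Hypothetical sketch of inlining an MP3 into HTML as a base64 data URI.
# Producing the MP3 first might look like (settings are an example):
#   ffmpeg -i speech.wav -codec:a libmp3lame -q:a 4 speech.mp3
embed_audio_tag() {
  local b64
  # tr removes the line wrapping that some base64 implementations insert.
  b64=$(base64 < "$1" | tr -d '\n')
  printf '<audio src="data:audio/mpeg;base64,%s"></audio>\n' "$b64"
}
```

Base64 turns every 3 bytes into 4 characters, so the embedded audio is about a third larger than the MP3 itself, which is one reason to compress with ffmpeg before embedding.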

## Available Voices

| Voice | Description |
|-------|-------------|
| `af_heart` | American female, warm (default) |
| `af_bella` | American female, clear |
| `am_adam` | American male, deep |
| `am_michael` | American male, neutral |
| `bf_emma` | British female, elegant |
| `bm_george` | British male, distinguished |
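
To choose a voice by ear, it can help to render the same draft in several voices. This hypothetical `compare_voices` helper uses only the documented `--voice`, `--output-dir`, and `--no-open` flags, writes each reader to its own `./voices/<voice>/` directory (an arbitrary layout), and assumes `SKILL_DIR` is set as in Core Command. Setting `DRY_RUN=1` prints the commands instead of running them.

```bash
# Hypothetical helper: render one markdown file once per voice.
# Assumes SKILL_DIR is set as in Core Command. DRY_RUN=1 only prints commands.
compare_voices() {
  local file=$1 voice out_dir
  shift
  for voice in "$@"; do
    out_dir="./voices/$voice/"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "$SKILL_DIR/scripts/read-aloud.sh $file --voice $voice --output-dir $out_dir --no-open"
    else
      "$SKILL_DIR/scripts/read-aloud.sh" "$file" --voice "$voice" \
        --output-dir "$out_dir" --no-open || return
    fi
  done
}
```

For example, `compare_voices post.md af_heart bm_george`. Each run takes the full generation time, so trying voices on a short excerpt first may save time on long essays.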

## Examples

```bash
# Basic usage — generates reader at ~/.read-aloud/output/my-essay/
$SKILL_DIR/scripts/read-aloud.sh ~/Documents/my-essay.md

# Custom voice and speed
$SKILL_DIR/scripts/read-aloud.sh post.md --voice bm_george --speed 1.1

# Output to specific directory
$SKILL_DIR/scripts/read-aloud.sh draft.md --output-dir ./preview/

# Strip specific sections before generating
$SKILL_DIR/scripts/read-aloud.sh post.md --strip-sections "Brief,Links & Resources"
```
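
The exact matching rules for `--strip-sections` are not documented here, so the following is only a sketch of one reasonable behavior: a section starts at a heading whose text exactly (and case-sensitively) matches one of the comma-separated names and runs until the next heading of the same or higher level, so its subsections are dropped too. Lines inside fenced code are never treated as headings. `strip_sections` is a hypothetical name.

```bash
# Hypothetical sketch of heading-based section stripping; not the skill's code.
# Usage: strip_sections FILE "Name A,Name B"
strip_sections() {
  awk -v names="$2" '
    BEGIN {
      n = split(names, list, ",")
      for (i = 1; i <= n; i++) {
        sub(/^[ \t]+/, "", list[i]); sub(/[ \t]+$/, "", list[i])  # trim names
        drop[list[i]] = 1
      }
      skip = 0   # heading level being skipped; 0 means not skipping
    }
    /^```/ { fence = !fence }   # toggle on fence lines so "#" comments are ignored
    !fence && /^#+[ \t]/ {
      match($0, /^#+/)
      level = RLENGTH
      title = substr($0, level + 1)
      sub(/^[ \t]+/, "", title); sub(/[ \t]+$/, "", title)
      if (skip && level <= skip) skip = 0          # skipped section has ended
      if (!skip && (title in drop)) skip = level   # start skipping this section
    }
    !skip { print }
  ' "$1"
}
```

With the names from the example above, `strip_sections post.md "Brief,Links & Resources"` prints the post without those two sections.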

## References

- Load `references/voice-guide.md` for detailed voice descriptions and speed tuning
- Load `references/examples.md` for worked examples with expected output
- Load `references/troubleshooting.md` when errors occur

Overview

This skill generates a standalone HTML reader with Kokoro TTS audio and word-synced highlighting from any markdown file. It produces a single self-contained reader.html that embeds compressed audio and real-time word highlighting for listening, proofreading, or sharing. The tool runs locally and outputs a pure static file you can open in any browser.

How this skill works

The pipeline creates per-paragraph WAV chunks with Kokoro TTS, transcribes each chunk with Whisper to get drift-free word timestamps, and then assembles a single HTML file embedding an MP3 and synchronized per-word timing. The reader includes playback controls, keyboard shortcuts, and click-to-seek behavior so users can jump to any word. A small wrapper script auto-creates a virtual environment and installs dependencies on first run.
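
Highlighting and click-to-seek both reduce to lookups over the word timings: during playback, find the word whose interval contains the current time; on a click, jump to that word's start. The generated reader does this in the browser; the shell sketch below only illustrates the lookup, assuming a hypothetical timings file with one sorted `start end word` line per word, in seconds on the full-audio timeline. `word_at` and `seek_time` are illustrative names.

```bash
# Hypothetical illustration of the lookups behind highlighting and seeking.

# word_at TIMINGS SECONDS: print "index: word" for the word whose
# [start, end) interval contains SECONDS; prints nothing in a gap.
word_at() {
  awk -v t="$2" '
    BEGIN { t += 0 }
    $1 + 0 <= t && t < $2 + 0 { print NR ": " $3; exit }
  ' "$1"
}

# seek_time TIMINGS INDEX: start time of word INDEX (1-based), as on a click.
seek_time() {
  awk -v i="$2" 'NR == i + 0 { print $1; exit }' "$1"
}
```

A linear scan keeps the sketch short; since the timings are sorted, a binary search would scale better for long documents.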

When to use it

  • You want to listen to a markdown essay, article, or notes with synced highlighting.
  • You need to proofread by ear to catch rhythm, tone, or awkward phrasing.
  • You want a shareable, serverless audio reader (single HTML file) to distribute.
  • You prefer generating lightweight, offline audio readers for accessibility testing.
  • You need to convert chapters or sections into per-word-synced playback for review.

Best practices

  • Run on macOS with Apple Silicon (M1+); it is a listed prerequisite, not only a performance tip.
  • Use Python 3.13+ and install ffmpeg, which is needed for compressed MP3 output.
  • Keep paragraphs reasonably sized to improve alignment accuracy during transcription.
  • Adjust --voice and --speed to the desired tone and pace before generating the final reader.
  • Use --strip-sections to skip non-narrative sections (e.g., References) and shorten the audio.
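
To find paragraphs worth splitting before generating, a word count per paragraph is enough. This sketch uses awk's paragraph mode (RS="", where blank lines separate records). long_paragraphs is a hypothetical helper, and its 200-word default is an arbitrary illustration rather than a threshold the skill documents; it also counts headings and code blocks as paragraphs.

```bash
# Hypothetical helper: report paragraphs longer than LIMIT words.
# Usage: long_paragraphs FILE [LIMIT]   (LIMIT defaults to 200, arbitrarily)
long_paragraphs() {
  awk -v limit="${2:-200}" '
    # RS = "" enables paragraph mode: blank lines separate records, and
    # newlines also act as field separators, so NF is the word count.
    BEGIN { RS = ""; limit += 0 }
    NF > limit { printf "paragraph %d: %d words\n", NR, NF }
  ' "$1"
}
```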

Example use cases

  • Generate an audio reader for a long-form essay to proofread pacing and clarity by ear.
  • Create an offline HTML reader to share a narrated version of documentation or tutorials.
  • Produce a highlighted playback for accessibility reviews or user testing sessions.
  • Export chapter previews from markdown files to embed in a portfolio or preview page.
  • Quickly generate narrated drafts to evaluate voice, speed, and emphasis choices.

FAQ

What platforms are supported?

The tool targets macOS on Apple Silicon (M1+) and requires Python 3.13+ and ffmpeg. Other platforms are not listed as supported.

How long does generation take?

Typical essays take about 2–5 minutes; time depends on length and system performance.

Can I change voice or speed?

Yes. Use --voice to pick a Kokoro voice variant (default af_heart) and --speed to set the speech rate (0.5-2.0, default 1.0).