This skill generates high-quality local speech from text using MLX-accelerated Kokoro-82M, with offline operation and flexible voice presets.
To add this skill to your agents, run `npx playbooks add skill krishagel/geoffrey --skill local-tts`.
---
name: local-tts
version: 1.0.0
description: Local text-to-speech using MLX and Kokoro model
triggers:
- local-tts
- local tts
- generate audio locally
- kokoro tts
dependencies:
- mlx-audio (via uv --with)
- pydub (via uv --with)
---
# Local TTS Skill
Generate high-quality speech audio locally using Apple Silicon MLX acceleration and the Kokoro-82M model. No API keys or recurring costs.
## Quick Start
```bash
# Generate MP3 from text
uv run --with mlx-audio --with pydub skills/local-tts/scripts/generate_audio.py \
  --text "Hello, this is a test." \
  --output ~/Desktop/test.mp3
# Generate from file
uv run --with mlx-audio --with pydub skills/local-tts/scripts/generate_audio.py \
  --file /tmp/script.txt \
  --voice af_heart \
  --output ~/Desktop/podcast.mp3
# List available voices
uv run --with mlx-audio skills/local-tts/scripts/list_voices.py
```
## Parameters
| Parameter | Required | Default | Description |
|-----------|----------|---------|-------------|
| `--text` | One of `--text`/`--file` | - | Text to convert |
| `--file` | One of `--text`/`--file` | - | Path to text file |
| `--voice` | No | `af_heart` | Voice preset |
| `--output` | Yes | - | Output file path (.mp3, .wav, .m4a, .ogg) |
| `--model` | No | `Kokoro-82M-bf16` | Model to use |
| `--list-voices` | No | - | Show available voices |
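
The table above maps naturally onto an `argparse` definition. The following is a minimal sketch of that mapping, not the actual source of `generate_audio.py`: the `build_parser` name and the help strings are assumptions, and `--list-voices` is left out for brevity.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the parameter table (not the real script)."""
    parser = argparse.ArgumentParser(prog="generate_audio.py")

    # Exactly one of --text / --file must be supplied.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--text", help="Text to convert")
    source.add_argument("--file", help="Path to text file")

    parser.add_argument("--voice", default="af_heart", help="Voice preset")
    parser.add_argument("--output", required=True, help="Output file path")
    parser.add_argument("--model", default="Kokoro-82M-bf16", help="Model to use")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--text", "Hello, this is a test.", "--output", "test.mp3"]
    )
    print(args.voice, args.model)  # af_heart Kokoro-82M-bf16
```

With a mutually exclusive group, passing both `--text` and `--file` (or neither) makes `argparse` exit with a usage error, which matches the "one of text/file" requirement.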
## Voice Presets
### American English Female (prefix: af_)
- `af_heart` - Warm, friendly **(default)**
- `af_bella` - Soft, calm
- `af_nova` - Clear, professional
- `af_river` - Clear, confident
- `af_sarah` - Soft, expressive
### American English Male (prefix: am_)
- `am_adam` - Clear, professional
- `am_echo` - Deep, smooth
- `am_liam` - Articulate, conversational
- `am_michael` - Soft, measured
### British English (prefix: bf_, bm_)
- `bf_emma` - Clear, refined female
- `bm_daniel` - Clear, professional male
- `bm_george` - Distinguished male
See `references/voices.md` for full list.
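
The two-letter prefixes follow a simple pattern: the first letter is the accent and the second is the gender. As a rough illustration, the hypothetical helper below decodes a preset name; it covers only the prefixes listed in this section, and `references/voices.md` may define others.

```python
ACCENTS = {"a": "American English", "b": "British English"}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice: str) -> tuple[str, str]:
    """Split a preset such as 'af_heart' into (accent, gender)."""
    prefix, sep, name = voice.partition("_")
    if not sep or not name or len(prefix) != 2:
        raise ValueError(f"unexpected voice format: {voice!r}")
    try:
        return ACCENTS[prefix[0]], GENDERS[prefix[1]]
    except KeyError:
        # Prefix letters outside the tables above are not handled by this sketch.
        raise ValueError(f"unknown voice prefix: {prefix!r}") from None


print(describe_voice("af_heart"))   # ('American English', 'female')
print(describe_voice("bm_george"))  # ('British English', 'male')
```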
## Output Format
```json
{
  "success": true,
  "file": "/Users/hagelk/Desktop/podcast.mp3",
  "voice": "af_heart",
  "model": "Kokoro-82M-bf16",
  "characters": 9824,
  "chunks": 20,
  "duration_seconds": 612.5,
  "generation_time": 45.2
}
```
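
A caller can read this result to confirm success and derive a speed figure. The sketch below parses the sample above and computes a realtime factor as audio duration divided by generation time. It assumes `generation_time` is expressed in seconds, which the field name does not state, and `realtime_factor` is an illustrative helper, not part of the skill.

```python
import json

SAMPLE = """
{
  "success": true,
  "file": "/Users/hagelk/Desktop/podcast.mp3",
  "voice": "af_heart",
  "model": "Kokoro-82M-bf16",
  "characters": 9824,
  "chunks": 20,
  "duration_seconds": 612.5,
  "generation_time": 45.2
}
"""


def realtime_factor(result: dict) -> float:
    """Seconds of audio produced per second of generation time."""
    if not result.get("success"):
        raise RuntimeError("generation did not succeed")
    return result["duration_seconds"] / result["generation_time"]


result = json.loads(SAMPLE)
print(f"{realtime_factor(result):.2f}x realtime")  # 13.55x realtime
```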
## Performance
| Hardware | Speed | Notes |
|----------|-------|-------|
| M3 Pro 36GB | ~3-4x realtime | First run slower (model loading) |
| M1/M2 Mac Mini 8GB | ~1.5x realtime | Works well for briefings |
| M1/M2 Mac Mini 16GB | ~2x realtime | Comfortable headroom |
## Technical Details
- **Model**: Kokoro-82M-bf16 (~200MB download on first run)
- **Sample rate**: 24kHz mono
- **Chunking**: Text split at ~400 chars per chunk for quality
- **Concatenation**: Chunks joined seamlessly via pydub
- **Formats**: MP3, WAV, M4A, OGG
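
The chunking step can be sketched as follows. This is one plausible approach (greedy packing of whole sentences up to a character limit), not necessarily how `generate_audio.py` splits text; `chunk_text` and its regular expression are assumptions made for illustration.

```python
import re


def chunk_text(text: str, max_chars: int = 400) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than max_chars is hard-split at the limit.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks


chunks = chunk_text("Hello world. " * 100)
print(len(chunks), max(len(c) for c in chunks))  # 4 389
```

Splitting on sentence boundaries keeps each chunk a natural unit of speech. The hard split for oversized sentences can cut mid-word, so a production version would probably fall back to clause or word boundaries first.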
## Important Notes
1. **MUST use `--with` flags** - Do not use PEP 723 inline deps. mlx-audio requires uv's cached environment.
2. **First run is slower** - Model downloads ~200MB and espeak dependencies initialize.
3. **Model cached at**: `~/.cache/huggingface/hub/models--mlx-community--Kokoro-82M-bf16/`
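
To tell in advance whether a run will trigger the initial download, a script can check for the cache directory. This is a small sketch using the path from note 3; `model_is_cached` is a hypothetical helper, and the check will miss the model if the Hugging Face cache has been relocated (for example via the `HF_HOME` environment variable).

```python
from pathlib import Path

DEFAULT_CACHE = Path(
    "~/.cache/huggingface/hub/models--mlx-community--Kokoro-82M-bf16"
)


def model_is_cached(cache_dir: Path = DEFAULT_CACHE) -> bool:
    """True if the cache directory exists and contains at least one entry."""
    cache_dir = cache_dir.expanduser()
    return cache_dir.is_dir() and any(cache_dir.iterdir())


if __name__ == "__main__":
    state = "cached" if model_is_cached() else "not cached (first run will download)"
    print(f"Kokoro-82M-bf16: {state}")
```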
## Integration with Morning Briefing
The morning-briefing skill uses this for podcast generation:
```bash
uv run --with mlx-audio --with pydub skills/local-tts/scripts/generate_audio.py \
  --file /tmp/morning_briefing_podcast.txt \
  --voice af_heart \
  --output ~/Desktop/morning_briefing.mp3
```
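
Another skill written in Python could issue the same command through `subprocess`. The wrapper below is a hedged sketch: `build_command` and `generate` are hypothetical names, and it assumes the script exits with a non-zero status on failure and prints its result to stdout, neither of which this document confirms.

```python
import subprocess
from pathlib import Path

SCRIPT = "skills/local-tts/scripts/generate_audio.py"


def build_command(text_file: str, output: str, voice: str = "af_heart") -> list[str]:
    """Assemble the uv invocation shown above as an argument list."""
    return [
        "uv", "run", "--with", "mlx-audio", "--with", "pydub", SCRIPT,
        "--file", text_file,
        "--voice", voice,
        # No shell is involved, so "~" must be expanded explicitly.
        "--output", str(Path(output).expanduser()),
    ]


def generate(text_file: str, output: str, voice: str = "af_heart") -> str:
    """Run the script and return its stdout (assumed to carry the result)."""
    completed = subprocess.run(
        build_command(text_file, output, voice),
        check=True,           # raises CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.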
This skill provides local text-to-speech using Apple Silicon MLX acceleration and the Kokoro-82M model so you can generate high-quality audio without external APIs or recurring costs. It focuses on fast, offline generation on Mac hardware with multiple voice presets and common output formats (MP3, WAV, M4A, OGG).
The skill loads the Kokoro-82M model locally and uses mlx-audio for hardware-accelerated inference on Apple Silicon. Input text is chunked (~400 characters) for stable generation, then audio chunks are concatenated with pydub and written to the requested file format. No API keys are required; the model downloads once to a local cache.
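
The concatenation step can be illustrated without any third-party packages. The skill itself uses pydub; the sketch below swaps in Python's standard `wave` module to show the same idea, joining per-chunk audio into one 24 kHz mono file. The 16-bit sample width and the silent stand-in chunks are assumptions made for the example.

```python
import io
import wave

SAMPLE_RATE = 24_000  # 24 kHz mono, as stated in this document
SAMPLE_WIDTH = 2      # assumption: 16-bit PCM samples


def concat_pcm(chunks: list[bytes]) -> bytes:
    """Join raw PCM chunks and wrap them in a single WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            wav.writeframes(chunk)
    return buffer.getvalue()


def duration_seconds(wav_bytes: bytes) -> float:
    """Read the frame count back from the WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def silence(seconds: float) -> bytes:
    """Stand-in for a generated chunk: raw zero-valued samples."""
    return b"\x00" * (int(seconds * SAMPLE_RATE) * SAMPLE_WIDTH)


combined = concat_pcm([silence(0.5), silence(1.5)])
print(duration_seconds(combined))  # 2.0
```

Because every chunk shares the same sample rate, width, and channel count, joining them is a plain append of frames. Encoding to MP3, M4A, or OGG needs an encoder, which is presumably why the skill relies on pydub for output.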
**Do I need an API key or internet access to run?**
No API key is required. Internet is only needed for the initial model download; subsequent runs are fully local.

**Which hardware is recommended?**
Apple Silicon (M1/M2/M3) is recommended. M3 Pro shows ~3–4x realtime; M1/M2 provide roughly 1.5–2x depending on memory.