This skill plays ElevenLabs text-to-speech locally, turning text into expressive audio with quick voice selection.
Add this skill to your agents with `npx playbooks add skill openclaw/skills --skill sag`.
---
name: sag
description: ElevenLabs text-to-speech with mac-style say UX.
homepage: https://sag.sh
metadata: {"clawdbot":{"emoji":"🗣️","requires":{"bins":["sag"],"env":["ELEVENLABS_API_KEY"]},"primaryEnv":"ELEVENLABS_API_KEY","install":[{"id":"brew","kind":"brew","formula":"steipete/tap/sag","bins":["sag"],"label":"Install sag (brew)"}]}}
---
# sag
Use `sag` for ElevenLabs TTS with local playback.
API key (required)
- `ELEVENLABS_API_KEY` (preferred)
- `SAG_API_KEY` also supported by the CLI
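A minimal pre-flight check that makes scripts fail early when neither key is set. `require_sag_key` is a hypothetical helper, not part of `sag`. It only mirrors the two variable names listed above and checks `ELEVENLABS_API_KEY` first because that is the preferred one.

```bash
# Hypothetical helper (not part of sag): report which API key variable is set.
# ELEVENLABS_API_KEY is checked first because it is the preferred name.
require_sag_key() {
  if [ -n "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY"
  elif [ -n "${SAG_API_KEY:-}" ]; then
    echo "SAG_API_KEY"
  else
    echo "error: set ELEVENLABS_API_KEY (or SAG_API_KEY)" >&2
    return 1
  fi
}
```

Usage: `require_sag_key >/dev/null && sag "Hello there"`.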
Quick start
- `sag "Hello there"`
- `sag speak -v "Roger" "Hello"`
- `sag voices`
- `sag prompting` (model-specific tips)
Model notes
- Default: `eleven_v3` (expressive)
- Stable: `eleven_multilingual_v2`
- Fast: `eleven_flash_v2_5`
Pronunciation + delivery rules
- First fix: respell (e.g. "key-note"), add hyphens, adjust casing.
- Numbers/units/URLs: `--normalize auto` (or `off` if it harms names).
- Language bias: `--lang en|de|fr|...` to guide normalization.
- v3: SSML `<break>` not supported; use `[pause]`, `[short pause]`, `[long pause]`.
- v2/v2.5: SSML `<break time="1.5s" />` supported; `<phoneme>` not exposed in `sag`.
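Because pause syntax depends on the model family, a small lookup avoids sending SSML to v3. This is a sketch: `pause_markup` is a hypothetical helper, and it only emits the markup forms named in the rules above. The two pauses are not meant to be the same length.

```bash
# Hypothetical helper (not part of sag): pick pause markup for a model ID.
# v3 does not support SSML <break>, so it gets a bracketed tag; v2/v2.5 get SSML.
# The two pauses are not calibrated to the same length.
pause_markup() {
  case "$1" in
    eleven_v3*) printf '%s\n' '[pause]' ;;
    *_v2*)      printf '%s\n' '<break time="1.5s" />' ;;
    *)          printf 'unknown model: %s\n' "$1" >&2; return 1 ;;
  esac
}
```

Since `eleven_v3` is the default model, this works without extra flags: `sag "First point. $(pause_markup eleven_v3) Second point."`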
v3 audio tags (put at the start of a line)
- `[whispers]`, `[shouts]`, `[sings]`
- `[laughs]`, `[starts laughing]`, `[sighs]`, `[exhales]`
- `[sarcastic]`, `[curious]`, `[excited]`, `[crying]`, `[mischievously]`
- Example: `sag "[whispers] keep this quiet. [short pause] ok?"`
Voice defaults
- `ELEVENLABS_VOICE_ID` or `SAG_VOICE_ID`
Confirm voice + speaker before long output.
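One way to enforce the confirmation step is to prompt only when the text exceeds a character threshold. This is a minimal sketch. `speak_checked` and `SAG_CONFIRM_CHARS` are hypothetical (not part of `sag`), and the 400-character default is arbitrary. It confirms the voice only and does not check the output speaker.

```bash
# Hypothetical guard (not part of sag): before speaking long text, show the
# voice and ask for confirmation. SAG_CONFIRM_CHARS is an invented knob.
speak_checked() {
  voice="$1"
  text="$2"
  limit="${SAG_CONFIRM_CHARS:-400}"   # arbitrary default threshold
  if [ "${#text}" -gt "$limit" ]; then
    printf 'Speak %s chars with voice "%s"? [y/N] ' "${#text}" "$voice" >&2
    read -r answer || answer=""
    case "$answer" in
      y|Y) ;;
      *) echo "cancelled" >&2; return 1 ;;
    esac
  fi
  # Uses the documented form: sag speak -v <voice> <text>
  sag speak -v "$voice" "$text"
}
```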
## Chat voice responses
When Peter asks for a "voice" reply (e.g., "crazy scientist voice", "explain in voice"), generate audio and send it:
```bash
# Generate audio file
sag -v Clawd -o /tmp/voice-reply.mp3 "Your message here"
# Then include in reply:
# MEDIA:/tmp/voice-reply.mp3
```
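The two steps can be wrapped so that each reply gets its own file. With a fixed `/tmp/voice-reply.mp3`, a second reply could overwrite one that has not been sent yet. `voice_reply` is a hypothetical wrapper that uses only the `-v` and `-o` flags shown above. The per-reply temp directory is a design choice, not something `sag` requires.

```bash
# Hypothetical wrapper (not part of sag): render a reply with sag, then print
# the MEDIA: line. A fresh temp directory per reply avoids overwriting an
# earlier file that has not been sent yet.
voice_reply() {
  text="$1"
  voice="${2:-Clawd}"
  dir=$(mktemp -d "${TMPDIR:-/tmp}/voice-reply.XXXXXX") || return 1
  out="$dir/voice-reply.mp3"
  sag -v "$voice" -o "$out" "$text" || return 1
  echo "MEDIA:$out"
}
```

Usage: `voice_reply "[excited] It works!"` prints a `MEDIA:` line to include in the reply.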
Voice character tips:
- Crazy scientist: Use `[excited]` tags, dramatic pauses `[short pause]`, vary intensity
- Calm: Use `[whispers]` or slower pacing
- Dramatic: Use `[sings]` or `[shouts]` sparingly
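The character tips can be captured in a tiny helper that prefixes a line with one of the tags listed above. `styled` and its style names are hypothetical. The tag choices follow the tips, and the output is plain text that you pass to `sag`.

```bash
# Hypothetical helper (not part of sag): prefix text with a v3 audio tag
# chosen from the character tips. Tags go at the start of the line.
styled() {
  style="$1"
  shift
  case "$style" in
    scientist) printf '[excited] %s\n' "$*" ;;
    calm)      printf '[whispers] %s\n' "$*" ;;
    dramatic)  printf '[shouts] %s\n' "$*" ;;
    *)         printf '%s\n' "$*" ;;
  esac
}
```

Usage: `sag -v Clawd -o /tmp/voice-reply.mp3 "$(styled scientist "Eureka, the reactor is stable!")"`.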
Default voice for Clawd: `lj2rcrvANS3gaWWnczSX` (or just `-v Clawd`)
This skill provides a mac-style "say" UX backed by ElevenLabs text-to-speech for local playback and quick CLI use. It focuses on expressive, stable, and fast ElevenLabs models and includes handy audio tags and pronunciation controls for reliable output. It’s optimized for quick voice replies, batch speaking, and experimenting with voice characters.
The CLI sends text to ElevenLabs TTS using your API key and fetches audio files for local playback. It supports model selection (expressive, stable, fast), voice defaults via environment variables, normalization rules, and model-specific audio tags or SSML where supported. You can generate files, play them locally, or include them as media attachments in chat workflows.
Which API key is required?
Set `ELEVENLABS_API_KEY` (preferred). `SAG_API_KEY` is also supported by the CLI.
How do I fix mispronunciations?
Respell words, add hyphens, adjust casing, or use `--normalize auto` for numbers; test fixes on short samples before long runs.
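To test a fix on a short sample, you can play the same sentence once per normalization mode and compare them. This is a sketch that assumes `--normalize` is accepted before the text, as in the pronunciation rules. `audition` is a hypothetical helper.

```bash
# Hypothetical helper (not part of sag): play a short sample once per
# --normalize mode so you can hear which one handles the text correctly.
audition() {
  sample="$1"
  for mode in auto off; do
    printf '%s\n' "--normalize $mode" >&2   # label each take on stderr
    sag --normalize "$mode" "$sample" || return 1
  done
}
```

Usage: compare `audition "Keynote at 9:30 on sag.sh"` with `audition "Key-note at 9:30 on sag.sh"` before running a long script.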