
sag skill


This skill enables natural ElevenLabs TTS playback locally, transforming text into expressive audio with quick voice selection.

npx playbooks add skill openclaw/skills --skill sag

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: sag
description: ElevenLabs text-to-speech with mac-style say UX.
homepage: https://sag.sh
metadata: {"clawdbot":{"emoji":"🗣️","requires":{"bins":["sag"],"env":["ELEVENLABS_API_KEY"]},"primaryEnv":"ELEVENLABS_API_KEY","install":[{"id":"brew","kind":"brew","formula":"steipete/tap/sag","bins":["sag"],"label":"Install sag (brew)"}]}}
---

# sag

Use `sag` for ElevenLabs TTS with local playback.

API key (required)
- `ELEVENLABS_API_KEY` (preferred)
- `SAG_API_KEY` also supported by the CLI
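
A minimal pre-flight sketch for scripts that wrap `sag`. The function name `api_key_source` is hypothetical, and checking `ELEVENLABS_API_KEY` first only mirrors the "preferred" note above; the lookup order inside the CLI itself is an assumption.

```bash
# Hypothetical helper: report which variable holds the API key, or fail if none is set.
# It prints the variable name only, never the key itself.
api_key_source() {
  if [ -n "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY"
  elif [ -n "${SAG_API_KEY:-}" ]; then
    echo "SAG_API_KEY"
  else
    echo "Set ELEVENLABS_API_KEY (or SAG_API_KEY) before calling sag" >&2
    return 1
  fi
}
```

Usage: `api_key_source >/dev/null && sag "Hello there"`.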

Quick start
- `sag "Hello there"`
- `sag speak -v "Roger" "Hello"`
- `sag voices`
- `sag prompting` (model-specific tips)

Model notes
- Default: `eleven_v3` (expressive)
- Stable: `eleven_multilingual_v2`
- Fast: `eleven_flash_v2_5`
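
The three models above can be kept behind short labels in a wrapper script. This is only a sketch: `sag_model_for` and the labels `expressive`, `stable`, and `fast` are hypothetical, and the flag that passes a model ID to `sag` is not shown in this document, so check `sag --help` for it.

```bash
# Hypothetical helper: map a short label to one of the model IDs listed above.
sag_model_for() {
  case "$1" in
    expressive) echo "eleven_v3" ;;
    stable)     echo "eleven_multilingual_v2" ;;
    fast)       echo "eleven_flash_v2_5" ;;
    *)
      echo "unknown label: $1 (expected expressive, stable, or fast)" >&2
      return 1
      ;;
  esac
}
```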

Pronunciation + delivery rules
- First fix: respell (e.g. "key-note"), add hyphens, adjust casing.
- Numbers/units/URLs: `--normalize auto` (or `off` if it harms names).
- Language bias: `--lang en|de|fr|...` to guide normalization.
- v3: SSML `<break>` not supported; use `[pause]`, `[short pause]`, `[long pause]`.
- v2/v2.5: SSML `<break time="1.5s" />` supported; `<phoneme>` not exposed in `sag`.
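
The pause rules above differ per model, so a script that targets more than one model may want to pick the markup in one place. A rough sketch, assuming the model IDs from the model notes; `pause_markup` is a hypothetical name, and for v3 the requested duration is simply ignored because the bracketed tags carry no time value.

```bash
# Hypothetical helper: print pause markup suited to the given model.
#   $1 = model ID, $2 = pause length in seconds (only used for SSML, default 1.5)
pause_markup() {
  case "$1" in
    eleven_v3)
      # v3 does not support SSML <break>; use a bracketed tag instead.
      printf '%s\n' '[pause]'
      ;;
    eleven_multilingual_v2|eleven_flash_v2_5)
      printf '<break time="%ss" />\n' "${2:-1.5}"
      ;;
    *)
      echo "no pause rule known for model: $1" >&2
      return 1
      ;;
  esac
}
```

Usage: `text="First point. $(pause_markup eleven_v3) Second point."`.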

v3 audio tags (put at the start of a line)
- `[whispers]`, `[shouts]`, `[sings]`
- `[laughs]`, `[starts laughing]`, `[sighs]`, `[exhales]`
- `[sarcastic]`, `[curious]`, `[excited]`, `[crying]`, `[mischievously]`
- Example: `sag "[whispers] keep this quiet. [short pause] ok?"`
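
To catch typos in tags before spending API calls, a script can check the tag against the list above and prepend it to the line. A small sketch; `v3_tagged` is a hypothetical helper, and it only knows the tags named in this document, not every tag the model may accept.

```bash
# Hypothetical helper: prefix a line with a v3 audio tag from the list above.
#   $1 = tag name without brackets, $2 = text to speak
v3_tagged() {
  case "$1" in
    whispers|shouts|sings|laughs|"starts laughing"|sighs|exhales|\
    sarcastic|curious|excited|crying|mischievously)
      printf '[%s] %s\n' "$1" "$2"
      ;;
    *)
      echo "tag not in the documented list: $1" >&2
      return 1
      ;;
  esac
}
```

Usage: `sag "$(v3_tagged whispers "keep this quiet.")"`.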

Voice defaults
- `ELEVENLABS_VOICE_ID` or `SAG_VOICE_ID`

Confirm voice + speaker before long output.
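
One way to confirm a voice cheaply is to speak a short sample first. The sketch below only prepares the sample text; `sample_text` and its 80-character default are hypothetical choices, not part of `sag`.

```bash
# Hypothetical helper: print the first line of the text, cut to at most N characters.
#   $1 = full text, $2 = character limit (default 80)
sample_text() {
  printf '%s\n' "$1" | head -n 1 | cut -c "1-${2:-80}"
}
```

Usage: `sag speak -v "Roger" "$(sample_text "$long_text")"`, then run the full text once the voice sounds right.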

## Chat voice responses

When Peter asks for a "voice" reply (e.g., "crazy scientist voice", "explain in voice"), generate audio and send it:

```bash
# Generate audio file
sag -v Clawd -o /tmp/voice-reply.mp3 "Your message here"

# Then include in reply:
# MEDIA:/tmp/voice-reply.mp3
```

Voice character tips:
- Crazy scientist: Use `[excited]` tags, dramatic pauses `[short pause]`, vary intensity
- Calm: Use `[whispers]` or slower pacing
- Dramatic: Use `[sings]` or `[shouts]` sparingly

Default voice for Clawd: `lj2rcrvANS3gaWWnczSX` (or just `-v Clawd`)
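
The generate-then-attach flow above can be wrapped in one function so the `MEDIA:` line is only printed when generation succeeds. A minimal sketch: `voice_reply` is a hypothetical name, and it uses only the `-v` and `-o` flags shown in the example above.

```bash
# Hypothetical wrapper: generate an audio file with sag and print the MEDIA line.
#   $1 = text, $2 = voice (default Clawd), $3 = output path
voice_reply() {
  local text="$1"
  local voice="${2:-Clawd}"
  local out="${3:-/tmp/voice-reply.mp3}"

  # Stop here if sag fails, so no MEDIA line points at a missing file.
  sag -v "$voice" -o "$out" "$text" || return 1

  printf 'MEDIA:%s\n' "$out"
}
```

Usage: `voice_reply "[excited] It works! [short pause] It actually works!"` prints `MEDIA:/tmp/voice-reply.mp3` to include in the reply.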

Overview

This skill provides a mac-style "say" UX backed by ElevenLabs text-to-speech for local playback and quick CLI use. It focuses on expressive, stable, and fast ElevenLabs models and includes handy audio tags and pronunciation controls for reliable output. It’s optimized for quick voice replies, batch speaking, and experimenting with voice characters.

How this skill works

The CLI sends text to ElevenLabs TTS using your API key and fetches audio files for local playback. It supports model selection (expressive, stable, fast), voice defaults via environment variables, normalization rules, and model-specific audio tags or SSML where supported. You can generate files, play them locally, or include them as media attachments in chat workflows.

When to use it

  • Create quick, local TTS playback from scripts or terminals
  • Generate voice replies for chat agents or bots
  • Produce voice demos using expressive or character voices
  • Normalize spoken numbers, units, and URLs automatically
  • Test different ElevenLabs models and voice presets before long runs

Best practices

  • Set ELEVENLABS_API_KEY (preferred) or SAG_API_KEY in your environment before running sag; an API key is required
  • Confirm the chosen voice with short samples before generating long audio
  • Use respelling, hyphens, or casing tweaks to fix pronunciations (e.g., "key-note")
  • Use --normalize auto to handle numbers, units, and URLs, but switch to --normalize off if it mangles names
  • Apply model-appropriate markup: v3 uses bracketed audio tags such as [pause] and does not support SSML <break>; v2/v2.5 accept SSML <break>

Example use cases

  • Say a single phrase: sag "Hello there" for immediate playback
  • Character replies: sag -v Clawd -o /tmp/voice.mp3 "Your message" then attach MEDIA:/tmp/voice.mp3
  • List available voices with sag voices and set defaults via ELEVENLABS_VOICE_ID
  • Produce controlled pauses and tone with tags like [short pause], [excited], [whispers]
  • Batch script to narrate logs or alerts using the stable or fast model
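
The batch case in the last item could look roughly like the sketch below, which speaks each non-empty line from standard input. `narrate_lines` is a hypothetical name; it calls plain `sag "text"` with the default model, because the flag for choosing the stable or fast model is not shown in this document.

```bash
# Hypothetical helper: speak each non-empty line read from standard input.
narrate_lines() {
  local line
  while IFS= read -r line; do
    # Skip blank lines rather than sending empty requests.
    [ -n "$line" ] || continue
    # Redirect stdin so sag cannot consume the remaining input lines.
    sag "$line" < /dev/null || echo "sag failed for: $line" >&2
  done
}
```

Usage: `tail -n 5 /var/log/app.log | narrate_lines`, where the log path is only an illustration.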

FAQ

Which API key is required?

Set ELEVENLABS_API_KEY (preferred). SAG_API_KEY is also supported by the CLI.

How do I fix mispronunciations?

Respell words, add hyphens, adjust casing, or use --normalize auto for numbers; test fixes on short samples before long runs.