beware-piper-tts skill

This skill enables local text-to-speech using Piper to deliver voice messages on messaging platforms without cloud calls or API usage.

npx playbooks add skill openclaw/skills --skill beware-piper-tts

SKILL.md

---
name: piper-tts
description: Local text-to-speech using Piper for voice message delivery. Use when the user asks for voice responses, audio messages, TTS, text-to-speech, voice notes, or wants to hear something spoken aloud. Converts text to speech locally (no cloud APIs, no cost, no network latency) and delivers as voice messages on Telegram, Discord, or any channel supporting audio.
---

# Piper TTS — Local Voice Messages

Generate voice messages using [Piper](https://github.com/rhasspy/piper), a fast local TTS engine. Zero cloud calls, zero cost, zero API keys.

## Setup

If Piper is not installed, run the setup script:

```bash
scripts/setup-piper.sh
```

This installs `piper-tts` via pip and downloads a default voice (`en_US-kusal-medium`).
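
The "if Piper is not installed" check can be automated. The following is a minimal sketch, not part of the skill itself: it assumes the pip install exposes a `piper` executable on `PATH`, and `PIPER_SETUP` and `ensure_piper` are hypothetical names introduced here for illustration.

```bash
#!/usr/bin/env bash
# Sketch: run the setup script only when no `piper` command is found.
# Assumption: the pip install exposes a `piper` executable on PATH.
# PIPER_SETUP is a hypothetical override for the setup script location.
ensure_piper() {
  if command -v piper >/dev/null 2>&1; then
    echo "piper already installed"
    return 0
  fi
  # Fall back to the setup script described above.
  "${PIPER_SETUP:-scripts/setup-piper.sh}"
}
```

Calling `ensure_piper` before the first synthesis keeps setup idempotent: repeated calls skip the install once `piper` is available.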

## Generating Voice Messages

Use `scripts/piper-speak.sh` to synthesize speech to an MP3 file:

```bash
scripts/piper-speak.sh "<text>" [voice]
```

- `text`: The text to speak (required)
- `voice`: Piper voice name (default: `en_US-kusal-medium`)

The script prints the path of the generated MP3. Include that path in your reply as:

```
[[audio_as_voice]]
MEDIA:<path-to-mp3>
```

This delivers the audio as a native voice message on supported channels (Telegram, Discord, etc.).

## Example Workflow

1. User asks: "Tell me a joke as audio"
2. Run: `scripts/piper-speak.sh "Why do programmers prefer dark mode? Because light attracts bugs!"`
3. Get MP3 path from output
4. Reply with `[[audio_as_voice]]` + `MEDIA:<path>`
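
The four steps above can be sketched as a single shell function. This is an illustration under stated assumptions, not the skill's own code: it assumes the MP3 path is the last line `scripts/piper-speak.sh` prints on stdout, and `voice_reply` and `PIPER_SPEAK` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch of the workflow: synthesize, check the file, print the reply block.
# PIPER_SPEAK is a hypothetical override so the script path can be swapped out.
voice_reply() {
  local text="$1"
  local voice="${2:-}"
  local speak="${PIPER_SPEAK:-scripts/piper-speak.sh}"
  local mp3_path

  # Assumption: the MP3 path is the last line the script prints on stdout.
  # The voice argument is passed only when the caller supplied one.
  mp3_path="$("$speak" "$text" ${voice:+"$voice"} | tail -n 1)"

  # Refuse to build a reply that points at a missing file.
  if [ ! -f "$mp3_path" ]; then
    echo "voice_reply: no audio file produced" >&2
    return 1
  fi

  printf '[[audio_as_voice]]\nMEDIA:%s\n' "$mp3_path"
}
```

For example, `voice_reply "Why do programmers prefer dark mode? Because light attracts bugs!"` would print the two reply lines with the generated path filled in.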

## Available Voices

After setup, download additional voices:

```bash
scripts/setup-piper.sh --voice en_US-ryan-high
scripts/setup-piper.sh --voice en_GB-northern_english_male-medium
```

Popular voices:
- `en_US-kusal-medium` — Clear male voice (default, recommended)
- `en_US-ryan-high` — High quality US male
- `en_US-hfc_male-medium` — US male
- `en_GB-northern_english_male-medium` — British male
- Browse all: https://huggingface.co/rhasspy/piper-voices
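
To fetch several voices in one pass, the `--voice` flag shown above can be wrapped in a loop. This is a rough sketch: `download_voices` and `PIPER_SETUP` are hypothetical names, and it assumes the setup script exits non-zero when a download fails.

```bash
#!/usr/bin/env bash
# Sketch: download each named voice and report which ones failed.
# PIPER_SETUP is a hypothetical override for the setup script location.
download_voices() {
  local setup="${PIPER_SETUP:-scripts/setup-piper.sh}"
  local voice
  local failed=0

  for voice in "$@"; do
    # Assumption: the setup script exits non-zero on a failed download.
    if "$setup" --voice "$voice"; then
      echo "ok: $voice"
    else
      echo "failed: $voice" >&2
      failed=1
    fi
  done

  return "$failed"
}
```

Usage: `download_voices en_US-ryan-high en_GB-northern_english_male-medium`.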

## Important Notes

- **Speed**: Local generation takes roughly 0.5-1 s for a short message and avoids the network round trip of cloud TTS.
- **No API keys**: Works completely offline after setup.
- **Platform**: macOS (Apple Silicon + Intel), Linux. Requires Python 3.9+.
- **Do NOT** set `messages.tts.auto: "always"` in OpenClaw config — it makes every response slow. Keep TTS on-demand.

Overview

This skill provides local text-to-speech using Piper to generate voice messages without cloud services. It converts text to MP3 audio and delivers it as a native voice message on channels like Telegram and Discord. Setup is a single script, and synthesis runs entirely offline after installation.

How this skill works

The skill wraps Piper, a fast local TTS engine, to synthesize speech from provided text and chosen voice models. A helper script generates an MP3 file and returns its path, which the skill then embeds in a channel-compatible voice message format. No external APIs, keys, or cloud latency are involved; generation typically completes within a second.
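
For orientation, a helper of this kind might look roughly like the sketch below. It is not the skill's actual `piper-speak.sh`, whose contents are not shown here. It assumes `piper` and `ffmpeg` are on `PATH`, that Piper reads text on stdin and writes a WAV file via `--output_file`, and that voice models are stored as `<voice>.onnx` in a hypothetical `PIPER_VOICE_DIR`; `speak_to_mp3` is likewise an illustrative name.

```bash
#!/usr/bin/env bash
# Sketch only: NOT the skill's actual piper-speak.sh.
# Assumptions: `piper` and `ffmpeg` are on PATH, and voice models live as
# <voice>.onnx in the hypothetical directory $PIPER_VOICE_DIR.
speak_to_mp3() {
  local text="$1"
  local voice="${2:-en_US-kusal-medium}"
  local voice_dir="${PIPER_VOICE_DIR:-voices}"
  local out_dir wav mp3

  out_dir="$(mktemp -d)" || return 1
  wav="$out_dir/voice.wav"
  mp3="$out_dir/voice.mp3"

  # Piper reads the text on stdin and writes a WAV file.
  printf '%s\n' "$text" |
    piper --model "$voice_dir/$voice.onnx" --output_file "$wav" >/dev/null ||
    return 1

  # Convert the WAV to MP3 for delivery as a voice message.
  ffmpeg -y -loglevel error -i "$wav" -codec:a libmp3lame -q:a 4 "$mp3" \
    </dev/null || return 1

  # Drop the intermediate WAV and print only the MP3 path.
  rm -f "$wav"
  printf '%s\n' "$mp3"
}
```

Printing nothing but the final path on stdout keeps the function easy to call from command substitution, for example `path="$(speak_to_mp3 "Hello there")"`.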

When to use it

- A user requests a spoken reply, audio message, or voice note.
- You want to deliver notifications or short announcements as natural-sounding audio.
- A chatbot on Telegram, Discord, or another audio-capable platform needs voice responses.
- You need quick TTS for demos, accessibility, or hands-free consumption.
- You need offline, cost-free speech synthesis with low latency.

Best practices

- Keep TTS on-demand; avoid enabling automatic TTS for every message to prevent slowdowns.
- Use short to medium-length text for optimal generation speed and natural prosody.
- Preinstall the recommended default voice (`en_US-kusal-medium`) for consistent results.
- Test different voices for tone and clarity; some voices suit announcements while others suit conversational replies.
- Run the setup script to download and manage additional voices locally.

Example use cases

- A user asks “Read this message aloud”: generate an MP3 and deliver it as a native voice message.
- Send daily briefings or status updates as short audio clips in a group chat.
- Create accessibility features that read interface text or alerts aloud for users with visual impairments.
- Respond to a user request for a joke or short story by returning an audio version.
- Prototype voice-enabled bot behavior locally without cloud dependencies.

FAQ

Do I need an internet connection to use this?

No. After running the setup to install Piper and download voices, synthesis runs entirely offline with no cloud calls.

Which platforms are supported?

Piper runs on macOS (Apple Silicon and Intel) and Linux. The skill requires Python 3.9+ and delivers audio to any channel that accepts voice or audio files (Telegram, Discord, etc.).