imessage-voice-reply skill

This skill generates and sends native iMessage voice replies using local Kokoro TTS. The replies play inline as voice bubbles with a waveform instead of appearing as file attachments.

npx playbooks add skill openclaw/skills --skill imessage-voice-reply

SKILL.md

---
name: imessage-voice-reply
version: 1.0.1
description: Send voice message replies in iMessage using local Kokoro-ONNX TTS. Generates native iMessage voice bubbles (CAF/Opus) that play inline with waveform — not file attachments. Use when receiving a voice message in iMessage and wanting to reply with voice, enabling voice-to-voice iMessage conversations, or sending audio responses. Zero cost — all TTS runs locally. Requires BlueBubbles channel configured in OpenClaw.
---

# iMessage Voice Reply

Generate and send native iMessage voice messages using local Kokoro TTS. Voice messages appear as inline playable bubbles with waveforms — identical to voice messages recorded in Messages.app.

## How It Works

```
Your text response → Kokoro TTS (local) → afconvert (native Apple encoder) → CAF/Opus → BlueBubbles → iMessage voice bubble
```

## Setup

```bash
bash ${baseDir}/scripts/setup.sh
```

Installs: kokoro-onnx, soundfile, numpy. Downloads Kokoro models (~136MB) to `~/.cache/kokoro-onnx/`.

Requires: BlueBubbles channel configured in OpenClaw (`channels.bluebubbles`).
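A quick preflight check can catch the two most common setup problems listed under Troubleshooting before the first send. The sketch below is a hypothetical helper and is not part of the skill. It checks only two things:

- that the model cache directory from the setup step exists and is non-empty (the individual model file names aren't documented here);
- which encoder is on `PATH`.

It cannot verify the BlueBubbles channel configuration.

```python
import shutil
from pathlib import Path

# Model cache location documented for setup.sh; the model file names are not
# listed, so this only checks that the directory exists and has content.
KOKORO_CACHE = Path.home() / ".cache" / "kokoro-onnx"


def preflight(cache_dir: Path = KOKORO_CACHE) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed.

    Hypothetical helper: the skill itself does not ship this function.
    """
    problems = []
    if not cache_dir.is_dir() or not any(cache_dir.iterdir()):
        problems.append(f"Kokoro models missing in {cache_dir}: run scripts/setup.sh")
    if shutil.which("afconvert") is None:
        if shutil.which("ffmpeg") is None:
            problems.append("neither afconvert nor ffmpeg found on PATH")
        else:
            problems.append("afconvert not found: output will fall back to MP3")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```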

## Generating and Sending a Voice Reply

### Step 1: Generate audio

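Write the response text to a temp file, then pass it via `--text-file` to avoid shell injection. Use a quoted heredoc (`<<'EOF'`) rather than `echo "..."`. Inside double quotes the shell still expands `$(...)`, backticks, and `$VARIABLES`. A quoted heredoc writes the text literally, as long as no line of the text is exactly `EOF`:

```bash
cat > /tmp/voice_text.txt <<'EOF'
Your response text here
EOF
${baseDir}/.venv/bin/python ${baseDir}/scripts/generate_voice_reply.py --text-file /tmp/voice_text.txt --output /tmp/voice_reply.caf
```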
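When an agent drives this step from Python instead of a shell, no shell needs to be involved at all. This is a minimal sketch built from the documented flags. Its assumptions:

- `base_dir` stands in for `${baseDir}`.
- `build_generate_cmd` and `generate_voice_reply` are hypothetical helper names.

The command is a list and `shell=True` is never used, so the reply text is written to a file verbatim and never interpreted by a shell.

```python
import subprocess
import tempfile
from pathlib import Path


def build_generate_cmd(base_dir: str, text_file: str, output: str) -> list[str]:
    """Argument vector for generate_voice_reply.py using only documented flags."""
    base = Path(base_dir)
    return [
        str(base / ".venv" / "bin" / "python"),
        str(base / "scripts" / "generate_voice_reply.py"),
        "--text-file", text_file,
        "--output", output,
    ]


def generate_voice_reply(base_dir: str, text: str, output: str = "/tmp/voice_reply.caf") -> None:
    # Write the reply to a temp file exactly as given; no shell parsing happens.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as f:
        f.write(text)
        text_path = f.name
    try:
        # List arguments and no shell=True: each element reaches the process unchanged.
        subprocess.run(build_generate_cmd(base_dir, text_path, output), check=True)
    finally:
        Path(text_path).unlink(missing_ok=True)
```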

Alternatively, pass text directly (ensure proper shell escaping):

```bash
${baseDir}/.venv/bin/python ${baseDir}/scripts/generate_voice_reply.py --text "Your response text here" --output /tmp/voice_reply.caf
```
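With `--text`, "proper shell escaping" means quoting the whole value as one literal word. In Python, the standard library's `shlex.quote` does this for POSIX shells. The sketch below builds a command string that is safe to pass to `sh -c`. Here `base_dir` is a hypothetical stand-in for `${baseDir}`.

```python
import shlex


def quoted_generate_command(base_dir: str, text: str, output: str = "/tmp/voice_reply.caf") -> str:
    """Shell command string for the --text variant with every argument quoted."""
    argv = [
        f"{base_dir}/.venv/bin/python",
        f"{base_dir}/scripts/generate_voice_reply.py",
        "--text", text,
        "--output", output,
    ]
    # shlex.quote wraps unsafe arguments in single quotes (escaping embedded
    # single quotes), so $(...), backticks, and ; are not interpreted by sh.
    return " ".join(shlex.quote(arg) for arg in argv)


print(quoted_generate_command("/opt/skill", "It's $5; see `date` $(whoami)"))
```

For untrusted input, the `--text-file` route above is still simpler to get right.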

Options:
- `--voice af_heart`: Kokoro voice ID (default: `af_heart`; see Voice Options below)
- `--speed 1.15`: speech rate multiplier (default: 1.15)
- `--lang en-us`: language code (default: `en-us`)
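The script's own argument parser isn't shown here. Purely as an illustration of how the documented flags and defaults fit together, a parser with the same surface might look like this. Two details are assumptions: that `--text` and `--text-file` are mutually exclusive, and that `--output` is required.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate an iMessage voice reply")
    # Assumption: exactly one text source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--text", help="reply text (quote carefully in a shell)")
    source.add_argument("--text-file", help="path to a UTF-8 file with the reply text")
    parser.add_argument("--output", required=True, help="destination .caf path")
    # Defaults as documented above.
    parser.add_argument("--voice", default="af_heart", help="Kokoro voice ID")
    parser.add_argument("--speed", type=float, default=1.15, help="speech rate multiplier")
    parser.add_argument("--lang", default="en-us", help="language code")
    return parser


def read_text(args: argparse.Namespace) -> str:
    """Return the reply text from --text-file if given, else from --text."""
    if args.text_file:
        with open(args.text_file, encoding="utf-8") as f:
            return f.read().strip()
    return args.text
```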

**Security note:** The Python script uses argparse and subprocess.run with list arguments (no shell=True). Input is handled safely within the script. When calling from a shell, prefer `--text-file` for untrusted input to avoid shell metacharacter issues.

### Step 2: Send via BlueBubbles

Use the `message` tool:

```json
{
  "action": "sendAttachment",
  "channel": "bluebubbles",
  "target": "+1XXXXXXXXXX",
  "path": "/tmp/voice_reply.caf",
  "filename": "Audio Message.caf",
  "contentType": "audio/x-caf",
  "asVoice": true
}
```

**Critical parameters for native voice bubble:**
- `filename` must be `"Audio Message.caf"`
- `contentType` must be `"audio/x-caf"`
- `asVoice` must be `true`

All three are required for iMessage to render the message as an inline voice bubble with waveform instead of a file attachment.
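A missing or misspelled field silently downgrades the message to a plain attachment. Building the payload in one place and checking it before calling the `message` tool helps avoid that. This is a minimal sketch: the field names and values come from the JSON above, while `voice_payload` and `voice_bubble_problems` are hypothetical helpers, not part of OpenClaw.

```python
import json

# The three values this document lists as required for an inline voice bubble.
REQUIRED_VOICE_FIELDS = {
    "filename": "Audio Message.caf",
    "contentType": "audio/x-caf",
    "asVoice": True,
}


def voice_payload(target: str, path: str = "/tmp/voice_reply.caf") -> dict:
    """Build the sendAttachment payload for the message tool."""
    return {
        "action": "sendAttachment",
        "channel": "bluebubbles",
        "target": target,
        "path": path,
        **REQUIRED_VOICE_FIELDS,
    }


def voice_bubble_problems(payload: dict) -> list[str]:
    """List every required field that is missing or has the wrong value."""
    return [
        f"{key} must be {value!r}, got {payload.get(key)!r}"
        for key, value in REQUIRED_VOICE_FIELDS.items()
        if payload.get(key) != value
    ]


payload = voice_payload("+1XXXXXXXXXX")
assert not voice_bubble_problems(payload)
print(json.dumps(payload, indent=2))
```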

## Voice Options

| Language | Female | Male |
|----------|--------|------|
| English (US) | af_heart ⭐ (default) | am_puck |
| Spanish | ef_dora | em_alex |
| French | ff_siwis | — |
| Japanese | jf_alpha | jm_beta |
| Chinese | zf_xiaobei | zm_yunjian |
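Kokoro voice IDs encode the language and speaker gender in a two-letter prefix. For example, `af_` means American English, female, and `jm_` means Japanese, male. The sketch below decodes the prefixes used in the table, which is handy for checking that `--voice` matches the `--lang` you pass. The dictionary covers only the languages listed here and is not a complete list of Kokoro voices.

```python
# First letter of a Kokoro voice ID: language; second letter: f (female) / m (male).
# Only the languages from the table above are included.
LANGUAGE_BY_PREFIX = {
    "a": "American English",
    "e": "Spanish",
    "f": "French",
    "j": "Japanese",
    "z": "Mandarin Chinese",
}
GENDER_BY_LETTER = {"f": "female", "m": "male"}


def describe_voice(voice_id: str) -> tuple[str, str]:
    """Return (language, gender) for a voice ID such as 'af_heart'."""
    prefix, sep, _name = voice_id.partition("_")
    if not sep or len(prefix) != 2:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    try:
        return LANGUAGE_BY_PREFIX[prefix[0]], GENDER_BY_LETTER[prefix[1]]
    except KeyError:
        raise ValueError(f"prefix {prefix!r} not in this table") from None


print(describe_voice("ef_dora"))  # ('Spanish', 'female')
```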

## When to Reply with Voice

Reply with a voice message when:
- The user sent you a voice message (voice-for-voice)
- The user explicitly asks for an audio/voice response

Always include a text reply alongside the voice message for accessibility.
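The rule above reduces to two conditions, and the text reply is sent either way. Here it is as a sketch; `plan_reply` and its two boolean inputs are hypothetical names, not part of the skill.

```python
def plan_reply(incoming_was_voice: bool, user_asked_for_voice: bool) -> list[str]:
    """Return the parts to send, in order: text always, voice only when warranted."""
    parts = ["text"]  # always sent, for accessibility
    if incoming_was_voice or user_asked_for_voice:
        parts.append("voice")
    return parts
```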

## Audio Format

- **macOS:** CAF container, Opus codec, 48kHz mono, 32kbps — encoded by Apple's native `afconvert`. Identical to what Messages.app produces.
- **Fallback:** MP3 via ffmpeg (works but may not render as native voice bubble on all iMessage versions).
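The encoding step can be sketched as choosing an encoder and building its command. The `afconvert` flags below are assumptions chosen to match the format described above:

- `-f caff` sets the container.
- `-d opus@48000` sets the data format and sample rate.
- `-b 32000` sets the bit rate.
- `-c 1` selects mono.

The ffmpeg MP3 settings, including the 64 kbps bit rate, are likewise illustrative. The skill's script may pass different options. The `which` parameter exists only so the choice can be tested without either tool installed.

```python
import shutil
from pathlib import Path


def encode_command(wav_path: str, out_stem: str, which=shutil.which) -> tuple[list[str], str]:
    """Return (argv, output_path) for the best available encoder.

    Prefers afconvert (CAF/Opus, 48 kHz mono, 32 kbps); falls back to ffmpeg/MP3.
    Flag choices are assumptions, not copied from the skill's script.
    """
    if which("afconvert"):
        out = str(Path(out_stem).with_suffix(".caf"))
        return ["afconvert", "-f", "caff", "-d", "opus@48000", "-b", "32000",
                "-c", "1", wav_path, out], out
    if which("ffmpeg"):
        out = str(Path(out_stem).with_suffix(".mp3"))
        # MP3 fallback: may not render as a native voice bubble.
        return ["ffmpeg", "-y", "-i", wav_path, "-ac", "1",
                "-codec:a", "libmp3lame", "-b:a", "64k", out], out
    raise RuntimeError("neither afconvert nor ffmpeg is on PATH")
```

The returned list can be passed straight to `subprocess.run(argv, check=True)`.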

## Cost

$0. Kokoro TTS runs entirely locally. No API calls for voice generation.

## Troubleshooting

**Voice message shows as file attachment** — Ensure all three parameters are set: `filename="Audio Message.caf"`, `contentType="audio/x-caf"`, `asVoice=true`.

**First word clipped** — The script prepends 150ms silence automatically. If still clipped, increase the silence pad in the script.
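The silence pad is just a run of zero-valued samples placed before the speech. The sketch below makes three assumptions:

- the audio is a list of float samples;
- the sample rate is passed in rather than assumed;
- `pad_with_silence` is a hypothetical name.

The script's own implementation may differ.

```python
def pad_with_silence(samples: list[float], sample_rate: int, pad_ms: int = 150) -> list[float]:
    """Prepend pad_ms milliseconds of silence so the first word isn't clipped."""
    pad_count = round(sample_rate * pad_ms / 1000)
    return [0.0] * pad_count + list(samples)


# At an example rate of 24,000 samples/s, 150 ms is 3,600 samples.
padded = pad_with_silence([0.1, -0.2, 0.3], sample_rate=24000)
assert len(padded) == 3603 and padded[3600] == 0.1
```

Raising `pad_ms` (for example to 250) is the equivalent of "increase the silence pad" above.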

**Kokoro model not found** — Run `bash ${baseDir}/scripts/setup.sh`.

**afconvert not found** — Only available on macOS. Script falls back to ffmpeg/MP3 on Linux.

Overview

This skill sends native iMessage voice replies by generating speech locally with Kokoro-ONNX TTS and packaging it as CAF/Opus, so Messages.app plays it inline with a waveform. The result looks the same as a voice message recorded in Messages.app rather than a file attachment. It requires a BlueBubbles channel configured in OpenClaw. All TTS runs locally, so there are no external API costs.

How this skill works

You provide reply text → Kokoro-ONNX generates WAV audio locally → macOS afconvert (or ffmpeg fallback) encodes CAF with Opus at 48kHz mono → the message is sent via BlueBubbles with specific metadata so iMessage renders an inline voice bubble. The sender tool must set filename to "Audio Message.caf", contentType to "audio/x-caf", and asVoice to true for native rendering. If afconvert is unavailable on non-macOS hosts, the script falls back to MP3; this may not always render as a native bubble.

When to use it

Responding to an incoming iMessage voice message (voice-for-voice conversations).
Delivering an audio reply when a recipient explicitly requests a voice response.
Sending short spoken confirmations or status updates that benefit from tone and cadence.
Testing or archiving voice interactions for iMessage workflows using local TTS.

Best practices

Always send a brief text message alongside the voice bubble for accessibility and transcripts.
Ensure BlueBubbles channel is configured in OpenClaw before sending.
Set filename="Audio Message.caf", contentType="audio/x-caf", and asVoice=true to force inline rendering.
Use the provided 150ms silence pad to avoid clipping the first word; increase it only if needed.
Prefer macOS afconvert for CAF/Opus output; treat ffmpeg/MP3 as a fallback.

Example use cases

Auto-replying to a user who left a voice note with a spoken status update.
Building a bot that replies by voice for more natural, conversational iMessage interactions.
Producing spoken confirmations for appointment or delivery notifications via iMessage.
Archiving voice-based support exchanges while preserving native playback format.
Converting text responses to voice for accessibility-focused message flows.

FAQ

Do I need internet access or paid APIs to use this?

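No paid APIs, and no internet access for voice generation. Kokoro-ONNX runs locally, so there are no external TTS API calls or costs. You do need a connection once to download the models (~136MB) during setup, and sending the message itself goes through BlueBubbles and iMessage.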

Why does the voice show as a file attachment instead of a bubble?

Make sure all three parameters are present: filename must be "Audio Message.caf", contentType must be "audio/x-caf", and asVoice must be true.

What if afconvert is not available on my server?

The script falls back to ffmpeg producing MP3, but MP3 may not always render as a native inline voice bubble on all iMessage versions.