---
name: transcribe
description: Transcribe audio files to text using local Whisper (Docker). Use when receiving voice messages, audio files (.mp3, .m4a, .ogg, .wav, .webm), or when asked to transcribe audio content.
---
# Transcribe
Local audio transcription using faster-whisper in Docker.
## Installation
```bash
cd /path/to/skills/transcribe/scripts
chmod +x install.sh
./install.sh
```
This builds the Docker image `whisper:local` and installs the `transcribe` CLI.
## Usage
```bash
transcribe /path/to/audio.mp3 [language]
```
- Default language: `es` (Spanish)
- Use `auto` for auto-detection
- Outputs plain text to stdout
## Examples
```bash
transcribe /tmp/voice.ogg # Spanish (default)
transcribe /tmp/meeting.mp3 en # English
transcribe /tmp/audio.m4a auto # Auto-detect
```
## Supported Formats
mp3, m4a, ogg, wav, webm, flac, aac
## When Receiving Voice Messages
1. Save the audio attachment to a temp file
2. Run `transcribe <path>`
3. Include the transcription in your response
4. Clean up the temp file
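The steps above can be sketched as a small helper. This is a minimal illustration, not part of the skill; the function name `handle_voice_message` and the temp-file naming are assumptions, and it expects the `transcribe` CLI installed by this skill to be on `PATH`.

```shell
#!/usr/bin/env bash
# Sketch of the voice-message workflow: save, transcribe, clean up.
set -euo pipefail

handle_voice_message() {
  local src="$1"
  # 1. Save the attachment to a temp file, keeping the original extension.
  local tmp="/tmp/voice-$$.${src##*.}"
  cp "$src" "$tmp"
  # 2+3. Run the transcriber; its stdout is the text to include in the reply.
  transcribe "$tmp"
  # 4. Clean up the temp file.
  rm -f "$tmp"
}
```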
## Files
- `scripts/transcribe` - CLI wrapper (bash)
- `scripts/install.sh` - Installation script (includes Dockerfile inline)
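For orientation, the `scripts/transcribe` wrapper plausibly reduces to something like the sketch below: mount the audio file's directory into the `whisper:local` container and pass the path and language through. The mount point `/audio` and the container's argument handling are assumptions; the real script may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the scripts/transcribe wrapper.
# Invoked as: transcribe /path/to/audio.mp3 [language]
set -euo pipefail

transcribe() {
  local file="$1"
  local lang="${2:-es}"   # default language: Spanish; pass "auto" to detect
  local dir
  dir="$(cd "$(dirname "$file")" && pwd)"
  # Mount the file's directory read-only so the container can read the audio,
  # then run the image built by install.sh.
  docker run --rm -v "$dir:/audio:ro" whisper:local \
    "/audio/$(basename "$file")" "$lang"
}
```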
## Notes
- Model: `small` (fast); edit `install.sh` to use `large-v3` (more accurate)
- Fully local, no API key needed
## FAQ

**Which audio formats are supported?**

mp3, m4a, ogg, wav, webm, flac, and aac.

**Do I need an API key or internet access?**

No. Transcription runs locally in Docker, so no API key is required; internet access is only needed during installation to download the model if it is not already cached.

**How do I change model accuracy vs. speed?**

Edit `install.sh` to select a different Whisper model (for example `large-v3` for higher accuracy) and rebuild the Docker image.