
transcribe skill

/skills/javicasper/transcribe

This skill transcribes audio files locally using Docker and Whisper and returns plain text, making it quick to convert voice messages.

npx playbooks add skill openclaw/skills --skill transcribe

Review the files below or copy the command above to add this skill to your agents.

Files (3)
SKILL.md
---
name: transcribe
description: Transcribe audio files to text using local Whisper (Docker). Use when receiving voice messages, audio files (.mp3, .m4a, .ogg, .wav, .webm), or when asked to transcribe audio content.
---

# Transcribe

Local audio transcription using faster-whisper in Docker.

## Installation

```bash
cd /path/to/skills/transcribe/scripts
chmod +x install.sh
./install.sh
```

This builds the Docker image `whisper:local` and installs the `transcribe` CLI.

## Usage

```bash
transcribe /path/to/audio.mp3 [language]
```

- Default language: `es` (Spanish)
- Use `auto` for auto-detection
- Outputs plain text to stdout

## Examples

```bash
transcribe /tmp/voice.ogg          # Spanish (default)
transcribe /tmp/meeting.mp3 en     # English
transcribe /tmp/audio.m4a auto     # Auto-detect
```

## Supported Formats

mp3, m4a, ogg, wav, webm, flac, aac

## When Receiving Voice Messages

1. Save the audio attachment to a temp file
2. Run `transcribe <path>`
3. Include the transcription in your response
4. Clean up the temp file
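
The steps above can be wrapped in a small helper (a sketch: `transcribe_attachment` is a hypothetical name; `transcribe` is the CLI installed above):

```bash
# Sketch of steps 1-4; transcribe_attachment is a hypothetical helper.
transcribe_attachment() {
  local src=$1 lang=${2:-es}
  local tmp
  tmp=$(mktemp --suffix=".${src##*.}")  # 1. temp file, keeping the extension (GNU mktemp)
  cp "$src" "$tmp"
  transcribe "$tmp" "$lang"             # 2-3. plain text goes to stdout
  local status=$?
  rm -f "$tmp"                          # 4. clean up
  return $status
}
```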

## Files

- `scripts/transcribe` - CLI wrapper (bash)
- `scripts/install.sh` - Installation script (includes Dockerfile inline)

## Notes

- Default model: `small` (fast); edit `install.sh` to switch to `large-v3` (more accurate)
- Fully local, no API key needed

Overview

This skill transcribes audio files to text using a locally hosted Whisper implementation running in Docker. It provides a simple CLI for fast, offline transcription with no API keys required. The default model is optimized for speed but can be changed during installation for higher accuracy.

How this skill works

The skill packages faster-whisper inside a Docker image and exposes a CLI that accepts common audio formats (.mp3, .m4a, .ogg, .wav, .webm, .flac, .aac). You run the transcribe command with a file path and an optional language parameter (default Spanish, or 'auto' for detection). It outputs plain text to stdout, so the transcription can be captured, included in responses, or saved to a file.
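
That capture pattern can be sketched as follows (paths are hypothetical; the snippet is guarded so it is a no-op until the CLI is installed):

```bash
# Capture the plain-text output to a file for later use (hypothetical paths).
# Guarded so this does nothing if the transcribe CLI is not yet on PATH.
if command -v transcribe >/dev/null 2>&1; then
  transcribe /tmp/voice.ogg auto > /tmp/voice.txt
fi
```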

When to use it

  • When you receive voice messages or audio attachments that need text conversion.
  • For meeting notes, interviews, or recorded calls where an offline solution is required.
  • When compliance or privacy rules forbid sending audio to cloud services.
  • To batch-transcribe archived audio files stored locally.
  • During development or testing of voice-enabled features without API costs.

Best practices

  • Save incoming voice attachments to a temporary file before running the CLI and remove it afterward.
  • Use the default fast model for quick results; switch the Docker install to a larger model for better accuracy when needed.
  • Pass a language code for best accuracy, or use 'auto' if the language is unknown.
  • Preprocess noisy recordings (noise reduction, normalization) for improved transcription quality.
  • Capture stdout to a file or directly insert the text into responses to keep workflows automated.

Example use cases

  • Transcribe a WhatsApp voice note saved as voice.ogg and paste the text into a reply.
  • Batch process a folder of meeting recordings to generate searchable minutes.
  • Convert interview audio to text for faster analysis and tagging.
  • Run local transcriptions in environments with strict data privacy requirements.
  • Quickly check the contents of a voicemail audio file before forwarding.
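
The batch case above can be sketched as a small loop (folder names are hypothetical; assumes the `transcribe` CLI from install.sh is on PATH):

```bash
# Hypothetical batch run: one .txt per recording. Skips cleanly when
# the recordings/ folder is empty or absent.
mkdir -p minutes
for f in recordings/*.mp3; do
  [ -e "$f" ] || continue
  transcribe "$f" en > "minutes/$(basename "${f%.mp3}").txt"
done
```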

FAQ

Which audio formats are supported?

mp3, m4a, ogg, wav, webm, flac, and aac are supported.

Do I need an API key or internet access?

No. The transcription runs locally in Docker, so no API key is required; internet is only needed to download models during installation if not already cached.

How do I change model accuracy vs speed?

Edit the install script to select a different Whisper model (for example 'large-v3' for higher accuracy) and rebuild the Docker image.