This skill transcribes audio to text using OpenAI Whisper. It supports many languages, several model sizes, and optional timestamps.
To add this skill to your agents, run `npx playbooks add skill trpc-group/trpc-agent-go --skill whisper`.
---
name: whisper
description: Transcribe audio files to text using OpenAI Whisper
---
# Whisper Audio Transcription Skill
Transcribe audio files to text using OpenAI Whisper.
## Capabilities
- Transcribe audio files (MP3, WAV, M4A, FLAC, OGG, etc.) to text
- Support for 90+ languages with auto-detection
- Optional timestamp generation
- Multiple model sizes (tiny/base/small/medium/large)
- Output in plain text or JSON format
## Usage
### Basic Transcription
```bash
python3 scripts/transcribe.py <audio_file> <output_file>
```
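The contents of `scripts/transcribe.py` are not shown here. The sketch below assumes the script wraps the `openai-whisper` Python package. Under that assumption, the basic case amounts to loading a model and calling its `transcribe()` method. The helper name `transcribe_to_text` is hypothetical.

```python
from pathlib import Path


def transcribe_to_text(audio_file: str, output_file: str, model_name: str = "base") -> str:
    """Hypothetical helper: transcribe one file and write the plain-text transcript.

    Assumes the openai-whisper package is installed. It is imported lazily so this
    module can be imported (and tested) without it.
    """
    import whisper  # provided by `pip install openai-whisper`

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(audio_file)   # no language given, so Whisper auto-detects it
    text = result["text"].strip()           # the full transcript often starts with a space
    Path(output_file).write_text(text + "\n", encoding="utf-8")
    return text
```

Calling `transcribe_to_text("audio.mp3", "transcript.txt")` should roughly match the basic command. The real script may differ in details such as whitespace handling.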
### With Options
```bash
# Specify model size (default: base)
python3 scripts/transcribe.py audio.mp3 transcript.txt --model medium
# Specify language (improves accuracy)
python3 scripts/transcribe.py audio.mp3 transcript.txt --language zh
# Include timestamps
python3 scripts/transcribe.py audio.mp3 transcript.txt --timestamps
# JSON output with metadata
python3 scripts/transcribe.py audio.mp3 output.json --format json
```
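The sketch below shows one plausible way these flags could map onto `model.transcribe()`. The helper `build_transcribe_kwargs` is hypothetical, and the real script may wire the flags differently. In openai-whisper, omitting `language` triggers auto-detection. Passing `word_timestamps=True` adds per-word timings on top of the segment start/end times, which are always returned.

```python
from typing import Any, Dict, Optional


def build_transcribe_kwargs(language: Optional[str] = None, timestamps: bool = False) -> Dict[str, Any]:
    """Hypothetical mapping from the CLI options to whisper's transcribe() keyword arguments."""
    kwargs: Dict[str, Any] = {}
    # No language (or "auto") means Whisper detects it from the start of the audio.
    if language and language.lower() != "auto":
        kwargs["language"] = language.lower()
    # Segment-level start/end times are always present; this adds word-level timing.
    if timestamps:
        kwargs["word_timestamps"] = True
    return kwargs
```

For example: `model.transcribe("audio.mp3", **build_transcribe_kwargs("zh", timestamps=True))`.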
## Parameters
- `audio_file` (required): Path to input audio file
- `output_file` (required): Path to output text/JSON file
- `--model`: Whisper model size (tiny/base/small/medium/large, default: base)
- `--language`: Language code (e.g., en, zh, es, fr), or `auto` for automatic detection
- `--timestamps`: Include word-level timestamps in output
- `--format`: Output format (text/json, default: text)
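The parameter list above could be expressed with `argparse` as follows. This is a hypothetical reconstruction of the documented interface, not the script's actual parser. In particular, the `auto` default for `--language` is an assumption, because the docs do not state a default.

```python
import argparse
from typing import List, Optional

MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented parameters."""
    parser = argparse.ArgumentParser(description="Transcribe audio with OpenAI Whisper")
    parser.add_argument("audio_file", help="path to the input audio file")
    parser.add_argument("output_file", help="path to the output text/JSON file")
    parser.add_argument("--model", choices=MODEL_SIZES, default="base", help="Whisper model size")
    # Assumed default: the docs list "auto" as an option but do not name a default.
    parser.add_argument("--language", default="auto", help="language code such as en or zh, or 'auto'")
    parser.add_argument("--timestamps", action="store_true", help="include timestamps in the output")
    parser.add_argument("--format", choices=["text", "json"], default="text", help="output format")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```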
## Model Sizes
| Model | Parameters | Relative speed (vs. large) | Accuracy | Approx. memory |
|--------|------------|----------------------------|-----------|----------------|
| tiny | 39M | ~32x | Good | ~1 GB |
| base | 74M | ~16x | Better | ~1 GB |
| small | 244M | ~6x | Great | ~2 GB |
| medium | 769M | ~2x | Excellent | ~5 GB |
| large | 1.5B | 1x | Best | ~10 GB |
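The memory column can drive a simple selection heuristic: pick the most accurate model whose approximate footprint fits the memory you have. The helper below is a hypothetical sketch. Its figures are the rough values from the table, and actual usage varies with device and numeric precision.

```python
from typing import Optional

# Approximate memory needs (GB) taken from the table above, smallest model first.
APPROX_MEMORY_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def largest_model_that_fits(available_gb: float) -> Optional[str]:
    """Return the most accurate model whose approximate memory need fits, or None if none fit."""
    best = None
    for name, need_gb in APPROX_MEMORY_GB:
        if need_gb <= available_gb:
            best = name  # later entries are larger and more accurate
    return best
```

With 6 GB available, for example, this picks `medium`. Leave some headroom in practice, because the table figures are approximate.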
## Supported Audio Formats
MP3, WAV, M4A, FLAC, OGG, AAC, WMA, and more (via FFmpeg)
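Format support comes from FFmpeg doing the decoding. openai-whisper's own `load_audio` shells out to `ffmpeg` to convert any input into 16 kHz mono, 16-bit PCM. The sketch below mirrors that approach using only the standard library, whereas upstream uses NumPy. The function names are hypothetical, and `decode_audio` needs `ffmpeg` on `PATH`.

```python
import array
import subprocess
import sys
from typing import List

SAMPLE_RATE = 16000  # Whisper models expect 16 kHz mono input


def ffmpeg_decode_cmd(path: str, sr: int = SAMPLE_RATE) -> List[str]:
    """FFmpeg command that decodes a file to raw signed 16-bit little-endian mono PCM on stdout."""
    return [
        "ffmpeg", "-nostdin", "-threads", "0",
        "-i", path,
        "-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le", "-ar", str(sr),
        "-",
    ]


def pcm16le_to_float(raw: bytes) -> List[float]:
    """Convert raw signed 16-bit little-endian samples to floats in [-1.0, 1.0)."""
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # the stream is little-endian regardless of host byte order
    return [s / 32768.0 for s in samples]


def decode_audio(path: str, sr: int = SAMPLE_RATE) -> List[float]:
    """Run FFmpeg on the file and return the decoded waveform as floats."""
    raw = subprocess.run(ffmpeg_decode_cmd(path, sr), capture_output=True, check=True).stdout
    return pcm16le_to_float(raw)
```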
## Dependencies
- Python 3.8+
- openai-whisper
- ffmpeg
## Installation
```bash
pip install openai-whisper
sudo apt-get install ffmpeg # Ubuntu/Debian
```
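The sketch below checks that the dependencies above are in place before you run the script. The `check_dependencies` helper is hypothetical. It only confirms that some module named `whisper` is importable, and the unrelated PyPI package `whisper` would also pass that check, so make sure to install `openai-whisper` specifically.

```python
import importlib.util
import shutil
import sys
from typing import List


def check_dependencies() -> List[str]:
    """Return human-readable problems. An empty list means everything was found."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required, found %d.%d" % sys.version_info[:2])
    if importlib.util.find_spec("whisper") is None:
        problems.append("openai-whisper is not installed (pip install openai-whisper)")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_dependencies():
        print("missing:", problem)
```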
This skill transcribes audio files to text using OpenAI Whisper models. It supports common audio formats, automatic language detection for 90+ languages, optional timestamps, multiple model sizes, and text or JSON output. Model size is the main lever for trading speed against accuracy.
You provide an audio file and an output path, and the skill runs the selected Whisper model to produce a transcript. Options let you choose the model size (tiny to large), set the language explicitly or let Whisper detect it, include word- or segment-level timestamps, and export either plain text or structured JSON with metadata. FFmpeg decodes the input and resamples it to 16 kHz mono before transcription, so many file types are accepted.
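A whisper `transcribe()` result is a dict containing `text`, `segments`, and `language`. Each segment carries `start`, `end`, and `text`. The sketch below shows one hypothetical way to render that dict as text or JSON. The timestamp layout and JSON schema are assumptions, not the skill's documented format.

```python
import json
from typing import Any, Dict


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS.mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_result(result: Dict[str, Any], fmt: str = "text", timestamps: bool = False) -> str:
    """Hypothetical rendering of a whisper transcribe() result as plain text or JSON."""
    if fmt not in ("text", "json"):
        raise ValueError(f"unsupported format: {fmt}")
    segments = result.get("segments", [])
    if fmt == "json":
        payload = {
            "language": result.get("language"),
            "text": result["text"].strip(),
            "segments": [
                {"start": s["start"], "end": s["end"], "text": s["text"].strip()}
                for s in segments
            ],
        }
        return json.dumps(payload, ensure_ascii=False, indent=2)
    if timestamps:
        # One line per segment: [start --> end] text
        return "\n".join(
            f"[{format_timestamp(s['start'])} --> {format_timestamp(s['end'])}] {s['text'].strip()}"
            for s in segments
        )
    return result["text"].strip()
```

`ensure_ascii=False` keeps non-Latin transcripts (for example Chinese) readable in the JSON file.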
What audio formats are supported?
Any format FFmpeg can decode is accepted, including MP3, WAV, M4A, FLAC, OGG, AAC, and WMA, because FFmpeg converts the input to the normalized audio Whisper expects.
Which model should I pick for accuracy vs. speed?
Use tiny or base for the fastest, lowest-memory runs; small for a balance of speed and accuracy; and medium or large when you need higher accuracy on noisy or complex speech.