home / skills / openclaw / skills / assemblyai-transcriber
This skill transcribes audio with speaker diarization and language detection, delivering timestamps across 100+ languages for meetings, interviews, and
npx playbooks add skill openclaw/skills --skill assemblyai-transcriberReview the files below or copy the command above to add this skill to your agents.
---
name: assemblyai-transcriber
description: "Transcribe audio files with speaker diarization (who speaks when). Supports 100+ languages, automatic language detection, and timestamps. Use for meetings, interviews, podcasts, or voice messages. Requires AssemblyAI API key."
metadata:
openclaw:
requires:
env:
- ASSEMBLYAI_API_KEY
---
# AssemblyAI Transcriber 🎙️
Transcribe audio files with speaker diarization (who speaks when).
## Features
- ✅ Transcription in 100+ languages
- ✅ Speaker diarization (Speaker A, B, C...)
- ✅ Timestamps per utterance
- ✅ Automatic language detection
- ✅ Supports MP3, WAV, M4A, FLAC, OGG, WEBM
## Setup
1. Create AssemblyAI account: https://www.assemblyai.com/
2. Get API key (free tier: 100 min/month)
3. Set environment variable:
```bash
export ASSEMBLYAI_API_KEY="your-api-key"
```
Or save to config file:
```json
// ~/.assemblyai_config.json
{
"api_key": "YOUR_API_KEY"
}
```
## Usage
### Transcribe local audio
```bash
python3 scripts/transcribe.py /path/to/recording.mp3
```
### Transcribe from URL
```bash
python3 scripts/transcribe.py https://example.com/meeting.mp3
```
### Options
```bash
python3 scripts/transcribe.py audio.mp3 --no-diarization # Skip speaker labels
python3 scripts/transcribe.py audio.mp3 --json # Raw JSON output
```
## Output Format
```
## Transcript
*Language: EN*
*Duration: 05:32*
**Speaker A** [00:00]: Hello everyone, welcome to the meeting.
**Speaker B** [00:03]: Thanks! Happy to be here.
**Speaker A** [00:06]: Let's start with the first item...
```
## Pricing
- **Free Tier**: 100 minutes/month free
- **After**: ~$0.01/minute
## Tips
- For best speaker diarization: clear speaker changes, minimal overlap
- Background noise is filtered well
- Multi-language auto-detection works reliably
---
**Author**: xenofex7 | **Version**: 1.1.0
This skill transcribes audio with speaker diarization, timestamps, and automatic language detection using the AssemblyAI API. It supports 100+ languages and common audio formats (MP3, WAV, M4A, FLAC, OGG, WEBM). An AssemblyAI API key is required to run the tool.
The skill uploads local files or remote URLs to AssemblyAI, requests transcription with optional speaker diarization, and returns a structured transcript with speaker labels and timestamps. It can output human-friendly text or raw JSON for downstream processing. Language detection is automatic, and timestamps are provided per utterance.
Do I need an AssemblyAI account?
Yes. You must set an ASSEMBLYAI_API_KEY environment variable or config file with your API key.
Which audio formats are supported?
Common formats are supported, including MP3, WAV, M4A, FLAC, OGG, and WEBM.
Can it detect language automatically?
Yes. The skill uses AssemblyAI's automatic language detection for multi-language audio.