assemblyai-transcriber skill


This skill transcribes audio with speaker diarization and language detection, delivering per-utterance timestamps across 100+ languages for meetings, interviews, podcasts, and voice messages.

npx playbooks add skill openclaw/skills --skill assemblyai-transcriber


SKILL.md
---
name: assemblyai-transcriber
description: "Transcribe audio files with speaker diarization (who speaks when). Supports 100+ languages, automatic language detection, and timestamps. Use for meetings, interviews, podcasts, or voice messages. Requires AssemblyAI API key."
metadata:
  openclaw:
    requires:
      env:
        - ASSEMBLYAI_API_KEY
---

# AssemblyAI Transcriber 🎙️

Transcribe audio files with speaker diarization (who speaks when).

## Features

- ✅ Transcription in 100+ languages
- ✅ Speaker diarization (Speaker A, B, C...)
- ✅ Timestamps per utterance
- ✅ Automatic language detection
- ✅ Supports MP3, WAV, M4A, FLAC, OGG, WEBM

## Setup

1. Create AssemblyAI account: https://www.assemblyai.com/
2. Get API key (free tier: 100 min/month)
3. Set environment variable:

```bash
export ASSEMBLYAI_API_KEY="your-api-key"
```

Or save to `~/.assemblyai_config.json`:

```json
{
  "api_key": "YOUR_API_KEY"
}
```
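The scripts can resolve the key from either source. A minimal sketch of that lookup, checking the environment first and falling back to the config file (the function name `get_api_key` is illustrative, not necessarily how scripts/transcribe.py implements it):

```python
import json
import os
from pathlib import Path

def get_api_key() -> str:
    """Return the AssemblyAI API key from the environment or the config file."""
    key = os.environ.get("ASSEMBLYAI_API_KEY")
    if key:
        return key
    config_path = Path.home() / ".assemblyai_config.json"
    if config_path.exists():
        return json.loads(config_path.read_text())["api_key"]
    raise RuntimeError(
        "No API key found: set ASSEMBLYAI_API_KEY or create ~/.assemblyai_config.json"
    )
```

Checking the environment variable first lets a shell export override a stale config file.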

## Usage

### Transcribe local audio

```bash
python3 scripts/transcribe.py /path/to/recording.mp3
```

### Transcribe from URL

```bash
python3 scripts/transcribe.py https://example.com/meeting.mp3
```

### Options

```bash
python3 scripts/transcribe.py audio.mp3 --no-diarization  # Skip speaker labels
python3 scripts/transcribe.py audio.mp3 --json            # Raw JSON output
```

## Output Format

```
## Transcript

*Language: EN*
*Duration: 05:32*

**Speaker A** [00:00]: Hello everyone, welcome to the meeting.
**Speaker B** [00:03]: Thanks! Happy to be here.
**Speaker A** [00:06]: Let's start with the first item...
```
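Given AssemblyAI's per-utterance fields (`speaker`, `start` in milliseconds, `text`), the lines above can be produced with a small formatter. A sketch under those assumptions (function names are illustrative, not the actual scripts/transcribe.py code):

```python
def fmt_timestamp(ms: int) -> str:
    """Convert a millisecond offset to an MM:SS label."""
    seconds = ms // 1000
    return f"{seconds // 60:02d}:{seconds % 60:02d}"

def format_utterance(utt: dict) -> str:
    """Render one utterance as a transcript line in the format shown above."""
    return f"**Speaker {utt['speaker']}** [{fmt_timestamp(utt['start'])}]: {utt['text']}"
```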

## Pricing

- **Free Tier**: 100 minutes/month free
- **After**: ~$0.01/minute

## Tips

- For best speaker diarization, use recordings with clear speaker changes and minimal cross-talk
- Moderate background noise is handled well, but cleaner audio yields more accurate speaker labels
- Automatic language detection works reliably when the recording contains a sustained stretch of speech

---

**Author**: xenofex7 | **Version**: 1.1.0

Overview

This skill transcribes audio with speaker diarization, timestamps, and automatic language detection using the AssemblyAI API. It supports 100+ languages and common audio formats (MP3, WAV, M4A, FLAC, OGG, WEBM). An AssemblyAI API key is required to run the tool.

How this skill works

The skill uploads local files or remote URLs to AssemblyAI, requests transcription with optional speaker diarization, and returns a structured transcript with speaker labels and timestamps. It can output human-friendly text or raw JSON for downstream processing. Language detection is automatic, and timestamps are provided per utterance.

When to use it

  • Transcribing meetings to capture who said what and when.
  • Converting interviews or podcasts into time-stamped text for editing or publishing.
  • Generating searchable transcripts for voice messages or call recordings.
  • Creating meeting notes, action items, or summaries from recorded sessions.
  • Processing dual- or multi-speaker audio where speaker separation matters.

Best practices

  • Provide high-quality audio with clear speaker separation for better diarization.
  • Use supported file formats or supply a stable URL for remote files.
  • Set the ASSEMBLYAI_API_KEY environment variable before running scripts.
  • Enable or disable diarization based on whether speaker labels are needed.
  • Export raw JSON when integrating transcripts into other tools or databases.

Example use cases

  • Run transcribe.py on a meeting.mp3 to produce a timestamped transcript with Speaker A/B labels.
  • Process podcast episodes to create chapter markers and show notes from speaker timestamps.
  • Transcribe interview recordings and export JSON to feed into an indexing search engine.
  • Convert customer support calls into searchable logs with speaker segmentation for agent vs. customer.

FAQ

Do I need an AssemblyAI account?

Yes. Provide your API key either through the ASSEMBLYAI_API_KEY environment variable or in the ~/.assemblyai_config.json config file.

Which audio formats are supported?

Common formats are supported, including MP3, WAV, M4A, FLAC, OGG, and WEBM.

Can it detect language automatically?

Yes. The skill uses AssemblyAI's automatic language detection for multi-language audio.
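In the AssemblyAI request body this is a single flag. A minimal sketch, assuming the documented `language_detection` and `language_code` parameters (the helper name is illustrative):

```python
def build_language_request(audio_url: str, detect_language: bool = True) -> dict:
    """Request body that either auto-detects the language or pins one explicitly."""
    body = {"audio_url": audio_url}
    if detect_language:
        body["language_detection"] = True
    else:
        body["language_code"] = "en"  # or any other explicit language code
    return body
```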