This skill transcribes audio or video into SRT subtitles using ElevenLabs Scribe v2, boosting accessibility and streamlining subtitle generation.

npx playbooks add skill aviz85/claude-skills-library --skill transcribe

---
name: transcribe
description: "Transcribe audio/video to SRT subtitles using ElevenLabs Scribe v2. Use for: transcription, subtitles, captions, SRT generation."
setup_complete: false
setup: "./SETUP.md"
---

# Transcribe

> **First time?** If `setup_complete: false` above, follow the steps in `./SETUP.md` first, then set `setup_complete: true`.

Generate SRT subtitle files from audio/video using ElevenLabs Scribe v2.

## Quick Start

```bash
cd ~/.claude/skills/transcribe/scripts

# Basic transcription (auto-detect language)
npx ts-node transcribe.ts -i /path/to/video.mp4 -o /path/to/output.srt

# Specify language
npx ts-node transcribe.ts -i /path/to/video.mp4 -o /path/to/output.srt -l en

# Custom subtitle length (max words per entry)
npx ts-node transcribe.ts -i /path/to/video.mp4 -o /path/to/output.srt --max-words 6

# Custom max duration per subtitle
npx ts-node transcribe.ts -i /path/to/video.mp4 -o /path/to/output.srt --max-duration 4.0
```

## Options

| Option | Short | Default | Description |
|--------|-------|---------|-------------|
| `--input` | `-i` | (required) | Input audio/video file |
| `--output` | `-o` | (required) | Output SRT file path |
| `--language` | `-l` | auto | Language code (en, he, ar, etc.) |
| `--max-words` | | 5 | Max words per subtitle entry |
| `--max-duration` | | 3.0 | Max seconds per subtitle entry |
| `--max-chars` | | 70 | Max characters per subtitle entry |
| `--timing-offset` | | 0.25 | Timing offset in seconds |
| `--json` | | false | Also output raw transcript JSON |
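
The table above can be read as a set of defaults that command-line flags override. The following is a minimal sketch of that idea, not the actual parser in `transcribe.ts`: the `TranscribeOptions` interface and `parseArgs` function are hypothetical names, and it assumes `--json` is a bare boolean flag that takes no value.

```typescript
// Hypothetical option shape; field names mirror the flags in the table.
interface TranscribeOptions {
  input: string;
  output: string;
  language?: string; // undefined means auto-detect
  maxWords: number;
  maxDuration: number;
  maxChars: number;
  timingOffset: number;
  json: boolean;
}

const SHORT_FLAGS = new Map([
  ["-i", "--input"],
  ["-o", "--output"],
  ["-l", "--language"],
]);

// Assumption: every flag except --json is followed by exactly one value.
function parseArgs(argv: string[]): TranscribeOptions {
  const raw = new Map<string, string>();
  let json = false;
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    const flag = SHORT_FLAGS.get(arg) ?? arg;
    if (flag === "--json") {
      json = true;
      continue;
    }
    const value = argv[i + 1];
    if (!flag.startsWith("--") || value === undefined) {
      throw new Error(`Unexpected argument: ${arg}`);
    }
    raw.set(flag, value);
    i++; // skip the value that was just consumed
  }
  const input = raw.get("--input");
  const output = raw.get("--output");
  if (input === undefined || output === undefined) {
    throw new Error("--input and --output are required");
  }
  return {
    input,
    output,
    language: raw.get("--language"),
    // Defaults below are the ones listed in the options table.
    maxWords: Number(raw.get("--max-words") ?? "5"),
    maxDuration: Number(raw.get("--max-duration") ?? "3.0"),
    maxChars: Number(raw.get("--max-chars") ?? "70"),
    timingOffset: Number(raw.get("--timing-offset") ?? "0.25"),
    json,
  };
}
```

Calling `parseArgs(["-i", "talk.mp4", "-o", "talk.srt", "--max-words", "6"])` would keep every other setting at its documented default.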

## Language Codes

- `en` - English
- `he` - Hebrew
- `ar` - Arabic
- `es` - Spanish
- `fr` - French
- `de` - German
- `ru` - Russian
- `zh` - Chinese
- `ja` - Japanese
- (or omit for auto-detection)

## Output

The script generates:
1. `.srt` file - Standard subtitle file
2. `.json` file (optional, written only when `--json` is passed) - Raw transcript with word-level timestamps
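
For reference, an SRT file is a sequence of numbered entries, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by the subtitle text and a blank line. The sketch below shows one way to render that layout; `SrtEntry`, `formatTimestamp`, and `toSrt` are illustrative names and are not taken from the script itself.

```typescript
// Illustrative entry shape: times are in seconds.
interface SrtEntry {
  start: number;
  end: number;
  text: string;
}

// Convert seconds to the SRT timestamp form HH:MM:SS,mmm.
function formatTimestamp(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Render entries as SRT text: index line, time range, text, blank separator.
function toSrt(entries: SrtEntry[]): string {
  return entries
    .map(
      (e, i) =>
        `${i + 1}\n${formatTimestamp(e.start)} --> ${formatTimestamp(e.end)}\n${e.text}\n`
    )
    .join("\n");
}
```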

## Environment

The API key is stored in `scripts/.env`:
```
ELEVENLABS_API_KEY=your_key_here
```
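
How the script loads this file is not shown here; it may well rely on a package such as `dotenv`. As a rough sketch of what reading the key involves, the hypothetical `readApiKey` helper below extracts `ELEVENLABS_API_KEY` from the text of a `.env` file and fails loudly when it is absent. It assumes simple `NAME=value` lines without quoting.

```typescript
// Hypothetical helper: pull ELEVENLABS_API_KEY out of .env file content.
// Assumes plain NAME=value lines; quoted values are not handled.
function readApiKey(envFileContent: string): string {
  for (const line of envFileContent.split(/\r?\n/)) {
    const trimmed = line.trim();
    // Skip blank lines and comments.
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    const name = trimmed.slice(0, eq).trim();
    const value = trimmed.slice(eq + 1).trim();
    if (name === "ELEVENLABS_API_KEY" && value !== "") return value;
  }
  throw new Error("ELEVENLABS_API_KEY is missing from scripts/.env");
}
```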

Overview

This skill converts audio or video into SRT subtitle files using ElevenLabs Scribe v2. It produces timecoded subtitle entries and can also output a raw JSON transcript with word-level timestamps. The tool supports language auto-detection or explicit language codes and exposes controls for subtitle length and timing.

How this skill works

The skill takes an input audio or video file and sends it to ElevenLabs Scribe v2, which returns a transcription with timestamps. The script then segments the transcript into SRT subtitle entries according to the max-words, max-duration, and max-chars limits, and applies a small timing offset to improve sync. With `--json`, it also saves the raw transcript as JSON for word-level timing or post-processing.
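
The segmentation step can be sketched as a greedy pass over timestamped words: keep adding words to the current entry until the next word would break one of the three limits, then start a new entry. This is an assumed approach for illustration, not the algorithm confirmed to be in `transcribe.ts`; the `Word`, `Limits`, and `segmentWords` names are hypothetical, and the timing offset is left out.

```typescript
// Hypothetical shapes: times are in seconds.
interface Word {
  text: string;
  start: number;
  end: number;
}

interface Limits {
  maxWords: number;
  maxDuration: number;
  maxChars: number;
}

interface Entry {
  start: number;
  end: number;
  text: string;
}

// Defaults match the documented option defaults (5 words, 3.0 s, 70 chars).
const DEFAULT_LIMITS: Limits = { maxWords: 5, maxDuration: 3.0, maxChars: 70 };

// Greedy grouping: close the open entry when adding the next word
// would exceed any limit. A single oversized word still gets its own entry.
function segmentWords(words: Word[], limits: Limits = DEFAULT_LIMITS): Entry[] {
  const entries: Entry[] = [];
  let open: { start: number; end: number; texts: string[] } | null = null;

  for (const word of words) {
    if (open !== null) {
      const candidate = [...open.texts, word.text].join(" ");
      const tooManyWords = open.texts.length + 1 > limits.maxWords;
      const tooLong = word.end - open.start > limits.maxDuration;
      const tooWide = candidate.length > limits.maxChars;
      if (tooManyWords || tooLong || tooWide) {
        entries.push({ start: open.start, end: open.end, text: open.texts.join(" ") });
        open = null;
      }
    }
    if (open === null) {
      open = { start: word.start, end: word.end, texts: [word.text] };
    } else {
      open.texts.push(word.text);
      open.end = word.end;
    }
  }
  if (open !== null) {
    entries.push({ start: open.start, end: open.end, text: open.texts.join(" ") });
  }
  return entries;
}
```

A greedy pass keeps the sketch short; a production segmenter would likely also prefer to break at punctuation or pauses.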

When to use it

  • Create closed captions for videos to improve accessibility and searchability.
  • Generate SRT files quickly for social media clips, training videos, or webinars.
  • Produce raw transcript JSON for further NLP processing or subtitle editing workflows.
  • Transcribe media whose language is unknown by relying on auto-detection, or pass a language code when the language is known for higher accuracy.

Best practices

  • Provide clean audio with minimal background noise and clear speech for best accuracy.
  • Use explicit language codes for non-English audio to avoid misdetection.
  • Adjust max-words, max-duration, and max-chars to match viewing device and reading speed.
  • Keep the timing-offset small (the default is 0.25 s) and adjust it if subtitles appear too early or too late.
  • Save the optional JSON output if you plan to edit timestamps or run additional processing.
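
To make the timing-offset advice concrete, here is a small sketch of shifting every entry by a fixed number of seconds. The direction in which the real script applies `--timing-offset` is not documented here, so the sign convention below (positive values move subtitles later) is an assumption, as are the `TimedEntry` and `shiftEntries` names.

```typescript
// Hypothetical entry shape: times are in seconds.
interface TimedEntry {
  start: number;
  end: number;
  text: string;
}

// Assumed convention: a positive offset moves subtitles later,
// a negative offset moves them earlier. Times never go below zero.
function shiftEntries(entries: TimedEntry[], offsetSeconds: number): TimedEntry[] {
  return entries.map((e) => ({
    ...e,
    start: Math.max(0, e.start + offsetSeconds),
    end: Math.max(0, e.end + offsetSeconds),
  }));
}
```

Because the function returns new objects, the same base entries can be re-shifted with different offsets while comparing sync against the video.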

Example use cases

  • Transcribe a webinar MP4 into an SRT file for upload to video platforms.
  • Create captions for short social videos by lowering max-words and max-duration.
  • Export JSON transcripts with word timestamps to build searchable video indexes.
  • Batch-process lecture recordings to produce both SRT and raw transcript files for editing and archiving.
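
The batch use case above could be approached by generating one argument list per recording, using only the documented flags. The sketch below builds those lists without running anything; `srtPathFor` and `buildBatchArgs` are hypothetical helpers, and placing each `.srt` next to its source file is an assumption.

```typescript
// Hypothetical helper: swap the file extension for .srt, keeping the directory.
function srtPathFor(inputPath: string): string {
  const slash = Math.max(inputPath.lastIndexOf("/"), inputPath.lastIndexOf("\\"));
  const dot = inputPath.lastIndexOf(".");
  // Only treat the dot as an extension separator if it is inside the file name
  // and not its first character (so ".hidden" keeps its full name).
  const stem = dot > slash + 1 ? inputPath.slice(0, dot) : inputPath;
  return `${stem}.srt`;
}

// Build one argument list per input; each list is meant to follow `npx`.
// --json is included so the raw transcript is produced alongside the SRT file.
function buildBatchArgs(inputs: string[], language?: string): string[][] {
  return inputs.map((input) => {
    const args = ["ts-node", "transcribe.ts", "-i", input, "-o", srtPathFor(input), "--json"];
    if (language !== undefined) args.push("-l", language);
    return args;
  });
}
```

Each returned list could then be passed to a process runner of your choice, one recording at a time, from the `scripts` directory.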

FAQ

What language codes are supported?

Common codes such as en, he, ar, es, fr, de, ru, zh, and ja are supported; omit the `--language` option to enable auto-detection.

Can I customize subtitle length and timing?

Yes. Set `--max-words`, `--max-duration`, `--max-chars`, and `--timing-offset` to control segmentation and sync.

Where do I put my ElevenLabs API key?

Put it in `scripts/.env` as `ELEVENLABS_API_KEY=your_key_here`; the script reads the key from that file.