
extract-youtube-transcript skill


/skills/hushenglang/extract-youtube-transcript

This skill extracts plain-text YouTube transcripts using a local Python script. It can also choose caption languages, list the languages available, and save the transcript to a file.

npx playbooks add skill openclaw/skills --skill extract-youtube-transcript


SKILL.md

---
name: extract-youtube-transcript
version: 2.1.0
description: Extract plain-text transcripts from YouTube videos using a local Python script. Use when the user wants to fetch, extract, or get a transcript from a YouTube video URL, analyze YouTube video content as text, or needs subtitles/captions from a video.
---

# Extract YouTube Transcript

Fetches plain-text transcripts from YouTube videos using `extract_youtube_transcript.py` in this skill folder.

## Dependency

```bash
pip show youtube-transcript-api &>/dev/null || pip install youtube-transcript-api
```

## Quick Start

```bash
python extract_youtube_transcript.py "https://www.youtube.com/watch?v=VIDEO_ID"
```

Supported URL formats: `youtube.com/watch?v=`, `youtu.be/`, `/embed/`, `/live/`, `/shorts/`, or a raw 11-char video ID.
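
The script's own URL parsing isn't shown in this file. The following hypothetical sketch uses only the standard library to show how each listed format reduces to one video ID. `extract_video_id` is a name introduced for this example; the script is not known to export it.

```python
import re
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters drawn from letters, digits, "-" and "_".
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url_or_id: str) -> str:
    """Return the 11-character video ID from a URL or raw ID (hypothetical helper)."""
    candidate = url_or_id.strip()
    if VIDEO_ID_RE.fullmatch(candidate):
        return candidate  # already a raw video ID

    # urlparse needs a scheme to populate .hostname, so add one if missing.
    if "://" not in candidate:
        candidate = "https://" + candidate
    parsed = urlparse(candidate)
    host = (parsed.hostname or "").lower()
    segments = [part for part in parsed.path.split("/") if part]

    video_id = ""
    if host == "youtu.be" and segments:
        video_id = segments[0]  # youtu.be/<id>
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if segments == ["watch"]:
            video_id = parse_qs(parsed.query).get("v", [""])[0]  # watch?v=<id>
        elif len(segments) >= 2 and segments[0] in ("embed", "live", "shorts"):
            video_id = segments[1]  # /embed/<id>, /live/<id>, /shorts/<id>

    if not VIDEO_ID_RE.fullmatch(video_id):
        raise ValueError(f"no valid YouTube video ID in {url_or_id!r}")
    return video_id
```

The final check rejects anything that is not exactly 11 valid characters. A malformed URL therefore fails early, which corresponds to the `InvalidVideoId` case.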

## Common Patterns

### Fetch with preferred language(s)

```bash
python extract_youtube_transcript.py "URL" --lang zh-Hant en
```

Pass languages in priority order. If none of them match, the script falls back to any available transcript.
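
The fallback rule can be sketched as a small pure function. This sketch assumes the list of available language codes has already been obtained, for example the codes `--list-langs` reports. `choose_language` is a hypothetical helper; it is not part of the script or of youtube-transcript-api.

```python
from typing import Optional, Sequence


def choose_language(available: Sequence[str], preferred: Sequence[str]) -> Optional[str]:
    """Pick a transcript language code (hypothetical helper).

    Returns the first preferred code that is available. If none match,
    falls back to the first available code. Returns None if there are none.
    """
    for code in preferred:
        if code in available:
            return code
    # No preference matched: fall back to any available transcript.
    return available[0] if available else None
```

For example, `choose_language(["en", "ja"], ["zh-Hant", "en"])` returns `"en"`. With only `["ja", "ko"]` available, it falls back to `"ja"`.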

### Save transcript to file

```bash
python extract_youtube_transcript.py "URL" --output transcript.txt
```

Text is printed to stdout and also written to the file.
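
Writing to both destinations is a simple tee. The sketch below assumes UTF-8 output, since transcripts in `zh-Hant`, `ja`, or `ko` contain non-ASCII text. `emit_transcript` is a hypothetical name, not the script's actual function.

```python
import sys
from pathlib import Path
from typing import Optional, TextIO


def emit_transcript(text: str, output_path: Optional[str] = None,
                    stream: Optional[TextIO] = None) -> None:
    """Print the transcript and optionally also save it (hypothetical helper)."""
    if not text.endswith("\n"):
        text += "\n"
    # Resolve stdout at call time so shell redirection and test capture work.
    out = stream if stream is not None else sys.stdout
    out.write(text)
    if output_path is not None:
        # Explicit UTF-8 so non-Latin captions round-trip on every platform.
        Path(output_path).write_text(text, encoding="utf-8")
```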

### List available languages first

```bash
python extract_youtube_transcript.py "URL" --list-langs
```

Use this to discover what language codes are available before fetching.
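
This file does not document how the script parses its arguments. Assuming it uses `argparse`, the three documented flags could be declared roughly as follows. `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line interface."""
    parser = argparse.ArgumentParser(
        prog="extract_youtube_transcript.py",
        description="Extract a plain-text transcript from a YouTube video.",
    )
    parser.add_argument("url", help="YouTube URL or raw 11-character video ID")
    # nargs="+" collects one or more codes, kept in the order given (priority order).
    parser.add_argument("--lang", nargs="+", default=[], metavar="CODE",
                        help="preferred language codes, highest priority first")
    parser.add_argument("--output", metavar="FILE",
                        help="also write the transcript to FILE")
    parser.add_argument("--list-langs", action="store_true",
                        help="list available transcript languages and exit")
    return parser
```

Because `--lang` accepts a variable number of values, keep the URL before it, as the examples above do. Otherwise argparse would treat the URL as one more language code.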

## Language Codes

| Code | Language |
|------|----------|
| `en` | English |
| `zh-Hant` | Traditional Chinese |
| `zh-Hans` | Simplified Chinese |
| `ja` | Japanese |
| `ko` | Korean |
| `es` | Spanish |

## Error Handling

| Error | Cause | Recovery |
|-------|-------|----------|
| `TranscriptsDisabled` | Owner disabled captions | No transcript available |
| `NoTranscriptFound` | Requested lang not found | Run `--list-langs`, pick an available code |
| `VideoUnavailable` | Video is private/deleted | Verify the URL |
| `AgeRestricted` | Age-gated video | Auth not supported; no workaround |
| `InvalidVideoId` | Malformed URL or ID | Check the URL format |
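
A caller that wraps the script or the library can turn the table above into a lookup. This sketch keys on the exception's class name, so it runs without importing youtube-transcript-api. `RECOVERY_HINTS` and `recovery_hint` are hypothetical names, and the hint strings paraphrase the table.

```python
# Recovery hints paraphrased from the Error Handling table, keyed by class name.
RECOVERY_HINTS = {
    "TranscriptsDisabled": "The owner disabled captions; no transcript is available.",
    "NoTranscriptFound": "Run --list-langs and retry with an available --lang code.",
    "VideoUnavailable": "The video may be private or deleted; verify the URL.",
    "AgeRestricted": "Age-gated videos need auth, which is not supported.",
    "InvalidVideoId": "Check the URL format or pass the raw 11-character ID.",
}


def recovery_hint(exc: Exception) -> str:
    """Map a transcript error to a user-facing recovery hint (hypothetical helper)."""
    # Walk the class hierarchy so subclasses of a known error also match.
    for cls in type(exc).__mro__:
        hint = RECOVERY_HINTS.get(cls.__name__)
        if hint is not None:
            return hint
    return f"Unexpected error: {exc}"
```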

## Workflow

1. Try a direct fetch first
2. If `NoTranscriptFound`, run `--list-langs` to see available codes, then re-fetch with `--lang <code>`
3. Save long transcripts to a file with `--output` for easier downstream processing
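
The first two workflow steps can be sketched with the fetch and language-listing operations passed in as functions. This keeps the control flow testable without network access. `fetch_with_fallback` is hypothetical, and `NoTranscriptFound` below is a local stand-in. In real code you would catch the exception of that name raised by youtube-transcript-api.

```python
from typing import Callable, Sequence


class NoTranscriptFound(Exception):
    """Local stand-in for youtube-transcript-api's exception of the same name."""


def fetch_with_fallback(
    fetch: Callable[[Sequence[str]], str],
    list_langs: Callable[[], Sequence[str]],
    preferred: Sequence[str] = ("en",),
) -> str:
    """Workflow steps 1-2 (hypothetical helper)."""
    try:
        # Step 1: try a direct fetch with the preferred languages.
        return fetch(preferred)
    except NoTranscriptFound:
        # Step 2: see which codes exist, then re-fetch with one of them.
        available = list_langs()
        if not available:
            raise  # nothing to fall back to; surface the original error
        return fetch([available[0]])
```

Step 3 is independent of this logic: pass `--output` (or write the returned text to a file) once a transcript is in hand.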

Overview

This skill extracts plain-text transcripts from YouTube videos using a local Python script. It accepts full YouTube URLs or raw 11-character video IDs and prints the transcript to stdout, optionally also writing it to a file. You can set language preferences or list the available caption languages before fetching. It is a thin, locally run wrapper around the youtube-transcript-api package; fetching a transcript still requires network access to YouTube.

How this skill works

The script parses the provided YouTube URL or ID, queries the YouTube transcript service via the youtube-transcript-api Python package, and returns the caption text in plain form. You can pass one or more preferred language codes; the script will try them in order and fall back to any available transcript if none match. It supports listing available languages, saving output to a file, and common URL variants like youtu.be, embed, live, and shorts.

When to use it

You need the text of a YouTube video for analysis, summarization, or search indexing.
You want to extract subtitles/captions for translation or accessibility checks.
You need to save long transcripts for downstream processing or archive purposes.
You want to quickly check which caption languages are available for a video.
You need a local command-line tool to batch-extract transcripts.

Best practices

Install the dependency with pip before running: pip install youtube-transcript-api.
List available languages first (--list-langs) when unsure which caption tracks exist.
Pass multiple language codes in priority order to prefer certain captions while allowing fallbacks.
Save long transcripts to a file with --output (the text is still printed to stdout) for easier editing and downstream processing.
Handle failures gracefully: catch NoTranscriptFound, TranscriptsDisabled, VideoUnavailable, AgeRestricted, and InvalidVideoId, and apply the recovery step listed for each in the Error Handling table.

Example use cases

Quickly fetch the English transcript for a tutorial video and pipe it into a summarizer.
Batch-archive subtitles for a channel by looping over IDs and saving each transcript file.
Discover available caption languages for a foreign-language talk and then fetch the preferred version.
Extract plain text from a lecture to feed into a search index or note-taking workflow.
Use transcripts as input to translation or sentiment analysis pipelines.
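
The batch-archiving use case can be scripted by invoking the script once per video ID. This sketch makes two assumptions that the skill's documentation does not confirm: the script is in the current directory, and it exits with a non-zero status when a fetch fails. `build_commands`, `run_batch`, and `SCRIPT` are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence

# Assumed location of the skill's script; adjust to where the skill is installed.
SCRIPT = "extract_youtube_transcript.py"


def build_commands(video_ids: Sequence[str], out_dir: str,
                   langs: Sequence[str] = ()) -> List[List[str]]:
    """One command per video, saving each transcript to <out_dir>/<id>.txt."""
    commands = []
    for vid in video_ids:
        cmd = [sys.executable, SCRIPT, vid,
               "--output", str(Path(out_dir) / f"{vid}.txt")]
        if langs:
            # --lang takes a variable-length list, so keep it last.
            cmd += ["--lang", *langs]
        commands.append(cmd)
    return commands


def run_batch(video_ids: Sequence[str], out_dir: str,
              langs: Sequence[str] = ()) -> List[str]:
    """Run the script per video; return the IDs whose run exited non-zero."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for vid, cmd in zip(video_ids, build_commands(video_ids, out_dir, langs)):
        # The transcript is already saved via --output, so discard stdout.
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL)
        if result.returncode != 0:
            failed.append(vid)
    return failed
```

Each video runs in its own process, so one failed ID does not stop the batch. The returned list shows which IDs to retry or check against the error table.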

FAQ

What URL formats are supported?

Full watch URLs, youtu.be short links, /embed/, /live/, /shorts/, or the raw 11-character video ID are all supported.

What if the requested language is not available?

Run the script with --list-langs to see available codes, then re-run with --lang using an available code. The script also falls back to any available transcript if no preferred languages match.

Which errors should I expect and how to recover?

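Common errors include TranscriptsDisabled (captions turned off), NoTranscriptFound (requested language missing), VideoUnavailable (private or deleted video), AgeRestricted (age-gated; auth is not supported), and InvalidVideoId (malformed URL or ID). For NoTranscriptFound, run --list-langs and re-fetch with an available code. For VideoUnavailable and InvalidVideoId, verify the URL and the video's status. TranscriptsDisabled and AgeRestricted have no workaround.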