youtube-transcript-extractor skill

This skill extracts timestamped YouTube transcripts for translation, summarization, and content creation. It prints each transcript line with its timestamp and saves the transcript to a local file.

npx playbooks add skill qwenlm/qwen-code-examples --skill youtube-transcript-extractor

SKILL.md

---
name: YouTube Transcript Extractor
description: Extracts timestamped transcripts from YouTube videos for translation, summarization, and content creation.
---

# YouTube Transcript Extractor

## What You Get

- **Terminal Output**: Prints the transcript with `[HH:MM:SS]` timestamps.
- **Local File**: Writes `youtube_transcript_{video_id}.txt` in the **current working directory**.
  - Includes the source video URL and the full transcript content.
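
The `[HH:MM:SS]` prefix can be derived from a caption's start offset. The following is a minimal sketch, assuming offsets arrive as seconds (possibly fractional); `format_timestamp` is a hypothetical helper name, not necessarily what the bundled script uses.

```python
def format_timestamp(seconds: float) -> str:
    """Render a start offset in seconds as a [HH:MM:SS] label."""
    total = int(seconds)  # drop the fractional part
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"


print(format_timestamp(3725.4))  # [01:02:05]
```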

## Requirements

- Python 3.9+ (Recommended)
- Network access to the YouTube transcript APIs.
- Python Dependencies:
  - `youtube-transcript-api`

## Quick Start

### Step 1: (Optional) Create a Virtual Environment

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
```

### Step 2: Install Dependencies

```bash
python -m pip install youtube-transcript-api
```

### Step 3: Extract Transcript

Run from the repository root so that the relative script path below resolves:

```bash
python skills/youtube-transcript-extractor/scripts/get_youtube_transcript.py "https://www.youtube.com/watch?v=IDSAMqip6ms"
```

- The transcript will be saved to `youtube_transcript_{video_id}.txt` in the current working directory.

### Step 4: Convert Transcript to Reader-Friendly Markdown

Write the result to `youtube_transcript_{video_id}.md`. Note: Do not alter or truncate the transcript content during conversion.
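
One way to do the conversion is sketched below. It assumes each transcript line looks like `[HH:MM:SS] caption text`, which the skill does not spell out, and `transcript_to_markdown` is a hypothetical helper. Every non-empty input line is carried over, so nothing is dropped or reworded.

```python
import re

# Assumed line shape: "[HH:MM:SS] caption text".
LINE_RE = re.compile(r"^\[(\d{2,}:\d{2}:\d{2})\]\s*(.*)$")


def transcript_to_markdown(text: str, title: str = "Transcript") -> str:
    """Turn a plain-text transcript into a Markdown bullet list."""
    out = [f"# {title}", ""]
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            stamp, caption = match.groups()
            out.append(f"- **{stamp}** {caption}")
        elif line.strip():
            # Keep other lines (for example the source URL) verbatim.
            out.extend([line, ""])
    return "\n".join(out) + "\n"
```

A quick sanity check after conversion is to compare the number of timestamped lines in the `.txt` file with the number of bullets in the `.md` file.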

Overview

This skill extracts timestamped transcripts from YouTube videos and saves them as local files for translation, summarization, and content creation. It produces both terminal output with [HH:MM:SS] timestamps and a saved transcript file named youtube_transcript_{video_id}.txt in the current working directory. The saved file includes the source video URL and the full transcript content.

How this skill works

The extractor calls YouTube transcript services to retrieve available captions for a given video URL or ID. It formats each caption line with a [HH:MM:SS] timestamp, prints the result to the terminal, and writes a complete transcript file in the working directory. Optionally, the output can be converted into reader-friendly Markdown without altering the transcript content.
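
The flow described above (fetch, format, print, save) might look roughly like this. It is a sketch under assumptions, not the bundled script: `run_extraction` and `fetch_segments` are hypothetical names, the caption retrieval is injected as a callable so the example does not depend on a particular `youtube-transcript-api` version, each segment is assumed to be a dict with `text` and `start` (seconds) keys, and the `Source:` header layout is a guess.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional


def run_extraction(
    video_url: str,
    video_id: str,
    fetch_segments: Callable[[str], Iterable[dict]],
    out_dir: Optional[Path] = None,
) -> Path:
    """Fetch captions, print them with timestamps, and save them to a file."""
    lines = []
    for seg in fetch_segments(video_id):
        total = int(seg["start"])  # assumed: start offset in seconds
        stamp = f"[{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}]"
        lines.append(f"{stamp} {seg['text']}")

    body = "\n".join(lines)
    print(body)  # terminal output

    # Assumed file layout: source URL first, then the transcript lines.
    target = (out_dir or Path.cwd()) / f"youtube_transcript_{video_id}.txt"
    target.write_text(f"Source: {video_url}\n\n{body}\n", encoding="utf-8")
    return target
```

Passing the fetcher in as an argument keeps the formatting and file-writing logic testable offline; in real use it would wrap whatever call the installed `youtube-transcript-api` version provides.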

When to use it

Getting a timestamped transcript for editing, quoting, or repurposing video content.
Preparing source material for machine translation or human translation workflows.
Feeding transcripts into summarization or indexing pipelines for search and discovery.
Creating chapter markers, show notes, or content snippets for social media.
Archiving spoken content or generating accessibility materials.

Best practices

Run the script from the directory where you want the transcript file saved to avoid moving files later.
Use the exact YouTube video URL or the canonical video ID to ensure correct retrieval.
Ensure network access to YouTube transcript endpoints and install the required dependency (youtube-transcript-api).
Verify language availability: some videos lack transcripts or only have auto-generated captions with errors.
Keep the raw transcript unaltered when feeding automated translation or summarization tools to preserve timestamps.
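
To illustrate the point about URLs and video IDs, here is a hedged sketch of normalizing common URL shapes to a bare ID. `extract_video_id` is a hypothetical helper, and the 11-character ID pattern is a widely observed convention rather than something the skill guarantees.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from [A-Za-z0-9_-].
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url_or_id: str) -> str:
    """Return the video ID from a YouTube URL, or the input if it is already an ID."""
    candidate = url_or_id.strip()
    if VIDEO_ID_RE.fullmatch(candidate):
        return candidate

    parsed = urlparse(candidate)
    host = (parsed.hostname or "").lower()
    parts = [p for p in parsed.path.split("/") if p]
    video_id = ""
    if host == "youtu.be" and parts:
        video_id = parts[0]
    elif host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            video_id = parse_qs(parsed.query).get("v", [""])[0]
        elif len(parts) >= 2 and parts[0] in {"embed", "shorts", "live"}:
            video_id = parts[1]

    if not VIDEO_ID_RE.fullmatch(video_id):
        raise ValueError(f"Could not find a video ID in: {url_or_id!r}")
    return video_id
```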

Example use cases

Export a lecture transcript to produce study notes or searchable archives.
Generate timestamped source text for machine translation before human post-editing.
Feed video transcripts into an LLM to create concise summaries or article drafts.
Produce show notes and chapter timestamps for podcasts repurposed from video.
Create social media quote cards with exact time references for clips.

FAQ

What dependencies and environment do I need?

You need Python 3.9+ (recommended) and the youtube-transcript-api package, plus network access to YouTube transcript endpoints.

Where is the transcript saved and what format is used?

A file named youtube_transcript_{video_id}.txt is written to the current working directory and includes the source URL and timestamped transcript lines.