markdown-video skill

This skill converts markdown slides to a narrated presentation video using AI visuals and TTS audio, with delta updates to save time.

npx playbooks add skill jykim/claude-obsidian-skills --skill markdown-video

---
name: markdown-video
description: Convert Deckset-format markdown slides with speaker notes to presentation video with TTS narration. Use when user requests to create video from slides, generate presentation video, or convert slides to MP4 format.
allowed-tools:
  - Read
  - Write
  - Bash
  - Glob
license: MIT
---

# Markdown Video Skill

Convert markdown slides to presentation video with AI-generated visuals and TTS audio narration.

## When to Use This Skill

Activate this skill when the user:
- Asks to create video from markdown slides
- Requests to convert presentation to MP4 format
- Wants to generate narrated video from slides
- Needs automated slide-to-video conversion

## Key Features

- **Gemini AI-generated visuals**: High-quality slide images with full emoji and Korean support
- **OpenAI TTS narration**: Natural voice from speaker notes
- **Delta updates**: Only regenerates changed slides (saves time and API costs)
- **Multiple visual styles**: technical-diagram, professional, vibrant-cartoon, watercolor

## Input Requirements

- **Markdown file** with speaker notes marked with `^` prefix
- **GEMINI_API_KEY** environment variable for image generation
- **OPENAI_API_KEY** environment variable for TTS audio
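
The input format above can be illustrated with a minimal parsing sketch. It assumes the usual Deckset conventions (slides separated by a line containing only `---`, speaker notes on lines starting with `^`) and ignores YAML front matter. The function name `parse_slides` and the sample deck are hypothetical and are not taken from the skill's scripts.

```python
import re


def parse_slides(markdown_text):
    """Split Deckset-style markdown into slides and collect '^' speaker notes."""
    # Assumption: a line containing only '---' separates two slides.
    raw_slides = re.split(r"^---[ \t]*$", markdown_text, flags=re.MULTILINE)
    slides = []
    for raw in raw_slides:
        content_lines, note_lines = [], []
        for line in raw.strip().splitlines():
            if line.startswith("^"):
                # Drop the '^' marker and surrounding whitespace.
                note_lines.append(line[1:].strip())
            else:
                content_lines.append(line)
        slides.append({
            "content": "\n".join(content_lines).strip(),
            "notes": " ".join(note_lines),
        })
    return slides


sample = """# Title

^ Welcome to the talk.

---

## Agenda

- Item one

^ Here is the agenda.
^ It has one item.
"""

for index, slide in enumerate(parse_slides(sample)):
    print(index, repr(slide["notes"]))
```

A slide whose `notes` value is empty would have nothing to narrate, which is the situation described under "No speaker notes found".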

## Output Specifications

- **MP4 video**: 1920x1080 (Full HD)
- **Duration**: Each slide is displayed for the duration of its audio narration
- **File naming**: `{input_filename}.mp4`

---

## Workflow

### Step 1: Generate Audio Files

```bash
cd "{slides_directory}"
python /Users/lifidea/.claude/skills/markdown-video/generate_audio.py "{slides_filename}" --output-dir "audio"
```

**Delta update**: Only regenerates audio for slides with changed speaker notes.
- Use `--force` to regenerate all audio files

**Output**:
- `audio/slide_0.mp3`, `slide_1.mp3`, ... (0-indexed)
- Cache file: `audio/.audio_cache.json`

### Step 2: Generate Slide Images with Gemini

```bash
cd "{slides_directory}"
python /Users/lifidea/.claude/skills/markdown-video/create_slides_gemini.py "{slides_filename}" \
  --output-dir "slides-gemini" \
  --style "technical-diagram" \
  --auto-approve
```

**Delta update**: Only regenerates images for slides with changed content.
- Use `--force` to regenerate all slide images

**Style Options**:

| Style | Description | Best For |
|-------|-------------|----------|
| `technical-diagram` | Clean lines, infographic icons, muted blue/gray | Technical, education |
| `professional` | Minimalist, geometric shapes | Corporate, formal |
| `vibrant-cartoon` | Bright gradients, flat design | Marketing, startups |
| `watercolor` | Soft pastels, organic shapes | Creative, personal |

**Other Parameters**:
- `--model`: Gemini model (default: gemini-3-pro-image-preview)
- `--aspect-ratio`: 16:9 (default), 1:1, 9:16, 4:3, 3:4
- `--start-from N`: Resume from slide N
- `--dry-run`: Preview prompts without generating

**Output**:
- `slides-gemini/1.jpeg`, `2.jpeg`, ... (1-indexed)
- Cache file: `slides-gemini/.slides_cache.json`

### Step 3: Create Final Video

```bash
cd "{slides_directory}"
python /Users/lifidea/.claude/skills/markdown-video/slides_to_video.py \
  --slides-dir "slides-gemini" \
  --audio-dir "audio" \
  --output "{output_filename}.mp4"
```
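
As a rough illustration of what this step involves, the sketch below pairs each slide image with its audio file and builds one ffmpeg command per slide. The pairing rule (0-indexed `slide_N.mp3` with 1-indexed `N+1.jpeg`) is inferred from the output names in Steps 1 and 2. The function names and the exact ffmpeg options are assumptions, not the contents of `slides_to_video.py`.

```python
from pathlib import Path


def pair_assets(slides_dir, audio_dir, slide_count):
    """Pair 1-indexed slide images with 0-indexed audio files (assumed rule)."""
    pairs = []
    for i in range(slide_count):
        image = Path(slides_dir) / f"{i + 1}.jpeg"
        audio = Path(audio_dir) / f"slide_{i}.mp3"
        pairs.append((image, audio))
    return pairs


def segment_command(image, audio, output):
    """Build an ffmpeg command that holds a still image for the length of its audio."""
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", str(image),   # repeat the still image as video frames
        "-i", str(audio),
        "-vf", "scale=1920:1080",         # Full HD output
        "-c:v", "libx264", "-tune", "stillimage", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",                      # stop when the audio ends
        str(output),
    ]


for image, audio in pair_assets("slides-gemini", "audio", 3):
    segment = Path("segments") / f"{image.stem}.mp4"
    print(" ".join(segment_command(image, audio, segment)))
```

The per-slide segments would still need to be concatenated into a single MP4. That step is omitted here.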

---

## Delta Updates

Both audio and image generation support **delta updates** - only regenerating what changed.

### How It Works

1. **Content hashing**: Each slide's content is hashed (MD5)
2. **Cache storage**: Hashes stored in `.audio_cache.json` / `.slides_cache.json`
3. **Change detection**: On subsequent runs, only changed slides are regenerated
4. **File verification**: Also checks if output file exists
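
The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the scripts' actual code: the cache layout (a JSON object mapping slide index to MD5 digest) and the names `content_hash` and `find_changed` are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def content_hash(text):
    """MD5 digest of a slide's text, used only for change detection."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def find_changed(slide_texts, cache_path, output_path_for):
    """Return the indices that need regenerating and the updated hash map."""
    cache_path = Path(cache_path)
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    new_hashes, changed = {}, []
    for index, text in enumerate(slide_texts):
        digest = content_hash(text)
        new_hashes[str(index)] = digest
        # Regenerate when the content changed or the output file is missing.
        output_missing = not Path(output_path_for(index)).exists()
        if cache.get(str(index)) != digest or output_missing:
            changed.append(index)
    return changed, new_hashes


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    cache_file = out_dir / ".audio_cache.json"

    def audio_path(index):
        return out_dir / f"slide_{index}.mp3"

    notes = ["Intro narration", "Second slide narration"]
    changed, hashes = find_changed(notes, cache_file, audio_path)
    print("first run:", changed)

    for index in changed:
        audio_path(index).write_bytes(b"")  # placeholder for real audio
    cache_file.write_text(json.dumps(hashes))

    notes[1] = "Second slide narration, edited"
    changed, hashes = find_changed(notes, cache_file, audio_path)
    print("after edit:", changed)
```

On the first run every slide is reported as changed because no cache exists yet. After the edit, only the slide whose text changed is reported.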

### Example Output

```
✅ Found 20 slides
   20 slides with speaker notes

✨ Delta update: 17 slides unchanged, 3 to regenerate

🎵 Generating 3 audio files...
Progress |████████████████████████████████████████| 3/3 (100.0%)

✅ Audio generation complete!
   Generated: 3/3 files
   Unchanged: 17 files (skipped)
```

### Force Regeneration

To ignore cache and regenerate everything:

```bash
# Force regenerate all audio
python generate_audio.py "slides.md" --output-dir "audio" --force

# Force regenerate all images
python create_slides_gemini.py "slides.md" --output-dir "slides-gemini" --force
```

---

## Quick Reference

### Full Workflow (First Run)

```bash
cd "{slides_directory}"

# Step 1: Generate audio
python /Users/lifidea/.claude/skills/markdown-video/generate_audio.py "slides.md" --output-dir "audio"

# Step 2: Generate slide images
python /Users/lifidea/.claude/skills/markdown-video/create_slides_gemini.py "slides.md" \
  --output-dir "slides-gemini" \
  --style "technical-diagram" \
  --auto-approve

# Step 3: Create video
python /Users/lifidea/.claude/skills/markdown-video/slides_to_video.py \
  --slides-dir "slides-gemini" \
  --audio-dir "audio" \
  --output "presentation.mp4"
```

### Update Workflow (After Changes)

Same commands - delta updates are automatic:

```bash
# Only regenerates changed slides
python generate_audio.py "slides.md" --output-dir "audio"
python create_slides_gemini.py "slides.md" --output-dir "slides-gemini" --auto-approve
python slides_to_video.py --slides-dir "slides-gemini" --audio-dir "audio" --output "presentation.mp4"
```

---

## Requirements

### System Dependencies
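- **Python 3.7+** (recent `google-genai` releases require Python 3.9 or newer, so check the package's stated minimum)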
- **ffmpeg**: `brew install ffmpeg` on macOS with Homebrew; on other platforms use the system package manager

### Python Packages
```bash
pip install Pillow requests google-genai
```

### Environment Variables
```bash
export OPENAI_API_KEY="sk-..."
export GEMINI_API_KEY="..."
```

---

## Cost Estimation

| Component | Cost | Example (20 slides) |
|-----------|------|---------------------|
| Gemini images | ~$0.04/slide | ~$0.80 |
| OpenAI TTS | ~$0.015/1K chars | ~$0.50 |
| **Total** | | ~$1.30 |

With delta updates, subsequent runs only cost for changed slides.
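
The arithmetic behind the table can be sketched as a small estimator. The two rates are the approximate figures from the table. The function name and the character counts in the usage lines are hypothetical.

```python
# Approximate rates copied from the table above.
IMAGE_COST_PER_SLIDE = 0.04      # USD per generated slide image
TTS_COST_PER_1K_CHARS = 0.015    # USD per 1,000 characters of narration


def estimate_cost(images_to_generate, narration_chars):
    """Estimate the API cost in USD for one run."""
    image_cost = images_to_generate * IMAGE_COST_PER_SLIDE
    tts_cost = narration_chars / 1000 * TTS_COST_PER_1K_CHARS
    return round(image_cost + tts_cost, 2)


# First run: 20 slides with an assumed 1,600 characters of notes each.
print(estimate_cost(20, 20 * 1600))

# Delta run: only 3 slides changed.
print(estimate_cost(3, 3 * 1600))
```

The TTS figure scales with the length of the speaker notes, so the example column in the table only holds for decks with a similar amount of narration.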

---

## Error Handling

### No speaker notes found
- Slides need `^` prefixed speaker notes for narration
- Example: `^ This is the speaker note for this slide.`

### Pronunciation problems
- Replace technical terms with phonetic equivalents in speaker notes
- Test with `--dry-run` first
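
As an alternative to editing the notes by hand, the substitution could be applied to the note text just before it is sent to TTS. The sketch below is a hypothetical helper, not part of the skill. The term list is illustrative only, and the respellings should be tuned by ear for the chosen voice.

```python
import re

# Hypothetical examples; pick respellings that sound right with your TTS voice.
PHONETIC = {
    "kubectl": "kube control",
    "nginx": "engine x",
    "SQL": "sequel",
}


def apply_phonetics(note, replacements=PHONETIC):
    """Replace whole-word technical terms with phonetic respellings."""
    if not replacements:
        return note
    # Longest terms first so a longer term wins over a shorter prefix.
    terms = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(lambda match: replacements[match.group(1)], note)


print(apply_phonetics("Run kubectl behind nginx, then query with SQL."))
```

Matching is case-sensitive and limited to whole words, so ordinary words that merely contain a listed term are left alone.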

### API errors
- Check API key environment variables
- Gemini rate limits: script includes 1-second delay between generations
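
A quick way to rule out the first cause is to check both variables before running any script. This is a minimal sketch. The helper name `missing_api_keys` is hypothetical, and the check only confirms that the variables are set, not that the keys are valid.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY")


def missing_api_keys(environ=None):
    """Return the names of required variables that are unset or blank."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_KEYS if not environ.get(name, "").strip()]


missing = missing_api_keys()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("Both API keys are set.")
```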

---

## Quality Checklist

Before marking complete:

- [ ] OpenAI and Gemini API keys configured
- [ ] Markdown file has speaker notes with `^` prefix
- [ ] Audio files generated successfully
- [ ] Slide images generated successfully
- [ ] Video plays correctly with synced audio
- [ ] Resolution is 1920x1080

---

## Image Generation Mode

Two approaches for generating visuals:

| Mode | Script | Best For |
|------|--------|----------|
| **Slide-by-Slide** | `create_slides_gemini.py` | Standard presentations, precise control |
| **Section-based** | `generate_section_images.py` | Long presentations, infographic style |

### When to Use Section-Based

- The presentation has 20 or more slides
- The content groups naturally into logical sections
- You prefer one infographic overview per section over an image for every slide
- You want to reduce API costs by generating fewer images

---

## Section-Based Workflow (Alternative)

For presentations with many slides, generate one infographic image per section instead of per slide.

### Comparison

| Aspect | Slide-by-Slide | Section-Based |
|--------|---------------|---------------|
| Images | 1 per slide | 1 per section |
| Audio | Per slide | Per slide → merged by section |
| Review | Direct in markdown | Video script document |
| Best for | Short presentations | Long presentations (20+ slides) |

### Step 1: Generate Audio Files

Same as standard workflow:

```bash
cd "{slides_directory}"
python /Users/lifidea/.claude/skills/markdown-video/generate_audio.py "{slides_filename}" --output-dir "audio"
```

### Step 2: Generate Section Infographic Images

```bash
cd "{slides_directory}"
python /Users/lifidea/.claude/skills/markdown-video/generate_section_images.py "{slides_filename}" \
  --output-dir "slides-section" \
  --style "infographic"
```

**Style Options**:

| Style | Description |
|-------|-------------|
| `infographic` | Clean professional with icons (default) |
| `professional` | Minimalist corporate design |
| `vibrant` | Bright gradients for marketing |
| `technical` | Flowcharts and technical diagrams |

**Other Parameters**:
- `--start-from N`: Resume from section N
- `--force`: Regenerate all images
- `--dry-run`: Preview sections without generating
- `--delay N`: Seconds between API calls (default: 2.0)

### Step 3: Create Video Script (Optional)

Generate a markdown document for reviewing narration:

```bash
cd "{slides_directory}"
python /Users/lifidea/.claude/skills/markdown-video/create_video_script.py "{slides_filename}" \
  --output "video_script.md" \
  --image-dir "slides-section"
```

The script document shows:
- Section images embedded
- Speaker notes for each slide
- Easy editing format

### Step 4: Review & Edit Narration

1. Open `video_script.md`
2. Review narration in blockquotes
3. Edit directly in the document
4. Update original markdown file with changes
5. Regenerate audio for changed slides:
   ```bash
   python generate_audio.py "slides.md" --output-dir "audio"
   ```

### Step 5: Create Section-Based Video

```bash
cd "{slides_directory}"
python /Users/lifidea/.claude/skills/markdown-video/create_section_video.py \
  --slides "{slides_filename}" \
  --audio-dir "audio" \
  --image-dir "slides-section" \
  --output "presentation.mp4"
```

**With config file** (for custom section mappings):

```bash
python create_section_video.py \
  --config "sections.json" \
  --audio-dir "audio" \
  --image-dir "slides-section" \
  --output "presentation.mp4"
```

Config file format:
```json
{
  "sections": [
    {"id": 0, "name": "title", "audio_slides": [0]},
    {"id": 1, "name": "introduction", "audio_slides": [1, 2, 3]},
    {"id": 2, "name": "main_content", "audio_slides": [4, 5, 6, 7]}
  ]
}
```
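
To make the mapping concrete, the sketch below reads a config in this format and lists the audio files each section would draw on. It assumes that `audio_slides` holds the 0-indexed slide numbers used in the `audio/slide_N.mp3` file names. The function name and the overlap check are hypothetical additions, not behavior of `create_section_video.py`.

```python
import json
from pathlib import Path


def audio_files_by_section(config_text, audio_dir="audio"):
    """Map each section to the per-slide audio files it should merge."""
    config = json.loads(config_text)
    plan, seen = [], set()
    for section in config["sections"]:
        slides = section["audio_slides"]
        # Assumed sanity check: a slide should belong to one section only.
        overlap = seen.intersection(slides)
        if overlap:
            raise ValueError(f"slides assigned to more than one section: {sorted(overlap)}")
        seen.update(slides)
        files = [Path(audio_dir) / f"slide_{i}.mp3" for i in slides]
        plan.append((section["id"], section["name"], files))
    return plan


example_config = """
{
  "sections": [
    {"id": 0, "name": "title", "audio_slides": [0]},
    {"id": 1, "name": "introduction", "audio_slides": [1, 2, 3]},
    {"id": 2, "name": "main_content", "audio_slides": [4, 5, 6, 7]}
  ]
}
"""

for section_id, name, files in audio_files_by_section(example_config):
    print(section_id, name, [f.name for f in files])
```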

---

## Section-Based Quick Reference

```bash
cd "{slides_directory}"

# Step 1: Generate audio
python /Users/lifidea/.claude/skills/markdown-video/generate_audio.py "slides.md" --output-dir "audio"

# Step 2: Generate section images
python /Users/lifidea/.claude/skills/markdown-video/generate_section_images.py "slides.md" \
  --output-dir "slides-section" \
  --style "infographic"

# Step 3 (optional): Create review document
python /Users/lifidea/.claude/skills/markdown-video/create_video_script.py "slides.md" \
  --output "video_script.md" \
  --image-dir "slides-section"

# Step 4: Create final video
python /Users/lifidea/.claude/skills/markdown-video/create_section_video.py \
  --slides "slides.md" \
  --audio-dir "audio" \
  --image-dir "slides-section" \
  --output "presentation.mp4"
```

Overview

This skill converts Deckset-format markdown slides with speaker notes into a polished presentation video with AI-generated visuals and TTS narration. It produces a Full HD MP4 where each slide duration matches its narrated audio. Delta updates avoid reprocessing unchanged slides, saving time and API costs.

How this skill works

The skill parses markdown slides and extracts speaker notes marked with a ^ prefix to generate per-slide TTS audio. It uses Gemini image generation to render slide visuals in selectable styles, then combines slide images and audio with ffmpeg into a 1920x1080 MP4. Hash-based caches track slide content so only changed audio or images are regenerated unless forced.

When to use it

- You want to convert markdown slides into an MP4 presentation with narration.
- You need automated TTS narration from speaker notes for consistency and speed.
- You want AI-generated slide visuals with selectable styles (`technical-diagram`, `professional`, `vibrant-cartoon`, `watercolor`).
- You need efficient updates for edited slides without redoing the whole video.

Best practices

- Mark speaker notes with `^` at the start of the line for robust TTS extraction.
- Set `GEMINI_API_KEY` and `OPENAI_API_KEY` environment variables before running scripts.
- Use `--dry-run` to preview prompts and narration before incurring API costs.
- Choose section-based image generation for long decks (20+ slides) to reduce cost.
- Use `--force` only when you must regenerate all assets; rely on delta updates for routine edits.

Example use cases

- Create a narrated product demo video directly from a markdown slide deck.
- Convert lecture notes into a narrated MP4 for asynchronous teaching.
- Produce marketing or pitch videos with vibrant AI-generated visuals and synced narration.
- Generate section infographics instead of per-slide images for long reports to save cost and improve overview.
- Quickly iterate on a presentation: edit speaker notes, regenerate audio for changed slides, and rebuild the MP4.

FAQ

What input format is required?

A Deckset-style markdown file where speaker notes are prefixed with ^. Slides are parsed from the markdown; notes become TTS input.

How do delta updates work?

Each slide is hashed and stored in cache files (.audio_cache.json and .slides_cache.json). On subsequent runs only slides whose hash changed are regenerated unless you use --force.

What dependencies are required?

Python 3.7+, ffmpeg, and Python packages Pillow, requests, and google-genai. Also export OPENAI_API_KEY and GEMINI_API_KEY.