---
name: video-caption-creation
description: Create on-screen text hooks and captions for short-form video clips. Built around the Complementarity Principle - on-screen text should ADD to what the viewer hears, not repeat it. Includes podcast clip workflow, hook categories, and Triple Word Score optimization.
---
# Video Caption & On-Screen Hook Writer (v2)
On-screen text is the #1 visual element in short-form video. It does the PRIMARY work of stopping a scroll. A viewer sees the text and decides in <1 second whether to stop. This skill creates the text that makes them stop.
**Previous version:** `SKILL_v1_archive.md` (basic Triple Word Score, retired Feb 2026)
---
## The Complementarity Principle
**The central insight:** On-screen text should NOT label or repeat the audio. It should ADD context that makes the audio land harder. The gap between what you read and what you hear creates curiosity.
This works exactly like title + thumbnail on YouTube: together they create a fuller picture than either alone. Andrew Muto: *"You read the title, and then the thumbnail is offering you a little extra."*
**Bad (labeling):** On-screen says "Kids know." → Audio is about teacher dissatisfaction. Vague label, no gap.
**Good (complementary):** On-screen says "A kindergartner can tell" → Audio reveals that 55% of teachers want to quit, and every morning a five-year-old walks into a classroom and senses it. The childlike framing + the adult data = productive tension.
### The 6 Complementarity Patterns
| Pattern | On-Screen Text Does | Audio Does | Why It Works |
|---------|---------------------|------------|--------------|
| **Question → Answer** | Asks a question | Delivers the answer | Open loop the viewer must close |
| **Problem → Solution** | Names a pain point | Provides the fix | Viewer self-identifies, stays for relief |
| **Framework → Content** | Names a strategy/concept | Explains what it means | "What IS that?" curiosity |
| **Credential → Insight** | Establishes authority | Delivers the revelation | Trust signal makes content land harder |
| **Teaser → Payoff** | Hints at what's coming | Completes the thought | Narrative tension drives completion |
| **Reframe → Evidence** | Challenges an assumption | Provides the proof | Cognitive dissonance demands resolution |
### How Complementarity Works in Practice
**Audio says:** *"If we think that our public system's path to victory will be to trap families, you've lost the argument."*
The text should NOT repeat "you've lost the argument" or "trap families" as a label. Instead, it adds a new layer:
- **Question → Answer:** "Schools trap families?" - opens a loop the audio closes
- **Reframe → Evidence:** "You're defunding yourself" - adds a provocative angle the audio then explains
- **Credential → Insight:** "CEO of 80 microschools explains why..." - trust signal that makes the bold claim land harder
**Audio says:** *"In a big school, kids just feel lost. So they retreat to the screen. Then we ban the screen, and they're like, I'm still lost."*
Good complementarity adds a new frame:
- "Why phone bans don't work" - reframes around the cultural conversation
- "80 schools. Zero phone problems." - credential that implies they solved it
- "In a big school, kids just feel lost. So they retreat to the screen. Then we ban the screen, and they're like - I'm still lost." - sometimes the FULL quote IS the hook (see Length Variety below)
---
## The First-3-Words Test
The first 3 words someone reads do 80% of the work. Before finalizing any hook, check:
**What are the first 3 words?**
- "A kindergartner can" → Curiosity (can what?)
- "Schools trap families" → Shock (they do?)
- "55% of teachers" → Specificity (tells me this is data)
- "Why phone bans" → Promise (I'm about to learn something)
**Weak first 3 words:**
- "Here's what happens" → Generic, could be anything
- "Check this out" → Zero information
- "In this clip" → Meta, breaks immersion
- "You need to" → Commanding without earning attention
**Front-load the punch.** Move the most surprising, specific, or provocative word as close to position 1 as possible.
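The mechanical part of this test can be roughed out in code. The following is a minimal Python sketch, not part of the skill itself: `WEAK_OPENERS` is a hypothetical set seeded only with the four weak examples listed above, so it can flag a known-weak opener but cannot judge whether an opener is strong.

```python
# Hypothetical helper: the weak-opener set is copied from the examples above
# and is not exhaustive.
WEAK_OPENERS = {
    "here's what happens",
    "check this out",
    "in this clip",
    "you need to",
}


def first_three_words(hook: str) -> str:
    """Return the first three whitespace-separated words of a hook."""
    return " ".join(hook.split()[:3])


def has_weak_opener(hook: str) -> bool:
    """True if the hook opens with one of the known weak three-word phrases."""
    opener = first_three_words(hook).lower().strip(".,!?\"'")
    return opener in WEAK_OPENERS


print(first_three_words("A kindergartner can tell"))               # A kindergartner can
print(has_weak_opener("Here's what happens when kids get bored"))  # True
print(has_weak_opener("55% of teachers want to quit"))             # False
```

A `False` result only means the opener is not on the weak list; the human read of "what do these 3 words promise?" still decides.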
---
## Length Variety (Critical)
**Every set of 3-4 hook options MUST include different lengths.** This is non-negotiable. If all your hooks are 3-5 word punchy phrases, you're in a rut.
| Length | Word Count | When It Works | Example |
|--------|-----------|---------------|---------|
| **Punchy** | 2-4 words | Stats, named frameworks, provocative labels | "Homework is a scam" |
| **Statement** | 5-8 words | Reframes, challenges, credential leads | "His parents almost cut off his entire career" |
| **Narrative** | 9-15 words | Story setups, scenario hooks, before/after | "I tried every AI tool so you don't have to" |
| **Full quote** | 16+ words | When the quote itself IS the hook (rare, powerful) | "Kids learn to walk at different ages... so why do we make every kid learn algebra at the same age?" |
**The trap:** AI defaults to short and punchy because it pattern-matches "hook = headline." Fight this. Some of the best-performing on-screen text is conversational and long.
---
## Proven Hook Library
These are original examples from research into viral short-form content. They're organized by format/archetype. **Preserve the originals** - they have more texture than anything we'd rewrite. Use framework fitting at runtime to adapt them to specific content.
### Results and Transformation
| Framework | Proven Example | Lever |
|-----------|---------------|-------|
| "How I [result]" | "How I hit 100k views in 7 days" | Social proof + utility |
| "[Before] vs [After]" | "My $10M company vs my first $100 sale" | Rags-to-riches narrative |
| "The one [niche] tip that changed everything" | "The one lighting tip that fixed my videos" | High-ROI promise |
| "I did [action] so you don't have to" | "I tried every AI tool so you don't have to" | Utility + time-saving |
| "How to [goal] with zero [pain point]" | "How to build an app with zero coding" | Barrier removal |
### Curiosity Gap and Contrarian
| Framework | Proven Example | Lever |
|-----------|---------------|-------|
| "Stop doing [action] if you want [result]" | "Stop doing bench press if you want chest growth" | Pattern interrupt + urgency |
| "You've been [action] wrong your whole life" | "You've been folding your shirts all wrong" | Challenges assumptions |
| "What they don't want you to know about [topic]" | "What banks don't want you to know about debt" | Forbidden knowledge |
| "Everyone is wrong about [topic]" | "Everyone is wrong about passive income" | Cognitive dissonance |
| "This mistake cost me [amount]" | "This one mistake cost me $2,300" | Financial stakes |
### Authority and "Secret"
| Framework | Proven Example | Lever |
|-----------|---------------|-------|
| "No one talks about this in [industry]" | "No one talks about this in the creator economy" | Exclusivity |
| "I probably shouldn't share this, but..." | "I probably shouldn't share this airline loophole" | Insider access |
| "The biggest secret about [topic]" | "The biggest secret about starting a business" | Curiosity gap |
| "3 [tools/things] that feel illegal to know" | "3 AI tools that feel illegal to know" | Rule-breaking novelty |
| "I wish I knew this [time] ago" | "I wish I knew this before I turned 30" | Wisdom through regret |
### Question and Relatability
| Framework | Proven Example | Lever |
|-----------|---------------|-------|
| "If you're a [identity], watch this" | "If you're a SaaS founder, watch this" | Self-identification |
| "Do you struggle with [problem]?" | "Do you struggle with falling asleep?" | Pain point |
| "Can you guess what happens if you [action]?" | "Can you guess what happens if you quit sugar?" | Suspense |
| "Are you making this mistake in [action]?" | "Are you making this mistake in your morning routine?" | Fear of error |
| "If you've ever..." | "If you've ever unbuttoned your jeans at dinner..." | Hyper-specificity |
### Education/Parenting Originals (from Andrew Muto research)
These are proven hooks from education creators - closer to our audience:
**Short punchy:**
- "Homework is a scam"
- "Your kid's C+ is fine"
- "Your ADHD kid isn't broken"
**Statement length:**
- "Screen time isn't the problem. Boredom is."
- "Neurodivergent kids aren't the problem. The system is."
- "5 Signs Your Kid Doesn't Actually Hate Math"
**Narrative length:**
- "Kids don't hate math. They hate how it's taught."
- "Edison was kicked out of school for being 'too curious'"
- "They said homeschooled kids won't socialize. Have you met public schoolers?"
- "His kids skipped school for 100 days. Here's what happened."
- "Finland banned homework below age 10. Now they're #1 in education."
**Full quote / conversational:**
- "My kids socialize with humans of all ages, not just 24 kids born the same year"
- "Kids learn to walk at different ages... so why do we make every kid learn algebra at the same age?"
- "Things I don't explain to strangers anymore: Why my kids aren't in school"
### Podcast Clip Archetypes (from research report)
| Archetype | Role | Trigger | Example |
|-----------|------|---------|---------|
| **The Teacher** | Provide fast, actionable value | Utility + expertise | "3 ways to make your next video go viral" |
| **The Investigator** | Reveal a hidden truth | Exclusivity + FOMO | "No one's talking about this - but it changes everything" |
| **The Contrarian** | Challenge popular belief | Cognitive dissonance | "Hashtags don't actually help your reach. Here's why" |
| **The Fortuneteller** | Hint at transformation | Aspiration + reward | "Here's what happened when I quit caffeine for 30 days" |
| **The Experimenter** | Share trial results | Curiosity + relatability | "I posted 3 times a day for a week. Here's what happened" |
---
## Framework Fitting at Runtime
The Proven Hook Library above contains TEMPLATES. When generating hooks for a specific clip, use framework fitting:
1. **Read the clip transcript** - find the hookable moment
2. **Scan the library** - which framework fits this moment?
3. **Adapt the framework** to the specific content, preserving the format's proven structure
4. **Vary the length** - each set of 3-4 options should span at least 2 different length categories
**Example:** If the clip is about a microschool founder who discovered mixed-age classrooms work:
- Punchy: "The Trojan horse classroom" (Named Framework)
- Statement: "Teachers hate this. Then something flips." (Curiosity Gap)
- Narrative: "She put 3 grades in one room. Teachers hated it. Then something flipped." (Fortuneteller)
- Full quote: "The mixed-age classroom can be a Trojan horse to introduce a totally different way of teaching." (when the quote itself is gold)
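Step 3 (adapting a framework) is essentially slot filling. The sketch below shows that mechanic in Python, assuming the library templates mark slots with square brackets as in the tables above; `fit_framework` is a hypothetical helper name, and choosing which template fits the moment remains a judgment call.

```python
import re


def fit_framework(template: str, slots: dict[str, str]) -> str:
    """Replace each [slot] in a library template with clip-specific text."""

    def fill(match: re.Match) -> str:
        key = match.group(1)
        if key not in slots:
            raise KeyError(f"no value supplied for slot [{key}]")
        return slots[key]

    return re.sub(r"\[([^\]]+)\]", fill, template)


# Reproduces a proven example from the library from its framework.
hook = fit_framework(
    "How to [goal] with zero [pain point]",
    {"goal": "build an app", "pain point": "coding"},
)
print(hook)  # How to build an app with zero coding
```

Raising on a missing slot is deliberate: a half-filled template such as "How to [goal] with zero coding" should never reach the editor.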
---
## Podcast Clip On-Screen Hook Workflow
This is the step-by-step process for generating hooks from podcast transcript clips. Used by the podcast-production skill at Step 4.
### Step 1: Identify the Hookable Moment
Scan the clip transcript for one of these triggers (SURP framework):
- **S**urprising fact or statistic ("55% of teachers want to leave")
- **U**nexpected quote or bold statement ("you've lost the argument")
- **R**elatable problem ("kids feel lost, so they retreat to the screen")
- **P**rovocative opinion ("our path to victory will be to trap families")
The hookable moment is usually NOT the first sentence of the clip. Find the single line that would make someone stop scrolling if they saw it as a headline.
### Step 2: Determine the Complementarity Pattern
Ask: What would ADD the most to what the viewer hears?
| If the audio... | Best pattern | On-screen text should... |
|-----------------|-------------|------------------------|
| Makes a bold claim | Reframe → Evidence | Challenge the assumption the claim disproves |
| Tells a story | Teaser → Payoff | Hint at the ending without revealing it |
| Cites data/stats | Question → Answer | Ask the question the stat answers |
| Introduces a framework | Framework → Content | Name the framework |
| Gives advice | Problem → Solution | Name the problem the advice solves |
| Shares credentials | Credential → Insight | Lead with the credential |
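If the decision table is applied programmatically, it reduces to a lookup. A minimal Python sketch follows; the dictionary keys (`bold_claim`, `story`, and so on) are hypothetical labels for the "If the audio..." column, and classifying the audio into one of them is still done by a person or an upstream step.

```python
# Keys are hypothetical labels for the "If the audio..." column above.
PATTERN_FOR_AUDIO = {
    "bold_claim": ("Reframe → Evidence", "Challenge the assumption the claim disproves"),
    "story": ("Teaser → Payoff", "Hint at the ending without revealing it"),
    "data": ("Question → Answer", "Ask the question the stat answers"),
    "framework": ("Framework → Content", "Name the framework"),
    "advice": ("Problem → Solution", "Name the problem the advice solves"),
    "credentials": ("Credential → Insight", "Lead with the credential"),
}


def recommend_pattern(audio_type: str) -> tuple[str, str]:
    """Return (pattern, on-screen text guidance) for a labeled audio type."""
    if audio_type not in PATTERN_FOR_AUDIO:
        raise ValueError(f"unknown audio type: {audio_type!r}")
    return PATTERN_FOR_AUDIO[audio_type]


pattern, guidance = recommend_pattern("data")
print(f"{pattern}: {guidance}")
```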
### Step 3: Generate 3-4 Hook Options
For each clip, write 3-4 options with LENGTH VARIETY. Each set should span at least 2 different length categories (punchy, statement, narrative, full quote).
Scan the Proven Hook Library for a matching framework. Adapt the framework to this specific content while preserving the format's proven structure.
### Step 4: Apply Quality Gates
For each option, check:
- [ ] **First-3-Words Test:** First 3 words carry the punch?
- [ ] **McDonald's Test:** Someone at McDonald's understands instantly?
- [ ] **Scroll Test:** Would I personally stop scrolling?
- [ ] **Gap Test:** Text + audio create a gap, not a repeat?
- [ ] **Grandmother Test:** My grandmother gets what this is about?
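Most of these gates are judgment calls, but the Gap Test has a mechanical proxy: how many of the hook's words already occur in the audio. The Python sketch below is an assumed heuristic, not a rule from the research; `audio_overlap` is a hypothetical name, and no pass/fail threshold is implied.

```python
import re


def _words(text: str) -> list[str]:
    """Lowercase word tokens, keeping apostrophes and % signs inside tokens."""
    return re.findall(r"[a-z0-9%']+", text.lower())


def audio_overlap(hook: str, audio: str) -> float:
    """Fraction of the hook's words that also occur in the audio transcript."""
    hook_words = _words(hook)
    if not hook_words:
        return 0.0
    audio_words = set(_words(audio))
    return sum(w in audio_words for w in hook_words) / len(hook_words)


audio = ("If we think that our public system's path to victory will be "
         "to trap families, you've lost the argument.")

print(audio_overlap("You've lost the argument", audio))   # 1.0
print(audio_overlap("You're defunding yourself", audio))  # 0.0
```

A score near 1.0 suggests the text is labeling the audio, which is worth a second look. It is not an automatic fail, since a deliberate full-quote hook will also score 1.0.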
### Step 5: Mark Recommended Pick
Select one recommended option and explain WHY in one sentence. The rationale should reference the complementarity - how the text and audio work together.
---
## Identifying Hookable Moments in Transcripts
When scanning a full transcript (before clips are selected), look for:
**Tier 1: Almost always hookable**
- Specific statistics ("55% of teachers want to leave")
- Named concepts or frameworks the speaker coined
- Moments that contradict popular belief
- Emotional peaks (speaker's voice changes, pace quickens)
- Universal pain points ("every parent knows...")
**Tier 2: Often hookable**
- Personal confessions or vulnerable moments
- "I should probably not say this, but..."
- Origin stories (how they started, what went wrong)
- Direct statements that take a side
**Tier 3: Rarely hookable (avoid)**
- Agreement moments ("yeah, totally, I think so too")
- Background context / scene-setting
- Nuanced, heavily-caveated statements
- Inside baseball (jargon-heavy industry talk)
**The whole transcript matters.** The strongest moment might be at minute 48. Don't mine just the first half.
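A first pass over a long transcript can be partly automated for one Tier 1 signal, specific statistics. The Python sketch below assumes a plain-text transcript with ordinary sentence punctuation; `stat_sentences` and `STAT_PATTERN` are hypothetical names, and the other tiers (emotional peaks, contrarian moments) still need a human or model read.

```python
import re

# Assumed heuristic: a sentence containing a percentage or any other number
# is a candidate "specific statistic" moment.
STAT_PATTERN = re.compile(r"\d+(?:\.\d+)?%|\b\d[\d,]*\b")


def stat_sentences(transcript: str) -> list[str]:
    """Return the sentences that contain a percentage or other number."""
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    return [s for s in sentences if STAT_PATTERN.search(s)]


transcript = (
    "Yeah, totally, I think so too. "
    "55% of teachers want to leave. "
    "Every parent knows the feeling."
)
for sentence in stat_sentences(transcript):
    print(sentence)
```

Because the scan covers the whole string, a statistic at minute 48 surfaces just as readily as one in the opening minutes.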
---
## The Triple Word Score System
Four signals must align so algorithms AND humans immediately know "this is for me":
### 1. Audio Transcript (MOST IMPORTANT)
- What the speaker says out loud - algorithms auto-transcribe this
- Topic words must appear in first 10 seconds
- Core terminology repeated naturally throughout
### 2. On-Screen Text Hook
- Visual overlay using complementarity principle (above)
- NOT a repetition of the audio - an addition to it
- Lead with topic words in the hook
### 3. Caption Copy
- Post description with topic-relevant keywords
- Opens with a topic-relevant phrase
- Provides context the algorithm needs
### 4. Strategic Hashtags
- 10-12 total (optimal range)
- Broad → Mid → Specific → Niche → Audience → Platform
- Example (abbreviated; a full set has 10-12): #Education #Parenting #Homeschool #Microschools #SchoolChoice #HomeschoolMom #Shorts
**When all four align, the algorithm recognizes the topic immediately and serves it to the right people.**
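A quick alignment check across the four signals might look like the Python sketch below. It assumes simple case-insensitive substring matching and that the caller supplies the transcript of the first 10 seconds as `audio_opening`; the function name is hypothetical, and the audio and caption strings in the usage lines are made up for illustration.

```python
def signal_alignment(topic: str, audio_opening: str, hook: str,
                     caption: str, hashtags: list[str]) -> dict[str, bool]:
    """Report, per signal, whether the topic term appears in it."""
    term = topic.lower()
    return {
        "audio": term in audio_opening.lower(),
        "hook": term in hook.lower(),
        "caption": term in caption.lower(),
        "hashtags": any(term in tag.lower() for tag in hashtags),
    }


result = signal_alignment(
    topic="homeschool",
    audio_opening="So when we started homeschooling...",  # made-up audio
    hook="They said homeschooled kids won't socialize. Have you met public schoolers?",
    caption="Why this family left school",                # made-up caption
    hashtags=["#Education", "#Parenting", "#Homeschool"],
)
print(result)
print("All four aligned:", all(result.values()))
```

Substring matching lets "homeschooling" and "#Homeschool" both count for the term "homeschool", at the cost of occasional false positives on short terms.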
---
## Caption Writing
### Short-Form Platforms (Same Caption Everywhere)
**Applies to:** YouTube Shorts, Instagram Reels, TikTok, Facebook Reels
One caption per clip. Don't write platform-specific versions (waste of time, same audience).
**Format:**
```
[Verbatim quote or key insight from clip - 1-2 sentences]
[Context: who the speaker is and why it matters - 1 sentence]
#Hashtags (10-12)
```
**Optional X Variant:**
Shorter, more conversational. Include @handles. Drop the context sentence.
### Caption Quality Check
- [ ] Opens with the strongest line from the clip
- [ ] Includes guest name and title
- [ ] Hashtags span broad to niche
- [ ] Under 150 characters for the hook portion
- [ ] No emojis in body (rare exceptions for social captions)
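The caption format and the countable parts of this checklist can be sketched in Python as follows. The function names are hypothetical, and "hook portion" is assumed to mean the opening quote line; the other checklist items (strongest line, guest name and title) are not automated here.

```python
def build_caption(quote: str, context: str, hashtags: list[str]) -> str:
    """Assemble the three-part caption: quote, context line, hashtags."""
    return "\n".join([quote, context, " ".join(hashtags)])


def caption_issues(quote: str, hashtags: list[str]) -> list[str]:
    """Return the mechanical checklist failures for a caption."""
    issues = []
    if len(quote) >= 150:
        issues.append("hook portion is not under 150 characters")
    if not 10 <= len(hashtags) <= 12:
        issues.append("hashtag count is outside 10-12")
    if any(not tag.startswith("#") for tag in hashtags):
        issues.append("hashtag missing leading #")
    return issues


tags = ["#Education", "#Parenting", "#Homeschool", "#Microschools",
        "#SchoolChoice", "#HomeschoolMom", "#Shorts"]
print(caption_issues("In a big school, kids just feel lost.", tags))
# ['hashtag count is outside 10-12']
```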
---
## Output Format
When this skill is invoked (standalone or from podcast-production), produce this per clip:
```markdown
### [Clip Name]
**Timestamp range:** [MM:SS-MM:SS]
**On-screen hook options:**
1. "[Hook text]" *
2. "[Hook text]"
3. "[Hook text]"
4. "[Hook text]"
(* = recommended pick. List 3-4 options. Star goes on the strongest.)
**Caption (FB, TikTok, IG, LinkedIn):**
[Caption text]
**X variant:**
[Shorter caption with @handles]
```
**Keep it clean.** No category labels, no rationale paragraphs. The hooks should speak for themselves. Use the complementarity principle and quality gates internally when generating, but the output is just the options with a star.
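For pipelines that assemble this block from structured data, a renderer could look like the Python sketch below. `render_clip` and its parameters are hypothetical; the layout simply mirrors the template above, with `recommended` given as a 0-based index.

```python
def render_clip(name: str, timestamps: str, hooks: list[str],
                recommended: int, caption: str, x_variant: str) -> str:
    """Render one clip block; `recommended` is the 0-based index of the starred hook."""
    if not 3 <= len(hooks) <= 4:
        raise ValueError("expected 3-4 hook options")
    if not 0 <= recommended < len(hooks):
        raise ValueError("recommended index out of range")
    lines = [
        f"### {name}",
        f"**Timestamp range:** {timestamps}",
        "**On-screen hook options:**",
    ]
    for i, hook in enumerate(hooks):
        star = " *" if i == recommended else ""
        lines.append(f'{i + 1}. "{hook}"{star}')
    lines += [
        "**Caption (FB, TikTok, IG, LinkedIn):**",
        caption,
        "**X variant:**",
        x_variant,
    ]
    return "\n".join(lines)


print(render_clip(
    name="[Clip Name]",
    timestamps="[MM:SS-MM:SS]",
    hooks=["Why phone bans don't work", "80 schools. Zero phone problems.",
           "Schools trap families?"],
    recommended=0,
    caption="[Caption text]",
    x_variant="[Shorter caption with @handles]",
))
```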
---
## Sub-Agent Prompt Template
When podcast-production invokes this skill at Step 4, use this prompt:
```
You are generating on-screen text hooks for podcast clips.
Episode: [Guest Name]
Working directory: [path to prep/]
Read the following files:
- SOURCE.md (full transcript for context)
- EDITOR_HANDOFF.md (clips already selected in Sections 3 & 4)
For EACH clip in Sections 3 and 4, generate 3-4 on-screen hook options.
Mark the recommended pick with an asterisk (*).
Output format per clip:
1. "Hook text" *
2. "Hook text"
3. "Hook text"
4. "Hook text"
No category labels. No rationale paragraphs. Just the options with a star.
Internal rules (apply these but don't show them in output):
- On-screen text ADDS to audio, never repeats it (Complementarity Principle)
- Apply the First-3-Words Test to every option
- Apply McDonald's Test (instantly understandable)
- VARY THE LENGTH across options: at least one short punchy (2-4 words), one statement (5-8 words), and one narrative or full-quote (9+ words). Do NOT default to all short punchy hooks.
- Scan the Proven Hook Library in the skill for matching frameworks
- Framework fit the proven templates to the specific content
Write the hooks directly into the EDITOR_HANDOFF.md clip sections.
```
---
## Exemplar Channels (Study These)
### Podcast Clip Masters
- **Diary of a CEO** (@doac.clips) - Animated text, credential leads, emotional transitions. Opens with guest in chair + text hook. Text builds anticipation without giving away payoff.
- **My First Million** (clips channel) - Entrepreneurship hooks, "how to" + numbers format
- **Huberman Lab** (Essentials) - Dense science distilled to protocols. Direct promise hooks.
- **Lex Fridman Clips** - Topic-based text overlays for philosophical conversations
### Education/Parenting Creators
- **Dr. Becky Kennedy** (@drbeckyatgoodinside, 3M) - Named strategy hooks ("The REPAIR Strategy"), problem → solution format
- **Big Little Feelings** (@biglittlefeelings, 3.6M) - "When your toddler refuses to..." pain point hooks
- **Busy Toddler** (@busytoddler, 2.4M) - Action-oriented "This LEGO hack will..."
- **@thatcalteacherlife** (508K) - Working parents + homeschool
- **@deal_family** (352K) - Second-gen homeschool mom
- **@littlefenders** (132K) - "Redefining learning + parenting / Educating kids with AI / Unschooling / ADHD Mom"
### Key Patterns from Top Performers
1. **Text = Question, Audio = Answer** (Dr. Becky)
2. **Text = Named Framework, Audio = Explanation** (Dr. Becky)
3. **Text = Credential, Audio = Insight** (DOAC)
4. **Text = Problem, Audio = Solution** (Big Little Feelings)
5. **Text = Teaser, Audio = Payoff** (DOAC)
---
## Common Mistakes
- **All hooks the same length:** The #1 mistake. If every option is 3-5 word punchy headlines, you haven't explored the full range. Mix punchy with narrative with full-quote.
- **Labeling instead of complementing:** "Kids know." is a label. "A kindergartner can tell" is a hook.
- **Giving away the payoff in text:** If the text reveals the full insight, there's no reason to watch.
- **Generic first 3 words:** "Here's what happens" could be anything. "55% of teachers" is specific.
- **Only one hook per clip:** Always generate 3-4. The first idea is rarely the best.
- **Rewriting proven templates instead of adapting them:** The original examples have texture. Framework fit them to your content at runtime - don't pre-sanitize the library.
- **Hooks that need the clip to make sense:** The hook must work for a silent, scrolling viewer who hasn't heard a word yet.
- **Fancy vocabulary:** "Pedagogical paradigm shift" fails the McDonald's test. "Schools are broken" passes.
---
## Related Skills
- `podcast-production` - Invokes this skill at Step 4 (on-screen hook generation)
- `short-form-video` - Full production workflow (this skill handles text only)
- `text-content` - For text-only social posts (LinkedIn, X, not video)
- `youtube-title-creator` - Title + thumbnail (same complementarity principle, different application)
- `cold-open-creator` - Cold opens use [SWOOSH] transitions, not hooks
---
## Research References
- `Short-Form Video Text Hook Strategies.md` - Comprehensive report on hook psychology, typography, retention
- Andrew Muto notes: `Studio/_archive/Archive/Andrew Muto/` - Complementarity, McDonald's test, 15-min rule, hook categories
- Podcast-production skill improvement notes: `Studio/Podcast Studio/Amar-Kumar/prep/SKILL_IMPROVEMENT_NOTES.md`
---
*Rewritten Feb 6, 2026 after Amar Kumar podcast session. Incorporates complementarity principle, first-3-words test, 6 hook categories, SURP framework, and worked examples from real clips.*