slide-builder skill

This skill converts essay-to-speech output into complete slide decks with multiple formats, enabling quick presentation creation from spoken content.

npx playbooks add skill leegonzales/aiskills --skill slide-builder

---
name: slide-builder
description: Transform essay-to-speech output into complete presentations with multiple output formats. Use when converting talk tracks to slides, generating presentation decks, or creating video-ready content from spoken word material.
---

# Slide Builder

Transform essay-to-speech output into complete, presentation-ready slide decks with multiple output format support (Talk Track v5 markdown, HTML, Remotion video).

## When to Use

Invoke when the user:
- Has essay-to-speech output and wants slides
- Says "create slides from this talk track"
- Needs to "build a presentation" from spoken content
- Wants to convert a talk track to video format
- Uses `/slide-builder` command
- Asks for "presentation slides" from transformed essay content

## Prerequisites

**Input required:** Output from the `essay-to-speech` skill containing:
- `### Original` sections (verbatim essay text)
- `### Talk Track` sections with semantic tags
- `### Images` sections with ratings (USE/ADAPT/RECREATE/SKIP)
- `### Slide Ideas` suggestions

## Core Process

### 1. Parse Essay-to-Speech Output

Extract structured data from each section:

```
Section → {
  title: string,
  original: string,
  talkTrack: TaggedContent[],
  images: ImageAssessment[],
  slideIdeas: string[]
}
```

**Semantic tags to identify:**
- `[HOOK]` - Opening attention-grabber → Title/hook slide
- `[KEY_POINT]` - Core argument → Statement slide
- `[EVIDENCE]` - Data/proof → Data visualization slide
- `[STORY]` - Narrative → Story/quote slide
- `[TRANSITION]` - Bridge → Section divider or no slide
- `[CALLBACK]` - Reference → Recap element
- `[LANDING]` - Conclusion → Summary slide
- `[CTA]` - Call to action → Action slide
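
As a minimal sketch of the tag-identification step, the Python below splits a talk track string into (tag, text) pairs. `TaggedContent` is named after the structure above; its fields, the `parse_talk_track` function, and the regex are assumptions for illustration, not the skill's actual parser.

```python
import re
from dataclasses import dataclass

# Semantic tags listed above; other bracketed text is left untouched.
SEMANTIC_TAGS = ("HOOK", "KEY_POINT", "EVIDENCE", "STORY",
                 "TRANSITION", "CALLBACK", "LANDING", "CTA")
TAG_RE = re.compile(r"\[(" + "|".join(SEMANTIC_TAGS) + r")\]")


@dataclass
class TaggedContent:
    tag: str   # e.g. "KEY_POINT" (hypothetical field name)
    text: str  # spoken text that follows the tag (hypothetical field name)


def parse_talk_track(talk_track: str) -> list[TaggedContent]:
    """Split a talk track into one TaggedContent per semantic tag.

    Text that appears before the first tag is dropped.
    """
    # re.split with one capture group yields [prefix, tag, text, tag, text, ...]
    parts = TAG_RE.split(talk_track)
    return [
        TaggedContent(tag=tag, text=text.strip())
        for tag, text in zip(parts[1::2], parts[2::2])
    ]
```

Called on `"[KEY_POINT] The current approach fails. [EVIDENCE] Efficiency drops by 40%."`, this returns two items tagged `KEY_POINT` and `EVIDENCE`, in source order.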

### 2. Plan Slide Deck

Map semantic tags to slides:

| Tag | Slide Type | Typical Visual |
|-----|------------|----------------|
| `[HOOK]` | Title/Opening | Bold statement, striking image |
| `[KEY_POINT]` | Statement | Single phrase, minimal graphic |
| `[EVIDENCE]` | Data | Chart, statistic callout, comparison |
| `[STORY]` | Story | Photo, quote attribution, timeline |
| `[TRANSITION]` | Divider (optional) | Section title, progress indicator |
| `[CALLBACK]` | Recap | Reference to earlier slide |
| `[LANDING]` | Summary | Key takeaways, visual recap |
| `[CTA]` | Action | Contact info, next steps, QR code |
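
The mapping in this table can be expressed as a simple lookup. The sketch below is illustrative only: the slide type strings come from the table, while `SLIDE_TYPE_BY_TAG`, `plan_slides`, and the `include_dividers` flag are hypothetical names.

```python
# Tag -> slide type, copied from the table above.
SLIDE_TYPE_BY_TAG = {
    "HOOK": "Title/Opening",
    "KEY_POINT": "Statement",
    "EVIDENCE": "Data",
    "STORY": "Story",
    "TRANSITION": "Divider",
    "CALLBACK": "Recap",
    "LANDING": "Summary",
    "CTA": "Action",
}


def plan_slides(tags: list[str], include_dividers: bool = True) -> list[str]:
    """Return one slide type per tag, optionally skipping section dividers."""
    plan = []
    for tag in tags:
        if tag == "TRANSITION" and not include_dividers:
            continue  # dividers are optional, so a tight deck can drop them
        plan.append(SLIDE_TYPE_BY_TAG[tag])
    return plan
```

This sketch emits exactly one slide per tag. A real plan may give a `[KEY_POINT]` two slides, as the heuristic allows.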

**Slide count heuristic:**
- 1-2 slides per `[KEY_POINT]`
- 1 slide per `[EVIDENCE]` block
- Section dividers are optional (skip for tight decks)
- Target: 1 slide per 45-60 seconds of speaking
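
To make the pacing target concrete, the sketch below turns a talk length into a slide count range using the 45-60 seconds per slide figure. The function name `target_slide_range` is hypothetical.

```python
import math


def target_slide_range(target_minutes: float) -> tuple[int, int]:
    """Return (fewest, most) slides for a talk at 60 s and 45 s per slide."""
    seconds = target_minutes * 60
    fewest = math.ceil(seconds / 60)   # slowest pacing: one slide per 60 s
    most = math.floor(seconds / 45)    # fastest pacing: one slide per 45 s
    return fewest, max(fewest, most)   # very short talks collapse to one value
```

For a 15-minute talk this gives a range of 15 to 20 slides.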

### 3. Handle Images

Process image assessments from essay-to-speech:

| Rating | Action |
|--------|--------|
| `USE` | Include directly in slide |
| `ADAPT` | Note modifications needed (enlarge labels, crop, simplify) |
| `RECREATE` | Generate Nano Banana prompt for new visual |
| `SKIP` | Do not include |

**For RECREATE images:**
Generate a Nano Banana prompt following these guidelines:
- 16:9 aspect ratio for slides
- Clear, simple compositions
- Large readable text/labels
- Brand colors if specified

Example RECREATE prompt:
```
"Clean horizontal bar chart comparing 5 items, minimal style,
white background, teal (#557373) bars, large bold labels,
no gridlines, presentation-ready, 16:9 aspect ratio"
```
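
A prompt like the one above can be assembled from the guidelines. This is a sketch under stated assumptions: `build_recreate_prompt` and its parameters are hypothetical, and the output is plain text to paste into the image tool, not a call to any Nano Banana API.

```python
from typing import Optional


def build_recreate_prompt(subject: str, brand_color: Optional[str] = None) -> str:
    """Assemble a RECREATE image prompt from the guidelines above."""
    parts = [
        subject,
        "clear and simple composition",
        "large readable labels",
    ]
    if brand_color:
        parts.append(f"brand color {brand_color}")  # only if a brand is specified
    parts.append("16:9 aspect ratio")               # slide format
    return ", ".join(parts)
```

Putting the aspect ratio last keeps it easy to spot when reviewing generated prompts.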

### 4. Generate Output

Write the planned deck as a Talk Track v5 markdown file, in the format described below.

## Output Format: Talk Track v5

The primary output format for presentations.

### Structure

```markdown
---
# Talk Track v5 frontmatter
version: 5
title: "Presentation Title"
subtitle: "Optional Subtitle"
author: "Presenter Name"
date: "2025-01-15"
target_minutes: 15
audio_voice: "af_heart"
brand:
  primary: "#557373"
  background: "#F2EFEA"
  text: "#0D0D0D"
sections:
  - id: opening
    name: "Opening"
    color: "#557373"
  - id: problem
    name: "The Problem"
    color: "#6B8E6B"
  - id: solution
    name: "The Solution"
    color: "#C4785A"
  - id: closing
    name: "Closing"
    color: "#557373"
---

## Slides

| # | Slug | Title | Image | Section |
|---|------|-------|-------|---------|
| 1 | hook | The Question | hook.png | opening |
| 2 | problem-1 | What's Broken | problem-chart.png | problem |
| 3 | evidence | The Data | evidence.png | problem |
| 4 | solution | A New Approach | solution.png | solution |
| 5 | action | Your Next Step | cta.png | closing |

---

## [hook] The Question

![The Question](images/hook.png)

<!-- AUDIO -->
[HOOK] Let me ask you something that might change how you think about this entire problem...

What if everything you believed was based on outdated assumptions?
<!-- /AUDIO -->

**Speaker Notes:**
- Pause after the question
- Make eye contact with audience
- Let the tension build

---

## [problem-1] What's Broken

![What's Broken](images/problem-chart.png)

<!-- AUDIO -->
[KEY_POINT] The current approach fails in three critical ways.

[EVIDENCE] First, efficiency drops by 40% when teams scale past 10 people. Second, communication overhead grows exponentially. Third, institutional knowledge gets siloed.
<!-- /AUDIO -->

**Speaker Notes:**
- Point to chart as you mention each stat
- Emphasize "exponentially"

---
```

### Format Rules

1. **YAML Frontmatter** - Metadata, timing, voice, sections
2. **Slide Index Table** - Quick reference for all slides
3. **Individual Slides** - Each with:
   - H2 header: `## [slug] Title`
   - Image reference (if applicable)
   - `<!-- AUDIO -->` block with talk track
   - `**Speaker Notes:**` for presenter context
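
As an illustration of rule 3, the sketch below renders a single slide in this layout. `render_slide` and its parameters are hypothetical, and the `images/` path prefix follows the example above rather than a documented requirement.

```python
from typing import Optional


def render_slide(slug: str, title: str, audio: str, notes: list[str],
                 image: Optional[str] = None) -> str:
    """Render one slide as Talk Track v5 markdown."""
    lines = [f"## [{slug}] {title}", ""]
    if image:
        # The image reference is optional
        lines += [f"![{title}](images/{image})", ""]
    lines += ["<!-- AUDIO -->", audio.strip(), "<!-- /AUDIO -->", ""]
    lines.append("**Speaker Notes:**")
    lines += [f"- {note}" for note in notes]
    return "\n".join(lines)
```

Joining the rendered slides with `---` separator lines, as in the example structure, yields the body of the deck.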

### Audio Block Format

Content between `<!-- AUDIO -->` and `<!-- /AUDIO -->`:
- Is read aloud by TTS engines
- Preserves semantic tags for timing hints
- Excludes speaker notes and visual descriptions
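
A minimal sketch of pulling the audio blocks out of a Talk Track v5 file before sending them to a TTS engine. The `extract_audio` name and the `strip_tags` option are assumptions. Whether a given engine should receive the semantic tags is not specified here, so stripping is off by default.

```python
import re

AUDIO_RE = re.compile(r"<!-- AUDIO -->\s*(.*?)\s*<!-- /AUDIO -->", re.DOTALL)
TAG_RE = re.compile(r"\[[A-Z_]+\]\s*")  # matches tags such as [HOOK] or [KEY_POINT]


def extract_audio(markdown: str, strip_tags: bool = False) -> list[str]:
    """Return the text of each audio block, in document order."""
    blocks = AUDIO_RE.findall(markdown)
    if strip_tags:
        # Hypothetical option: remove semantic tags so they are not read aloud
        blocks = [TAG_RE.sub("", block) for block in blocks]
    return blocks
```

Speaker notes sit outside the markers, so they never appear in the result.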

### Timing Calculation

Estimate duration based on word count:
- Speaking rate: 130-150 words/minute
- Add 2-3 seconds per slide transition
- Add pause time for `[PAUSE]` markers
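
The estimate can be sketched as follows. The defaults are assumed midpoints of the ranges above (140 words/minute, 2.5 seconds per transition). The total pause time is passed in directly because the duration of a `[PAUSE]` marker is not specified here.

```python
def estimate_seconds(word_count: int, slide_count: int,
                     words_per_minute: float = 140.0,
                     transition_seconds: float = 2.5,
                     pause_seconds: float = 0.0) -> float:
    """Estimate total presentation duration in seconds."""
    speaking = word_count / words_per_minute * 60   # time spent talking
    transitions = slide_count * transition_seconds  # time spent changing slides
    return speaking + transitions + pause_seconds
```

For example, 1,400 words across 10 slides comes to 600 seconds of speech plus 25 seconds of transitions, or about 10.4 minutes.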

## Voice Options

### Development: Kokoro TTS (via claude-speak)

- **Local, free, fast iteration**
- Invoke: `/claude-speak` skill with audio block text
- Voice: `af_heart` (default) or specify in frontmatter
- Output: Local audio file per slide or full presentation

### Production: ElevenLabs v3

- **Word-level timestamps** for precise video sync
- **Higher quality** for final distribution
- Requires ElevenLabs API key
- Output: Audio + JSON timing data

See `references/voice-options.md` for full configuration.

## Alternative Output Formats

### HTML Slide Engine

Static HTML presentation with:
- Keyboard navigation (arrows, space)
- Speaker notes toggle (N key)
- Timer display
- Print to PDF support

See `references/html-engine.md` for template.

### Remotion Video

Export to React-based video for:
- YouTube/social publishing
- Embedded animations
- Precise audio sync with timestamps

See `references/remotion-video.md` for project setup.

## Workflow

### Standard Flow

```
essay-to-speech output
        ↓
   [slide-builder]
        ↓
   Talk Track v5 (.md)
        ↓
   ┌─────┼─────┐
   ↓     ↓     ↓
  HTML  Video  Audio
```

### Quick Start

1. **Input:** Provide essay-to-speech output
2. **Review plan:** Claude proposes slide structure
3. **Confirm or adjust:** Modify slide count, sections, visuals
4. **Generate:** Claude outputs Talk Track v5 markdown
5. **Images:** Generate RECREATE images via Nano Banana
6. **Audio:** Generate voice via claude-speak or ElevenLabs
7. **Render:** Export to HTML, video, or both

## Best Practices

### Slide Design Principles

1. **One idea per slide** - Split dense content
2. **6 words or fewer** on screen - The rest is spoken
3. **High contrast** - Readable from back row
4. **Consistent visual language** - Same fonts, colors, style
5. **Images > bullet points** - Visual storytelling wins

### Talk Track Integration

1. **Audio is king** - Slides support speech; they do not replace it
2. **Match pacing** - Visual changes align with spoken transitions
3. **Build reveals** - Don't show everything at once
4. **Breathing room** - Not every sentence needs a slide change

### Image Guidance

For RECREATE images, always specify:
- Aspect ratio (16:9 for slides)
- Style (clean, minimal, professional)
- Key data to visualize
- What to AVOID (clutter, small text, decorative elements)

## What This Skill Does NOT Do

- Edit or create original essay content (that's essay-to-speech)
- Design custom graphics (use Nano Banana for that)
- Record actual audio (use claude-speak or ElevenLabs)
- Render final video (use Remotion or video editor)
- Create PowerPoint/Keynote files directly (exports markdown)

## Integration

**Upstream:**
- `essay-to-speech` - Provides structured input

**Downstream:**
- `nano-banana` - Generates RECREATE images
- `claude-speak` - Generates audio narration
- `veo3-prompter` - Creates video segments (if needed)

## References

- `references/talk-track-v5.md` - Complete format specification
- `references/html-engine.md` - Static HTML slide player
- `references/remotion-video.md` - React video export setup
- `references/voice-options.md` - TTS configuration and comparison
- `references/image-handling.md` - Full image processing workflow
- `references/examples.md` - Complete input→output examples

Overview

This skill transforms essay-to-speech output into presentation-ready slide decks and video-ready assets. It converts talk tracks, semantic tags, and image assessments into a Talk Track v5 markdown package that downstream tools can render as HTML slides or Remotion video. Use it to move from spoken content to polished slides quickly.

How this skill works

The skill parses essay-to-speech output by extracting sections (Original, Talk Track, Images, Slide Ideas) and recognizing semantic tags like [HOOK], [KEY_POINT], [EVIDENCE], and [CTA]. It maps tags to slide types, applies slide count heuristics and timing estimates, and processes image ratings (USE/ADAPT/RECREATE/SKIP) including generating Nano Banana prompts when needed. Final output is a Talk Track v5 markdown file with frontmatter, slide index, individual slides (with audio blocks) and timing guidance for TTS or video exports.

When to use it

  • You have essay-to-speech output and need a slide deck quickly
  • You want to convert a talk track or transcript into presentation slides
  • You need a video-ready presentation with synced narration
  • You want structured markdown (Talk Track v5) for downstream rendering
  • You need image prompts generated for visuals that must be recreated

Best practices

  • Feed complete essay-to-speech output including Images and Slide Ideas sections
  • Keep one idea per slide; use the tag-to-slide mapping to avoid dense slides
  • Aim for 45–60 seconds per slide for pacing and simple timing math
  • Use RECREATE prompts with 16:9, large labels, and brand colors for clarity
  • Review the generated slide plan before exporting to video or HTML

Example use cases

  • Turn a recorded keynote talk track into a speaker-ready slide deck and narration
  • Convert a training module transcript into timed slides for e-learning video
  • Generate presentation markdown and Nano Banana image prompts for designers
  • Produce HTML slides with speaker notes for in-person rehearsals
  • Export synchronized audio and timing JSON for Remotion video rendering

FAQ

What input do I need to run the skill?

Provide the essay-to-speech output that contains ### Original, ### Talk Track, ### Images, and ### Slide Ideas sections.

Can the skill produce final video files?

It outputs Talk Track v5 and assets for Remotion or HTML; rendering final video is done downstream (Remotion or video editor).