
essay-to-speech skill

/EssayToSpeech/essay-to-speech

This skill converts essays into engaging spoken presentations, preserving meaning while producing a clear talk track ready for slides and delivery.

npx playbooks add skill leegonzales/aiskills --skill essay-to-speech

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: essay-to-speech
description: Transform written essays into spoken word presentations while preserving source material. Use when adapting essays for verbal delivery, creating talk tracks, or preparing content for presentation slides.
---

# Essay to Speech

Transform written essays into spoken word presentations. Outputs the original text and the talk track as paired chunks, ready for downstream slide generation.

## When to Use

Invoke when the user:
- Wants to turn an essay into a presentation talk track
- Needs to adapt written content for verbal delivery
- Is preparing a speech from written material
- Uses `/essay-to-speech` command

## Transformation Intensity

Default to **full** transformation unless the essay is already conversational.

| Mode | When to Use | Approach |
|------|-------------|----------|
| **Full** (default) | Academic, formal, or dense prose | Aggressive rewrite for natural speech |
| **Light** | Already conversational, personal voice | Preserve author's voice, minimal changes |

**Auto-detect**: If the essay uses "I", contractions, and short sentences, use light mode. If it uses passive voice, complex clauses, and formal language, use full mode.
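
The auto-detect rule can be approximated in code. The following is a minimal sketch, not the skill's actual logic: the function name `detect_mode`, both thresholds, and the regular expressions are assumptions chosen for illustration, and passive voice is not checked at all.

```python
import re

# Hypothetical thresholds; tune them against real essays.
MAX_AVG_SENTENCE_WORDS = 18
MIN_CONTRACTIONS_PER_100_WORDS = 1.0

FIRST_PERSON = re.compile(r"\bI\b")
# Possessive 's is deliberately excluded to avoid counting "study's".
CONTRACTION = re.compile(r"\b\w+['’](?:t|re|ve|ll|d|m)\b", re.IGNORECASE)
SENTENCE_END = re.compile(r"[.!?]+(?:\s+|$)")


def detect_mode(essay: str) -> str:
    """Return "light" for conversational prose, otherwise "full"."""
    words = essay.split()
    if not words:
        return "full"
    sentences = [s for s in SENTENCE_END.split(essay) if s.strip()]
    avg_len = len(words) / max(len(sentences), 1)
    contractions = len(CONTRACTION.findall(essay)) * 100 / len(words)
    conversational = (
        bool(FIRST_PERSON.search(essay))
        and contractions >= MIN_CONTRACTIONS_PER_100_WORDS
        and avg_len <= MAX_AVG_SENTENCE_WORDS
    )
    return "light" if conversational else "full"
```

All three signals must agree before the sketch picks light mode, which keeps full transformation as the default.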

## Core Process

### 1. Segment the Essay

Break the essay into atomic chunks based on:
- **Existing structure first**: Honor headings, sections, paragraph breaks
- **Argument units**: Each chunk = one coherent point or idea
- **Slide-sized thinking**: Could this chunk support one slide?

Typical segmentation:
- Introduction → Opening hook chunk
- Each major section → 1-3 chunks depending on density
- Conclusion → Landing chunk
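
The "existing structure first" step can be sketched as follows, assuming the essay arrives as Markdown with `#` headings. `Chunk` and `segment_essay` are hypothetical names, and the sketch does not attempt the judgment call of splitting a dense section into 1-3 chunks.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^#{1,6}\s+(.*)$")


@dataclass
class Chunk:
    title: str
    text: str


def segment_essay(markdown: str) -> list[Chunk]:
    """Split an essay on its headings; text before the first heading is the opening."""
    sections = [("Opening", [])]
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((match.group(1).strip(), []))
        else:
            sections[-1][1].append(line)

    chunks = []
    for title, body in sections:
        text = "\n".join(body).strip()
        if text:  # skip headings with no prose under them
            chunks.append(Chunk(title, text))
    return chunks
```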

### 2. Transform Each Chunk

Convert written prose to spoken language:

| Written Pattern | Spoken Pattern |
|----------------|----------------|
| "This essay examines..." | "Today I want to share..." |
| "As previously mentioned..." | "Remember when I said..." |
| "It is important to note that..." | "Here's what matters..." |
| "In conclusion, this paper has demonstrated..." | "So what does this mean for you?" |
| Complex nested clauses | Shorter, punchier sentences |
| Passive voice | Active voice |
| Academic hedging | Confident assertions |
| Dense paragraphs | Breathing room, varied rhythm |

#### Statistical Notation → Plain English

**Critical**: Convert all statistical notation to spoken-friendly language. Nobody says "rho equals negative 0.91" on stage.

| Written (Academic) | Spoken (Natural) |
|-------------------|------------------|
| "ρ = -0.91" | "almost a perfect inverse—when one goes up, the other goes down" |
| "β = 0.27, p < 0.001" | "a significant effect—this isn't random chance" |
| "95% CI: 0.01-1.29" | "we can be confident this relationship is real" |
| "ΔELPD = 50.9 (SE = 10.2)" | "the statistical evidence was overwhelming" |
| "coefficient: 0.65" | "a strong positive relationship" |
| "r² = 0.73" | "this explains most of the variation" |
| "n = 667" | "nearly 700 people" |

Keep the *meaning* of statistics, drop the *notation*. The audience needs to understand the insight, not verify the math.
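
One way to check a finished talk track is to scan it for leftover notation. This is a rough sketch: `find_stat_notation` and its patterns are assumptions that cover only the kinds of notation in the table above, and a real essay may need more.

```python
import re

# Illustrative patterns only; extend them for the notation your essays use.
STAT_PATTERNS = {
    "greek or symbol": re.compile(r"[ρβΔ]|r²"),
    "p-value": re.compile(r"\bp\s*[<=>]\s*0?\.\d+"),
    "confidence interval": re.compile(r"\b\d+%\s*CI\b"),
    "sample size": re.compile(r"\b[nN]\s*=\s*\d+"),
    "standard error": re.compile(r"\bSE\s*=\s*\d"),
}


def find_stat_notation(talk_track: str) -> list[tuple[str, str]]:
    """Return (label, matched text) pairs for notation left in the talk track."""
    hits = []
    for label, pattern in STAT_PATTERNS.items():
        for match in pattern.finditer(talk_track):
            hits.append((label, match.group(0)))
    return hits
```

An empty result does not prove the track is speakable; it only means none of these specific patterns survived.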

#### Rhythm and Breath

Vary sentence length deliberately. Pattern: **Long → Short → Medium**

**Too uniform (written)**:
> The researchers found that Theory of Mind predicted collaboration. The correlation was significant. The effect was strong.

**Varied rhythm (spoken)**:
> The researchers found something surprising about what predicts AI collaboration success. Theory of Mind. Not technical skill—a social skill.

**Breath points**: Insert a natural pause:
- After a key insight (let it land)
- Before a contrast ("But here's the thing...")
- After rhetorical questions (let audience think)
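
Rhythm is ultimately a judgment call, but word counts per sentence make monotony easier to spot. A small sketch under stated assumptions: both function names are hypothetical, sentences are split naively on end punctuation, and the `tolerance` default is arbitrary.

```python
import re


def sentence_word_counts(text: str) -> list[int]:
    """Word count of each sentence, splitting naively after ., ! or ?."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def longest_flat_run(counts: list[int], tolerance: int = 1) -> int:
    """Longest run of consecutive sentences whose lengths differ by at most `tolerance`."""
    if not counts:
        return 0
    best = run = 1
    for previous, current in zip(counts, counts[1:]):
        run = run + 1 if abs(current - previous) <= tolerance else 1
        best = max(best, run)
    return best
```

For the "too uniform" example above, `sentence_word_counts` returns `[9, 4, 4]`, so the last two sentences form a flat run of length 2.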

### 3. Assess Images (if present)

For essays containing images, apply a **critical eye**—not everything works on a slide at 20 feet.

**Detection**: Identify all images referenced in the essay (inline, figures, diagrams)

**Assessment questions** (be skeptical):
- Can the audience read this from the back row?
- Does this make sense without the surrounding text?
- Is this a data dump or a clear visual?
- Would a simpler version communicate better?

**Common issues to flag**:

| Image Type | Common Problem | Likely Rating |
|------------|----------------|---------------|
| Scatter plots | Too many points, tiny labels | ADAPT or RECREATE |
| Tables | Text-heavy, not visual | RECREATE as chart |
| Screenshots | Low resolution, cluttered | ADAPT (crop) or SKIP |
| Flowcharts | Too many boxes, small text | ADAPT (simplify) |
| Bar/pie charts | Usually fine if not too busy | USE or ADAPT |
| Conceptual diagrams | Often good | USE |
| Decorative images | No information value | SKIP |

**Ratings**:
- `USE` - Genuinely presentation-ready (rare for academic figures)
- `ADAPT` - Good concept, needs work (most common)
- `RECREATE` - Valuable data, wrong format (tables, dense plots)
- `SKIP` - Doesn't add value to spoken presentation

**Be honest**: Rating everything "USE" isn't helpful. Most academic figures need adaptation.

### 4. Preserve the Connection

**Critical**: Output BOTH versions for each chunk. The original is never modified.

## Output Format

### Filename Convention

Generate a logical filename from the essay title:
- `{slugified-title}-presentation.md`
- Drop common stop words (the, a, an, in, of, and, or) for concise filenames
- Example: "The Future of Remote Work" → `future-remote-work-presentation.md`
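
The convention above is simple enough to sketch directly. `presentation_filename` is a hypothetical helper; keeping the stop words when nothing else remains is an assumption, and non-ASCII characters are simply dropped.

```python
import re

STOP_WORDS = {"the", "a", "an", "in", "of", "and", "or"}


def presentation_filename(title: str) -> str:
    """Slugify an essay title and append the -presentation.md suffix."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    kept = [word for word in words if word not in STOP_WORDS]
    # Assumption: if the title is all stop words, keep them rather than emit an empty slug.
    slug = "-".join(kept or words)
    return f"{slug}-presentation.md"


print(presentation_filename("The Future of Remote Work"))
# prints: future-remote-work-presentation.md
```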

### Document Structure

```markdown
# [Essay Title]: Presentation Version

**Source**: [Original essay title/description]
**Chunks**: [N sections]
**Generated**: [Date]
**Mode**: [Full/Light transformation]

---

## Section 1: [Descriptive Title]

### Original
[Verbatim essay text for this section - unchanged]

### Talk Track
[HOOK] Let me start with a question: what if everything you thought you knew was wrong?

[KEY_POINT] The real issue isn't what people say—it's what they actually do.

[EVIDENCE] A recent study found a 50% gap between reported intentions and actual behavior.

[TRANSITION] Now that we understand the problem, let's look at what we can do about it.

### Images
- `figure1.png`: USE - Clear diagram, good resolution
- `chart2.png`: ADAPT - Needs larger labels for projection

### Slide Ideas
- [Suggested visual, chart type, or slide concept]
- [Alternative approach or supporting visual]

---

## Section 2: [Descriptive Title]

### Original
[Verbatim essay text for this section]

### Talk Track
[Tagged spoken content...]

### Slide Ideas
- [Visual suggestions for this chunk]

---

[Continue for all sections...]
```
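
If the chunks are held as data, the structure above can be rendered mechanically. The sketch below is one possible shape, not part of the skill: `Section`, `render_presentation`, and the ISO date format are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Section:
    title: str
    original: str            # verbatim essay text, never modified
    talk_track: str          # tagged spoken version
    slide_ideas: list[str]
    # Pre-formatted "- `file`: RATING - note" lines; empty for text-only chunks.
    images: list[str] = field(default_factory=list)


def render_presentation(title, source, mode, sections, generated=None):
    """Build the presentation Markdown from a list of Section objects."""
    generated = generated or date.today()
    count = len(sections)
    label = "section" if count == 1 else "sections"
    parts = [
        f"# {title}: Presentation Version",
        "",
        f"**Source**: {source}",
        f"**Chunks**: {count} {label}",
        f"**Generated**: {generated.isoformat()}",
        f"**Mode**: {mode} transformation",
    ]
    for number, section in enumerate(sections, start=1):
        parts += ["", "---", "", f"## Section {number}: {section.title}", "",
                  "### Original", section.original, "",
                  "### Talk Track", section.talk_track]
        if section.images:  # the Images heading is omitted for text-only chunks
            parts += ["", "### Images", *section.images]
        parts += ["", "### Slide Ideas", *[f"- {idea}" for idea in section.slide_ideas]]
    return "\n".join(parts) + "\n"
```

Keeping `original` as its own field makes it harder to overwrite the source text by accident while editing the talk track.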

### Talk Track Semantic Tags

Use tags to mark **key structural moments**—not every sentence.

| Tag | Purpose | Example |
|-----|---------|---------|
| `[HOOK]` | Opening attention-grabber | `[HOOK] Let me ask you something...` |
| `[KEY_POINT]` | Core argument or insight | `[KEY_POINT] This changes everything about...` |
| `[EVIDENCE]` | Data, examples, proof | `[EVIDENCE] Nearly 700 people were tested...` |
| `[STORY]` | Narrative or anecdote | `[STORY] I met a manager who...` |
| `[TRANSITION]` | Bridge between ideas | `[TRANSITION] So that's the problem. Now let's talk solutions.` |
| `[CALLBACK]` | Reference to earlier point | `[CALLBACK] Remember that study I mentioned?` |
| `[LANDING]` | Section or final conclusion | `[LANDING] And that's why this matters.` |
| `[CTA]` | Call to action | `[CTA] Starting tomorrow, I want you to...` |

**Tagging guidelines**:
- 2-4 tags per section is typical
- Don't tag every paragraph—it becomes noise
- Untagged sentences flow naturally between tagged moments
- Tags help the slide-builder know where visual anchors belong
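
A quick count per section can catch over-tagging. This sketch assumes the tag set from the table above and treats the 2-4 range as a soft default rather than a rule; `count_tags` and `tag_density_warning` are hypothetical names.

```python
import re
from collections import Counter

SEMANTIC_TAGS = ("HOOK", "KEY_POINT", "EVIDENCE", "STORY",
                 "TRANSITION", "CALLBACK", "LANDING", "CTA")
TAG_PATTERN = re.compile(r"\[(" + "|".join(SEMANTIC_TAGS) + r")\]")


def count_tags(talk_track: str) -> Counter:
    """Count semantic tags; delivery cues such as [PAUSE] are ignored."""
    return Counter(TAG_PATTERN.findall(talk_track))


def tag_density_warning(talk_track: str, low: int = 2, high: int = 4):
    """Return a warning string when the tag count is outside low..high, else None."""
    total = sum(count_tags(talk_track).values())
    if total < low:
        return f"only {total} tag(s); consider marking the key moments"
    if total > high:
        return f"{total} tags; this may read as noise"
    return None
```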

### Images Section (when applicable)

If the original essay contains images, each chunk includes a `### Images` section:

```markdown
### Images
- `fig1-market-share.png`: USE - Clean pie chart, large labels, works as-is
- `fig2-scatter.png`: ADAPT - Good data but axis labels too small, needs enlargement
- `fig3-table.png`: RECREATE - Important data but tables don't work on slides, convert to horizontal bar chart
- `header-decorative.jpg`: SKIP - Decorative only, no value for presentation
```

**No images?** Omit the `### Images` section entirely for text-only chunks.
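
Because the assessment lines follow a fixed shape, a downstream tool could parse them. The following is a sketch that assumes the exact "- `file`: RATING - note" layout shown above; `ImageAssessment` and `parse_image_line` are hypothetical names.

```python
import re
from typing import NamedTuple

RATINGS = {"USE", "ADAPT", "RECREATE", "SKIP"}
IMAGE_LINE = re.compile(r"^- `(?P<file>[^`]+)`: (?P<rating>[A-Z]+) - (?P<note>.+)$")


class ImageAssessment(NamedTuple):
    file: str
    rating: str
    note: str


def parse_image_line(line: str) -> ImageAssessment:
    """Parse one bullet from an Images section, rejecting unknown ratings."""
    match = IMAGE_LINE.match(line.strip())
    if not match:
        raise ValueError(f"not an image assessment line: {line!r}")
    if match["rating"] not in RATINGS:
        raise ValueError(f"unknown rating: {match['rating']}")
    return ImageAssessment(match["file"], match["rating"], match["note"])
```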

### Slide Ideas Section

Each chunk includes a `### Slide Ideas` section with 1-3 suggestions:
- Chart/graph types with specific data to visualize
- Key quote or stat callouts
- Comparison frameworks (before/after, A vs B)
- Visual metaphors or imagery concepts
- "No slide needed" if the chunk is purely transitional

When original images exist, slide ideas should reference them:
- "Use `fig1-market-share.png` as primary visual"
- "Recreate `fig3-table.png` as horizontal bar chart showing top 5 only"

These are suggestions, not requirements—the slide-builder skill makes final decisions.

## Delivery Markup (Optional)

When the user requests annotated output, add delivery cues:
- `[PAUSE]` - Breath/emphasis pause (after key insights, before contrasts)
- `*word*` - Vocal emphasis
- `[SLOW]` / `[FASTER]` - Pacing shifts
- `[LOOK UP]` - Eye contact moment
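
Since the markup is optional, it can help to derive a clean copy from an annotated track. A minimal sketch, assuming the four cues listed above; `strip_delivery_markup` is a hypothetical helper, and it leaves semantic tags and `**bold**` text alone.

```python
import re

CUE = re.compile(r"\s*\[(?:PAUSE|SLOW|FASTER|LOOK UP)\]")
# Single-asterisk emphasis only; the lookarounds skip Markdown **bold**.
EMPHASIS = re.compile(r"(?<!\*)\*(\w[^*\n]*?)\*(?!\*)")


def strip_delivery_markup(annotated: str) -> str:
    """Remove delivery cues and emphasis markers, keeping the spoken words."""
    text = CUE.sub("", annotated)
    text = EMPHASIS.sub(r"\1", text)
    return text.strip()
```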

## What This Skill Does NOT Do

- Design slides or visuals (that's the downstream slide-builder skill)
- Edit or modify images (assesses only, slide-builder handles modifications)
- Fundamentally rewrite arguments or add new content
- Coach on public speaking technique
- Change the essay's core message or stance

## Best Practices

1. **Preserve meaning**: The talk track conveys the same arguments, just spoken
2. **Respect structure**: Don't arbitrarily merge or split the author's sections
3. **Natural chunking**: Each section should feel like a complete thought
4. **Opening matters**: Transform bland thesis statements into hooks
5. **Landing matters**: End with impact, not "in conclusion"
6. **Kill the jargon**: Statistical notation, acronyms, and academic language don't belong in speech
7. **Vary the rhythm**: Mix sentence lengths—monotony kills engagement
8. **Be a critical friend on images**: Most academic figures need work; say so

## Example Transformation

**Original (written)**:
> The study found ToM strongly predicted collaborative ability with AI (coefficient: 0.65, 95% CI: 0.01-1.29). Translation: higher Theory of Mind scores meant better AI collaboration, and we can be 95% confident this relationship is real—the confidence interval stays positive.

**Talk Track (full transformation)**:
> [EVIDENCE] The study found a strong link between Theory of Mind and AI collaboration success. And this isn't a maybe—the statistics are clear. Higher Theory of Mind means better results with AI.
>
> [KEY_POINT] We can be confident this relationship is real.

**What changed**:
- "coefficient: 0.65, 95% CI: 0.01-1.29" → "a strong link" + "the statistics are clear"
- "confidence interval stays positive" → "we can be confident this relationship is real"
- Long sentences broken into shorter ones across two tagged beats
- Same meaning, speakable language

### Images
- `correlation-scatter.png`: ADAPT - Data points clear but axis labels "ToM Score" and "κ (Collaborative Ability)" need plain English labels

### Slide Ideas
- Single insight slide: "Theory of Mind → Better AI Collaboration"
- Visual: simplified scatter with trend line, labeled "Social Skill" → "AI Success"

## References

- `references/transformation-patterns.md` - Detailed written→spoken patterns
- `references/examples.md` - Full before/after essay transformations

Overview

This skill transforms written essays into spoken-word presentations while preserving the original source text. It produces paired chunks: the verbatim original and a conversational, slide-ready talk track. Outputs are organized for downstream slide generation, including image assessments and slide ideas.

How this skill works

The skill segments the essay into atomic, slide-sized chunks honoring headings and argument units. For each chunk it leaves the original text unchanged and generates a spoken-style talk track with semantic tags (HOOK, KEY_POINT, EVIDENCE, TRANSITION, LANDING, etc.). It also detects and rates images (USE, ADAPT, RECREATE, SKIP), suggests slide concepts, and adds delivery cues on request.

When to use it

  • Turning an essay into a presentation talk track
  • Adapting formal or academic prose for verbal delivery
  • Preparing a speech from written material for slides
  • Creating slide-friendly segments with image assessments
  • Generating paired source + spoken outputs for editors or presenters

Best practices

  • Default to full transformation for dense or formal prose; use light mode only if voice is already conversational
  • Preserve the original text verbatim alongside the talk track — never modify the source
  • Chunk by coherent idea and slide-sized thinking; keep each chunk a complete thought
  • Convert statistical notation to plain English while preserving meaning
  • Use 2–4 semantic tags per section and vary sentence rhythm for breath and emphasis
  • Be honest about image usability—most academic figures need adaptation

Example use cases

  • Convert a research paper into a 10-slide talk with speaker notes and slide suggestions
  • Prepare a conference talk from a thesis chapter, keeping original text for citations
  • Create a narrated slide script for executive briefings using clear, punchy language
  • Assess embedded figures in an essay and flag which to recreate or skip
  • Generate filenames and structured markdown ready for a slide-builder pipeline

FAQ

Does the skill change the essay’s arguments?

No. The talk track rewrites for speech but preserves the original meaning and argument; the original text is always included verbatim.

How are statistics handled?

Statistical notation is translated into plain spoken language that conveys the insight without the math (e.g., "nearly 700 people" instead of "n = 667").