home / skills / leegonzales / aiskills / veo3-prompter

veo3-prompter skill

This skill crafts professional Veo 3.1 prompts with cinematic structure, audio direction, and timestamp choreography to produce polished AI videos.

npx playbooks add skill leegonzales/aiskills --skill veo3-prompter

Review the files below or copy the command above to add this skill to your agents.

Files (8)

SKILL.md

8.0 KB

---
name: veo3-prompter
description: Craft professional video prompts for Google Veo 3.1 using cinematic techniques, audio direction, and timestamp choreography. Use when generating AI videos, creating video prompts, or working with Veo 3.
---

# Veo 3.1 Video Prompter

Transform ideas into professional Veo 3.1 prompts using cinematic structure, audio direction, and multi-shot choreography.

## When to Use

Invoke when user:
- Says "create a video prompt" or "generate a Veo prompt"
- Wants to "make a video of..." or "animate this..."
- Asks for help with "video generation" or "AI video"
- Needs "Veo 3" or "Veo 3.1" prompt assistance
- Wants to create "multi-shot" or "cinematic" video sequences

## Core Prompt Formula

**[Cinematography] + [Subject] + [Action] + [Context] + [Style & Audio]**

Every prompt should address these five elements for maximum control.

## Prompt Density: Finding the Sweet Spot

Prompts fail in two directions:
- **Too sparse:** Model fills gaps unpredictably, you lose creative control
- **Too dense:** Model can't execute all instructions, produces confused output

### The Priority Framework

**Tier 1 - MUST INCLUDE (model needs these):**
- Shot size (wide/medium/close-up)
- Subject identity (who/what is in frame)
- Primary action (what happens)
- One dominant mood/style word

**Tier 2 - SHOULD INCLUDE (significant impact):**
- Camera movement OR angle (pick one, not both)
- Lighting quality (natural/dramatic/soft)
- One audio layer (dialogue OR SFX OR ambient)
- Setting/environment

**Tier 3 - NICE TO HAVE (diminishing returns):**
- Secondary audio layers
- Specific lens type
- Color palette details
- Film stock/grain texture
- Background action

**Rule of thumb:** Include all Tier 1, most of Tier 2, and 1-2 from Tier 3.

### Density Comparison

**TOO SPARSE (model guesses too much):**
> "A professor talking about philosophy"

**TOO DENSE (model overloaded):**
> "Medium close-up shot at eye level with a 50mm lens at f/1.8 creating shallow depth of field with bokeh highlights, of a 52-year-old female professor with silver-streaked auburn hair pulled back in a loose bun, wearing an olive tweed jacket with leather elbow patches over a cream silk blouse with a small pearl brooch, standing in a contemporary lecture hall with tiered mahogany seating and brass fixtures visible in the soft background, natural diffused daylight streaming through floor-to-ceiling windows on the left side creating soft rembrandt lighting on her face with a gentle fill from reflected light on the right..."

**OPTIMAL (directed but breathable):**
> "Medium close-up of a professor in her 50s, tweed jacket, standing in a university lecture hall. She gestures while speaking: 'Kant asked one question: could everyone do this?' Warm natural window light from left, soft academic atmosphere. SFX: marker on whiteboard."

### Calibration Signals

**Signs your prompt is too sparse:**
- Results vary wildly between generations
- Key elements missing or wrong
- Mood/tone inconsistent with intent

**Signs your prompt is too dense:**
- Model ignores some instructions entirely
- Unnatural or frozen-looking motion
- Conflicting elements appear (e.g., both day and night)
- Audio doesn't match visual action

### Iteration Strategy

1. **Start with Tier 1 only** - generate test
2. **Add Tier 2 elements** that matter most to your vision
3. **Add ONE Tier 3 detail** if something specific is missing
4. **Remove any element the model consistently ignores**

See `references/prompt-calibration.md` for detailed examples and troubleshooting.

## Cinematography Elements

### Shot Composition
- Wide shot, medium shot, close-up, extreme close-up
- Single shot, two shot, over-the-shoulder shot
- High angle, low angle, eye level, worm's eye, bird's eye

### Camera Movement
- Dolly (in/out), tracking shot, crane shot
- Pan (left/right), tilt (up/down), zoom
- Steadicam, handheld, aerial, POV

### Lens & Focus
- Shallow depth of field, deep focus
- Wide-angle lens, telephoto, macro lens
- Soft focus, rack focus, bokeh

## Audio Direction

Veo 3.1 generates synchronized sound. Direct it explicitly:

**Dialogue** (use quotes):
> "A man says, 'The storm is coming.'"

**Sound Effects** (label with SFX):
> "SFX: Thunder rumbles in the distance, rain patters on glass"

**Ambient Noise**:
> "Ambient noise: busy café chatter, clinking cups, soft jazz"

**Music**:
> "A swelling orchestral score begins to play"

## Timestamp Prompting

For multi-shot sequences within one generation (max 8 seconds):

```
[00:00-00:02] Medium shot of a detective at his desk, lighting a cigarette.
SFX: Match strike, paper rustling.

[00:02-00:04] Close-up of his eyes narrowing as he reads a letter.
Ambient: Rain against the window.

[00:04-00:06] Reverse shot of a shadowy figure in the doorway.
A woman's voice: "You shouldn't have looked."

[00:06-00:08] Wide shot as the detective stands, reaching for his gun.
SFX: Chair scraping, thunder crack.
```

## Style Keywords

**Visual Aesthetic:**
- Photorealistic, cinematic, documentary, animation
- Retro (sepia, grainy film, 1980s vaporwave)
- Noir, epic fantasy, sci-fi, romantic, horror

**Mood & Lighting:**
- Warm golden hour, cool blue tones, moody shadows
- Harsh fluorescent, soft morning light, dramatic chiaroscuro
- Neon-lit, candlelit, overcast diffused

**Film Grain Tip:**
> Add "slightly grainy, film-like" to avoid overly clean AI look

## Output Formats

**Quick Prompt:** Single sentence for simple shots
**Structured Prompt:** Multi-line with all five elements
**Timestamp Sequence:** Choreographed multi-shot within 8s
**Storyboard Mode:** Multiple prompts for full narrative

## Example Prompts

**Action Shot:**
> "Tracking shot following a parkour athlete sprinting across rooftops at sunset, warm orange light, urban cityscape background, cinematic, shallow depth of field. SFX: footsteps on concrete, wind rushing past."

**Dialogue Scene:**
> "Medium two-shot in a dimly lit bar, a woman in red leans toward a man in a suit. She says quietly, 'I know what you did.' Ambient: jazz music, glasses clinking. Moody noir aesthetic, warm tungsten lighting."

**Nature Documentary:**
> "Slow-motion close-up of a hummingbird drinking from a flower, macro lens with shallow focus, lush green garden background, soft morning light. SFX: gentle buzzing, birdsong."

## Technical Specs

- **Duration:** 4, 6, or 8 seconds
- **Resolution:** 720p or 1080p
- **Aspect Ratio:** 16:9 (landscape) or 9:16 (portrait)
- **Frame Rate:** Configurable (default: 24 FPS)

## Advanced API Options

When using Veo through API (not Flow), these additional parameters are available:

| Parameter | Description | Default |
|-----------|-------------|---------|
| `negativePrompt` | Elements to exclude from the video | - |
| `seed` | RNG seed for reproducible results (same prompt + seed = same video) | Random |
| `enhancePrompt` | Let the model rewrite your prompt for better results | false |
| `generateAudio` | Generate synchronized audio | true |
| `personGeneration` | Control person generation: `dont_allow` or `allow_adult` | - |
| `referenceImages` | Up to 3 asset images OR 1 style image for consistency | - |

### Negative Prompts

Explicitly exclude unwanted elements:
> "A forest at sunset" + negativePrompt: "people, animals, buildings"

### Seed for Consistency

Use the same seed to reproduce similar results:
> First generation: seed=12345 → video A
> Same prompt + seed=12345 → nearly identical video

Useful for:
- Iterating on a specific "look"
- Creating variations with controlled changes
- A/B testing different prompts

### Reference Images

Maintain visual consistency across shots using reference images:

**Asset References (up to 3):**
- Character appearances
- Locations/settings
- Props or products

**Style References (1):**
- Overall aesthetic
- Color palette
- Visual treatment

## References

- `references/prompt-calibration.md` - Finding the right detail level
- `references/cinematography-glossary.md` - Full camera terms
- `references/prompt-examples.md` - 20+ categorized examples
- `references/advanced-workflows.md` - Image-to-video, first/last frame

Overview

This skill crafts professional prompts for Google Veo 3.1 that combine cinematic structure, audio direction, and timestamp choreography. It translates a creative idea into a compact, high-control prompt format that Veo can execute reliably. The output emphasizes shot composition, prioritized detail, and synchronized sound for short AI-generated video clips.

How this skill works

I apply a five-part formula—Cinematography + Subject + Action + Context + Style & Audio—to produce prompts that balance guidance and freedom. Prompts are calibrated using a three-tier priority system so essential elements are always present while avoiding overload. For multi-shot sequences I produce timestamped choreography (up to 8 seconds) with explicit SFX, dialogue, and ambient layers. I also format quick one-liners, structured multi-line prompts, and storyboard-style sets for iterative work.

When to use it

Generating AI videos with Veo 3 or Veo 3.1
Creating multi-shot, cinematic or documentary-style short clips
Preparing timestamped choreography for synchronized sound and action
Prototyping scenes that need repeatable output with a fixed seed
Translating a script or concept into production-ready prompt instructions

Best practices

Always include Tier 1 elements: shot size, subject identity, primary action, and a dominant mood/style word
Add one camera movement OR one camera angle, plus one lighting note from Tier 2
Start with Tier 1 only for tests, then add Tier 2 items that matter most
Limit Tier 3 details to 1–2 items to avoid over-constraining the model
Use explicit audio labels: Dialogue in quotes, SFX:, Ambient:, Music:
When iterating, keep seed constant to reproduce results and change one variable at a time

Example use cases

Quick Prompt: one-line cinematic shot for a marketing teaser
Structured Prompt: multi-line scene with specified camera move and single SFX
Timestamp Sequence: 4–8s choreographed cut with per-shot audio cues
Storyboard Mode: multiple related prompts for sequential scene planning
Nature doc-style close-up with macro lens, ambient birdsong, and slow motion

FAQ

How long should a Veo prompt be?

Aim for concise prompts that include all Tier 1 elements and 1–2 Tier 2 items; single-line quick prompts or structured 3–6 line prompts work best.

What if the model ignores instructions?

Remove the least important detail, rerun with Tier 1+critical Tier 2, or use a seed and iterate changing one element at a time.

Can I generate longer scenes?

Veo 3.1 supports short outputs (4, 6, or 8 seconds); for longer narratives use storyboard mode or stitch multiple generations together.