home / skills / cdeistopened / opened-vault / video-generator
/.claude/skills/video-generator
This skill helps you generate short-form AI videos with synchronized audio using Veo 3.1, enabling fast social clips and promos.
npx playbooks add skill cdeistopened/opened-vault --skill video-generatorReview the files below or copy the command above to add this skill to your agents.
---
name: video-generator
description: Generate AI videos using Google VEO 3.1, OpenAI Sora, or Kling (via Fal.ai). Use this skill when content needs short-form video — social clips, promotional teasers, ambient backgrounds, or creative visual storytelling. Three providers for different strengths - VEO for native audio, Sora for visual quality, Kling for image-to-video animation and longer clips up to 15s.
---
# Video Generator
Generate professional short-form videos using Google VEO 3.1 (native audio), OpenAI Sora (visual quality, up to 12s), or Kling v3 Pro (image-to-video, up to 15s, native audio).
## Prerequisites & Setup
### API Keys
**VEO (Google):** Uses `GEMINI_API_KEY` (already in `~/.zshrc`).
**Sora (OpenAI):** Uses `OPENAI_API_KEY` (already in `OpenEd Vault/.env`).
**Kling (Fal.ai):** Uses `FAL_KEY` (in `~/.zshrc` and `OpenEd Vault/.env`).
```bash
export GEMINI_API_KEY=your_gemini_key_here
export OPENAI_API_KEY=your_openai_key_here
export FAL_KEY=your_fal_key_here
```
### Install Dependencies
```bash
pip install google-genai requests
```
### Available Models
| Provider | Model | CLI `--model` | Best For |
|----------|-------|---------------|----------|
| **VEO** | Veo 3.1 Standard | `standard` (default) | Quality, audio fidelity, final assets |
| **VEO** | Veo 3.1 Fast | `fast` | Drafts, iteration, quick previews |
| **Sora** | Sora 2 | `sora-2` (default) | Visual quality, creative motion |
| **Sora** | Sora 2 Pro | `sora-2-pro` | Highest Sora quality, slower |
| **Kling** | v3 Pro | `kling-v3-pro` (default) | Image-to-video, native audio, up to 15s |
| **Kling** | v3 Standard | `kling-v3-std` | Budget-friendly Kling |
| **Kling** | v2 Master | `kling-v2` | Stable, proven model |
| **Kling** | v1.5 Pro | `kling-v1.5` | Legacy, cheapest |
### When to Use Which
| Need | Use |
|------|-----|
| Native synchronized audio (dialogue, SFX) | **VEO** or **Kling v3** |
| Longest clips (up to 15 seconds) | **Kling v3** (VEO: 8s, Sora: 12s) |
| Image-to-video (animate a still) | **Kling** (best) or VEO |
| Higher visual fidelity / artistic styles | **Sora** - stronger on visual aesthetics |
| Fast iteration / drafts | **VEO Fast** - quickest turnaround |
| 4K resolution | **VEO** - Sora/Kling use fixed sizes |
| Negative prompts (exclude elements) | **VEO** or **Kling** - Sora doesn't support them |
| Square format (1:1) | **Kling** only |
### Video Parameters
| Parameter | VEO Options | Sora Options | Kling Options | Default |
|-----------|-------------|--------------|---------------|---------|
| **Duration** | 4, 6, 8s | 4, 8, 12s | v3: 3-15s, v2: 5/10s | 8s (VEO/Sora), 5s (Kling) |
| **Resolution** | 720p, 1080p, 4K | Fixed | Fixed | 720p |
| **Aspect Ratio** | 16:9, 9:16 | 16:9, 9:16 | 16:9, 9:16, 1:1 | 16:9 |
| **Count** | 1-4 | 1-4 | 1-4 | 1 |
| **Negative Prompt** | Yes | No | Yes | none |
| **Image Input** | Yes | No | Yes (best) | none |
| **Audio Generation** | Native | No | v3 only (`--audio`) | off |
### Kling Pricing (Fal.ai on-demand)
| Model | Rate |
|-------|------|
| v3 Pro (no audio) | ~$0.11/second ($0.56 for 5s) |
| v3 Pro (with audio) | ~$0.17/second ($0.84 for 5s) |
| v2 Master | ~$1.40 per 5s |
### Latency
Video generation is async — expect 11 seconds to 6 minutes depending on server load and provider. The script polls automatically and saves when ready.
---
## Workflow Overview
1. **Define the Concept** — What story does the video tell in 4-8 seconds?
2. **Storyboard the Shot** — Camera, motion, subject, environment
3. **Add Audio Direction** — Dialogue, sound effects, ambient sound
4. **Generate Video** — Run via API
5. **Iterate** — Adjust prompt based on results
## Prompt Rules (From Research)
Before diving into the workflow, internalize these rules from extensive testing across Veo, Runway, and Sora:
1. **150-300 characters is the sweet spot.** Under 100 = generic. Over 400 = the model drops elements unpredictably.
2. **One shot = one action.** Don't pack multiple scene changes or style shifts into one prompt. One camera move + one subject action.
3. **Describe what you want, not what you don't want.** Use the `--negative` flag for exclusions, not the main prompt.
4. **Treat audio as a separate layer.** Write audio cues in their own sentences, not mixed into visual descriptions.
5. **Use colon syntax for dialogue.** `A man says: "Hello!"` prevents subtitle artifacts. Without the colon, text may appear on screen.
6. **Keep dialogue under 7 words per line.** Longer speech causes lip-sync drift or rushed garbling.
7. **Start simple, then layer.** Begin with a basic prompt, evaluate, then add one variable at a time.
8. **Slow camera movements win.** Fast pans and spins break output. Use tight framing for perceived speed.
See `references/prompt-engineering-research.md` for the complete research.
---
## Step 1: Define the Concept
A good video prompt answers three questions:
- **What's happening?** (action/motion)
- **Where?** (environment/setting)
- **What does it sound like?** (audio landscape)
Unlike images, video is *temporal*. Think in terms of movement and change, not a static composition.
**Good concepts for 8-second clips:**
| Use Case | Example Concept |
|----------|----------------|
| Social teaser | A hand flipping through pages of a book, stopping on a highlighted passage |
| Ambient background | Rain falling on a window with city lights blurring behind it |
| Product reveal | Camera slowly orbits a product on a table, warm studio lighting |
| Podcast promo | A microphone in a cozy studio, coffee steam rising, morning light |
| Newsletter visual | A typewriter striking keys, with the sound of each keystroke |
## Step 2: Storyboard the Shot
Structure your prompt with cinematic language. Veo responds well to film terminology:
### Camera Language
| Term | Effect |
|------|--------|
| **Wide shot** | Shows full environment, establishes context |
| **Close-up** | Tight on a subject, emphasizes detail |
| **Tracking shot** | Camera follows subject movement |
| **Dolly in/out** | Camera moves toward or away from subject |
| **Static shot** | Locked camera, subject moves within frame |
| **Slow pan** | Camera rotates horizontally across scene |
| **Overhead / bird's eye** | Looking straight down |
| **Low angle** | Looking up at subject, adds drama |
### Motion Description
Be explicit about what moves and how:
| Don't | Do |
|-------|-----|
| "A dog in a park" | "A golden retriever runs toward camera through tall grass, ears bouncing" |
| "City at night" | "Camera slowly dollies through a neon-lit Tokyo alley as rain puddles reflect signs" |
| "Ocean" | "A single wave forms, curls, and crashes onto wet sand in slow motion" |
### Lighting & Atmosphere
| Term | Mood |
|------|------|
| Golden hour | Warm, nostalgic, cinematic |
| Overcast | Soft, even, contemplative |
| Neon / artificial | Urban, energetic, modern |
| Candlelight | Intimate, quiet |
| Hard shadows | Dramatic, high contrast |
## Step 3: Add Audio Direction
Veo 3.1 generates synchronized audio natively. This is a major differentiator — use it.
### Three Types of Audio Cues
**1. Dialogue** — Use colon syntax before quotes (prevents subtitle artifacts):
```
A barista says: "Here you go!" as she slides a latte across the counter.
```
Keep lines under 7 words for clean lip-sync. One sentence max per 8-second clip.
**2. Sound Effects** — Describe specific sounds:
```
The sound of a match striking, then a candle flame flickering to life.
```
**3. Ambient Sound** — Set the sonic environment:
```
Birds chirping in the background, distant traffic hum, morning atmosphere.
```
### Audio Tips
- Be specific: "the crunch of gravel underfoot" beats "footstep sounds"
- Layer audio: combine ambient + specific sounds for depth
- Match audio to motion: "a door creaks open" timed with the visual action
- Dialogue should be short — 1-2 sentences max for 8 seconds
## Step 4: Craft the Full Prompt
### Prompt Structure (5-Element Priority)
Structure prompts in this order of priority — you don't need all five every time:
1. **Shot Specification** — camera work, framing, movement
2. **Setting & Atmosphere** — location, time, weather, lighting
3. **Subject & Action** — who/what, described in beats
4. **Audio Layer** — dialogue, SFX, ambient (separate sentences)
5. **Style/Grade** — artistic treatment, lens, color
```
[Shot type + camera movement]. [Setting and lighting]. [Subject doing action].
[Audio: what you hear]. [Style/grade].
```
### Example Prompts
**Podcast promo (16:9):**
```
A close-up tracking shot of a vintage microphone in a warmly lit podcast studio.
Steam rises slowly from a coffee mug beside it. Morning sunlight filters through
blinds, casting soft stripes across the desk. The sound of a quiet room — a clock
ticking, the faint hum of equipment. Cinematic, intimate, inviting.
```
**Social teaser — vertical (9:16):**
```
A hand reaches into frame and opens a leather-bound journal on a wooden desk.
The pages flutter briefly before settling on a page covered in handwritten notes.
A pen is set down beside the book. The sound of pages rustling, a pen clicking,
and soft ambient music. Warm overhead lighting, shallow depth of field.
```
**Newsletter header — ambient loop (16:9):**
```
A static wide shot of rain falling on a large window. Behind the glass, a blurred
cityscape with warm lights. Water droplets slide slowly down the pane. The sound of
steady rain and distant muffled city noise. Moody, contemplative, cozy.
```
**Product reveal (16:9):**
```
Camera slowly orbits a pair of wireless headphones placed on a dark marble surface.
Dramatic studio lighting with a single warm key light from the left. The headphones
cast a sharp shadow. Subtle electronic ambient music. Premium, minimal, modern.
```
## Step 5: Generate via API
### Running the Script
```bash
# VEO: Basic generation (8s, 720p, 16:9)
python scripts/generate_video.py "Your prompt here"
# VEO: Fast draft for iteration
python scripts/generate_video.py "Your prompt here" --model fast
# VEO: High quality vertical video for social
python scripts/generate_video.py "Your prompt" --aspect 9:16 --resolution 1080p
# VEO: Multiple variations to choose from
python scripts/generate_video.py "Your prompt" --count 2 --output ./videos
# VEO: Short clip with specific settings
python scripts/generate_video.py "Your prompt" --duration 4 --resolution 4k --name "hero-clip"
# VEO: Exclude unwanted elements
python scripts/generate_video.py "Your prompt" --negative "text overlays, watermarks, blurry"
# Sora: Basic generation
python scripts/generate_video.py "A cat on a windowsill, warm light" --provider sora
# Sora: 12-second clip (longer than VEO allows)
python scripts/generate_video.py "A dog running through a meadow" --provider sora --duration 12
# Sora: Pro model, vertical
python scripts/generate_video.py "Latte art being poured" --provider sora --model sora-2-pro --aspect 9:16
# Sora: Multiple variations
python scripts/generate_video.py "Ocean waves at sunset" --provider sora --count 2 --output ./videos
# Kling: Text-to-video (v3 Pro, 5s default)
python scripts/generate_video.py "A golden retriever running through a field at sunset" --provider kling
# Kling: Longer clip with audio
python scripts/generate_video.py "Ocean waves crashing on rocks" --provider kling --duration 10 --audio
# Kling: Image-to-video (animate a still image)
python scripts/generate_video.py "Character slowly turns to camera and smiles" --provider kling --input photo.jpg
# Kling: Image-to-video from URL
python scripts/generate_video.py "Zoom slowly into the scene" --provider kling --input https://example.com/image.jpg
# Kling: Budget model, square format
python scripts/generate_video.py "Abstract patterns flowing" --provider kling --model kling-v3-std --aspect 1:1
# Kling: Vertical for Reels/TikTok
python scripts/generate_video.py "Coffee being poured" --provider kling --aspect 9:16 --duration 5
```
**Options:**
| Flag | Values | Default | Notes |
|------|--------|---------|-------|
| `--provider` | `veo`, `sora`, `kling` | `veo` | VEO for audio, Sora for visuals, Kling for image-to-video |
| `--model` | VEO: `standard`/`fast`, Sora: `sora-2`/`sora-2-pro`, Kling: `kling-v3-pro`/`kling-v3-std`/`kling-v2`/`kling-v1.5` | provider default | Provider-specific models |
| `--aspect` | `16:9`, `9:16`, `1:1` | `16:9` | 1:1 Kling only |
| `--resolution` | `720p`, `1080p`, `4k` | `720p` | VEO only |
| `--duration` | VEO: 4/6/8, Sora: 4/8/12, Kling v3: 3-15, Kling v2: 5/10 | 8 (VEO/Sora), 5 (Kling) | |
| `--negative` | text | none | VEO + Kling (ignored by Sora) |
| `--input` | path or URL | none | Reference image for image-to-video (VEO, Kling) |
| `--audio` | flag | off | Enable native audio (Kling v3 only) |
| `--count` | `1`-`4` | `1` | Generate variations |
| `--output` | path | `.` | Save directory |
| `--name` | text | none | Filename prefix |
**Output:** MP4 files with timestamp-based filenames.
## Step 6: Iterate
After reviewing generated video:
- **Motion wrong?** Be more explicit about direction, speed, and sequence
- **Audio off?** Add or refine audio cues — the model needs clear direction
- **Too much happening?** Simplify. One clear action per clip works best
- **Style drift?** Add a negative prompt to exclude unwanted aesthetics
- **Wrong mood?** Adjust lighting and atmosphere descriptors
### Iteration Strategy
1. Start with `--model fast` and `--duration 4` for quick drafts
2. Refine the prompt through 2-3 fast iterations
3. Switch to `--model standard` with full duration/resolution for the final take
4. Generate 2 variations of the final prompt and pick the best
---
## Negative Prompt Guide
Use `--negative` to steer away from common problems:
| Problem | Negative Prompt |
|---------|-----------------|
| Text/watermarks appearing | "text, watermarks, logos, subtitles" |
| Uncanny faces | "distorted faces, morphing features" |
| Jittery motion | "jerky motion, flickering, stuttering" |
| Over-saturated look | "oversaturated, HDR, neon colors" |
| Stock footage feel | "generic, corporate, stock footage aesthetic" |
---
## Prompting Principles
### Think in Shots, Not Scenes
8 seconds is one shot. Don't try to cram a narrative arc — describe a single continuous moment.
| Don't | Do |
|-------|-----|
| "A chef makes a meal from scratch and serves it" | "A chef's hands julienne carrots on a wooden cutting board, knife moving rhythmically" |
| "A day at the beach from sunrise to sunset" | "Waves gently lap at bare feet on sand, golden hour light, camera at ground level" |
### Be Specific About Motion
Vague motion descriptions produce vague results. Describe *what moves*, *how fast*, and *in which direction*.
### Layer Your Audio
Don't just describe one sound — create a soundscape:
```
The crackling of a vinyl record playing soft jazz,
a distant car horn outside the window,
the quiet clink of an ice cube in a glass.
```
### Use Negative Prompts Proactively
Always include `--negative "text, watermarks"` at minimum. The model occasionally generates unwanted text overlays.
---
## Use Cases by Content Type
### Social Media (9:16, 4-8s)
Short, punchy, loop-friendly. Favor close-ups and strong motion.
```bash
python scripts/generate_video.py "Close-up of coffee being poured into a ceramic mug, steam rising, warm morning light. The sound of liquid pouring and a soft sigh." \
--aspect 9:16 --duration 4 --resolution 1080p
```
### Podcast/Newsletter Headers (16:9, 8s)
Ambient, atmospheric. Favor wide shots and subtle motion.
```bash
python scripts/generate_video.py "A vintage radio on a wooden shelf, dial slowly turning. Warm tungsten light. Soft static transitioning into faint music." \
--resolution 1080p --name "podcast-header"
```
### Product/Brand (16:9, 6-8s)
Clean, controlled, premium feel. Studio lighting, slow orbits.
```bash
python scripts/generate_video.py "Camera slowly orbits a leather notebook on a dark wood desk. Single warm key light. The sound of pages turning gently." \
--resolution 4k --duration 6 --negative "text, watermarks, busy background"
```
---
## Multi-Clip Consistency
When generating multiple clips for a project (e.g. a social series, product launch, or multi-shot sequence):
### Lock Your Constants
Create a consistency block and repeat it verbatim across all prompts:
```
CHARACTER: A woman in her thirties with short silver hair and a black turtleneck
PALETTE: amber, cream, walnut brown, deep olive
LIGHTING: Soft key light from camera right, warm tungsten
STYLE: Cinematic, shallow depth of field, warm film grain
NEGATIVE: no subtitles, no on-screen text, no watermarks
```
### Frame Chaining
For sequential shots, use the last frame of clip N as the reference image for clip N+1. This preserves subject orientation, lighting continuity, and motion vectors.
### Consistency Checklist
- [ ] Same character description, word for word — never paraphrase between shots
- [ ] Same palette anchors (3-5 named colors)
- [ ] Same lighting direction and quality
- [ ] Same aspect ratio and resolution
- [ ] Same style/grade language
- [ ] "No subtitles, no on-screen text" included
- [ ] Simple wardrobe — solid colors and notable anchors (red jacket, silver pendant) are more consistent than busy patterns
---
## Related Skills
- **nano-banana-image-generator** — Static AI images (Gemini). Generate stills to animate with Kling's image-to-video.
- **youtube-title-creator** — Pair video content with optimized titles
- **text-on-broll** — Combine AI video with on-screen text (Remotion)
---
## Prompt Engineering References
Two references available:
- **`references/ai-video-prompt-engineering-guide.md`** — Comprehensive synthesis from 7 sources (Captions.ai, LTX Studio, Google Veo, Adobe Firefly, getimg.ai, community threads). Covers prompt anatomy, camera/lighting/audio language, reusable templates, common mistakes, use-case playbooks, and a quick-reference cheat sheet.
- **`references/prompt-engineering-research.md`** — Original VEO-focused research with testing results.
**Load the guide** when you need to craft a prompt and want the full reference. The 30-second checklist from the guide:
1. WHO/WHAT is the subject? (appearance details)
2. What are they DOING? (dynamic verb + pacing)
3. WHERE? (setting + time + weather)
4. CAMERA? (shot type + movement + lens)
5. LIGHT? (source + quality + color)
6. MOOD/STYLE? (genre + color grade)
7. What do we HEAR? (music + ambience + dialogue + SFX)
8. FORMAT? (aspect ratio + duration + platform)
This skill generates short-form AI videos with synchronized native audio using Google's Veo 3.1 API. It follows an iterative workflow: storyboard the shot, craft a concise prompt with separate audio cues, generate via the API, then refine until the result matches the concept. The tool supports fast drafts and high-quality final renders with configurable duration, aspect, and resolution.
You provide a single-shot prompt that specifies camera work, setting, subject action, audio layers, and optional style notes. The script calls Veo 3.1 (standard or fast) using your GEMINI_API_KEY, polls for the asynchronous result, and saves MP4 assets. Iteration is encouraged: run fast drafts to refine prompts and switch to the standard model for final outputs.
What API key do I need?
Use the GEMINI_API_KEY from Google AI Studio. The same key used for image generation works for Veo 3.1.
How long does generation take?
Generation is asynchronous and ranges from ~11 seconds (fast drafts) to several minutes depending on model, duration, and server load.
How do I avoid text or watermarks appearing?
Include a negative prompt such as "text, watermarks, logos, subtitles" and keep the visual prompt focused on a single action.