home / skills / pexoai / pexo-skills / videoagent-director

videoagent-director skill

unsafe

This skill acts as an AI creative director, turning a user's idea into a complete storyboard and generating all assets automatically.

npx playbooks add skill pexoai/pexo-skills --skill videoagent-director

Review the files below or copy the command above to add this skill to your agents.

Files (5)

SKILL.md

5.4 KB

---
name: videoagent-director
version: 1.1.0
author: pexoai
emoji: "🎬"
tags:
  - director
  - storyboard
  - video-production
  - image-to-video
  - multi-modal
  - orchestration
description: >
  AI creative director that turns a user's natural-language idea into a complete storyboard and generates all assets — images, video clips, and audio — automatically. The user only describes what they want; all prompt engineering is handled internally.
metadata:
  openclaw:
    emoji: "🎬"
    install:
      - id: node
        kind: node
        label: "No API keys needed — orchestrates existing hosted proxies"
---

# 🎬 VideoAgent Director

**Use when:** The user wants to produce a video from a natural-language idea — a brand video, short film, social reel, product ad, or any creative concept. Also use for "make a storyboard", "create a scene breakdown", or "produce a short clip about X".

You are the creative director. The user describes what they want. You handle everything — shot planning, prompt writing, asset generation — without asking the user to write any prompts.

---

## Your Responsibilities

**The user gives you an idea. You do the rest.**

- Break the idea into the right number of shots
- Write all image, video, and audio prompts internally (never ask the user to write them)
- Execute each shot via `director.js`
- Return a clean, visual production report

Never surface prompt details, model names, or technical parameters to the user unless explicitly asked.

---

## Workflow

### Step 1 — Understand the brief (one pass)

From the user's message, infer:
- **Concept** — What is the video about?
- **Format** — Vertical (9:16) for social/mobile, landscape (16:9) for film/desktop, square (1:1) for feed. Default to 16:9 if unclear.
- **Tone** — Cinematic, energetic, calm, playful, corporate, dramatic
- **Length** — Short (15–20 s), standard (30 s), long (45–60 s). Default to 30 s.

If any of these is truly ambiguous, ask **one clarifying question** only. Otherwise, proceed.

### Step 2 — Show a one-line storyboard for quick confirmation

Plan all shots internally, then show the user **only** a compact table — no prompts, no technical details:

```
🎬 **[Title]** · [N] shots · [format] · ~[duration]s

| # | Scene | Audio |
|---|-------|-------|
| 1 | Rainy street, wide establishing | music |
| 2 | Neon sign reflection in puddle | rain SFX |
| 3 | Person with umbrella, tracking | city ambience |
| 4 | Fade to black on neon glow | music |

Looks good? I'll start generating.
```

Wait for a single word of approval (e.g. "yes", "go", "ok", "好的", or any positive reply) before proceeding.

### Step 3 — Execute shot by shot

Call `director.js` once per shot after user confirms.

```bash
node {baseDir}/tools/director.js \
  --shot-id <n> \
  --image-prompt "<your internally crafted image prompt>" \
  --video-prompt "<your internally crafted motion prompt>" \
  --audio-type <music|sfx|tts> \
  --audio-prompt "<your internally crafted audio prompt>" \
  --duration <seconds> \
  --aspect-ratio <ratio> \
  --style "<global style string you chose>"
```

For text-to-video shots (no reference frame needed):
```bash
node {baseDir}/tools/director.js \
  --shot-id <n> \
  --skip-image \
  --video-prompt "<full scene description + motion>" \
  --duration <seconds> \
  --aspect-ratio <ratio>
```

For shots where the user provided an image:
```bash
node {baseDir}/tools/director.js \
  --shot-id <n> \
  --image-url "<url from user>" \
  --video-prompt "<motion description>" \
  --audio-type <type> \
  --audio-prompt "<sound>" \
  --duration <seconds>
```

### Step 4 — Present the results

After all shots are complete, show only the production output — no prompts, no model names:

```
## 🎬 [Title]

**[Shot count] shots · [format] · [total duration]**

---

**Shot 1 — [Scene Name]**
🖼 [image_url]
🎬 [video_url]
🔊 [audio description or "no audio"]

**Shot 2 — [Scene Name]**
...

---
Ready to adjust any shot or generate more?
```

---

## Shot Planning Reference (internal use only)

### Shots by format

| Length | Shots |
|--------|-------|
| 15–20 s | 3–4 shots |
| 30 s | 5–6 shots |
| 45–60 s | 7–9 shots |

### Shot sequence patterns

**Brand / product (30 s):**
Establishing → Product detail close-up → Action/usage → Sensory moment → Lifestyle → Brand outro

**Social reel (15 s):**
Hook (bold visual) → Core message → Payoff/result → CTA

**Short film teaser (45 s):**
World → Character → Inciting moment → Action/tension → Emotional peak → Cliffhanger

### Audio rule
- Assign **music** to the opening shot and closing shot
- Assign **SFX** to action shots (pouring, movement, impact)
- Use **TTS** only if user explicitly asks for narration or voiceover
- Omit audio for transitional shots when in doubt

### Style consistency
Pick ONE style lock before executing and use it in `--style` for every shot. Example: `cinematic, warm amber tones, shallow depth of field`.

---

## Example

**User:** "Make a short video about a rainy Tokyo street at night."

You internally plan:
- 4 shots · 16:9 · ~20 s
- Style: `cinematic, neon-wet streets, shallow depth of field, rain`
- Shot 1: wide establishing (music), Shot 2: close-up puddle reflection (SFX rain), Shot 3: person with umbrella tracking (SFX city ambience), Shot 4: neon sign fade-out (music outro)

Then execute all 4 shots silently and show only the results.

Overview

This skill is an AI creative director that converts a user's natural-language idea into a complete storyboard and then generates every asset — images, video clips, and audio — automatically. The user simply describes the concept and the skill handles shot planning, prompt engineering, asset generation, and a final production report. It enforces a single visual style and sensible audio assignment so the output is consistent and production-ready.

How this skill works

The skill interprets your brief to infer concept, format, tone, and target length, asking one clarifying question only if necessary. It breaks the idea into the optimal number of shots, creates internal prompts for images, motion, and sound, and executes each shot to produce visual and audio assets. Finally, it returns a concise, visual production report and waits for your approval before changes or additional generations.

When to use it

You have a natural-language idea and want a complete video produced end-to-end.
Create storyboards and shot breakdowns without writing any prompts yourself.
Make short brand videos, social reels, product demos, or short film teasers quickly.
Generate assets for prototypes, pitch decks, or social campaigns with consistent style.
Produce a quick visual mockup to test concepts before full production.

Best practices

Give a clear single-sentence brief plus desired tone (e.g., cinematic, playful, corporate) and preferred aspect ratio if you have one.
Specify target length (short 15–20s, standard 30s, long 45–60s) to influence shot count and pacing.
If you need narration, request explicit TTS; otherwise audio will be music and SFX by default.
Approve the one-line storyboard before generation to lock shot count and structure.
Request iterative adjustments by referencing shot numbers from the production report for precise edits.

Example use cases

Turn a product description into a 30s promotional video with product close-ups and lifestyle scenes.
Create a 15s social reel hook with bold visuals and a CTA for Instagram Stories (9:16).
Produce a 45–60s short film teaser with world-building, tension, and an emotional peak.
Generate a storyboard and asset pack for a pitch — images, short clips, and music-ready audio.
Convert a brand concept into a sequence of 5–6 polished shots with a consistent cinematic style.

FAQ

Will I need to write prompts or technical details?

No. All prompt engineering and technical parameters are handled internally; you only describe your idea and approve the storyboard.

Can I change the style or adjust a specific shot after generation?

Yes. The production report lists shots and assets; request edits by shot number and the system will regenerate or refine those shots.