veo-3.2-prompter skill

This skill crafts optimized Veo 3.2 prompts from multimodal assets, generating a structured JSON object with the final prompt, reference images, and recommended settings for cinematic video generation.

npx playbooks add skill pexoai/pexo-skills --skill veo-3.2-prompter


---
name: veo-3.2-prompter
description: >
  Expert prompt engineering for Google Veo 3.2 (Artemis engine). Use when the user wants to generate a video with Veo 3.2, needs help crafting cinematic prompts, or mentions Veo, Google video generation, or Artemis engine.
version: 0.1.0
author: wells
tags: [video, generation, prompt, veo, google, artemis, cinematic]
---

# Veo 3.2 Prompt Designer Skill

This skill transforms a user's scattered multimodal assets (images, videos, audio) and creative intent into a structured, executable prompt for the Google Veo 3.2 video generation model (Artemis engine). It acts as an expert prompt engineer, aiming to get the highest-quality output from the underlying model.

## When to Use

- When the user provides assets (images, videos, audio) for video generation with Veo 3.2.
- When the user's request is complex and requires careful prompt construction for the Veo model.
- When using any Google Veo 3.x model for video generation.

## Core Function

This skill analyzes all user inputs and generates a single, optimized JSON object containing the final prompt and recommended parameters. The internal workflow (Recognition, Mapping, Construction) is handled automatically and should not be exposed to the user.

### Internal Workflow

1. **Phase 1: Recognition** — Analyze uploaded assets and user intent. Use `references/atomic_element_mapping.md` to classify each asset into its atomic element role(s).
2. **Phase 2: Mapping** — For each atomic element, determine the optimal reference method (reference image, text prompt, or hybrid), using the "Atomic Element → Optimal Reference Method" table in the same file.
3. **Phase 3: Construction** — Assemble the final prompt using the 5-Part Framework (Shot → Subject → Environment → Camera → Style) and attach reference images via the Gemini API's `RawReferenceImage` system.
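
Phase 3's prompt assembly can be sketched as an ordered join over the five parts. This is a minimal illustration, assuming each part is already a finished phrase and that an optional audio cue is appended as its own sentence, as in the usage example in this file. `PromptParts` and `build_prompt` are hypothetical names, not part of the skill or the Gemini API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptParts:
    """One field per slot of the 5-Part Framework (hypothetical structure)."""
    shot: str
    subject: str
    environment: str
    camera: str
    style: str
    audio: Optional[str] = None  # optional sound cue, appended as a separate sentence


def build_prompt(parts: PromptParts) -> str:
    # Keep the framework order: Shot -> Subject -> Environment -> Camera -> Style.
    ordered = [parts.shot, parts.subject, parts.environment, parts.camera, parts.style]
    # Skip empty slots so a missing part does not leave a dangling comma.
    visual = ", ".join(p.strip() for p in ordered if p and p.strip())
    prompt = visual + "."
    if parts.audio:
        prompt += " " + parts.audio.strip()
    return prompt


if __name__ == "__main__":
    perfume = PromptParts(
        shot="Hero shot",
        subject="a frosted glass perfume bottle with gold cap rotating slowly on a reflective dark surface",
        environment="three-point studio lighting with soft key and rim light creating subtle caustics",
        camera="smooth 180-degree arc",
        style="hyper-realistic luxury commercial style with shallow depth of field",
        audio="Crystalline chime, soft ambient pad.",
    )
    print(build_prompt(perfume))  # reproduces the final_prompt from the usage example
```

Keeping the slots as separate fields makes it easy to revise one part (say, the camera move) without retyping the rest of the prompt.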

## Usage Example

**User Request:** "Make a cinematic shot of this perfume bottle rotating on a dark surface, like a luxury commercial."
*User uploads `perfume.png`*

**Agent using `veo-3.2-prompter`:**
*The agent internally processes the request and assets, then outputs the final JSON to the next skill in the chain.*

**Final Output (for internal use):**
```json
{
  "final_prompt": "Hero shot, a frosted glass perfume bottle with gold cap rotating slowly on a reflective dark surface, three-point studio lighting with soft key and rim light creating subtle caustics, smooth 180-degree arc, hyper-realistic luxury commercial style with shallow depth of field. Crystalline chime, soft ambient pad.",
  "reference_images": [
    {
      "file": "perfume.png",
      "reference_type": "SUBJECT"
    }
  ],
  "recommended_parameters": {
    "model": "veo-3.2-generate",
    "duration_seconds": 8,
    "aspect_ratio": "16:9",
    "resolution": "1080p",
    "generate_audio": true
  }
}
```
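
For illustration, the same payload can be assembled and serialized with the standard library alone. This is a hedged sketch: the key names mirror the example JSON above, while `make_payload` and its default values (copied from that example) are hypothetical conveniences rather than an interface defined by the skill.

```python
import json
from typing import Dict, List


def make_payload(final_prompt: str,
                 references: List[Dict[str, str]],
                 model: str = "veo-3.2-generate",
                 duration_seconds: int = 8,
                 aspect_ratio: str = "16:9",
                 resolution: str = "1080p",
                 generate_audio: bool = True) -> str:
    """Serialize the skill's output in the shape shown in the usage example."""
    payload = {
        "final_prompt": final_prompt,
        # Each entry pairs an uploaded file with its reference role.
        "reference_images": [
            {"file": r["file"], "reference_type": r["reference_type"]}
            for r in references
        ],
        "recommended_parameters": {
            "model": model,
            "duration_seconds": duration_seconds,
            "aspect_ratio": aspect_ratio,
            "resolution": resolution,
            "generate_audio": generate_audio,
        },
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    print(make_payload("Hero shot of a perfume bottle.",
                       [{"file": "perfume.png", "reference_type": "SUBJECT"}]))
```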

## Veo 3.2 Key Differentiators

| Feature | Capability |
|---|---|
| Engine | Artemis — world-model physics simulation (not pixel prediction) |
| Max duration | ~30s native continuous generation |
| Audio | Native dialogue + synchronized SFX |
| Reference images | Up to 3 (`STYLE`, `SUBJECT`, `SUBJECT_FACE`) |
| Video extension | Chain clips via previous video input |
| First/last frame | Specify start and/or end keyframes |
| Resolutions | 720p, 1080p, 4K (with upscaling) |
| Aspect ratios | 16:9, 9:16 |
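
The limits in this table can be checked before a request is sent. The sketch below validates a `recommended_parameters`-style dict and a reference list against the values listed above. Treating "~30s" as a hard cap of 30 seconds is an assumption, since the table gives only an approximate figure, and `validate_request` is a hypothetical helper.

```python
from typing import Dict, List

# Values transcribed from the capability table; the 30 s cap approximates "~30s".
MAX_DURATION_S = 30
MAX_REFERENCE_IMAGES = 3
REFERENCE_TYPES = {"STYLE", "SUBJECT", "SUBJECT_FACE"}
RESOLUTIONS = {"720p", "1080p", "4K"}  # the table notes 4K involves upscaling
ASPECT_RATIOS = {"16:9", "9:16"}


def validate_request(params: Dict, references: List[Dict]) -> List[str]:
    """Return human-readable problems; an empty list means the request fits the table."""
    problems = []
    duration = params.get("duration_seconds", 0)
    if not 0 < duration <= MAX_DURATION_S:
        problems.append(f"duration_seconds={duration} must be > 0 and <= {MAX_DURATION_S}")
    if params.get("resolution") not in RESOLUTIONS:
        problems.append(f"unsupported resolution {params.get('resolution')!r}")
    if params.get("aspect_ratio") not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect_ratio {params.get('aspect_ratio')!r}")
    if len(references) > MAX_REFERENCE_IMAGES:
        problems.append(f"{len(references)} reference images; at most {MAX_REFERENCE_IMAGES} allowed")
    for ref in references:
        if ref.get("reference_type") not in REFERENCE_TYPES:
            problems.append(f"{ref.get('file')}: unknown reference_type {ref.get('reference_type')!r}")
    return problems


if __name__ == "__main__":
    params = {"duration_seconds": 8, "aspect_ratio": "16:9", "resolution": "1080p"}
    print(validate_request(params, [{"file": "perfume.png", "reference_type": "SUBJECT"}]))
```

Returning a list of messages instead of raising on the first error lets the agent report every mismatch to the user in one pass.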

## Knowledge Base

This skill relies on an internal knowledge base to make informed decisions. The agent MUST consult these files during execution.

- **`references/atomic_element_mapping.md`**: **Core Knowledge**. Contains the "Asset Type → Atomic Element" and "Atomic Element → Optimal Reference Method" mapping tables, adapted for Veo 3.2's reference image system.
- **`references/veo_syntax_guide.md`**: Veo 3.2 Gemini API syntax reference, covering `RawReferenceImage`, `GenerateVideosConfig`, video extension, and first/last frame specification.
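
The two mapping tables in `references/atomic_element_mapping.md` are not reproduced in this file, so the sketch below only shows their shape: a two-stage lookup from asset type to atomic elements, then from each element to a reference method. Every table entry here is a hypothetical placeholder chosen for illustration, not the real contents of the knowledge base, and `map_assets` is likewise a hypothetical name.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL placeholder entries; the real tables live in
# references/atomic_element_mapping.md and may differ.
ASSET_TO_ELEMENTS: Dict[str, List[str]] = {
    "product_photo": ["subject"],
    "portrait_photo": ["subject_face"],
    "mood_image": ["style", "environment"],
    "audio_clip": ["sound"],
}

# Each element maps to a method and, for image references, a reference type.
ELEMENT_TO_METHOD: Dict[str, Tuple[str, str]] = {
    "subject": ("reference_image", "SUBJECT"),
    "subject_face": ("reference_image", "SUBJECT_FACE"),
    "style": ("hybrid", "STYLE"),        # placeholder: image reference plus descriptive text
    "environment": ("text_prompt", ""),  # placeholder: described in the prompt only
    "sound": ("text_prompt", ""),        # placeholder: audio cues folded into prompt text
}


def map_assets(assets: Dict[str, str]) -> List[Dict[str, str]]:
    """Map {filename: asset_type} to one row per (file, atomic element) pair."""
    rows = []
    for filename, asset_type in assets.items():
        for element in ASSET_TO_ELEMENTS.get(asset_type, []):
            method, ref_type = ELEMENT_TO_METHOD[element]
            rows.append({"file": filename, "element": element,
                         "method": method, "reference_type": ref_type})
    return rows


if __name__ == "__main__":
    for row in map_assets({"perfume.png": "product_photo", "mood.jpg": "mood_image"}):
        print(row)
```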

Overview

This skill transforms scattered user assets and creative intent into an optimized, executable prompt package for Google Veo 3.2 (Artemis engine). It acts as an expert prompt engineer, producing a single JSON payload that includes a polished natural-language prompt, reference image mappings, and recommended generation parameters. The output is ready to feed into the Veo generation pipeline.

How this skill works

The skill analyzes uploaded images, video clips, audio, and the user's textual direction, classifying each asset into atomic element roles (subject, style, environment, etc.). It chooses the best reference method for each element (reference image, text, or hybrid) and assembles a concise 5-part prompt: Shot → Subject → Environment → Camera → Style. Finally, it bundles the prompt with reference image descriptors and suggested Veo parameters (model, duration, aspect ratio, resolution, audio flags) in one JSON object.

When to use it

You have images, clips, or audio to incorporate into a Veo 3.2 video generation request.
You need a cinematic, production-ready prompt for Google Veo (Artemis) rather than a simple text cue.
You want optimal reference-image placement (STYLE, SUBJECT, SUBJECT_FACE) to influence Veo results.
You need recommended generation settings (duration, aspect ratio, resolution, audio) matched to the creative brief.
You plan to chain clips, define first/last keyframes, or use native Veo audio features.

Best practices

Upload clear, high-quality reference images for subjects and styles; label files with intent (e.g., subject.png, mood.jpg).
Provide concise creative intent: desired emotion, pacing, and any camera moves (e.g., 180° arc, slow dolly).
Specify hard constraints early: exact duration, aspect ratio, or required first/last frames to avoid rework.
Start from the skill's recommended parameters, then iterate by adjusting one variable at a time (lighting, camera, or tempo).
Include any critical audio references or SFX descriptions if synchronized audio matters.
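
Two of these practices, fixing hard constraints first and then changing one setting per iteration, can be expressed as a small merge-and-sweep routine. This is a sketch under the assumption that the brief is held as a flat dict (parameters such as `aspect_ratio` alongside prompt-level settings such as a hypothetical `lighting` key); `apply_constraints` and `one_at_a_time` are hypothetical helpers.

```python
from typing import Any, Dict, Iterator, List, Tuple


def apply_constraints(recommended: Dict[str, Any], hard: Dict[str, Any]) -> Dict[str, Any]:
    """User hard constraints always win over the skill's recommendations."""
    merged = dict(recommended)
    merged.update(hard)
    return merged


def one_at_a_time(baseline: Dict[str, Any],
                  options: Dict[str, List[Any]],
                  locked: frozenset = frozenset()) -> Iterator[Tuple[str, Any, Dict[str, Any]]]:
    """Yield variants that each change exactly one unlocked key relative to baseline."""
    for key, values in options.items():
        if key in locked:
            continue  # never vary a user's hard constraint
        for value in values:
            if value == baseline.get(key):
                continue  # skip the no-op variant
            variant = dict(baseline)
            variant[key] = value
            yield key, value, variant


if __name__ == "__main__":
    recommended = {"duration_seconds": 8, "aspect_ratio": "16:9", "lighting": "soft key"}
    hard = {"aspect_ratio": "9:16"}  # e.g. a vertical social ad
    base = apply_constraints(recommended, hard)
    sweep = {"aspect_ratio": ["16:9"], "lighting": ["hard rim", "soft key"]}
    for key, value, _variant in one_at_a_time(base, sweep, locked=frozenset(hard)):
        print(key, value)  # prints "lighting hard rim" only
```

Locking the hard-constraint keys means a sweep can never silently undo something the user asked for explicitly.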

Example use cases

Create a luxury product reel: user supplies a product photo and asks for a rotating hero shot with cinematic lighting.
Turn a short interview clip and a location image into a polished 10s scene with matched environment and ambient audio.
Generate a vertical social ad: provide a logo and a brand mood image, specify a 9:16 aspect ratio, and request punchy pacing and motion graphics.
Extend an existing Veo clip: provide the previous video and describe the continuation, plus desired transition frames.
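
For the clip-extension case, one way to plan a chain is to treat each segment as a request that names the previous segment's output as its input video. The sketch below only builds that plan as plain data. The field names (`segment`, `prompt`, `previous_video`, `duration_seconds`) and the `segment_N` placeholder convention are hypothetical; the actual extension syntax is covered in `references/veo_syntax_guide.md` and is not reproduced here.

```python
from typing import Dict, List, Optional


def plan_extension_chain(continuations: List[str],
                         seed_video: Optional[str],
                         seconds_per_segment: int = 8) -> List[Dict]:
    """Build one request per continuation, each pointing at the clip before it.

    `seed_video` is the user's existing clip (or None to start fresh). Later
    outputs are referred to by placeholder names such as "segment_1" because
    real video handles only exist after each generation finishes.
    """
    plan = []
    previous = seed_video
    for i, text in enumerate(continuations, start=1):
        plan.append({
            "segment": f"segment_{i}",
            "prompt": text,
            "previous_video": previous,  # None means no extension input
            "duration_seconds": seconds_per_segment,
        })
        previous = f"segment_{i}"  # the next request extends this segment's output
    return plan


if __name__ == "__main__":
    for step in plan_extension_chain(["The bottle stops rotating.", "Camera pushes in on the cap."], "hero.mp4"):
        print(step)
```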

FAQ

What does the final JSON include?

A polished natural-language prompt, mapped reference images with roles, and recommended Veo generation parameters (model, duration, aspect ratio, resolution, audio).

How many reference images should I provide?

At most three: Veo 3.2 accepts up to three reference images, each typed as STYLE, SUBJECT, or SUBJECT_FACE. Quality and relevance matter more than quantity.
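
When a user uploads more candidate images than the limit allows, the agent has to choose. A minimal sketch, assuming each candidate already carries a `reference_type` and a numeric `score` (both hypothetical inputs, for example from Phase 1 recognition): keep the best-scoring image per role, order the roles by an assumed priority, and cap the list at three. The one-image-per-role rule and the priority order are design choices of this sketch, not documented Veo behavior.

```python
from typing import Dict, List

MAX_REFERENCES = 3
ROLE_PRIORITY = ["SUBJECT", "SUBJECT_FACE", "STYLE"]  # assumed ordering


def select_references(candidates: List[Dict]) -> List[Dict]:
    """Keep the highest-scoring candidate per supported role, in ROLE_PRIORITY order."""
    best: Dict[str, Dict] = {}
    for cand in candidates:
        role = cand["reference_type"]
        if role not in ROLE_PRIORITY:
            continue  # drop roles outside STYLE / SUBJECT / SUBJECT_FACE
        if role not in best or cand["score"] > best[role]["score"]:
            best[role] = cand
    ordered = [best[r] for r in ROLE_PRIORITY if r in best]
    return ordered[:MAX_REFERENCES]  # guard in case more roles are added later


if __name__ == "__main__":
    uploads = [
        {"file": "bottle_a.png", "reference_type": "SUBJECT", "score": 0.6},
        {"file": "bottle_b.png", "reference_type": "SUBJECT", "score": 0.9},
        {"file": "mood.jpg", "reference_type": "STYLE", "score": 0.5},
    ]
    print([c["file"] for c in select_references(uploads)])
```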

Can the skill set audio and SFX?

Yes — it can recommend native audio generation and include brief SFX/music descriptors to sync with the visual prompt.