
veo-build skill


This skill helps you generate and edit videos with Veo 2 and Veo 3 models using text prompts, image prompts, and advanced controls.

npx playbooks add skill cnemri/google-genai-skills --skill veo-build


SKILL.md
---
name: veo-build
description: Create and edit videos using Google's Veo 2 and Veo 3 models. Supports Text-to-Video, Image-to-Video, Inpainting, and Advanced Controls.
---

# Veo Video Generation and Editing

This skill provides comprehensive workflows for using Google's Veo models (Veo 2 and Veo 3) via the `google-genai` Python SDK.

## Quick Start Setup

All Veo operations require the `google-genai` library and an authenticated client with Vertex AI enabled.

```python
from google import genai
from google.genai import types
import os

PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT")
LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
```

## Reference Materials

- **[Generation (Veo 3)](references/generation.md)**: Text-to-Video, Image-to-Video.
- **[Editing (Veo 2)](references/editing.md)**: Inpainting, Masking.
- **[Advanced Controls](references/advanced.md)**: Frame Interpolation, Video Extension, Reference Images.
- **[Prompting Guide](references/prompting.md)**: Camera angles, visual styles, and best practices.
- **[Source Code](references/source_code.md)**: Deep inspection of SDK internals (`models.py`, `types.py`).

## Available Workflows

### 1. Video Generation (Veo 3)
Create new videos from text or image prompts.
- **Text-to-Video**: Create videos from detailed text descriptions.
- **Image-to-Video**: Animate static images.
- **Prompt Engineering**: Optimization keywords for camera, lighting, and style.
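
The text-to-video flow can be sketched as follows, reusing the `client` from Quick Start. The model ID `veo-3.0-generate-001`, the polling interval, and the prompt are illustrative assumptions — check the model list available to your project:

```python
import time

# Prompt engineering: name the camera move, lighting, and style explicitly.
PROMPT = (
    "Cinematic drone shot of a coastal town at golden hour, "
    "slow push-in, warm lighting, film grain"
)

def generate_video(client, model="veo-3.0-generate-001", prompt=PROMPT):
    """Start a Veo generation job and poll the long-running operation."""
    from google.genai import types  # deferred so the module imports without the SDK

    operation = client.models.generate_videos(
        model=model,
        prompt=prompt,
        config=types.GenerateVideosConfig(
            aspect_ratio="16:9",
            number_of_videos=1,
        ),
    )
    while not operation.done:  # Veo jobs are asynchronous and can take minutes
        time.sleep(10)
        operation = client.operations.get(operation)
    return operation.result.generated_videos
```

`generate_videos` returns a long-running operation; the loop refreshes it until `done`, after which `result.generated_videos` holds the clips.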

### 2. Video Editing (Veo 2)
Modify existing videos using masks (Inpainting).
- **Remove Objects**: Erase dynamic or static objects.
- **Insert Objects**: Add new elements into a scene.

### 3. Advanced Controls (Veo 3)
Specialized generation tasks for precise control.
- **Frame Interpolation**: Generate video bridging two images (first & last frame).
- **Video Extension**: Extend the duration of an existing video clip.
- **Reference-to-Video**: Use specific asset images (subjects, products) to guide generation.
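
Frame interpolation follows the same call shape: the first image goes in as the starting frame and the closing frame rides on the config. The `last_frame` field and the Veo 2 model ID are assumptions to verify against the SDK's `types.py` (see the Source Code reference):

```python
def interpolate_frames(client, first_path, last_path,
                       model="veo-2.0-generate-001"):
    """Bridge two keyframes: first image as the start, last_frame as the end."""
    from google.genai import types  # deferred so the module imports without the SDK

    return client.models.generate_videos(
        model=model,
        prompt="Smooth, continuous camera move between the two frames",
        image=types.Image.from_file(location=first_path),
        config=types.GenerateVideosConfig(
            last_frame=types.Image.from_file(location=last_path),
        ),
    )
```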

Overview

This skill lets you create and edit videos using Google's Veo 2 and Veo 3 models via the google-genai Python SDK. It supports Text-to-Video, Image-to-Video, inpainting/masking edits, and advanced controls like frame interpolation and video extension. The workflows focus on practical prompts, precise control, and repeatable pipelines for generation and editing.

How this skill works

You authenticate a Vertex AI client with the google-genai library and call Veo model endpoints to generate or edit video assets. Veo 3 handles text-to-video and image-to-video generation, while Veo 2 handles inpainting and mask-based edits. Advanced controls cover frame interpolation, clip extension, and generation guided by reference images or detailed prompt attributes.

When to use it

  • Rapidly create marketing or product demo videos from text prompts
  • Animate a static illustration or photograph into motion
  • Remove or replace objects inside an existing clip via masks
  • Bridge two images into a smooth animated sequence (frame interpolation)
  • Extend the length or context of a short video clip
  • Generate consistent assets using reference images for subjects or products

Best practices

  • Authenticate with Vertex AI and set project/location environment variables before running workflows
  • Design prompts with camera, lighting, and style keywords for predictable results
  • Use high-quality reference images and tightly defined masks for precise edits
  • Test generation settings (fps, resolution, length) on short clips before scaling
  • Iterate: try multiple prompts and small variations to refine visual output

Example use cases

  • Text-to-Video: produce a 10–20 second product teaser from a single descriptive prompt
  • Image-to-Video: animate a brand illustration into a short looping social clip
  • Inpainting: remove a boom mic or passersby from a location shoot using a mask
  • Frame Interpolation: create a smooth video that transitions between two keyframes
  • Video Extension: expand a 6-second scene to a longer sequence while retaining style
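
The video-extension use case above maps onto the same generate call; passing the source clip via a `video` field is the assumption to confirm for your SDK version:

```python
def extend_video(client, video_uri, model="veo-2.0-generate-001"):
    """Continue an existing clip while retaining its style and motion."""
    from google.genai import types  # deferred so the module imports without the SDK

    return client.models.generate_videos(
        model=model,
        prompt="Continue the scene with the same style and camera motion",
        video=types.Video(uri=video_uri),
        config=types.GenerateVideosConfig(number_of_videos=1),
    )
```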

FAQ

What do I need to run this skill?

A google-genai-enabled Python environment, Vertex AI access, and project/location environment variables configured.
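
A quick preflight check for those requirements — a sketch assuming the variable names used in Quick Start:

```python
import importlib.util
import os

def preflight():
    """Return a list of missing prerequisites; empty means ready."""
    missing = []
    try:
        spec = importlib.util.find_spec("google.genai")
    except ModuleNotFoundError:
        spec = None
    if spec is None:
        missing.append("google-genai (pip install google-genai)")
    if not os.environ.get("GOOGLE_CLOUD_PROJECT"):
        missing.append("GOOGLE_CLOUD_PROJECT env var")
    return missing

# Region falls back to us-central1, matching the Quick Start client.
REGION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")
```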

Which model should I pick for generation vs editing?

Use Veo 3 for new video generation (text-to-video, image-to-video) and Veo 2 for mask-based edits and inpainting.