video-generation skill

This skill generates and extends Gemini Veo 3.1 videos from text or images using the Python SDK, with configurable resolution, duration, and prompts.

npx playbooks add skill xiangyu-cas/vision-skills --skill video-generation

SKILL.md

---
name: video-generation
description: Gemini video generation with Veo 3.1 via the Python SDK. Use when generating videos from text or images, using reference images, first/last frame interpolation, or video extension, and when tuning Veo parameters (aspect ratio, resolution, duration, negative prompts, personGeneration, seed).
---

# Video Generation with Gemini (Veo 3.1)

Use this skill when the user asks to generate or extend videos with Gemini using the Python SDK.
Default to `veo-3.1-fast-generate-preview`, `resolution="720p"`, and `duration_seconds=4`, unless the user asks otherwise or the task requires different settings (e.g., extension, interpolation, reference images, 1080p/4k).

## Workflow

1) Identify the task type: text-to-video, image-to-video, reference images, first/last frames (interpolation), or video extension.
2) Ensure `GEMINI_API_KEY` is available (env or local `.env`), then use the Python SDK.
3) When using images, pass `types.Image(imageBytes=..., mimeType=...)` (not `PIL.Image` or `types.Part`) to avoid input type errors.
4) Call `client.models.generate_videos(...)` with the correct inputs/config (see references).
5) Poll the operation until `done`, then download and save the video.
6) If no videos are returned, surface a clear error and suggest checking the API key, model, and config.
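
Steps 4 to 6 can be sketched as a single helper. This is a minimal sketch, not the skill's own implementation: `generate_video` and its parameters are hypothetical names, and it assumes a `client` shaped like `google.genai.Client` (exposing `models.generate_videos`, `operations.get`, and `files.download`).

```python
import time


def generate_video(client, prompt, config=None,
                   model="veo-3.1-fast-generate-preview",
                   out_path="output.mp4",
                   poll_seconds=10, timeout_seconds=900):
    """Start a Veo job, poll until it is done, and save the first video.

    `client` is assumed to behave like google.genai.Client; this helper and
    its parameter names are illustrative, not part of the SDK.
    """
    operation = client.models.generate_videos(
        model=model, prompt=prompt, config=config)

    # Step 5: poll the long-running operation, giving up after the timeout.
    deadline = time.monotonic() + timeout_seconds
    while not operation.done:
        if time.monotonic() > deadline:
            raise TimeoutError("Video generation did not finish in time.")
        time.sleep(poll_seconds)
        operation = client.operations.get(operation)

    # Step 6: surface a clear error when nothing came back.
    response = operation.response
    videos = getattr(response, "generated_videos", None) if response else None
    if not videos:
        raise RuntimeError(
            "No videos returned; check the API key, model, and config.")

    video = videos[0].video
    client.files.download(file=video)  # fetch the video bytes from the API
    video.save(out_path)               # write them to disk
    return out_path
```

With the real SDK, and after `from google import genai` and `from google.genai import types`, this would be called roughly as `generate_video(genai.Client(), "A paper boat drifting down a rainy street", config=types.GenerateVideosConfig(resolution="720p", duration_seconds=4))`. The prompt is only an example.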

## Use these references (by task type)

- Common setup and workflow: `references/overview.md`
- Parameters and constraints: `references/parameters.md`
- Model versions and limits: `references/model-versions-and-limitations.md`
- Prompting guidance: `references/prompt-guide.md`

### Task types

- Text-to-video: `examples/text-to-video.md`
- Image-to-video: `examples/image-to-video.md`
- Reference images: `examples/reference-images.md`
- First/last frames (interpolation): `examples/first-last-frames.md`
- Video extension: `examples/video-extension.md`

### Tuning examples

- Aspect ratio: `examples/aspect-ratio.md`
- Resolution (4k): `examples/resolution.md`
- Negative prompt: `examples/negative-prompt.md`

## Defaults and notes

- Default model: `veo-3.1-fast-generate-preview`.
- Default output: 720p, 4 seconds.
- For image inputs, always provide `imageBytes` + `mimeType` via `types.Image` to prevent `INVALID_ARGUMENT` errors.
- 1080p/4k, reference images, interpolation, and video extension require `duration_seconds=8`.
- Video extension is limited to 720p inputs and requires a video from a previous Veo generation.
- Video generation can take minutes; allow longer timeouts when running commands.
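
The duration and resolution notes above can be checked before a request is sent. The following is a small sketch under stated assumptions: `check_settings` and the task labels (`"reference-images"`, `"interpolation"`, `"extension"`) are hypothetical names chosen for illustration, and the rules encoded are only the ones listed in this section.

```python
# Tasks and resolutions that, per the notes above, need duration_seconds=8.
EIGHT_SECOND_TASKS = {"reference-images", "interpolation", "extension"}
EIGHT_SECOND_RESOLUTIONS = {"1080p", "4k"}


def check_settings(task, resolution="720p", duration_seconds=4):
    """Return a list of problems; an empty list means the settings look fine.

    Task labels are illustrative; they are not SDK identifiers.
    """
    problems = []
    needs_eight = (task in EIGHT_SECOND_TASKS
                   or resolution in EIGHT_SECOND_RESOLUTIONS)
    if needs_eight and duration_seconds != 8:
        problems.append(
            f"{task} at {resolution} requires duration_seconds=8, "
            f"got {duration_seconds}.")
    if task == "extension" and resolution != "720p":
        problems.append("Video extension is limited to 720p inputs.")
    return problems


print(check_settings("text-to-video"))           # defaults pass
print(check_settings("text-to-video", "4k", 4))  # one problem reported
```

Returning a list instead of raising lets the caller report every violated constraint at once.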

Overview

This skill enables video generation and extension using Gemini Veo 3.1 via the Python SDK. It provides a clear workflow and sensible defaults for text-to-video, image-to-video, reference-image guidance, interpolation between first/last frames, and video extension tasks. Use it to tune resolution, aspect ratio, duration, negative prompts, person generation, and seed for repeatable outputs.

How this skill works

The skill inspects the requested task type and configures the Gemini Veo 3.1 generation call with appropriate inputs and parameters. It ensures image inputs are passed as raw bytes with mime types, calls client.models.generate_videos(...), polls the long-running operation until completion, and saves returned videos. If generation fails or returns no outputs, it surfaces actionable errors and troubleshooting tips.

When to use it

- Generate videos from text prompts (text-to-video).
- Create videos from single or multiple images (image-to-video).
- Guide generation using reference images or interpolate between first/last frames.
- Extend an existing Veo-generated video with additional frames (video extension).
- Tune generation parameters: resolution, aspect ratio, duration, negative prompts, personGeneration, or seed.
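
One way to collect the tuning parameters is to build a dictionary of keyword arguments and drop anything the user did not set. This is a sketch only: `veo_config_kwargs` is a hypothetical helper, and the snake_case keys are assumed to match the fields of the Python SDK's `types.GenerateVideosConfig`.

```python
def veo_config_kwargs(aspect_ratio=None, resolution="720p",
                      duration_seconds=4, negative_prompt=None,
                      person_generation=None, seed=None):
    """Collect tuning parameters, omitting any that were left unset.

    Key names are assumed to match GenerateVideosConfig fields.
    """
    candidates = {
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "duration_seconds": duration_seconds,
        "negative_prompt": negative_prompt,
        "person_generation": person_generation,
        "seed": seed,
    }
    # Leave unset options out so the service applies its own defaults.
    return {name: value for name, value in candidates.items()
            if value is not None}


print(veo_config_kwargs(aspect_ratio="9:16", negative_prompt="text, logos"))
```

The result would then be passed as `types.GenerateVideosConfig(**veo_config_kwargs(...))`.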

Best practices

- Default to model veo-3.1-fast-generate-preview, 720p resolution, 4 seconds unless higher settings are requested or required.
- Always pass images as types.Image(imageBytes=..., mimeType=...) to avoid INVALID_ARGUMENT errors.
- Use duration_seconds=8 when requesting 1080p/4k, interpolation, reference-image guidance, or video extension.
- Ensure GEMINI_API_KEY is available in environment or local .env and validate it before long runs.
- Allow several minutes and longer timeouts for generation; polling may take time depending on settings and resolution.
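
Checking for the key up front avoids discovering a missing credential after a long wait. The sketch below uses only the standard library; `load_gemini_api_key` is a hypothetical helper, and the `.env` parsing is deliberately simple (plain `NAME=value` lines, no `export` prefix or variable interpolation).

```python
import os
from pathlib import Path


def load_gemini_api_key(env_path=".env"):
    """Return GEMINI_API_KEY from the environment, else from a .env file."""
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        return key

    path = Path(env_path)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blanks, comments, and lines that are not NAME=value.
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            if name.strip() == "GEMINI_API_KEY":
                value = value.strip().strip("'\"")
                if value:
                    return value

    raise RuntimeError(
        "GEMINI_API_KEY not found in the environment or in " + str(env_path))
```

This only confirms that a key is present. Confirming that the key is valid still requires a call to the API.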

Example use cases

- Turn a short story prompt into a 4–8 second animated clip for social media.
- Create a smooth interpolation between two key frames provided as first and last frames.
- Extend a previously generated Veo video to add new scenes or longer runtime.
- Generate a 4k promo clip by increasing resolution and duration, while adjusting negative prompts to remove unwanted elements.
- Produce consistent variations by setting a fixed seed, and use personGeneration to control whether people may appear.

FAQ

What input format must images use?

Provide images as raw bytes with a mimeType via types.Image(imageBytes=..., mimeType=...) rather than PIL.Image or other wrappers.
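
Preparing those two values from a file on disk might look like the sketch below. `read_image_inputs` is a hypothetical helper that uses only the standard library and guesses the MIME type from the file extension; the dictionary keys mirror the `imageBytes` and `mimeType` argument names used above.

```python
import mimetypes
from pathlib import Path


def read_image_inputs(path):
    """Return the raw bytes and MIME type for an image file.

    The MIME type is guessed from the file extension, not from the content.
    """
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path}")
    return {"imageBytes": Path(path).read_bytes(), "mimeType": mime_type}
```

The result can then be unpacked into the SDK type, for example `types.Image(**read_image_inputs("first_frame.png"))`, where the file name is only an example.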

When should I use duration_seconds=8?

Use 8 seconds for higher resolutions (1080p/4k), reference-image workflows, interpolation, and video extension. These tasks require the longer duration.