image-generation skill

This skill generates and edits images with Gemini via the Python SDK, supporting text-to-image, image edits, and multi-reference composition.

npx playbooks add skill xiangyu-cas/vision-skills --skill image-generation

SKILL.md

---
name: image-generation
description: Gemini image generation and editing skill for text-to-image, image-to-image edits, multi-reference composition, and Google Search grounding. Use when creating or modifying images via Gemini (default model gemini-3-pro-image-preview) with the Python SDK.
---

# Image Generation with Gemini

Use this skill when the user asks to generate or edit images with Gemini using the Python SDK. Default to `gemini-3-pro-image-preview`, and mention `gemini-2.5-flash-image` only as an optional faster/cheaper alternative.

## Workflow

1) Identify task type (text-to-image, edit, or multi-reference).
2) Ensure `GEMINI_API_KEY` is available (as an environment variable or in a `.env` file), then use the Python SDK. Note that this makes network requests to the Gemini API.
3) Choose the model and output modalities (`response_modalities=["IMAGE"]` for image-only output), then run. Generation can take ~30 seconds; allow 30–60 seconds before retrying.
4) Save returned images with `part.as_image()`; if none are returned, report a clear error.
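The workflow above can be sketched as a short text-to-image script. This is a minimal sketch, assuming a recent release of the `google-genai` package (`from google import genai`) in which `GenerateContentResponse.parts` and `Part.as_image()` are available, and assuming a plain dict is accepted as `config`. The names `build_request` and `generate` and the default output path are hypothetical.

```python
import os

DEFAULT_MODEL = "gemini-3-pro-image-preview"
FAST_MODEL = "gemini-2.5-flash-image"  # optional faster/cheaper alternative


def build_request(prompt: str, image_only: bool = True, fast: bool = False) -> dict:
    """Assemble keyword arguments for client.models.generate_content()."""
    modalities = ["IMAGE"] if image_only else ["TEXT", "IMAGE"]
    return {
        "model": FAST_MODEL if fast else DEFAULT_MODEL,
        "contents": [prompt],
        # Assumption: a plain dict is accepted in place of types.GenerateContentConfig.
        "config": {"response_modalities": modalities},
    }


def generate(prompt: str, out_path: str = "output.png") -> str:
    """Text-to-image: send the prompt and save the first returned image."""
    if not os.environ.get("GEMINI_API_KEY"):
        raise RuntimeError("GEMINI_API_KEY is not set (environment or .env)")
    # Imported here so build_request() can be used without google-genai installed.
    from google import genai

    client = genai.Client()  # picks up the API key from the environment
    response = client.models.generate_content(**build_request(prompt))
    for part in response.parts or []:
        if part.inline_data is not None:  # image parts carry inline bytes
            part.as_image().save(out_path)
            return out_path
    raise RuntimeError(
        "No image returned; check the API key, model name, and response modalities."
    )
```

Calling `generate("A watercolor fox in a misty forest", "fox.png")` makes one network request and may take around 30 seconds. Keeping request assembly in a separate pure function makes the model and modality choices easy to inspect before any API call is made.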

## Use these references

- `references/python.md` for Python SDK usage

## Response handling (Python SDK)

Use `part.as_image()` to access image outputs and save them. If no image parts are returned, surface a clear error and suggest checking the API key, model name, and response modalities.
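A minimal sketch of that handling, assuming each response part exposes `inline_data` (None for text parts) and `as_image()`, as in the `google-genai` SDK, and that the returned image object has a `save(path)` method. The function `save_image_parts`, the `NoImageError` class, and the `outputs/image_N.png` naming scheme are hypothetical choices for illustration.

```python
from pathlib import Path


class NoImageError(RuntimeError):
    """Raised when a Gemini response contains no image parts."""


def save_image_parts(parts, out_dir: str = "outputs", stem: str = "image") -> list:
    """Save every image part via part.as_image(); return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for part in parts or []:  # parts may be None if there are no candidates
        # Text parts have inline_data set to None; only image parts carry bytes.
        if getattr(part, "inline_data", None) is None:
            continue
        path = out / f"{stem}_{len(saved)}.png"
        part.as_image().save(str(path))
        saved.append(str(path))
    if not saved:
        raise NoImageError(
            "Gemini returned no image parts. Check that GEMINI_API_KEY is valid, "
            "the model name is correct, and response_modalities includes 'IMAGE'."
        )
    return saved
```

Pass it `response.parts` from a `generate_content` call. It writes every image part rather than only the first, in case a response contains more than one.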

## Timing note

Image generation may take around 30 seconds. When running commands via the shell tool, set a longer timeout (e.g., 60–120 seconds) to avoid premature timeouts.
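Besides a longer shell timeout, a script can wait before retrying instead of retrying immediately. The sketch below is a hypothetical helper (not part of the Gemini SDK) that calls any zero-argument function, for example `lambda: client.models.generate_content(...)`, and sleeps between attempts, in line with the workflow's advice to allow 30 to 60 seconds before retrying. The 45-second default is an assumption.

```python
import time


def generate_with_retry(call, attempts: int = 2, wait_s: float = 45.0, sleep=time.sleep):
    """Run call(); on failure wait wait_s seconds and retry, up to `attempts` tries."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as exc:  # e.g. a network error or timeout raised by the SDK
            last_error = exc
            if attempt < attempts:
                sleep(wait_s)  # generation can take ~30 s, so do not retry immediately
    raise RuntimeError(f"Image generation failed after {attempts} attempts") from last_error
```

With `attempts=2` and a 45-second wait, a worst case of two ~30-second generations plus the wait comes to about 105 seconds, so the shell timeout should sit toward the upper end of the 60 to 120 second range.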

Overview

This skill enables image generation and editing with Gemini using the Python SDK. It defaults to the gemini-3-pro-image-preview model and supports text-to-image, image-to-image edits, multi-reference composition, and Google Search grounding. Use it to create or modify images programmatically and to handle returned image parts reliably.

How this skill works

The skill detects the task type (text-to-image, edit, or multi-reference) and issues requests to Gemini via the Python SDK. It sets the appropriate model and response modalities (use response_modalities=["IMAGE"] for image-only outputs), waits for the response (generation can take ~30 seconds), and saves outputs using part.as_image(). If no images are returned, the skill surfaces a clear error and troubleshooting hints.
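Task detection and request assembly for edits and multi-reference composition might look like the following. The rule used here (no input image means text-to-image, one means an edit, several mean multi-reference) is an assumption for illustration, as is passing Pillow `Image` objects directly in `contents` after the prompt; `detect_task_type`, `build_contents`, and `compose` are hypothetical names.

```python
from typing import Sequence


def detect_task_type(reference_images: Sequence) -> str:
    """Classify a request by how many input images accompany the prompt."""
    if len(reference_images) == 0:
        return "text-to-image"
    if len(reference_images) == 1:
        return "edit"
    return "multi-reference"


def build_contents(prompt: str, reference_images: Sequence) -> list:
    """Put the text instruction first, followed by any reference images."""
    return [prompt, *reference_images]


def compose(prompt: str, image_paths: Sequence[str], model: str = "gemini-3-pro-image-preview"):
    """Send an edit or multi-reference request; requires google-genai and Pillow."""
    from google import genai  # lazy imports keep the helpers above dependency-free
    from PIL import Image

    images = [Image.open(p) for p in image_paths]
    client = genai.Client()
    return client.models.generate_content(
        model=model,
        contents=build_contents(prompt, images),
        config={"response_modalities": ["IMAGE"]},
    )
```

The returned response is handled like any other: iterate its parts and save image parts with `part.as_image()`.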

When to use it

Generate new images from text prompts (text-to-image).
Edit existing images or run image-to-image transformations.
Compose scenes from multiple reference images (multi-reference composition).
Ground image generation with Google Search references for accurate visual context.
Automate image creation in scripts or agent workflows using the Python SDK.
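For the Google Search grounding case, the request adds a search tool to the config. This sketch assumes the dict form `{"google_search": {}}` is accepted where `types.Tool(google_search=types.GoogleSearch())` would otherwise go, and it requests both text and image modalities (an assumption; check the model documentation for whether image-only output works with grounding). `grounded_config` and `generate_grounded` are hypothetical helpers.

```python
def grounded_config(use_search: bool = True) -> dict:
    """Build a generate_content config, optionally enabling Google Search grounding."""
    config = {"response_modalities": ["TEXT", "IMAGE"]}
    if use_search:
        # Assumption: dict form of types.Tool(google_search=types.GoogleSearch()).
        config["tools"] = [{"google_search": {}}]
    return config


def generate_grounded(prompt: str, model: str = "gemini-3-pro-image-preview"):
    """Send a grounded image request; requires the google-genai package."""
    from google import genai  # lazy import so grounded_config() works standalone

    client = genai.Client()
    return client.models.generate_content(
        model=model, contents=[prompt], config=grounded_config()
    )
```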

Best practices

Default to gemini-3-pro-image-preview; use gemini-2.5-flash-image as a faster/cheaper alternative when quality trade-offs are acceptable.
Ensure GEMINI_API_KEY is set in environment or .env before running requests.
Set response_modalities=["IMAGE"] when you only need images to reduce noise in responses.
Allow 30–60 seconds for generation; increase shell/tool timeouts to 60–120 seconds to avoid premature failures.
Always call part.as_image() for returned image parts and handle the no-image case with clear error messages and troubleshooting tips.
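Making `GEMINI_API_KEY` available from the environment or a `.env` file can be done without extra dependencies. This is a minimal, hypothetical loader that understands only simple `KEY=value` lines (optionally prefixed with `export` and quoted); the `python-dotenv` package's `load_dotenv()` is a more complete option.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(env_file: str = ".env", name: str = "GEMINI_API_KEY") -> Optional[str]:
    """Return the key from the environment, falling back to a simple .env file."""
    value = os.environ.get(name)
    if value:
        return value
    path = Path(env_file)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, val = line.partition("=")
        if key.strip().removeprefix("export ").strip() == name:
            val = val.strip().strip('"').strip("'")
            os.environ[name] = val  # so genai.Client() can pick it up
            return val
    return None
```

An environment variable always wins over the file, and a found key is exported to `os.environ` so that an SDK client created afterwards sees it.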

Example use cases

Create marketing visuals from text prompts for social media campaigns.
Edit a product photo to change background, color, or add branding.
Compose a scene using multiple reference images for a concept board.
Quickly prototype UI assets or character concepts from text descriptions.
Automate batch image generation in a content pipeline using the Python SDK.
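For the batch pipeline use case, one simple approach is to loop over prompts sequentially and record failures instead of stopping at the first one. In this sketch `run_batch` is hypothetical, and `generate` stands for any function you supply with the signature `generate(prompt, out_path) -> path` that wraps `client.models.generate_content` and saves the result with `part.as_image()`.

```python
from typing import Callable, Iterable


def run_batch(
    prompts: Iterable[str],
    generate: Callable[[str, str], str],
    out_prefix: str = "batch",
) -> dict:
    """Generate one image per prompt, in order, collecting failures."""
    saved, failed = [], []
    for i, prompt in enumerate(prompts):
        out_path = f"{out_prefix}_{i:03d}.png"  # zero-padded index keeps files sorted
        try:
            saved.append(generate(prompt, out_path))
        except Exception as exc:  # one bad prompt should not abort the batch
            failed.append((prompt, str(exc)))
    return {"saved": saved, "failed": failed}
```

Running sequentially keeps the sketch simple; at ~30 seconds per image, a 20-prompt batch takes on the order of 10 minutes, so tool timeouts need to cover the whole run rather than a single image.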

FAQ

What model should I use?

Use gemini-3-pro-image-preview by default for best quality; gemini-2.5-flash-image is an optional faster and cheaper alternative.

How do I access returned images?

Inspect the response parts and call part.as_image() to save or manipulate image outputs. If no image parts exist, report an error and verify the API key, model name, and response modalities.

Why is generation slow or timing out?

Image generation can take ~30 seconds. Increase command or tool timeouts to 60–120 seconds and retry after a short wait if necessary.