This skill generates and edits images with Gemini via the Python SDK, supporting text-to-image, edits, and multi-reference composition.
To add this skill to your agents, run: `npx playbooks add skill xiangyu-cas/vision-skills --skill image-generation`
---
name: image-generation
description: Gemini image generation and editing skill for text-to-image, image-to-image edits, multi-reference composition, and Google Search grounding. Use when creating or modifying images via Gemini (default model gemini-3-pro-image-preview) with the Python SDK.
---
# Image Generation with Gemini
Use this skill when the user asks to generate or edit images with Gemini using the Python SDK. Default to `gemini-3-pro-image-preview`, and mention `gemini-2.5-flash-image` only as an optional faster/cheaper alternative.
## Workflow
1) Identify task type (text-to-image, edit, or multi-reference).
2) Ensure `GEMINI_API_KEY` is available (set in the environment or stored in `.env`), then use the Python SDK. This makes network requests to the Gemini API.
3) Choose model + output (`response_modalities=["IMAGE"]` if image-only) and run. Generation can take ~30 seconds; allow 30–60 seconds before retrying.
4) Save returned images with `part.as_image()`; if none, report a clear error.
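
The steps above can be sketched as follows. This is a minimal sketch, not the skill's actual implementation: it assumes the `google-genai` package is installed, and `load_api_key`, `classify_task`, and `generate` are hypothetical helper names introduced here for illustration. The `.env` parser is deliberately simple and only handles plain `NAME=value` lines.

```python
import os
from pathlib import Path

DEFAULT_MODEL = "gemini-3-pro-image-preview"


def load_api_key(env_file: str = ".env") -> str:
    """Return GEMINI_API_KEY from the environment, falling back to a .env file."""
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        return key
    path = Path(env_file)
    if path.is_file():
        for line in path.read_text().splitlines():
            name, sep, value = line.strip().partition("=")
            if sep and name.strip() == "GEMINI_API_KEY":
                # Drop surrounding whitespace and optional quotes.
                return value.strip().strip("'\"")
    raise RuntimeError("GEMINI_API_KEY not found in environment or .env")


def classify_task(num_input_images: int) -> str:
    """Map the number of input images to the task type from step 1."""
    if num_input_images == 0:
        return "text-to-image"
    if num_input_images == 1:
        return "edit"
    return "multi-reference"


def generate(prompt: str, images=(), model: str = DEFAULT_MODEL):
    """Send one request to Gemini; `images` are PIL images for edits or composition."""
    # Imported lazily so the helpers above work without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=load_api_key())
    return client.models.generate_content(
        model=model,
        contents=[prompt, *images],
        # Image-only output, as in step 3.
        config=types.GenerateContentConfig(response_modalities=["IMAGE"]),
    )
```

Calling `generate("a watercolor fox")` would cover the text-to-image case; passing one or more PIL images in `images` would cover edits and multi-reference composition. Pass `model="gemini-2.5-flash-image"` for the faster, cheaper alternative.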
## Use these references
- `references/python.md` for Python SDK usage
## Response handling (Python SDK)
Use `part.as_image()` to access image outputs and save them. If no image parts are returned, surface a clear error and suggest checking the API key, model name, and response modalities.
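
One way to apply this handling is sketched below. `save_image_parts` is a hypothetical helper, not part of the SDK; it assumes each part exposes `inline_data` and `as_image()`, and that the returned image object has a `save(path)` method. The `.png` extension is also an assumption, since the actual format depends on what the model returns.

```python
from pathlib import Path


def save_image_parts(parts, out_dir: str = ".", stem: str = "image") -> list[Path]:
    """Save every image part via part.as_image(); raise if none are found."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for part in parts or []:
        # Text parts carry no inline data, so skip them.
        if getattr(part, "inline_data", None) is None:
            continue
        image = part.as_image()
        if image is None:
            continue
        path = out / f"{stem}_{len(saved)}.png"
        image.save(str(path))
        saved.append(path)
    if not saved:
        raise RuntimeError(
            "No image parts returned. Check GEMINI_API_KEY, the model name, "
            "and that response_modalities includes 'IMAGE'."
        )
    return saved
```

With a response from `generate_content`, the parts would typically be read from `response.candidates[0].content.parts` and passed in as `parts`.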
## Timing note
Image generation may take around 30 seconds. When running commands via the shell tool, set a longer timeout (e.g., 60–120 seconds) to avoid premature timeouts.
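
If the generation script is launched from Python rather than directly from the shell tool, the same timing advice might look like the sketch below. `run_with_timeout` is a hypothetical helper; the 120-second timeout and 30-second wait are taken from the ranges in this skill, not from any SDK requirement.

```python
import subprocess
import time


def run_with_timeout(cmd, timeout_s: float = 120, retries: int = 1, wait_s: float = 30):
    """Run cmd, allowing timeout_s seconds per attempt; wait before each retry."""
    for attempt in range(retries + 1):
        try:
            return subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout_s, check=True
            )
        except subprocess.TimeoutExpired:
            # Give up after the last attempt; otherwise pause, then retry.
            if attempt == retries:
                raise
            time.sleep(wait_s)
```

For example, `run_with_timeout(["python", "generate_image.py"])` would allow 120 seconds per attempt, where `generate_image.py` is a placeholder script name. Non-timeout failures are not retried: `check=True` raises `subprocess.CalledProcessError` immediately.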
This skill enables image generation and editing with Gemini using the Python SDK. It defaults to the `gemini-3-pro-image-preview` model and supports text-to-image, image-to-image edits, multi-reference composition, and Google Search grounding. Use it to create or modify images programmatically and to handle returned image parts reliably.
The skill detects the task type (text-to-image, edit, or multi-reference) and issues requests to Gemini via the Python SDK. It sets the appropriate model and response modalities (`response_modalities=["IMAGE"]` for image-only output), waits for the response (generation can take ~30 seconds), and saves outputs using `part.as_image()`. If no images are returned, the skill surfaces a clear error and troubleshooting hints.
What model should I use?
Use `gemini-3-pro-image-preview` by default for best quality; `gemini-2.5-flash-image` is an optional faster and cheaper alternative.
How do I access returned images?
Inspect the response parts and call `part.as_image()` to save or manipulate image outputs. If no image parts exist, report an error and verify the API key, model name, and response modalities.
Why is generation slow or timing out?
Image generation can take ~30 seconds. Increase command or tool timeouts to 60–120 seconds and retry after a short wait if necessary.