
This skill helps you generate and edit images using the Gemini API, enabling prompt-driven visuals, style transfers, and multi-turn refinements.

npx playbooks add skill everyinc/compound-engineering-plugin --skill gemini-imagegen

Run the command above to add this skill to your agents.

SKILL.md
---
name: gemini-imagegen
description: This skill should be used when generating and editing images using the Gemini API (Nano Banana Pro). It applies when creating images from text prompts, editing existing images, applying style transfers, generating logos with text, creating stickers, product mockups, or any image generation/manipulation task. Supports text-to-image, image editing, multi-turn refinement, and composition from multiple reference images.
---

# Gemini Image Generation (Nano Banana Pro)

Generate and edit images using Google's Gemini API. The environment variable `GEMINI_API_KEY` must be set.

## Default Model

| Model | Resolution | Best For |
|-------|------------|----------|
| `gemini-3-pro-image-preview` | 1K-4K | All image generation (default) |

**Note:** Always use this Pro model. Only use a different model if explicitly requested.

## Quick Reference

### Default Settings
- **Model:** `gemini-3-pro-image-preview`
- **Resolution:** 1K (default, options: 1K, 2K, 4K)
- **Aspect Ratio:** 1:1 (default)

### Available Aspect Ratios
`1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`

### Available Resolutions
`1K` (default), `2K`, `4K`
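Because these settings are plain strings, a typo only surfaces as an API error. The sketch below is a hypothetical helper (not part of `google-genai`) that checks values against the lists above and returns keyword arguments for `types.ImageConfig`. It matches strings exactly, so `"2k"` is rejected.

```python
# Hypothetical helper (not part of google-genai): check settings against the
# aspect ratios and resolutions listed in this skill before calling the API.
ASPECT_RATIOS = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
RESOLUTIONS = {"1K", "2K", "4K"}


def image_config_kwargs(aspect_ratio: str = "1:1", image_size: str = "1K") -> dict:
    """Return keyword arguments for types.ImageConfig, or raise ValueError."""
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio!r}")
    if image_size not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {image_size!r}")
    return {"aspect_ratio": aspect_ratio, "image_size": image_size}


# Usage with the SDK: types.ImageConfig(**image_config_kwargs("16:9", "2K"))
print(image_config_kwargs("16:9", "2K"))  # {'aspect_ratio': '16:9', 'image_size': '2K'}
```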

## Core API Pattern

```python
import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Basic generation (1K, 1:1 - defaults)
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Your prompt here"],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
    ),
)

for part in response.parts:
    if part.text:
        print(part.text)
    elif part.inline_data:
        image = part.as_image()
        image.save("output.jpg")  # JPEG by default, so use .jpg
```
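The loop above writes every image to the same filename, so a response with several images would overwrite earlier ones. This is a minimal sketch that assumes parts expose `text` and `inline_data` (with raw `data` bytes and a `mime_type` string), as `google-genai` parts do. `save_image_parts` and the extension table are hypothetical, and the demo uses stand-in objects so it runs without an API key.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace

# Hypothetical MIME-type-to-extension table; unknown types fall back to ".bin".
EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png", "image/webp": ".webp"}


def save_image_parts(parts, stem="output", out_dir="."):
    """Print text parts and write each inline image to its own numbered file.

    Assumes SDK-shaped parts: `.text`, and `.inline_data` with raw `data`
    bytes plus a `mime_type` string. Returns the list of written paths.
    """
    paths = []
    for part in parts or []:
        if part.text:
            print(part.text)
        elif part.inline_data:
            ext = EXTENSIONS.get(part.inline_data.mime_type, ".bin")
            path = Path(out_dir) / f"{stem}_{len(paths)}{ext}"
            path.write_bytes(part.inline_data.data)
            paths.append(path)
    return paths


# Offline demo with stand-in objects; in real code pass `response.parts`.
fake_parts = [
    SimpleNamespace(text="Here is your image.", inline_data=None),
    SimpleNamespace(text=None, inline_data=SimpleNamespace(
        data=b"\xff\xd8\xff\xe0", mime_type="image/jpeg")),
]
with tempfile.TemporaryDirectory() as tmp:
    names = [p.name for p in save_image_parts(fake_parts, out_dir=tmp)]
print(names)  # ['output_0.jpg']
```

Deriving the extension from the reported MIME type keeps filenames correct even if the returned format differs from the JPEG default.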

## Custom Resolution & Aspect Ratio

```python
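from google.genai import types

# Assumes `client` was created as in the Core API Pattern above
prompt = "A mountain lake at dawn, wide establishing shot"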

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[prompt],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",  # Wide format
            image_size="2K"       # Higher resolution
        ),
    )
)
```

### Resolution Examples

```python
# 1K (default) - Fast, good for previews
image_config=types.ImageConfig(image_size="1K")

# 2K - Balanced quality/speed
image_config=types.ImageConfig(image_size="2K")

# 4K - Maximum quality, slower
image_config=types.ImageConfig(image_size="4K")
```

### Aspect Ratio Examples

```python
# Square (default)
image_config=types.ImageConfig(aspect_ratio="1:1")

# Landscape wide
image_config=types.ImageConfig(aspect_ratio="16:9")

# Ultra-wide panoramic
image_config=types.ImageConfig(aspect_ratio="21:9")

# Portrait
image_config=types.ImageConfig(aspect_ratio="9:16")

# Photo standard
image_config=types.ImageConfig(aspect_ratio="4:3")
```
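To match an existing layout such as a 1920x1080 slide, you can map the target pixel size to the closest ratio in the supported list. `nearest_aspect_ratio` below is a hypothetical helper, not an SDK function. It compares ratios on a log scale so that wide and tall deviations are weighted equally.

```python
import math

# The aspect ratios this skill lists as supported.
SUPPORTED = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]


def nearest_aspect_ratio(width: int, height: int) -> str:
    """Pick the supported ratio closest to width/height.

    Distance is measured on a log scale, so 2:1 and 1:2 are treated as
    equally far from 1:1.
    """
    target = math.log(width / height)

    def distance(ratio: str) -> float:
        w, h = (int(n) for n in ratio.split(":"))
        return abs(math.log(w / h) - target)

    return min(SUPPORTED, key=distance)


print(nearest_aspect_ratio(1920, 1080))  # 16:9
print(nearest_aspect_ratio(1080, 1920))  # 9:16
print(nearest_aspect_ratio(2560, 1080))  # 21:9
```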

## Editing Images

Pass existing images with text prompts:

```python
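from PIL import Image
from google.genai import types

# Assumes `client` was created as in the Core API Pattern above
img = Image.open("input.png")
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Add a sunset to this scene", img],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
    ),
)
# Save the edited image from response.parts as in the Core API Pattern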
```

## Multi-Turn Refinement

Use chat for iterative editing:

```python
from google.genai import types

chat = client.chats.create(
    model="gemini-3-pro-image-preview",
    config=types.GenerateContentConfig(response_modalities=['TEXT', 'IMAGE'])
)

response = chat.send_message("Create a logo for 'Acme Corp'")
# Save first image...

response = chat.send_message("Make the text bolder and add a blue gradient")
# Save refined image...
```
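The two `send_message` calls above generalize to a loop that applies a list of instructions in order and keeps every intermediate version. In this sketch `refine` is hypothetical, `send` stands in for `chat.send_message`, and responses are assumed to expose SDK-shaped `.parts`. The demo substitutes a fake `send` so it runs offline.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace

EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png"}


def refine(send, instructions, stem="logo", out_dir="."):
    """Apply instructions one turn at a time, saving each turn's first image.

    `send` is any callable shaped like `chat.send_message`: it takes a prompt
    and returns an object whose `.parts` may include inline images.
    Returns a list of (instruction, path or None) pairs, one per turn.
    """
    history = []
    for turn, instruction in enumerate(instructions, start=1):
        response = send(instruction)
        path = None
        for part in response.parts or []:
            if part.inline_data:
                ext = EXTENSIONS.get(part.inline_data.mime_type, ".bin")
                path = Path(out_dir) / f"{stem}_v{turn}{ext}"
                path.write_bytes(part.inline_data.data)
                break  # keep only the first image from this turn
        history.append((instruction, path))
    return history


# Offline demo: a stand-in for chat.send_message that returns one JPEG part.
def fake_send(prompt):
    blob = SimpleNamespace(data=b"\xff\xd8\xff\xe0", mime_type="image/jpeg")
    return SimpleNamespace(parts=[SimpleNamespace(text=None, inline_data=blob)])


steps = [
    "Create a logo for 'Acme Corp'",
    "Make the text bolder and add a blue gradient",
]
with tempfile.TemporaryDirectory() as tmp:
    history = refine(fake_send, steps, out_dir=tmp)
    names = [path.name for _, path in history]
print(names)  # ['logo_v1.jpg', 'logo_v2.jpg']
```

With a real chat session you would call `refine(chat.send_message, steps)`. Versioned filenames make it easy to roll back to an earlier turn.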

## Prompting Best Practices

### Photorealistic Scenes
Include camera details: lens type, lighting, angle, mood.
> "A photorealistic close-up portrait, 85mm lens, soft golden hour light, shallow depth of field"

### Stylized Art
Specify style explicitly:
> "A kawaii-style sticker of a happy red panda, bold outlines, cel-shading, white background"

### Text in Images
Be explicit about font style and placement:
> "Create a logo with text 'Daily Grind' in clean sans-serif, black and white, coffee bean motif"

### Product Mockups
Describe lighting setup and surface:
> "Studio-lit product photo on polished concrete, three-point softbox setup, 45-degree angle"
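One way to apply these patterns consistently is to assemble prompts from named components. `build_prompt` below is a hypothetical convenience, not part of any API. It only joins the details you supply into a comma-separated prompt like the examples above.

```python
def build_prompt(subject, *, style="", camera="", lighting="", text="", extras=()):
    """Join the non-empty prompt components into one comma-separated string."""
    pieces = [subject, style, camera, lighting]
    if text:
        # Quote the exact text you want rendered in the image
        pieces.append(f"with the text '{text}'")
    pieces.extend(extras)
    return ", ".join(p for p in pieces if p)


portrait = build_prompt(
    "A photorealistic close-up portrait",
    camera="85mm lens",
    lighting="soft golden hour light",
    extras=("shallow depth of field",),
)
print(portrait)
# A photorealistic close-up portrait, 85mm lens, soft golden hour light, shallow depth of field
```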

## Advanced Features

### Google Search Grounding
Generate images based on real-time data:

```python
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Visualize today's weather in Tokyo as an infographic"],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        tools=[{"google_search": {}}]
    )
)
```
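This skill notes that image-only output (`response_modalities=['IMAGE']`) does not work with Google Search grounding, so keep `'TEXT'` in the modalities whenever the search tool is enabled. The following hypothetical pre-flight check assumes tools are given in the dict form used above:

```python
def grounding_compatible(response_modalities, tools=None) -> bool:
    """Return False when IMAGE-only output is combined with Google Search.

    Assumes `tools` uses the dict form from the example above, e.g.
    [{"google_search": {}}]; SDK Tool objects are not inspected.
    """
    uses_search = any(isinstance(t, dict) and "google_search" in t for t in tools or [])
    image_only = [m.upper() for m in response_modalities] == ["IMAGE"]
    return not (uses_search and image_only)


search = [{"google_search": {}}]
print(grounding_compatible(["TEXT", "IMAGE"], search))  # True
print(grounding_compatible(["IMAGE"], search))          # False
```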

### Multiple Reference Images (Up to 14)
Combine elements from multiple sources:

```python
from PIL import Image

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        "Create a group photo of these people in an office",
        Image.open("person1.png"),
        Image.open("person2.png"),
        Image.open("person3.png"),
    ],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
    ),
)
```
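Because composition accepts at most 14 reference images, a small guard can fail early instead of sending an oversized request. `build_contents` is a hypothetical helper. In the demo, strings stand in for PIL images so it runs anywhere.

```python
MAX_REFERENCE_IMAGES = 14  # upper limit stated in this skill


def build_contents(prompt, references):
    """Return [prompt, *references], refusing more than 14 reference images."""
    references = list(references)
    if len(references) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"got {len(references)} reference images; the limit is {MAX_REFERENCE_IMAGES}"
        )
    return [prompt, *references]


# Placeholders stand in for PIL images here; in real code pass
# [Image.open(p) for p in paths] and use the result as `contents=`.
contents = build_contents("Create a group photo of these people in an office",
                          ["person1", "person2", "person3"])
print(len(contents))  # 4
```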

## Important: File Format & Media Type

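**CRITICAL:** The Gemini API returns images in JPEG format by default. When writing the returned bytes to disk, use the `.jpg` extension (or derive the extension from `part.inline_data.mime_type`) so the extension matches the content and you avoid media-type mismatches.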

```python
# CORRECT - Use .jpg extension (Gemini returns JPEG)
image.save("output.jpg")

# WRONG - Will cause "Image does not match media type" errors
image.save("output.png")  # Creates JPEG with PNG extension!
```

### Converting to PNG (if needed)

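If you specifically need PNG, re-encode the image rather than only renaming it. Depending on the SDK version, `part.as_image()` may return the SDK's own image wrapper, whose `save()` writes the original bytes unchanged, so decode the raw bytes with Pillow instead:

```python
import io

from PIL import Image

# Assumes `response` comes from one of the generate_content calls above
for part in response.parts:
    if part.inline_data:
        # Decode the returned bytes (JPEG by default) with Pillow,
        # then re-encode them as a real PNG file
        img = Image.open(io.BytesIO(part.inline_data.data))
        img.save("output.png", format="PNG")
```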

### Verifying Image Format

Check actual format vs extension with the `file` command:

```bash
file image.png
# If output shows "JPEG image data" - rename to .jpg!
```
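The same check can be done in Python without the `file` utility by reading a file's first bytes. This is a sketch under stated assumptions: `detect_extension` and `fix_extension` are hypothetical helpers, and only JPEG, PNG, and WebP signatures are recognized.

```python
import tempfile
from pathlib import Path

# File signatures ("magic numbers") for the formats this sketch recognizes.
SIGNATURES = {b"\xff\xd8\xff": ".jpg", b"\x89PNG\r\n\x1a\n": ".png"}
# Extensions considered equivalent to the canonical one.
ALIASES = {".jpg": {".jpg", ".jpeg"}}


def detect_extension(header: bytes):
    """Return the extension implied by a file's first bytes, or None."""
    for magic, ext in SIGNATURES.items():
        if header.startswith(magic):
            return ext
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return ".webp"
    return None


def fix_extension(path):
    """Rename `path` so its extension matches its content; return the new path."""
    path = Path(path)
    with open(path, "rb") as f:
        ext = detect_extension(f.read(16))
    if ext is None or path.suffix.lower() in ALIASES.get(ext, {ext}):
        return path  # unknown format, or the extension already matches
    return path.rename(path.with_suffix(ext))


with tempfile.TemporaryDirectory() as tmp:
    mislabeled = Path(tmp) / "image.png"
    mislabeled.write_bytes(b"\xff\xd8\xff\xe0" + b"\x00" * 12)  # JPEG bytes
    fixed_name = fix_extension(mislabeled).name
print(fixed_name)  # image.jpg
```

Note that `Path.rename` silently replaces an existing file of the same name on Unix but raises `FileExistsError` on Windows.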

## Notes

- All generated images include SynthID watermarks
- Gemini returns **JPEG format by default** - always use `.jpg` extension
- Image-only mode (`response_modalities=['IMAGE']` in Python, `responseModalities: ["IMAGE"]` in REST) won't work with Google Search grounding
- For editing, describe changes conversationally—the model understands semantic masking
- Default to 1K resolution for speed; use 2K/4K when quality is critical

Overview

This skill provides a practical guide for generating and editing images using the Gemini API (Nano Banana Pro). It covers default model choices, resolution and aspect-ratio options, image editing, multi-turn refinement, and important file-format details. Use it to produce photorealistic scenes, stylized art, logos, stickers, and product mockups with repeatable settings.

How this skill works

The skill uses the gemini-3-pro-image-preview model as the default for text-to-image, image editing, and composition from multiple reference images. You send prompts and optional image files to the API, specify image_config for resolution and aspect ratio, and save returned inline image data (Gemini returns JPEG by default). It also supports chat-based multi-turn refinement and optional Google Search grounding for real-time data.

When to use it

  • Create new images from text prompts (logos, stickers, art, photorealistic scenes).
  • Edit or augment existing images by passing the original image with an instruction.
  • Iteratively refine images through a chat-style multi-turn workflow.
  • Compose scenes from multiple reference images (up to 14 inputs).
  • Produce product mockups or marketing visuals at different resolutions and aspect ratios.

Best practices

  • Default to the gemini-3-pro-image-preview model unless another model is explicitly requested.
  • Use 1K for quick previews, 2K for balanced results, and 4K for maximum detail when needed.
  • Specify aspect ratio explicitly for final output (1:1, 16:9, 9:16, 21:9, etc.).
  • Include camera and lighting details for photorealism; state art style and linework for stylized images.
  • Save generated images with a .jpg extension because Gemini returns JPEG by default; if you need PNG, re-encode the image as PNG rather than only changing the extension.
  • Use conversational prompts for edits and take advantage of chat multi-turn refinement for iterative changes.

Example use cases

  • Design a clean sans-serif logo reading 'Daily Grind' with a coffee-bean motif, exported at 2K with a 1:1 aspect ratio.
  • Edit a product photo: add studio lighting and a polished concrete surface for mockups.
  • Generate a kawaii sticker of a red panda with bold outlines and white background.
  • Compose a group photo by combining multiple headshots into a cohesive office scene.
  • Create a weather infographic for Tokyo using Google Search grounding and save the visual.

FAQ

What model and settings should I choose?

Use gemini-3-pro-image-preview by default. Start at 1K for speed, move to 2K/4K for higher quality, and pick the aspect ratio required for your use case.

Why must I use .jpg when saving images?

Gemini returns JPEG data by default. Writing those bytes to a file named .png produces a JPEG with the wrong extension, which can trigger media-type mismatch errors; if you need PNG, re-encode the image explicitly.