nano-banana-build skill

This skill generates and edits high-quality images using Gemini Nano Banana models, enabling text-to-image, style transfer, and precision edits.

npx playbooks add skill cnemri/google-genai-skills --skill nano-banana-build

SKILL.md

---
name: nano-banana-build
description: Generate and edit high-quality images using Gemini 2.5 Flash Image and Gemini 3 Pro Image (Nano Banana). Supports Text-to-Image, Style Transfer, Virtual Try-On, and Character Consistency.
---

# Nano Banana Image Generation Skill

Use this skill to generate and edit images using the `google-genai` Python SDK with Gemini's specialized image models (Nano Banana).

## Quick Start Setup

```python
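from google import genai
from google.genai import types
from PIL import Image  # for opening or saving returned images
import io  # for wrapping returned image bytes in a file-like object

# With no arguments, the client reads its credentials from the environment
# (for example the GEMINI_API_KEY or GOOGLE_API_KEY variable).
client = genai.Client()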
```

## Reference Materials

- **[Model Capabilities](references/model_capabilities.md)**: Comparison of Gemini 2.5 Flash Image and Gemini 3 Pro Image, including resolutions and token costs.
- **[Image Generation](references/image_generation.md)**: Text-to-Image, Interleaved Text/Image.
- **[Image Editing](references/image_editing.md)**: Subject Customization, Style Transfer, Multi-turn Editing.
- **[Thinking Process](references/thinking_process.md)**: Understanding thoughts and signatures (Gemini 3 Pro).
- **[Recipes](references/recipes.md)**: Extensive collection of examples (Logos, Stickers, Mockups, Comics, etc.).
- **[Source Code](references/source_code.md)**: Deep inspection of SDK internals.

## Available Models

- **`gemini-2.5-flash-image` (Nano Banana)**: Fast, high-quality generation and editing. Best for most use cases.
- **`gemini-3-pro-image-preview` (Nano Banana Pro)**: Highest fidelity, supports `2K` and `4K` resolution, complex prompt adherence, and grounding.

## Common Workflows

### 1. Fast Generation
```python
# Text-only prompt; response_modalities asks for an image in the response.
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A cute robot eating a banana',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE']
    )
)
```

### 2. High-Quality Editing
```python
# The source image is passed by URI, followed by the edit instruction as text.
response = client.models.generate_content(
    model='gemini-3-pro-image-preview',
    contents=[
        types.Part.from_uri(file_uri='gs://.../shoe.jpg', mime_type='image/jpeg'),
        "Change the color of the shoe to neon green."
    ],
    config=types.GenerateContentConfig(response_modalities=['IMAGE'])
)
```
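
Both workflows return the image inside the response object, not as a file. The following is a minimal sketch of writing those images to disk. It assumes the response exposes `candidates[0].content.parts` and that image parts carry an `inline_data` object with `data` (bytes) and `mime_type` fields, which matches the `google-genai` SDK at the time of writing. `save_inline_images` and its extension table are hypothetical names introduced for illustration, not part of the SDK.

```python
from pathlib import Path

# Assumed mapping from MIME type to file extension (illustrative, not exhaustive).
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def save_inline_images(response, out_dir=".", prefix="image"):
    """Write every inline image found in a response to out_dir.

    Hypothetical helper: it relies only on attribute access, so any object
    shaped like candidates[0].content.parts works.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []

    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return saved  # nothing was generated

    content = getattr(candidates[0], "content", None)
    parts = getattr(content, "parts", None) or []
    for part in parts:
        blob = getattr(part, "inline_data", None)
        if blob is None or not blob.data:
            continue  # text parts carry no inline image data
        ext = EXTENSIONS.get(blob.mime_type, ".bin")
        path = out / f"{prefix}_{len(saved)}{ext}"
        path.write_bytes(blob.data)
        saved.append(path)
    return saved
```

Calling `save_inline_images(response, out_dir="out")` after either workflow returns the list of paths written, or an empty list if the response contained no image parts.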

Overview

This skill lets you generate and edit high-quality images using Gemini Nano Banana models (Gemini 2.5 Flash Image and Gemini 3 Pro Image). It supports text-to-image, image-based prompts, style transfer, virtual try-on, and maintaining character consistency across edits. It is built on the google-genai Python SDK.

How this skill works

The skill calls Gemini image models to produce or modify images from textual prompts and input images. You can send plain text prompts, combine images with instructions, or provide image URIs for targeted edits. Gemini 2.5 Flash Image is optimized for fast generation, while Gemini 3 Pro Image offers higher fidelity and larger resolutions for complex edits.

When to use it

- Create concept art, product mockups, or social graphics from text prompts.
- Edit existing photos to change color, style, or specific objects using image-guided instructions.
- Keep a character's pose or appearance consistent across multiple frames or variations.
- Perform style transfer to match a target aesthetic or brand identity.
- Generate high-resolution assets (2K/4K) when fidelity and fine detail matter.

Best practices

- Start with a clear, concrete prompt: mention subject, style, mood, and any constraints (color, angle, background).
- Use Gemini 2.5 Flash Image for rapid prototyping and Gemini 3 Pro Image for final, high-fidelity renders or large resolutions.
- When editing, include the source image and a precise instruction for the change to avoid ambiguous results.
- For character consistency, provide reference images and describe distinguishing features to preserve across edits.
- Iterate with small prompt adjustments and review outputs at target resolution before batching large runs.
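
The first practice above can be made mechanical by assembling the prompt from named fields. This is a minimal sketch; `build_prompt` and its parameters are hypothetical, and the "Style / Mood / Constraints" wording is just one plausible layout, not a format the models require.

```python
def build_prompt(subject, style=None, mood=None, constraints=()):
    """Assemble a prompt from subject, style, mood, and constraints.

    Hypothetical helper: fields left empty are simply omitted.
    """
    sentences = [subject.strip().rstrip(".") + "."]
    if style:
        sentences.append(f"Style: {style}.")
    if mood:
        sentences.append(f"Mood: {mood}.")
    if constraints:
        sentences.append("Constraints: " + "; ".join(constraints) + ".")
    return " ".join(sentences)


prompt = build_prompt(
    "A cute robot eating a banana",
    style="flat vector illustration",
    mood="playful",
    constraints=["white background", "three-quarter view"],
)
print(prompt)
```

Keeping the fields separate also makes small adjustments easy to track: change one argument per iteration and compare the outputs.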

Example use cases

- Text-to-image: generate hero images or marketing banners from short prompts.
- Image editing: change shoe color or remove background using an input image URI and a brief instruction.
- Style transfer: convert product photos to match a campaign aesthetic or artist reference.
- Virtual try-on: composite apparel or accessories onto supplied model photos with pose-aware edits.
- Character consistency: produce multiple poses of the same character while maintaining facial and costume details.

FAQ

Which model should I pick for speed vs. quality?

Use gemini-2.5-flash-image for fast generation and iteration; choose gemini-3-pro-image-preview for highest fidelity, complex prompt adherence, and 2K/4K outputs.
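
The rule of thumb in this answer can be written as a small selection function. This is a sketch only: `choose_model` and its flags are hypothetical, and the `"1K"` default resolution label is an assumption. The model IDs and the 2K/4K threshold come from the text above.

```python
FLASH_MODEL = "gemini-2.5-flash-image"      # fast generation and iteration
PRO_MODEL = "gemini-3-pro-image-preview"    # highest fidelity, 2K/4K output
HIGH_RESOLUTIONS = {"2K", "4K"}


def choose_model(resolution="1K", final_render=False, complex_prompt=False):
    """Pick a model ID from the speed-versus-quality rule of thumb.

    Hypothetical helper: the "1K" default is an assumed label for
    anything below 2K.
    """
    if resolution.upper() in HIGH_RESOLUTIONS or final_render or complex_prompt:
        return PRO_MODEL
    return FLASH_MODEL
```

For example, `choose_model()` selects the Flash model for quick drafts, while `choose_model("4K")` or `choose_model(final_render=True)` selects the Pro model.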

How do I preserve a subject across multiple edits?

Supply clear reference images and describe the specific attributes to keep (colors, markings, proportions). Use consistent prompts and include the reference images in every turn.
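
One way to follow this advice is to build each turn's `contents` list from the same reference images plus a text instruction that restates what must not change. This is a minimal sketch: `build_edit_contents` is a hypothetical helper, and it treats the reference parts as opaque values (for example `types.Part` objects or PIL images, on the assumption that the SDK accepts them in a `contents` list alongside text).

```python
def build_edit_contents(reference_parts, keep, instruction):
    """Return a contents list: reference images first, then one text part.

    Hypothetical helper. reference_parts are passed through untouched;
    keep is a list of attribute descriptions to preserve.
    """
    if not reference_parts:
        raise ValueError("at least one reference image is required")
    text = instruction.strip()
    if keep:
        text += " Keep these attributes unchanged: " + ", ".join(keep) + "."
    # Re-sending the references every turn keeps the subject anchored.
    return [*reference_parts, text]
```

The result can be passed as `contents=` to `client.models.generate_content(...)` in the same way as the list in the editing workflow.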