gemini-image skill

This skill generates high-quality images from prompts using Google's Gemini and Imagen models, offering multiple outputs, aspect ratios, and resolutions.

npx playbooks add skill akrindev/google-studio-skills --skill gemini-image

---
name: gemini-image
description: Generate images using Google Gemini and Imagen models via scripts/. Use for AI image generation, text-to-image, creating visuals from prompts, generating multiple images, custom aspect ratios, and high-resolution output up to 4K. Triggers on "generate image", "create image", "imagen", "text to image", "AI art", "nano banana".
license: MIT
version: 1.0.0
keywords: image generation, imagen, gemini-3-pro, gemini-2.5, text-to-image, AI art, nano banana, 4K resolution, aspect ratio
---

# Gemini Image Generation

Generate high-quality images from text prompts using Google's Gemini and Imagen models through executable scripts.

## When to Use This Skill

Use this skill when you need to:
- Create visual content from text descriptions
- Generate multiple image variations
- Create images at specific resolutions (1K, 2K, 4K)
- Produce images for different aspect ratios (social media, banners, etc.)
- Generate photorealistic images or artistic visuals
- Create images with person generation controls
- Batch generate multiple images at once
- Combine with text generation for complete content creation

## Available Scripts

### scripts/generate_image.py
**Purpose**: Generate images using Gemini (3 Pro Image, 2.5 Flash Image) or Imagen 4 models

**When to use**:
- Any image generation task
- Multiple image generation (1-4 per request)
- Custom resolution and aspect ratio needs
- Professional asset creation
- Photorealistic or artistic image generation

**Key parameters**:
| Parameter | Description | Example |
|-----------|-------------|---------|
| `prompt` | Text description (required) | `"A futuristic city at sunset"` |
| `--model`, `-m` | Model to use | `gemini-3-pro-image-preview` |
| `--output-dir`, `-o` | Output directory for images | `images/` |
| `--name`, `-n` | Base name for output files | `artwork` |
| `--no-timestamp` | Disable auto timestamp | Flag |
| `--aspect`, `-a` | Aspect ratio | `16:9` |
| `--size`, `-s` | Resolution (`1K`, `2K`, or `4K`) | `2K` |
| `--num` | Number of images (1-4) | `4` |
| `--person` | Person generation policy | `allow_adult` |

**Output**: List of saved PNG file paths
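
The flags in the table above map naturally onto an `argparse` definition. The following is a minimal sketch of what such a parser could look like, not the actual contents of `scripts/generate_image.py`. The default model, output directory, and base name are taken from Workflow 1; the `None` defaults for `--aspect`, `--size`, and `--person`, and the function name `build_parser`, are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the real script)."""
    parser = argparse.ArgumentParser(description="Generate images from a text prompt")
    parser.add_argument("prompt", help="Text description of the image (required)")
    parser.add_argument("--model", "-m", default="gemini-3-pro-image-preview")
    parser.add_argument("--output-dir", "-o", default="images/")
    parser.add_argument("--name", "-n", default="generated_image")
    parser.add_argument("--no-timestamp", action="store_true")
    # Assumption: when omitted, aspect, size, and person policy fall back to model defaults.
    parser.add_argument("--aspect", "-a", default=None)
    parser.add_argument("--size", "-s", default=None, choices=["1K", "2K", "4K"])
    parser.add_argument("--num", type=int, default=1, choices=range(1, 5))
    parser.add_argument("--person", default=None,
                        choices=["dont_allow", "allow_adult", "allow_all"])
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["A futuristic city at sunset", "-a", "16:9", "--num", "4"])
print(args.aspect, args.num, args.no_timestamp)
```

Restricting `--num` to `range(1, 5)` makes the parser reject values outside the documented 1-4 range before any API call is made.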

## Workflows

### Workflow 1: Basic Image Generation
```bash
python scripts/generate_image.py "A futuristic city at sunset with flying cars"
```
- Best for: Quick image generation, prototypes
- Model: `gemini-3-pro-image-preview` (default, highest quality)
- Output: `images/generated_image_YYYYMMDD_HHMMSS.png`

### Workflow 2: Social Media (Instagram, Facebook)
```bash
python scripts/generate_image.py "Minimalist coffee shop interior" --aspect 1:1 --size 2K --name coffee-shop
```
- Best for: Instagram posts, profile pictures
- Aspect: 1:1 (square format)
- Resolution: 2K (2048x2048)
- Output: `images/coffee-shop_YYYYMMDD_HHMMSS.png`

### Workflow 3: YouTube Thumbnails (16:9)
```bash
python scripts/generate_image.py "Tech gadget review thumbnail with vibrant colors" --aspect 16:9 --size 2K --name thumbnail
```
- Best for: YouTube, video thumbnails
- Aspect: 16:9 (widescreen)
- Resolution: 2K (2752x1536)
- Output: `images/thumbnail_YYYYMMDD_HHMMSS.png`

### Workflow 4: Multiple Variations
```bash
python scripts/generate_image.py "Abstract geometric patterns in blue and gold" --num 4 --name abstract
```
- Best for: A/B testing, design options
- Generates: 4 distinct variations
- Output: `images/abstract_YYYYMMDD_HHMMSS_0.png`, `images/abstract_YYYYMMDD_HHMMSS_1.png`, etc.

### Workflow 5: Custom Output Directory
```bash
python scripts/generate_image.py "Detailed architectural rendering of modern museum" --aspect 16:9 --size 4K --output-dir ./professional/ --name museum
```
- Best for: Print materials, high-end assets, organized projects
- Model: `gemini-3-pro-image-preview` (the only model that supports 4K)
- Resolution: 4K (5504x3072 for 16:9)
- Directory created automatically if it doesn't exist

### Workflow 6: Photorealistic Images (Imagen 4)
```bash
python scripts/generate_image.py "Robot holding a red skateboard in urban setting" --model imagen-4.0-generate-001 --aspect 16:9 --size 2K --num 2 --name robot-skate
```
- Best for: Realistic photos, product shots
- Model: `imagen-4.0-generate-001` (photorealistic)
- Notes: English prompts only
- Max 4 images per request

### Workflow 7: Blog Post Featured Image
```bash
python scripts/generate_image.py "Serene mountain lake at sunrise with reflections" --aspect 16:9 --size 2K --output-dir ./blog-images/ --name featured-image
```
- Best for: Blog headers, article images
- Combines well with: gemini-text for blog content generation

### Workflow 8: Content Creation Pipeline (Text + Image)
```bash
# 1. Generate content (gemini-text skill)
python skills/gemini-text/scripts/generate.py "Write a product description for smart home device"

# 2. Generate product image (this skill)
python scripts/generate_image.py "Sleek modern smart home device on white background" --aspect 4:3 --size 2K --name product

# 3. Create social media post
```
- Best for: E-commerce, marketing campaigns
- Combines with: gemini-text, gemini-batch for batch production

### Workflow 9: Disable Timestamp
```bash
python scripts/generate_image.py "Fixed filename image" --name my-image --no-timestamp
```
- Best for: Complete control over the output filename
- Output: `images/my-image.png` (no timestamp)
- Use when: Generating files for specific naming schemes or automated pipelines

## Parameters Reference

### Model Selection

| Model | Nickname | Quality | Max Size | Best For |
|-------|----------|---------|----------|----------|
| `gemini-3-pro-image-preview` | Nano Banana Pro | Highest | 4K | Professional assets, advanced text rendering |
| `gemini-2.5-flash-image` | Nano Banana | Good | 2K | High-volume, low-latency |
| `imagen-4.0-generate-001` | Imagen 4 | Photorealistic | 2K | Realistic photos, product shots |
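
The "Max Size" column can be checked before a request is sent, which avoids spending a call on a combination the model will reject. The following is a minimal sketch built only from the table above; the function name `check_size` and the error wording are illustrative and are not taken from the script.

```python
# Maximum size per model, copied from the Model Selection table.
MAX_SIZE = {
    "gemini-3-pro-image-preview": "4K",
    "gemini-2.5-flash-image": "2K",
    "imagen-4.0-generate-001": "2K",
}
SIZE_ORDER = ["1K", "2K", "4K"]  # smallest to largest


def check_size(model: str, size: str) -> None:
    """Raise ValueError if the size is unknown or exceeds the model's documented maximum."""
    if model not in MAX_SIZE:
        raise ValueError(f"Unknown model: {model}")
    if size not in SIZE_ORDER:
        raise ValueError(f"Unknown size: {size}")
    limit = MAX_SIZE[model]
    if SIZE_ORDER.index(size) > SIZE_ORDER.index(limit):
        raise ValueError(f"{size} exceeds the {limit} limit for {model}")


check_size("gemini-3-pro-image-preview", "4K")  # passes silently
```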

### Aspect Ratios

| Ratio | Use Case | 1K Size | 2K Size |
|-------|----------|----------|----------|
| 1:1 | Instagram, avatars | 1024x1024 | 2048x2048 |
| 16:9 | YouTube, presentations | 1376x768 | 2752x1536 |
| 9:16 | Instagram Stories, TikTok | 768x1376 | 1536x2752 |
| 4:3 | Traditional displays | 1024x768 | 2048x1536 |
| 3:4 | Portrait orientation | 768x1024 | 1536x2048 |
| 21:9 | Ultrawide | - | 5504x2400 |

Note: 4K resolution is only available with `gemini-3-pro-image-preview`
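
When the target canvas is given in pixels rather than as a ratio, the nearest ratio from the table can be picked programmatically. This is a small sketch under the assumption that the six ratios listed above are the ones of interest; the helper name `closest_ratio` is hypothetical, and the comparison uses log distance so that wide and tall mismatches are treated symmetrically.

```python
import math

# Ratios listed in the Aspect Ratios table.
SUPPORTED_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4", "21:9"]


def closest_ratio(width: int, height: int) -> str:
    """Return the listed ratio whose shape is nearest to width x height."""
    target = width / height

    def distance(ratio: str) -> float:
        w, h = ratio.split(":")
        # Log distance: being 2x too wide counts the same as being 2x too tall.
        return abs(math.log((int(w) / int(h)) / target))

    return min(SUPPORTED_RATIOS, key=distance)


print(closest_ratio(1920, 1080))  # 16:9
print(closest_ratio(1080, 1350))  # 3:4
```

The result can then be passed to `--aspect`.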

### Resolution Guide

| Size | Use Case | Best Model |
|------|----------|-------------|
| 1K (1024px) | Web thumbnails, previews | Any model |
| 2K (2048px) | Standard web, social media | Any model |
| 4K (4096px) | Print, high-end assets | gemini-3-pro only |

### Person Generation Policy

| Policy | Description | Restrictions |
|---------|-------------|----------------|
| `dont_allow` | No people in images | None |
| `allow_adult` | Adults only | Recommended default |
| `allow_all` | All ages | Restricted in EU, UK, CH, MENA |

## Output Interpretation

### File Naming
- Default format: `{name}_YYYYMMDD_HHMMSS.png` (auto timestamp)
- Single image example: `artwork_20260130_031643.png`
- Multiple images: `{name}_YYYYMMDD_HHMMSS_0.png`, `{name}_YYYYMMDD_HHMMSS_1.png`, etc.
- Without timestamp (`--no-timestamp`): `{name}.png`
- Script prints: "Saved: /path/to/file.png"
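
The naming rules above can be expressed as a small function, which is handy when downstream code needs to predict where files will land. This is a sketch of the documented convention, not the script's own code. The function name `output_names` is hypothetical, and the behavior for multiple images combined with `--no-timestamp` (index suffix only) is an assumption, since the document does not cover that case.

```python
from datetime import datetime


def output_names(name, count=1, timestamp=True, now=None):
    """Return the expected PNG filenames for a run, following the rules listed above."""
    stem = name
    if timestamp:
        stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
        stem = f"{name}_{stamp}"
    if count == 1:
        return [f"{stem}.png"]
    # Assumption: the index suffix is also used when the timestamp is disabled.
    return [f"{stem}_{i}.png" for i in range(count)]


fixed = datetime(2026, 1, 30, 3, 16, 43)
print(output_names("artwork", now=fixed))           # ['artwork_20260130_031643.png']
print(output_names("my-image", timestamp=False))    # ['my-image.png']
```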

### Image Quality
- All images include a SynthID watermark for authenticity
- PNG format for lossless quality
- Can be converted to JPEG/WEBP if needed
- 4K images produce significantly larger files

### Error Messages
- "Model not available": Check model name spelling
- "Unsupported size": Verify size/model combination
- "Aspect ratio error": Use supported ratios for selected model

## Common Issues

### "google-genai or pillow not installed"
```bash
pip install google-genai pillow
```

### "Image generation failed"
- Check prompt length (overly verbose prompts can fail)
- Try simpler, more focused prompts
- Verify model availability in your region
- Check API quota limits

### "Unsupported aspect ratio"
- Check if ratio is supported by selected model
- Imagen 4 has fewer ratio options than Gemini
- Use 16:9 or 1:1 for best compatibility

### "4K not supported"
- 4K only works with `gemini-3-pro-image-preview`
- Use `--size 2K` for other models
- Try `--model gemini-3-pro-image-preview --size 4K`

### "Imagen prompt language error"
- Imagen models support English prompts only
- Use `gemini-3-pro-image-preview` for other languages
- Translate prompt to English for Imagen

### File too large for storage
- Use `--size 1K` for smaller files
- Compress images after generation
- Convert PNG to JPEG for web use

## Best Practices

### Prompt Engineering
- Be specific and descriptive
- Include style descriptors (e.g., "photorealistic", "digital art")
- Mention lighting, mood, and composition
- Use analogies for complex concepts
- Avoid negative prompts (describe what you want, not what to avoid)

### Model Selection
- Use `gemini-3-pro-image-preview` for: High quality, text rendering, 4K
- Use `gemini-2.5-flash-image` for: Speed, high volume
- Use `imagen-4.0-generate-001` for: Photorealism, product shots

### Performance Optimization
- Generate multiple images at once with `--num`
- Use lower resolution for previews
- Batch requests for high-volume needs (gemini-batch skill)
- Cache results for repeated requests
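
One way to cache results for repeated requests is to derive a stable key from the prompt and its parameters and skip generation when a file for that key already exists. The sketch below illustrates the idea with the standard library only. The helper names, the 16-character key length, and the `cache_dir` layout are all assumptions; the skill itself does not ship a cache.

```python
import hashlib
import json
from pathlib import Path


def cache_key(prompt, **params):
    """Stable short hash of the prompt plus generation parameters."""
    # sort_keys makes the key independent of keyword argument order.
    payload = json.dumps({"prompt": prompt, **params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def cached_path(cache_dir, prompt, **params):
    """Hypothetical location of the cached image for this request."""
    return Path(cache_dir) / f"{cache_key(prompt, **params)}.png"


def needs_generation(cache_dir, prompt, **params):
    """True when no cached file exists yet for this exact request."""
    return not cached_path(cache_dir, prompt, **params).exists()


same = cache_key("a cat", size="2K", aspect="1:1") == cache_key("a cat", aspect="1:1", size="2K")
print(same)  # True
```

A caller could run the generation script with `--no-timestamp` and `--name` set to the cache key so that the saved file lands at the path `cached_path` expects.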

### Quality Tips
- Use 2K resolution for most web uses
- 4K only when maximum detail is needed
- Combine specific prompts with style guidance
- Test prompts with `--num 1` before generating batches

### Cost Management
- Use `gemini-2.5-flash-image` for cost efficiency
- 4K generation costs significantly more
- Batch multiple requests when possible
- Generate at 1K for testing, 2K/4K for final

## Related Skills

- **gemini-text**: Generate text content alongside images
- **gemini-tts**: Create audio for image-based content
- **gemini-batch**: Process multiple image requests efficiently
- **gemini-embeddings**: Generate image embeddings for similarity search

## Quick Reference

```bash
# Basic
python scripts/generate_image.py "Your prompt"

# Social media (1:1)
python scripts/generate_image.py "Prompt" --aspect 1:1 --size 2K --name social-post

# YouTube thumbnail (16:9)
python scripts/generate_image.py "Prompt" --aspect 16:9 --size 2K --name thumbnail

# 4K high quality
python scripts/generate_image.py "Prompt" --aspect 16:9 --size 4K --name high-res

# Multiple variations
python scripts/generate_image.py "Prompt" --num 4 --name variations

# Custom directory
python scripts/generate_image.py "Prompt" --output-dir ./my-images/ --name custom

# Photorealistic
python scripts/generate_image.py "Prompt" --model imagen-4.0-generate-001 --aspect 16:9 --size 2K --name photo

# No timestamp
python scripts/generate_image.py "Prompt" --name fixed-name --no-timestamp
```

## Reference

- See `references/` for model documentation (if available)
- Get API key: https://aistudio.google.com/apikey
- Documentation: https://ai.google.dev/gemini-api/docs/image-generation
- SynthID: https://deepmind.google/technologies/synthid/

Overview

This skill generates high-quality images from text prompts using Google Gemini and Imagen models via executable scripts. It supports custom aspect ratios, multiple image variations, and high-resolution outputs up to 4K for professional assets. Use it to produce photorealistic or artistic visuals, batch images, and integrate images into content pipelines.

How this skill works

Run the provided image-generation script with a text prompt and optional flags (model, size, aspect, number, output directory, and person policy). The script selects the requested Gemini or Imagen model, renders 1–4 PNG images, and saves them with predictable filenames (timestamped by default). It prints saved file paths and basic error messages for issues like unsupported sizes or model availability.

When to use it

- Create visual assets from descriptive text for marketing, blogs, or product pages
- Generate multiple design variations for A/B testing or creative exploration
- Produce images in specific resolutions and aspect ratios (social, thumbnail, print)
- Create photorealistic product shots using Imagen 4 or high-detail 4K assets with Gemini 3 Pro
- Batch-produce images as part of an automated content pipeline

Best practices

- Write concise, descriptive prompts including style, lighting, and composition details
- Start with 1K/2K previews before committing to 4K to save time and cost
- Choose `gemini-3-pro-image-preview` for maximum quality and 4K, `imagen-4.0-generate-001` for photorealism, and `gemini-2.5-flash-image` for speed
- Use `--num` to generate variations and test prompts iteratively
- Respect person generation policies and regional restrictions when enabling person controls

Example use cases

- Generate a 1:1 2K image for an Instagram post or other quick social media assets
- Create a 16:9 2K YouTube thumbnail with vibrant composition and readable text
- Produce four variations of a concept for design review and A/B testing
- Render a 4K architectural visualization for print or high-end presentations
- Combine with text generation to produce product descriptions and matching product images

FAQ

Which model should I pick for the highest quality?

Use `gemini-3-pro-image-preview` for the highest quality and 4K outputs; `imagen-4.0-generate-001` is best for photorealistic product shots.

How many images can I generate per request?

The script supports generating 1–4 images per request using the --num flag.

Why did generation fail with 'unsupported size'?

Some models restrict aspect ratios and max sizes; switch to a supported size or use gemini-3-pro-image-preview for 4K.