home / skills / openclaw / skills / image-generation

image-generation skill

safe

This skill helps you generate and refine AI images by guiding prompt crafting, provider selection, and production-ready outputs.

npx playbooks add skill openclaw/skills --skill image-generation

Review the files below or copy the command above to add this skill to your agents.

Files (16)

SKILL.md

5.9 KB

---
name: AI Image Generation
slug: image-generation
version: 1.0.3
homepage: https://clawic.com/skills/image-generation
description: Create AI images with GPT Image, Gemini Nano Banana, FLUX, Imagen, and top providers using prompt engineering, style control, and smart editing.
changelog: Updated for 2026 with benchmark-backed model selection and clearer guidance for modern image generation stacks.
metadata: {"clawdbot":{"emoji":"🎨","requires":{"bins":[],"env.optional":["OPENAI_API_KEY","GEMINI_API_KEY","BFL_API_KEY","GOOGLE_CLOUD_PROJECT","REPLICATE_API_TOKEN","LEONARDO_API_KEY","IDEOGRAM_API_KEY"],"config":["~/image-generation/"]},"os":["linux","darwin","win32"]}}
---

## Setup

On first use, read `setup.md`.

## When to Use

User needs AI-generated visuals, edits, or consistent image sets.
Use this skill to pick the right model, write stronger prompts, and avoid outdated model choices.

## Architecture

User preferences persist in `~/image-generation/`. See `memory-template.md` for setup.

```
~/image-generation/
├── memory.md      # Preferred providers, project context, winning recipes
└── history.md     # Optional generation log
```

## Quick Reference

| Topic | File |
|-------|------|
| Initial setup | `setup.md` |
| Memory template | `memory-template.md` |
| Migration guide | `migration.md` |
| Benchmark snapshots | `benchmarks-2026.md` |
| Prompt techniques | `prompting.md` |
| API handling | `api-patterns.md` |
| GPT Image (OpenAI) | `gpt-image.md` |
| Gemini and Imagen (Google) | `gemini.md` |
| FLUX (Black Forest Labs) | `flux.md` |
| Midjourney | `midjourney.md` |
| Leonardo | `leonardo.md` |
| Ideogram | `ideogram.md` |
| Replicate | `replicate.md` |
| Stable Diffusion | `stable-diffusion.md` |

## Core Rules

### 1. Resolve aliases to official model IDs first

Community names shift quickly. Before calling an API, map the nickname to the provider model ID.

| Community label | Official model ID to try first | Notes |
|-----------------|--------------------------------|-------|
| Nano Banana | `gemini-2.5-flash-image-preview` | Common nickname, not an official Google model ID |
| Nano Banana 2 / Pro | Verify provider docs | Usually a provider preset over Gemini image models |
| GPT Image 1.5 | `gpt-image-1.5` | Current OpenAI high-tier image model |
| GPT Image mini / iMini | `gpt-image-1-mini` | Budget/faster OpenAI variant |
| FLUX 2 Pro / Max | `flux-pro` / `flux-ultra` | Many platforms rename these SKUs |

### 2. Pick models by task, not by hype

| Task | First choice | Backup |
|------|--------------|--------|
| Exact text in image | `gpt-image-1.5` | Ideogram |
| Multi-turn edits | `gemini-2.5-flash-image-preview` | `flux-kontext-pro` |
| Photoreal hero shots | `imagen-4.0-ultra-generate-001` | `flux-ultra` |
| Fast low-cost drafts | `gpt-image-1-mini` | `imagen-4.0-fast-generate-001` |
| Character/product consistency | `flux-kontext-max` | `gpt-image-1.5` with references |
| Local no-API workflows | `flux-schnell` | SDXL |

### 3. Use benchmark tables as dated snapshots

Benchmarks drift weekly. Use `benchmarks-2026.md` as a starting point, then recheck current rankings when quality is critical.

### 4. Draft cheap, finish expensive

Start with 1-4 low-cost drafts, pick one, then upscale or rerender only the winner.

### 5. Keep a fallback chain

If the preferred model is unavailable, fallback by tier:
1) same provider lower tier, 2) cross-provider equivalent, 3) local/open model.

### 6. Treat DALL-E as legacy

OpenAI lists DALL-E 2/3 as legacy. Do not use them as default for new projects.

## Common Traps

- Using vendor nicknames as model IDs -> API errors and wasted retries
- Assuming "Nano Banana Pro" or "FLUX 2" are universal IDs -> provider mismatch
- Copying old DALL-E prompt habits -> weaker output vs modern GPT/Gemini image models
- Comparing text-to-image and image-editing scores as if they were the same benchmark
- Optimizing every draft at max quality -> cost spikes without quality gain

## Security & Privacy

**Data that leaves your machine:**
- Prompt text
- Reference images when editing or style matching

**Data that stays local:**
- Provider preferences in `~/image-generation/memory.md`
- Optional local history file

**This skill does NOT:**
- Store API keys
- Upload files outside chosen provider requests
- Persist generated images unless user asks to save them

## External Endpoints

| Provider | Endpoint | Data Sent | Purpose |
|----------|----------|-----------|---------|
| OpenAI | `api.openai.com` | Prompt text, optional input images | GPT Image generation/editing |
| Google Gemini API | `generativelanguage.googleapis.com` | Prompt text, optional input images | Gemini image generation/editing |
| Google Vertex AI | `aiplatform.googleapis.com` | Prompt text, optional input images | Imagen 4 generation |
| Black Forest Labs | `api.bfl.ai` | Prompt text, optional input images | FLUX generation/editing |
| Replicate | `api.replicate.com` | Prompt text, optional input images | Hosted third-party image models |
| Midjourney | `discord.com` | Prompt text | Midjourney generation via Discord workflows |
| Leonardo | `cloud.leonardo.ai` | Prompt text, optional input images | Leonardo generation/editing |
| Ideogram | `api.ideogram.ai` | Prompt text | Typography-focused image generation |

No other data is sent externally.

## Migration

If upgrading from a previous version, read `migration.md` before updating local memory structure.

## Trust

This skill may send prompts and reference images to third-party AI providers.
Only install if you trust those providers with your content.

## Related Skills
Install with `clawhub install <slug>` if user confirms:
- `image-edit` - Specialized inpainting, outpainting, and mask workflows
- `video-generation` - Convert image concepts into video pipelines
- `colors` - Build palettes for visual consistency across assets
- `ffmpeg` - Post-process image sequences and exports

## Feedback

- If useful: `clawhub star image-generation`
- Stay updated: `clawhub sync`

Overview

This skill helps you create and refine AI-generated images with optimized prompts, style control, and production-ready outputs. It guides you through provider selection, draft-to-final workflows, and practical fixes for common generation failures. The goal is fast iteration, consistent results, and cost-effective production.

How this skill works

I guide you to choose the right generation mode (text-to-image, image editing, style transfer, or upscaling) and recommend providers and model types for each goal. I help craft subject-first prompts, suggest style keywords and negative prompts, and set draft resolutions for quick validation before upscaling to production sizes. I also outline iteration loops: generate multiple variations, pick the best, refine, and then upscale with dedicated tools.

When to use it

Creating photorealistic product shots or cinematic portraits
Editing source images or performing inpainting and precise retouches
Transferring a reference style to new content while preserving subject details
Rapid concept exploration where fast, cheap iterations are needed
Producing final high-resolution assets for web, print, or advertising

Best practices

Start with a draft resolution (512–1024px) to validate prompts before upscaling
Put the subject first, add specific style and lighting keywords, and use negative prompts to exclude unwanted elements
Batch similar prompts and use faster models for iteration, switching to quality models only for final renders
Use aspect ratio intentionally (1:1 for portraits, 16:9 for landscapes, 9:16 for mobile)
Upscale with dedicated tools (Real-ESRGAN, Topaz) and export PNG for transparency or JPEG/WebP for photos

Example use cases

Generate multiple product photo variants for an e-commerce catalog, pick the best, and upscale to 2048px+
Perform targeted inpainting to correct hands or facial artifacts using masks and low img2img strength
Create social ad visuals in 9:16 with cinematic lighting and brand-consistent color grading
Transfer an oil-painting style from a reference image onto new subject photos while controlling style strength
Iterate character concepts quickly using cached seeds to maintain consistency across poses and expressions

FAQ

Which provider should I pick for text-heavy images like posters?

Use models optimized for text rendering; some providers handle glyphs and typography better—if text fidelity matters, plan to add final text in post-production.

What resolution should I use for final outputs?

Draft at 512–1024px for iterations, then upscale winners to 2048px or higher using dedicated upscalers for production.