
image-gen-compare skill


This skill compares image generation results across multiple models run on the same prompt, logging cost, generation time, and quality to help you choose the best model.

```bash
npx playbooks add skill openclaw/skills --skill image-gen-compare
```

Review the files below or copy the command above to add this skill to your agents.

Files (3)
SKILL.md
---
name: image-gen-compare
version: 1.0.0
description: Side-by-side comparison of paid vs local image generation models — DALL-E 3, FLUX.1-schnell, Gemini Imagen, and others. Generates images from the same prompt, logs metadata, and stores run history. Use when evaluating which image model to use for a project.
metadata:
  {"openclaw": {"emoji": "🖼️", "requires": {"bins": ["python3"], "env": ["OPENAI_API_KEY"]}, "primaryEnv": "OPENAI_API_KEY", "network": {"outbound": true, "reason": "Calls OpenAI DALL-E API for paid image generation. Local models (FLUX via mflux) run on-device."}}}
---

# Image Gen Compare

Generate images from the same prompt across multiple models and compare results. Tracks costs, generation time, and quality for informed model selection.

## Supported Models

| Model | Type | Cost | Speed (M4) |
|---|---|---|---|
| DALL-E 3 | Cloud (OpenAI) | ~$0.04-0.08/img | 5-10s |
| FLUX.1-schnell | Local (mflux) | Free | ~105s |
| Gemini Imagen 4.0 | Cloud (Google) | $0.04-0.13/img | 3-8s |
| SDXL-Turbo | Local (diffusers) | Free | ~15s (512px) |

## Usage

```bash
python3 scripts/image_gen_compare.py --prompt "cyberpunk alley at night"
python3 scripts/image_gen_compare.py --model dalle3  # Single model
python3 scripts/image_gen_compare.py --list           # Previous runs
```

## Key Lesson

Gemini (Imagen 4.0) beats fine-tuned SD 1.5 with zero training. Use commercial APIs for production quality; local models for experimentation, privacy, and offline use.

## Files

- `scripts/image_gen_compare.py` — Comparison script with metadata logging

Overview

This skill runs side-by-side image generation across paid and local models so you can compare output, cost, and speed. It automates same-prompt generations, captures metadata, and stores a searchable run history. Use it to evaluate trade-offs between cloud APIs and local models for production or experimentation.

How this skill works

You provide a single prompt and the script invokes multiple backends (e.g., DALL-E 3, Gemini Imagen, FLUX.1-schnell, SDXL-Turbo) to generate images from the same prompt. It logs generation time, costs (when available), model identifiers, and output files, then records each run in a history store for later inspection. Optional flags select single models, list past runs, or export metadata for analysis.
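In outline, that loop can be sketched as below. The backends here are stubs standing in for the real API and local-model calls, and the record fields are illustrative assumptions — the actual `image_gen_compare.py` may use different names:

```python
import time
from pathlib import Path

# Stub backends standing in for real API/local-model calls (illustrative only).
def make_stub_backend(name):
    def generate(prompt, out_dir):
        # A real backend would invoke the model and write an image file here.
        return str(Path(out_dir) / f"{name}.png")
    return generate

BACKENDS = {
    "dalle3": make_stub_backend("dalle3"),
    "flux-schnell": make_stub_backend("flux-schnell"),
}

def compare(prompt, out_dir="runs/demo"):
    """Run every backend on the same prompt and collect one record per model."""
    records = []
    for model, generate in BACKENDS.items():
        start = time.perf_counter()
        output = generate(prompt, out_dir)
        records.append({
            "model": model,
            "prompt": prompt,
            "output": output,
            "latency_s": round(time.perf_counter() - start, 3),
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        })
    return records
```

Because every backend sees the identical prompt, the resulting records differ only in model, output, and latency — which is what makes the comparison fair.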

When to use it

  • Choosing an image model for production vs experimental use
  • Benchmarking quality, latency, and cost across providers
  • Privacy-sensitive projects where local models are preferred
  • Evaluating new models after updates or fine-tuning
  • Comparing outputs for the same creative brief across vendors

Best practices

  • Keep prompts identical across models to ensure fair comparisons
  • Record and review metadata: cost per image, latency, seed, and model version
  • Run multiple seeds or variations to observe consistency and failure modes
  • Use cloud APIs for production-quality output; use local models for privacy and offline work
  • Store run history and outputs with clear naming and timestamps for reproducibility

Example use cases

  • Compare DALL-E 3 vs Gemini Imagen for a product marketing image and factor cost into provider choice
  • Validate that a locally hosted SDXL-Turbo instance produces acceptable results before committing to on-prem deployment
  • Measure generation latency for real-time applications and pick a model that meets response-time requirements
  • Archive model outputs and metadata after A/B testing different prompts for a campaign
  • Audit differences between commercial API outputs and local open-source models for privacy compliance

FAQ

Can I run only one model at a time?

Yes. Use the single-model flag to invoke one backend and still record metadata and store the output.

Does it track cost for cloud models?

Yes. When cost data is available from the provider or local pricing settings, the script logs estimated cost per image alongside time and model version.
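A cost estimate can be as simple as a per-model price lookup. The figures below are the approximate prices from the table in SKILL.md, and the tier keys are assumptions — check provider pricing pages for current rates:

```python
# Approximate per-image prices (USD); local models are free.
# Exact prices vary by size/quality tier — verify against provider pricing pages.
PRICES = {
    "dalle3": {"1024x1024": 0.04, "1024x1792": 0.08},
    "imagen-4.0": {"default": 0.04},
    "flux-schnell": {"default": 0.0},
    "sdxl-turbo": {"default": 0.0},
}

def estimate_cost(model: str, size: str = "default", n_images: int = 1) -> float:
    """Estimated spend for a run; falls back to the model's default tier."""
    tiers = PRICES.get(model, {})
    per_image = tiers.get(size, tiers.get("default", 0.0))
    return round(per_image * n_images, 4)
```

Unknown models and local backends estimate to zero, so the logger can call this unconditionally for every record.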

Is run history searchable?

Run history is stored with prompt, timestamp, model, and output references so you can list and filter previous comparisons.
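If the history is stored as JSON Lines — one record per line, an assumption about the storage format — filtering past runs is a short loop:

```python
import json

def filter_runs(history_path, model=None, prompt_contains=None):
    """Return history records matching an optional model and prompt substring."""
    matches = []
    with open(history_path) as f:
        for line in f:
            rec = json.loads(line)
            if model is not None and rec.get("model") != model:
                continue
            if prompt_contains is not None and prompt_contains not in rec.get("prompt", ""):
                continue
            matches.append(rec)
    return matches
```

Append-only JSON Lines keeps writes cheap during comparison runs while still supporting this kind of ad hoc querying afterwards.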