
baoyu-image-gen skill


This skill generates AI images using the OpenAI, Google, and DashScope APIs, with support for text prompts, reference images, and aspect ratios.

npx playbooks add skill jimliu/baoyu-skills --skill baoyu-image-gen


SKILL.md
---
name: baoyu-image-gen
description: AI image generation with OpenAI, Google and DashScope APIs. Supports text-to-image, reference images, aspect ratios. Sequential by default; parallel generation available on request. Use when user asks to generate, create, or draw images.
---

# Image Generation (AI SDK)

Official API-based image generation. Supports the OpenAI, Google, and DashScope (Alibaba Tongyi Wanxiang) providers.

## Script Directory

**Agent Execution**:
1. `SKILL_DIR` = this SKILL.md file's directory
2. Script path = `${SKILL_DIR}/scripts/main.ts`

## Preferences (EXTEND.md)

Use Bash to check whether EXTEND.md exists, in priority order:

```bash
# Check project-level first
test -f .baoyu-skills/baoyu-image-gen/EXTEND.md && echo "project"

# Then user-level (cross-platform: $HOME works on macOS/Linux/WSL)
test -f "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md" && echo "user"
```

| Path | Location |
|------|----------|
| `.baoyu-skills/baoyu-image-gen/EXTEND.md` | Project directory |
| `$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md` | User home |

| Result | Action |
|--------|--------|
| Found | Read, parse, apply settings |
| Not found | Use defaults |

**EXTEND.md Supports**: Default provider | Default quality | Default aspect ratio | Default image size | Default models

Schema: `references/config/preferences-schema.md`

## Usage

```bash
# Basic
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9

# High quality
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k

# From prompt files
npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png

# With reference images (Google multimodal or OpenAI edits)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png

# With reference images (explicit provider/model)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png

# Specific provider
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider openai

# DashScope (Alibaba Tongyi Wanxiang)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "一只可爱的猫" --image out.png --provider dashscope
```

## Options

| Option | Description |
|--------|-------------|
| `--prompt <text>`, `-p` | Prompt text |
| `--promptfiles <files...>` | Read prompt from files (concatenated) |
| `--image <path>` | Output image path (required) |
| `--provider google\|openai\|dashscope` | Force provider (default: google) |
| `--model <id>`, `-m` | Model ID (`--ref` with OpenAI requires GPT Image model, e.g. `gpt-image-1.5`) |
| `--ar <ratio>` | Aspect ratio (e.g., `16:9`, `1:1`, `4:3`) |
| `--size <WxH>` | Size (e.g., `1024x1024`) |
| `--quality normal\|2k` | Quality preset (default: 2k) |
| `--imageSize 1K\|2K\|4K` | Image size for Google (default: from quality) |
| `--ref <files...>` | Reference images. Supported by Google multimodal and OpenAI edits (GPT Image models). If provider omitted: Google first, then OpenAI |
| `--n <count>` | Number of images |
| `--json` | JSON output |

## Environment Variables

| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | OpenAI API key |
| `GOOGLE_API_KEY` | Google API key |
| `DASHSCOPE_API_KEY` | DashScope API key (Alibaba Cloud) |
| `OPENAI_IMAGE_MODEL` | OpenAI model override |
| `GOOGLE_IMAGE_MODEL` | Google model override |
| `DASHSCOPE_IMAGE_MODEL` | DashScope model override (default: z-image-turbo) |
| `OPENAI_BASE_URL` | Custom OpenAI endpoint |
| `GOOGLE_BASE_URL` | Custom Google endpoint |
| `DASHSCOPE_BASE_URL` | Custom DashScope endpoint |

**Load Priority**: CLI args > EXTEND.md > env vars > `<cwd>/.baoyu-skills/.env` > `~/.baoyu-skills/.env`
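As a concrete sketch of the lowest-priority source in the chain above, a user-level `.env` might look like this (the key values are placeholders you replace with real credentials):

```shell
# Create a user-level .env (read last, after CLI args, EXTEND.md, shell env vars, and the project .env)
mkdir -p "$HOME/.baoyu-skills"
cat > "$HOME/.baoyu-skills/.env" <<'EOF'
GOOGLE_API_KEY=your-google-key
OPENAI_API_KEY=your-openai-key
DASHSCOPE_API_KEY=your-dashscope-key
EOF
```

Because this file sits at the bottom of the priority chain, anything set on the CLI, in EXTEND.md, or in the shell environment overrides it.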

## Provider Selection

1. `--ref` provided + no `--provider` → auto-select Google first, then OpenAI
2. `--provider` specified → use it (if `--ref`, must be `google` or `openai`)
3. Only one API key available → use that provider
4. Multiple available → default to Google
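The rules above can be sketched as a small shell function. This is illustrative only, not the actual logic inside `main.ts`:

```shell
# Illustrative sketch of the provider-selection rules (not the real implementation)
select_provider() {
  local provider="$1" has_ref="$2"  # provider: value of --provider or empty; has_ref: "yes" if --ref given
  if [ -n "$provider" ]; then echo "$provider"; return; fi  # rule 2: explicit flag wins
  if [ "$has_ref" = "yes" ]; then                           # rule 1: refs prefer Google, then OpenAI
    if [ -n "$GOOGLE_API_KEY" ]; then echo google; return; fi
    echo openai; return
  fi
  # rules 3-4: use the single available key, else default to Google
  if [ -n "$GOOGLE_API_KEY" ]; then echo google
  elif [ -n "$OPENAI_API_KEY" ]; then echo openai
  elif [ -n "$DASHSCOPE_API_KEY" ]; then echo dashscope
  else echo google
  fi
}
```

For example, `select_provider "" yes` returns `google` when `GOOGLE_API_KEY` is set, while `select_provider openai no` always honors the explicit choice.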

## Quality Presets

| Preset | Google imageSize | OpenAI Size | Use Case |
|--------|------------------|-------------|----------|
| `normal` | 1K | 1024px | Quick previews |
| `2k` (default) | 2K | 2048px | Covers, illustrations, infographics |

**Google imageSize**: Can be overridden with `--imageSize 1K|2K|4K`

## Aspect Ratios

Supported: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2.35:1`

- Google multimodal: uses `imageConfig.aspectRatio`
- Google Imagen: uses `aspectRatio` parameter
- OpenAI: maps to closest supported size
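For OpenAI, "closest supported size" can be pictured with a mapping like the one below. GPT Image models accept 1024x1024, 1536x1024, and 1024x1536; the exact mapping in `main.ts` may differ:

```shell
# Illustrative mapping from aspect ratio to the nearest OpenAI GPT Image size
ar_to_openai_size() {
  case "$1" in
    1:1)             echo 1024x1024 ;;  # square
    16:9|4:3|2.35:1) echo 1536x1024 ;;  # landscape ratios
    9:16|3:4)        echo 1024x1536 ;;  # portrait ratios
    *)               echo 1024x1024 ;;  # fallback
  esac
}
```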

## Generation Mode

**Default**: Sequential generation (one image at a time). This ensures stable output and easier debugging.

**Parallel Generation**: Only use when user explicitly requests parallel/concurrent generation.

| Mode | When to Use |
|------|-------------|
| Sequential (default) | Normal usage, single images, small batches |
| Parallel | User explicitly requests, large batches (10+) |
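In sequential mode, a small batch is simply a loop over the CLI. This sketch only builds and prints the command list rather than calling the API; `SKILL_DIR` is an assumed placeholder, resolved as described in the Script Directory section:

```shell
# Build the sequential command list for a 4-variant batch (one invocation per image)
SKILL_DIR=./baoyu-image-gen   # assumption: path to this skill's directory
cmds=()
for i in 1 2 3 4; do
  cmds+=("npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt 'A cat, variant $i' --image cat-$i.png")
done
printf '%s\n' "${cmds[@]}"    # in sequential mode, run these one at a time, in order
```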

**Parallel Settings** (when requested):

| Setting | Value |
|---------|-------|
| Recommended concurrency | 4 subagents |
| Max concurrency | 8 subagents |
| Use case | Large batch generation when user requests parallel |

**Agent Implementation** (parallel mode only):
```
# Launch multiple generations in parallel using Task tool
# Each Task runs as background subagent with run_in_background=true
# Collect results via TaskOutput when all complete
```

## Error Handling

- Missing API key → error with setup instructions
- Generation failure → auto-retry once
- Invalid aspect ratio → warning, proceed with default
- Reference images with unsupported provider/model → error with fix hint (switch to Google multimodal or OpenAI GPT Image edits)
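The "auto-retry once" behavior can be expressed as a tiny wrapper; this is an illustrative sketch, since the real retry lives inside `main.ts`:

```shell
# Run a command; on failure, retry exactly once and return the second attempt's status
retry_once() {
  "$@" && return 0
  echo "first attempt failed, retrying once..." >&2
  "$@"
}
```

Usage would look like `retry_once npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "A cat" --image out.png`.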

## Extension Support

Custom configurations via EXTEND.md. See **Preferences** section for paths and supported options.

Overview

This skill provides AI image generation using OpenAI, Google, and DashScope APIs with support for text-to-image, reference-image edits, and configurable aspect ratios. It defaults to sequential generation for stability, with optional parallel mode for large batch runs. The CLI-driven tool accepts prompts, prompt files, reference images, and quality/size presets to produce image files directly.

How this skill works

The tool selects a provider based on CLI flags, available API keys, and reference-image requirements, preferring Google when multiple providers are available. It accepts prompt text or prompt files, optional reference images for multimodal edits, aspect ratio and size options, and outputs images to a specified path. Sequential generation runs each job one at a time; parallel generation launches multiple subagents when explicitly requested for large batches.

When to use it

  • Create single or small batches of images from text prompts (illustrations, covers, icons).
  • Edit or refine images using reference files with Google multimodal or OpenAI GPT Image editing.
  • Generate images with specific aspect ratios for web, print, or social formats.
  • Produce large batches quickly when you explicitly request parallel/concurrent generation.
  • Override provider or model when a specific API capability is required (e.g., DashScope for regional support).

Best practices

  • Prefer default sequential mode for predictable results and easier debugging.
  • Provide clear, detailed prompts and optional prompt files for consistent outputs.
  • Use reference images only with providers that support edits (Google or OpenAI GPT Image models).
  • Select quality presets (normal or 2k) based on use case: previews vs. final assets.
  • Request parallel generation only for large batches (10+), and limit concurrency to recommended settings.

Example use cases

  • Generate a 16:9 landscape illustration for a blog hero image using the default Google provider.
  • Produce four variations of a product concept in 2048px quality and save each to a file.
  • Apply color adjustments to an existing photo by passing it as a reference image with an OpenAI GPT Image model.
  • Create mobile wallpapers in 9:16 aspect ratio in parallel when needing dozens of variants quickly.
  • Use DashScope as the provider for region-specific projects where that API is preferred.

FAQ

How does provider selection work if I don't specify one?

If you omit --provider, the tool picks based on reference usage and available API keys: it tries Google for refs first, falls back to OpenAI, and will use the only available provider if just one key exists.

When should I use parallel generation?

Only request parallel mode for large batch jobs (10+ images) to save time; keep concurrency to the recommended 4 subagents (max 8) to avoid instability.