
baoyu-image-gen skill


/examples/comic_skill_example/skills/baoyu-image-gen

This skill generates high-quality images using OpenAI and Google APIs with prompts, references, aspect ratios, and quality presets.

npx playbooks add skill smallnest/langgraphgo --skill baoyu-image-gen


SKILL.md


---
name: baoyu-image-gen
description: AI SDK-based image generation using official OpenAI and Google APIs. Supports text-to-image, reference images, aspect ratios, and quality presets.
tools:
  - name: generate_comic_image
    script: scripts/main.ts
    description: Generate a single comic image (requires a prompt and an output path)
    parameters:
      prompt:
        type: string
        description: Image generation prompt
        required: true
      path:
        type: string
        description: Output file path
        required: true
      ar:
        type: string
        description: Aspect ratio
        required: false
      quality:
        type: string
        description: Quality preset
        required: false
---

# Image Generation (AI SDK)

Image generation through the official APIs, via the AI SDK. Supports OpenAI (DALL-E, GPT Image) and Google (Imagen, Gemini multimodal).

## Script Directory

**Important**: All scripts are located in the `scripts/` subdirectory of this skill.

**Agent Execution Instructions**:
1. Determine this SKILL.md file's directory path as `SKILL_DIR`
2. Script path = `${SKILL_DIR}/scripts/<script-name>.ts`
3. Replace all `${SKILL_DIR}` in this document with the actual path
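The three steps above amount to a path join and a string substitution. The sketch below shows one way a TypeScript agent might perform them. `resolveScript` and `expandSkillDir` are hypothetical helper names and are not part of the skill.

```typescript
import * as path from "node:path";

// Hypothetical helper: derive SKILL_DIR from the absolute path of SKILL.md
// (step 1) and build the full path of a script under scripts/ (step 2).
function resolveScript(skillMdPath: string, scriptName: string): { skillDir: string; script: string } {
  const skillDir = path.dirname(skillMdPath);
  const script = path.join(skillDir, "scripts", `${scriptName}.ts`);
  return { skillDir, script };
}

// Hypothetical helper for step 3: replace every literal "${SKILL_DIR}"
// placeholder (a plain string here, not a template) with the real directory.
function expandSkillDir(command: string, skillDir: string): string {
  return command.split("${SKILL_DIR}").join(skillDir);
}
```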

**Script Reference**:
| Script | Purpose |
|--------|---------|
| `scripts/main.ts` | CLI entry point for image generation |

## Quick Start

```bash
# Basic generation (auto-detect provider)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A landscape" --image landscape.png --ar 16:9

# High quality (2k)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality 2k

# Specific provider
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --provider openai

# From prompt files
npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png

# With reference images (Google multimodal only)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png
```
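Agents that call the `generate_comic_image` tool declared in the front matter eventually run one of these commands. The minimal sketch below makes that translation. It assumes two things the front matter does not state: that the tool's `path` parameter maps to `--image`, and that `npx` is on `PATH`. `buildArgs` and `generate` are hypothetical names.

```typescript
import { spawnSync } from "node:child_process";

// Parameters of the generate_comic_image tool from the front matter.
interface ComicImageParams {
  prompt: string;
  path: string; // assumption: maps to the CLI's --image option
  ar?: string;
  quality?: string;
}

// Hypothetical helper: turn tool parameters into npx arguments.
function buildArgs(scriptPath: string, p: ComicImageParams): string[] {
  const args = ["-y", "bun", scriptPath, "--prompt", p.prompt, "--image", p.path];
  if (p.ar) args.push("--ar", p.ar);
  if (p.quality) args.push("--quality", p.quality);
  return args;
}

// Hypothetical helper: run the CLI and return its trimmed stdout
// (in plain-output mode, the saved path).
function generate(scriptPath: string, p: ComicImageParams): string {
  const res = spawnSync("npx", buildArgs(scriptPath, p), { encoding: "utf8" });
  if (res.status !== 0) throw new Error(`Image generation failed: ${res.stderr}`);
  return res.stdout.trim();
}
```

The prompt is passed as its own argv element rather than spliced into a shell string. This avoids quoting problems when a prompt contains spaces or quotes.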

## Commands

### Basic Image Generation

```bash
# Generate with prompt
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A sunset over mountains" --image sunset.png

# Shorthand
npx -y bun ${SKILL_DIR}/scripts/main.ts -p "A cute robot" --image robot.png
```

### Aspect Ratios

```bash
# Common ratios: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A portrait" --image portrait.png --ar 3:4

# Or specify exact size
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Banner" --image banner.png --size 1792x1024
```

### Reference Images (Google Multimodal)

```bash
# Image editing with reference
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make it blue" --image blue.png --ref original.png

# Multiple references
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Combine these styles" --image out.png --ref a.png b.png
```

### Quality Presets

```bash
# Normal quality (default)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality normal

# High quality (2k resolution)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality 2k
```

### Output Formats

```bash
# Plain output (prints saved path)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png

# JSON output
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --json
```

## Options

| Option | Description |
|--------|-------------|
| `--prompt <text>`, `-p` | Prompt text |
| `--promptfiles <files...>` | Read prompt from files (concatenated) |
| `--image <path>` | Output image path (required) |
| `--provider google\|openai` | Force provider (default: auto-detect; Google when both keys are set) |
| `--model <id>`, `-m` | Model ID |
| `--ar <ratio>` | Aspect ratio (e.g., `16:9`, `1:1`, `4:3`) |
| `--size <WxH>` | Size (e.g., `1024x1024`) |
| `--quality normal\|2k` | Quality preset (default: normal) |
| `--ref <files...>` | Reference images (Google multimodal only) |
| `--n <count>` | Number of images |
| `--json` | JSON output |
| `--help`, `-h` | Show help |
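To make the option semantics concrete, here is a hypothetical parser for the table above. It illustrates the options and is not the code in `scripts/main.ts`. It assumes that the multi-value options (`--promptfiles`, `--ref`) consume arguments until the next token starting with `-`, and that either `--prompt` or `--promptfiles` must be present.

```typescript
// Hypothetical parser for the options table; not the actual scripts/main.ts code.
interface CliOptions {
  prompt?: string;
  promptFiles: string[];
  image?: string;
  provider?: "google" | "openai";
  model?: string;
  ar?: string;
  size?: string;
  quality: "normal" | "2k";
  refs: string[];
  n?: number;
  json: boolean;
  help: boolean;
}

function parseArgs(argv: string[]): CliOptions {
  const o: CliOptions = { promptFiles: [], quality: "normal", refs: [], json: false, help: false };

  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    // Consume exactly one value after the current flag.
    const one = (): string => {
      if (i + 1 >= argv.length) throw new Error(`${flag} requires a value`);
      return argv[++i];
    };
    // Assumption: multi-value flags consume values until the next "-" token.
    const many = (): string[] => {
      const vals: string[] = [];
      while (i + 1 < argv.length && !argv[i + 1].startsWith("-")) vals.push(argv[++i]);
      return vals;
    };

    switch (flag) {
      case "--prompt": case "-p": o.prompt = one(); break;
      case "--promptfiles": o.promptFiles.push(...many()); break;
      case "--image": o.image = one(); break;
      case "--provider": {
        const v = one();
        if (v !== "google" && v !== "openai") throw new Error(`Unknown provider: ${v}`);
        o.provider = v;
        break;
      }
      case "--model": case "-m": o.model = one(); break;
      case "--ar": o.ar = one(); break;
      case "--size": o.size = one(); break;
      case "--quality": {
        const v = one();
        if (v !== "normal" && v !== "2k") throw new Error(`Unknown quality preset: ${v}`);
        o.quality = v;
        break;
      }
      case "--ref": o.refs.push(...many()); break;
      case "--n": {
        const n = Number(one());
        if (!Number.isInteger(n) || n < 1) throw new Error("--n must be a positive integer");
        o.n = n;
        break;
      }
      case "--json": o.json = true; break;
      case "--help": case "-h": o.help = true; break;
      default: throw new Error(`Unknown option: ${flag}`);
    }
  }

  // --image is required; the prompt may come from --prompt or --promptfiles.
  if (!o.help) {
    if (!o.image) throw new Error("--image is required");
    if (!o.prompt && o.promptFiles.length === 0) throw new Error("Provide --prompt or --promptfiles");
  }
  return o;
}
```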

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `OPENAI_API_KEY` | OpenAI API key | - |
| `GOOGLE_API_KEY` | Google API key | - |
| `OPENAI_IMAGE_MODEL` | OpenAI model | `gpt-image-1.5` |
| `GOOGLE_IMAGE_MODEL` | Google model | `gemini-3-pro-image-preview` |
| `OPENAI_BASE_URL` | Custom OpenAI endpoint | - |
| `GOOGLE_BASE_URL` | Custom Google endpoint | - |

**Load Priority**: CLI args > `process.env` > `<cwd>/.baoyu-skills/.env` > `~/.baoyu-skills/.env`
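The load priority can be expressed as an object spread in which later sources win. This sketch makes two assumptions: the `.env` files use a simple `KEY=VALUE` format, and `<cwd>` is resolved with `process.cwd()`. `parseDotEnv` and `loadConfig` are illustrative names, and the real script may use a different loader.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Minimal .env parser (assumption): KEY=VALUE lines, "#" comments,
// optional matching single or double quotes around the value.
function parseDotEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (!line || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    const key = line.slice(0, eq).trim();
    let val = line.slice(eq + 1).trim();
    const q = val[0];
    if (val.length >= 2 && (q === '"' || q === "'") && val[val.length - 1] === q) val = val.slice(1, -1);
    out[key] = val;
  }
  return out;
}

function readDotEnv(file: string): Record<string, string> {
  return fs.existsSync(file) ? parseDotEnv(fs.readFileSync(file, "utf8")) : {};
}

// Later spreads override earlier ones, so sources go from lowest to highest
// priority: ~/.baoyu-skills/.env < <cwd>/.baoyu-skills/.env < process.env < CLI args.
function loadConfig(cliOverrides: Record<string, string>, cwd = process.cwd()): Record<string, string | undefined> {
  return {
    ...readDotEnv(path.join(os.homedir(), ".baoyu-skills", ".env")),
    ...readDotEnv(path.join(cwd, ".baoyu-skills", ".env")),
    ...process.env,
    ...cliOverrides,
  };
}
```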

## Provider & Model Strategy

### Auto-Selection

1. If `--provider` specified → use it
2. If only one API key is available → use that provider
3. If both are available → default to Google (multimodal LLMs are more versatile)
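The auto-selection rules reduce to a few checks. This sketch assumes a key counts as available when its variable is non-empty. `selectProvider` is a hypothetical name. Throwing when no key is set is an assumption based on the missing-key error described under Error Handling.

```typescript
type Provider = "google" | "openai";

// Hypothetical implementation of the three auto-selection rules.
function selectProvider(
  explicit: Provider | undefined,
  env: { OPENAI_API_KEY?: string; GOOGLE_API_KEY?: string },
): Provider {
  if (explicit) return explicit; // rule 1: --provider wins
  const hasOpenAI = Boolean(env.OPENAI_API_KEY);
  const hasGoogle = Boolean(env.GOOGLE_API_KEY);
  if (hasOpenAI && hasGoogle) return "google"; // rule 3: both keys -> Google
  if (hasGoogle) return "google"; // rule 2: only Google
  if (hasOpenAI) return "openai"; // rule 2: only OpenAI
  // Assumption: no key at all is treated as a setup error.
  throw new Error("No API key found: set GOOGLE_API_KEY or OPENAI_API_KEY");
}
```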

### API Selection by Model Type

| Model Category | API Function | Example Models |
|----------------|--------------|----------------|
| Google Multimodal | `generateText` | `gemini-2.0-flash-exp-image-generation` |
| Google Imagen | `experimental_generateImage` | `imagen-3.0-generate-002` |
| OpenAI | `experimental_generateImage` | `gpt-image-1`, `dall-e-3` |
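One way to route a model ID to the AI SDK function in the table is a prefix check. This is an assumed heuristic, not the script's own logic. `apiForModel` and `supportsReferenceImages` are hypothetical helpers, and the check covers only the model families listed in this document.

```typescript
type ImageApi = "generateText" | "experimental_generateImage";

// Assumed prefix heuristic mapping a model ID to the API in the table above.
function apiForModel(provider: "google" | "openai", modelId: string): ImageApi {
  if (provider === "openai") return "experimental_generateImage"; // gpt-image-*, dall-e-*
  if (modelId.startsWith("imagen-")) return "experimental_generateImage"; // Imagen is image-only
  return "generateText"; // Gemini image models: multimodal generation
}

// Only multimodal (generateText) models can use --ref reference images.
function supportsReferenceImages(provider: "google" | "openai", modelId: string): boolean {
  return apiForModel(provider, modelId) === "generateText";
}
```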

### Available Models

**Google**:
- `gemini-3-pro-image-preview` - Default, multimodal generation
- `gemini-2.0-flash-exp-image-generation` - Gemini 2.0 Flash
- `imagen-3.0-generate-002` - Imagen 3

**OpenAI**:
- `gpt-image-1.5` - Default, GPT Image 1.5
- `gpt-image-1` - GPT Image 1
- `dall-e-3` - DALL-E 3

## Quality Presets

| Preset | OpenAI | Google | Use Case |
|--------|--------|--------|----------|
| `normal` | 1024x1024 | Default | Covers, illustrations |
| `2k` | 2048x2048 | "2048px" in prompt | Infographics, slides |
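The preset table can be read as a small mapping. In this sketch, OpenAI receives a `size` and Google receives a prompt hint. The exact wording appended to the Google prompt (`, 2048px`) is an assumption, and `applyQuality` is a hypothetical name.

```typescript
type Quality = "normal" | "2k";

// Sketch of the quality-preset table. For Google "normal", the prompt is
// left unchanged so the model default applies.
function applyQuality(
  provider: "google" | "openai",
  quality: Quality,
  prompt: string,
): { prompt: string; size?: string } {
  if (provider === "openai") {
    return { prompt, size: quality === "2k" ? "2048x2048" : "1024x1024" };
  }
  // Assumption: the "2048px" hint is appended to the end of the prompt.
  return { prompt: quality === "2k" ? `${prompt}, 2048px` : prompt };
}
```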

## Aspect Ratio Handling

- **Multimodal LLMs**: Embedded in prompt (e.g., `"... aspect ratio 16:9"`)
- **Image-only models**: Uses `aspectRatio` or `size` parameter
- **Common ratios**: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1
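Here is a sketch of the aspect-ratio rules above, including the invalid-ratio fallback described under Error Handling. The regex accepts ratios such as `16:9` and `2.35:1`. The prompt phrasing and the `applyAspectRatio` helper are illustrative assumptions. Whether a particular image-only model accepts a given ratio is up to the provider.

```typescript
// Parse "W:H" into a positive number, or undefined if malformed.
function parseRatio(ar: string): number | undefined {
  const m = /^(\d+(?:\.\d+)?):(\d+(?:\.\d+)?)$/.exec(ar.trim());
  if (!m) return undefined;
  const w = Number(m[1]);
  const h = Number(m[2]);
  return w > 0 && h > 0 ? w / h : undefined;
}

// Multimodal models get the ratio embedded in the prompt (assumed wording);
// image-only models get an aspectRatio parameter. Invalid ratios log a
// warning and fall back to the model default.
function applyAspectRatio(
  prompt: string,
  ar: string | undefined,
  multimodal: boolean,
): { prompt: string; aspectRatio?: string } {
  if (!ar) return { prompt };
  if (parseRatio(ar) === undefined) {
    console.warn(`Invalid aspect ratio "${ar}", using the model default`);
    return { prompt };
  }
  return multimodal ? { prompt: `${prompt}, aspect ratio ${ar}` } : { prompt, aspectRatio: ar };
}
```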

## Examples

### Generate Cover Image

```bash
npx -y bun ${SKILL_DIR}/scripts/main.ts \
  --prompt "A minimalist tech illustration with blue gradients" \
  --image cover.png --ar 2.35:1 --quality 2k
```

### Generate Social Media Post

```bash
npx -y bun ${SKILL_DIR}/scripts/main.ts \
  --prompt "Instagram post about coffee" \
  --image post.png --ar 1:1
```

### Edit Image with Reference

```bash
npx -y bun ${SKILL_DIR}/scripts/main.ts \
  --prompt "Change the background to sunset" \
  --image edited.png --ref original.png --provider google
```

### Generate from Multiple Prompt Files

```bash
# Prompt files are concatenated into a single prompt
npx -y bun ${SKILL_DIR}/scripts/main.ts \
  --promptfiles style-guide.md scene-description.md \
  --image scene.png
```
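`--promptfiles` reads each file and concatenates the contents into one prompt. This minimal sketch assumes two undocumented details: the files are joined in the order given, and a blank line separates them. `buildPromptFromFiles` is a hypothetical name.

```typescript
import * as fs from "node:fs";

// Hypothetical helper: read each prompt file, trim it, and join the
// non-empty parts with a blank line (separator is an assumption).
function buildPromptFromFiles(files: string[]): string {
  const parts = files
    .map((f) => fs.readFileSync(f, "utf8").trim())
    .filter((p) => p.length > 0);
  if (parts.length === 0) throw new Error("Prompt files are empty");
  return parts.join("\n\n");
}
```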

## Error Handling

- **Missing API key**: Clear error with setup instructions
- **Generation failure**: Auto-retry once, then error
- **Invalid aspect ratio**: Warning, proceed with default
- **Reference images with image-only model**: Warning, ignore refs
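The error-handling rules map onto small guard functions. This is a sketch under stated assumptions:

- `withOneRetry` retries any thrown error exactly once.
- `usableRefs` drops references for image-only models.
- `requireKey` raises the missing-key error.

All three names and the message texts are hypothetical.

```typescript
// "Auto-retry once, then error": a second failure propagates to the caller.
async function withOneRetry<T>(fn: () => Promise<T>): Promise<T> {
  try {
    return await fn();
  } catch (first) {
    console.warn(`Generation failed, retrying once: ${String(first)}`);
    return await fn();
  }
}

// "Warning, ignore refs" for image-only models.
function usableRefs(refs: string[], multimodal: boolean): string[] {
  if (refs.length > 0 && !multimodal) {
    console.warn("Reference images are not supported by image-only models; ignoring them");
    return [];
  }
  return refs;
}

// Missing API key: fail early with setup instructions (message is illustrative).
function requireKey(name: "OPENAI_API_KEY" | "GOOGLE_API_KEY", env: Record<string, string | undefined>): string {
  const value = env[name];
  if (!value) throw new Error(`${name} is not set. Export it or add it to ~/.baoyu-skills/.env`);
  return value;
}
```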

## Extension Support

Custom configuration is supported via `EXTEND.md`.

**Check paths** (priority order):
1. `.baoyu-skills/baoyu-image-gen/EXTEND.md` (project)
2. `~/.baoyu-skills/baoyu-image-gen/EXTEND.md` (user)

If a file is found, load it before running the workflow; its content overrides the defaults.
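Here is a sketch of the `EXTEND.md` lookup. It assumes the project path is resolved against the current working directory. It also assumes only the highest-priority file found is loaded, since the text does not say whether both files are merged. `extendCandidates` and `loadExtension` are hypothetical names.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Candidate EXTEND.md locations in priority order: project first, then user.
function extendCandidates(cwd: string, home: string): string[] {
  return [
    path.join(cwd, ".baoyu-skills", "baoyu-image-gen", "EXTEND.md"),
    path.join(home, ".baoyu-skills", "baoyu-image-gen", "EXTEND.md"),
  ];
}

// Return the contents of the first EXTEND.md that exists, or undefined.
// Assumption: lower-priority files are not merged in.
function loadExtension(cwd = process.cwd(), home = os.homedir()): string | undefined {
  const found = extendCandidates(cwd, home).find((p) => fs.existsSync(p));
  return found ? fs.readFileSync(found, "utf8") : undefined;
}
```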

## Overview

This skill provides CLI-driven image generation using the official OpenAI and Google APIs via the AI SDK. It supports text-to-image generation, reference-image editing, aspect ratios, exact output sizes, and quality presets for quick, scriptable image outputs. The tool auto-selects a provider based on the available API keys, but you can also choose the provider and model explicitly.

## How this skill works

The CLI in `scripts/main.ts` accepts prompt text, prompt files, reference images, an output path, and generation options. It picks a provider (OpenAI or Google) based on flags and available API keys, maps the quality and aspect-ratio settings to provider parameters, calls the corresponding AI SDK function, saves the image, and can print JSON metadata. It retries once on failure. It also warns about and ignores incompatible options, such as reference images passed to image-only models.

## When to use it

- Generate single or batch images from plain prompts or prompt files.
- Create variants with specific aspect ratios or exact sizes for cover art, banners, and social posts.
- Produce higher-resolution (2k) outputs for presentations or infographics.
- Edit images using reference files with Google multimodal models.
- Integrate image generation into agent workflows where OpenAI or Google APIs are available.

## Best practices

- Put prompts for complex or repeatable scenes in dedicated files and pass them with `--promptfiles` for consistency.
- Use `--image` to set the exact output path, and `--json` for machine-readable results in pipelines.
- Pick a quality preset: `normal` for quick iterations, `2k` for final assets that need more detail.
- When both API keys are set, pass `--provider` explicitly if you need a specific model's behavior.
- Provide `--ref` only with Google multimodal models; the CLI warns about and ignores references for image-only models.

## Example use cases

- Generate a 2.35:1 article cover with `--ar 2.35:1 --quality 2k`.
- Create square social media images for Instagram posts with `--ar 1:1`.
- Edit a photo by supplying `--ref original.png`, a targeted prompt, and `--provider google`.
- Combine several prompt files (for example, a style guide and a scene description) into one prompt with `--promptfiles`.
- Emit JSON metadata with `--json` so CI jobs can ingest generated assets automatically.

## FAQ

### Which provider is used if I don't specify one?

Without `--provider`, the tool follows the auto-selection rules. It uses the provider whose API key is available, or defaults to Google when both keys are set.

### How do I include multiple reference images?

Pass multiple files to `--ref` (e.g., `--ref a.png b.png`). Only Google multimodal models support reference images; image-only models ignore them with a warning.