
This skill generates high-quality images using OpenAI and Google APIs with prompts, references, aspect ratios, and quality presets.

npx playbooks add skill smallnest/langgraphgo --skill baoyu-image-gen

---
name: baoyu-image-gen
description: AI SDK-based image generation using official OpenAI and Google APIs. Supports text-to-image, reference images, aspect ratios, and quality presets.
tools:
  - name: generate_comic_image
    script: scripts/main.ts
    description: Generate a single comic image (requires a prompt and an output path)
    parameters:
      prompt:
        type: string
        description: Image generation prompt
        required: true
      path:
        type: string
        description: Output file path
        required: true
      ar:
        type: string
        description: Aspect ratio
        required: false
      quality:
        type: string
        description: Quality preset
        required: false
---

# Image Generation (AI SDK)

Official API-based image generation via AI SDK. Supports OpenAI (DALL-E, GPT Image) and Google (Imagen, Gemini multimodal).

## Script Directory

**Important**: All scripts are located in the `scripts/` subdirectory of this skill.

**Agent Execution Instructions**:
1. Determine this SKILL.md file's directory path as `SKILL_DIR`
2. Script path = `${SKILL_DIR}/scripts/<script-name>.ts`
3. Replace all `${SKILL_DIR}` in this document with the actual path
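
The three steps above can be sketched as two small string helpers. This is only an illustration: `scriptPath` and `expandSkillDir` are hypothetical names, not functions shipped with the skill, and POSIX-style `/` separators are assumed.

```typescript
// Hypothetical helpers (not part of the skill) for resolving SKILL_DIR paths.

// Build the path to a script inside the skill's scripts/ subdirectory.
function scriptPath(skillDir: string, scriptName: string): string {
  // Strip trailing slashes so the join never produces "//".
  const base = skillDir.replace(/\/+$/, "");
  return `${base}/scripts/${scriptName}.ts`;
}

// Replace every literal ${SKILL_DIR} placeholder in a command or document.
function expandSkillDir(text: string, skillDir: string): string {
  // split/join replaces all occurrences without any regex escaping.
  return text.split("${SKILL_DIR}").join(skillDir);
}
```

For example, `scriptPath("/opt/skills/baoyu-image-gen", "main")` yields `/opt/skills/baoyu-image-gen/scripts/main.ts`.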

**Script Reference**:
| Script | Purpose |
|--------|---------|
| `scripts/main.ts` | CLI entry point for image generation |

## Quick Start

```bash
# Basic generation (auto-detect provider)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A landscape" --image landscape.png --ar 16:9

# High quality (2k)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality 2k

# Specific provider
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --provider openai

# From prompt files
npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png

# With reference images (Google multimodal only)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png
```

## Commands

### Basic Image Generation

```bash
# Generate with prompt
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A sunset over mountains" --image sunset.png

# Shorthand
npx -y bun ${SKILL_DIR}/scripts/main.ts -p "A cute robot" --image robot.png
```

### Aspect Ratios

```bash
# Common ratios: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A portrait" --image portrait.png --ar 3:4

# Or specify exact size
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Banner" --image banner.png --size 1792x1024
```

### Reference Images (Google Multimodal)

```bash
# Image editing with reference
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make it blue" --image blue.png --ref original.png

# Multiple references
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Combine these styles" --image out.png --ref a.png b.png
```

### Quality Presets

```bash
# Normal quality (default)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality normal

# High quality (2k resolution)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality 2k
```

### Output Formats

```bash
# Plain output (prints saved path)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png

# JSON output
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --json
```

## Options

| Option | Description |
|--------|-------------|
| `--prompt <text>`, `-p` | Prompt text |
| `--promptfiles <files...>` | Read prompt from files (concatenated) |
| `--image <path>` | Output image path (required) |
| `--provider google\|openai` | Force provider (default: auto-detect; Google when both API keys are available) |
| `--model <id>`, `-m` | Model ID |
| `--ar <ratio>` | Aspect ratio (e.g., `16:9`, `1:1`, `4:3`) |
| `--size <WxH>` | Size (e.g., `1024x1024`) |
| `--quality normal\|2k` | Quality preset (default: normal) |
| `--ref <files...>` | Reference images (Google multimodal only) |
| `--n <count>` | Number of images |
| `--json` | JSON output |
| `--help`, `-h` | Show help |

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `OPENAI_API_KEY` | OpenAI API key | - |
| `GOOGLE_API_KEY` | Google API key | - |
| `OPENAI_IMAGE_MODEL` | OpenAI model | `gpt-image-1.5` |
| `GOOGLE_IMAGE_MODEL` | Google model | `gemini-3-pro-image-preview` |
| `OPENAI_BASE_URL` | Custom OpenAI endpoint | - |
| `GOOGLE_BASE_URL` | Custom Google endpoint | - |

**Load Priority**: CLI args > `process.env` > `<cwd>/.baoyu-skills/.env` > `~/.baoyu-skills/.env`
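
One way to picture the load priority is as an ordered list of sources scanned from highest to lowest precedence. The sketch below is an assumption about how such a lookup could work, not the skill's actual loader: `parseDotEnv` and `resolveSetting` are hypothetical names, and the `.env` parsing covers only plain `KEY=VALUE` lines.

```typescript
type Env = Record<string, string | undefined>;

// Parse simple KEY=VALUE lines; blank lines and "#" comments are skipped.
// Assumption: the real .env parser may support more syntax than this.
function parseDotEnv(text: string): Env {
  const out: Env = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "="
    const key = line.slice(0, eq).trim();
    // Drop one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    out[key] = value;
  }
  return out;
}

// Return the first defined, non-empty value; `sources` is ordered from
// highest priority (CLI args) to lowest (user-level .env file).
function resolveSetting(key: string, sources: Env[]): string | undefined {
  for (const source of sources) {
    const value = source[key];
    if (value !== undefined && value !== "") return value;
  }
  return undefined;
}
```

A caller would pass the sources in the documented order: values derived from CLI args, then `process.env`, then the parsed `<cwd>/.baoyu-skills/.env`, then the parsed `~/.baoyu-skills/.env`.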

## Provider & Model Strategy

### Auto-Selection

1. If `--provider` specified → use it
2. If only one API key available → use that provider
3. If both available → default to Google (multimodal LLMs are more versatile)
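
The three rules above translate into a short decision function. This is a minimal sketch under stated assumptions: `selectProvider` and the `Keys` shape are hypothetical, and the error thrown when no key is present is an illustration of the "missing API key" case, not the skill's actual message.

```typescript
type Provider = "google" | "openai";

// Hypothetical container for whichever API keys were found.
interface Keys {
  googleKey?: string;
  openaiKey?: string;
}

function selectProvider(keys: Keys, forced?: Provider): Provider {
  // Rule 1: an explicit --provider always wins.
  if (forced) return forced;

  const hasGoogle = Boolean(keys.googleKey);
  const hasOpenAI = Boolean(keys.openaiKey);

  // Rule 2: only one key available, so use that provider.
  if (hasGoogle && !hasOpenAI) return "google";
  if (hasOpenAI && !hasGoogle) return "openai";

  // Rule 3: both keys available, so default to Google.
  if (hasGoogle && hasOpenAI) return "google";

  // Assumption: with no key at all, the caller gets a setup hint.
  throw new Error("No API key found: set GOOGLE_API_KEY or OPENAI_API_KEY");
}
```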

### API Selection by Model Type

| Model Category | API Function | Example Models |
|----------------|--------------|----------------|
| Google Multimodal | `generateText` | `gemini-2.0-flash-exp-image-generation` |
| Google Imagen | `experimental_generateImage` | `imagen-3.0-generate-002` |
| OpenAI | `experimental_generateImage` | `gpt-image-1`, `dall-e-3` |
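
The table above can be read as a lookup from provider and model ID to an AI SDK function name. The sketch below assumes that Google multimodal models are recognizable by a `gemini` prefix in the model ID; that heuristic and the name `apiFunctionFor` are illustrative assumptions, not confirmed details of `scripts/main.ts`.

```typescript
type Provider = "google" | "openai";
type ApiFunction = "generateText" | "experimental_generateImage";

function apiFunctionFor(provider: Provider, modelId: string): ApiFunction {
  // Assumption: Google multimodal models carry a "gemini" prefix,
  // while Imagen models (e.g. "imagen-3.0-generate-002") do not.
  if (provider === "google" && modelId.startsWith("gemini")) {
    return "generateText";
  }
  // Google Imagen and all OpenAI models use the image endpoint.
  return "experimental_generateImage";
}
```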

### Available Models

**Google**:
- `gemini-3-pro-image-preview` - Default, multimodal generation
- `gemini-2.0-flash-exp-image-generation` - Gemini 2.0 Flash
- `imagen-3.0-generate-002` - Imagen 3

**OpenAI**:
- `gpt-image-1.5` - Default, GPT Image 1.5
- `gpt-image-1` - GPT Image 1
- `dall-e-3` - DALL-E 3

## Quality Presets

| Preset | OpenAI | Google | Use Case |
|--------|--------|--------|----------|
| `normal` | 1024x1024 | Default | Covers, illustrations |
| `2k` | 2048x2048 | "2048px" in prompt | Infographics, slides |
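
As a rough illustration of the table above, a preset can be mapped to either a size parameter (OpenAI column) or a prompt hint (Google column). The function name, the `QualitySettings` shape, and the exact hint text are assumptions made for this sketch.

```typescript
type Provider = "google" | "openai";
type Quality = "normal" | "2k";

// Hypothetical result shape: at most one of the two fields is set.
interface QualitySettings {
  size?: string;       // sent as a size parameter (OpenAI)
  promptHint?: string; // appended to the prompt text (Google)
}

function qualitySettings(provider: Provider, quality: Quality): QualitySettings {
  if (provider === "openai") {
    return { size: quality === "2k" ? "2048x2048" : "1024x1024" };
  }
  // Google: "normal" keeps the model default; "2k" adds a prompt hint.
  return quality === "2k" ? { promptHint: "2048px" } : {};
}
```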

## Aspect Ratio Handling

- **Multimodal LLMs**: Embedded in prompt (e.g., `"... aspect ratio 16:9"`)
- **Image-only models**: Uses `aspectRatio` or `size` parameter
- **Common ratios**: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1
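
The split between prompt-embedded and parameter-based ratios might look like the following. This is a sketch under assumptions: `parseAspectRatio`, `withAspectRatio`, and the `ImageRequest` shape are hypothetical, and the comma used to join the ratio onto the prompt is a guess at the wording.

```typescript
// Return width / height for a "W:H" string, or undefined if it is malformed.
function parseAspectRatio(ar: string): number | undefined {
  const match = /^(\d+(?:\.\d+)?):(\d+(?:\.\d+)?)$/.exec(ar.trim());
  if (!match) return undefined;
  const width = Number(match[1]);
  const height = Number(match[2]);
  if (width <= 0 || height <= 0) return undefined;
  return width / height;
}

// Hypothetical request shape for this sketch.
interface ImageRequest {
  prompt: string;
  aspectRatio?: string;
}

function withAspectRatio(prompt: string, ar: string, multimodal: boolean): ImageRequest {
  // Invalid ratio: leave the request untouched so the provider default applies.
  if (parseAspectRatio(ar) === undefined) return { prompt };
  return multimodal
    ? { prompt: `${prompt}, aspect ratio ${ar}` } // embedded in the prompt
    : { prompt, aspectRatio: ar };                // passed as a parameter
}
```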

## Examples

### Generate Cover Image

```bash
npx -y bun ${SKILL_DIR}/scripts/main.ts \
  --prompt "A minimalist tech illustration with blue gradients" \
  --image cover.png --ar 2.35:1 --quality 2k
```

### Generate Social Media Post

```bash
npx -y bun ${SKILL_DIR}/scripts/main.ts \
  --prompt "Instagram post about coffee" \
  --image post.png --ar 1:1
```

### Edit Image with Reference

```bash
npx -y bun ${SKILL_DIR}/scripts/main.ts \
  --prompt "Change the background to sunset" \
  --image edited.png --ref original.png --provider google
```

### Batch Generation from Prompt File

```bash
# Prompt files are concatenated in the order given to form one prompt
npx -y bun ${SKILL_DIR}/scripts/main.ts \
  --promptfiles style-guide.md scene-description.md \
  --image scene.png
```

## Error Handling

- **Missing API key**: Clear error with setup instructions
- **Generation failure**: Auto-retry once, then error
- **Invalid aspect ratio**: Warning, proceed with default
- **Reference images with image-only model**: Warning, ignore refs
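
Two of these behaviors, the single retry and the dropped references, are easy to sketch in isolation. The helper names and the warning text below are hypothetical; the skill's own implementation may differ.

```typescript
// Run `attempt`; if it rejects, run it exactly one more time.
// A failure on the second attempt propagates to the caller.
async function withOneRetry<T>(attempt: () => Promise<T>): Promise<T> {
  try {
    return await attempt();
  } catch {
    return await attempt();
  }
}

interface RefCheck {
  refs: string[];     // references that will actually be sent
  warnings: string[]; // messages to show the user
}

// Drop reference images when the selected model is image-only.
function checkRefs(refs: string[], multimodal: boolean): RefCheck {
  if (refs.length > 0 && !multimodal) {
    return { refs: [], warnings: ["Reference images are ignored by image-only models"] };
  }
  return { refs, warnings: [] };
}
```

Wrapping only the generation call in `withOneRetry` keeps argument validation and file writes out of the retry path.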

## Extension Support

Custom configurations via EXTEND.md.

**Check paths** (priority order):
1. `.baoyu-skills/baoyu-image-gen/EXTEND.md` (project)
2. `~/.baoyu-skills/baoyu-image-gen/EXTEND.md` (user)

If an EXTEND.md file is found, load it before running the workflow. Its content overrides the defaults.
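
The lookup order can be sketched as a first-match search over the two candidate paths. This is an illustration only: `findExtendFile` is a hypothetical name, the project path is assumed to be relative to the current working directory, and the existence check is injected so the function stays free of file-system calls.

```typescript
// Return the first EXTEND.md path that exists, or undefined if none does.
function findExtendFile(
  cwd: string,
  home: string,
  exists: (path: string) => boolean,
): string | undefined {
  const candidates = [
    `${cwd}/.baoyu-skills/baoyu-image-gen/EXTEND.md`,  // project level, checked first
    `${home}/.baoyu-skills/baoyu-image-gen/EXTEND.md`, // user level
  ];
  return candidates.find((candidate) => exists(candidate));
}
```

In a real script the arguments could be `process.cwd()`, `homedir()` from `node:os`, and `existsSync` from `node:fs`.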

Overview

This skill provides CLI-driven image generation using the official OpenAI and Google APIs through the AI SDK. It supports text-to-image generation, reference-image editing, aspect ratios, exact sizes, and quality presets for quick, reproducible output. The tool auto-selects a provider based on the available API keys and also allows explicit provider and model selection.

How this skill works

The scripts/ directory exposes a main CLI that accepts prompt text, prompt files, reference images, an output path, and generation options. The CLI picks a provider (OpenAI or Google) based on flags and available API keys, maps quality and aspect-ratio settings to provider parameters, calls the corresponding AI SDK function, saves the images, and can print JSON metadata. It retries once on failure, and it warns about and ignores incompatible options such as reference images passed to image-only models.

When to use it

  • Generate single or batch images from plain prompts or prompt files.
  • Create variants with specific aspect ratios or exact sizes for cover art, banners, and social posts.
  • Produce higher-resolution outputs (2k) for presentations or infographics.
  • Edit images using reference files when using Google multimodal models.
  • Integrate image generation into agent workflows where OpenAI or Google APIs are available.

Best practices

  • Prepare prompts in dedicated files for complex or repeatable scenes and pass them with --promptfiles for consistency.
  • Specify --image to set the exact output path; use --json for machine-readable results in pipelines.
  • Pick quality presets: use normal for quick iterations and 2k for final assets requiring more detail.
  • When both API keys exist, set --provider explicitly if you need a specific model behavior.
  • Provide --ref only when using Google multimodal models; the CLI will warn and ignore refs for image-only models.

Example use cases

  • Generate a 2.35:1 cover image for an article with --ar 2.35:1 and --quality 2k.
  • Create square social media images with --ar 1:1 for Instagram posts.
  • Edit a photo by supplying --ref original.png and a targeted prompt with --provider google.
  • Compose an illustrated scene from several prompt files, which --promptfiles concatenates into a single prompt.
  • Run CI tasks that emit JSON metadata by adding --json for automated asset ingestion.

FAQ

Which provider is used if I don't specify one?

If you don't set --provider, the tool uses whichever provider has an API key configured. When both keys exist, it defaults to Google.

How do I include multiple reference images?

Pass multiple files to --ref (e.g., --ref a.png b.png). Reference images are supported only by Google multimodal models; image-only models ignore them with a warning.