baoyu-image-gen skill

unsafe

This skill generates images using official OpenAI and Google APIs, supporting prompts, references, aspect ratios, and quality presets.

npx playbooks add skill jst-well-dan/skill-box --skill baoyu-image-gen

Review the files below or copy the command above to add this skill to your agents.

SKILL.md

---
name: baoyu-image-gen
description: Generates images using official OpenAI and Google APIs via AI SDK. Supports text-to-image, reference images, aspect ratios, and quality presets. Use when user asks to "generate image with API", "use official API for images", "create image with OpenAI/Google", or needs API-based generation instead of browser-based.
---

# Image Generation (AI SDK)

Official API-based image generation. Supports OpenAI and Google providers.

## Script Directory

**Agent Execution**:
1. `SKILL_DIR` = this SKILL.md file's directory
2. Script path = `${SKILL_DIR}/scripts/main.ts`

## Preferences (EXTEND.md)

Use Bash to check whether EXTEND.md exists (in priority order):

```bash
# Project-level takes priority; fall back to user-level
# (cross-platform: $HOME works on macOS/Linux/WSL)
if test -f .baoyu-skills/baoyu-image-gen/EXTEND.md; then
  echo "project"
elif test -f "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md"; then
  echo "user"
else
  echo "not found"
fi
```

| Path | Location |
|------|----------|
| `.baoyu-skills/baoyu-image-gen/EXTEND.md` | Project directory |
| `$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md` | User home |

| Result | Action |
|--------|--------|
| Found | Read, parse, apply settings |
| Not found | Use defaults |

**EXTEND.md Supports**: Default provider | Default quality | Default aspect ratio
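
The lookup order above can be sketched in TypeScript as follows. This is a minimal illustration, not the code in `scripts/main.ts`: `findExtendFile`, `ExtendLocation`, and the injected `exists` callback are hypothetical names, and parsing of the file contents is left out because the EXTEND.md format is not specified here.

```typescript
// Hypothetical sketch: return the first EXTEND.md candidate that exists,
// checking the project directory before the user's home directory.
type ExtendLocation = { scope: "project" | "user"; path: string };

const EXTEND_RELATIVE_PATH = ".baoyu-skills/baoyu-image-gen/EXTEND.md";

function findExtendFile(
  projectDir: string,
  homeDir: string,
  exists: (path: string) => boolean, // injected so the sketch has no I/O dependency
): ExtendLocation | null {
  const candidates: ExtendLocation[] = [
    { scope: "project", path: `${projectDir}/${EXTEND_RELATIVE_PATH}` },
    { scope: "user", path: `${homeDir}/${EXTEND_RELATIVE_PATH}` },
  ];
  for (const candidate of candidates) {
    if (exists(candidate.path)) return candidate;
  }
  return null; // not found: the caller falls back to defaults
}
```

In a Node or Bun script, `exists` could be `fs.existsSync`; passing it in keeps the priority logic easy to test in isolation.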

## Usage

```bash
# Basic
npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "A cat" --image cat.png

# With aspect ratio
npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "A landscape" --image out.png --ar 16:9

# High quality
npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "A cat" --image out.png --quality 2k

# From prompt files
npx -y bun "${SKILL_DIR}/scripts/main.ts" --promptfiles system.md content.md --image out.png

# With reference images (Google multimodal only)
npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "Make blue" --image out.png --ref source.png

# Specific provider
npx -y bun "${SKILL_DIR}/scripts/main.ts" --prompt "A cat" --image out.png --provider openai
```

## Options

| Option | Description |
|--------|-------------|
| `--prompt <text>`, `-p` | Prompt text |
| `--promptfiles <files...>` | Read prompt from files (concatenated) |
| `--image <path>` | Output image path (required) |
| `--provider google\|openai` | Force provider (default: google) |
| `--model <id>`, `-m` | Model ID |
| `--ar <ratio>` | Aspect ratio (e.g., `16:9`, `1:1`, `4:3`) |
| `--size <WxH>` | Size (e.g., `1024x1024`) |
| `--quality normal\|2k` | Quality preset (default: 2k) |
| `--imageSize 1K\|2K\|4K` | Image size for Google (default: derived from `--quality`) |
| `--ref <files...>` | Reference images (Google multimodal only) |
| `--n <count>` | Number of images |
| `--json` | JSON output |

## Environment Variables

| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | OpenAI API key |
| `GOOGLE_API_KEY` | Google API key |
| `OPENAI_IMAGE_MODEL` | OpenAI model override |
| `GOOGLE_IMAGE_MODEL` | Google model override |
| `OPENAI_BASE_URL` | Custom OpenAI endpoint |
| `GOOGLE_BASE_URL` | Custom Google endpoint |

**Load Priority**: CLI args > env vars > `<cwd>/.baoyu-skills/.env` > `~/.baoyu-skills/.env`
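
One way to picture this priority chain is a lookup that walks the sources from highest to lowest priority and returns the first defined value. The sketch below is illustrative only: `Settings` and `resolveSetting` are hypothetical names, and it assumes each source has already been parsed into a flat key/value object under a shared key name.

```typescript
// Hypothetical sketch of "first defined value wins" across ordered sources.
type Settings = Record<string, string | undefined>;

// `sources` must be ordered from highest to lowest priority:
// CLI args, env vars, <cwd>/.baoyu-skills/.env, ~/.baoyu-skills/.env.
function resolveSetting(key: string, sources: Settings[]): string | undefined {
  for (const source of sources) {
    const value = source[key];
    if (value !== undefined) return value;
  }
  return undefined; // no source defines the key
}

// Example with placeholder values (not real model IDs or keys).
const exampleSources: Settings[] = [
  {},                                               // CLI args
  { GOOGLE_IMAGE_MODEL: "model-from-env" },         // environment variables
  { GOOGLE_IMAGE_MODEL: "model-from-project-env",
    GOOGLE_API_KEY: "key-from-project-env" },       // <cwd>/.baoyu-skills/.env
  { GOOGLE_API_KEY: "key-from-home-env" },          // ~/.baoyu-skills/.env
];
```

With these placeholder sources, `GOOGLE_IMAGE_MODEL` resolves from the environment and `GOOGLE_API_KEY` from the project-level `.env`, because those are the highest-priority sources that define each key.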

## Provider Selection

1. `--provider` specified → use it
2. Only one API key available → use that provider
3. Both available → default to Google
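
The three rules above can be expressed as a small pure function. This is a sketch under assumptions rather than the script's actual implementation: `selectProvider` and `ProviderInputs` are hypothetical names, and the error text for the no-key case is invented for illustration.

```typescript
// Hypothetical sketch of the provider selection rules.
type Provider = "google" | "openai";

interface ProviderInputs {
  requested?: Provider;  // value of --provider, if given
  hasGoogleKey: boolean; // GOOGLE_API_KEY is set
  hasOpenAIKey: boolean; // OPENAI_API_KEY is set
}

function selectProvider(inputs: ProviderInputs): Provider {
  const { requested, hasGoogleKey, hasOpenAIKey } = inputs;
  if (requested) return requested;                    // rule 1: explicit flag wins
  if (hasGoogleKey && !hasOpenAIKey) return "google"; // rule 2: only one key available
  if (hasOpenAIKey && !hasGoogleKey) return "openai";
  if (hasGoogleKey && hasOpenAIKey) return "google";  // rule 3: both keys, default to Google
  // No key at all: the skill reports an error with setup instructions.
  throw new Error("No API key found: set GOOGLE_API_KEY or OPENAI_API_KEY");
}
```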

## Quality Presets

| Preset | Google imageSize | OpenAI Size | Use Case |
|--------|------------------|-------------|----------|
| `normal` | 1K | 1024px | Quick previews |
| `2k` (default) | 2K | 2048px | Covers, illustrations, infographics |

**Google imageSize**: Can be overridden with `--imageSize 1K|2K|4K`
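
As a rough sketch, the preset table and the `--imageSize` override could be modeled as two lookup maps plus a resolver. The names `GOOGLE_SIZE_BY_QUALITY`, `OPENAI_PIXELS_BY_QUALITY`, and `resolveGoogleImageSize` are hypothetical; only the values come from the table above.

```typescript
// Hypothetical sketch of the quality preset mapping.
type Quality = "normal" | "2k";
type GoogleImageSize = "1K" | "2K" | "4K";

const GOOGLE_SIZE_BY_QUALITY: Record<Quality, GoogleImageSize> = {
  normal: "1K",
  "2k": "2K",
};

// Pixel figures from the "OpenAI Size" column of the table.
const OPENAI_PIXELS_BY_QUALITY: Record<Quality, number> = {
  normal: 1024,
  "2k": 2048,
};

// An explicit --imageSize overrides the preset; the preset defaults to 2k.
function resolveGoogleImageSize(
  quality: Quality = "2k",
  override?: GoogleImageSize,
): GoogleImageSize {
  return override ?? GOOGLE_SIZE_BY_QUALITY[quality];
}
```

Note that `4K` is reachable only through the override, since neither preset maps to it.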

## Aspect Ratios

Supported: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2.35:1`

- Google multimodal: uses `imageConfig.aspectRatio`
- Google Imagen: uses `aspectRatio` parameter
- OpenAI: maps to closest supported size
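
"Closest supported size" can be read as picking the candidate whose width/height ratio is nearest to the requested one. The sketch below is one plausible approach, not the script's confirmed logic: `parseAspectRatio`, `closestSize`, and the `EXAMPLE_SIZES` list are assumptions, and the real set of sizes depends on the OpenAI model in use. Distances are compared on a log scale so that `16:9` and `9:16` are treated symmetrically.

```typescript
// Hypothetical sketch: map an aspect ratio string to the nearest candidate size.
interface Size { width: number; height: number }

// Parses "16:9" or "2.35:1" into width / height; returns null if malformed.
function parseAspectRatio(ratio: string): number | null {
  const match = /^(\d+(?:\.\d+)?):(\d+(?:\.\d+)?)$/.exec(ratio.trim());
  if (!match) return null;
  const width = Number(match[1]);
  const height = Number(match[2]);
  return width > 0 && height > 0 ? width / height : null;
}

function closestSize(ratio: string, candidates: Size[]): Size | null {
  const target = parseAspectRatio(ratio);
  if (target === null) return null; // caller can warn and fall back to a default
  let best: Size | null = null;
  let bestDistance = Infinity;
  for (const size of candidates) {
    const distance = Math.abs(Math.log(size.width / size.height) - Math.log(target));
    if (distance < bestDistance) {
      best = size;
      bestDistance = distance;
    }
  }
  return best;
}

// Assumed candidate list for illustration only; check the provider's docs.
const EXAMPLE_SIZES: Size[] = [
  { width: 1024, height: 1024 },
  { width: 1536, height: 1024 },
  { width: 1024, height: 1536 },
];
```

With this assumed list, wide ratios such as `16:9` and `2.35:1` both land on the landscape candidate, which shows why the result is an approximation and not an exact match.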

## Error Handling

- Missing API key → error with setup instructions
- Generation failure → auto-retry once
- Invalid aspect ratio → warning, proceed with default
- Reference images with non-multimodal model → warning, ignore refs
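
The "auto-retry once" behavior can be sketched as a small wrapper around any async operation. This is a minimal illustration with a hypothetical name (`withSingleRetry`); whether the actual script waits between attempts or filters by error type is not stated here, so the sketch does neither.

```typescript
// Hypothetical sketch: run an async operation, retrying exactly once on failure.
async function withSingleRetry<T>(operation: () => Promise<T>): Promise<T> {
  try {
    return await operation();
  } catch {
    // First attempt failed: try one more time.
    // If this second attempt also fails, its error propagates to the caller.
    return await operation();
  }
}
```

A generation call would be wrapped as `withSingleRetry(() => generate(request))`, where `generate` and `request` stand in for whatever function and payload the script uses.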

## Extension Support

Custom configurations via EXTEND.md. See the **Preferences** section for paths and supported options.

Overview

This skill generates images using official OpenAI and Google APIs via an AI SDK. It supports text-to-image generation, reference images (Google multimodal), selectable aspect ratios, and quality presets for quick previews or high-resolution outputs. Use it when you need programmatic, API-driven image creation rather than browser-based tools.

How this skill works

The script invokes Google or OpenAI image endpoints based on CLI flags, available API keys, or default settings. It accepts prompts (inline or from files), optional reference images for Google multimodal, aspect ratio or explicit size, quality presets, and saves one or more output files. The tool validates inputs, retries once on failures, and falls back to defaults when configurations are missing.

When to use it

- Generate images programmatically from the command line using official OpenAI or Google APIs.
- Create illustrations, covers, or infographics with specific aspect ratios or quality presets.
- Include reference images to guide Google multimodal generation.
- Automate image production in CI scripts, content pipelines, or local tooling.
- Prefer API-based generation when you require reproducible, scriptable outputs instead of a web UI.

Best practices

- Set the appropriate API key environment variable (OPENAI_API_KEY or GOOGLE_API_KEY) and verify access before running.
- Choose a quality preset (normal or 2k) that balances speed and resolution for your use case.
- Use prompt files for long or complex system+content prompts to keep commands readable.
- Specify aspect ratio or explicit size when output dimensions matter for layout or print.
- Provide reference images only with Google multimodal models; the tool warns and ignores refs for unsupported models.

Example use cases

- Generate a 16:9 landscape cover image for an article using the 2k preset.
- Batch-create product illustrations from a directory of prompts in an automated pipeline.
- Create a high-resolution mascot illustration with OpenAI by forcing the provider flag.
- Use a reference photo with Google multimodal to recolor or restyle an existing image.
- Produce mobile-ready 9:16 story images quickly with the normal quality preset for previews.

FAQ

How does the skill choose between OpenAI and Google?

It uses the --provider flag if provided. Otherwise, if only one API key is set, it uses that key's provider; if both keys are set, it defaults to Google.

What happens if an API key is missing or a request fails?

Missing keys trigger a clear error with setup guidance. Generation failures auto-retry once before reporting an error.