videoagent-image-studio skill

This skill lets you generate any image with one command using multiple AI models, handling prompts and model selection automatically.

npx playbooks add skill pexoai/pexo-skills --skill videoagent-image-studio

SKILL.md
---
name: videoagent-image-studio
version: 2.0.0
author: "wells"
emoji: "🎨"
tags:
  - video
  - image-generation
  - midjourney
  - flux
  - gemini
  - fal
  - ideogram
  - recraft
description: >
  Tired of juggling 8 API keys? This skill gives you one-command access to Midjourney, Flux, Ideogram, and more, with zero setup. Use when you want to generate any image without worrying about API keys.
homepage: https://github.com/pexoai/image-studio-skill
metadata:
  openclaw:
    emoji: "🎨"
    install:
      - id: node
        kind: node
        label: "No dependencies needed — all calls go through the hosted proxy"
---

# 🎨 VideoAgent Image Studio

**Use when:** User asks to generate, draw, create, or make any kind of image, photo, illustration, icon, logo, or artwork.

Generate images with 8 state-of-the-art AI models. This skill automatically picks the best model for the job and handles all the complexity — including Midjourney's async polling — so you can focus on the conversation.

---

## Quick Reference

| User Intent | Model | Speed |
|---|---|---|
| Artistic, cinematic, painterly | `midjourney` | ~15s |
| Photorealistic, portrait, product | `flux-pro` | ~8s |
| General purpose, balanced | `flux-dev` | ~10s |
| Quick draft, fast iteration | `flux-schnell` | ~2s |
| Image with text, logo, poster | `ideogram` | ~10s |
| Vector art, icon, flat design | `recraft` | ~8s |
| Anime, stylized illustration | `sdxl` | ~5s |
| Gemini-powered, consistent style | `nano-banana` | ~12s |

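The table above can be read as a lookup from user intent to model ID. The sketch below is one hypothetical way to encode it: `pickModel` and the keyword lists are illustrative assumptions, and only the model IDs and the `flux-dev` default come from this document. Simple substring matching will misfire on some requests, and an explicit model request from the user (for example "Use Flux") should take precedence over any keyword rule.

```javascript
// Hypothetical intent-to-model lookup. The keyword lists are assumptions
// made for illustration; only the model IDs come from the Quick Reference.
const MODEL_RULES = [
  { model: "ideogram", keywords: ["text", "logo", "poster", "typography"] },
  { model: "recraft", keywords: ["vector", "icon", "flat"] },
  { model: "sdxl", keywords: ["anime", "stylized"] },
  { model: "flux-schnell", keywords: ["draft", "quick", "fast"] },
  { model: "flux-pro", keywords: ["photorealistic", "portrait", "product"] },
  { model: "midjourney", keywords: ["artistic", "cinematic", "painterly"] },
  { model: "nano-banana", keywords: ["consistent"] },
];

// Returns the first model whose keywords appear in the request,
// falling back to the general-purpose default.
function pickModel(request) {
  const text = request.toLowerCase();
  for (const { model, keywords } of MODEL_RULES) {
    if (keywords.some((keyword) => text.includes(keyword))) return model;
  }
  return "flux-dev";
}
```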
---

## How to Generate an Image

### Step 1 — Enhance the prompt

Before calling the script, expand the user's prompt with style, lighting, and quality descriptors appropriate for the chosen model.

- **Midjourney**: Add `cinematic lighting`, `ultra detailed`, `--v 7`, `--style raw`
- **Flux**: Add `masterpiece`, `highly detailed`, `sharp focus`, `professional photography`
- **Ideogram**: Be explicit about text content, font style, and layout
- **Recraft**: Specify `vector illustration`, `flat design`, `icon style`

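As a rough illustration of the enhancement step, the sketch below appends the descriptors listed above when the prompt does not already contain them. `enhancePrompt` and `DESCRIPTORS` are hypothetical names, and appending every descriptor is a simplification: in practice, choose only those that suit the request. Ideogram is left out on purpose, because its guidance (explicit text content, font style, layout) depends on the request and cannot be appended mechanically.

```javascript
// Hypothetical helper: descriptor lists are copied from the bullets above.
const DESCRIPTORS = {
  midjourney: {
    words: ["cinematic lighting", "ultra detailed"],
    flags: ["--v 7", "--style raw"],
  },
  flux: {
    words: ["masterpiece", "highly detailed", "sharp focus", "professional photography"],
    flags: [],
  },
  recraft: {
    words: ["vector illustration", "flat design", "icon style"],
    flags: [],
  },
};

function enhancePrompt(model, prompt) {
  // flux-pro, flux-dev and flux-schnell share the Flux descriptors.
  const family = model.startsWith("flux") ? "flux" : model;
  const entry = DESCRIPTORS[family];
  if (!entry) return prompt.trim();

  // Skip descriptors and flags the user already supplied.
  const lower = prompt.toLowerCase();
  const extraWords = entry.words.filter((word) => !lower.includes(word));
  const extraFlags = entry.flags.filter((flag) => !prompt.includes(flag));

  const body = [prompt.trim(), ...extraWords].join(", ");
  return [body, ...extraFlags].join(" ");
}
```
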
### Step 2 — Run the script

```bash
node {baseDir}/tools/generate.js \
  --model <model_id> \
  --prompt "<enhanced prompt>" \
  --aspect-ratio <ratio>
```

**All parameters:**

| Parameter | Default | Description |
|---|---|---|
| `--model` | `flux-dev` | Model ID from the table above |
| `--prompt` | *(required)* | The image generation prompt |
| `--aspect-ratio` | `1:1` | `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, `21:9` |
| `--num-images` | `1` | Number of images (1–4; Midjourney always returns 4) |
| `--negative-prompt` | — | Things to avoid (not supported by Midjourney) |
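| `--seed` | — | Seed for reproducibility |
| `--reference-images` | — | Comma-separated reference image URLs for character/style consistency (Nano Banana, added in v2.0.0) |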

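To show how the defaults and limits in the table fit together, here is a minimal sketch that validates options and builds the argument list. `buildArgs` is a hypothetical helper, not part of the skill. Omitting `--num-images` and `--negative-prompt` for Midjourney is an assumption based on the table notes (Midjourney always returns 4 images and does not support negative prompts).

```javascript
// Aspect ratios listed in the parameter table.
const ASPECT_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "21:9"];

// Hypothetical helper: turns an options object into CLI arguments.
function buildArgs({
  model = "flux-dev",
  prompt,
  aspectRatio = "1:1",
  numImages = 1,
  negativePrompt,
  seed,
} = {}) {
  if (!prompt) throw new Error("--prompt is required");
  if (!ASPECT_RATIOS.includes(aspectRatio)) {
    throw new Error(`unsupported aspect ratio: ${aspectRatio}`);
  }
  if (!Number.isInteger(numImages) || numImages < 1 || numImages > 4) {
    throw new Error("--num-images must be an integer from 1 to 4");
  }

  const args = ["--model", model, "--prompt", prompt, "--aspect-ratio", aspectRatio];
  if (model !== "midjourney") {
    // Assumption: these two flags are skipped for Midjourney.
    args.push("--num-images", String(numImages));
    if (negativePrompt) args.push("--negative-prompt", negativePrompt);
  }
  if (seed !== undefined) args.push("--seed", String(seed));
  return args;
}
```

Passing the resulting array to Node's `child_process.execFile` (with `node` as the command and the script path as the first argument) avoids shell quoting problems with prompts that contain quotes or spaces.
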
### Step 3 — Return the result

The script always waits and returns the final image URL(s). No polling required.

```json
{
  "success": true,
  "model": "flux-pro",
  "imageUrl": "https://...",
  "images": ["https://..."]
}
```

Send the `imageUrl` to the user.

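A caller might handle the script's output along the following lines. This is a sketch under assumptions: `parseResult` is a hypothetical helper, and because the shape of a failed response is not documented here, anything other than `"success": true` is treated as a failure.

```javascript
// Hypothetical helper: parses the script's stdout into a checked result.
function parseResult(stdout) {
  let data;
  try {
    data = JSON.parse(stdout);
  } catch {
    throw new Error("script output was not valid JSON");
  }
  // Assumption: any response without success === true is a failure.
  if (!data || data.success !== true) {
    throw new Error("image generation failed");
  }

  const images = Array.isArray(data.images) ? data.images : [];
  const imageUrl = data.imageUrl || images[0];
  if (!imageUrl) throw new Error("no image URL in result");

  return {
    model: data.model,
    imageUrl,
    images: images.length > 0 ? images : [imageUrl],
  };
}
```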
---

## Midjourney Actions

After generating a 4-image grid with Midjourney, offer the user these options:

```bash
# Upscale image #2 (subtle, preserves details)
node {baseDir}/tools/generate.js \
  --model midjourney \
  --action upscale \
  --index 2 \
  --job-id <job_id>

# Create a strong variation of image #3
node {baseDir}/tools/generate.js \
  --model midjourney \
  --action variation \
  --index 3 \
  --job-id <job_id> \
  --variation-type 1

# Regenerate with same prompt
node {baseDir}/tools/generate.js \
  --model midjourney \
  --action reroll \
  --job-id <job_id>
```

**Upscale types:** `0` = Subtle (default, best for photos), `1` = Creative (best for illustrations)

**Variation types:** `0` = Subtle (default), `1` = Strong (dramatic changes)

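When the user answers with a short label such as `U2` or `V3`, the reply has to be translated into the flags shown above. The sketch below is one hypothetical way to do that: `midjourneyActionArgs` is an illustrative name, and the job ID is assumed to be available from the original grid generation (how it is obtained is not shown in this document).

```javascript
// Hypothetical helper: maps a user choice (U1-U4, V1-V4, or "reroll")
// to the Midjourney action flags documented above.
function midjourneyActionArgs(choice, jobId, { strong = false } = {}) {
  if (!jobId) throw new Error("a job ID from the original grid is required");

  const base = ["--model", "midjourney"];
  const normalized = choice.trim().toUpperCase();
  if (normalized === "REROLL") {
    return [...base, "--action", "reroll", "--job-id", jobId];
  }

  const match = /^([UV])([1-4])$/.exec(normalized);
  if (!match) throw new Error(`unrecognized choice: ${choice}`);

  const [, kind, index] = match;
  const action = kind === "U" ? "upscale" : "variation";
  const args = [...base, "--action", action, "--index", index, "--job-id", jobId];

  // Variation type 1 = Strong; the Subtle default (0) is left implicit.
  if (action === "variation" && strong) args.push("--variation-type", "1");
  return args;
}
```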
---

## Example Conversations

**User:** "Draw a snow leopard on a snowy mountain with cinematic lighting"

```bash
# Choose midjourney for artistic quality
node {baseDir}/tools/generate.js \
  --model midjourney \
  --prompt "a majestic snow leopard on a snowy mountain peak, cinematic lighting, dramatic atmosphere, ultra detailed --ar 16:9 --v 7" \
  --aspect-ratio 16:9
```

> 🎨 Done! Which one should I upscale (U1-U4)? Or would you like a variation (V1-V4)?

---

**User:** "Use Flux to generate a perfume product poster, white background"

```bash
# Choose flux-pro for photorealistic product shots
node {baseDir}/tools/generate.js \
  --model flux-pro \
  --prompt "a luxury perfume bottle on a clean white background, professional product photography, soft shadows, 8k, highly detailed" \
  --aspect-ratio 3:4
```

---

**User:** "Show me a quick draft"

```bash
# flux-schnell for instant previews
node {baseDir}/tools/generate.js \
  --model flux-schnell \
  --prompt "..." \
  --aspect-ratio 1:1
```

---

**User:** "Make me an App icon, flat style, blue theme"

```bash
# recraft for vector/icon style
node {baseDir}/tools/generate.js \
  --model recraft \
  --prompt "a minimal flat design app icon, blue color scheme, simple geometric shapes, vector style, white background"
```

---

## Setup

**Zero API keys needed!** All requests go through a hosted proxy that handles authentication server-side.

The skill works out of the box — just install and use.

### Advanced: Custom proxy or token

If you want to use your own proxy or a persistent token, set these environment variables in the skill's `env` entry:

```json
{
  "skills": {
    "entries": {
      "videoagent-image-studio": {
        "enabled": true,
        "env": {
          "IMAGE_STUDIO_PROXY_URL": "https://your-proxy.vercel.app",
          "IMAGE_STUDIO_TOKEN": "your_token_here"
        }
      }
    }
  }
}
```

| Variable | Required | Description |
|---|---|---|
| `IMAGE_STUDIO_PROXY_URL` | No | Custom proxy base URL (default: `https://image-gen-proxy.vercel.app`) |
| `IMAGE_STUDIO_TOKEN` | No | Persistent token (auto-obtained if not set, 100 free uses per token) |

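The defaults in the table can be summarized in a few lines. This is a sketch of how a caller could resolve the two variables, not a description of what `generate.js` does internally; `resolveProxyConfig` is a hypothetical name, and stripping trailing slashes from the URL is an added convenience.

```javascript
// Default proxy base URL, as listed in the table above.
const DEFAULT_PROXY_URL = "https://image-gen-proxy.vercel.app";

// Hypothetical helper: resolves proxy settings from environment variables.
function resolveProxyConfig(env = process.env) {
  const rawUrl = env.IMAGE_STUDIO_PROXY_URL || DEFAULT_PROXY_URL;
  return {
    // Trailing slashes are removed so paths can be appended safely.
    proxyUrl: rawUrl.replace(/\/+$/, ""),
    // null means no persistent token was configured; per the table,
    // one is then obtained automatically.
    token: env.IMAGE_STUDIO_TOKEN || null,
  };
}
```
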
To deploy your own proxy, see the [videoagent-audio-studio proxy](../videoagent-audio-studio/proxy/) as a reference implementation. You'll need `FAL_KEY` and `LEGNEXT_KEY` as Vercel environment variables.

---

## Changelog

### v2.0.0
- **Simplified async**: The script now blocks until Midjourney completes. No more `--async` / `--poll` flags needed in SKILL.md instructions.
- **Unified output format**: All models return the same `{ success, imageUrl, images }` shape.
- **Reference images for Nano Banana**: Pass `--reference-images "url1,url2"` for character/style consistency across generations.

### v1.3.0
- Added non-blocking async mode for Midjourney (`--async` + `--poll`).

### v1.2.0
- Midjourney turbo mode enabled by default (~10-20s).

### v1.1.0
- Switched Midjourney provider from TTAPI to Legnext.ai for better stability.

### v1.0.0
- Initial release with Midjourney, Flux, SDXL, Nano Banana, Ideogram, Recraft.

Overview

This skill gives one-command access to eight image generation models so you can create photos, illustrations, icons, logos, and more without managing API keys. It picks the model that best fits the request, or lets you choose a specific engine for style, speed, or format. Results are returned as image URLs in a unified response shape for easy delivery to users.

How this skill works

The skill routes requests through a hosted proxy that handles authentication and model orchestration, so the caller needs no API keys. You provide a prompt (optionally enhanced with style, lighting, and quality descriptors) and the selected model generates the images. The script waits for completion and returns imageUrl (a single URL) and images (an array of URLs). Midjourney-specific flows (grids, upscales, variations, rerolls) are supported through simple action commands and use the same unified output format.

When to use it

  • User asks to generate any image, photo, illustration, icon, logo, or artwork
  • You want quick drafts and fast iteration without key management
  • You need model-specific strengths (cinematic art, photorealism, vector icons, anime)
  • You want consistent outputs with optional reference images or reproducible seeds

Best practices

  • Enhance prompts with model-appropriate descriptors (lighting, detail, camera terms) before sending
  • Pick model by intent: midjourney for cinematic/artistic, flux-pro for photorealistic product shots, recraft for vector/icon work
  • Use aspect-ratio and num-images to control framing and batch size; Midjourney always returns a 4-image grid
  • For Midjourney, offer users upscale/variation/reroll options after the initial grid
  • Provide a seed when reproducibility matters, and a negative prompt to steer away from unwanted elements (negative prompts are not supported by Midjourney)

Example use cases

  • Generate a cinematic snow leopard scene with Midjourney-style descriptors and return four options for user selection
  • Create a photorealistic perfume product shot with flux-pro for marketing assets
  • Produce a quick app icon in flat, vector style using recraft and return a single image URL
  • Make a poster with text using ideogram and explicit font/layout instructions
  • Iterate fast on concepts using flux-schnell to get near-instant previews

FAQ

Do I need to supply API keys or tokens?

No. The hosted proxy handles authentication server-side, so no API keys are needed. If IMAGE_STUDIO_TOKEN is not set, a token is obtained automatically, with 100 free uses per token.

Can I use my own proxy or persistent token?

Yes. Set IMAGE_STUDIO_PROXY_URL and IMAGE_STUDIO_TOKEN as environment variables to route requests through your own infrastructure with a persistent token.