media-generation skill

This skill generates images and videos from text prompts, and edits existing images, using StableStudio's x402-protected models.

npx playbooks add skill merit-systems/x402scan-skills --skill media-generation

SKILL.md

---
name: media-generation
description: |
  Generate images and videos using x402-protected AI models at StableStudio.

  USE FOR:
  - Generating images from text prompts
  - Generating videos from text or images
  - Editing images with AI
  - Creating visual content

  TRIGGERS:
  - "generate image", "create image", "make a picture"
  - "generate video", "create video", "make a video"
  - "edit image", "modify image"
  - "stablestudio", "nano-banana", "sora", "veo"
mcp:
  - x402
---

# Media Generation with StableStudio

Generate images and videos via x402 payments at `https://stablestudio.io`.

## Quick Reference

| Task | Endpoint | Cost | Time |
|------|----------|------|------|
| Image (default) | `/api/x402/nano-banana-pro/generate` | $0.13-0.24 | ~10s |
| Image (budget) | `/api/x402/nano-banana/generate` | $0.039 | ~5s |
| Video (default) | `/api/x402/veo-3.1/generate` | $1.60-3.20 | 1-2min |
| Video (budget) | `/api/x402/wan-2.5/t2v` | $0.34-1.02 | 2-5min |
| Image edit | `/api/x402/nano-banana-pro/edit` | $0.13-0.24 | ~10s |
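
The cost ranges in the table can be used to bound a batch before any payment is made. The following is a minimal sketch, assuming the listed prices are per request and still current; `PRICING` and `estimate_batch_cost` are hypothetical names used for illustration and are not part of StableStudio's API.

```python
# Cost ranges (USD) copied from the Quick Reference table.
# Assumption: each price applies per request. Names are illustrative only.
PRICING = {
    "/api/x402/nano-banana-pro/generate": (0.13, 0.24),
    "/api/x402/nano-banana/generate": (0.039, 0.039),
    "/api/x402/veo-3.1/generate": (1.60, 3.20),
    "/api/x402/wan-2.5/t2v": (0.34, 1.02),
    "/api/x402/nano-banana-pro/edit": (0.13, 0.24),
}


def estimate_batch_cost(endpoint: str, count: int) -> tuple[float, float]:
    """Return the (min, max) total cost in USD for `count` requests."""
    if count < 0:
        raise ValueError("count must be non-negative")
    low, high = PRICING[endpoint]
    # Round to tenths of a cent to hide floating-point noise.
    return round(low * count, 3), round(high * count, 3)


low, high = estimate_batch_cost("/api/x402/nano-banana-pro/generate", 10)
print(f"10 images: ${low:.2f} to ${high:.2f}")
```

Where a request lands inside the range presumably depends on options such as `imageSize` or `durationSeconds`; the table does not say how, so the sketch only reports the bounds.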

## Image Generation

**Recommended: nano-banana-pro** (best quality/cost)

```mcp
x402.fetch(
  url="https://stablestudio.io/api/x402/nano-banana-pro/generate",
  method="POST",
  body={
    "prompt": "a cat wearing a space helmet, photorealistic",
    "aspectRatio": "16:9",
    "imageSize": "2K"
  }
)
```

**Options:**
- `aspectRatio`: "1:1", "16:9", "9:16"
- `imageSize`: "1K", "2K", "4K" (nano-banana-pro only)
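
Because each request is paid, it can help to validate options locally before calling the endpoint. This is a sketch only: `build_image_body` is a hypothetical helper, and it assumes that omitting an option lets the server apply its own default, which the skill file does not confirm.

```python
# Allowed values copied from the options list above.
ASPECT_RATIOS = {"1:1", "16:9", "9:16"}
IMAGE_SIZES = {"1K", "2K", "4K"}  # documented for nano-banana-pro only


def build_image_body(prompt, aspect_ratio=None, image_size=None):
    """Build a JSON-serializable body for an image generate request.

    Options left as None are omitted (assumed to fall back to server defaults).
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    body = {"prompt": prompt}
    if aspect_ratio is not None:
        if aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspectRatio: {aspect_ratio!r}")
        body["aspectRatio"] = aspect_ratio
    if image_size is not None:
        if image_size not in IMAGE_SIZES:
            raise ValueError(f"unsupported imageSize: {image_size!r}")
        body["imageSize"] = image_size
    return body


body = build_image_body(
    "a cat wearing a space helmet, photorealistic",
    aspect_ratio="16:9",
    image_size="2K",
)
print(body)
```

The resulting dictionary matches the `body` argument in the `x402.fetch` example above.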

## Video Generation

**Recommended: veo-3.1** (best quality/cost)

```mcp
x402.fetch(
  url="https://stablestudio.io/api/x402/veo-3.1/generate",
  method="POST",
  body={
    "prompt": "a timelapse of clouds moving over mountains",
    "durationSeconds": "6",
    "aspectRatio": "16:9"
  }
)
```

**Options:**
- `durationSeconds`: "4", "6", "8"
- `aspectRatio`: "16:9", "9:16"
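
Note that the example passes `durationSeconds` as a quoted string rather than a number. The sketch below, using a hypothetical `build_video_body` helper, accepts either an integer or a string and normalizes it to the string form shown above; whether the API also accepts a bare number is not stated in this skill, so that is treated as unknown.

```python
# Allowed values copied from the video options list above.
VIDEO_DURATIONS = {"4", "6", "8"}
VIDEO_ASPECT_RATIOS = {"16:9", "9:16"}


def build_video_body(prompt, duration_seconds=None, aspect_ratio=None):
    """Build a JSON-serializable body for a veo-3.1 generate request."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    body = {"prompt": prompt}
    if duration_seconds is not None:
        # Normalize 6 -> "6" to match the string form used in the example.
        duration = str(duration_seconds)
        if duration not in VIDEO_DURATIONS:
            raise ValueError(f"unsupported durationSeconds: {duration!r}")
        body["durationSeconds"] = duration
    if aspect_ratio is not None:
        if aspect_ratio not in VIDEO_ASPECT_RATIOS:
            raise ValueError(f"unsupported aspectRatio: {aspect_ratio!r}")
        body["aspectRatio"] = aspect_ratio
    return body


print(build_video_body("a timelapse of clouds moving over mountains", 6, "16:9"))
```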

## Job Polling

Generation returns a `jobId`. Poll until complete:

```mcp
x402.fetch_with_auth(
  url="https://stablestudio.io/api/x402/jobs/{jobId}"
)
```

Poll images every 3s, videos every 10s. Result contains `imageUrl` or `videoUrl`.
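
The polling loop can be sketched as follows. This is an illustration under stated assumptions: `fetch_job` is a hypothetical callable that wraps the `x402.fetch_with_auth` call and returns the parsed JSON; the response is assumed to carry a `status` field that becomes `"complete"`; and the `"pending"`, `"failed"`, and `"error"` values are guesses, not documented statuses.

```python
import time


def poll_job(fetch_job, job_id, interval_s, timeout_s=600.0, sleep=time.sleep):
    """Call fetch_job(job_id) until the job completes, fails, or times out.

    Returns the imageUrl or videoUrl from the completed job.
    """
    waited = 0.0
    while True:
        job = fetch_job(job_id)
        status = job.get("status")
        if status == "complete":  # assumed completion value
            return job.get("imageUrl") or job.get("videoUrl")
        if status in ("failed", "error"):  # assumed failure values
            raise RuntimeError(f"job {job_id} ended with status {status!r}")
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} not complete after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Demo with a fake fetcher that completes on the third poll (no network).
responses = iter([
    {"status": "pending"},
    {"status": "pending"},
    {"status": "complete", "imageUrl": "https://example.com/cat.png"},
])
url = poll_job(lambda _id: next(responses), "job-123", interval_s=3,
               sleep=lambda seconds: None)
print(url)
```

Pass `interval_s=3` for image jobs and `interval_s=10` for video jobs to match the intervals above. Injecting `fetch_job` and `sleep` keeps the loop testable without spending on real requests.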

## Image Editing

Requires uploading the source image first. See [rules/uploads.md](rules/uploads.md).

```mcp
x402.fetch(
  url="https://stablestudio.io/api/x402/nano-banana-pro/edit",
  method="POST",
  body={
    "prompt": "change the background to a beach sunset",
    "images": ["https://...blob-url..."]
  }
)
```

## Model Comparison

| Model | Type | Best For |
|-------|------|----------|
| nano-banana-pro | Image | General purpose, up to 4K |
| nano-banana | Image | Quick drafts, budget |
| gpt-image-1.5 | Image | Fast, variable quality |
| veo-3.1 | Video | High quality, 1080p |
| wan-2.5 | Video | Budget, text or image input |
| sora-2 | Video | Premium quality |

Overview

This skill generates high-quality images and videos using x402-protected AI models hosted at StableStudio. It supports text-to-image, text-to-video, image-to-video, and image editing workflows with cost/time tradeoffs between premium and budget models. Use it to produce final assets or fast drafts depending on project needs.

How this skill works

You send a POST request to a StableStudio x402 endpoint with a prompt and optional parameters (aspect ratio, size, duration). The API returns a jobId; poll the jobs endpoint until the job completes to retrieve imageUrl or videoUrl. For edits, upload the source image first, then pass its URL in the request's images array together with the edit prompt; the edit endpoint produces the modified asset.

When to use it

- Generate photoreal or stylized images from text prompts
- Produce short video clips from text or image inputs
- Edit or replace parts of an existing image using AI
- Create fast drafts with budget models before finalizing with premium models
- Automate visual content generation in pipelines or apps

Best practices

- Choose nano-banana-pro for best image quality and 2K–4K outputs; use nano-banana for fast budget drafts
- For videos, use veo-3.1 for higher quality and wan-2.5 for cost-sensitive use cases
- Poll the jobs endpoint every ~3s for images and every ~10s for videos to avoid wasted requests
- Specify aspectRatio and an explicit imageSize or durationSeconds to control outputs and costs
- Write clear, specific prompts, and for edits include the uploaded source image URLs

Example use cases

- Marketing teams generating multiple ad images at different aspect ratios and sizes
- Concept artists producing quick visual drafts with nano-banana, then refining with nano-banana-pro
- Social creators generating short motion clips (4–8s) using veo-3.1 for reels or stories
- Product teams automating image edits (background swaps, object removal) by uploading source images and sending edit prompts
- App developers integrating on-demand image/video generation into chatbots or content pipelines

FAQ

How do I get the final image or video URL?

The generate endpoint returns a jobId. Poll https://stablestudio.io/api/x402/jobs/{jobId} until status is complete; the response includes imageUrl or videoUrl.

Which model should I pick for cost vs quality?

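Use nano-banana-pro and veo-3.1 for best quality. Choose nano-banana or wan-2.5 for cheaper drafts: nano-banana is also faster (~5s versus ~10s), but wan-2.5 is slower than veo-3.1 (2-5min versus 1-2min), so pick it for cost, not speed. Adjust duration and size to control cost.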