home / skills / openclaw / skills / chatgpt-image-gen

chatgpt-image-gen skill

/skills/sonim1/chatgpt-image-gen

This skill helps you generate images with ChatGPT's DALL-E via OpenClaw by automating your logged-in browser session.

npx playbooks add skill openclaw/skills --skill chatgpt-image-gen

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
5.3 KB
---
name: chatgpt-image-gen
description: Generate images using ChatGPT/DALL-E through OpenClaw browser automation. Use when the user wants to create images via ChatGPT's web interface with their logged-in session. Requires Chrome extension setup for first use. Use for image generation tasks where ChatGPT's native DALL-E image creation is preferred.
---

# ChatGPT Image Generation

Generate images using ChatGPT's DALL-E integration through OpenClaw browser automation.

## Prerequisites

1. **Chrome Extension Installation:**
   - Install OpenClaw Browser Relay from Chrome Web Store
   - Or use the extension that comes with OpenClaw

2. **Initial Setup (one-time):**
   - Open ChatGPT (chatgpt.com) in Chrome/Brave
   - Login to your ChatGPT account (Pro subscription recommended for best quality)
   - Click the OpenClaw extension icon on the ChatGPT tab to attach it
   - The badge should show "ON" when attached

## How It Works

This skill uses OpenClaw's built-in `browser` tool with Chrome extension relay (`profile="chrome"`) to control an already-logged-in ChatGPT tab. This bypasses ChatGPT's bot detection because it uses your real browser session.

## CLI Command Reference

**IMPORTANT:** There is NO `browser act` subcommand. Each action is a direct subcommand.

| Action | CLI Syntax |
|--------|-----------|
| List tabs | `openclaw browser tabs` |
| Snapshot | `openclaw browser snapshot --target-id <ID>` |
| Click | `openclaw browser click <ref> --target-id <ID>` |
| Type | `openclaw browser type <ref> "<text>" --target-id <ID>` |
| Press key | `openclaw browser press <key> --target-id <ID>` |
| Navigate | `openclaw browser navigate <url> --target-id <ID>` |
| Screenshot | `openclaw browser screenshot --target-id <ID>` |

- `<ref>` and `<text>` are **positional arguments** (no `--ref` flag)
- `--target-id` accepts a full ID or unique prefix (e.g. `77CB` instead of `77CB8A574E8A44861C5FE49388EF6ABC`)
- `--profile` is a parent option on `openclaw browser`, not on subcommands

## Workflow

### 1. List Attached Tabs

```bash
openclaw browser tabs
```

Look for a tab with URL containing `chatgpt.com`. Note the `targetId`.

### 2. Get Snapshot (find element refs)

```bash
openclaw browser snapshot --target-id <ID> --format ai --efficient
```

This outputs a tree with refs like `e23`, `e589`, etc. Always run snapshot before interacting.

### 3. Click an Element

```bash
openclaw browser click e23 --target-id <ID>
```

### 4. Type Text

```bash
openclaw browser type e589 "Generate an image: a futuristic city at sunset" --target-id <ID>
```

Add `--submit` to press Enter after typing:
```bash
openclaw browser type e589 "Generate an image: a cat riding a skateboard" --target-id <ID> --submit
```

### 5. Press a Key

```bash
openclaw browser press Enter --target-id <ID>
```

### 6. Wait for Generation

Use `sleep` to wait for DALL-E to generate (30-60 seconds):
```bash
sleep 45
```

Then take a new snapshot to check the result:
```bash
openclaw browser snapshot --target-id <ID> --format ai --efficient
```

## Complete Example Session

```bash
# 1. List tabs, find the ChatGPT tab targetId
openclaw browser tabs

# 2. Take snapshot to find element refs
openclaw browser snapshot --target-id 4535E --format ai --efficient

# 3. Click input field (check ref from snapshot, usually labeled "Ask anything")
openclaw browser click e589 --target-id 4535E

# 4. Type prompt and submit
openclaw browser type e589 "Generate an image: a futuristic city at sunset" --target-id 4535E --submit

# 5. Wait for DALL-E generation
sleep 45

# 6. Take new snapshot to see result and find download button
openclaw browser snapshot --target-id 4535E --format ai --efficient

# 7. Click download button (ref from new snapshot)
openclaw browser click e745 --target-id 4535E
```

## Troubleshooting

**"Can't reach the OpenClaw browser control service":**
- Gateway restart needed: `openclaw gateway restart`
- Or restart via OpenClaw menu bar app

**"Chrome extension relay is running, but no tab is connected":**
- ChatGPT tab is not attached
- Go to the ChatGPT tab and click the OpenClaw extension icon

**"ref is required" error:**
- You need to specify which element to interact with
- Run `snapshot` first to get the refs

**Command not found / Unknown command:**
- Do NOT use `browser act` — use direct subcommands: `browser click`, `browser type`, `browser press`
- ref is a positional argument: `browser click e23`, NOT `browser click --ref e23`

**Image generation timeout:**
- DALL-E generation takes 30-60 seconds
- Use `sleep 45` then re-snapshot to check

**Bot detection / Login issues:**
- The tab must be already logged in via your real browser
- Use the Chrome extension relay (attached tab), not the isolated browser

## Tips

1. **Keep ChatGPT tab open:** Once attached, keep the tab open for future use
2. **Check targetId:** The targetId changes if you close/reopen the tab — always run `tabs` first
3. **Use `--submit`:** The `type` command supports `--submit` to press Enter automatically
4. **Unique prefix:** `--target-id` accepts a unique prefix, no need for the full 32-char ID
5. **Pro subscription:** ChatGPT Pro gives better image quality and faster generation

## Security Note

This approach uses your actual Chrome browser session, so it inherits all your ChatGPT permissions and settings. No credentials are stored or transmitted - everything happens in your existing browser session.

Overview

This skill automates image generation via ChatGPT's DALL·E integration by controlling an already-logged-in Chrome/Brave tab through OpenClaw browser relay. It is intended for users who prefer ChatGPT's native image creation workflow and want to drive it programmatically using their own browser session. Initial setup requires a one-time Chrome extension attach step.

How this skill works

The skill uses OpenClaw's browser tool with the Chrome extension relay (profile="chrome") to attach to a live ChatGPT tab. It lists tabs, snapshots the page to discover element refs, and issues direct subcommands (click, type, press, screenshot) to enter prompts and trigger DALL·E generation. All actions run in your real browser session, so ChatGPT sees a normal logged-in user rather than an isolated bot.

When to use it

  • You want images created by ChatGPT/DALL·E using your own ChatGPT session.
  • You need to automate prompt submission, waiting, and downloading generated images from the web UI.
  • You prefer using the ChatGPT web interface workflow rather than the API for creative control.
  • You have the OpenClaw Chrome extension installed and can attach a browser tab.
  • You want to bypass bot-detection issues by using a real browser login session.

Best practices

  • Install and attach the OpenClaw Browser Relay extension before first use and confirm the badge shows ON.
  • Always run openclaw browser tabs and snapshot before interacting to get the current targetId and element refs.
  • Use the snapshot output to identify element refs and avoid guessing refs across sessions.
  • Include --submit with the type command to automatically press Enter when appropriate.
  • Allow 30–60 seconds for DALL·E generation and use sleep then re-snapshot to detect results.

Example use cases

  • Automate bulk image generation by programmatically typing prompts and downloading results from the ChatGPT UI.
  • Reproduce a designer's prompt workflow on a logged-in ChatGPT Pro account for higher-quality images.
  • Integrate with a local script pipeline that opens a ChatGPT tab, attaches OpenClaw, submits prompts, waits, and saves outputs.
  • Troubleshoot image generation UI flows by capturing snapshots and clicks to reproduce issues on the live site.

FAQ

Do I need a ChatGPT account?

Yes. The Chrome tab must be logged into your ChatGPT account; a Pro subscription is recommended for best quality.

What if I see a "ref is required" error?

Run openclaw browser snapshot to obtain element refs before using click/type commands; refs are positional tokens from the snapshot output.

How long does image generation take?

DALL·E generation typically takes 30–60 seconds. Use sleep (e.g., sleep 45) then take another snapshot to check for the result.