home / skills / questnova502 / claude-skills-sync / baoyu-danger-gemini-web

baoyu-danger-gemini-web skill

/skills/baoyu-danger-gemini-web

This skill generates images or text using Gemini Web from prompts, enabling rapid visuals and content ideas for covers, articles, and social media.

npx playbooks add skill questnova502/claude-skills-sync --skill baoyu-danger-gemini-web

Review the files below or copy the command above to add this skill to your agents.

Files (25)
SKILL.md
8.8 KB
---
name: baoyu-danger-gemini-web
description: Image generation skill using Gemini Web. Generates images from text prompts via Google Gemini. Also supports text generation. Use as the image generation backend for other skills like cover-image, xhs-images, article-illustrator.
---

# Gemini Web Client

Supports:
- Text generation
- Image generation (download + save)
- Reference images for vision input (attach local images)
- Multi-turn conversations via persisted `--sessionId`

## Script Directory

**Important**: All scripts are located in the `scripts/` subdirectory of this skill.

**Agent Execution Instructions**:
1. Determine this SKILL.md file's directory path as `SKILL_DIR`
2. Script path = `${SKILL_DIR}/scripts/<script-name>.ts`
3. Replace all `${SKILL_DIR}` in this document with the actual path

**Script Reference**:
| Script | Purpose |
|--------|---------|
| `scripts/main.ts` | CLI entry point for text/image generation |
| `scripts/gemini-webapi/*` | TypeScript port of `gemini_webapi` (GeminiClient, types, utils) |

## ⚠️ Disclaimer (REQUIRED)

**Before using this skill**, the consent check MUST be performed.

### Consent Check Flow

**Step 1**: Check consent file

```bash
# macOS
cat ~/Library/Application\ Support/baoyu-skills/gemini-web/consent.json 2>/dev/null

# Linux
cat ~/.local/share/baoyu-skills/gemini-web/consent.json 2>/dev/null

# Windows (PowerShell)
Get-Content "$env:APPDATA\baoyu-skills\gemini-web\consent.json" 2>$null
```

**Step 2**: If consent exists and `accepted: true` with matching `disclaimerVersion: "1.0"`:

Print warning and proceed:
```
⚠️  Warning: Using reverse-engineered Gemini Web API (not official). Accepted on: <acceptedAt date>
```

**Step 3**: If consent file doesn't exist or `disclaimerVersion` mismatch:

Display disclaimer and ask user:

```
⚠️  DISCLAIMER

This tool uses a reverse-engineered Gemini Web API, NOT an official Google API.

Risks:
- May break without notice if Google changes their API
- No official support or guarantees
- Use at your own risk

Do you accept these terms and wish to continue?
```

Use `AskUserQuestion` tool with options:
- **Yes, I accept** - Continue and save consent
- **No, I decline** - Exit immediately

**Step 4**: On acceptance, create consent file:

```bash
# macOS
mkdir -p ~/Library/Application\ Support/baoyu-skills/gemini-web
cat > ~/Library/Application\ Support/baoyu-skills/gemini-web/consent.json << 'EOF'
{
  "version": 1,
  "accepted": true,
  "acceptedAt": "<ISO timestamp>",
  "disclaimerVersion": "1.0"
}
EOF

# Linux
mkdir -p ~/.local/share/baoyu-skills/gemini-web
cat > ~/.local/share/baoyu-skills/gemini-web/consent.json << 'EOF'
{
  "version": 1,
  "accepted": true,
  "acceptedAt": "<ISO timestamp>",
  "disclaimerVersion": "1.0"
}
EOF
```

**Step 5**: On decline, output message and stop:
```
User declined the disclaimer. Exiting.
```

---

## Quick start

```bash
npx -y bun ${SKILL_DIR}/scripts/main.ts "Hello, Gemini"
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Explain quantum computing"
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cute cat" --image cat.png
npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png

# Multi-turn conversation (agent generates unique sessionId)
npx -y bun ${SKILL_DIR}/scripts/main.ts "Remember this: 42" --sessionId my-unique-id-123
npx -y bun ${SKILL_DIR}/scripts/main.ts "What number?" --sessionId my-unique-id-123
```

## Commands

### Text generation

```bash
# Simple prompt (positional)
npx -y bun ${SKILL_DIR}/scripts/main.ts "Your prompt here"

# Explicit prompt flag
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Your prompt here"
npx -y bun ${SKILL_DIR}/scripts/main.ts -p "Your prompt here"

# With model selection
npx -y bun ${SKILL_DIR}/scripts/main.ts -p "Hello" -m gemini-2.5-pro

# Pipe from stdin
echo "Summarize this" | npx -y bun ${SKILL_DIR}/scripts/main.ts
```

### Image generation

```bash
# Generate image with default path (./generated.png)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A sunset over mountains" --image

# Generate image with custom path
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cute robot" --image robot.png

# Shorthand
npx -y bun ${SKILL_DIR}/scripts/main.ts "A dragon" --image=dragon.png
```

### Vision input (reference images)

```bash
# Text + image -> text
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Describe this image" --reference a.png

# Text + image -> image
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Generate a variation" --reference a.png --image out.png
```

### Output formats

```bash
# Plain text (default)
npx -y bun ${SKILL_DIR}/scripts/main.ts "Hello"

# JSON output
npx -y bun ${SKILL_DIR}/scripts/main.ts "Hello" --json
```

## Options

| Option | Description |
|--------|-------------|
| `--prompt <text>`, `-p` | Prompt text |
| `--promptfiles <files...>` | Read prompt from files (concatenated in order) |
| `--model <id>`, `-m` | Model: gemini-3-pro (default), gemini-2.5-pro, gemini-2.5-flash |
| `--image [path]` | Generate image, save to path (default: generated.png) |
| `--reference <files...>`, `--ref <files...>` | Reference images for vision input |
| `--sessionId <id>` | Session ID for multi-turn conversation (agent generates unique ID) |
| `--list-sessions` | List saved sessions (max 100, sorted by update time) |
| `--json` | Output as JSON |
| `--login` | Refresh cookies only, then exit |
| `--cookie-path <path>` | Custom cookie file path |
| `--profile-dir <path>` | Chrome profile directory |
| `--help`, `-h` | Show help |

CLI note: `scripts/main.ts` supports text generation, image generation, reference images (`--reference/--ref`), and multi-turn conversations via `--sessionId`.

## Models

- `gemini-3-pro` - Default, latest model
- `gemini-2.5-pro` - Previous generation pro
- `gemini-2.5-flash` - Fast, lightweight

## Authentication

First run opens a browser to authenticate with Google. Cookies are cached for subsequent runs.

**Supported browsers** (auto-detected in order):
- Google Chrome
- Google Chrome Canary / Beta
- Chromium
- Microsoft Edge

Override with `GEMINI_WEB_CHROME_PATH` environment variable if needed.

```bash
# Force cookie refresh
npx -y bun ${SKILL_DIR}/scripts/main.ts --login
```

## Environment variables

| Variable | Description |
|----------|-------------|
| `GEMINI_WEB_DATA_DIR` | Data directory |
| `GEMINI_WEB_COOKIE_PATH` | Cookie file path |
| `GEMINI_WEB_CHROME_PROFILE_DIR` | Chrome profile directory |
| `GEMINI_WEB_CHROME_PATH` | Chrome executable path |

## Proxy Configuration

If you need a proxy to access Google services (e.g., in China), set `HTTP_PROXY` and `HTTPS_PROXY` environment variables before running:

```bash
# Example with local proxy
HTTP_PROXY=http://127.0.0.1:7890 HTTPS_PROXY=http://127.0.0.1:7890 npx -y bun ${SKILL_DIR}/scripts/main.ts "Hello"

# Image generation with proxy
HTTP_PROXY=http://127.0.0.1:7890 HTTPS_PROXY=http://127.0.0.1:7890 npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png

# Cookie refresh with proxy
HTTP_PROXY=http://127.0.0.1:7890 HTTPS_PROXY=http://127.0.0.1:7890 npx -y bun ${SKILL_DIR}/scripts/main.ts --login
```

**Note**: Environment variables must be set inline with the command. Shell profile settings (e.g., `.bashrc`) may not be inherited by subprocesses.

## Examples

### Generate text response
```bash
npx -y bun ${SKILL_DIR}/scripts/main.ts "What is the capital of France?"
```

### Generate image
```bash
npx -y bun ${SKILL_DIR}/scripts/main.ts "A photorealistic image of a golden retriever puppy" --image puppy.png
```

### Get JSON output for parsing
```bash
npx -y bun ${SKILL_DIR}/scripts/main.ts "Hello" --json | jq '.text'
```

### Generate image from prompt files
```bash
# Concatenate system.md + content.md as prompt
npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image output.png
```

### Multi-turn conversation
```bash
# Start a session with unique ID (agent generates this)
npx -y bun ${SKILL_DIR}/scripts/main.ts "You are a helpful math tutor." --sessionId task-abc123

# Continue the conversation (remembers context)
npx -y bun ${SKILL_DIR}/scripts/main.ts "What is 2+2?" --sessionId task-abc123
npx -y bun ${SKILL_DIR}/scripts/main.ts "Now multiply that by 10" --sessionId task-abc123

# List recent sessions (max 100, sorted by update time)
npx -y bun ${SKILL_DIR}/scripts/main.ts --list-sessions
```

Session files are stored in `~/Library/Application Support/baoyu-skills/gemini-web/sessions/<id>.json` and contain:
- `id`: Session ID
- `metadata`: Gemini chat metadata for continuation
- `messages`: Array of `{role, content, timestamp, error?}`
- `createdAt`, `updatedAt`: Timestamps

## Extension Support

Custom configurations via EXTEND.md.

**Check paths** (priority order):
1. `.baoyu-skills/baoyu-danger-gemini-web/EXTEND.md` (project)
2. `~/.baoyu-skills/baoyu-danger-gemini-web/EXTEND.md` (user)

If found, load before workflow. Extension content overrides defaults.

Overview

This skill connects to Google Gemini Web to generate images and text from prompts. It provides a CLI for single-shot and multi-turn conversations, supports reference images for vision input, and saves generated images to disk. Use it as an image-generation backend for other tools or to run one-off content and illustration jobs locally.

How this skill works

The skill runs a TypeScript CLI that invokes a reverse-engineered Gemini Web client to produce text or download image outputs. It supports model selection, session persistence for multi-turn chats, reference images for vision-aware prompts, and local cookie-based authentication via an OAuth browser flow. A mandatory consent check warns users about the unofficial API and writes a local consent file before any network calls.

When to use it

  • Generate photorealistic or stylized images from text prompts and save them locally.
  • Create image variations or vision-aware outputs using reference images.
  • Run multi-turn conversations with persistent context across runs using sessionId.
  • Integrate as a backend for other skills (cover-image, article-illustrator, xhs-images).
  • Quickly prototype prompts and export JSON responses for programmatic parsing.

Best practices

  • Run the consent flow and verify the consent file is saved before automated runs.
  • Keep cookies and profile paths under version-controlled, secure locations; use GEMINI_WEB_COOKIE_PATH or GEMINI_WEB_CHROME_PATH when needed.
  • Use --promptfiles to compose complex system + content prompts for predictable behavior.
  • Choose model by task: gemini-3-pro for quality, gemini-2.5-flash for speed.
  • Set HTTP_PROXY/HTTPS_PROXY inline when behind a network proxy to ensure authentication and image downloads succeed.

Example use cases

  • Generate cover images for articles: concatenate system + content prompt files and output a PNG.
  • Produce social media illustration batches by scripting prompts and saving each image to a unique path.
  • Create image variations from a reference photo using --reference and --image out.png.
  • Run an interactive tutoring session that preserves context across commands with --sessionId.
  • Automate JSON output parsing by calling the CLI with --json and piping to jq or another parser.

FAQ

Is this an official Google API?

No. This uses a reverse-engineered Gemini Web interface. A consent check warns you and saves acceptance locally before use.

How do I authenticate?

First run opens your browser to sign in to Google; cookies are cached for subsequent runs. Use --login to refresh cookies manually.

Can I run behind a proxy?

Yes. Set HTTP_PROXY and HTTPS_PROXY inline with the CLI command so the subprocess inherits the proxy settings.