
baoyu-danger-gemini-web skill


This skill generates text and images using the Gemini Web API, with support for vision input (reference images) and multi-turn conversations.

npx playbooks add skill jimliu/baoyu-skills --skill baoyu-danger-gemini-web

Review the files below or copy the command above to add this skill to your agents.

Files (25)
SKILL.md
---
name: baoyu-danger-gemini-web
description: Generates images and text via reverse-engineered Gemini Web API. Supports text generation, image generation from prompts, reference images for vision input, and multi-turn conversations. Use when other skills need image generation backend, or when user requests "generate image with Gemini", "Gemini text generation", or needs vision-capable AI generation.
---

# Gemini Web Client

Text/image generation via Gemini Web API. Supports reference images and multi-turn conversations.

## Script Directory

**Important**: All scripts are located in the `scripts/` subdirectory of this skill.

**Agent Execution Instructions**:
1. Determine this SKILL.md file's directory path as `SKILL_DIR`
2. Script path = `${SKILL_DIR}/scripts/<script-name>.ts`
3. Replace all `${SKILL_DIR}` in this document with the actual path
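
For example, a minimal resolution sketch in Bash; the install path shown is hypothetical, so substitute wherever this skill actually lives:

```bash
# Hypothetical install location; replace with the real directory containing SKILL.md.
SKILL_DIR="$HOME/.agents/skills/baoyu-danger-gemini-web"
npx -y bun "${SKILL_DIR}/scripts/main.ts" "Hello"
```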

**Script Reference**:
| Script | Purpose |
|--------|---------|
| `scripts/main.ts` | CLI entry point for text/image generation |
| `scripts/gemini-webapi/*` | TypeScript port of `gemini_webapi` (GeminiClient, types, utils) |

## Consent Check (REQUIRED)

Before first use, verify user consent for reverse-engineered API usage.

**Consent file locations**:
- macOS: `~/Library/Application Support/baoyu-skills/gemini-web/consent.json`
- Linux: `~/.local/share/baoyu-skills/gemini-web/consent.json`
- Windows: `%APPDATA%\baoyu-skills\gemini-web\consent.json`

**Flow**:
1. Check if consent file exists with `accepted: true` and `disclaimerVersion: "1.0"`
2. If valid consent exists → print warning with `acceptedAt` date, proceed
3. If no consent → show disclaimer, ask user via `AskUserQuestion`:
   - "Yes, I accept" → create consent file with ISO timestamp, proceed
   - "No, I decline" → output decline message, stop
4. Consent file format: `{"version":1,"accepted":true,"acceptedAt":"<ISO>","disclaimerVersion":"1.0"}`
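
A minimal sketch of the consent check in Bash, assuming the macOS path and that `jq` is available (adjust the path per platform):

```bash
# Sketch of the consent check (macOS path); jq is assumed to be installed.
CONSENT="$HOME/Library/Application Support/baoyu-skills/gemini-web/consent.json"
if [ -f "$CONSENT" ] && \
   [ "$(jq -r '.accepted == true and .disclaimerVersion == "1.0"' "$CONSENT")" = "true" ]; then
  echo "Consent accepted at $(jq -r '.acceptedAt' "$CONSENT"); proceeding."
else
  echo "No valid consent on record; show the disclaimer and ask the user first."
fi
```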

---

## Preferences (EXTEND.md)

Use Bash to check whether EXTEND.md exists, in priority order:

```bash
# Check project-level first
test -f .baoyu-skills/baoyu-danger-gemini-web/EXTEND.md && echo "project"

# Then user-level (cross-platform: $HOME works on macOS/Linux/WSL)
test -f "$HOME/.baoyu-skills/baoyu-danger-gemini-web/EXTEND.md" && echo "user"
```

| Path | Location |
|------|----------|
| `.baoyu-skills/baoyu-danger-gemini-web/EXTEND.md` | Project directory |
| `$HOME/.baoyu-skills/baoyu-danger-gemini-web/EXTEND.md` | User home |

| Result | Action |
|--------|--------|
| Found | Read, parse, apply settings |
| Not found | Use defaults |

**EXTEND.md Supports**: Default model | Proxy settings | Custom data directory
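
The exact EXTEND.md schema is not documented here; the sketch below writes a hypothetical project-level file with illustrative field names and values:

```bash
# Hypothetical EXTEND.md content; the fields below are illustrative, not a documented schema.
mkdir -p .baoyu-skills/baoyu-danger-gemini-web
cat > .baoyu-skills/baoyu-danger-gemini-web/EXTEND.md <<'EOF'
# Gemini Web preferences (hypothetical example)
- Default model: gemini-2.5-flash
- Proxy: http://127.0.0.1:7890
- Data directory: ./.gemini-web-data
EOF
```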

## Usage

```bash
# Text generation
npx -y bun ${SKILL_DIR}/scripts/main.ts "Your prompt"
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Your prompt" --model gemini-2.5-pro

# Image generation
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cute cat" --image cat.png
npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png

# Vision input (reference images)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Describe this" --reference image.png
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Create variation" --reference a.png --image out.png

# Multi-turn conversation
npx -y bun ${SKILL_DIR}/scripts/main.ts "Remember: 42" --sessionId session-abc
npx -y bun ${SKILL_DIR}/scripts/main.ts "What number?" --sessionId session-abc

# JSON output
npx -y bun ${SKILL_DIR}/scripts/main.ts "Hello" --json
```

## Options

| Option | Description |
|--------|-------------|
| `--prompt`, `-p` | Prompt text |
| `--promptfiles` | Read prompt from files (concatenated) |
| `--model`, `-m` | Model: gemini-3-pro (default), gemini-2.5-pro, gemini-2.5-flash |
| `--image [path]` | Generate image (default: generated.png) |
| `--reference`, `--ref` | Reference images for vision input |
| `--sessionId` | Session ID for multi-turn conversation |
| `--list-sessions` | List saved sessions |
| `--json` | Output as JSON |
| `--login` | Refresh cookies, then exit |
| `--cookie-path` | Custom cookie file path |
| `--profile-dir` | Chrome profile directory |

## Models

| Model | Description |
|-------|-------------|
| `gemini-3-pro` | Default, latest |
| `gemini-2.5-pro` | Previous pro |
| `gemini-2.5-flash` | Fast, lightweight |

## Authentication

First run opens browser for Google auth. Cookies cached automatically.

Supported browsers (auto-detected): Chrome, Chrome Canary/Beta, Chromium, Edge.

Force refresh: `--login` flag. Override browser: `GEMINI_WEB_CHROME_PATH` env var.
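
A sketch of forcing a cookie refresh with an explicitly chosen browser build; the executable path is an example, adjust for your system:

```bash
# Refresh cookies using a specific Chromium-based browser (example path).
GEMINI_WEB_CHROME_PATH="/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary" \
  npx -y bun ${SKILL_DIR}/scripts/main.ts --login
```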

## Environment Variables

| Variable | Description |
|----------|-------------|
| `GEMINI_WEB_DATA_DIR` | Data directory |
| `GEMINI_WEB_COOKIE_PATH` | Cookie file path |
| `GEMINI_WEB_CHROME_PROFILE_DIR` | Chrome profile directory |
| `GEMINI_WEB_CHROME_PATH` | Chrome executable path |
| `HTTP_PROXY`, `HTTPS_PROXY` | Proxy for Google access (set inline with command) |
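
For example, a one-off invocation that routes traffic through a local proxy and redirects the data directory (the proxy address and directory are illustrative):

```bash
# Inline environment overrides for a single call; values are examples.
HTTPS_PROXY=http://127.0.0.1:7890 \
GEMINI_WEB_DATA_DIR="$HOME/gemini-web-data" \
  npx -y bun ${SKILL_DIR}/scripts/main.ts "Hello" --json
```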

## Sessions

Session files stored in data directory under `sessions/<id>.json`.

Contains: `id`, `metadata` (Gemini chat state), `messages` array, timestamps.
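
A sketch of inspecting a saved session, assuming `jq` is available and `GEMINI_WEB_DATA_DIR` points at the data directory (the session ID `session-abc` comes from the usage examples above):

```bash
# List sessions, then peek at one session file; the exact JSON shape may differ from this sketch.
npx -y bun ${SKILL_DIR}/scripts/main.ts --list-sessions
jq '{id: .id, messages: (.messages | length)}' \
  "${GEMINI_WEB_DATA_DIR}/sessions/session-abc.json"
```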

## Extension Support

Custom configurations via EXTEND.md. See **Preferences** section for paths and supported options.

Overview

This skill provides text and image generation using a reverse-engineered Gemini Web API. It supports single-turn and multi-turn conversations, image generation from prompts, and vision input via reference images. Built for CLI integration, it can act as an image/text backend for other tools or handle direct user requests.

How this skill works

The skill runs as a TypeScript CLI that interacts with a Gemini Web client implementation. It loads cookies via a browser-based Google sign-in, accepts prompts and optional reference images, and returns generated text or image files. Sessions and preferences are stored on disk to enable multi-turn conversations and persistent settings.
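
A typical lifecycle, pieced together from the commands documented above (the prompts are illustrative):

```bash
# Authenticate once, then hold a short multi-turn conversation.
npx -y bun ${SKILL_DIR}/scripts/main.ts --login
npx -y bun ${SKILL_DIR}/scripts/main.ts "Name three uses for reference images" --sessionId demo
npx -y bun ${SKILL_DIR}/scripts/main.ts "Expand on the second one" --sessionId demo
```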

When to use it

  • Generate images from text prompts or using reference images for variations.
  • Produce text responses or multi-turn chat where Gemini-style output is desired.
  • Provide a local CLI backend for other skills that require image generation.
  • Batch or script generation tasks via shell commands or npx invocations.
  • Test or prototype flows that need vision-capable language model outputs.

Best practices

  • Run the consent flow on first use and verify the consent file exists before automating calls.
  • Use session IDs for multi-turn flows to preserve conversation state and context.
  • Store large numbers of sessions or generated media in a dedicated data directory via environment variables.
  • Provide reference images when you need the model to base new images on existing visuals.
  • Use the --json flag for machine-readable output when integrating with other tools.
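
For the last point, a sketch of capturing machine-readable output; the JSON schema is not documented here, so the `.text` field below is an assumption for illustration:

```bash
# Capture JSON output; ".text" is a guessed field name, with a fallback to printing the whole payload.
npx -y bun ${SKILL_DIR}/scripts/main.ts "Write a haiku about oceans" --json > out.json
jq -r '.text // .' out.json
```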

Example use cases

  • Generate a product mockup image from a prompt and a reference photo for consistent styling.
  • Run an automated captioning pipeline: feed images as references and collect text outputs as JSON.
  • Prototype a chatbot that needs Gemini text quality and keeps state across turns via session files.
  • Create prompt-driven image variations for creative exploration or A/B testing.
  • Scripted batch generation of social media assets using different models and output formats.
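
For the batch-generation case, a sketch that loops over the documented models (the prompt and filenames are illustrative):

```bash
# Generate the same asset with each documented model for comparison.
for model in gemini-3-pro gemini-2.5-pro gemini-2.5-flash; do
  npx -y bun ${SKILL_DIR}/scripts/main.ts \
    --prompt "Minimalist banner for a product launch" \
    --model "$model" --image "banner-${model}.png"
done
```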

FAQ

Is explicit consent required before using this skill?

Yes. The first run must record user consent in a local consent file before the skill proceeds.

How are credentials handled for Gemini access?

Authentication uses a browser-based Google sign-in and stores cookies locally; you can refresh with the --login option or customize the cookie/profile paths via environment variables.