home / skills / tkersey / dotfiles / web-browser

web-browser skill

needs review

This skill automates live website interactions in a real browser via CDP, enabling navigation, form actions, content extraction, and visual debugging.

npx playbooks add skill tkersey/dotfiles --skill web-browser

Review the files below or copy the command above to add this skill to your agents.

Files (12)

SKILL.md

4.6 KB

---
name: web-browser
description: "Use when tasks need real-browser web automation in Chrome/Chromium via CDP: open or navigate URLs, click/type/select in forms, run page JS, wait for selectors, scrape structured content, capture screenshots, or validate UI flows on live sites."
---

# Web Browser

## When to use
- Navigate or interact with live websites.
- Click buttons, fill forms, or extract page content.
- Evaluate JavaScript in a real browser context.
- Capture screenshots for debugging or review.
- UI validation and smoke testing in a real browser.

## Requirements
- Run commands from `codex/skills/web-browser/` (so `./tools/*.js` resolves).
- Node.js (ESM + top-level `await`) and repo deps installed (notably `puppeteer-core`).
- Chrome/Chromium installed, or set `CODEX_CHROME_PATH`.

## Quick start
```bash
# Start Chrome with CDP on :9222
./tools/start.js

# Safer (recommended on your laptop): avoid killing an existing Chrome session
./tools/start.js --no-kill

# Open a deterministic tab, then operate on it
./tools/nav.js https://example.com --new
./tools/eval.js 'document.title'
./tools/screenshot.js
```

Loop: take small steps → inspect state (`eval.js` / `screenshot.js`) → repeat.

## Configuration
- **Flags**
  - `start.js`: `--port`, `--user-data-dir`, `--chrome-path`, `--profile`, `--no-kill`
  - `nav.js` / `eval.js` / `screenshot.js` / `pick.js`: `--port`, `--browser-url`
- **Environment**
  - `CODEX_BROWSER_URL`: CDP URL (used when no CLI `--port`/`--browser-url`)
  - `CODEX_BROWSER_PORT` (alias: `CODEX_CDP_PORT`): CDP port for localhost
  - `CODEX_BROWSER_USER_DATA_DIR`: Chrome user data dir (used by `start.js`)
  - `CODEX_CHROME_PATH` (aliases: `CHROME_PATH`, `PUPPETEER_EXECUTABLE_PATH`): Chrome/Chromium executable

## Targeting (active tab)
All tools operate on the “active tab”, defined as the last page returned by `puppeteer.pages()` (roughly: the most recently opened tab).
- Prefer `./tools/nav.js <url> --new` when you want deterministic targeting.
- If actions hit the “wrong” tab, close extra tabs or open a fresh one.

## Common commands
```bash
# See all options (safe; does not start/kill Chrome)
./tools/start.js --help

# Use a non-default port
./tools/start.js --port 9223 --no-kill
./tools/nav.js --port 9223 https://example.com --new

# Or configure once via env
export CODEX_BROWSER_URL=http://localhost:9223
./tools/eval.js 'document.title'

# Wait for a selector (polling with timeout; good for SPAs)
./tools/eval.js '(async () => { for (let i = 0; i < 50; i++) { const el = document.querySelector("button[type=submit]"); if (el) return true; await new Promise(r => setTimeout(r, 100)); } return false; })()'

# Screenshot current viewport
# Prints a PNG filepath in your system temp dir
./tools/screenshot.js

# Pick elements interactively
# Prints tag/id/class/text/html/parents for one element (or many via Cmd/Ctrl+click)
./tools/pick.js "Click the submit button"
```

Tip: use `pick.js` to inspect attributes/text, then craft a selector for `eval.js`.

## Recipes
```bash
# Scrape structured data (return an array of objects for readable output)
./tools/eval.js 'Array.from(document.querySelectorAll("a"), a => ({ href: a.href, text: a.textContent?.trim() }))'
```

- **Login flows**
  - Dedicated automation profile: run `./tools/start.js`, log in once, then reuse the persisted profile in your user-data-dir (default: `~/.cache/scraping`).
  - Default-profile bootstrap: run `./tools/start.js --profile` when you truly need existing cookies/logins.

## Security & privacy
- `./tools/start.js --profile` uses `rsync -a --delete` to copy your default Chrome profile into your user-data-dir (cookies/sessions/PII).
- Treat your user-data-dir (default: `~/.cache/scraping`) as sensitive; avoid on shared machines and delete it when done if needed.

## Troubleshooting
- `✗ Failed to connect to Chrome via CDP`: run `./tools/start.js` (or set `CODEX_BROWSER_URL`/`--port` to match your Chrome).
- `✗ No active tab found`: open a page first (e.g., `./tools/nav.js https://example.com --new`).
- Port `9222` is busy: pick a new one (`./tools/start.js --port 9223`).
- Selector flakiness: use the polling pattern above; SPAs often need a wait after `domcontentloaded`.
- Chrome path not found: set `CODEX_CHROME_PATH` or pass `--chrome-path`.

## Pitfalls
- `./tools/start.js` defaults to killing Chrome processes (use `--no-kill` to avoid this).
- Use single quotes around JS to avoid shell-escaping issues.

## References
- `codex/skills/web-browser/tools/start.js`
- `codex/skills/web-browser/tools/nav.js`
- `codex/skills/web-browser/tools/eval.js`
- `codex/skills/web-browser/tools/screenshot.js`
- `codex/skills/web-browser/tools/pick.js`

Overview

This skill enables real-browser web automation in Chrome/Chromium via the Chrome DevTools Protocol (CDP). It supports navigation, form interaction, running page JavaScript, scraping structured content, taking screenshots, and validating UI flows on live sites. Use it when you need deterministic, inspectable browser actions rather than headless HTTP requests.

How this skill works

The tooling connects to a running Chrome/Chromium instance over CDP and targets the active tab (the most recently opened page). Commands let you open or navigate URLs, wait for selectors, evaluate arbitrary page JavaScript, click/type/select elements, capture screenshots, and output scraped data. You can start Chrome with a controlled user-data directory to preserve cookies and sessions for login flows.

When to use it

Automating multi-step interactions on websites (clicks, form fills, flows).
Scraping structured content that requires a real browser or JavaScript execution.
Capturing screenshots of pages or UI states for debugging or reporting.
Validating UI flows or smoke-testing live sites in a real browser.
Running page-level JavaScript to inspect or assert DOM state.

Best practices

Start Chrome with a dedicated user-data-dir for repeatable sessions and logins.
Prefer opening a new tab (--new) for deterministic targeting of the active tab.
Use polling/wait loops for selectors on SPAs to avoid flakiness instead of fixed sleeps.
Avoid killing your primary Chrome session by using the --no-kill flag on local machines.
Store sensitive user-data-dir files securely and delete them after use if needed.

Example use cases

Open a product page, extract structured fields (title, price, images) and return JSON.
Automate a login flow by reusing a persisted browser profile and then navigate a protected page.
Take a screenshot of a failing UI state during a test run for triage.
Interactively pick elements on a page to craft robust selectors, then run eval.js to scrape.
Wait for a dynamic element on a SPA and click the submit button to continue a workflow.

FAQ

What environment is required to run the tooling?

Node.js with ESM support, puppeteer-core installed, and Chrome/Chromium available or pointed to via environment variable.

How do I target a specific tab reliably?

Open a deterministic tab using the --new option when navigating; the tools operate on the last opened page as the active tab.