home / skills / mitsuhiko / agent-stuff / web-browser

web-browser skill

/skills/web-browser

This skill allows you to automate web page exploration by controlling a browser to navigate, interact, and extract data efficiently.

This is most likely a fork of the web-browser skill from hqman
npx playbooks add skill mitsuhiko/agent-stuff --skill web-browser

Review the files below or copy the command above to add this skill to your agents.

Files (13)
SKILL.md
2.2 KB
---
name: web-browser
description: "Allows to interact with web pages by performing actions such as clicking buttons, filling out forms, and navigating links. It works by remote controlling Google Chrome or Chromium browsers using the Chrome DevTools Protocol (CDP). When Claude needs to browse the web, it can use this skill to do so."
license: Stolen from Mario
---

# Web Browser Skill

Minimal CDP tools for collaborative site exploration.

## Start Chrome

```bash
./scripts/start.js              # Fresh profile
./scripts/start.js --profile    # Copy your profile (cookies, logins)
```

Start Chrome on `:9222` with remote debugging.

## Navigate

```bash
./scripts/nav.js https://example.com
./scripts/nav.js https://example.com --new
```

Navigate current tab or open new tab.

## Evaluate JavaScript

```bash
./scripts/eval.js 'document.title'
./scripts/eval.js 'document.querySelectorAll("a").length'
./scripts/eval.js 'JSON.stringify(Array.from(document.querySelectorAll("a")).map(a => ({ text: a.textContent.trim(), href: a.href })).filter(link => !link.href.startsWith("https://")))'
```

Execute JavaScript in active tab (async context).  Be careful with string escaping, best to use single quotes.

## Screenshot

```bash
./scripts/screenshot.js
```

Screenshot current viewport, returns temp file path

## Pick Elements

```bash
./scripts/pick.js "Click the submit button"
```

Interactive element picker. Click to select, Cmd/Ctrl+Click for multi-select, Enter to finish.

## Dismiss Cookie Dialogs

```bash
./scripts/dismiss-cookies.js          # Accept cookies
./scripts/dismiss-cookies.js --reject # Reject cookies (where possible)
```

Automatically dismisses EU cookie consent dialogs.

Run after navigating to a page:
```bash
./scripts/nav.js https://example.com && ./scripts/dismiss-cookies.js
```

## Background Logging (Console + Errors + Network)

Automatically started by `start.js` and writes JSONL logs to:

```
~/.cache/agent-web/logs/YYYY-MM-DD/<targetId>.jsonl
```

Manually start:
```bash
./scripts/watch.js
```

Tail latest log:
```bash
./scripts/logs-tail.js           # dump current log and exit
./scripts/logs-tail.js --follow  # keep following
```

Summarize network responses:
```bash
./scripts/net-summary.js
```

Overview

This skill provides a lightweight web-browser controller that remote-controls Google Chrome/Chromium via the Chrome DevTools Protocol (CDP). It enables automated navigation, element interaction, JavaScript evaluation, screenshots, and basic logging for collaborative site exploration. The tool is designed for agents to browse, inspect, and interact with pages programmatically.

How this skill works

The skill launches Chrome with remote debugging enabled and connects over CDP to control tabs, execute scripts, click elements, and capture screenshots. It exposes commands to navigate pages, evaluate arbitrary JavaScript in the page context, pick DOM elements interactively, dismiss common cookie dialogs, and record console/network logs. Logs and network summaries are written to JSONL for later analysis.

When to use it

  • Automating site navigation and interactions during agent-driven browsing tasks
  • Inspecting page structure or extracting data by running in-page JavaScript
  • Capturing visual state or verifying UI via screenshots
  • Debugging agent actions with console, error, and network logs
  • Quickly dismissing cookie consent dialogs to continue automated flows

Best practices

  • Start Chrome with a fresh or copied profile depending on whether you need cookies/logins
  • Run JavaScript evaluations in single quotes to avoid shell escaping issues
  • Use the interactive element picker to reliably target buttons and inputs
  • Enable logging when debugging network issues; use JSONL outputs for structured analysis
  • Call the cookie-dismiss command immediately after navigation to avoid modal interference

Example use cases

  • Open a product page, extract all outbound links and their anchor text with an in-page script
  • Automate login and form submission in a new tab using navigation plus element clicks and fills
  • Capture a sequence of screenshots while stepping through a checkout flow for QA
  • Tail real-time console and network logs to diagnose failing resource loads or JavaScript errors
  • Run a headful browsing session that accepts or rejects EU cookie banners before scraping

FAQ

Do I need Chrome installed to use this skill?

Yes — the skill controls Google Chrome or Chromium via CDP, so a compatible browser must be installed and launched with remote debugging.

How are logs stored and accessed?

Background logs are saved as JSONL files in a cache directory by date and target ID; helper scripts allow tailing or summarizing network responses.