This skill automates dynamic website navigation and interactions using browser references, enabling reliable clicks, typing, and stateful workflows.
Install with: npx playbooks add skill git-fg/thecattoolkit --skill browsing-web
---
name: browsing-web
description: "Interactive browser automation using agent-browser. Use when navigating dynamic sites, authentication, clicking, typing, and complex state navigation. Do NOT use for simple read-only text extraction."
allowed-tools: [Bash]
---
# Browser Interaction Protocol
## Core Loop (The Ref Pattern)
You interact with the browser using **References (@refs)** derived from snapshots, not CSS selectors.
1. **Navigate**: `agent-browser open "url"`
2. **Snapshot**: `agent-browser snapshot -i` (Gets accessibility tree with `@e` refs)
3. **Interact**: `agent-browser click @e1` (Uses ref from snapshot)
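The snapshot-to-interaction step above can be sketched as a small shell helper that looks up a ref by role and name instead of copying it by hand. This is only an illustration: `find_ref` is a hypothetical function, not part of agent-browser, and it assumes snapshot lines have the shape `[ref=e4] button "Search"` as in the example later in this document. The sample snapshot text is made up for the demonstration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not an agent-browser command): read snapshot text on
# stdin and print the @ref of the first line matching the given role and name.
# Assumes lines shaped like:  [ref=e4] button "Search"
find_ref() {
  local role="$1" name="$2"
  awk -v role="$role" -v name="$name" '
    # Match lines containing: role "name"  and carrying a [ref=...] tag.
    index($0, role " \"" name "\"") && match($0, /\[ref=[^]]+\]/) {
      # Strip the leading "[ref=" and trailing "]" and prefix with "@".
      print "@" substr($0, RSTART + 5, RLENGTH - 6)
      found = 1
      exit
    }
    # Exit non-zero when no line matched, so callers can detect a miss.
    END { exit !found }
  '
}

# Illustrative snapshot text (invented for this sketch).
sample_snapshot='[ref=e2] textbox "Search"
[ref=e4] button "Search"'

printf '%s\n' "$sample_snapshot" | find_ref button "Search"   # prints @e4
```

In a live session the same function could be fed from the real command, for example `agent-browser snapshot -i | find_ref button "Search"`, provided the actual output matches the assumed line shape.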
## Critical Constraints
1. **Never Guess Refs**: Refs such as `@e1` cannot be guessed. You MUST run `snapshot` first and use only the refs it returns.
2. **Interactive Only**: Always use `snapshot -i` to filter out non-interactive elements (saves tokens).
3. **Stateful**: The browser persists between commands. You do not need to re-open.
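One way to honor these constraints together is to wrap every action so that a fresh interactive snapshot always follows it. The sketch below is an assumption-laden illustration: `act` and the `AB` variable are hypothetical names introduced here, and the premise that refs should be re-read after each action is a cautious reading of the "current refs" rule above rather than documented agent-browser behavior.

```bash
#!/usr/bin/env bash
# AB names the CLI to run. Overriding it (for example AB=echo) gives a dry run
# that prints the commands instead of driving a browser.
AB="${AB:-agent-browser}"

# Hypothetical wrapper: run one agent-browser action, then take a fresh
# interactive snapshot so the next step works from current refs.
act() {
  # Stop early if the action itself fails; the snapshot would be misleading.
  "$AB" "$@" || return 1
  "$AB" snapshot -i
}
```

With this in place, a session might read `act open "https://example.com"` followed by `act click @e1`, each call ending with the snapshot that the next step needs. Because the browser is stateful, no re-open is required between calls.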
## Common Patterns
### Navigation & extraction
```bash
agent-browser open "https://google.com"
agent-browser snapshot -i
# Output includes lines such as: [ref=e4] button "Search"
# (@e2 below is assumed to be the search input listed in the same snapshot)
agent-browser fill @e2 "Claude Code"
agent-browser click @e4
agent-browser wait --load networkidle
```
### Visual Verification
Only if structure is confusing:
```bash
agent-browser screenshot page.png
```
This skill enables interactive browser automation with the agent-browser CLI: navigating dynamic websites, authenticating, clicking, typing, and managing complex UI state. It is meant for tasks that require actions and stateful interaction, not simple read-only text extraction. Use it when you need precise, repeatable control of a live browser session.
The skill operates as a reference-driven loop: open a page, take an interactive snapshot, then interact using the element references (refs) that the snapshot returns. You must run `snapshot -i` to receive the `@e` refs that identify interactive elements; actions such as `click` and `fill` take those refs as arguments, while `wait` and `screenshot` take their own arguments (a load condition, an output filename) and not a ref. The browser session is stateful and persists between commands, so you can chain navigation and interactions without re-opening the page.
Can I interact with elements without running snapshot?
No. You must run snapshot -i to obtain @e refs. Guessing refs is not supported and will fail.
Do I need to re-open the browser between steps?
No. The browser session is stateful and persists across commands, so you can continue interacting without re-opening.
When should I take a screenshot?
Use screenshots only for visual verification when the accessibility tree is unclear or to capture visual regressions; rely on snapshots for element references.