browsing-web skill

/plugins/sys-browser/skills/browsing-web

This skill automates dynamic website navigation and interactions using browser references, enabling reliable clicks, typing, and stateful workflows.

npx playbooks add skill git-fg/thecattoolkit --skill browsing-web

Review the files below or copy the command above to add this skill to your agents.

SKILL.md

---
name: browsing-web
description: "Interactive browser automation using agent-browser. Use when navigating dynamic sites, authentication, clicking, typing, and complex state navigation. Do NOT use for simple read-only text extraction."
allowed-tools: [Bash]
---

# Browser Interaction Protocol

## Core Loop (The Ref Pattern)
You interact with the browser using **References (@refs)** derived from snapshots, not CSS selectors.

1.  **Navigate**: `agent-browser open "url"`
2.  **Snapshot**: `agent-browser snapshot -i` (Gets accessibility tree with `@e` refs)
3.  **Interact**: `agent-browser click @e1` (Uses ref from snapshot)

## Critical Constraints
1.  **Never Guess Refs**: Refs such as `@e1` come from a snapshot, so you cannot guess them. You MUST run `snapshot` to see the current refs.
2.  **Interactive Only**: Always use `snapshot -i` to filter out non-interactive elements (saves tokens).
3.  **Stateful**: The browser persists between commands. You do not need to re-open the page for each step.
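
A minimal sketch of how constraint 1 could be enforced in a wrapper script: before acting on a ref, check that it appears in the most recent snapshot output. `ref_in_snapshot` is a hypothetical helper, not part of agent-browser, and it assumes each snapshot line carries a `ref=eN` token, as in the `[ref=e4] button "Search"` line shown under Common Patterns.

```bash
# Hypothetical helper (not an agent-browser command).
# Succeeds only if the given ref (for example "@e4") appears in the
# snapshot text read from stdin.
# Assumption: snapshot lines contain a "ref=eN" token.
ref_in_snapshot() {
  ref=${1#@}   # accept either "@e4" or "e4"
  # Require a non-digit (or end of line) after the ref so e4 does not match e41.
  grep -Eq -- "ref=${ref}([^0-9]|\$)"
}
```

For example, `agent-browser snapshot -i | ref_in_snapshot @e4 && agent-browser click @e4` would click only when `@e4` is present in a fresh snapshot.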

## Common Patterns

### Navigation & Interaction
```bash
agent-browser open "https://google.com"
agent-browser snapshot -i
# Output includes lines such as (illustrative):
#   [ref=e2] textbox "Search query"
#   [ref=e4] button "Search"
agent-browser fill @e2 "Claude Code"
agent-browser click @e4
agent-browser wait --load networkidle
```

### Visual Verification
Only if structure is confusing:
```bash
agent-browser screenshot page.png
```

Overview

This skill enables interactive browser automation using the agent-browser CLI to navigate dynamic websites, authenticate, click and type, and manage complex UI state. It is designed for tasks that require action and stateful interaction rather than simple read-only scraping. Use it when you need precise, repeatable control of a live browser session.

How this skill works

The skill operates with a reference-driven loop: open a page, take an interactive snapshot, then interact using the element references (refs) returned by the snapshot. You must run snapshot -i to receive the @e refs that identify interactive elements; element actions such as click and fill take those refs, while wait and screenshot, as used in the examples above, act on the page as a whole. The browser session is stateful and persists between commands, so you can chain navigation and interactions without re-opening the page.
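
The loop described above can be sketched as a small shell function. This is an illustration under stated assumptions, not the skill's own implementation: `click_by_name` and the `AGENT_BROWSER_BIN` override are hypothetical names introduced here, and the parsing assumes each snapshot line names a role and a quoted label alongside a `ref=eN` token, as in the SKILL.md example. Only the `snapshot -i`, `click`, and `wait --load networkidle` commands come from the skill itself.

```bash
# Hypothetical helper: click an element identified by role and accessible name.
# AGENT_BROWSER_BIN is an assumed override so the sketch can be dry-run
# against a stub; by default it calls the real agent-browser command.
click_by_name() {
  bin=${AGENT_BROWSER_BIN:-agent-browser}

  # 1. Snapshot: fetch fresh refs instead of reusing old ones.
  snapshot=$("$bin" snapshot -i) || return 1

  # 2. Look up the ref on the first line that names the element.
  ref=$(printf '%s\n' "$snapshot" | grep -F -- "$1 \"$2\"" \
        | sed -n 's/.*ref=\(e[0-9][0-9]*\).*/\1/p' | head -n 1)
  if [ -z "$ref" ]; then
    echo "no ref found for $1 \"$2\"" >&2
    return 1
  fi

  # 3. Interact, then wait for network activity to settle.
  "$bin" click "@$ref" && "$bin" wait --load networkidle
}
```

A call such as `click_by_name button "Search"` would then take a new snapshot on every invocation, which keeps refs current at the cost of one extra command per click.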

When to use it

Logging into sites with multi-step authentication or MFA flows
Clicking buttons, filling forms, and navigating single-page applications
Testing or reproducing complex user journeys that require stateful interaction
Interacting with dynamic content that changes the DOM or requires JS-driven events
Automating multi-step workflows involving waits for network or UI updates

Best practices

Always run agent-browser snapshot -i before interacting to get current @e refs; never guess refs
Use wait --load networkidle after navigation or actions that trigger network activity to ensure stable state
Prefer refs from snapshots rather than visual heuristics or guessed selectors to avoid flakiness
Take screenshots only when needed for visual verification to reduce overhead
Chain commands within the same browser session to preserve state (cookies, localStorage, session)

Example use cases

Automate sign-in, handle consent screens, and navigate to a protected resource
Fill complex multi-page forms with validation and conditional fields
Reproduce a user flow for QA that involves clicking, typing, and waiting for async results
Scrape data that requires interacting with controls (e.g., revealing hidden content, pagination)

FAQ

Can I interact with elements without running snapshot?

No. You must run snapshot -i to obtain @e refs. Guessing refs is not supported and will fail.

Do I need to re-open the browser between steps?

No. The browser session is stateful and persists across commands, so you can continue interacting without re-opening.

When should I take a screenshot?

Use screenshots only for visual verification when the accessibility tree is unclear or to capture visual regressions; rely on snapshots for element references.
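
One way to apply this rule in a script is to fall back to a screenshot only when the snapshot yields nothing to act on. The sketch below is an assumption-laden illustration: `snapshot_or_screenshot` and `AGENT_BROWSER_BIN` are hypothetical names, and treating "no `ref=eN` token in the output" as "the accessibility tree is unclear" is a simplifying heuristic chosen here, not something the skill defines.

```bash
# Hypothetical helper: print the interactive snapshot, and take a screenshot
# only when the snapshot contains no refs at all.
# AGENT_BROWSER_BIN is an assumed override for dry runs against a stub.
snapshot_or_screenshot() {
  bin=${AGENT_BROWSER_BIN:-agent-browser}
  out=${1:-page.png}   # screenshot file name, defaulting to page.png

  snapshot=$("$bin" snapshot -i) || return 1
  printf '%s\n' "$snapshot"

  # Assumed heuristic: zero "ref=eN" tokens means the tree is not usable.
  if ! printf '%s\n' "$snapshot" | grep -Eq 'ref=e[0-9]+'; then
    "$bin" screenshot "$out"
  fi
}
```

With this shape, the common case costs a single snapshot, and the screenshot is taken only on the fallback path.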