
This skill automates dynamic website navigation and interactions using browser references, enabling reliable clicks, typing, and stateful workflows.

npx playbooks add skill git-fg/thecattoolkit --skill browsing-web

Run the command above to add this skill to your agents.

SKILL.md
---
name: browsing-web
description: "Interactive browser automation using agent-browser. Use when navigating dynamic sites, authentication, clicking, typing, and complex state navigation. Do NOT use for simple read-only text extraction."
allowed-tools: [Bash]
---

# Browser Interaction Protocol

## Core Loop (The Ref Pattern)
You interact with the browser using **References (@refs)** derived from snapshots, not CSS selectors.

1.  **Navigate**: `agent-browser open "url"`
2.  **Snapshot**: `agent-browser snapshot -i` (Gets accessibility tree with `@e` refs)
3.  **Interact**: `agent-browser click @e1` (Uses ref from snapshot)
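Step 2 is where refs come from. The sketch below reads a ref out of the snapshot text instead of guessing it. It assumes each interactive element is printed on one line containing its role, its quoted name, and a `ref=eN` marker, as in the `[ref=e4] button "Search"` sample in this skill. `find_ref` is a hypothetical helper, not an agent-browser command.

```bash
# find_ref ROLE NAME: read snapshot text on stdin and print "@eN" for the first line
# containing ROLE followed by the quoted NAME. Prints nothing when there is no match.
# Assumption: snapshot lines carry `role "Name"` plus a `ref=eN` marker.
find_ref() {
  local role="$1" name="$2"
  grep -F -- "$role \"$name\"" \
    | grep -o 'ref=e[0-9][0-9]*' \
    | head -n 1 \
    | sed 's/^ref=/@/'
}

# Usage (requires agent-browser on PATH):
#   ref=$(agent-browser snapshot -i | find_ref button "Search")
#   [ -n "$ref" ] && agent-browser click "$ref"
```

The ref is read from a snapshot taken immediately before the click, so it reflects the page as it is at that moment.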

## Critical Constraints
1.  **Never Guess Selectors**: You cannot guess `@e1`. You MUST run `snapshot` to see current refs.
2.  **Interactive Only**: Always use `snapshot -i` to filter non-interactive elements (saves tokens).
3.  **Stateful**: The browser persists between commands. You do not need to re-open.
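The sketch below is a minimal guard for constraints 1 and 3. It takes a fresh snapshot and refuses to click a ref that does not appear in it. It assumes refs appear as `ref=eN` in `snapshot -i` output. `safe_click` and the `AB` override are hypothetical. The check only confirms that the ref exists. It does not confirm that the ref still names the same element it did earlier.

```bash
AB="${AB:-agent-browser}"   # command to run; point it at a stub for dry runs

# safe_click @eN: take a fresh interactive snapshot and click only if the ref is in it.
safe_click() {
  local ref="$1" id
  case "$ref" in
    @e[0-9]*) id="${ref#@}" ;;
    *) echo "not a snapshot ref: $ref" >&2; return 1 ;;
  esac
  # Match "ref=eN" followed by a non-digit or end of line, so @e4 does not match e41.
  if ! "$AB" snapshot -i | grep -Eq "ref=${id}([^0-9]|\$)"; then
    echo "ref $ref is not in the current snapshot; take a new snapshot first" >&2
    return 1
  fi
  "$AB" click "$ref"
}
```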

## Common Patterns

### Navigation & Form Input
```bash
agent-browser open "https://google.com"
agent-browser snapshot -i
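# Output includes lines such as (labels illustrative):
#   [ref=e2] textbox "Search"
#   [ref=e4] button "Search"
agent-browser fill @e2 "Claude Code"
agent-browser click @e4
agent-browser wait --load networkidle
# The page changed, so re-snapshot before using any further refs
agent-browser snapshot -i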
```
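The same flow can be written as a reusable function. This sketch makes three assumptions. First, snapshot lines carry `role "Name"` plus `ref=eN`. Second, the search box and button are both named "Search", which varies by site, so check your own snapshot. Third, `search`, `ref_in`, and the `AB` override are hypothetical helpers. Only the `agent-browser` subcommands come from this skill.

```bash
AB="${AB:-agent-browser}"   # command to run; point it at a stub for dry runs

# ref_in SNAPSHOT ROLE NAME: print "@eN" for the first line containing ROLE "NAME"
ref_in() {
  printf '%s\n' "$1" | grep -F -- "$2 \"$3\"" | grep -o 'ref=e[0-9][0-9]*' | head -n 1 | sed 's/^ref=/@/'
}

# search URL QUERY: open the page, fill the search box, submit, wait for the network to settle
search() {
  local url="$1" query="$2" snap box button
  "$AB" open "$url" || return 1
  snap=$("$AB" snapshot -i) || return 1   # one snapshot, read for both refs
  box=$(ref_in "$snap" textbox "Search")
  button=$(ref_in "$snap" button "Search")
  if [ -z "$box" ] || [ -z "$button" ]; then
    echo "search box or button not found in snapshot" >&2
    return 1
  fi
  "$AB" fill "$box" "$query" || return 1
  "$AB" click "$button" || return 1
  "$AB" wait --load networkidle
}
```

Usage: `search "https://google.com" "Claude Code"`. If either ref is missing, the function stops instead of guessing.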

### Visual Verification
Use only when the snapshot structure is ambiguous:
```bash
agent-browser screenshot page.png
```
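One way to make the "only when ambiguous" rule mechanical is to use a rough proxy: a snapshot that lists no refs at all. This sketch assumes refs appear as `ref=eN`. `verify_visually` and the `AB` override are hypothetical. The default file name mirrors the example above.

```bash
AB="${AB:-agent-browser}"   # command to run; point it at a stub for dry runs

# verify_visually [PATH]: take a screenshot only when the interactive snapshot has no refs
verify_visually() {
  local path="${1:-page.png}" snap
  snap=$("$AB" snapshot -i) || return 1
  if printf '%s\n' "$snap" | grep -q 'ref=e[0-9]'; then
    echo "snapshot has refs; skipping screenshot" >&2
    return 0
  fi
  "$AB" screenshot "$path"
}
```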

Overview

This skill enables interactive browser automation with the agent-browser CLI. It navigates dynamic websites, performs authentication, clicks and types, and manages complex UI state. It is designed for tasks that require action and stateful interaction rather than simple read-only scraping. Use it when you need precise, repeatable control of a live browser session.

How this skill works

The skill runs a reference-driven loop: open a page, take an interactive snapshot, then act on elements through the references (refs) that the snapshot returns. Running snapshot -i yields the @e refs that identify interactive elements. Commands such as click and fill take those refs as targets, while wait and screenshot act on the page as a whole. The browser session is stateful and persists between commands, so you can chain navigation and interactions without re-opening the page.

When to use it

  • Logging into sites with multi-step authentication or MFA flows
  • Clicking buttons, filling forms, and navigating single-page applications
  • Testing or reproducing complex user journeys that require stateful interaction
  • Interacting with dynamic content that changes the DOM or relies on JavaScript-driven events
  • Automating multi-step workflows involving waits for network or UI updates
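The sketch below covers the simplest case of the first bullet: a single username/password form. It rests on several hypothetical assumptions:

- the fields are exposed as textboxes named "Email" and "Password";
- the submit button is named "Sign in";
- credentials come from the environment variables `LOGIN_USER` and `LOGIN_PASS`.

Environment variables keep credentials out of the script, but they still appear as command arguments while `fill` runs. MFA or consent steps would continue from the final snapshot with the same snapshot-then-act pattern.

```bash
AB="${AB:-agent-browser}"   # command to run; point it at a stub for dry runs

# ref_in SNAPSHOT ROLE NAME: print "@eN" for the first line containing ROLE "NAME"
ref_in() {
  printf '%s\n' "$1" | grep -F -- "$2 \"$3\"" | grep -o 'ref=e[0-9][0-9]*' | head -n 1 | sed 's/^ref=/@/'
}

# login URL: sign in with $LOGIN_USER / $LOGIN_PASS, then print a fresh snapshot
login() {
  local url="$1" snap user_ref pass_ref submit_ref r
  "$AB" open "$url" || return 1
  snap=$("$AB" snapshot -i) || return 1
  user_ref=$(ref_in "$snap" textbox "Email")      # hypothetical field labels
  pass_ref=$(ref_in "$snap" textbox "Password")
  submit_ref=$(ref_in "$snap" button "Sign in")
  for r in "$user_ref" "$pass_ref" "$submit_ref"; do
    [ -n "$r" ] || { echo "login form not recognised" >&2; return 1; }
  done
  "$AB" fill "$user_ref" "$LOGIN_USER" || return 1
  "$AB" fill "$pass_ref" "$LOGIN_PASS" || return 1
  "$AB" click "$submit_ref" || return 1
  "$AB" wait --load networkidle || return 1
  # The page has changed (dashboard, consent screen, MFA prompt), so old refs are stale.
  "$AB" snapshot -i
}
```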

Best practices

  • Always run agent-browser snapshot -i before interacting to get current @e refs, and run it again after any navigation or page update; never guess refs
  • Use wait --load networkidle after navigation or actions that trigger network activity to ensure stable state
  • Prefer refs from snapshots over visual heuristics or guessed selectors to avoid flaky interactions
  • Take screenshots only when needed for visual verification to reduce overhead
  • Chain commands within the same browser session to preserve state (cookies, localStorage, session)
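The first two practices combine into a small wrapper. The sketch uses a hypothetical `step` function and `AB` override. It runs one agent-browser command, waits for network idle, and prints a fresh interactive snapshot from which to read the next ref. Everything runs in the same session, so state carries over between steps.

```bash
AB="${AB:-agent-browser}"   # command to run; point it at a stub for dry runs

# step SUBCOMMAND [ARGS...]: run one agent-browser command, let the network settle,
# then print a fresh interactive snapshot so the next ref comes from the current page.
step() {
  "$AB" "$@" || return 1
  "$AB" wait --load networkidle || return 1
  "$AB" snapshot -i
}

# Usage:
#   step open "https://example.com"
#   step click @e4
```

Waiting after every command costs time on actions that trigger no network traffic, such as `fill`. Call agent-browser directly for those.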

Example use cases

  • Automate sign-in, handle consent screens, and navigate to a protected resource
  • Fill complex multi-page forms with validation and conditional fields
  • Reproduce a user flow for QA that involves clicking, typing, and waiting for async results
  • Scrape data that requires interacting with controls (e.g., revealing hidden content, pagination)
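For the pagination case, the sketch below saves each page's full snapshot and follows a "Next" control until it disappears. The full snapshot is taken without -i, so non-interactive text is kept. The sketch makes these assumptions:

- the control is a link named "Next" (hypothetical; check your snapshot);
- refs appear as `ref=eN`;
- the 20-page default cap is an arbitrary safety limit.

`paginate` and the `AB` override are hypothetical.

```bash
AB="${AB:-agent-browser}"   # command to run; point it at a stub for dry runs

# paginate OUTDIR [MAX_PAGES]: write page-1.txt, page-2.txt, ... and print the page count
paginate() {
  local outdir="$1" max="${2:-20}" page=0 next
  mkdir -p "$outdir" || return 1
  while [ "$page" -lt "$max" ]; do
    page=$((page + 1))
    # Full snapshot (no -i) keeps the page's text content for later extraction.
    "$AB" snapshot > "$outdir/page-$page.txt" || return 1
    # Interactive snapshot to find the pager ref on the current page.
    next=$("$AB" snapshot -i | grep -F -- 'link "Next"' | grep -o 'ref=e[0-9][0-9]*' | head -n 1 | sed 's/^ref=/@/')
    # Stop when there is no "Next" control or the page cap has been reached.
    [ -n "$next" ] && [ "$page" -lt "$max" ] || break
    "$AB" click "$next" || return 1
    "$AB" wait --load networkidle || return 1
  done
  echo "$page"
}
```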

FAQ

Can I interact with elements without running snapshot?

No. You must run snapshot -i to obtain @e refs. Refs are assigned by the snapshot, so a guessed ref may not exist or may point at the wrong element.

Do I need to re-open the browser between steps?

No. The browser session is stateful and persists across commands, so you can continue interacting without re-opening.

When should I take a screenshot?

Use screenshots only for visual verification when the accessibility tree is unclear or to capture visual regressions; rely on snapshots for element references.