agent-browser skill

/.agent-skills/agent-browser

This skill enables deterministic web automation with a headless browser, using accessibility tree refs for reliable interactions and isolated sessions.

npx playbooks add skill supercent-io/skills-template --skill agent-browser

SKILL.md
---
name: agent-browser
description: Fast headless browser CLI for AI agents. Supports deterministic element selection via accessibility tree snapshots and refs (@e1, @e2).
allowed-tools: [Read, Write, Bash, Grep, Glob]
tags: [browser-automation, headless-browser, ai-agent, playwright, web-scraping]
platforms: [Claude, Gemini, Codex, ChatGPT]
version: 1.0.0
source: vercel-labs/agent-browser
---

# agent-browser - Headless Browser for AI Agents

## When to use this skill

- Web automation and E2E testing
- Scraping data from modern web apps
- Deterministic element interaction using accessibility tree refs
- Isolated browser sessions for different agent tasks

---

## 1. Installation

```bash
# Option 1: add it as an agent skill
npx skills add vercel-labs/agent-browser

# Option 2: install the CLI globally, then run its install step
npm install -g agent-browser
agent-browser install
```

---

## 2. Core Workflow (Deterministic Interaction)

AI agents get the most reliable results from the snapshot + ref workflow:

1. **Navigate**: `agent-browser open <url>`
2. **Snapshot**: `agent-browser snapshot -i` (returns the accessibility tree with refs such as `@e1`, `@e2`; `-i` limits the output to interactive elements)
3. **Interact**: `agent-browser click @e1` or `agent-browser fill @e2 "text"`
4. **Repeat**: take a new snapshot whenever the page changes, and pick refs from that snapshot
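
The four steps above can be wrapped in two small shell functions. This is a minimal sketch, not part of the skill itself: the function names `ab_observe` and `ab_act` and the `AB` override variable are hypothetical, and only the `open`, `snapshot -i`, `click`, and `fill` commands come from this document.

```bash
# AB is a hypothetical override so the functions can be dry-run
# against another command; by default it is the real CLI.
AB="${AB:-agent-browser}"

# Steps 1-2: navigate, then snapshot so the agent can read refs.
ab_observe() {
  "$AB" open "$1" && "$AB" snapshot -i
}

# Steps 3-4: act on a ref, then snapshot again because the page may have changed.
# Usage: ab_act click @e1    or    ab_act fill @e2 "text"
ab_act() {
  local action="$1"
  shift
  case "$action" in
    click|fill) "$AB" "$action" "$@" && "$AB" snapshot -i ;;
    *) echo "unsupported action: $action" >&2; return 2 ;;
  esac
}
```

An agent would call `ab_observe <url>` once, read the refs from the output, and then call `ab_act` for each interaction, always choosing refs from the most recent snapshot.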

---

## 3. Key Commands

| Command | Description |
|---------|-------------|
| `open <url>` | Navigate to a URL |
| `snapshot` | Get accessibility tree with refs |
| `click <sel>` | Click an element (by ref or CSS selector) |
| `fill <sel> <text>` | Clear an input, then fill it with text |
| `screenshot [path]` | Take page screenshot |
| `close` | Quit browser session |
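
As an illustration of how these commands combine, the sketch below opens a page, saves a screenshot, and closes the browser even when an earlier step fails. The function name `ab_capture` and the `AB` override variable are hypothetical; only `open`, `screenshot [path]`, and `close` are taken from the table.

```bash
# AB is a hypothetical override for dry runs; by default it is the real CLI.
AB="${AB:-agent-browser}"

# Open a URL, save a screenshot to the given path, and always close afterwards.
# Returns the exit status of the first failing step, or 0 if both succeeded.
ab_capture() {
  local url="$1" out="$2" rc=0
  "$AB" open "$url" && "$AB" screenshot "$out" || rc=$?
  "$AB" close
  return "$rc"
}
```

Calling `ab_capture example.com shot.png` would leave no browser session behind, which matters when many short tasks run back to back.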

---

## 4. Advanced Features

- **Isolated Sessions**: Use `--session <name>` to isolate cookies/storage.
- **Persistent Profiles**: Use `--profile <path>` to persist login sessions.
- **Semantic Locators**: `find role button click --name "Submit"` locates an element by role and accessible name, then clicks it.
- **JavaScript Execution**: `eval "window.scrollTo(0, 100)"` runs a script in the page.
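
The session and profile flags can be wrapped so that every command for a task carries the same isolation settings. This is a sketch under assumptions: the function names and the `AB` override variable are hypothetical, and placing the flag before the subcommand is assumed rather than stated in this document.

```bash
# AB is a hypothetical override for dry runs; by default it is the real CLI.
AB="${AB:-agent-browser}"

# Run any subcommand inside a named, isolated session.
# Usage: ab_session agent1 open example.com
ab_session() {
  local name="$1"
  shift
  "$AB" --session "$name" "$@"
}

# Run any subcommand against a persistent profile directory.
# Usage: ab_profile ./my-profile open example.com
ab_profile() {
  local dir="$1"
  shift
  "$AB" --profile "$dir" "$@"
}
```

Two agents calling `ab_session agent1 ...` and `ab_session agent2 ...` would then keep separate cookies and storage, while `ab_profile` reuses whatever login state the profile directory already holds.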

---

## Quick Reference

```bash
# Recommended agent workflow
agent-browser open example.com
agent-browser snapshot -i --json
# The agent reads the refs from the JSON output, then acts on one of them
agent-browser click @e2
```
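
When the plain-text snapshot is used instead of `--json`, refs can be pulled out with standard tools. The sketch below assumes each ref appears in the snapshot text as `[ref=eN]`; that line format is an assumption and is not specified in this document, and `extract_refs` is a hypothetical helper name.

```bash
# Read snapshot text on stdin and print each ref on its own line with an "@" prefix.
# ASSUMPTION: refs appear in the text as [ref=e1], [ref=e2], ...
extract_refs() {
  grep -o '\[ref=e[0-9][0-9]*\]' | sed 's/^\[ref=/@/; s/\]$//'
}
```

For example, `agent-browser snapshot -i | extract_refs` would list the refs available for the next `click` or `fill`, if the output matches the assumed format.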

Overview

This skill provides a fast headless browser CLI tailored for AI agents, focusing on deterministic element selection using accessibility tree snapshots and refs (e.g., @e1). It enables isolated browser sessions, persistent profiles, and a concise command set for navigation, interaction, and inspection. The design prioritizes reproducible automation across modern web apps and single-page applications.

How this skill works

Agents operate with a snapshot + ref workflow: open a page, capture an accessibility-tree snapshot that returns stable refs, then invoke interactions (click, fill) by ref or CSS. Sessions can be isolated with named sessions and persistent profiles to retain logins across runs. Additional commands allow semantic queries (role/name), screenshots, and running arbitrary JavaScript for advanced control.

When to use it

  • Automating interactions on modern single-page applications where the DOM structure and element IDs change frequently
  • End-to-end and integration testing driven by AI agents
  • Scraping dynamic content while minimizing brittle CSS selector dependence
  • Running multiple isolated agent tasks that require separate cookies/storage
  • Maintaining login sessions across runs using persistent profiles

Best practices

  • Use snapshot -i to get accessibility-tree refs and prefer refs (e.g., @e1) over raw CSS selectors for determinism
  • Take a fresh snapshot after any DOM-changing interaction before targeting new elements
  • Use --session <name> to isolate tasks and --profile <path> for persistent authentication between runs
  • Prefer semantic locators (role + name) for readable, robust interactions when available
  • Keep interactions idempotent, and handle timeouts with retries when network conditions are flaky
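
The retry advice above can be sketched as a generic shell helper. It is not part of agent-browser: `retry` is a hypothetical function that reruns any command a fixed number of times, so it should only wrap steps that are safe to repeat.

```bash
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Usage: retry 3 2 agent-browser open example.com
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

Navigation and snapshots are natural candidates for `retry`; a `click` that submits a form is usually not, because repeating it may trigger the action twice.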

Example use cases

  • AI-driven E2E test that opens a page, snapshots the accessibility tree, clicks buttons by ref, and validates flow
  • Scraping data from a dashboard SPA where element positions change but accessibility refs remain stable
  • Automating login and multi-step form fills using persistent profiles to avoid repeated auth
  • Concurrent agent workflows where each agent runs in an isolated session to prevent cross-talk
  • Triggering complex UI behaviors by running small scripts with eval, then taking a new snapshot to resume deterministic actions

FAQ

How do refs like @e1 stay deterministic?

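Each ref is assigned from an accessibility-tree snapshot and points at one node in that tree rather than at a volatile DOM path, so an interaction targets exactly the element the snapshot reported, regardless of layout or styling changes. A ref belongs to the snapshot that produced it, so take a new snapshot after the page changes before choosing refs again.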

Can I persist login state between runs?

Yes. Use --profile <path> to store and reuse browser profile data so sessions, cookies, and local storage persist across agent runs.