home / skills / partme-ai / full-stack-skills / agent-browser

agent-browser skill

/skills/agent-browser

This skill automates browser tasks via the agent-browser CLI, enabling AI agents to interact with web pages reliably using refs, semantic locators, and

npx playbooks add skill partme-ai/full-stack-skills --skill agent-browser

Review the files below or copy the command above to add this skill to your agents.

Files (28)
SKILL.md
7.4 KB
---
name: agent-browser
description: A comprehensive skill for using agent-browser, a CLI tool for browser automation designed for AI agents, developed by Vercel Labs. This skill covers installation, core commands, selectors (refs, CSS, XPath, semantic locators), agent mode, sessions, options, and best practices. Use this skill whenever the user needs to automate browser interactions via CLI commands, especially for AI agents that need to interact with web pages.
license: Complete terms in LICENSE.txt
---

## When to use this skill

Use this skill whenever the user wants to:
- Automate browser interactions via CLI commands
- Use browser automation for AI agents
- Navigate websites and interact with pages using command-line tools
- Use refs-based element selection for deterministic automation
- Integrate browser automation into AI agent workflows
- Capture snapshots of web pages with accessibility trees
- Fill forms, click elements, and extract content via CLI
- Use semantic locators for more reliable element selection
- Work with browser automation in agent mode with JSON output
- Manage multiple browser sessions
- Debug browser automation with headed mode
- Use authenticated sessions with custom headers
- Connect to existing browsers via CDP
- Stream browser viewport for live preview

## How to use this skill

This skill is organized to match the agent-browser official documentation structure (https://github.com/vercel-labs/agent-browser/blob/main/README.md). When working with agent-browser:

1. **Install agent-browser**:
   - Load `examples/getting-started/installation.md` for installation instructions

2. **Quick Start**:
   - Load `examples/quick-start/quick-start.md` for basic workflow examples

3. **Learn core commands**:
   - Load `examples/commands/basic-commands.md` for basic commands (open, click, fill, etc.)
   - Load `examples/commands/advanced-commands.md` for advanced commands (snapshot, eval, etc.)
   - Load `examples/commands/get-info/` for information retrieval commands
   - Load `examples/commands/check-state/` for state checking commands
   - Load `examples/commands/find-elements/` for semantic locator commands
   - Load `examples/commands/wait/` for wait commands
   - Load `examples/commands/mouse-control/` for mouse control commands
   - Load `examples/commands/browser-settings/` for browser configuration
   - Load `examples/commands/cookies-storage/` for cookies and storage management
   - Load `examples/commands/network/` for network interception
   - Load `examples/commands/tabs-windows/` for tab and window management
   - Load `examples/commands/frames/` for iframe handling
   - Load `examples/commands/dialogs/` for dialog handling
   - Load `examples/commands/debug/` for debugging commands
   - Load `examples/commands/navigation/` for navigation commands
   - Load `examples/commands/setup/` for setup commands

4. **Understand selectors**:
   - Load `examples/selectors/refs.md` for refs-based selection (@e1, @e2, etc.)
   - Load `examples/selectors/traditional-selectors.md` for CSS, XPath, and semantic locators

5. **Use agent mode**:
   - Load `examples/agent-mode/introduction.md` for agent mode overview
   - Load `examples/agent-mode/optimal-workflow.md` for optimal AI workflow
   - Load `examples/agent-mode/integration.md` for integrating with AI agents

6. **Advanced features**:
   - Load `examples/advanced/sessions.md` for session management
   - Load `examples/advanced/headed-mode.md` for debugging with visible browser
   - Load `examples/advanced/authenticated-sessions.md` for authentication via headers
   - Load `examples/advanced/custom-executable.md` for custom browser executable
   - Load `examples/advanced/cdp-mode.md` for Chrome DevTools Protocol integration
   - Load `examples/advanced/streaming.md` for browser viewport streaming
   - Load `examples/advanced/architecture.md` for architecture overview
   - Load `examples/advanced/platforms.md` for platform support
   - Load `examples/advanced/usage-with-agents.md` for AI agent integration patterns

7. **Configure options**:
   - Load `examples/options/global-options.md` for global CLI options
   - Load `examples/options/snapshot-options.md` for snapshot-specific options
   - Load `examples/options/session-options.md` for session management options

8. **Reference API documentation** when needed:
   - `api/commands.md` - Complete command reference
   - `api/selectors.md` - Selector reference
   - `api/options.md` - Options reference

9. **Use templates** for quick start:
   - `templates/basic-automation.md` - Basic automation workflow
   - `templates/ai-agent-workflow.md` - AI agent workflow template


### Doc mapping (one-to-one with official documentation)

- See examples and API files → https://github.com/vercel-labs/agent-browser

## Examples and Templates

This skill includes detailed examples organized to match the official documentation structure. All examples are in the `examples/` directory (see mapping above).

**To use examples:**
- Identify the topic from the user's request
- Load the appropriate example file from the mapping above
- Follow the instructions, syntax, and best practices in that file
- Adapt the code examples to your specific use case

**To use templates:**
- Reference templates in `templates/` directory for common scaffolding
- Adapt templates to your specific needs and coding style

## API Reference

- **Commands API**: `api/commands.md` - Complete command reference with syntax and examples
- **Selectors API**: `api/selectors.md` - Selector types and usage reference
- **Options API**: `api/options.md` - All options reference

## Best Practices

1. **Use Refs**: Prefer refs (@e1, @e2) over traditional selectors for deterministic automation
2. **Snapshot First**: Always snapshot before interacting with elements to get refs
3. **Agent Mode**: Use `--json` flag for machine-readable output in agent mode
4. **Session Management**: Use `--session` to maintain state across commands
5. **Interactive Snapshot**: Use `-i` flag for interactive snapshot selection
6. **Semantic Locators**: Use semantic locators (role/name) when refs are not available
7. **Error Handling**: Check command exit codes and error messages
8. **Wait for Navigation**: Commands automatically wait for navigation to complete
9. **Headed Mode**: Use `--headed` for debugging, headless for production
10. **CDP Integration**: Use `--cdp` for Chrome DevTools Protocol integration
11. **Streaming**: Use `AGENT_BROWSER_STREAM_PORT` for live browser preview
12. **Authenticated Sessions**: Use `--headers` for authentication without login flows
13. **Custom Executable**: Use `--executable-path` for serverless deployments or custom browsers
14. **Snapshot Options**: Combine `-i`, `-c`, `-d`, `-s` options to optimize snapshot output

## Resources

- **GitHub Repository**: https://github.com/vercel-labs/agent-browser
- **Official README**: https://github.com/vercel-labs/agent-browser/blob/main/README.md
- **Agent Mode Documentation**: https://agent-browser.dev/agent-mode
- **Issues**: https://github.com/vercel-labs/agent-browser/issues

## Keywords

agent-browser, CLI browser automation, AI agents, browser automation CLI, refs, snapshot, agent mode, semantic locators, browser automation tool, command-line browser, AI agent browser, deterministic selectors, accessibility tree, browser commands, web automation CLI, sessions, headed mode, authenticated sessions, CDP mode, streaming, Chrome DevTools Protocol, Playwright, browser automation for AI

Overview

This skill provides a comprehensive guide and examples for using agent-browser, a CLI tool for browser automation tailored to AI agents. It covers installation, core commands, selector strategies, agent mode, session management, advanced options, and practical best practices. Use it to automate interactions, capture snapshots, and integrate browser flows into agent pipelines.

How this skill works

The skill maps directly to agent-browser documentation and example files so you can load targeted examples or API references for each task. It explains core CLI commands (open, click, fill, snapshot, eval, etc.), selector types (refs, CSS, XPath, semantic locators), and advanced features like agent mode with JSON output, CDP integration, and viewport streaming. Examples and templates are organized to be adapted into AI agent workflows or standalone automation scripts.

When to use it

  • Automating web interactions from the command line for scraping, testing, or workflows.
  • Integrating browser control into AI agents that need deterministic page interactions.
  • Capturing snapshots and accessibility trees before manipulating page state.
  • Managing multi-step sessions or authenticated flows without full browser login.
  • Debugging flows with headed mode or streaming a live viewport for review.

Best practices

  • Prefer refs (@e1, @e2) for deterministic element targeting; snapshot first to generate refs.
  • Use --json (agent mode) for machine-readable outputs when integrating with agents.
  • Maintain state across steps with --session for reliable multi-command workflows.
  • Use headed mode (--headed) while debugging and headless for production runs.
  • Combine snapshot options (-i, -c, -d, -s) to optimize selection and output.
  • Use --headers and authenticated sessions to avoid brittle login automation.

Example use cases

  • Run a step-by-step AI agent that navigates a site, extracts structured data, and stores results using refs and snapshots.
  • Create a CLI-driven test harness that opens pages, clicks elements, checks states, and reports pass/fail via exit codes.
  • Automate form submission flows with session persistence across commands and authorization headers.
  • Stream the browser viewport to a preview port during development for live UX inspection.
  • Connect to an existing browser via CDP for advanced debugging or to reuse an authenticated profile.

FAQ

How do I get deterministic element selection?

Take a snapshot first to generate refs and prefer using refs (@e1) over CSS/XPath; use semantic locators when refs are unavailable.

When should I use agent mode (--json)?

Use agent mode when an AI agent or automation pipeline needs structured JSON output to parse results and make decisions programmatically.