agent-browser skill

/plugins/ltk-devops/skills/agent-browser

This skill automates browser tasks via a CLI, enabling AI agents to navigate, fill forms, capture snapshots, and scrape pages efficiently.

npx playbooks add skill eyadsibai/ltk --skill agent-browser

SKILL.md

---
name: agent-browser
description: Use when automating browser interactions via CLI, filling forms, taking screenshots, scraping pages, or asking about "agent-browser", "browser automation", "headless browser", "web scraping", "form filling", "Vercel browser"
version: 1.0.0
---

# agent-browser: CLI Browser Automation

Vercel's headless browser automation CLI designed for AI agents. Uses ref-based selection (@e1, @e2) from accessibility snapshots.

## Setup

```bash
# Check installation
command -v agent-browser >/dev/null 2>&1 && echo "Installed" || echo "NOT INSTALLED"

# Install if needed
npm install -g agent-browser
agent-browser install  # Downloads Chromium
```

## Core Workflow

The snapshot + ref pattern suits LLM-driven agents because refs are short handles taken directly from the accessibility tree:

1. **Navigate** to URL
2. **Snapshot** to get interactive elements with refs
3. **Interact** using refs (@e1, @e2, etc.)
4. **Re-snapshot** after navigation or DOM changes

```bash
agent-browser open https://example.com
agent-browser snapshot -i          # Get refs
agent-browser click @e1            # Use ref
agent-browser fill @e2 "text"
agent-browser snapshot -i          # Re-snapshot
```

## Key Commands

### Navigation

```bash
agent-browser open <url>       # Navigate to URL
agent-browser back             # Go back
agent-browser forward          # Go forward
agent-browser reload           # Reload page
agent-browser close            # Close browser
```

### Snapshots (Essential for AI)

```bash
agent-browser snapshot              # Full accessibility tree
agent-browser snapshot -i           # Interactive elements only (recommended)
agent-browser snapshot -i --json    # JSON output for parsing
agent-browser snapshot -c           # Compact (remove empty)
agent-browser snapshot -d 3         # Limit depth
```

### Interactions

```bash
agent-browser click @e1                    # Click element
agent-browser dblclick @e1                 # Double-click
agent-browser fill @e1 "text"              # Clear and fill input
agent-browser type @e1 "text"              # Type without clearing
agent-browser press Enter                  # Press key
agent-browser hover @e1                    # Hover element
agent-browser check @e1                    # Check checkbox
agent-browser uncheck @e1                  # Uncheck
agent-browser select @e1 "option"          # Select dropdown
agent-browser scroll down 500              # Scroll
agent-browser scrollintoview @e1           # Scroll element into view
```

### Get Information

```bash
agent-browser get text @e1          # Get element text
agent-browser get html @e1          # Get element HTML
agent-browser get value @e1         # Get input value
agent-browser get attr href @e1     # Get attribute
agent-browser get title             # Get page title
agent-browser get url               # Get current URL
```
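
For lightweight scraping, the `get text` command can be looped over several refs. This is a minimal sketch, assuming `agent-browser get text` prints the element text on stdout; `get_texts` is a hypothetical helper name, not part of the CLI:

```bash
# get_texts REF...: print "ref<TAB>text" for each ref given.
# Hypothetical helper; assumes `get text` writes the element text to stdout.
get_texts() {
  local ref
  for ref in "$@"; do
    printf '%s\t%s\n' "$ref" "$(agent-browser get text "$ref")"
  done
}
```

Calling `get_texts @e1 @e2` after a fresh `snapshot -i` would then give one tab-separated line per element, which is easy to feed into `cut` or `awk`.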

### Screenshots & PDFs

```bash
agent-browser screenshot                      # Viewport screenshot
agent-browser screenshot --full               # Full page
agent-browser screenshot output.png           # Save to file
agent-browser pdf output.pdf                  # Save as PDF
```

### Wait

```bash
agent-browser wait @e1              # Wait for element
agent-browser wait 2000             # Wait milliseconds
agent-browser wait "text"           # Wait for text
```
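
A fixed `wait 2000` can be too short or needlessly long. One alternative is to poll a condition, sketched here with the `get url` and `wait` commands from above. `wait_for_url` is a hypothetical helper, and it assumes `get url` prints the current URL on stdout:

```bash
# wait_for_url SUBSTRING [ATTEMPTS]: poll the current URL until it contains SUBSTRING.
# Hypothetical helper; assumes `agent-browser get url` prints the URL on stdout.
wait_for_url() {
  local want="$1" attempts="${2:-10}" i=0 url
  while [ "$i" -lt "$attempts" ]; do
    url=$(agent-browser get url)
    case $url in
      *"$want"*) return 0 ;;   # URL matches: stop waiting
    esac
    agent-browser wait 500     # pause half a second between polls
    i=$((i + 1))
  done
  return 1                     # no match after ATTEMPTS polls
}
```

For example, `wait_for_url /dashboard 20 || echo "still not on the dashboard"` gives up after roughly ten seconds instead of hanging.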

## Examples

### Login Flow

```bash
agent-browser open https://app.example.com/login
agent-browser snapshot -i
# Output: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Sign in" [ref=e3]
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait 2000
agent-browser snapshot -i  # Verify logged in
```
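
Hard-coding a password in a script makes it easy to leak. A hedged variant of the flow above reads credentials from the environment instead. `APP_EMAIL`, `APP_PASSWORD`, and `login_with_env` are hypothetical names, and the refs assume the snapshot output shown in the example:

```bash
# login_with_env URL: sign in using credentials taken from the environment.
# APP_EMAIL / APP_PASSWORD are hypothetical variable names; the refs assume
# e1 = Email, e2 = Password, e3 = Sign in, as in the example snapshot.
login_with_env() {
  if [ -z "${APP_EMAIL:-}" ] || [ -z "${APP_PASSWORD:-}" ]; then
    echo "set APP_EMAIL and APP_PASSWORD first" >&2
    return 1
  fi
  agent-browser open "$1" &&
    agent-browser snapshot -i &&
    agent-browser fill @e1 "$APP_EMAIL" &&
    agent-browser fill @e2 "$APP_PASSWORD" &&
    agent-browser click @e3
}
```

The `&&` chain stops at the first failing command, so a failed `open` never reaches the `fill` steps. On a real page, check the snapshot before trusting the ref numbers.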

### Form Filling

```bash
agent-browser open https://forms.example.com
agent-browser snapshot -i
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "user@example.com"
agent-browser select @e3 "United States"
agent-browser check @e4  # Agree to terms
agent-browser click @e5  # Submit
agent-browser screenshot confirmation.png
```

### Debug Mode (Visible Browser)

```bash
agent-browser --headed open https://example.com
agent-browser --headed snapshot -i
agent-browser --headed click @e1
```

## Sessions (Parallel Browsers)

```bash
agent-browser --session browser1 open https://site1.com
agent-browser --session browser2 open https://site2.com
agent-browser session list
```
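
Since each session is a separate browser, the opens can run as background jobs. This is a rough sketch, assuming sessions can be driven concurrently from one shell; `open_all` and the `s1`, `s2` session names are made up for illustration:

```bash
# open_all URL...: open each URL in its own session, in parallel.
# Hypothetical helper; session names s1, s2, ... are arbitrary labels.
open_all() {
  local n=0 url
  for url in "$@"; do
    n=$((n + 1))
    agent-browser --session "s$n" open "$url" &   # one background job per session
  done
  wait   # block until every background open has finished
}
```

After `open_all https://site1.com https://site2.com`, each session can be addressed separately, for example `agent-browser --session s2 snapshot -i`. A bare `wait` does not report individual failures, so inspect each session before relying on it.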

## JSON Output

```bash
agent-browser snapshot -i --json
```

Returns:

```json
{
  "success": true,
  "data": {
    "refs": {
      "e1": {"name": "Submit", "role": "button"},
      "e2": {"name": "Email", "role": "textbox"}
    }
  }
}
```
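
To pick a ref out of this structure in a script, a JSON tool such as `jq` is one option. The sketch below assumes `jq` is installed and that the output has exactly the `.data.refs` layout shown above; `ref_by_name` is a hypothetical helper:

```bash
# ref_by_name ROLE NAME: read `snapshot -i --json` output on stdin and
# print the matching ref as @eN. Assumes the .data.refs layout shown above.
ref_by_name() {
  jq -r --arg role "$1" --arg name "$2" '
    .data.refs | to_entries[]
    | select(.value.role == $role and .value.name == $name)
    | "@" + .key
  '
}
```

A possible use is `agent-browser click "$(agent-browser snapshot -i --json | ref_by_name button Submit)"`. The helper prints nothing when no element matches, so check for an empty result before clicking.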

## When to Use vs Alternatives

**Use agent-browser when:**

- You prefer Bash-based workflows
- You need quick one-off automation
- You want simpler CLI commands

**Use Playwright MCP when:**

- You need deep MCP tool integration
- You are building complex automation pipelines
- You want tool-based responses

Overview

This skill provides a CLI-driven headless browser automation tool aimed at AI agents and quick shell workflows. It exposes a snapshot + ref model so you can locate and interact with accessible elements using short refs like @e1, which stay valid until the page navigates or its DOM changes. Use it to navigate pages, fill forms, take screenshots, scrape values, and run parallel sessions from the command line.

How this skill works

The tool captures an accessibility snapshot of the page and assigns short refs (e.g., @e1) to interactive elements. You run commands to open URLs, snapshot the tree, and then click, fill, select, or extract values by referring to those refs. After navigation or DOM changes you re-snapshot to refresh refs; JSON output is available for parsing and orchestration.

When to use it

- Quick one-off browser automations from shell or scripts
- Form filling, login flows, and simple multi-step interactions
- Screen capture or PDF export of pages from CI or local scripts
- Lightweight scraping of text, attributes, and input values
- Parallel sessions when automating multiple targets concurrently

Best practices

- Always run `snapshot -i` to get interactive refs before interacting with elements
- Re-snapshot after navigation or major DOM updates to get current refs
- Prefer `get` commands (`get text`, `get attr`) for scraping instead of brittle selectors
- Use `--headed` mode to debug visually when refs are ambiguous
- Save `snapshot -i --json` output for deterministic parsing in automation pipelines

Example use cases

- Automate login flows: open page, snapshot, fill credentials, click submit, re-snapshot to confirm state
- Form population: fill multiple inputs, select dropdowns, check boxes, then screenshot confirmation
- Web scraping: `snapshot -i --json`, then extract text or attributes programmatically
- Visual testing: take full-page screenshots or PDFs after interactions for regression checks
- Parallel browsing: open multiple sessions to perform simultaneous checks across sites

FAQ

What is a ref and why use it?

A ref (e.g., @e1) is a short identifier assigned to an accessible element in a snapshot. Using refs avoids brittle CSS/XPath selectors and maps directly to the accessibility tree the tool exposes.

How do I handle navigation changes?

After navigation or significant DOM updates, run snapshot -i again to refresh refs. Interactions should reference the latest snapshot to remain reliable.
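
This act-then-refresh habit can be folded into a small wrapper. This is a minimal sketch, where `act` is a hypothetical helper that runs one command and then prints a fresh snapshot:

```bash
# act COMMAND...: run one agent-browser command, then print a fresh
# interactive snapshot so the next step always sees current refs.
# Hypothetical helper; the snapshot is skipped if the command fails.
act() {
  agent-browser "$@" && agent-browser snapshot -i
}
```

With this in place, `act click @e3` clicks and immediately shows the refs that are valid for the next interaction.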

Can I parse snapshots in scripts?

Yes. Use snapshot -i --json to get structured output containing refs and element metadata for programmatic parsing.