
This skill enables AI agents to automate web tasks with AI-optimized browser snapshots, reducing context and enabling precise interactions.

npx playbooks add skill ruvnet/ruflo --skill browser

---
name: browser
description: Web browser automation with AI-optimized snapshots for claude-flow agents
version: 1.0.0
triggers:
  - /browser
  - browse
  - web automation
  - scrape
  - navigate
  - screenshot
tools:
  - browser/open
  - browser/snapshot
  - browser/click
  - browser/fill
  - browser/screenshot
  - browser/close
---

# Browser Automation Skill

Web browser automation using agent-browser with AI-optimized snapshots. Snapshots identify elements by short refs (@e1, @e2) instead of sending the full DOM, which reduces context by a stated 93%.

## Core Workflow

```bash
# 1. Navigate to page
agent-browser open <url>

# 2. Get accessibility tree with element refs
agent-browser snapshot -i    # -i = interactive elements only

# 3. Interact using refs from snapshot
agent-browser click @e2
agent-browser fill @e3 "text"

# 4. Re-snapshot after page changes
agent-browser snapshot -i
```

## Quick Reference

### Navigation
| Command | Description |
|---------|-------------|
| `open <url>` | Navigate to URL |
| `back` | Go back |
| `forward` | Go forward |
| `reload` | Reload page |
| `close` | Close browser |

### Snapshots (AI-Optimized)
| Command | Description |
|---------|-------------|
| `snapshot` | Full accessibility tree |
| `snapshot -i` | Interactive elements only (buttons, links, inputs) |
| `snapshot -c` | Compact (remove empty elements) |
| `snapshot -d 3` | Limit depth to 3 levels |
| `screenshot [path]` | Capture screenshot (base64 if no path) |

### Interaction
| Command | Description |
|---------|-------------|
| `click <sel>` | Click element |
| `fill <sel> <text>` | Clear and fill input |
| `type <sel> <text>` | Type with key events |
| `press <key>` | Press key (Enter, Tab, etc.) |
| `hover <sel>` | Hover element |
| `select <sel> <val>` | Select dropdown option |
| `check <sel>` / `uncheck <sel>` | Check or uncheck a checkbox |
| `scroll <dir> [px]` | Scroll page |

### Get Info
| Command | Description |
|---------|-------------|
| `get text <sel>` | Get text content |
| `get html <sel>` | Get innerHTML |
| `get value <sel>` | Get input value |
| `get attr <sel> <attr>` | Get attribute |
| `get title` | Get page title |
| `get url` | Get current URL |

### Wait
| Command | Description |
|---------|-------------|
| `wait <selector>` | Wait for element |
| `wait <ms>` | Wait milliseconds |
| `wait --text "text"` | Wait for text |
| `wait --url "pattern"` | Wait for URL |
| `wait --load networkidle` | Wait for load state |

### Sessions
| Command | Description |
|---------|-------------|
| `--session <name>` | Use isolated session |
| `session list` | List active sessions |

## Selectors

### Element Refs (Recommended)
```bash
# Get refs from snapshot
agent-browser snapshot -i
# Output: button "Submit" [ref=e2]

# Use ref to interact
agent-browser click @e2
```
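
To pick a ref out of snapshot output in a script, a small text filter may be enough. The sketch below assumes each snapshot line follows the form shown above (`role "name" [ref=eN]`); the function name `ref_for` is hypothetical and not part of agent-browser.

```bash
# Hypothetical helper: print the @ref of the first snapshot line that matches
# a role and accessible name. Assumes lines look like: button "Submit" [ref=e2]
ref_for() {
  role=$1
  name=$2
  # Read snapshot text on stdin, keep lines containing: role "name",
  # then rewrite [ref=eN] as @eN and print only the first match.
  grep -F -- "$role \"$name\"" \
    | sed -n 's/.*\[ref=\(e[0-9][0-9]*\)\].*/@\1/p' \
    | head -n 1
}
```

With the sample output line above, `agent-browser snapshot -i | ref_for button Submit` would print `@e2`. If no line matches, the function prints nothing, so callers should check for an empty result before clicking.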

### CSS Selectors
```bash
agent-browser click "#submit"
agent-browser fill ".email-input" "user@example.com"
```

### Semantic Locators
```bash
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "user@example.com"
agent-browser find testid "login-btn" click
```

## Examples

### Login Flow
```bash
agent-browser open https://example.com/login
agent-browser snapshot -i
agent-browser fill @e2 "user@example.com"
agent-browser fill @e3 "password123"
agent-browser click @e4
agent-browser wait --url "**/dashboard"
```
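
The flow above runs every command even if an earlier one fails. A minimal fail-fast sketch follows. It assumes agent-browser exits non-zero when a step fails, which this document does not confirm. The names `ab`, `login_flow`, and the `AGENT_BROWSER` override are hypothetical.

```bash
# Hypothetical wrapper around the CLI. AGENT_BROWSER is an illustrative
# override so the flow can be dry-run, e.g. AGENT_BROWSER=echo.
ab() {
  "${AGENT_BROWSER:-agent-browser}" "$@"
}

# Run the login steps in order and stop at the first failing step.
# Assumption: agent-browser returns a non-zero exit status on failure.
# The refs @e2, @e3, @e4 are copied from the example above; in practice
# they should be read from the snapshot output of the actual page.
login_flow() {
  url=$1 email=$2 password=$3
  ab open "$url" &&
  ab snapshot -i &&
  ab fill @e2 "$email" &&
  ab fill @e3 "$password" &&
  ab click @e4 &&
  ab wait --url "**/dashboard"
}
```

Setting `AGENT_BROWSER=echo` before calling `login_flow` prints the six commands instead of driving a browser, which is a cheap way to review a flow before running it.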

### Form Submission
```bash
agent-browser open https://example.com/contact
agent-browser snapshot -i
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "user@example.com"
agent-browser fill @e3 "Hello, this is my message"
agent-browser click @e4
agent-browser wait --text "Thank you"
```

### Data Extraction
```bash
agent-browser open https://example.com/products
agent-browser snapshot -i
# Iterate through product refs
agent-browser get text @e1  # Product name
agent-browser get text @e2  # Price
agent-browser get attr @e3 href  # Link
```
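
The "iterate through product refs" step can be written as a loop. The sketch below uses only the `get text` command shown above; the function name `extract_texts` and the `AGENT_BROWSER` override are hypothetical.

```bash
# Hypothetical helper: print "ref<TAB>text" for each ref passed as an argument.
# AGENT_BROWSER is an illustrative override for dry runs (e.g. AGENT_BROWSER=echo).
extract_texts() {
  for ref in "$@"; do
    # Stop on the first failed lookup rather than emitting partial rows.
    text=$("${AGENT_BROWSER:-agent-browser}" get text "$ref") || return 1
    printf '%s\t%s\n' "$ref" "$text"
  done
}
```

A call such as `extract_texts @e1 @e2` yields one tab-separated row per ref, which is easy to pipe into other tools. The refs themselves still have to come from a fresh snapshot of the page.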

### Multi-Session (Swarm)
```bash
# Session 1: Navigator
agent-browser --session nav open https://example.com
agent-browser --session nav state save auth.json

# Session 2: Scraper (uses same auth)
agent-browser --session scrape state load auth.json
agent-browser --session scrape open https://example.com/data
agent-browser --session scrape snapshot -i
```
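
Since each session is isolated, the scraper step can be fanned out over several URLs. The following is a sketch only. It reuses the `--session`, `state load`, `open`, and `snapshot -i` commands from the example above. The function name `scrape_all`, the `scrape-N` session names, and the `AGENT_BROWSER` override are hypothetical.

```bash
# Hypothetical helper: snapshot several URLs in parallel, one session per URL.
# Usage: scrape_all <auth-file> <output-dir> <url>...
# AGENT_BROWSER is an illustrative override for dry runs (e.g. AGENT_BROWSER=echo).
scrape_all() {
  auth=$1 outdir=$2
  shift 2
  i=0
  for url in "$@"; do
    i=$((i + 1))
    session="scrape-$i"
    (
      ab_cmd=${AGENT_BROWSER:-agent-browser}
      # Load the shared auth state, open the page, then capture the snapshot.
      "$ab_cmd" --session "$session" state load "$auth" &&
      "$ab_cmd" --session "$session" open "$url" &&
      "$ab_cmd" --session "$session" snapshot -i
    ) > "$outdir/$session.txt" 2>&1 &
  done
  wait   # shell builtin: block until every background session has finished
}
```

Each session writes to its own file, so output from parallel runs does not interleave. Whether one saved state file can safely be loaded by many sessions at once is an assumption worth checking against the agent-browser documentation.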

## Integration with Claude Flow

### MCP Tools
All browser operations are available as MCP tools with `browser/` prefix:
- `browser/open`
- `browser/snapshot`
- `browser/click`
- `browser/fill`
- `browser/screenshot`
- etc.

### Memory Integration
```bash
# Store successful patterns
npx @claude-flow/cli memory store --namespace browser-patterns --key "login-flow" --value "snapshot->fill->click->wait"

# Retrieve before similar task
npx @claude-flow/cli memory search --query "login automation"
```

### Hooks
```bash
# Pre-browse hook (get context)
npx @claude-flow/cli hooks pre-edit --file "browser-task.ts"

# Post-browse hook (record success)
npx @claude-flow/cli hooks post-task --task-id "browse-1" --success true
```

## Tips

1. **Always use snapshots** - They're optimized for AI with refs
2. **Prefer `-i` flag** - Gets only interactive elements, smaller output
3. **Use refs, not selectors** - More reliable, deterministic
4. **Re-snapshot after navigation** - Page state changes
5. **Use sessions for parallel work** - Each session is isolated

Overview

This skill provides web browser automation with AI-optimized snapshots and element refs for Claude Flow agents. It lets agents navigate pages, capture compact accessibility trees, interact reliably using refs, and integrate with multi-agent workflows and memory stores. Because agents work from refs rather than the full DOM, context size drops sharply (by 93% according to the skill's own description), which suits coordinated, autonomous browsing tasks.

How this skill works

The skill captures an accessibility-focused snapshot of the page and assigns short element refs (e.g., @e2) so agents can reference UI elements without sending the full DOM. Agents issue high-level commands (open, click, fill, snapshot, screenshot, wait) against those refs or standard selectors. Snapshots can be trimmed (interactive elements only, compact mode, depth limits), and the browser commands are exposed as MCP tools so Claude Flow agents can orchestrate distributed sessions and reuse stored patterns.

When to use it

  • Automating login, form submission, and routine web tasks with minimal context cost
  • Scraping structured data while minimizing payload by using interactive snapshots
  • Coordinating multi-agent workflows that require isolated sessions or shared auth
  • Integrating browser actions into Claude Flow pipelines and memory-based reuse
  • Testing UI flows or reproducing user journeys deterministically

Best practices

  • Always take a snapshot before interacting and re-snapshot after navigation or DOM changes
  • Prefer interactive snapshots (-i) to reduce output and focus on actionable elements
  • Use element refs (@eN) returned by snapshots rather than brittle CSS selectors
  • Manage parallel work with named sessions to isolate cookies and state
  • Store successful flow patterns in memory to reuse proven sequences across tasks

Example use cases

  • Login automation: open login page, snapshot -i, fill email/password via refs, click submit, wait for dashboard URL
  • Form automation: open contact page, snapshot -i, fill fields by refs, click submit, wait for confirmation text
  • Data extraction: open listings page, snapshot -i, iterate product refs to get name, price, and link attributes
  • Multi-session swarm: one session performs navigation/auth, another loads saved auth state and scrapes protected pages
  • Integration with Claude Flow: expose browser/ tools to agents for coordinated browsing and hook-based logging

FAQ

How do element refs reduce context?

Refs map important accessibility nodes to short identifiers, so agents reference elements by ref (for example @e2) instead of shipping the full DOM, which dramatically shrinks payloads.
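
To check the reduction on a particular page, the size of the full markup can be compared with the size of a snapshot. The sketch below shows only the arithmetic. The function name `reduction_pct` is hypothetical, and the byte counts in the usage note are illustrative, not measurements.

```bash
# Hypothetical helper: percentage by which a snapshot is smaller than the
# full markup. Arguments are byte counts; integer division rounds down.
reduction_pct() {
  full=$1 snap=$2
  [ "$full" -gt 0 ] || { echo "full size must be positive" >&2; return 1; }
  echo $(( (full - snap) * 100 / full ))
}
```

For example, `reduction_pct 1000 70` prints `93`. Real byte counts could be gathered by piping `agent-browser get html "body"` and `agent-browser snapshot -i` through `wc -c`, assuming `body` is accepted as a CSS selector.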

When should I use CSS selectors vs refs?

Use refs by default for reliability. Use CSS selectors for ad-hoc or external tooling when refs are unavailable.