home / skills / openclaw / skills / agentic-browser

agentic-browser skill

needs review

This skill automates web navigation and data extraction for AI agents using Playwright, enabling reliable interactions, screenshots, and video records.

npx playbooks add skill openclaw/skills --skill agentic-browser

Review the files below or copy the command above to add this skill to your agents.

Files (11)

SKILL.md

9.7 KB

---
name: agent-browser
description: "Browser automation for AI agents via inference.sh. Navigate web pages, interact with elements using @e refs, take screenshots, record video. Capabilities: web scraping, form filling, clicking, typing, drag-drop, file upload, JavaScript execution. Use for: web automation, data extraction, testing, agent browsing, research. Triggers: browser, web automation, scrape, navigate, click, fill form, screenshot, browse web, playwright, headless browser, web agent, surf internet, record video"
allowed-tools: Bash(infsh *)
---

# Agentic Browser

Browser automation for AI agents via [inference.sh](https://inference.sh). Uses Playwright under the hood with a simple `@e` ref system for element interaction.

![Agentic Browser](https://cloud.inference.sh/app/files/u/4mg21r6ta37mpaz6ktzwtt8krr/01kgjw8atdxgkrsr8a2t5peq7b.jpeg)

## Quick Start

```bash
# Install CLI
curl -fsSL https://cli.inference.sh | sh && infsh login

# Open a page and get interactive elements
infsh app run agent-browser --function open --input '{"url": "https://example.com"}' --session new
```

> **Install note:** The [install script](https://cli.inference.sh) only detects your OS/architecture, downloads the matching binary from `dist.inference.sh`, and verifies its SHA-256 checksum. No elevated permissions or background processes. [Manual install & verification](https://dist.inference.sh/cli/checksums.txt) available.

## Core Workflow

Every browser automation follows this pattern:

1. **Open** - Navigate to URL, get `@e` refs for elements
2. **Interact** - Use refs to click, fill, drag, etc.
3. **Re-snapshot** - After navigation/changes, get fresh refs
4. **Close** - End session (returns video if recording)

```bash
# 1. Start session
RESULT=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/login"
}')
SESSION_ID=$(echo $RESULT | jq -r '.session_id')
# Elements: @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Sign In"

# 2. Fill and submit
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "fill", "ref": "@e1", "text": "[email protected]"
}'
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "fill", "ref": "@e2", "text": "password123"
}'
infsh app run agent-browser --function interact --session $SESSION_ID --input '{
  "action": "click", "ref": "@e3"
}'

# 3. Re-snapshot after navigation
infsh app run agent-browser --function snapshot --session $SESSION_ID --input '{}'

# 4. Close when done
infsh app run agent-browser --function close --session $SESSION_ID --input '{}'
```

## Functions

| Function | Description |
|----------|-------------|
| `open` | Navigate to URL, configure browser (viewport, proxy, video recording) |
| `snapshot` | Re-fetch page state with `@e` refs after DOM changes |
| `interact` | Perform actions using `@e` refs (click, fill, drag, upload, etc.) |
| `screenshot` | Take page screenshot (viewport or full page) |
| `execute` | Run JavaScript code on the page |
| `close` | Close session, returns video if recording was enabled |

## Interact Actions

| Action | Description | Required Fields |
|--------|-------------|-----------------|
| `click` | Click element | `ref` |
| `dblclick` | Double-click element | `ref` |
| `fill` | Clear and type text | `ref`, `text` |
| `type` | Type text (no clear) | `text` |
| `press` | Press key (Enter, Tab, etc.) | `text` |
| `select` | Select dropdown option | `ref`, `text` |
| `hover` | Hover over element | `ref` |
| `check` | Check checkbox | `ref` |
| `uncheck` | Uncheck checkbox | `ref` |
| `drag` | Drag and drop | `ref`, `target_ref` |
| `upload` | Upload file(s) | `ref`, `file_paths` |
| `scroll` | Scroll page | `direction` (up/down/left/right), `scroll_amount` |
| `back` | Go back in history | - |
| `wait` | Wait milliseconds | `wait_ms` |
| `goto` | Navigate to URL | `url` |

## Element Refs

Elements are returned with `@e` refs:

```
@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"
@e4 [select] "Choose option"
@e5 [input type="checkbox"] name="agree"
```

**Important:** Refs are invalidated after navigation. Always re-snapshot after:
- Clicking links/buttons that navigate
- Form submissions
- Dynamic content loading

## Features

### Video Recording

Record browser sessions for debugging or documentation:

```bash
# Start with recording enabled (optionally show cursor indicator)
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true,
  "show_cursor": true
}' | jq -r '.session_id')

# ... perform actions ...

# Close to get the video file
infsh app run agent-browser --function close --session $SESSION --input '{}'
# Returns: {"success": true, "video": <File>}
```

### Cursor Indicator

Show a visible cursor in screenshots and video (useful for demos):

```bash
infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "show_cursor": true,
  "record_video": true
}'
```

The cursor appears as a red dot that follows mouse movements and shows click feedback.

### Proxy Support

Route traffic through a proxy server:

```bash
infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "proxy_url": "http://proxy.example.com:8080",
  "proxy_username": "user",
  "proxy_password": "pass"
}'
```

### File Upload

Upload files to file inputs:

```bash
infsh app run agent-browser --function interact --session $SESSION --input '{
  "action": "upload",
  "ref": "@e5",
  "file_paths": ["/path/to/file.pdf"]
}'
```

### Drag and Drop

Drag elements to targets:

```bash
infsh app run agent-browser --function interact --session $SESSION --input '{
  "action": "drag",
  "ref": "@e1",
  "target_ref": "@e2"
}'
```

### JavaScript Execution

Run custom JavaScript:

```bash
infsh app run agent-browser --function execute --session $SESSION --input '{
  "code": "document.querySelectorAll(\"h2\").length"
}'
# Returns: {"result": "5", "screenshot": <File>}
```

## Deep-Dive Documentation

| Reference | Description |
|-----------|-------------|
| [references/commands.md](references/commands.md) | Full function reference with all options |
| [references/snapshot-refs.md](references/snapshot-refs.md) | Ref lifecycle, invalidation rules, troubleshooting |
| [references/session-management.md](references/session-management.md) | Session persistence, parallel sessions |
| [references/authentication.md](references/authentication.md) | Login flows, OAuth, 2FA handling |
| [references/video-recording.md](references/video-recording.md) | Recording workflows for debugging |
| [references/proxy-support.md](references/proxy-support.md) | Proxy configuration, geo-testing |

## Ready-to-Use Templates

| Template | Description |
|----------|-------------|
| [templates/form-automation.sh](templates/form-automation.sh) | Form filling with validation |
| [templates/authenticated-session.sh](templates/authenticated-session.sh) | Login once, reuse session |
| [templates/capture-workflow.sh](templates/capture-workflow.sh) | Content extraction with screenshots |

## Examples

### Form Submission

```bash
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com/contact"
}' | jq -r '.session_id')

# Get elements: @e1 [input] "Name", @e2 [input] "Email", @e3 [textarea], @e4 [button] "Send"

infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "John Doe"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e2", "text": "[email protected]"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e3", "text": "Hello!"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "click", "ref": "@e4"}'

infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
infsh app run agent-browser --function close --session $SESSION --input '{}'
```

### Search and Extract

```bash
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://google.com"
}' | jq -r '.session_id')

infsh app run agent-browser --function interact --session $SESSION --input '{"action": "fill", "ref": "@e1", "text": "weather today"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "press", "text": "Enter"}'
infsh app run agent-browser --function interact --session $SESSION --input '{"action": "wait", "wait_ms": 2000}'

infsh app run agent-browser --function snapshot --session $SESSION --input '{}'
infsh app run agent-browser --function close --session $SESSION --input '{}'
```

### Screenshot with Video

```bash
SESSION=$(infsh app run agent-browser --function open --session new --input '{
  "url": "https://example.com",
  "record_video": true
}' | jq -r '.session_id')

# Take full page screenshot
infsh app run agent-browser --function screenshot --session $SESSION --input '{
  "full_page": true
}'

# Close and get video
RESULT=$(infsh app run agent-browser --function close --session $SESSION --input '{}')
echo $RESULT | jq '.video'
```

## Sessions

Browser state persists within a session. Always:

1. Start with `--session new` on first call
2. Use returned `session_id` for subsequent calls
3. Close session when done

## Related Skills

```bash
# Web search (for research + browse)
npx skills add inference-sh/skills@web-search

# LLM models (analyze extracted content)
npx skills add inference-sh/skills@llm-models
```

## Documentation

- [inference.sh Sessions](https://inference.sh/docs/extend/sessions) - Session management
- [Multi-function Apps](https://inference.sh/docs/extend/multi-function-apps) - How functions work

Overview

This skill provides browser automation for AI agents using inference.sh and Playwright. It exposes an @e reference system to identify and interact with page elements, plus screenshots and video recording for debugging. Use it to automate navigation, scraping, form flows, and reproducible agent browsing sessions.

How this skill works

Open a URL to start a session and receive @e refs that map to DOM elements. Perform actions (click, fill, drag, upload, execute JS) against those refs, re-snapshot after navigations to refresh refs, and close the session to return artifacts like screenshots or video. Sessions persist browser state so multi-step flows and authenticated interactions work reliably.

When to use it

Automating repetitive browser tasks and end-to-end testing
Extracting structured data or taking reproducible screenshots and videos
Powering agent-driven web browsing and research workflows
Filling and submitting forms, handling file uploads, and complex UI interactions
Running JavaScript on pages for custom scraping or state inspection

Best practices

Always call open with --session new then reuse the returned session_id for subsequent actions
After any navigation or form submit, run snapshot to refresh @e refs before interacting further
Enable record_video and show_cursor for demos or debugging reproducibility issues
Use proxy_url settings for geo-testing or to route traffic through a controlled network
Prefer fill for clearing and typing fields, and type for incremental input to trigger input handlers

Example use cases

Automated login and authenticated scraping with session persistence
Form submission templates: fill fields, validate responses, capture screenshots
Search workflows: enter query, press Enter, wait, snapshot results for extraction
UI testing: drive clicks, drag-and-drop, assert element states, capture video on failure
Content capture: full-page screenshots and recorded sessions for documentation

FAQ

What are @e refs and how long do they last?

@e refs are short handles to DOM elements returned after open or snapshot. They are invalidated after navigation or major DOM changes, so re-snapshot when the page changes.

Can I record a video and get it after the session?

Yes. Enable record_video on open and close the session to receive the video artifact. Optionally enable show_cursor to display a visible cursor and click feedback.