browser-automation skill

This skill offers safe, composable browser automation workflows that click, type, and validate state changes using minimal primitives.

npx playbooks add skill different-ai/agent-bank --skill browser-automation

SKILL.md

---
name: browser-automation
description: Reliable, composable browser automation using minimal OpenCode Browser primitives.
license: MIT
compatibility: opencode
metadata:
  audience: agents
  domain: browser
---

## What I do

- Provide a safe, composable workflow for browsing tasks
- Use `browser_query` in `list` mode, then select by `index`, to click reliably
- Confirm state changes after each action

## Best-practice workflow

1. Inspect tabs with `browser_get_tabs`
2. Open new tabs with `browser_open_tab` when needed
3. Navigate with `browser_navigate` if needed
4. Wait for UI using `browser_query` with `timeoutMs`
5. Discover candidates using `browser_query` with `mode=list`
6. Click, type, or select using `index`
7. Confirm using `browser_query` or `browser_snapshot`
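
The steps above can be sketched end to end as follows. This is only an illustration: the functions are synchronous in-memory stand-ins rather than the real OpenCode Browser tools, and the `browser_click` name, the `selector` argument on `browser_query`, and the `{ count, items }` result shape are assumptions made for the example. Step 2 (opening a tab) is left out for brevity.

```typescript
// In-memory stand-ins for the browser tools. Everything here is hypothetical:
// real tool calls are asynchronous and their result shapes may differ.
type ListItem = { index: number; text: string };
type QueryResult = { count: number; items: ListItem[] };
type QueryArgs = { selector: string; mode: "list" | "exists"; timeoutMs?: number };

const firstTab = { id: 1, url: "about:blank" };
const fakeTabs = [firstTab];
const fakeDom: Record<string, string[]> = {};

function browser_get_tabs(): { id: number; url: string }[] {
  return fakeTabs;
}

function browser_navigate(args: { url: string }): void {
  firstTab.url = args.url;
  // Pretend the page renders two buttons and an empty status area.
  fakeDom["button"] = ["Cancel", "Save"];
  fakeDom[".status"] = [];
}

function browser_query(args: QueryArgs): QueryResult {
  // The fake answers immediately, so timeoutMs is accepted but not used.
  const texts = fakeDom[args.selector] ?? [];
  const items = texts.map((text, index) => ({ index, text }));
  return { count: items.length, items: args.mode === "list" ? items : [] };
}

function browser_click(args: { selector: string; index: number }): void {
  const text = (fakeDom[args.selector] ?? [])[args.index];
  if (text === "Save") fakeDom[".status"] = ["Saved"]; // simulated side effect
}

function clickAndConfirm(selector: string, wanted: string, confirmSelector: string): boolean {
  // Step 4: wait for the UI.
  if (browser_query({ selector, mode: "exists", timeoutMs: 5000 }).count === 0) return false;
  // Step 5: discover candidates.
  const candidates = browser_query({ selector, mode: "list" }).items;
  const match = candidates.filter((item) => item.text === wanted)[0];
  if (match === undefined) return false;
  // Step 6: act by index.
  browser_click({ selector, index: match.index });
  // Step 7: confirm the state change.
  return browser_query({ selector: confirmSelector, mode: "exists" }).count > 0;
}

// Steps 1 and 3: inspect tabs, then navigate the existing tab.
const tabCount = browser_get_tabs().length;
browser_navigate({ url: "https://example.com" });
const saved = clickAndConfirm("button", "Save", ".status");
const missing = clickAndConfirm("button", "Submit", ".status");
```

Here `saved` is true because the simulated click produces a status element, while `missing` is false because no listed candidate has the text "Submit".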

## Selecting options

- Use `browser_select` for native `<select>` elements
- Prefer `value` or `label`; use `optionIndex` when needed
- Example: `browser_select({ selector: "select", value: "plugin" })`
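
One plausible reason to prefer `value` or `label` is that positions shift when a menu changes. The sketch below illustrates this with a hypothetical `resolveOption` helper over plain data; it is not how `browser_select` is implemented, and the `SelectArgs` union is an assumption built from the parameter names above.

```typescript
type SelectOption = { value: string; label: string };

// Argument shapes mirror the parameters named above; the union itself is an
// assumption made for this sketch.
type SelectArgs =
  | { selector: string; value: string }
  | { selector: string; label: string }
  | { selector: string; optionIndex: number };

// Hypothetical resolver showing which option each kind of argument picks.
function resolveOption(options: SelectOption[], args: SelectArgs): SelectOption | undefined {
  if ("optionIndex" in args) return options[args.optionIndex];
  for (const option of options) {
    if ("value" in args && option.value === args.value) return option;
    if ("label" in args && option.label === args.label) return option;
  }
  return undefined;
}

const originalMenu: SelectOption[] = [
  { value: "skill", label: "Skill" },
  { value: "plugin", label: "Plugin" },
];
// The same menu after a placeholder entry is added at the top.
const updatedMenu: SelectOption[] = [{ value: "", label: "Choose one" }, ...originalMenu];

const byValue = resolveOption(updatedMenu, { selector: "select", value: "plugin" });
const byLabel = resolveOption(updatedMenu, { selector: "select", label: "Plugin" });
const byIndex = resolveOption(updatedMenu, { selector: "select", optionIndex: 1 });
// byValue and byLabel still resolve to the "plugin" option; byIndex now
// resolves to "skill" because every position shifted by one.
```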

## Query modes

- `text`: read visible text from a matched element
- `value`: read input values
- `list`: list many matches with text/metadata
- `exists`: check presence and count
- `page_text`: extract visible page text
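
To make the difference between the modes concrete, here is a small emulation over plain data. The result shapes, the choice to read from the first match in `text` and `value` modes, and the `fakeQuery` helper are all assumptions for illustration; the real `browser_query` output may look different.

```typescript
type QueryMode = "text" | "value" | "list" | "exists" | "page_text";
type FakeElement = { text: string; value: string };
type FakeResult = string | string[] | { exists: boolean; count: number };

// Hypothetical emulation: `matches` are the elements a selector matched and
// `pageText` is the visible text of the whole page.
function fakeQuery(mode: QueryMode, matches: FakeElement[], pageText: string): FakeResult {
  const first = matches.length > 0 ? matches[0] : undefined;
  if (mode === "text") return first ? first.text : "";
  if (mode === "value") return first ? first.value : "";
  if (mode === "list") return matches.map((m, i) => i + ": " + m.text);
  if (mode === "exists") return { exists: matches.length > 0, count: matches.length };
  return pageText; // mode === "page_text"
}

const sampleInputs: FakeElement[] = [
  { text: "Email", value: "a@example.com" },
  { text: "Name", value: "Ada" },
];
const visibleText = "Sign up Email Name";

const textResult = fakeQuery("text", sampleInputs, visibleText);
const valueResult = fakeQuery("value", sampleInputs, visibleText);
const listResult = fakeQuery("list", sampleInputs, visibleText);
const existsResult = fakeQuery("exists", sampleInputs, visibleText);
const pageResult = fakeQuery("page_text", sampleInputs, visibleText);
```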

## Opening tabs

- Use `browser_open_tab` to create a new tab, optionally with `url` and `active`
- Example: `browser_open_tab({ url: "https://example.com", active: false })`
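
A rough model of what `active: false` means for tab state, using an in-memory tab list. The `FakeTab` shape and return value are hypothetical, and the fake's handling of an omitted `active` or `url` is an assumption of this sketch, not documented behavior.

```typescript
type FakeTab = { id: number; url: string; active: boolean };

const openTabs: FakeTab[] = [{ id: 1, url: "https://example.com/home", active: true }];

// Hypothetical stand-in for browser_open_tab. Defaulting `active` to true and
// `url` to "about:blank" are assumptions made by this fake only.
function browser_open_tab(args: { url?: string; active?: boolean }): FakeTab {
  const active = args.active === undefined ? true : args.active;
  if (active) {
    // Only one tab is active at a time in this model.
    for (const existing of openTabs) existing.active = false;
  }
  const created: FakeTab = {
    id: openTabs.length + 1,
    url: args.url === undefined ? "about:blank" : args.url,
    active,
  };
  openTabs.push(created);
  return created;
}

function browser_get_tabs(): FakeTab[] {
  return openTabs;
}

// Open a background tab, then check that the original tab is still active.
const backgroundTab = browser_open_tab({ url: "https://example.com/reports", active: false });
const activeUrls = browser_get_tabs()
  .filter((t) => t.active)
  .map((t) => t.url);
```

After this runs, `activeUrls` contains only the original tab's URL, which is the property a background fetch relies on.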

## Troubleshooting

- If a selector fails, run `browser_query` with `mode=page_text` to confirm the content exists
- Use `mode=list` on broad selectors (`button`, `a`, `*[role="button"]`) and choose by index
- Confirm results after each action
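
The troubleshooting order above can be expressed as a small decision function. This sketch works on plain data standing in for `mode=page_text` and `mode=list` results; the `diagnose` helper, the `Diagnosis` type, and the input shapes are hypothetical.

```typescript
type Diagnosis =
  | { kind: "content_missing" }
  | { kind: "click_by_index"; selector: string; index: number }
  | { kind: "no_candidate" };

const BROAD_SELECTORS = ["button", "a", '*[role="button"]'];

// `pageText` stands for a mode=page_text result and `listed` for mode=list
// results keyed by selector; both shapes are assumptions for this sketch.
function diagnose(wanted: string, pageText: string, listed: Record<string, string[]>): Diagnosis {
  // First confirm the content is on the page at all.
  if (pageText.indexOf(wanted) === -1) return { kind: "content_missing" };
  // Then scan the broad selectors and report the first matching index.
  for (const selector of BROAD_SELECTORS) {
    const texts = listed[selector] ?? [];
    const index = texts.map((t) => t.indexOf(wanted) !== -1).indexOf(true);
    if (index !== -1) return { kind: "click_by_index", selector, index };
  }
  return { kind: "no_candidate" };
}

const samplePageText = "Home Export CSV Cancel";
const sampleLists: Record<string, string[]> = {
  button: ["Cancel"],
  a: ["Home", "Export CSV"],
};

const foundExport = diagnose("Export", samplePageText, sampleLists);
const absentDelete = diagnose("Delete", samplePageText, sampleLists);
const unlistedExport = diagnose("Export", samplePageText, { button: ["Cancel"] });
```

`foundExport` points at index 1 of the `a` list, `absentDelete` reports that the text is not on the page, and `unlistedExport` shows the case where the text is visible but none of the broad selectors lists it.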

Overview

This skill provides reliable, composable browser automation built on minimal OpenCode Browser primitives. It focuses on safe navigation, deterministic element selection, and explicit state confirmation after each action. The workflow is CLI-first and designed for automation tasks in finance and other data-sensitive domains.

How this skill works

It inspects browser state (tabs, page text, element lists) and performs actions using a small set of primitives: open tabs, navigate, query, click/type/select, and snapshot. Queries support multiple modes (text, value, list, exists, page_text) so you can discover candidates, pick by index, and confirm state changes. Each action is followed by verification to ensure reliability.
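
The "act, then verify" discipline described above can be sketched as a step runner that stops at the first failed check. The `Step` type and `runSteps` helper are hypothetical, and the form object is a stand-in for page state; in practice `act` would issue a click, type, or select call and `verify` would inspect a `browser_query` or `browser_snapshot` result.

```typescript
type Step = { label: string; act: () => void; verify: () => boolean };
type RunReport = { completed: string[]; failedAt: string | null };

// Run each step, verify it, and stop as soon as a verification fails.
function runSteps(steps: Step[]): RunReport {
  const completed: string[] = [];
  for (const step of steps) {
    step.act();
    if (!step.verify()) return { completed, failedAt: step.label };
    completed.push(step.label);
  }
  return { completed, failedAt: null };
}

// Fake form state standing in for the page.
const fakeForm = { amount: "", submitted: false };

const runReport = runSteps([
  {
    label: "type amount",
    act: () => { fakeForm.amount = "125.00"; },
    verify: () => fakeForm.amount === "125.00",
  },
  // This click is simulated as having no effect, so verification fails.
  { label: "click submit", act: () => {}, verify: () => fakeForm.submitted },
  // Never reached, because the runner stops at the failed step.
  { label: "read receipt", act: () => {}, verify: () => true },
]);
```

Stopping at the first failed verification keeps later actions from running against a page that is not in the expected state.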

When to use it

- Automating multi-step web workflows that require robust element selection and confirmation
- Scraping or extracting visible page text while preserving precise navigation state
- Interacting with complex or dynamic UIs where selectors alone are unreliable
- Opening background tabs for parallel fetches or multi-account workflows
- Building CLI-first agents that must perform repeatable browser tasks in finance

Best practices

- Start by listing tabs with `browser_get_tabs` and open new ones with `browser_open_tab` when needed
- Use `browser_query` with `timeoutMs` to wait for UI readiness before interacting
- Discover candidates with `browser_query` in `mode=list` and choose elements by `index` for deterministic clicks
- After each click, type, or select, confirm the result with `browser_query` or `browser_snapshot`
- Use `browser_select` for native selects and prefer `value` or `label`; fall back to `optionIndex` only when necessary

Example use cases

- Log into banking dashboards across multiple accounts using separate tabs and confirmed state checks
- Fill multi-page forms: discover buttons in `list` mode, click by `index`, then verify field values
- Extract transaction tables using `page_text` for downstream processing or reconciliation
- Run automated tests that require deterministic interactions with dynamic web controls
- Collect data from third-party portals while keeping navigation and tab state explicit

FAQ

What query mode should I use to find buttons reliably?

Use `browser_query` with `mode=list` on broad selectors (`button`, `a`, `*[role="button"]`), pick the correct element by `index`, then confirm the result.

How do I handle failing selectors or unexpected content?

Run `browser_query` with `mode=page_text` to verify visible content, widen your selector, use `mode=list` to inspect candidates, and confirm each action before continuing.