
agent-browser skill


This skill automates browser tasks such as opening pages, clicking elements, filling forms, and taking screenshots to streamline web interactions.

npx playbooks add skill chachamaru127/claude-code-harness --skill agent-browser

Review the files below or copy the command above to add this skill to your agents.


SKILL.md


---
name: agent-browser
description: "Control the browser as if it were your own hands and feet. Page navigation, form input, screenshots: bring it on. Use when users ask to navigate websites, fill forms, take screenshots, extract web data, test web apps, or automate browser workflows. Trigger phrases include 'go to [url]', 'click on', 'fill out the form', 'take a screenshot', 'scrape', 'automate', 'test the website', 'log into', or any browser interaction request. Do NOT load for: sharing URLs, embedding links, screenshot image files."
description-en: "Control the browser as if it were your own hands and feet. Page navigation, form input, screenshots: bring it on. Use when users ask to navigate websites, fill forms, take screenshots, extract web data, test web apps, or automate browser workflows. Trigger phrases include 'go to [url]', 'click on', 'fill out the form', 'take a screenshot', 'scrape', 'automate', 'test the website', 'log into', or any browser interaction request. Do NOT load for: sharing URLs, embedding links, screenshot image files."
description-ja: "ブラウザを手足のように操る。ページ遷移、フォーム入力、スクショ、なんでもこい。Use when users ask to navigate websites, fill forms, take screenshots, extract web data, test web apps, or automate browser workflows. Trigger phrases include 'go to [url]', 'click on', 'fill out the form', 'take a screenshot', 'scrape', 'automate', 'test the website', 'log into', or any browser interaction request. Do NOT load for: sharing URLs, embedding links, screenshot image files."
allowed-tools: ["Bash", "Read"]
user-invocable: false
context: fork
argument-hint: "[url] [--headless]"
---

# Agent Browser Skill

A skill for browser automation. It uses the agent-browser CLI to perform UI debugging, verification, and automated interaction.

---

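## Trigger Phrases

This skill is triggered automatically by the following phrases:

- 「ページを開いて」 ("open the page"), 「URLを確認して」 ("check the URL")
- 「クリックして」 ("click"), 「入力して」 ("enter text"), 「フォームに」 ("in the form")
- 「スクリーンショットを撮って」 ("take a screenshot")
- 「UIを確認して」 ("check the UI"), 「画面をテストして」 ("test the screen")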
- "open this page", "click on", "fill the form", "screenshot"

---

## Feature Details

| Feature | Details |
|------|------|
| **Browser automation** | See [references/browser-automation.md](references/browser-automation.md) |
| **AI snapshot workflow** | See [references/ai-snapshot-workflow.md](references/ai-snapshot-workflow.md) |

## Execution Steps

### Step 0: Verify agent-browser

```bash
# Check whether it is installed
which agent-browser

# If it is not installed
npm install -g agent-browser
agent-browser install
```
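
The check-then-install sequence above can be wrapped so that installation runs only when the CLI is actually missing. The following is a minimal sketch of that idea; `ensure_on_path` is a hypothetical helper name, not something agent-browser provides, and it only reports whether a command is on `PATH`.

```bash
# Hypothetical helper (not part of agent-browser): succeed if the named
# command is on PATH, otherwise print a message to stderr and fail.
ensure_on_path() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "not installed: $1" >&2
  return 1
}

# Usage (commented out so that sourcing this file installs nothing):
# ensure_on_path agent-browser || {
#   npm install -g agent-browser && agent-browser install
# }
```

`command -v` is used instead of `which` because it is a shell built-in with a well-defined exit status.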

### Step 1: Classify the user's request

| Request type | Action |
|----------------|---------------|
| Open a URL | `agent-browser open <url>` |
| Click an element | Snapshot → `agent-browser click @ref` |
| Fill in a form | Snapshot → `agent-browser fill @ref "text"` |
| Check state | `agent-browser snapshot -i -c` |
| Take a screenshot | `agent-browser screenshot <path>` |
| Debug | `agent-browser --headed open <url>` |
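
The mapping in this table can be sketched as a small dispatch function. `plan_command` and its request-type keywords (`open`, `click`, `fill`, `status`, `screenshot`, `debug`) are hypothetical names chosen for illustration; the function only prints the command line from the table and does not run it.

```bash
# Hypothetical helper: print the agent-browser command for a request type.
# The output is meant for display or logging, not for eval.
plan_command() {
  case "$1" in
    open)       echo "agent-browser open $2" ;;
    click)      echo "agent-browser click $2" ;;
    fill)       printf 'agent-browser fill %s "%s"\n' "$2" "$3" ;;
    status)     echo "agent-browser snapshot -i -c" ;;
    screenshot) echo "agent-browser screenshot $2" ;;
    debug)      echo "agent-browser --headed open $2" ;;
    *)          echo "unknown request type: $1" >&2; return 1 ;;
  esac
}
```

For click and fill requests, the table still applies: take a snapshot first so that the `@ref` passed in is current.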

### Step 2: AI snapshot workflow (recommended)

For most operations, **take a snapshot first**, then act on elements by reference:

```bash
# 1. Open the page
agent-browser open https://example.com

# 2. Take a snapshot (AI-oriented, interactive elements only)
agent-browser snapshot -i -c

# Example output:
# - link "Home" [ref=e1]
# - button "Login" [ref=e2]
# - input "Email" [ref=e3]
# - input "Password" [ref=e4]
# - button "Submit" [ref=e5]

# 3. Act on elements by reference
agent-browser click @e2           # Click the Login button
agent-browser fill @e3 "[email protected]"
agent-browser fill @e4 "password123"
agent-browser click @e5           # Submit
```
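
In a script, the ref can be looked up from the snapshot text rather than read by eye. The sketch below assumes the snapshot lines look exactly like the example output above (`- role "name" [ref=eN]`); the real output may differ in detail. `find_ref` is a hypothetical helper, not an agent-browser command.

```bash
# Hypothetical helper: read snapshot text on stdin and print the first ref
# (as @eN) for the given role and accessible name.
# Assumes lines shaped like:  - button "Login" [ref=e2]
# The role and name are inserted into a sed pattern unescaped, so names
# containing "/" or regex metacharacters would need escaping first.
find_ref() {
  sed -n "s/^[[:space:]]*- $1 \"$2\" \[ref=\([^]]*\)].*/@\1/p" | head -n 1
}

# Usage (commented out; requires a running agent-browser session):
# ref=$(agent-browser snapshot -i -c | find_ref button Login)
# [ -n "$ref" ] && agent-browser click "$ref"
```

Looking the ref up on every run avoids hard-coding values such as `@e2`, which may change when the page changes.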

### Step 3: Verify the result

```bash
# Check the current state with a snapshot
agent-browser snapshot -i -c

# Or check the URL
agent-browser get url

# Take a screenshot
agent-browser screenshot result.png
```
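
For scripted runs, the URL check can be turned into an assertion. This is a minimal sketch: `url_starts_with` is a hypothetical helper, and the dashboard URL in the usage comment is an invented example, not something taken from this skill.

```bash
# Hypothetical helper: succeed if the actual URL ($1) begins with the
# expected prefix ($2); otherwise report the mismatch on stderr and fail.
url_starts_with() {
  case "$1" in
    "$2"*) return 0 ;;
    *)     echo "unexpected URL: $1 (wanted prefix: $2)" >&2; return 1 ;;
  esac
}

# Usage (commented out; requires a running agent-browser session):
# url_starts_with "$(agent-browser get url)" "https://example.com/dashboard" \
#   || agent-browser screenshot failure.png
```

A prefix match is used here so that query strings or trailing path segments do not cause false failures; an exact comparison may be more appropriate for stricter checks.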

---

## Quick Reference

### Basic operations

| Command | Description |
|---------|------|
| `open <url>` | Open a URL |
| `snapshot -i -c` | AI-oriented snapshot |
| `click @e1` | Click an element |
| `fill @e1 "text"` | Fill in a form field |
| `type @e1 "text"` | Type text |
| `press Enter` | Press a key |
| `screenshot [path]` | Take a screenshot |
| `close` | Close the browser |

### Navigation

| Command | Description |
|---------|------|
| `back` | Go back |
| `forward` | Go forward |
| `reload` | Reload |

### Getting information

| Command | Description |
|---------|------|
| `get text @e1` | Get text |
| `get html @e1` | Get HTML |
| `get url` | Current URL |
| `get title` | Page title |

### Waiting

| Command | Description |
|---------|------|
| `wait @e1` | Wait for an element |
| `wait 1000` | Wait 1 second (1000 ms) |

### Debugging

| Command | Description |
|---------|------|
| `--headed` | Show the browser |
| `console` | Console logs |
| `errors` | Page errors |
| `highlight @e1` | Highlight an element |
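
When a run fails, it can help to save the console log and the page errors to files in one step. The sketch below assumes that `console` and `errors` print their results to standard output, which this document does not confirm. `ab`, `dump_debug_info`, and the `AGENT_BROWSER_BIN` variable are hypothetical names; the variable exists only so the binary can be swapped out when testing.

```bash
# Hypothetical wrapper: call agent-browser, or a stand-in named by
# AGENT_BROWSER_BIN (an assumption made for testability).
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# Save console logs and page errors into a directory (default: debug-out)
# and print the directory name. Assumes both commands write to stdout.
dump_debug_info() {
  dir="${1:-debug-out}"
  mkdir -p "$dir" || return 1
  ab console > "$dir/console.log"
  ab errors  > "$dir/errors.log"
  echo "$dir"
}

# Usage (commented out; requires a running agent-browser session):
# dump_debug_info ./debug-out
```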

---

## Session Management

Manage multiple tabs/sessions in parallel:

```bash
# Specify a session
agent-browser --session admin open https://admin.example.com
agent-browser --session user open https://example.com

# List sessions
agent-browser session list

# Operate within a specific session
agent-browser --session admin snapshot -i -c
```
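
To run the same command in several sessions, the `--session` flag can be applied in a loop. This is a rough sketch: `ab`, `in_sessions`, and the `AGENT_BROWSER_BIN` variable are hypothetical names, and session names are assumed to contain no spaces.

```bash
# Hypothetical wrapper: call agent-browser, or a stand-in named by
# AGENT_BROWSER_BIN (an assumption made for testability).
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# Run the given agent-browser command once per session.
# $1 is a space-separated list of session names; the rest is the command.
in_sessions() {
  sessions="$1"; shift
  for s in $sessions; do
    ab --session "$s" "$@"
  done
}

# Usage (commented out; requires agent-browser):
# in_sessions "admin user" snapshot -i -c
```

The loop runs the sessions one after another; each session still keeps its own browser state between calls.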

---

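## Choosing Between agent-browser and MCP Browser Tools

| Tool | Recommendation | Use case |
|--------|--------|------|
| **agent-browser** | ★★★ | First choice. Strong AI-oriented snapshots |
| chrome-devtools MCP | ★★☆ | When Chrome is already open |
| playwright MCP | ★★☆ | Complex E2E tests |

**Rule of thumb**: Try agent-browser first, and use the MCP tools only when it does not work.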

---

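## Notes

- agent-browser runs in headless mode by default
- Use the `--headed` option to show the browser
- A session persists until you explicitly `close` it
- For sites that require authentication, make use of sessions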
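
Because nothing closes the browser automatically, a script that fails midway can leave it open. One way to guard against that is to run the work through a wrapper that always calls `close` afterwards. `ab`, `with_browser_cleanup`, and `AGENT_BROWSER_BIN` are hypothetical names used for this sketch.

```bash
# Hypothetical wrapper: call agent-browser, or a stand-in named by
# AGENT_BROWSER_BIN (an assumption made for testability).
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# Run a command, then always close the browser, and return the
# command's own exit status.
with_browser_cleanup() {
  "$@"
  status=$?
  ab close
  return "$status"
}

# Usage (commented out; my_login_flow is a placeholder for your own function):
# with_browser_cleanup my_login_flow
```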

Overview

This skill controls a browser like an extension of your hands and eyes. It automates page navigation, element interaction, form filling, screenshots, data extraction, and web app testing. Use it to perform repeatable browser workflows or to debug UI issues quickly.

How this skill works

The skill uses the agent-browser CLI to open URLs, take AI-friendly snapshots that assign references to interactive elements, and act on elements by reference (click, fill, type, press keys). Typical flow: open a page, capture a snapshot to discover element refs, then run click/fill/screenshot commands. It runs headless by default, shows the browser when given --headed, and keeps parallel flows separate with --session.

When to use it

- Navigate to or verify a specific URL
- Automate filling and submitting web forms
- Take screenshots or record page state for debugging
- Scrape visible text or HTML from page elements
- Run quick UI tests or reproduce UX bugs

Best practices

- Start every interactive workflow with agent-browser snapshot -i -c to get element refs
- Prefer element references (e.g., @e3) over brittle selectors for reliable automation
- Use --session to isolate parallel flows and remember to close sessions when done
- Run headed mode (--headed) for visual debugging and headless for CI or scripted runs
- Capture screenshots after key steps to validate state and for audit trails

Example use cases

- Log into a web app, fill credentials, and verify post-login page elements
- Automate a multi-step checkout form: open, fill fields, submit, screenshot confirmation
- Scrape product titles and prices by taking a snapshot and extracting text from listed elements
- Run a smoke test: open critical pages, click key buttons, confirm responses and capture screenshots
- Reproduce a reported UI bug by recording the exact sequence of opens, clicks, and form inputs
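
The smoke-test use case can be sketched as a loop over URLs that opens each page and saves a numbered screenshot. `ab`, `smoke_test`, and `AGENT_BROWSER_BIN` are hypothetical names, and the sketch covers only the open and screenshot steps, not clicking or response checks.

```bash
# Hypothetical wrapper: call agent-browser, or a stand-in named by
# AGENT_BROWSER_BIN (an assumption made for testability).
ab() { "${AGENT_BROWSER_BIN:-agent-browser}" "$@"; }

# Open each URL in turn and save a numbered screenshot into directory $1.
# Returns non-zero if any open or screenshot command failed.
smoke_test() {
  out_dir="$1"; shift
  mkdir -p "$out_dir" || return 1
  n=0
  failed=0
  for url in "$@"; do
    n=$((n + 1))
    ab open "$url" && ab screenshot "$out_dir/page-$n.png" || failed=1
  done
  return "$failed"
}

# Usage (commented out; requires agent-browser):
# smoke_test ./shots https://example.com https://example.com/login
```

The loop keeps going after a failure so that one broken page does not hide problems on the others.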

FAQ

How do I target a specific button or input?

Take a snapshot (agent-browser snapshot -i -c) to get element refs like @e2, then use agent-browser click @e2 or agent-browser fill @e3 "text".

Can I run multiple sessions at once?

Yes. Use --session <name> when opening pages to maintain separate sessions and list them with agent-browser session list.

How do I see the browser while debugging?

Add the --headed flag to commands (e.g., agent-browser --headed open <url>) to run the browser in visible mode.