
Browser skill


This skill enables fast, code-first browser tasks via pre-written MCPs executed by CLI, delivering reliable verification, screenshots, and navigation.

npx playbooks add skill robdtaylor/personal-ai-infrastructure --skill browser

Copy the command above to add this skill to your agents.

---
name: Browser
description: Code-first browser automation and web verification. USE WHEN browser, screenshot, navigate, web testing, verify UI, VERIFY phase. Replaces Playwright MCP with 99% token savings.
---

# Browser - Code-First Browser Automation

**Browser automation and web verification using code-first Playwright.**

---

## File-Based MCP

This skill is a **file-based MCP**: pre-written code that executes existing scripts instead of generating new code.

**Why file-based?** Filtering data in code BEFORE it returns to the model context yields 99%+ token savings.
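
To make the filtering idea concrete, here is a minimal sketch. It is not the skill's actual code: `summarizePage` and the `PageSummary` shape are hypothetical names chosen for illustration. The point is only that a script can reduce a large HTML payload to a few fields before anything reaches the model context.

```typescript
// Hypothetical illustration of "filter in code before returning to the model".
// None of these names come from the Browser skill itself.

interface PageSummary {
  title: string | null;
  h1Count: number;
  htmlLength: number; // character count of the raw HTML
}

// Reduce a full HTML document to a handful of fields.
function summarizePage(html: string): PageSummary {
  const titleMatch = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  const h1Matches = html.match(/<h1[\s>]/gi);
  return {
    title: titleMatch ? titleMatch[1].trim() : null,
    h1Count: h1Matches ? h1Matches.length : 0,
    htmlLength: html.length,
  };
}

// A large page (simulated by padding) collapses to a small JSON summary.
const bigHtml =
  "<html><head><title> Example Domain </title></head><body><h1>Hello</h1>" +
  "<p>filler</p>".repeat(1000) +
  "</body></html>";

const summary = summarizePage(bigHtml);
console.log(JSON.stringify(summary));
```

Only the short JSON string would be returned to the model; the raw HTML stays inside the script.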

---

## STOP - CLI First, Always

### The Wrong Pattern

**DO NOT write new TypeScript code for simple browser tasks:**

```typescript
// WRONG - Writing new code defeats the purpose of file-based MCPs
import { PlaywrightBrowser } from '$PAI_DIR/skills/Browser/index.ts'
const browser = new PlaywrightBrowser()
await browser.launch({ headless: true })
await browser.navigate('https://example.com')
await browser.screenshot({ path: '/tmp/shot.png' })
await browser.close()
```

**Problems with this approach:**
- You're writing 5+ lines of boilerplate every time
- You manage browser lifecycle manually
- You duplicate what the CLI already does
- You're generating new code instead of executing existing code

### The Right Pattern

**USE the CLI tool - it executes pre-written code:**

```bash
# RIGHT - One command, zero boilerplate
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts screenshot https://example.com /tmp/shot.png
```

**Benefits:**
- One command, instant execution
- Lifecycle handled automatically
- Error handling built-in
- TRUE file-based MCP pattern

---

## CLI Commands (Primary Interface)

**Location:** `$PAI_DIR/skills/Browser/Tools/Browse.ts`

### screenshot - Take a screenshot

```bash
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts screenshot <url> [output-path]
```

**Examples:**
```bash
# Screenshot to default location
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts screenshot https://danielmiessler.com

# Screenshot to specific file
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts screenshot https://example.com /tmp/example.png
```

### verify - Check element exists

```bash
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts verify <url> <selector>
```

**Examples:**
```bash
# Verify body exists
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts verify https://example.com "body"

# Verify specific element
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts verify https://danielmiessler.com "h1"

# Verify by CSS selector
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts verify https://example.com ".main-content"
```

### open - Open URL in visible browser

```bash
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts open <url>
```

**Examples:**
```bash
# Open site for manual inspection
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts open https://danielmiessler.com
```
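
The usage lines above imply a simple positional-argument shape. As a sketch only, the following shows how such arguments could be validated; `parseBrowseArgs` and `BrowseCommand` are hypothetical names, and the real `Tools/Browse.ts` may parse its arguments differently.

```typescript
// Hypothetical sketch of validating Browse.ts-style arguments.
// This is NOT the implementation of Tools/Browse.ts.

type BrowseCommand =
  | { kind: "screenshot"; url: string; outputPath?: string }
  | { kind: "verify"; url: string; selector: string }
  | { kind: "open"; url: string };

function parseBrowseArgs(args: string[]): BrowseCommand {
  const [name, url, extra] = args;
  if (!url) {
    throw new Error("usage: <screenshot|verify|open> <url> [...]");
  }
  switch (name) {
    case "screenshot":
      // [output-path] is optional, matching the usage line above
      return { kind: "screenshot", url, outputPath: extra };
    case "verify":
      if (!extra) throw new Error("verify requires <url> <selector>");
      return { kind: "verify", url, selector: extra };
    case "open":
      return { kind: "open", url };
    default:
      throw new Error(`unknown command: ${name}`);
  }
}

console.log(parseBrowseArgs(["verify", "https://example.com", "h1"]));
```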

---

## Decision Tree: When to Use What

```
                    What are you trying to do?
                              |
           ┌──────────────────┴──────────────────┐
           ▼                                     ▼
    ┌─────────────┐                      ┌────────────────┐
    │   SIMPLE    │                      │    COMPLEX     │
    │ Single task │                      │ Multi-step     │
    └─────────────┘                      └────────────────┘
           │                                     │
           ▼                                     ▼
    ┌─────────────┐                      ┌────────────────┐
    │ • Screenshot│                      │ • Form fill    │
    │ • Verify    │                      │ • Auth flow    │
    │ • Open URL  │                      │ • Conditionals │
    └─────────────┘                      └────────────────┘
           │                                     │
           ▼                                     ▼
    ┌─────────────┐                      ┌────────────────┐
    │ USE CLI     │                      │ USE WORKFLOW   │
    │ Browse.ts   │                      │ or API         │
    └─────────────┘                      └────────────────┘
```

### Quick Reference

| Task | Use CLI? | Use TypeScript? |
|------|----------|-----------------|
| Take screenshot | YES | NO |
| Verify element exists | YES | NO |
| Open page visually | YES | NO |
| Fill multi-field form | NO | YES (Workflow) |
| Authentication flow | NO | YES (Workflow) |
| Conditional logic | NO | YES (API) |
| Multi-step interaction | NO | YES (Workflow) |

**The Rule:** Can you describe it in ONE action? (screenshot, verify, open) → CLI
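
The Quick Reference table can be read as a small decision function. The sketch below is one hypothetical encoding of it; `Task`, `BrowserInterface`, and `chooseInterface` are illustrative names, not part of the skill.

```typescript
// Hypothetical encoding of the Quick Reference table above.

type BrowserInterface = "CLI" | "Workflow" | "API";

interface Task {
  steps: number;              // how many distinct browser actions are needed
  needsConditionals: boolean; // whether later actions depend on earlier results
}

function chooseInterface(task: Task): BrowserInterface {
  if (task.needsConditionals) return "API"; // "Conditional logic" row
  if (task.steps > 1) return "Workflow";    // multi-step rows
  return "CLI";                             // one action: screenshot, verify, open
}

console.log(chooseInterface({ steps: 1, needsConditionals: false })); // "CLI"
console.log(chooseInterface({ steps: 4, needsConditionals: false })); // "Workflow"
console.log(chooseInterface({ steps: 4, needsConditionals: true }));  // "API"
```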

---

## VERIFY Phase Integration

**The Browser skill is MANDATORY for VERIFY phase of web changes.**

### Using CLI for Verification

Before claiming ANY web change is "live" or "working":

```bash
# 1. Take screenshot of the changed page
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts screenshot https://example.com/changed-page /tmp/verify.png

# 2. Verify the specific element that changed
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts verify https://example.com/changed-page ".changed-element"
```

**Then use the Read tool to view the screenshot:**
```
Read /tmp/verify.png
```

**If you haven't LOOKED at the rendered page, you CANNOT claim it works.**
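
If the two verification commands are assembled programmatically, building them as argument arrays avoids shell-quoting problems with selectors. This is a sketch under assumptions: `buildVerifyCommands` is a hypothetical helper, and `paiDir` stands in for the value of `$PAI_DIR`.

```typescript
// Hypothetical helper that assembles the two VERIFY-phase commands shown above.
// paiDir stands in for the value of $PAI_DIR.

function buildVerifyCommands(
  paiDir: string,
  url: string,
  selector: string,
  screenshotPath: string,
): string[][] {
  const tool = `${paiDir}/skills/Browser/Tools/Browse.ts`;
  return [
    // 1. Screenshot of the changed page
    ["bun", "run", tool, "screenshot", url, screenshotPath],
    // 2. Verify the specific element that changed
    ["bun", "run", tool, "verify", url, selector],
  ];
}

const commands = buildVerifyCommands(
  "/path/to/pai", // placeholder, not a real install location
  "https://example.com/changed-page",
  ".changed-element",
  "/tmp/verify.png",
);
for (const argv of commands) {
  console.log(argv.join(" "));
}
```

Each array can be handed to whatever process spawner is in use; the screenshot still has to be viewed afterwards.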

---

## Workflow Routing

For complex, multi-step tasks, use the pre-built workflows:

| Trigger | Workflow |
|---------|----------|
| Fill forms, interact with page | `Workflows/Interact.md` |
| Extract page content | `Workflows/Extract.md` |
| Complex verification sequence | `Workflows/VerifyPage.md` |
| Screenshot with custom options | `Workflows/Screenshot.md` |

**Workflows use the TypeScript API internally but are pre-written.**

---

## Advanced: TypeScript API

**Only use this for custom automation that CLI cannot handle.**

Before using this API, ask yourself:
1. Does the task need more than the CLI offers? (screenshot/verify/open)
2. Is this a multi-step workflow? (not just one action)
3. Do I need conditional logic between actions?

**If you answered NO to all three, use the CLI instead.**

### Quick Start (Advanced Users Only)

```typescript
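// Import specifiers are not shell-expanded, so resolve $PAI_DIR at runtime.
const { PlaywrightBrowser } = await import(
  `${process.env.PAI_DIR}/skills/Browser/index.ts`
)

const browser = new PlaywrightBrowser()
await browser.launch({ headless: true })
try {
  await browser.navigate('https://example.com')
  // ... custom logic here ...
} finally {
  await browser.close() // always release the browser, even on errors
}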
```

### API Reference

**Navigation:**
- `launch(options?)` - Start browser
- `navigate(url)` - Go to URL
- `goBack()` / `goForward()` - History navigation
- `reload()` - Refresh page
- `close()` - Shut down browser

**Capture:**
- `screenshot({ path, fullPage, selector })` - Take screenshot
- `getVisibleText(selector?)` - Extract text
- `getVisibleHtml(options)` - Get HTML
- `savePdf(path)` - Export PDF
- `getAccessibilityTree()` - A11y snapshot

**Interaction:**
- `click(selector)` - Click element
- `fill(selector, value)` - Fill input
- `type(selector, text, delay?)` - Type with delay
- `select(selector, value)` - Select dropdown
- `pressKey(key)` - Keyboard input
- `hover(selector)` - Mouse hover
- `drag(source, target)` - Drag and drop
- `uploadFile(selector, path)` - File upload

**Waiting:**
- `waitForSelector(selector, options)` - Wait for element
- `waitForText(text, options)` - Wait for text
- `waitForNavigation(options)` - Wait for page load
- `waitForNetworkIdle(timeout?)` - Wait for idle
- `wait(ms)` - Fixed delay

**JavaScript:**
- `evaluate(script)` - Run JS
- `getConsoleLogs(options)` - Get console output
- `setUserAgent(ua)` - Change user agent

**Viewport:**
- `resize(width, height)` - Set size
- `setDevice(name)` - Emulate device
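
As an example of the conditional logic that justifies the API, the sketch below dismisses a cookie banner only when one is present. It is written against a `BrowserLike` interface that covers a subset of the methods listed above; the return types, the banner text, and the `#accept-cookies` selector are all assumptions, and an in-memory fake stands in for the real browser so the sketch runs on its own.

```typescript
// Assumed subset of the API listed above. Return types are guesses,
// not confirmed signatures of PlaywrightBrowser.
interface BrowserLike {
  navigate(url: string): Promise<void>;
  getVisibleText(selector?: string): Promise<string>;
  click(selector: string): Promise<void>;
  screenshot(options: { path: string }): Promise<void>;
}

// Dismiss a cookie banner only if one is present, then capture the page.
// Returns whether a banner was found.
async function captureWithoutBanner(
  browser: BrowserLike,
  url: string,
  path: string,
): Promise<boolean> {
  await browser.navigate(url);
  const text = await browser.getVisibleText();
  const hadBanner = text.includes("Accept cookies"); // hypothetical banner text
  if (hadBanner) {
    await browser.click("#accept-cookies"); // hypothetical selector
  }
  await browser.screenshot({ path });
  return hadBanner;
}

// Tiny in-memory fake so the sketch runs without a real browser.
const calls: string[] = [];
const fakeBrowser: BrowserLike = {
  navigate: async (url) => { calls.push(`navigate ${url}`); },
  getVisibleText: async () => "Welcome. Accept cookies to continue.",
  click: async (selector) => { calls.push(`click ${selector}`); },
  screenshot: async ({ path }) => { calls.push(`screenshot ${path}`); },
};

captureWithoutBanner(fakeBrowser, "https://example.com", "/tmp/shot.png")
  .then((hadBanner) => console.log(hadBanner, calls));
```

A branch like this cannot be expressed as a single CLI action, which is the case the API is reserved for.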

---

## Token Savings

| Approach | Tokens | Notes |
|----------|--------|-------|
| Playwright MCP | ~13,700 | Loaded at startup, always |
| CLI tool | ~0 | Executes pre-written code |
| TypeScript API | ~50-200 | Only what you write |
| **CLI Savings** | **99%+** | Compared to MCP |
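
The savings figures follow from simple arithmetic against the ~13,700-token baseline in the table. A short worked sketch (`savingsPercent` is an illustrative name):

```typescript
// Savings relative to the ~13,700-token Playwright MCP baseline from the table.
const MCP_TOKENS = 13700;

function savingsPercent(tokensUsed: number, baseline: number = MCP_TOKENS): number {
  return (1 - tokensUsed / baseline) * 100;
}

console.log(savingsPercent(0).toFixed(1));   // "100.0" (CLI tool, ~0 tokens)
console.log(savingsPercent(50).toFixed(1));  // "99.6"  (TypeScript API, low end)
console.log(savingsPercent(200).toFixed(1)); // "98.5"  (TypeScript API, high end)
```

By these numbers the 99%+ figure applies to the CLI row; the TypeScript API lands between roughly 98.5% and 99.6%.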

Overview

This skill provides code-first browser automation and web verification using a file-based Playwright setup. It favors a CLI-first pattern for simple one-step tasks (screenshot, verify, open) and offers pre-written workflows and a TypeScript API for more complex multi-step automation. It saves tokens by executing existing scripts instead of generating new code.

How this skill works

The primary interface is a CLI tool that runs pre-written TypeScript scripts (Tools/Browse.ts) to handle lifecycle, errors, and common actions. Use commands like screenshot, verify, and open for direct tasks; use pre-built workflows for multi-step interactions or the TypeScript API only when custom conditional logic is required. The skill is mandatory for the VERIFY phase to confirm web changes visually and by selector checks.

When to use it

  • Take quick screenshots of a live page or change
  • Verify presence of an element by CSS selector during a VERIFY phase
  • Open a page in a visible browser for manual inspection
  • Run multi-step interactions or auth flows using pre-built workflows
  • Write custom, conditional automation only when CLI/workflows can’t cover the need

Best practices

  • Always try the Browse.ts CLI first for one-action tasks: zero boilerplate and automatic lifecycle handling.
  • For VERIFY checks: capture a screenshot and run a selector verify before claiming a change is live.
  • Use pre-written workflows for multi-step interactions instead of writing new scripts.
  • Only use the TypeScript API for complex conditional logic, auth flows, or custom interactions that workflow files don’t support.
  • Store screenshots to a predictable temp path and use a Read tool to inspect images as proof.

Example use cases

  • Take a screenshot of a staging page: bun run $PAI_DIR/skills/Browser/Tools/Browse.ts screenshot https://example.com /tmp/shot.png
  • Verify that a heading or CTA exists: bun run $PAI_DIR/skills/Browser/Tools/Browse.ts verify https://example.com "h1"
  • Open a page in a visible browser for manual QA: bun run $PAI_DIR/skills/Browser/Tools/Browse.ts open https://example.com
  • Run a pre-built workflow to fill a complex form or extract content (Workflows/Interact.md, Workflows/Extract.md)
  • Use the TypeScript API for a custom auth sequence only after confirming CLI/workflows cannot handle it

FAQ

Why prefer the CLI over writing new TypeScript code?

The CLI executes pre-written code, manages browser lifecycle, has built-in error handling, and saves tokens by avoiding generated code and repeated boilerplate.

What if my task needs conditional steps or multi-page flows?

Use the pre-built workflows for complex sequences. If none fit, use the TypeScript API for custom automation but only after confirming CLI and workflows are insufficient.