browser-tools skill

This skill enables automated browser interactions using Chrome DevTools Protocol to navigate pages, extract data, and debug frontends.

npx playbooks add skill badlogic/pi-skills --skill browser-tools

Review the files below or copy the command above to add this skill to your agents.

SKILL.md

---
name: browser-tools
description: Interactive browser automation via Chrome DevTools Protocol. Use when you need to interact with web pages, test frontends, or when user interaction with a visible browser is required.
---

# Browser Tools

Chrome DevTools Protocol tools for agent-assisted web automation. These tools connect to Chrome running on `:9222` with remote debugging enabled.

## Setup

Run once before first use:

```bash
cd {baseDir}/browser-tools
npm install
```

## Start Chrome

```bash
{baseDir}/browser-start.js              # Fresh profile
{baseDir}/browser-start.js --profile    # Copy user's profile (cookies, logins)
```

Launch Chrome with remote debugging on `:9222`. Use `--profile` to preserve the user's authentication state.
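Before running the other tools, it can help to confirm that Chrome is actually listening. The sketch below probes the `/json/version` HTTP endpoint that Chrome exposes when remote debugging is enabled. The helper names `versionUrl` and `probeChrome` are hypothetical and not part of this skill, and the code assumes Node 18 or later for the global `fetch`.

```javascript
// Assumption: Chrome was started with remote debugging on port 9222,
// as the launcher above is documented to do.
const DEFAULT_PORT = 9222;

// Build the URL of Chrome's HTTP version endpoint for a given port.
function versionUrl(port = DEFAULT_PORT) {
  return `http://127.0.0.1:${port}/json/version`;
}

// Resolve to the parsed version info, or null if nothing usable answers.
async function probeChrome(port = DEFAULT_PORT) {
  try {
    const res = await fetch(versionUrl(port));
    return res.ok ? await res.json() : null;
  } catch {
    return null; // connection refused, network error, or invalid JSON
  }
}
```

A possible use is `probeChrome().then((info) => console.log(info ? info.Browser : "Chrome is not listening on :9222"))`, which prints the browser version string when the endpoint responds.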

## Navigate

```bash
{baseDir}/browser-nav.js https://example.com
{baseDir}/browser-nav.js https://example.com --new
```

Navigate to URLs. Use the `--new` flag to open the URL in a new tab instead of reusing the current tab.

## Evaluate JavaScript

```bash
{baseDir}/browser-eval.js 'document.title'
{baseDir}/browser-eval.js 'document.querySelectorAll("a").length'
```

Execute JavaScript in the active tab. Code runs in an async context. Use this to extract data, inspect page state, or perform DOM operations programmatically.
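For extraction logic longer than a one-liner, one option is to write it as an ordinary function, try it against a stub, and then serialize it into a single expression for the eval tool. The following is a minimal sketch of that pattern. `collectLinks` and `toExpression` are hypothetical helpers, and it assumes that `browser-eval.js` accepts any JavaScript expression as its argument and reports the value it evaluates to.

```javascript
// Hypothetical helper: collect link text and URLs from a document-like object.
function collectLinks(doc) {
  return Array.from(doc.querySelectorAll("a"), (a) => ({
    text: a.textContent.trim(),
    href: a.href,
  }));
}

// Serialize a function into one expression that calls it with the
// page's global `document`.
function toExpression(fn) {
  return `(${fn.toString()})(document)`;
}

// Print the expression so it can be passed to the eval tool.
console.log(toExpression(collectLinks));
```

Because the function only touches the object it is given, it can be exercised in Node with a stub before it is run in the page.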

## Screenshot

```bash
{baseDir}/browser-screenshot.js
```

Capture the current viewport and return the path to a temporary file. Use this to visually inspect page state or verify UI changes.

## Pick Elements

```bash
{baseDir}/browser-pick.js "Click the submit button"
```

**IMPORTANT**: Use this tool when the user wants to select specific DOM elements on the page. This launches an interactive picker that lets the user click elements to select them. The user can select multiple elements (Cmd/Ctrl+Click) and press Enter when done. The tool returns CSS selectors for the selected elements.

Common use cases:
- User says "I want to click that button" → Use this tool to let them select it
- User says "extract data from these items" → Use this tool to let them select the elements
- When you need specific selectors but the page structure is complex or ambiguous
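Once the picker has returned a selector, a follow-up step is often to act on that element through the eval tool. The sketch below builds a click expression from a selector string. `clickExpression` is a hypothetical helper, and the exact output format of `browser-pick.js` is not specified here, so the selector is assumed to have already been extracted as a plain string.

```javascript
// Build an expression that clicks the first element matching a selector.
// JSON.stringify quotes the selector so embedded quotes cannot break the code.
function clickExpression(selector) {
  const quoted = JSON.stringify(selector);
  return `(() => {
    const el = document.querySelector(${quoted});
    if (!el) return false; // selector no longer matches anything
    el.click();
    return true;
  })()`;
}

// Example selector; a real one would come from the picker.
console.log(clickExpression('button[type="submit"]'));
```

The expression evaluates to `true` when an element was found and clicked and to `false` otherwise, which makes a stale selector easy to detect.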

## Cookies

```bash
{baseDir}/browser-cookies.js
```

Display all cookies for the current tab including domain, path, httpOnly, and secure flags. Use this to debug authentication issues or inspect session state.
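When debugging a login problem, a quick check is which cookies lack the `httpOnly` or `secure` flags. The sketch below runs that check over cookie records. The sample data is invented for illustration, the `name` field and the object shape are assumptions, and the actual output format of `browser-cookies.js` may differ and need parsing first.

```javascript
// Hypothetical sample; real values would come from the cookies tool's output.
const cookies = [
  { name: "session", domain: "example.com", path: "/", httpOnly: true, secure: true },
  { name: "prefs", domain: "example.com", path: "/", httpOnly: false, secure: false },
];

// List which of the two security flags a cookie is missing.
function missingFlags(cookie) {
  const missing = [];
  if (!cookie.httpOnly) missing.push("httpOnly");
  if (!cookie.secure) missing.push("secure");
  return missing;
}

// Keep only cookies that are missing at least one flag.
function auditCookies(list) {
  return list
    .map((c) => ({ name: c.name, missing: missingFlags(c) }))
    .filter((r) => r.missing.length > 0);
}

console.log(auditCookies(cookies));
```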

## Extract Page Content

```bash
{baseDir}/browser-content.js https://example.com
```

Navigate to a URL and extract readable content as markdown. Uses Mozilla Readability for article extraction and Turndown for HTML-to-markdown conversion. Works on pages with JavaScript content (waits for the page to load).

## When to Use

- Testing frontend code in a real browser
- Interacting with pages that require JavaScript
- When user needs to visually see or interact with a page
- Debugging authentication or session issues
- Scraping dynamic content that requires JS execution

Overview

This skill provides interactive browser automation via the Chrome DevTools Protocol to control a visible Chrome instance. It is designed for tasks that need a real browser, such as frontend testing, scraping JavaScript-driven sites, or letting a user select page elements. The tools connect to Chrome on :9222 and include navigation, evaluation, screenshots, cookie inspection, content extraction, and an interactive element picker.

How this skill works

The skill launches or connects to a Chrome instance with remote debugging enabled and issues DevTools Protocol commands to navigate, run JavaScript, capture screenshots, and read or modify page state. It can run arbitrary JS in the page context, extract readable article content as markdown, and present an interactive picker that returns CSS selectors for user-chosen elements. Cookies and session details are exposed for debugging authentication and session issues.

When to use it

- Testing or debugging frontend code in a real browser environment
- Interacting with or scraping pages that rely on JavaScript execution
- When a visible browser or screenshots are needed to verify UI changes
- Debugging authentication, cookies, or session-related problems
- When the user must select specific DOM elements interactively

Best practices

- Run npm install in the tool directory once before first use
- Start Chrome with the provided launcher to ensure remote debugging on :9222
- Use --profile when you need the user's cookies and authenticated sessions
- Prefer the interactive picker for ambiguous or complex page structures
- Keep injected JavaScript short and avoid long-running synchronous loops
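As an illustration of the last point, waiting for dynamic content can be done with short awaited pauses instead of a busy loop. The sketch below is one way to do that. `waitFor` and its option names are hypothetical, and it relies on evaluated code running in an async context so that `await` is available.

```javascript
// Poll `check` until it returns a truthy value or `timeoutMs` elapses.
// Each pause yields to the event loop, so the page stays responsive.
async function waitFor(check, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = check();
    if (value) return value;
    if (Date.now() >= deadline) return null; // gave up without a match
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In the page, the helper would need to be included in the evaluated code itself, for example followed by `await waitFor(() => document.querySelector(".results"))`, where `.results` is a placeholder selector.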

Example use cases

- Open a page, run JS to extract a list of items, and return structured data
- Ask the user to click a button in the interactive picker, then perform the click programmatically
- Capture a screenshot after a UI change to verify a bug fix or visual regression
- Extract article content as markdown from a dynamically rendered news page
- Inspect cookies for the current tab to diagnose login or session faults

FAQ

Do I need to run Chrome manually?

No. Use the provided browser-start script to launch Chrome with remote debugging enabled; you can choose a fresh profile or --profile to reuse user data.

How does the interactive picker work?

The picker lets the user click elements in the visible page, and the tool returns CSS selectors for the selected elements. Select multiple elements with Cmd/Ctrl+Click and press Enter when done.

Can I run arbitrary JavaScript on the page?

Yes. The evaluate tool executes JavaScript in an async page context for data extraction or DOM operations.