browser skill

This skill automates browser tasks using minimal CDP tools to start Chrome, navigate pages, run scripts, take screenshots, and extract DOM data.

npx playbooks add skill iamzhihuix/happy-claude-skills --skill browser

SKILL.md
---
name: browser
description: Minimal Chrome DevTools Protocol tools for browser automation and scraping. Use when you need to start Chrome, navigate pages, execute JavaScript, take screenshots, or interactively pick DOM elements. Triggers include "browse website", "scrape page", "take screenshot", "automate browser", "extract DOM", "web scraping".
---

# Browser Tools

Minimal CDP tools for collaborative site exploration and scraping.

**Credits**: Based on [Mario Zechner](https://mariozechner.at)'s article [What if you don't need MCP?](https://mariozechner.at/posts/2025-11-02-what-if-you-dont-need-mcp/), adapted from [Factory.ai](https://docs.factory.ai/guides/skills/browser).

## Setup

Before first use, install dependencies:

```bash
npm install --prefix skills/browser
```

## Start Chrome

```bash
./skills/browser/scripts/start.js              # Fresh profile
./skills/browser/scripts/start.js --profile    # Copy your profile (cookies, logins)
```

Starts Chrome with remote debugging enabled on port `9222`.

## Navigate

```bash
./skills/browser/scripts/nav.js https://example.com
./skills/browser/scripts/nav.js https://example.com --new
```

Navigates the current tab, or opens a new tab when `--new` is passed.

## Evaluate JavaScript

```bash
./skills/browser/scripts/eval.js 'document.title'
./skills/browser/scripts/eval.js 'document.querySelectorAll("a").length'
```

Executes JavaScript in the active tab, in an async context.

**IMPORTANT**: The code must be a single expression, or an IIFE when you need multiple statements:

- Single expression: `'document.title'`
- Multiple statements: `'(() => { const x = 1; return x + 1; })()'`
- Avoid newlines in the code string; keep it on one line
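
If a snippet is easier to write across several lines, it can be collapsed to one line before it is handed to `eval.js`. The following is a minimal sketch, not part of the skill: `to_one_line` is a hypothetical helper, and it assumes the snippet ends every statement with a semicolon, contains no `//` line comments, and has no string literals that depend on runs of spaces.

```bash
# Hypothetical helper: join a multi-line JavaScript snippet into a single line.
# Newlines become spaces, then runs of spaces are squeezed and the ends trimmed.
to_one_line() {
  tr '\n' ' ' | sed -e 's/  */ /g' -e 's/^ //' -e 's/ $//'
}

# Write the IIFE readably in a quoted heredoc (no shell expansion happens inside it).
js=$(to_one_line <<'EOF'
(() => {
  const links = document.querySelectorAll("a");
  return links.length;
})()
EOF
)

# Show the one-line form that would be passed to eval.js.
echo "$js"
```

The result can then be passed as a single argument, for example `./skills/browser/scripts/eval.js "$js"`.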

## Screenshot

```bash
./skills/browser/scripts/screenshot.js
```

Captures the current viewport and returns the path to a temporary file.
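
Because the screenshot lands in a temporary location, you may want to copy it somewhere durable. The sketch below assumes that `screenshot.js` prints the file path to stdout as its only output, which the text above implies but does not spell out. `save_screenshot` and the `SCREENSHOT_CMD` variable are hypothetical names introduced for illustration.

```bash
# Command that takes the screenshot; overridable so the helper can be exercised without Chrome.
SCREENSHOT_CMD="${SCREENSHOT_CMD:-./skills/browser/scripts/screenshot.js}"

# Hypothetical helper: take a screenshot and copy it to the given destination.
save_screenshot() {
  local dest="$1" tmp
  # Assumption: the command prints only the temp file path on stdout.
  tmp=$("$SCREENSHOT_CMD") || return 1
  if [ ! -f "$tmp" ]; then
    echo "screenshot command did not return an existing file: $tmp" >&2
    return 1
  fi
  cp "$tmp" "$dest"
}
```

A call such as `save_screenshot ./artifacts/home.png` would then keep the capture under a stable name, provided the `artifacts` directory already exists.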

## Pick Elements

```bash
./skills/browser/scripts/pick.js "Click the submit button"
```

Interactive element picker. Click to select, Cmd/Ctrl+Click for multi-select, Enter to finish.

## Workflow

1. **Start Chrome** with `start.js --profile` to mirror your authenticated state.
2. **Drive navigation** via `nav.js https://target.app` or open secondary tabs with `--new`.
3. **Inspect the DOM** using `eval.js` for quick counts, attribute checks, or extracting JSON payloads.
4. **Capture artifacts** with `screenshot.js` for visual proof or `pick.js` when you need precise selectors or text snapshots.
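
The four steps above can be chained in a small script. This is a sketch under stated assumptions: it assumes `start.js` returns once Chrome is running rather than blocking, `https://example.com` is a placeholder target, and `DRY_RUN` and `run` are hypothetical additions that print each command instead of executing it (the default here), so the sequence can be reviewed first.

```bash
SCRIPTS="${SCRIPTS:-./skills/browser/scripts}"  # script directory used throughout this document
TARGET="${TARGET:-https://example.com}"         # placeholder URL
DRY_RUN="${DRY_RUN:-1}"                         # hypothetical switch: 1 prints commands, 0 runs them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Each step runs only if the previous one succeeded.
workflow() {
  run "$SCRIPTS/start.js" --profile &&
    run "$SCRIPTS/nav.js" "$TARGET" &&
    run "$SCRIPTS/eval.js" 'document.querySelectorAll("a").length' &&
    run "$SCRIPTS/screenshot.js"
}

workflow
```

Setting `DRY_RUN=0` executes the commands for real. `pick.js` is left out because it waits for interactive input.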

## Usage Notes

- Start Chrome before using any other tool
- The `--profile` flag copies your actual Chrome profile, so existing cookies and logins carry over
- JavaScript evaluation runs in an async context in the page
- The pick tool lets you select DOM elements visually by clicking them
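
To confirm that Chrome is up before running the other tools, you can probe the debugging port. Chrome started with remote debugging serves a `/json/version` endpoint over HTTP on that port. The helpers below are a minimal sketch, not part of the skill: `chrome_ready`, `wait_for_chrome`, and `CDP_PORT` are hypothetical names, and `curl` is assumed to be installed.

```bash
CDP_PORT="${CDP_PORT:-9222}"  # port named in the Start Chrome section

# Succeeds if the DevTools HTTP endpoint answers on the given port.
chrome_ready() {
  curl -sf --max-time 2 "http://127.0.0.1:${1:-$CDP_PORT}/json/version" >/dev/null
}

# Poll once per second until Chrome answers or the attempts run out.
wait_for_chrome() {
  local port="${1:-$CDP_PORT}" tries="${2:-10}" i=0
  while [ "$i" -lt "$tries" ]; do
    chrome_ready "$port" && return 0
    sleep 1
    i=$((i + 1))
  done
  echo "Chrome is not reachable on port $port" >&2
  return 1
}
```

A guard such as `wait_for_chrome && ./skills/browser/scripts/nav.js https://example.com` then fails with a clear message instead of a connection error.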

Overview

This skill provides minimal Chrome DevTools Protocol (CDP) tools for browser automation, interactive DOM picking, scraping, and screenshots. It is designed for quick setup and hands-on exploration when you need to start Chrome with remote debugging, run JavaScript in the page, or capture page artifacts. Use it to drive navigation, extract content, and collect visual evidence through a lightweight CLI.

How this skill works

The tools start a Chrome instance with remote debugging enabled and connect via CDP to control tabs. You can navigate pages, evaluate JavaScript expressions in the page context, take viewport screenshots, and interactively pick DOM elements by clicking on the page. JavaScript evaluation runs in an async page context and supports single-expression or IIFE-style code passed as a one-line string.

When to use it

  • Quickly automate browser navigation and interactions for scraping or testing
  • Extract page data or JSON payloads with inline JavaScript evaluation
  • Capture screenshots for visual verification or evidence
  • Select precise CSS selectors or text using the interactive picker
  • Work with authenticated sessions by launching Chrome with your profile

Best practices

  • Start Chrome with the profile flag when you need logged-in state (cookies and sessions)
  • Keep evaluated JavaScript as a single expression or an IIFE on one line to ensure correct execution
  • Launch the Chrome process before running other commands to avoid connection failures
  • Use the picker for complex DOM targeting, and combine it with eval to extract attributes or structured data
  • Use new tabs for parallel exploration with the navigation command's new-tab option

Example use cases

  • Scrape product listings by navigating to pages, counting items, and extracting attributes with eval
  • Automate login flows by starting Chrome with your profile and navigating programmatically
  • Collect screenshots of pages for regression testing or visual monitoring
  • Interactively pick form fields, buttons, or text nodes to generate reliable selectors for automation scripts
  • Open secondary tabs to compare pages or capture multiple artifacts in one session
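
The scraping use case above could be sketched as a helper that builds a one-line expression for `eval.js` from a selector (for instance one found with the picker) and an attribute name. `js_string` and `attr_expr` are hypothetical helpers, `a.product` is a made-up selector, and how `eval.js` prints the returned string is an assumption.

```bash
# Escape backslashes and double quotes so a value is safe inside a JS "..." literal.
js_string() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a single-expression snippet that returns the attribute values as a JSON array string.
attr_expr() {
  local selector attr
  selector=$(js_string "$1")
  attr=$(js_string "$2")
  printf 'JSON.stringify([...document.querySelectorAll("%s")].map(e => e.getAttribute("%s")))' \
    "$selector" "$attr"
}

# Show the generated expression for a made-up selector.
attr_expr 'a.product' 'href'
echo
```

The expression can then be passed along with `./skills/browser/scripts/eval.js "$(attr_expr 'a.product' 'href')"`.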

FAQ

How should I format JavaScript passed to the eval tool?

Provide a single expression or wrap multiple statements in an immediately-invoked function expression (IIFE) on one line, e.g. '(() => { const x = 1; return x + 1; })()'.

What does the --profile flag do when starting Chrome?

It launches Chrome with a copy of your existing profile so cookies, logins, and other session state are available in the started browser.

Can I pick multiple elements with the picker?

Yes. Use Cmd/Ctrl+Click to multi-select elements and press Enter to finish and return the selection.