home / skills / browserbase / skills-old / browse

browse skill

/skills/browse

This skill automates web tasks with a browser automation CLI, enabling you to create, test, and deploy reliable web automations.

npx playbooks add skill browserbase/skills-old --skill browse

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
4.9 KB
---
name: browse
description: Browser automation CLI for AI agents - create, test, and deploy web automations
---

# Browse - Browser Automation CLI

Browser automation CLI for AI agents. Create, test, and deploy web automations using the `browse` CLI.

## Setup (Run First!)

Before using this skill, install the required CLIs:

```bash
npm install -g @browserbasehq/browse-cli @browserbasehq/sdk-functions
```

Set your credentials:
```bash
export BROWSERBASE_API_KEY="your_api_key"
export BROWSERBASE_PROJECT_ID="your_project_id"
```

Get credentials from: https://browserbase.com/settings

## When to Use

Use this skill when:
- User wants to automate a website task
- User needs to scrape data from a site
- User wants to create a Browserbase Function
- Starting from scratch on a new automation

## Workflow

### 1. Understand the Goal

Ask clarifying questions:
- What website/URL are you automating?
- What's the end goal (extract data, submit forms, monitor changes)?
- Does it require authentication?
- Should this run on a schedule or on-demand?

### 2. Explore the Site Interactively

Start a local browser session to understand the site structure:

```bash
browse open https://example.com
```

Use snapshot to understand the DOM:
```bash
browse snapshot
```

Take screenshots to see the visual layout:
```bash
browse screenshot exploration.png
```

### 3. Identify Key Elements

For each step of the automation, identify:
- Selectors for interactive elements
- Wait conditions needed
- Data to extract

Use the accessibility tree refs to understand element relationships:
```
[@0-5] button: "Submit"
[@0-6] textbox: "Email"
[@0-7] textbox: "Password"
```

### 4. Test Interactions Manually

Before writing code, verify each step works:

```bash
browse fill @0-6 "[email protected]"
browse fill @0-7 "password123"
browse click @0-5
browse wait load networkidle
browse snapshot
```

### 5. Enable Network Capture (if needed)

For API-based automations or debugging:
```bash
browse network on
# perform actions
browse network path
# inspect captured requests in the directory
```

### 6. Create the Function

Once you understand the flow, create a full function project:

```bash
pnpm dlx @browserbasehq/sdk-functions init my-automation
cd my-automation
```

This creates a complete project with:
- `package.json` with dependencies
- `.env` for credentials
- `tsconfig.json`
- `index.ts` template

Edit `index.ts` with your automation logic:

```typescript
import { defineFn } from "@browserbasehq/sdk-functions";
import { chromium } from "playwright-core";

defineFn("my-automation", async (context) => {
  const { session } = context;
  const browser = await chromium.connectOverCDP(session.connectUrl);
  const page = browser.contexts()[0]!.pages()[0]!;

  // Your automation steps here
  await page.goto("https://example.com");
  await page.fill('input[name="email"]', context.params.email);
  await page.click('button[type="submit"]');
  
  // Extract and return data
  const result = await page.textContent('.result');
  return { success: true, result };
});
```

### 7. Test Locally

Start the local development server:
```bash
pnpm bb dev index.ts
```

Then invoke locally via curl:
```bash
curl -X POST http://127.0.0.1:14113/v1/functions/my-automation/invoke \
  -H "Content-Type: application/json" \
  -d '{"params": {"email": "[email protected]"}}'
```

### 8. Deploy to Browserbase

When ready for production:
```bash
pnpm bb publish index.ts
```

### 9. Test Production

Invoke the deployed function via API:
```bash
curl -X POST https://api.browserbase.com/v1/functions/<function-id>/invoke \
  -H "Content-Type: application/json" \
  -H "x-bb-api-key: $BROWSERBASE_API_KEY" \
  -d '{"params": {"email": "[email protected]"}}'
```

## Best Practices

### Selectors
- Prefer data attributes (`data-testid`) over CSS classes
- Use text content as fallback (`text=Submit`)
- Avoid fragile selectors like nth-child

### Waiting
- Always wait for navigation/network after clicks
- Use `waitForSelector` for dynamic content
- Set reasonable timeouts

### Error Handling
- Wrap risky operations in try/catch
- Return structured error information
- Log intermediate steps for debugging

### Data Extraction
- Use `page.evaluate()` for complex extraction
- Validate extracted data before returning
- Handle missing elements gracefully

## Example: E-commerce Price Monitor

```typescript
defineFn("price-monitor", async (context) => {
  const { session, params } = context;
  const browser = await chromium.connectOverCDP(session.connectUrl);
  const page = browser.contexts()[0]!.pages()[0]!;

  await page.goto(params.productUrl);
  await page.waitForSelector('.price');

  const price = await page.evaluate(() => {
    const el = document.querySelector('.price');
    return el?.textContent?.replace(/[^0-9.]/g, '');
  });

  return {
    url: params.productUrl,
    price: parseFloat(price || '0'),
    timestamp: new Date().toISOString(),
  };
});
```

Overview

This skill is a browser automation CLI for AI agents that lets you create, test, and deploy web automations with the browse command. It provides interactive exploration, element inspection, network capture, and a flow for turning an interactive session into a reusable Browserbase Function. Use it to build reliable, repeatable automations and production functions for scraping, form submissions, monitoring, and integration tasks.

How this skill works

The CLI opens real browser sessions so you can interactively explore pages, capture DOM snapshots, and record selectors and network traffic. After validating interactions locally, you scaffold a function project, author Playwright-based automation code, run a local dev server for testing, and publish the function to Browserbase for production invocation. The workflow includes tools for waits, error handling, and structured data extraction to make automations robust.

When to use it

  • Automating repetitive website tasks like form fills or navigation
  • Scraping structured data or monitoring page changes on a schedule
  • Creating an API-style Browserbase Function that performs browser work
  • Debugging complex interactions using live DOM snapshots and network capture
  • Prototyping automation flows interactively before coding

Best practices

  • Install required CLIs and export BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID before starting
  • Prefer stable selectors (data-testid or data attributes) and avoid brittle nth-child selectors
  • Always wait for navigation or use waitForSelector after actions that change the DOM
  • Wrap risky steps in try/catch, log intermediate state, and return structured error info
  • Validate and sanitize extracted data; handle missing elements gracefully

Example use cases

  • E-commerce price monitor that visits a product page, extracts price, and returns timestamped data
  • Automated login and daily report extraction, saving structured results to an API
  • Form submission automation for lead capture across multiple sites
  • API-backed scraper that triggers an automation and returns parsed results on demand
  • Debugging and reproducing flaky site interactions using network capture and snapshots

FAQ

What do I need to set up before using the CLI?

Install @browserbasehq/browse-cli and @browserbasehq/sdk-functions globally and export BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID.

How do I test automations locally?

Scaffold a function project, run pnpm bb dev index.ts to start the local dev server, and invoke it with curl to test.