
This skill automates browser tasks with Playwright MCP, enabling navigation, form filling, data extraction, and UI testing.

npx playbooks add skill mjunaidca/mjs-agent-skills --skill browser-use

---
name: browser-use
description: Browser automation using Playwright MCP. Navigate websites, fill forms, click elements, take screenshots, and extract data. Use when tasks require web browsing, form submission, web scraping, UI testing, or any browser interaction.
---

# Browser Automation

Automate browser interactions via Playwright MCP server.

## Server Lifecycle

### Start Server
```bash
# Using helper script (recommended)
bash scripts/start-server.sh

# Or manually
npx @playwright/mcp@latest --port 8808 --shared-browser-context &
```

### Stop Server
```bash
# Using helper script (closes browser first)
bash scripts/stop-server.sh

# Or manually
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_close -p '{}'
pkill -f "@playwright/mcp"
```

### When to Stop
- **End of task**: Stop when browser work is complete
- **Long sessions**: Keep running if doing multiple browser tasks
- **Errors**: Stop and restart if browser becomes unresponsive

**Important:** The `--shared-browser-context` flag is required to maintain browser state across multiple mcp-client.py calls. Without it, each call gets a fresh browser context.
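Before starting a second server or sending calls, it can help to check whether anything is already listening on the port. The following is a minimal sketch, assuming the server runs on `localhost:8808` as in the commands above; `is_port_open` is a hypothetical helper and not part of the skill's scripts.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    # Port 8808 matches the start command above; adjust if you use another port.
    state = "listening" if is_port_open("localhost", 8808) else "not listening"
    print(f"Playwright MCP port 8808 is {state}")
```

An open port only shows that some process accepts connections there; it does not prove the process is the Playwright MCP server.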

## Quick Reference

### Navigation

```bash
# Go to URL
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_navigate \
  -p '{"url": "https://example.com"}'

# Go back
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_navigate_back -p '{}'
```

### Get Page State

```bash
# Accessibility snapshot (returns element refs for clicking/typing)
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_snapshot -p '{}'

# Screenshot
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_take_screenshot \
  -p '{"type": "png", "fullPage": true}'
```

### Interact with Elements

Use `ref` from snapshot output to target elements:

```bash
# Click element
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_click \
  -p '{"element": "Submit button", "ref": "e42"}'

# Type text
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_type \
  -p '{"element": "Search input", "ref": "e15", "text": "hello world", "submit": true}'

# Fill form (multiple fields)
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_fill_form \
  -p '{"fields": [{"ref": "e10", "value": "user@example.com"}, {"ref": "e12", "value": "password123"}]}'

# Select dropdown
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_select_option \
  -p '{"element": "Country dropdown", "ref": "e20", "values": ["US"]}'
```
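Looking up a `ref` by hand gets tedious in longer sessions. The sketch below pulls refs out of snapshot text, assuming each element line looks roughly like `- button "Submit" [ref=e42]`. That line shape is an assumption about the snapshot output, and `find_refs` and `ref_for` are hypothetical helpers; adjust the pattern to whatever your server actually returns.

```python
import re
from typing import Dict, List, Optional

# Assumed line shape: - button "Submit" [ref=e42]
REF_LINE = re.compile(
    r'-\s+(?P<role>\w+)(?:\s+"(?P<name>[^"]*)")?.*?\[ref=(?P<ref>\w+)\]'
)


def find_refs(snapshot: str) -> List[Dict[str, Optional[str]]]:
    """Collect role, accessible name, and ref for every line that carries a ref."""
    found = []
    for line in snapshot.splitlines():
        match = REF_LINE.search(line)
        if match:
            found.append(match.groupdict())
    return found


def ref_for(snapshot: str, role: str, name: str) -> Optional[str]:
    """Return the ref of the first element matching role and name, if any."""
    for item in find_refs(snapshot):
        if item["role"] == role and item["name"] == name:
            return item["ref"]
    return None
```

Refs are only meaningful for the snapshot they came from, so look them up again after each new snapshot.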

### Wait for Conditions

```bash
# Wait for text to appear
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_wait_for \
  -p '{"text": "Success"}'

# Wait for a fixed time (seconds)
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_wait_for \
  -p '{"time": 2}'
```

### Execute JavaScript

```bash
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_evaluate \
  -p '{"function": "() => document.title"}'
```

### Multi-Step Playwright Code

For complex workflows, use `browser_run_code` to run multiple actions in one call:

```bash
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_run_code \
  -p '{"code": "async (page) => { await page.goto(\"https://example.com\"); await page.click(\"text=Learn more\"); return await page.title(); }"}'
```

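**Tip:** Use `browser_run_code` when several steps should run together in one call, with no other tool call interleaved between them. If a step throws, the call fails at that point; page changes made by earlier steps are not rolled back.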
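Hand-escaping quotes inside the `code` string is error-prone once the function grows past one line. One option is to let Python build the JSON and the shell quoting. This is a sketch under the assumption that `mcp-client.py` accepts the same `call -u ... -t ... -p ...` arguments shown above; `run_code_command` is a hypothetical helper.

```python
import json
import shlex

MCP_URL = "http://localhost:8808"  # same URL as the commands above


def run_code_command(js_function: str) -> str:
    """Build a shell-safe mcp-client.py command for browser_run_code."""
    payload = json.dumps({"code": js_function})  # handles quote and newline escaping
    argv = ["python3", "scripts/mcp-client.py", "call",
            "-u", MCP_URL, "-t", "browser_run_code", "-p", payload]
    return shlex.join(argv)  # quotes each argument for a POSIX shell


JS = """async (page) => {
  await page.goto("https://example.com");
  await page.click("text=Learn more");
  return await page.title();
}"""

if __name__ == "__main__":
    print(run_code_command(JS))
```

The printed command can be pasted into a shell, or the same argument list can be passed to `subprocess.run` without going through a shell.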

## Workflow: Form Submission

1. Navigate to page
2. Get snapshot to find element refs
3. Fill form fields using refs
4. Click submit
5. Wait for confirmation
6. Screenshot result
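The six steps above map onto six tool calls. The sketch below only builds the argument lists and executes nothing; `call` and `form_submission_plan` are hypothetical helpers, and the `fields` shape mirrors the `browser_fill_form` example earlier in this document.

```python
import json
from typing import Dict, List

MCP_URL = "http://localhost:8808"  # same URL as the commands above


def call(tool: str, params: dict) -> List[str]:
    """Return the argv for one mcp-client.py tool call (nothing is executed)."""
    return ["python3", "scripts/mcp-client.py", "call",
            "-u", MCP_URL, "-t", tool, "-p", json.dumps(params)]


def form_submission_plan(url: str, fields: List[Dict[str, str]],
                         submit_ref: str, success_text: str) -> List[List[str]]:
    """Map the six workflow steps to tool calls.

    `fields` and `submit_ref` use refs read from the step-2 snapshot, so in a
    real run steps 3 and 4 are built only after that snapshot returns.
    """
    return [
        call("browser_navigate", {"url": url}),
        call("browser_snapshot", {}),
        call("browser_fill_form", {"fields": fields}),
        call("browser_click", {"element": "Submit button", "ref": submit_ref}),
        call("browser_wait_for", {"text": success_text}),
        call("browser_take_screenshot", {"type": "png", "fullPage": True}),
    ]
```

Each entry can be passed to `subprocess.run(argv, check=True)` so that a failing step stops the sequence.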

## Workflow: Data Extraction

1. Navigate to page
2. Get snapshot (contains text content)
3. Use browser_evaluate for complex extraction
4. Process results
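For steps 3 and 4, one possible approach is to have the page return table rows as nested lists and reshape them in Python. This sketch assumes you have already isolated the JSON value returned by the evaluate call; how `mcp-client.py` wraps tool output is not specified here. `EXTRACT_ROWS_JS` and `rows_to_records` are hypothetical names.

```python
import json
from typing import Dict, List

# JavaScript to pass as the "function" parameter of browser_evaluate.
# It returns every table row as a list of trimmed cell strings.
EXTRACT_ROWS_JS = (
    "() => Array.from(document.querySelectorAll('table tr'))"
    ".map(tr => Array.from(tr.cells).map(td => td.innerText.trim()))"
)


def rows_to_records(result_json: str) -> List[Dict[str, str]]:
    """Turn [[header...], [row...], ...] JSON into a list of dicts."""
    rows = json.loads(result_json)
    if not rows:
        return []
    header, *body = rows  # the first row is treated as column names
    return [dict(zip(header, row)) for row in body]
```

Treating the first row as the header is a simplification; pages with multiple tables or no header row need a more specific selector.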

## Tool Reference

See [references/playwright-tools.md](references/playwright-tools.md) for complete tool documentation.

## Troubleshooting

| Issue | Solution |
|-------|----------|
| Element not found | Run browser_snapshot first to get current refs |
| Click fails | Try browser_hover first, then click |
| Form not submitting | Use `"submit": true` with browser_type |
| Page not loading | Increase wait time or use browser_wait_for |
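Several of the fixes above amount to "refresh state, then try again". A generic retry helper captures that pattern. This is a sketch, not part of the skill: `retry` is a hypothetical helper, `action` would wrap a call such as `browser_click`, and `refresh` would re-run `browser_snapshot`.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], refresh: Callable[[], None],
          attempts: int = 3, delay: float = 1.0) -> T:
    """Run action; on failure call refresh, wait, and try again."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as error:
            last_error = error
            if attempt < attempts - 1:
                refresh()          # e.g. take a new snapshot
                time.sleep(delay)  # give the page time to settle
    raise RuntimeError(f"action failed after {attempts} attempts") from last_error
```

Because refs can change between snapshots, `action` should look up the ref it needs each time it runs instead of reusing one captured earlier.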

Overview

This skill provides browser automation via a Playwright MCP server to navigate websites, interact with UI elements, take screenshots, and extract data programmatically. It exposes atomic actions (navigate, click, type, snapshot, evaluate) and a multi-step code runner for complex flows. Use it when tasks require reliable, scriptable browser behavior across multiple calls while preserving session state.

How this skill works

The skill communicates with a running Playwright MCP server to send commands and receive page snapshots. Typical flow: start the server with a shared browser context, navigate to pages, request an accessibility snapshot to obtain element refs, then perform clicks, typing, form fills, selects, waits, screenshots, or JS evaluation. For multi-step workflows that should run in a single call, use the browser_run_code tool to run async Playwright code on the server.

When to use it

  • Automated form submissions that require filling multiple fields and clicking submit
  • Web scraping or data extraction where snapshots or evaluate scripts collect page content
  • UI testing and validation that needs screenshots, interactions, and waits
  • Complex multi-step flows that should run in a single call (login, navigate, extract)
  • Any task that needs a persistent browser session across multiple calls

Best practices

  • Start the MCP server with --shared-browser-context to preserve state across calls
  • Always request a browser_snapshot before targeting elements to get current refs
  • Prefer browser_run_code for multi-step operations that should run in a single call
  • Use browser_wait_for with text or timeouts to handle asynchronous loading
  • Take screenshots after critical steps for verification and debugging

Example use cases

  • Log in to a web app, navigate to a report page, and extract tabular data
  • Automate a signup flow: fill inputs, select dropdowns, submit, and confirm success
  • Run end-to-end UI checks: navigate, interact, capture screenshots, and assert text
  • Scrape product listings by snapshotting and running evaluate scripts for structured fields
  • Automate repetitive admin tasks across pages while maintaining session cookies

FAQ

Do I need to keep the MCP server running between calls?

Yes. Keep it running when performing multiple browser tasks, and start it with the --shared-browser-context flag so session state persists across calls.

What if an element ref is missing or clicks fail?

Run browser_snapshot to refresh refs. If clicks fail, try browser_hover before click, or increase wait time and verify selectors via evaluate.