home / skills / mjunaidca / mjs-agent-skills / browsing-with-playwright

browsing-with-playwright skill

/.claude/skills/browsing-with-playwright

This skill automates browser tasks using Playwright MCP to navigate, fill forms, capture data, and verify results across pages.

npx playbooks add skill mjunaidca/mjs-agent-skills --skill browsing-with-playwright

Review the files below or copy the command above to add this skill to your agents.

Files (6)
SKILL.md
4.7 KB
---
name: browsing-with-playwright
description: |
  Browser automation using Playwright MCP. Navigate websites, fill forms, click elements,
  take screenshots, and extract data. Use when tasks require web browsing, form submission,
  web scraping, UI testing, or any browser interaction. NOT when only fetching static
  content (use curl/wget instead).
---

# Browser Automation

Automate browser interactions via Playwright MCP server.

## Server Lifecycle

### Start Server
```bash
# Using helper script (recommended)
bash scripts/start-server.sh

# Or manually
npx @playwright/mcp@latest --port 8808 --shared-browser-context &
```

### Stop Server
```bash
# Using helper script (closes browser first)
bash scripts/stop-server.sh

# Or manually
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_close -p '{}'
pkill -f "@playwright/mcp"
```

### When to Stop
- **End of task**: Stop when browser work is complete
- **Long sessions**: Keep running if doing multiple browser tasks
- **Errors**: Stop and restart if browser becomes unresponsive

**Important:** The `--shared-browser-context` flag is required to maintain browser state across multiple mcp-client.py calls. Without it, each call gets a fresh browser context.

## Quick Reference

### Navigation

```bash
# Go to URL
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_navigate \
  -p '{"url": "https://example.com"}'

# Go back
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_navigate_back -p '{}'
```

### Get Page State

```bash
# Accessibility snapshot (returns element refs for clicking/typing)
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_snapshot -p '{}'

# Screenshot
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_take_screenshot \
  -p '{"type": "png", "fullPage": true}'
```

### Interact with Elements

Use `ref` from snapshot output to target elements:

```bash
# Click element
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_click \
  -p '{"element": "Submit button", "ref": "e42"}'

# Type text
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_type \
  -p '{"element": "Search input", "ref": "e15", "text": "hello world", "submit": true}'

# Fill form (multiple fields)
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_fill_form \
  -p '{"fields": [{"ref": "e10", "value": "[email protected]"}, {"ref": "e12", "value": "password123"}]}'

# Select dropdown
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_select_option \
  -p '{"element": "Country dropdown", "ref": "e20", "values": ["US"]}'
```

### Wait for Conditions

```bash
# Wait for text to appear
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_wait_for \
  -p '{"text": "Success"}'

# Wait for time (ms)
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_wait_for \
  -p '{"time": 2000}'
```

### Execute JavaScript

```bash
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_evaluate \
  -p '{"function": "return document.title"}'
```

### Multi-Step Playwright Code

For complex workflows, use `browser_run_code` to run multiple actions in one call:

```bash
python3 scripts/mcp-client.py call -u http://localhost:8808 -t browser_run_code \
  -p '{"code": "async (page) => { await page.goto(\"https://example.com\"); await page.click(\"text=Learn more\"); return await page.title(); }"}'
```

**Tip:** Use `browser_run_code` for complex multi-step operations that should be atomic (all-or-nothing).

## Workflow: Form Submission

1. Navigate to page
2. Get snapshot to find element refs
3. Fill form fields using refs
4. Click submit
5. Wait for confirmation
6. Screenshot result

## Workflow: Data Extraction

1. Navigate to page
2. Get snapshot (contains text content)
3. Use browser_evaluate for complex extraction
4. Process results

## Verification

Run: `python3 scripts/verify.py`

Expected: `✓ Playwright MCP server running`

## If Verification Fails

1. Run diagnostic: `pgrep -f "@playwright/mcp"`
2. Check: Server process running on port 8808
3. Try: `bash scripts/start-server.sh`
4. **Stop and report** if still failing - do not proceed with downstream steps

## Tool Reference

See [references/playwright-tools.md](references/playwright-tools.md) for complete tool documentation.

## Troubleshooting

| Issue | Solution |
|-------|----------|
| Element not found | Run browser_snapshot first to get current refs |
| Click fails | Try browser_hover first, then click |
| Form not submitting | Use `"submit": true` with browser_type |
| Page not loading | Increase wait time or use browser_wait_for |
| Server not responding | Stop and restart: `bash scripts/stop-server.sh && bash scripts/start-server.sh` |

Overview

This skill provides browser automation via a Playwright MCP server to navigate sites, interact with page elements, fill forms, take screenshots, and extract data. It exposes simple RPC-style commands and a multi-step code runner so agents can perform atomic browser workflows reliably. Use it when tasks require real browser interactions rather than simple HTTP fetches.

How this skill works

The skill communicates with a running Playwright MCP server over HTTP and issues commands like navigate, snapshot, click, type, fill_form, take_screenshot, evaluate, and run_code. Snapshots return element references (refs) that are used to target clicks and typing. For complex sequences that must run as one unit, the browser_run_code endpoint executes async Playwright code on the server.

When to use it

  • Filling and submitting interactive web forms that require JS execution or client state
  • Scraping dynamically rendered pages where HTML is produced by JavaScript
  • End-to-end UI testing or visual verification with screenshots
  • Multi-step browser workflows that must maintain session state
  • Interactions requiring user-like actions (clicks, hovers, selects)

Best practices

  • Start the MCP server with --shared-browser-context to preserve state across calls
  • Run browser_snapshot before interacting to obtain current element refs
  • Prefer browser_run_code for complex, multi-step, atomic workflows
  • Use browser_wait_for to wait for text or timeouts instead of blind sleeps
  • Stop and restart the server if the browser becomes unresponsive

Example use cases

  • Automate signup and login flows: navigate, fill_form, click submit, wait for confirmation and screenshot
  • Scrape product listings that load via JavaScript: navigate, snapshot, evaluate page DOM for structured data
  • Automated UI test that clicks through a feature, takes screenshots, and asserts visible text
  • Select values from dropdowns, submit filters, and capture the resulting page snapshot
  • Run a transactional sequence atomically using browser_run_code to avoid intermediate state leaks

FAQ

Do I always need to run a server?

Yes — the skill talks to a Playwright MCP server process. Start it once for multiple calls or scripts.

When should I restart the server?

Restart if the browser becomes unresponsive, if verification fails, or after long problematic sessions.