home / skills / nicobailon / surf-cli / surf

surf skill

/skills/surf

This skill lets you automate and inspect Chrome tasks from the CLI, enabling reliable browser testing, debugging, and AI-assisted workflows.

npx playbooks add skill nicobailon/surf-cli --skill surf

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
16.7 KB
---
name: surf
description: Control Chrome browser via CLI for testing, automation, and debugging. Use when the user needs browser automation, screenshots, form filling, page inspection, network/CPU emulation, DevTools streaming, or AI queries via ChatGPT/Gemini/Perplexity/Grok.
---

# Surf Browser Automation

Control Chrome browser via CLI or Unix socket.

## CLI Quick Reference

```bash
surf --help                    # Full help
surf <group>                   # Group help (tab, scroll, page, wait, dialog, emulate, form, perf, ai)
surf --help-full               # All commands
surf --find <term>             # Search tools
surf --help-topic <topic>      # Topic guide (refs, semantic, frames, devices, windows)
```

## Core Workflow

```bash
# 1. Navigate to page
surf navigate "https://example.com"

# 2. Read page to get element refs
surf page.read

# 3. Click by ref or coordinates
surf click --ref "e1"
surf click --x 100 --y 200

# 4. Type text
surf type --text "hello"

# 5. Screenshot
surf screenshot --output /tmp/shot.png
```

## AI Assistants (No API Keys)

Query AI models using your browser's logged-in session. Must be logged into the respective service in Chrome.

### ChatGPT
```bash
surf chatgpt "explain this code"
surf chatgpt "summarize" --with-page              # Include current page context
surf chatgpt "review" --model gpt-4o              # Specify model
surf chatgpt "analyze" --file document.pdf        # With file attachment
```

### Gemini
```bash
surf gemini "explain quantum computing"
surf gemini "summarize" --with-page               # Include page context
surf gemini "analyze" --file data.csv             # Attach file
surf gemini "a robot surfing" --generate-image /tmp/robot.png
surf gemini "add sunglasses" --edit-image photo.jpg --output out.jpg
surf gemini "summarize" --youtube "https://youtube.com/..."
surf gemini "hello" --model gemini-2.5-flash      # Models: gemini-3-pro (default), gemini-2.5-pro, gemini-2.5-flash
surf gemini "wide banner" --generate-image /tmp/banner.png --aspect-ratio 16:9
```

### Perplexity
```bash
surf perplexity "what is quantum computing"
surf perplexity "explain this page" --with-page   # Include page context
surf perplexity "deep dive" --mode research       # Research mode (Pro)
surf perplexity "latest news" --model sonar       # Model selection (Pro)
```

### Grok (via x.com - requires X.com login in Chrome)
```bash
surf grok "what are the latest AI trends on X"    # Search X posts
surf grok "analyze @username recent activity"     # Profile analysis  
surf grok "summarize this page" --with-page       # Include page context
surf grok "find viral AI posts" --deep-search     # DeepSearch mode
surf grok "quick question" --model fast           # Models: auto, fast, expert, thinking (default)
```

**Grok Validation & Troubleshooting:**
```bash
# Validate Grok UI and check available models (no query sent)
surf grok --validate

# If models changed, save discovered models to surf.json config
surf grok --validate --save-models
```

### AI Tool Troubleshooting

When AI queries fail, check these common issues:

1. **Not logged in**: The error "login required" means you need to log into the service in Chrome
2. **Model selection failed**: The UI may have changed. Run `surf grok --validate` to check
3. **Response timeout**: Thinking models (ChatGPT o1, Grok thinking) can take 45+ seconds
4. **Element not found**: The service's UI changed. Check for surf-cli updates

**Debugging workflow for agents:**
```bash
# 1. Check if the service is accessible and UI is valid
surf grok --validate

# 2. If models mismatch, update the local settings
surf grok --validate --save-models

# 3. Retry with explicit model name from validation output
surf grok "query" --model <model-from-validation>

# 4. If still failing, try with longer timeout
surf grok "query" --timeout 600
```

## Tab Management

```bash
surf tab.list
surf tab.new "https://google.com"
surf tab.switch 12345
surf tab.close 12345
surf tab.reload                # Reload current tab

# Named tabs (aliases)
surf tab.name myapp            # Name current tab
surf tab.switch myapp          # Switch by name
surf tab.named                 # List named tabs
surf tab.unname myapp          # Remove name

# Tab groups
surf tab.group                 # Create/add to tab group
surf tab.ungroup               # Remove from group
surf tab.groups                # List all tab groups
```

## Window Management

```bash
surf window.list                              # List all windows
surf window.list --tabs                       # Include tab details
surf window.new                               # New window
surf window.new --url "https://example.com"   # New window with URL
surf window.new --incognito                   # New incognito window
surf window.new --unfocused                   # Don't focus new window
surf window.focus 12345                       # Focus window by ID
surf window.close 12345                       # Close window
surf window.resize --id 123 --width 1920 --height 1080
surf window.resize --id 123 --state maximized # States: normal, minimized, maximized, fullscreen
```

**Window isolation for agents:**
```bash
# Create isolated window for agent work
surf window.new "https://example.com"
# Returns window ID, use with subsequent commands:
surf --window-id 123 tab.list
surf --window-id 123 go "https://other.com"
```

## Input Methods

```bash
# CDP method (real events) - default
surf type --text "hello"
surf click --x 100 --y 200

# JS method (DOM manipulation) - for contenteditable
surf type --text "hello" --selector "#input" --method js

# Keys
surf key Enter
surf key "cmd+a"
surf key.repeat --key Tab --count 5           # Repeat key presses

# Hover and drag
surf hover --ref e5
surf drag --from-x 100 --from-y 100 --to-x 200 --to-y 200
```

## Page Inspection

```bash
surf page.read                 # Accessibility tree with refs + page text
surf page.read --no-text       # Interactive elements only (no text content)
surf page.read --ref e5        # Get specific element details
surf page.read --depth 3       # Limit tree depth
surf page.read --compact       # Minimal output for LLM efficiency
surf page.text                 # Plain text content only
surf page.state                # Modals, loading state, scroll info
```

## Semantic Element Location

Find and act on elements by role, text, or label instead of refs:

```bash
# Find by ARIA role
surf locate.role button --name "Submit" --action click
surf locate.role textbox --name "Email" --action fill --value "[email protected]"
surf locate.role link --all                    # Return all matches

# Find by text content
surf locate.text "Sign In" --action click
surf locate.text "Accept" --exact --action click

# Find form field by label
surf locate.label "Username" --action fill --value "john"
surf locate.label "Password" --action fill --value "secret"
```

**Actions:** `click`, `fill`, `hover`, `text` (get text content)

## Text Search

```bash
surf search "login"                    # Find text in page
surf search "Error" --case-sensitive   # Case-sensitive
surf search "button" --limit 5         # Limit results
surf find "login"                      # Alias for search
```

## Element Inspection

```bash
surf element.styles e5                 # Get computed styles by ref
surf element.styles ".card"            # Or by CSS selector
# Returns: font, color, background, border, padding, bounding box
```

## Scrolling

```bash
surf scroll.bottom
surf scroll.top  
surf scroll.to --y 500         # Scroll to Y position
surf scroll.to --ref e5        # Scroll element into view
surf scroll.by --y 200         # Scroll by amount
surf scroll.info               # Get scroll position
```

## Waiting

```bash
surf wait 2                    # Wait 2 seconds
surf wait.element ".loaded"    # Wait for element
surf wait.network              # Wait for network idle
surf wait.url "/success"       # Wait for URL pattern
surf wait.dom --stable 100     # Wait for DOM stability
surf wait.load                 # Wait for page load complete
```

## Dialog Handling

```bash
surf dialog.info               # Get current dialog type/message
surf dialog.accept             # Accept (OK)
surf dialog.accept --text "response"  # Accept prompt with text
surf dialog.dismiss            # Dismiss (Cancel)
```

## Device/Network Emulation

```bash
# Network throttling
surf emulate.network slow-3g   # Presets: slow-3g, fast-3g, 4g, offline
surf emulate.network reset     # Disable throttling

# CPU throttling  
surf emulate.cpu 4             # 4x slower
surf emulate.cpu 1             # Reset

# Device emulation (19 presets)
surf emulate.device "iPhone 14"
surf emulate.device "Pixel 7"
surf emulate.device --list     # List available devices

# Custom viewport
surf emulate.viewport --width 1280 --height 720
surf emulate.touch --enable    # Enable touch emulation

# Geolocation
surf emulate.geo --lat 37.7749 --lon -122.4194
surf emulate.geo --clear
```

## Form Automation

```bash
surf page.read                 # Get element refs first

# Fill by ref
surf form.fill --data '[{"ref":"e1","value":"John"},{"ref":"e2","value":"[email protected]"}]'

# Checkboxes: true/false
surf form.fill --data '[{"ref":"e7","value":true}]'

# Dropdown selection
surf select e5 "Option A"                    # By value (default)
surf select e5 "Option A" "Option B"         # Multi-select
surf select e5 --by label "Display Text"     # By visible label
surf select e5 --by index 2                  # By index (0-based)
```

## File Upload

```bash
surf upload --ref e5 --files "/path/to/file.txt"
surf upload --ref e5 --files "/path/file1.txt,/path/file2.txt"
```

## Iframe Handling

```bash
surf frame.list                # List frames with IDs
surf frame.switch "FRAME_ID"   # Switch to iframe context
surf frame.main                # Return to main frame
surf frame.js --id "FRAME_ID" --code "return document.title"

# After frame.switch, subsequent commands target that frame:
surf frame.switch "iframe-1"
surf page.read                 # Reads iframe content
surf click e5                  # Clicks in iframe
surf frame.main                # Back to main page
```

## Network Inspection

```bash
surf network                   # List captured requests
surf network --stream          # Real-time network events
surf network.get --id "req-123"   # Full request details
surf network.body --id "req-123"  # Get response body
surf network.curl --id "req-123"  # Generate curl command
surf network.origins           # List origins with stats
surf network.stats             # Capture statistics
surf network.export            # Export all requests
surf network.clear             # Clear captured requests
```

## Console

```bash
surf console                   # Get console messages
surf console --stream          # Real-time console
surf console --stream --level error  # Errors only
```

## JavaScript Execution

```bash
surf js "return document.title"
surf js "document.querySelector('.btn').click()"
```

## Performance

```bash
surf perf.metrics              # Current metrics snapshot
surf perf.start                # Start trace
surf perf.stop                 # Stop and get results
```

## Screenshots

```bash
surf screenshot                           # Auto-saves to /tmp/surf-snap-*.png
surf screenshot --output /tmp/shot.png    # Save to specific file
surf screenshot --selector ".card"        # Element only
surf screenshot --full-page               # Full page scroll capture
surf screenshot --no-save                 # Return base64 only, don't save file
```

## Zoom

```bash
surf zoom                      # Get current zoom level
surf zoom 1.5                  # Set zoom to 150%
surf zoom 1                    # Reset to 100%
```

## Cookies & Storage

```bash
surf cookie.list               # List cookies for current page
surf cookie.list --domain .google.com
surf cookie.set --name "token" --value "abc123"
surf cookie.get --name "token"
surf cookie.clear              # Clear all cookies
```

## History & Bookmarks

```bash
surf history --query "github" --max 20
surf bookmarks --query "docs"
surf bookmark.add --url "https://..." --title "My Bookmark"
surf bookmark.remove
```

## Health Checks & Smoke Tests

```bash
surf health --url "http://localhost:3000"
surf smoke --urls "http://localhost:3000" "http://localhost:3000/about"
surf smoke --urls "..." --screenshot /tmp/smoke
```

## Workflows

Execute multi-step browser automation as a single command with smart auto-waits.

### Inline Workflows

```bash
# Pipe-separated commands
surf do 'go "https://example.com" | click e5 | screenshot'

# Multi-step login flow
surf do 'go "https://example.com/login" | type "[email protected]" --selector "#email" | type "pass" --selector "#password" | click --selector "button[type=submit]"'

# Validate without executing
surf do 'go "url" | click e5' --dry-run
```

### Named Workflows

Save workflows as JSON files in `~/.surf/workflows/` (user) or `./.surf/workflows/` (project):

```bash
# List available workflows
surf workflow.list

# Show workflow details
surf workflow.info my-workflow

# Run by name with arguments
surf do my-workflow --email "[email protected]" --password "secret"

# Validate workflow file
surf workflow.validate workflow.json
```

### Workflow JSON Format

```json
{
  "name": "Login Flow",
  "description": "Automate login process",
  "args": {
    "email": { "required": true },
    "password": { "required": true },
    "url": { "default": "https://example.com/login" }
  },
  "steps": [
    { "tool": "navigate", "args": { "url": "%{url}" } },
    { "tool": "type", "args": { "text": "%{email}", "selector": "input[name=email]" } },
    { "tool": "type", "args": { "text": "%{password}", "selector": "input[name=password]" } },
    { "tool": "click", "args": { "selector": "button[type=submit]" } },
    { "tool": "screenshot", "args": {}, "as": "result" }
  ]
}
```

### Loops and Step Outputs

```json
{
  "steps": [
    // Capture step output for later use
    { "tool": "js", "args": { "code": "return [1,2,3]" }, "as": "items" },
    
    // Fixed iterations
    { "repeat": 5, "steps": [
      { "tool": "click", "args": { "ref": "e5" } }
    ]},
    
    // Iterate over array
    { "each": "%{items}", "as": "item", "steps": [
      { "tool": "js", "args": { "code": "console.log('%{item}')" } }
    ]},
    
    // Repeat until condition
    { "repeat": 20, "until": { "tool": "js", "args": { "code": "return done" } }, "steps": [...] }
  ]
}
```

### Workflow Options

```bash
--file, -f <path>     # Load from JSON file
--dry-run             # Parse and validate without executing
--on-error stop|continue  # Error handling (default: stop)
--step-delay <ms>     # Delay between steps (default: 100, 0 to disable)
--no-auto-wait        # Disable automatic waits
--json                # Structured JSON output
```

**Auto-waits:** Commands automatically wait for completion:
- Navigation (`go`, `back`, `forward`) → waits for page load
- Clicks, key presses, form fills → waits for DOM stability
- Tab switches → waits for tab to load

**Why use `do`?** Instead of 6-8 separate CLI calls with LLM orchestration between each, a workflow executes deterministically. Faster, cheaper, and more reliable.

## Error Diagnostics

```bash
# Auto-capture screenshot + console on failure
surf wait.element ".missing" --auto-capture --timeout 2000
# Saves to /tmp/surf-error-*.png
```

## Common Options

```bash
--tab-id <id>         # Target specific tab
--window-id <id>      # Target specific window
--json                # Raw JSON output  
--auto-capture        # Screenshot + console on error
--timeout <ms>        # Override default timeout
```

## Tips

1. **First CDP operation is slow** (~5-8s) - debugger attachment overhead, subsequent calls fast
2. **Use refs from page.read** for reliable element targeting over CSS selectors
3. **JS method for contenteditable** - Modern editors (ChatGPT, Claude, Notion) need `--method js`
4. **Named tabs for workflows** - `tab.name app` then `tab.switch app`
5. **Auto-capture for debugging** - `--auto-capture` saves diagnostics on failure
6. **AI tools use browser session** - Must be logged into the service, no API keys needed
7. **Grok validation** - Run `surf grok --validate` if queries fail to check UI changes
8. **Long timeouts for thinking models** - ChatGPT o1, Grok thinking can take 60+ seconds
9. **Use `surf do` for multi-step tasks** - Reduces token overhead and improves reliability
10. **Dry-run workflows first** - `surf do '...' --dry-run` validates without executing
11. **Window isolation** - Use `window.new` + `--window-id` to keep agent work separate from your browsing
12. **Semantic locators** - `locate.role`, `locate.text`, `locate.label` for more robust element finding
13. **Frame context** - Use `frame.switch` before interacting with iframe content

## Socket API

For programmatic access:

```bash
echo '{"type":"tool_request","method":"execute_tool","params":{"tool":"tab.list","args":{}},"id":"1"}' | nc -U /tmp/surf.sock
```

Overview

This skill exposes a battle-tested CLI to control Chrome for testing, automation, and debugging. It provides zero-config browser control for navigating pages, interacting with elements, taking screenshots, inspecting network/console, emulating devices, and issuing AI queries using logged-in browser sessions. The CLI is agent-agnostic and designed for fast, repeatable automation across development and QA workflows.

How this skill works

Commands talk to a running Chrome instance via CLI or Unix socket and use DevTools to drive navigation, input, screenshots, and network inspection. It reads page structure into element refs, supports semantic locating (role/label/text), and streams DevTools events for real-time debugging. AI queries run through the browser session so no API keys are required, provided you are logged into the target service in Chrome.

When to use it

  • Automating form fills, login flows, and repetitive UI tasks from scripts or CI
  • Capturing screenshots, full-page snapshots, or element-level images for visual tests
  • Inspecting network requests, console logs, and performance traces during debugging
  • Emulating devices, network/CPU conditions, geolocation, or touch input for QA
  • Running agent workflows that need controlled, isolated browser windows or tabs
  • Querying ChatGPT/Gemini/Perplexity/Grok via a logged-in Chrome session without API keys

Best practices

  • Call page.read first to generate stable element refs before interacting
  • Prefer semantic locate (role/label/text) over brittle selectors when possible
  • Use window isolation to create per-agent contexts and avoid cross-contamination
  • Validate AI UI bindings (e.g., surf grok --validate) when model selection fails
  • Record network or perf traces for intermittent issues and increase timeouts for slow models

Example use cases

  • Run a named workflow to log into an app, exercise features, and snapshot results for CI
  • Automate a multi-field form submission using form.fill or locate.label commands
  • Stream network events while reproducing a bug to capture failing requests and response bodies
  • Throttling network and CPU to test app behavior on slow mobile conditions
  • Ask Gemini or ChatGPT to summarize the current page context using --with-page for richer AI responses

FAQ

Do I need API keys to query AI models?

No. AI queries use the browser's logged-in session; you must be signed into the service in Chrome.

How do I handle UI changes that break commands?

Run validation (e.g., surf grok --validate) to inspect UI selectors, update local settings, or use semantic locate to target accessible roles and labels.

Can I run workflows in CI?

Yes. Workflows support dry-run validation, named JSON workflows, and can run headless or in isolated windows depending on environment setup.