home / skills / nicobailon / surf-cli / surf-codebase

surf-codebase skill

/.claude/skills/surf-codebase

This skill helps you navigate and modify the surf-cli codebase, enabling browser automation, CDP access, and AI agent orchestration.

npx playbooks add skill nicobailon/surf-cli --skill surf-codebase

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
3.6 KB
---
name: surf-codebase
description: Navigate and modify surf-cli codebase - Chrome extension + native host for AI browser automation. Use for surf-cli code work, architecture questions, implementing browser control/CDP/accessibility/network features.
---

# surf-cli Codebase

## Architecture
```
cli.cjs --socket:/tmp/surf.sock--> host.cjs --native-msg--> service-worker/index.ts --CDP/chrome-APIs--> browser
```

## Add CLI Command

1. Add to `TOOLS` in native/cli.cjs:158 (args, opts, examples)
2. Add handler in src/service-worker/index.ts:50 `handleMessage()` switch
3. CDP op? → add method src/cdp/controller.ts:60
4. DOM interaction? → add handler src/content/accessibility-tree.ts:99

## Add CDP Operation

1. Add method to `CDPController` class src/cdp/controller.ts:60
2. Use `this.send(tabId, "Domain.method", params)`
3. Handle events in `handleCDPEvent()` if needed

## Core Files

**src/service-worker/index.ts** (~2500L) - Central msg router, CDP ops, screenshot cache, tab registry
- Msg types: EXECUTE_CLICK, EXECUTE_TYPE, READ_PAGE, EXECUTE_SCREENSHOT
- Screenshot cache: generateScreenshotId(), cacheScreenshot(), getScreenshot()
- Tab names: tabNameRegistry Map<name,tabId>

**src/cdp/controller.ts** (~1000L) - CDP wrapper, CDPController class
- Mouse: click(), rightClick(), doubleClick(), hover(), drag()
- Keyboard: type(), pressKey(), pressKeyChord()
- Screenshots: captureScreenshot(), captureRegion()
- Network/Console: enableNetworkTracking(), getNetworkRequests(), getResponseBody(), subscribeToNetwork(), enableConsoleTracking(), getConsoleMessages()
- Emulation: emulateNetwork(), emulateCPU(), emulateGeolocation()

**src/content/accessibility-tree.ts** (~1900L) - Content script, generates a11y tree YAML, element interactions
- Handlers: GENERATE_ACCESSIBILITY_TREE, CLICK_ELEMENT, FORM_INPUT, GET_ELEMENT_COORDINATES, WAIT_FOR_ELEMENT, WAIT_FOR_URL
- Element refs: e1, e2... in window.__piRefs for stable references

**native/cli.cjs** (~2100L) - CLI parser, socket client
- TOOLS: command defs with args/opts
- ALIASES: shortcut→command map
- AUTO_SCREENSHOT_TOOLS: commands that auto-capture
- parseArgs(), sendRequest()

**native/host.cjs** (~2100L) - Socket server, AI integration
- handleToolRequest(): main dispatcher
- mapToolToMessage(): tool→extension msg converter
- queueAiRequest(): AI request serialization
- AI clients: chatgptClient, geminiClient, perplexityClient

**src/native/port-manager.ts** - Extension↔native messaging, auto-reconnect, request/response tracking

## Message Flow

1. CLI: `surf click e5`
2. cli.cjs parses → socket → host.cjs
3. host.cjs routes to EXECUTE_CLICK msg
4. Native msg → service-worker/index.ts
5. Service worker: CDP (cdp/controller.ts) or content script (chrome.tabs.sendMessage)
6. Response flows back to CLI

## Debug

- Extension logs: chrome://extensions → Service Worker → Console
- Host logs: /tmp/surf-host.log
- CLI raw output: `--json` flag

## Test

```bash
npm run build            # Build ext
npm test                 # Run tests
npm run test:coverage    # Coverage
```

## Element Refs

A11y tree assigns e1, e2... during READ_PAGE. Stored window.__piRefs. Reset each `read` command.

## Network Capture

- CDP captures via Network.* events cdp/controller.ts:73
- Storage: /tmp/surf/requests.jsonl
- Formatters: native/formatters/network.cjs:259

## Build/Install

```bash
npm run dev              # Watch mode
npm run build            # Prod → dist/
# Load dist/ as unpacked ext
surf install <ext-id>    # Register native host
```

## Gotchas

- CDP attach: 100-500ms first time/tab
- chrome:// pages restricted
- Content script needs page refresh on existing tabs
- Screenshots auto-resize 1200px unless --full

Overview

This skill lets you navigate and modify the surf-cli codebase that powers AI-driven Chrome automation via a CLI, a native host, and a browser extension. It focuses on implementing and debugging features around Chrome DevTools Protocol (CDP), accessibility-based DOM interactions, network/console capture, and CLI tooling. Use it to add commands, CDP operations, or content-script behaviors and to reason about architecture and message flow.

How this skill works

The codebase routes CLI commands through a native host into a service worker which dispatches either CDP operations or content-script messages to the page. Key components include a CDP controller for browser control, a content-script that generates accessibility trees and stable element refs, and a CLI layer that maps tools to extension messages. Screenshot caching, tab registry, and network/console tracking are implemented centrally in the service worker and CDP controller.

When to use it

  • Adding a new surf-cli command that triggers browser actions from the CLI.
  • Implementing a new CDP operation such as emulation, network manipulation, or custom screenshot capture.
  • Building DOM interactions or accessibility-based selectors that need stable element references.
  • Debugging message flow, CDP attach latency, or native-extension IPC issues.
  • Extending network/console capture and storage or adding formatters for captured data.

Best practices

  • Add CLI entries in the native/cli TOOLS map and create matching handlers in the service worker’s message router.
  • Keep CDP methods on the CDPController and call this.send(tabId, 'Domain.method', params) to maintain consistent wiring.
  • Use accessibility tree handlers and window.__piRefs for stable element references instead of brittle selectors.
  • Handle CDP events in handleCDPEvent when operations require asynchronous event tracking (network, loading, console).
  • Account for CDP attach latency (100–500ms) and chrome:// restrictions; document when refresh is needed for content scripts.

Example use cases

  • Add a 'click-and-screenshot' CLI tool: register in TOOLS, route to EXECUTE_CLICK, then call captureScreenshot in CDPController.
  • Implement network throttling: add emulateNetwork in CDPController and expose a CLI flag to trigger it via the host.
  • Create a form autofill command: use GENERATE_ACCESSIBILITY_TREE to locate fields, then FORM_INPUT via the content script.
  • Extend request logging: subscribeToNetwork in CDPController, format entries using native formatters, and write to /tmp/surf/requests.jsonl.

FAQ

Why do some commands require a page refresh?

Content scripts must be injected into the page; for existing tabs a refresh is required to load the updated content script bundle.

Where are network captures stored and how to format them?

Network events are collected via Network.* CDP events and stored under /tmp/surf/requests.jsonl; native formatters produce readable output.

How do I add a new CDP event handler?

Add a method to CDPController and handle the event in handleCDPEvent to update state or forward messages to the service worker.