This skill enables fast screen sharing or full browser control for debugging and UI automation, with simple WebRTC access or a browser relay.
npx playbooks add skill openclaw/skills --skill screen-monitor
---
name: screen-monitor
description: Dual-mode screen sharing and analysis. Model-agnostic (Gemini/Claude/Qwen3-VL).
metadata: {"clawdbot":{"emoji":"🖥️","requires":{"model_features":["vision"]}}}
---
# Screen Monitor
This skill provides two ways for the agent to see and interact with your screen.
## 🟢 Path A: Fast Share (WebRTC)
*Best for: Quick visual checks, restricted browsers, or non-technical environments.*
### Tools
- **`screen_share_link`**: Generates a local WebRTC portal URL.
- **`screen_analyze`**: Captures the current frame from the portal and analyzes it with vision.
**Usage:**
```bash
# Get the link
bash command:"{baseDir}/references/get-share-url.sh"
# Analyze
bash command:"{baseDir}/references/screen-analyze.sh"
```
---
## 🔵 Path B: Full Control (Browser Relay)
*Best for: Deep debugging, UI automation, and clicking/typing in tabs.*
### Setup
1. Run `clawdbot browser extension install`.
2. In Chrome, load the unpacked extension from the directory printed by `clawdbot browser extension path`.
3. Click the Clawdbot icon in your Chrome toolbar to **Attach**.
### Tools
- **`browser action:snapshot`**: Take a precise screenshot of the attached tab.
- **`browser action:click`**: Interact with elements (requires `profile="chrome"`).
---
## Technical Details
- **Port**: 18795 (WebRTC Backend)
- **Files**:
  - `web/screen-share.html`: The sharing portal.
  - `references/backend-endpoint.js`: Frame storage server.
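Before sharing a link, it can help to confirm that something is actually listening on the backend port. The following is a minimal sketch, not part of the skill itself: only port 18795 comes from this document, while the `127.0.0.1` host and the `is_port_open` helper are assumptions made for illustration.

```python
import socket

WEBRTC_BACKEND_PORT = 18795  # port listed above; the host below is an assumption


def is_port_open(host: str = "127.0.0.1",
                 port: int = WEBRTC_BACKEND_PORT,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open() else "not reachable"
    print(f"WebRTC backend on port {WEBRTC_BACKEND_PORT} is {state}")
```

A successful TCP connection only shows that a process is bound to the port; it does not prove that the process is the frame storage server.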
This skill provides dual-mode screen sharing and analysis for agents, supporting both quick visual checks and full browser control. It is model-agnostic and works with Gemini, Claude, Qwen3-VL, and similar models. Use the fast WebRTC share for lightweight inspections or the browser relay for deep debugging and UI automation.
Path A (Fast Share) creates a local WebRTC portal that serves a live frame; the agent can capture and analyze the current frame with vision tools. Path B (Full Control) attaches to a Chrome tab via a browser extension and exposes browser actions like precise snapshots, clicks, and typing to the agent. A local backend stores frames and serves them over port 18795 for the WebRTC flow.
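The "store frames and serve them" idea from Path A can be sketched as a tiny latest-frame server. This is an illustration under stated assumptions, not the skill's actual `references/backend-endpoint.js`, which is not shown here: the `/frame` path, the `FrameStore` class, the `start_server` helper, and the choice to keep only the most recent frame in memory are all hypothetical. Only port 18795 is taken from this document.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class FrameStore:
    """Thread-safe holder for the most recent frame (hypothetical design)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None

    def put(self, data: bytes) -> None:
        with self._lock:
            self._frame = data

    def get(self):
        with self._lock:
            return self._frame


def make_handler(store: FrameStore):
    class FrameHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # The sharing page would upload a captured frame here (assumed path).
            if self.path != "/frame":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            store.put(self.rfile.read(length))
            self.send_response(204)
            self.end_headers()

        def do_GET(self):
            # The analysis step would fetch the current frame here (assumed path).
            if self.path != "/frame":
                self.send_error(404)
                return
            frame = store.get()
            if frame is None:
                self.send_error(404, "No frame captured yet")
                return
            self.send_response(200)
            self.send_header("Content-Type", "application/octet-stream")
            self.send_header("Content-Length", str(len(frame)))
            self.end_headers()
            self.wfile.write(frame)

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return FrameHandler


def start_server(port: int = 18795, host: str = "127.0.0.1"):
    """Start the server in a daemon thread and return (server, store)."""
    store = FrameStore()
    server = ThreadingHTTPServer((host, port), make_handler(store))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, store
```

Keeping only the latest frame matches the "capture the current frame" wording above and keeps memory use bounded; a real backend may well store frames differently.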
Which mode should I choose for minimal setup?
Choose Path A (Fast Share). It needs only the WebRTC portal link and works without installing an extension.
What browser is required for full control?
Full control requires Chrome (`profile="chrome"`) with the clawdbot browser extension installed and attached.