screen-monitor skill

/skills/emasoudy/screen-monitor

This skill gives an agent two ways to see your screen: a lightweight WebRTC share for quick visual checks, or a browser relay with full control of a tab for debugging and UI automation.

npx playbooks add skill openclaw/skills --skill screen-monitor

SKILL.md
---
name: screen-monitor
description: Dual-mode screen sharing and analysis. Model-agnostic (Gemini/Claude/Qwen3-VL).
metadata: {"clawdbot":{"emoji":"🖥️","requires":{"model_features":["vision"]}}}
---

# Screen Monitor

This skill provides two ways for the agent to see and interact with your screen.

## 🟢 Path A: Fast Share (WebRTC)
*Best for: Quick visual checks, restricted browsers, or non-technical environments.*

### Tools
- **`screen_share_link`**: Generates a local WebRTC portal URL.
- **`screen_analyze`**: Captures the current frame from the portal and analyzes it with vision.

**Usage:**
```bash
# Get the link
bash command:"{baseDir}/references/get-share-url.sh"

# Analyze
bash command:"{baseDir}/references/screen-analyze.sh"
```

---

## 🔵 Path B: Full Control (Browser Relay)
*Best for: Deep debugging, UI automation, and clicking/typing in tabs.*

### Setup
1. Run `clawdbot browser extension install`.
2. Load the unpacked extension from `clawdbot browser extension path`.
3. Click the Clawdbot icon in your Chrome toolbar to **Attach**.
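
Steps 1 and 2 can be scripted. The following is a minimal sketch, not part of the skill: the function name `setup_extension` is hypothetical, and it assumes that `clawdbot browser extension path` prints the extension directory on stdout. Step 3 (clicking the toolbar icon to attach) still has to be done by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helper covering Setup steps 1 and 2.
# ASSUMPTION: `clawdbot browser extension path` prints the directory on stdout.
setup_extension() {
  # Stop early if the CLI is not available.
  if ! command -v clawdbot >/dev/null 2>&1; then
    echo "clawdbot not found on PATH" >&2
    return 1
  fi

  # Step 1: install the extension files.
  clawdbot browser extension install || return 1

  # Step 2: look up where the unpacked extension lives.
  local ext_dir
  ext_dir="$(clawdbot browser extension path)" || return 1

  echo "Load this directory via chrome://extensions (Load unpacked): ${ext_dir}"
}
```

Call `setup_extension`, then open `chrome://extensions`, enable Developer mode, and choose "Load unpacked" with the printed directory.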

### Tools
- **`browser action:snapshot`**: Take a precise screenshot of the attached tab.
- **`browser action:click`**: Interact with elements (requires `profile="chrome"`).

---

## Technical Details
- **Port**: 18795 (WebRTC Backend)
- **Files**: 
  - `web/screen-share.html`: The sharing portal.
  - `references/backend-endpoint.js`: Frame storage server.
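
To confirm that something is listening on the backend port before sharing, a quick check along these lines may help. It is a sketch under assumptions: the helper names `port_open` and `backend_status` are hypothetical, the backend is assumed to listen on `127.0.0.1`, and the check relies on bash's built-in `/dev/tcp` redirection, so it needs bash rather than a plain POSIX shell.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a TCP connection to host:port can be opened.
port_open() {
  local host="$1" port="$2"
  # Open the connection in a subshell so the descriptor is closed on exit;
  # connection errors are silenced and reported through the exit status.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Reports whether the WebRTC backend port (default 18795) accepts connections.
# ASSUMPTION: the backend listens on 127.0.0.1.
backend_status() {
  local port="${1:-18795}"
  if port_open 127.0.0.1 "$port"; then
    echo "backend reachable on port ${port}"
  else
    echo "nothing listening on port ${port}"
  fi
}
```

Running `backend_status` with no argument checks port 18795. The check only shows that a listener exists; it does not verify that the listener is the frame storage server.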

Overview

This skill provides dual-mode screen sharing and analysis for agents, supporting both quick visual checks and full browser control. It is model-agnostic and works with Gemini, Claude, Qwen3-VL, and similar models. Use the fast WebRTC share for lightweight inspections or the browser relay for deep debugging and UI automation.

How this skill works

Path A (Fast Share) creates a local WebRTC portal for sharing your screen. A local backend on port 18795 stores the shared frames, and the agent captures the current frame and analyzes it with a vision-capable model. Path B (Full Control) attaches to a Chrome tab through a browser extension and exposes browser actions such as precise snapshots, clicks, and typing to the agent.

When to use it

  • Quickly show the current screen for visual analysis or troubleshooting (low friction).
  • Inspect static UI elements or read text on screen without browser automation.
  • Perform deep debugging, automated clicking, or typed input inside a Chrome tab.
  • Work in restricted browsers where only a WebRTC share is possible.
  • Run UI automation sequences that require element-level interactions and stateful control.
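
The choice between the two paths can be summarized as a small decision helper. This is an illustrative sketch only: `choose_path` and its two yes/no arguments are hypothetical and are not part of the skill.

```bash
#!/usr/bin/env bash
# Hypothetical decision helper: prints "A" (Fast Share) or "B" (Full Control).
#   $1 = "yes" if the task needs clicks, typing, or other element-level actions
#   $2 = "yes" if the browser extension can be installed and attached
choose_path() {
  local needs_interaction="$1" can_install_extension="$2"
  if [ "$needs_interaction" = "yes" ] && [ "$can_install_extension" = "yes" ]; then
    echo "B"
  else
    # Path A lists no click or typing tools, so warn when interaction was requested.
    if [ "$needs_interaction" = "yes" ]; then
      echo "note: Path A is for viewing and analysis; install the extension to click or type" >&2
    fi
    echo "A"
  fi
}
```

For example, `choose_path yes yes` prints `B`, while `choose_path no yes` prints `A`.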

Best practices

  • Use Path A (Fast Share) for rapid checks or when users cannot install extensions.
  • Use Path B (Full Control) when you need precise element actions, reproducible clicks, or typing.
  • Before attaching, run `clawdbot browser extension install` and load the unpacked extension from the directory reported by `clawdbot browser extension path`.
  • Keep the WebRTC backend reachable on port 18795, and check that the local firewall does not block that port.
  • Limit shared content to the necessary windows or tabs to protect privacy.

Example use cases

  • Customer support quickly captures a screenshot to identify a UI bug using the WebRTC portal.
  • Automated test script clicks through a signup flow by attaching the browser relay and invoking click actions.
  • Accessibility audit: capture frames and analyze text contrast and layout via vision analysis.
  • Remote pair-debugging: attach the extension to reproduce and fix a JavaScript error in a specific tab.
  • Periodic backups of UI states by taking snapshots of key pages and archiving frames.

FAQ

Which mode should I choose for minimal setup?

Choose Path A (Fast Share). It needs only the WebRTC portal link and works without installing an extension.

What browser is required for full control?

Full control requires Chrome with the Clawdbot browser extension installed and attached to the target tab. Actions such as click must be invoked with profile="chrome".