
stagehand-expert skill


This skill orchestrates browser automation and direct CDP communication to accelerate autonomous agents built on Stagehand V3.

npx playbooks add skill yuniorglez/gemini-elite-core --skill stagehand-expert

SKILL.md

---
name: stagehand-expert
id: stagehand-expert
version: 4.1.0
description: "Master Architect in Stagehand V3. Expert in Direct CDP Automation, Decision Caching, and Agentic Web Orchestration for 2026."
---

# 🎭 Skill: Stagehand Expert (v4.1.0)

## Executive Summary
The `stagehand-expert` skill is a specialist in browser automation and high-precision agent orchestration. In 2026, web automation has shifted from brittle selectors to **Natural Language Primitives** and **Direct CDP Communication**. This skill focuses on mastering **Stagehand V3**, leveraging **Decision Caching** to run CI/CD without LLM inference costs, and navigating complex **Shadow DOM/iframe** structures, where it claims a 44% speedup.

---

## 📋 Table of Contents
1. [Proactive Investigation Protocol](#proactive-investigation-protocol)
2. [The "Do Not" List (Anti-Patterns)](#the-do-not-list-anti-patterns)
3. [Core Primitives Mastery](#core-primitives-mastery)
4. [Direct CDP & Performance](./references/cdp-direct-communication.md)
5. [Advanced Decision Caching](#advanced-decision-caching)
6. [Autonomous Agents & CUA](#autonomous-agents--cua)
7. [Reference Library](#reference-library)

---

## 🔍 Proactive Investigation Protocol

Before writing a single test, the expert MUST perform a **Deep Discovery**:
1.  **Route Mapping**: Identify the user flow from `page.tsx` or router configs.
2.  **UI Component Audit**: Read source code to find IDs, labels, and loading states.
3.  **Vibe Check**: Measure layout stability using the CDP "Vibe Score."
4.  **Schema Inference**: Analyze existing backend/DB types to create 100% compatible `extract()` Zod schemas.
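
Step 4 can be sketched without any library. The `Invoice` type and the `parseInvoice` function below are hypothetical stand-ins, not part of Stagehand: in a real project the runtime check would be a Zod schema handed to `extract()`. The hand-rolled version only illustrates the idea of keeping the runtime shape and the backend type in lockstep, so that a mismatch fails at extraction time rather than deep inside a test.

```typescript
// Hypothetical backend type; in practice this would come from an ORM or API layer.
interface Invoice {
  id: string;
  totalCents: number;
  paid: boolean;
}

// Runtime check that mirrors the Invoice type field by field.
// Returns a typed value with unknown keys dropped, or throws naming the bad field.
function parseInvoice(raw: unknown): Invoice {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("expected an object");
  }
  const r = raw as Record<string, unknown>;
  if (typeof r.id !== "string") {
    throw new Error("id must be a string");
  }
  if (typeof r.totalCents !== "number" || !Number.isInteger(r.totalCents)) {
    throw new Error("totalCents must be an integer");
  }
  if (typeof r.paid !== "boolean") {
    throw new Error("paid must be a boolean");
  }
  return { id: r.id, totalCents: r.totalCents, paid: r.paid };
}
```

Because the return type is the backend type itself, the compiler flags the function as soon as `Invoice` gains or renames a field.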

---

## 🚫 The "Do Not" List (Anti-Patterns)

| Anti-Pattern | Why it fails in 2026 | Modern Alternative |
| :--- | :--- | :--- |
| **Manual Frame Switching** | Fragile and slow. | Use **DeepLocator (>>) & CDP**. |
| **Hardcoded Wait(2000)** | Unreliable and causes jank. | Use **`domSettleTimeout`**. |
| **Missing finally { close() }** | Leaves zombie processes. | **Mandatory `try...finally`**. |
| **LLM Calls in CI** | Slow and expensive. | Use **Persistent Decision Caches**. |
| **Ignoring CSS Animations** | Interactions fail during transitions. | Use **Reanimated-aware Waiters**. |
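
To make the alternative to a hardcoded wait concrete, here is a minimal sketch of a settle-style waiter. It is not how Stagehand implements `domSettleTimeout`; `waitUntilStable`, its options, and the `probe` callback are all hypothetical names chosen for illustration. The probe is assumed to return a cheap primitive fingerprint of the page, such as a node count or a hash of the DOM.

```typescript
// Polls `probe` until its value has stayed unchanged for `quietMs`,
// and rejects once `timeoutMs` has elapsed without the page settling.
// Values are compared with !==, so `probe` should return a primitive.
async function waitUntilStable<T>(
  probe: () => Promise<T>,
  opts: { quietMs: number; timeoutMs: number; intervalMs: number },
): Promise<T> {
  const sleep = (ms: number) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms));
  const start = Date.now();
  let last = await probe();
  let stableSince = Date.now();
  while (Date.now() - stableSince < opts.quietMs) {
    if (Date.now() - start > opts.timeoutMs) {
      throw new Error(`page did not settle within ${opts.timeoutMs} ms`);
    }
    await sleep(opts.intervalMs);
    const current = await probe();
    if (current !== last) {
      // Something changed: restart the quiet window.
      last = current;
      stableSince = Date.now();
    }
  }
  return last;
}
```

Unlike a fixed sleep, this returns as soon as the page is quiet and fails loudly when it never becomes quiet, which is the behavior the table asks for.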

---

## ⚡ Core Primitives Mastery

-   **Act**: Precise natural language instructions with mapped variables.
-   **Observe**: Identifies all relevant page elements in a single turn, which this skill credits with a 70% cost reduction.
-   **Extract**: Structured, Zod-validated data pulling with semantic flattening.
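
The "mapped variables" idea behind Act can be illustrated with a small substitution helper. This is a sketch under assumptions: it assumes a `%name%` placeholder convention, and `fillPlaceholders` is a hypothetical function, not a Stagehand API. The point is that the instruction text stays constant while the values vary, which keeps prompts reusable and keeps secrets out of the instruction string.

```typescript
// Replaces each %name% placeholder with its value from `variables`.
// Throws on a placeholder with no value so that a typo fails loudly.
function fillPlaceholders(
  instruction: string,
  variables: Record<string, string>,
): string {
  return instruction.replace(/%(\w+)%/g, (_match: string, name: string) => {
    const value = Object.prototype.hasOwnProperty.call(variables, name)
      ? variables[name]
      : undefined;
    if (value === undefined) {
      throw new Error(`no value supplied for %${name}%`);
    }
    return value;
  });
}
```

For example, `fillPlaceholders("Type %email% into the email field", { email: "qa@example.com" })` yields the instruction with the address filled in, while the template itself can be cached or logged safely.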

---

## 💾 Advanced Decision Caching

Transform E2E tests into a deterministic asset:
-   **Develop Locally**: Live LLM generates the cache.
-   **Commit Cache**: Store DOM snapshots and results in Git.
-   **Zero-Cost CI**: Run tests in "Cached-Only" mode.

*See [References: Agent Caching](./references/advanced-agent-caching.md) for details.*
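
The three steps above can be sketched as a small in-memory cache. Everything here is illustrative: `DecisionCache`, `Decision`, and the key scheme are hypothetical and do not describe Stagehand's actual cache format. The non-cryptographic hash is used only to keep the example dependency-free; a real implementation would likely use a stronger hash such as SHA-256.

```typescript
// What the model chose to do for a given instruction and page state.
interface Decision {
  selector: string;
  method: string;
}

// FNV-1a style 32-bit hash over UTF-16 code units; stable across runs.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

class DecisionCache {
  private entries = new Map<string, Decision>();
  private cachedOnly: boolean;

  constructor(cachedOnly: boolean) {
    this.cachedOnly = cachedOnly;
  }

  // Key on instruction and DOM snapshot so that UI changes invalidate the entry.
  private key(instruction: string, domSnapshot: string): string {
    return fnv1a(instruction + "\u0000" + domSnapshot);
  }

  // Returns the cached decision, or calls `decide` (the live LLM stand-in) on a miss.
  // In cached-only mode a miss is an error instead of a silent LLM call.
  async resolve(
    instruction: string,
    domSnapshot: string,
    decide: () => Promise<Decision>,
  ): Promise<Decision> {
    const k = this.key(instruction, domSnapshot);
    const hit = this.entries.get(k);
    if (hit !== undefined) {
      return hit;
    }
    if (this.cachedOnly) {
      throw new Error(`cache miss in cached-only mode: "${instruction}"`);
    }
    const decision = await decide();
    this.entries.set(k, decision);
    return decision;
  }

  // Serialize for committing to Git.
  dump(): string {
    return JSON.stringify(Array.from(this.entries.entries()));
  }

  // Restore a committed cache, for example at the start of a CI run.
  load(json: string): void {
    this.entries = new Map(JSON.parse(json) as [string, Decision][]);
  }
}
```

Locally the cache is built with `new DecisionCache(false)` and written out with `dump()`; CI loads that file into `new DecisionCache(true)`, so any page change that alters the snapshot surfaces as an explicit miss rather than an unplanned inference call.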

---

## 🤖 Autonomous Agents & CUA

For the most complex UIs (Cross-origin iframes, dynamic canvas):
-   **Computer Use Agent (CUA)**: Pure visual reasoning for impossible-to-parse elements.
-   **Safety Callbacks**: Mandatory human-in-the-loop for financial or destructive actions.
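
A human-in-the-loop gate of this kind might look like the sketch below. The names `PlannedAction`, `Approver`, and `executeGuarded` are hypothetical, and how an action gets classified as risky is assumed to happen elsewhere (by the planner or a rule set). The example shows only the control flow: risky actions never run unless the approver resolves to `true`.

```typescript
interface PlannedAction {
  description: string;
  // Classification is assumed to be supplied by the caller.
  risk: "safe" | "destructive" | "financial";
}

// Asks a human (or a policy service) whether the action may proceed.
type Approver = (action: PlannedAction) => Promise<boolean>;

// Runs `run` only if the action is safe or the approver says yes.
async function executeGuarded<T>(
  action: PlannedAction,
  approve: Approver,
  run: () => Promise<T>,
): Promise<T> {
  if (action.risk !== "safe") {
    const approved = await approve(action);
    if (!approved) {
      throw new Error(`blocked by reviewer: ${action.description}`);
    }
  }
  return run();
}
```

Throwing on rejection, rather than silently skipping, keeps a denied payment or deletion visible in the agent's trace.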

---

## 📖 Reference Library

Detailed deep-dives into Stagehand Excellence:

- [**Direct CDP Communication**](./references/cdp-direct-communication.md): Velocity and deep access.
- [**Agent Caching**](./references/advanced-agent-caching.md): Determinism and cost savings.
- [**Shadow DOM Mastery**](./references/shadow-dom-iframe-mastery.md): Jumping document boundaries.
- [**Installation & Setup**](./references/setup-guide-v3.md): The Bun/Playwright stack.

---

*Updated: January 22, 2026 - 21:20*

Overview

This skill is the Stagehand Expert: a master architect for Stagehand V3 focused on high-velocity browser automation, Direct CDP communication, and deterministic Decision Caching. It packages tactical primitives and protocols that convert flaky end-to-end suites into fast, reproducible agentic workflows. The emphasis is on Natural Language Primitives, robust Shadow DOM/iframe handling, and zero-cost CI via cached decisions.

How this skill works

The skill inspects application routing, UI component code, and backend schemas to build reliable interaction maps before any test runs. It uses Direct CDP calls and DeepLocator primitives to interact with frames and shadow roots, and applies the Observe, Act, and Extract primitives for precise actions and structured data extraction. It then produces decision caches (DOM snapshots plus outcomes) that can be committed and replayed in CI. Safety callbacks and visual agents handle edge cases such as cross-origin iframes or dynamic canvases.

When to use it

Converting flaky selector-based E2E tests into deterministic, maintainable suites.
Optimizing CI costs by running tests in Cached-Only mode with committed decision caches.
Automating complex UIs with Shadow DOM, nested iframes, or heavy animations.
Building agentic workflows that require programmatic CDP access and performance tuning.
Creating schema-validated extracts for integration tests and contract checks.

Best practices

Run a Deep Discovery: map routes, audit UI components, and infer schemas before writing automation.
Prefer Direct CDP and DeepLocator over manual frame switching and brittle selectors.
Generate and commit Decision Caches during local development to enable zero-cost CI.
Use try...finally patterns to always close browser contexts and avoid zombie processes.
Replace fixed sleeps with domSettleTimeout and reanimated-aware waiters for animation-safe interactions.
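
The try...finally practice above can be wrapped in a small helper so that cleanup is never forgotten. This is a minimal sketch: `Closable` and `withSession` are hypothetical names, and the session is assumed to be any object exposing an async `close()`, such as an initialized Stagehand instance.

```typescript
// Minimal stand-in for anything that holds a browser process open.
interface Closable {
  close(): Promise<void>;
}

// Opens a session, runs `task` against it, and always closes the session,
// whether the task resolves or throws.
async function withSession<S extends Closable, T>(
  open: () => Promise<S>,
  task: (session: S) => Promise<T>,
): Promise<T> {
  const session = await open();
  try {
    return await task(session);
  } finally {
    await session.close();
  }
}
```

Because `close()` runs in the `finally` branch, a failing assertion or a thrown timeout still releases the browser, and the original error continues to propagate to the test runner.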

Example use cases

Create deterministic regression tests by capturing DOM snapshots and replaying them in CI without LLM calls.
Automate a multi-frame checkout flow that spans cross-origin iframes using DeepLocator + CDP.
Extract and Zod-validate complex form data from dynamic single-page apps for contract tests.
Deploy an autonomous visual CUA agent to interact with canvas-based UIs and return structured outcomes.
Speed up flaky onboarding flows by replacing selector waits with Observe-driven element identification.

FAQ

How does Decision Caching reduce CI costs?

Decision Caching stores DOM snapshots and action outcomes generated locally; CI runs in Cached-Only mode and replays results without live LLM calls, eliminating per-run inference costs.

When should I use a Computer Use Agent (CUA)?

Use a CUA for elements that defy DOM parsing—dynamic canvases, visual-only controls, or cross-origin frames—paired with human-in-the-loop safety callbacks for destructive actions.