This skill helps you automate browser tasks using Gemini Computer Use with Playwright, enabling goal-driven actions, safety prompts, and action loops.
To add this skill to your agents, run `npx playbooks add skill am-will/codex-skills --skill gemini-computer-use`.
---
name: gemini-computer-use
description: Build and run Gemini 2.5 Computer Use browser-control agents with Playwright. Use when a user wants to automate web browser tasks via the Gemini Computer Use model, needs an agent loop (screenshot → function_call → action → function_response), or asks to integrate safety confirmation for risky UI actions.
---
# Gemini Computer Use
## Quick start
1. Copy the env template, set your API key in it, and source it:
```bash
cp env.example env.sh
$EDITOR env.sh
source env.sh
```
2. Create a virtual environment and install dependencies:
```bash
python -m venv .venv
source .venv/bin/activate
pip install google-genai playwright
playwright install chromium
```
3. Run the agent script with a prompt:
```bash
python scripts/computer_use_agent.py \
--prompt "Find the latest blog post title on example.com" \
--start-url "https://example.com" \
--turn-limit 6
```
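The flags used in step 3 suggest a command-line interface along the following lines. This is a hypothetical sketch, not the parser in `scripts/computer_use_agent.py`: the flag names come from this document, but the defaults, the help strings, and the repeatable form of `--exclude` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the flags named in this document."""
    parser = argparse.ArgumentParser(description="Run a browser-control agent")
    parser.add_argument("--prompt", required=True, help="Goal for the agent")
    # Assumed default: start on a blank page when no URL is given.
    parser.add_argument("--start-url", default="about:blank", help="First page to open")
    # Assumed default: matches the value used in the quick start.
    parser.add_argument("--turn-limit", type=int, default=6, help="Maximum loop iterations")
    # Assumed semantics: repeat the flag once per action name to block.
    parser.add_argument("--exclude", action="append", default=[], help="Action name to block")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Check the script's own `--help` output for the real defaults before relying on any of these.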
## Browser selection
- Default: Playwright's bundled Chromium (no env vars required).
- Choose a channel (Chrome/Edge) with `COMPUTER_USE_BROWSER_CHANNEL`.
- Use a custom Chromium-based executable (e.g., Brave) with `COMPUTER_USE_BROWSER_EXECUTABLE`.
If both are set, `COMPUTER_USE_BROWSER_EXECUTABLE` takes precedence.
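The precedence rule above can be sketched as a small helper that turns the two environment variables into keyword arguments for Playwright's `chromium.launch()`. The helper name `resolve_launch_options` is hypothetical; `executable_path` and `channel` are real `launch()` parameters, but whether the shipped script maps the variables this way is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_launch_options(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build kwargs for Playwright's chromium.launch() (hypothetical helper)."""
    env = os.environ if env is None else env
    executable = env.get("COMPUTER_USE_BROWSER_EXECUTABLE", "").strip()
    channel = env.get("COMPUTER_USE_BROWSER_CHANNEL", "").strip()

    options: dict = {}
    if executable:
        # A custom executable takes precedence when both variables are set.
        options["executable_path"] = executable
    elif channel:
        # For example "chrome" or "msedge".
        options["channel"] = channel
    # An empty dict means: use Playwright's bundled Chromium.
    return options
```

With Playwright installed, the result could be passed straight through, for example `playwright.chromium.launch(**resolve_launch_options())`.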
## Core workflow (agent loop)
1. Capture a screenshot and send the user goal + screenshot to the model.
2. Parse `function_call` actions in the response.
3. If an action's `safety_decision` is `require_confirmation`, prompt the user and proceed only on approval.
4. Execute each permitted action in Playwright.
5. Send `function_response` objects containing the latest URL + screenshot.
6. Repeat until the model returns only text (no actions) or you hit the turn limit.
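The loop above can be sketched without any model or browser by injecting each step as a callable. Everything here is illustrative: `Action`, `ModelTurn`, `run_agent_loop`, and the callables are hypothetical names, and the dict shape of `safety_decision` (`{"decision": "require_confirmation"}`) is an assumption rather than something this document specifies.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """Stand-in for one function_call returned by the model."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class ModelTurn:
    """One model response: optional text plus zero or more actions."""
    text: Optional[str] = None
    actions: List[Action] = field(default_factory=list)


def needs_confirmation(action: Action) -> bool:
    # Assumed shape: args["safety_decision"] == {"decision": "require_confirmation", ...}
    decision = action.args.get("safety_decision")
    return isinstance(decision, dict) and decision.get("decision") == "require_confirmation"


def run_agent_loop(
    goal: str,
    ask_model: Callable[[str, list], ModelTurn],
    execute: Callable[[Action], None],
    observe: Callable[[], dict],
    confirm: Callable[[Action], bool],
    turn_limit: int = 6,
) -> Optional[str]:
    """Return the model's final text, or None if the turn limit is hit."""
    pending = [observe()]  # initial screenshot + URL
    for _ in range(turn_limit):
        turn = ask_model(goal, pending)
        if not turn.actions:
            return turn.text  # text only: the model is done
        pending = []
        for action in turn.actions:
            if needs_confirmation(action) and not confirm(action):
                # User declined: report the skip instead of executing.
                pending.append({"name": action.name, "skipped": True, **observe()})
                continue
            execute(action)
            # Stand-in for a function_response: latest URL + screenshot.
            pending.append({"name": action.name, **observe()})
    return None
```

In a real agent, `ask_model` would wrap the Gemini API call, `execute` would drive a Playwright page, and `observe` would return the page URL and a fresh screenshot.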
## Operational guidance
- Run in a sandboxed browser profile or container.
- Use `--exclude` to block risky actions you do not want the model to take.
- Keep the viewport at 1440x900 unless you have a reason to change it.
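One reason the viewport size matters is coordinate mapping. The sketch below assumes the model reports click positions on a normalized 0-999 grid, as Google's Computer Use documentation describes, so they must be scaled to the real viewport before Playwright acts on them. The function name `denormalize` is hypothetical, and the shipped script may handle this differently.

```python
from typing import Tuple

VIEWPORT_WIDTH = 1440
VIEWPORT_HEIGHT = 900
NORMALIZED_RANGE = 1000  # assumption: model coordinates span 0-999


def denormalize(
    x: int,
    y: int,
    width: int = VIEWPORT_WIDTH,
    height: int = VIEWPORT_HEIGHT,
) -> Tuple[int, int]:
    """Scale normalized model coordinates to viewport pixels."""
    px = int(x / NORMALIZED_RANGE * width)
    py = int(y / NORMALIZED_RANGE * height)
    return px, py


print(denormalize(500, 500))  # (720, 450): the center of a 1440x900 viewport
```

If the viewport is changed, the same width and height must be used for both the browser context and this scaling step, or clicks will land in the wrong place.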
## Resources
- Script: `scripts/computer_use_agent.py`
- Reference notes: `references/google-computer-use.md`
- Env template: `env.example`
## Overview
This skill builds and runs Gemini 2.5 Computer Use browser-control agents using Playwright to automate web tasks. It implements an agent loop that captures screenshots, interprets the model's `function_call` actions, performs browser actions, and returns `function_response` objects with updated screenshots and URLs. It also integrates optional safety confirmations for risky UI actions and supports configurable browser selection and sandboxed profiles.
## How this skill works
The agent captures a screenshot and sends the user goal plus image to the Gemini Computer Use model. It parses `function_call` actions from model responses and executes them in Playwright, returning `function_response` objects that include the current URL and screenshot. If the model signals a `safety_decision` of `require_confirmation`, the agent prompts for human confirmation before executing potentially risky actions. The loop repeats until the model stops issuing actions or a turn limit is reached.
## FAQ
### Which browser does the agent use by default?
It uses Playwright's bundled Chromium by default; you can set `COMPUTER_USE_BROWSER_CHANNEL` or `COMPUTER_USE_BROWSER_EXECUTABLE` to change it.
### How does safety confirmation work?
When the model returns a `safety_decision` of `require_confirmation`, the agent pauses and prompts for human approval before performing the flagged action.
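A minimal sketch of such a confirmation prompt follows. The function name `confirm_action`, the dict shape of the decision (`decision` and `explanation` keys), and the default-to-no behavior are assumptions for illustration, not the shipped script's code.

```python
from typing import Callable


def confirm_action(
    name: str,
    safety_decision: dict,
    ask: Callable[[str], str] = input,
) -> bool:
    """Return True only if the action may run (hypothetical helper)."""
    # Assumed shape: {"decision": "require_confirmation", "explanation": "..."}
    if safety_decision.get("decision") != "require_confirmation":
        return True  # nothing flagged, no prompt needed
    explanation = safety_decision.get("explanation", "no explanation provided")
    answer = ask(f"Model wants to run '{name}' ({explanation}). Proceed? [y/N] ")
    # Anything other than an explicit yes counts as a refusal.
    return answer.strip().lower() in {"y", "yes"}
```

Passing `ask` as a parameter keeps the prompt testable; in an interactive run the default `input` reads the answer from the terminal.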