home / skills / toilahuongg / google-antigravity-kit / browser-automation
npx playbooks add skill toilahuongg/google-antigravity-kit --skill browser-automationReview the files below or copy the command above to add this skill to your agents.
---
name: browser-automation
description: Guide for browser automation and testing using the browser_subagent tool. Use this skill when users need to interact with web pages, test user flows, scrape dynamic content, automate form submissions, capture screenshots, verify UI changes, or record browser sessions. Supports clicking, typing, navigation, scrolling, waiting for elements, and all standard browser interactions with automatic video recording.
license: Complete terms in LICENSE.txt
---
# Browser Automation
This skill provides comprehensive guidance for browser automation using the `browser_subagent` tool.
## Overview
The `browser_subagent` enables autonomous browser control with automatic session recording. All interactions are captured as WebP videos saved to the artifacts directory.
## Core Capabilities
1. **Navigation** - Open URLs, navigate pages, handle redirects
2. **Interaction** - Click, type, scroll, hover, drag-and-drop
3. **Extraction** - Read DOM content, capture screenshots, scrape data
4. **Verification** - Test user flows, validate UI changes
5. **Recording** - Automatic video capture of all sessions
## When to Use
Use `browser_subagent` (vs `read_url_content`) when:
- JavaScript execution is required
- User interaction is needed (forms, clicks, navigation)
- Authentication or session state is required
- Dynamic content loads after page render
- Visual verification or screenshots are needed
- Recording demonstrations or tutorials
Use `read_url_content` for static HTML content where JavaScript isn't needed.
## Tool Parameters
### TaskName (required)
Human-readable title for the browser task.
- Should be properly capitalized
- Example: "Testing Login Flow", "Scraping Product Data"
- Avoid URLs or technical jargon
### Task (required)
Detailed instructions for the browser subagent. Be explicit about:
- What to do
- When to stop
- What information to return
**Critical**: The subagent is autonomous and one-shot. Provide comprehensive instructions upfront.
### RecordingName (required)
Filename for the video recording.
- All lowercase with underscores
- Maximum 3 words
- Describes what the recording contains
- Example: `login_flow_demo`, `checkout_process`
## Best Practices
### 1. Clear Task Instructions
**Bad**: "Check the website"
```
Task: Go to example.com and check it
```
**Good**: Specific, with clear completion criteria
```
Task: Navigate to https://example.com, wait for the page to fully load,
verify that the main heading contains "Welcome", capture a screenshot of
the page, then return the page title and the text content of the main
heading.
```
### 2. Return Conditions
Always specify what the subagent should return:
```
Task: Navigate to the product page at https://shop.example.com/products/123
and extract the following data:
- Product title
- Price
- Availability status
- Number of reviews
Return this information in a structured format when complete.
```
### 3. Error Handling
Instruct the subagent how to handle failures:
```
Task: Attempt to log in to https://app.example.com with username "testuser"
and password "testpass123". If login succeeds, navigate to the dashboard
and return the user's display name. If login fails, capture a screenshot
of the error message and return the error text.
```
### 4. Wait Conditions
Specify wait conditions for dynamic content:
```
Task: Navigate to https://example.com/search, type "widgets" into the
search box, click the search button, and wait until the results list
appears (look for element with class "search-results"). Once results load,
count the number of result items and return that count.
```
### 5. Multi-Step Flows
Break down complex flows into clear steps:
```
Task: Complete the following checkout flow:
1. Navigate to https://shop.example.com
2. Click "Add to Cart" on the first product
3. Click the cart icon in the top right
4. Click "Proceed to Checkout"
5. Fill in the shipping form with test data
6. Capture a screenshot of the order summary
7. Return the total price shown on the order summary
```
## Common Patterns
### Authentication Testing
```
TaskName: "Testing User Login"
Task: Navigate to https://app.example.com/login, enter "[email protected]"
in the email field, enter "password123" in the password field, click the
"Sign In" button, wait for navigation to complete. If login succeeds and
you see a dashboard, return "Login successful". If there's an error message,
return the error text.
RecordingName: login_test
```
### Data Scraping
```
TaskName: "Scraping Product Listings"
Task: Navigate to https://shop.example.com/products, wait for all product
cards to load, then extract the title and price from each product card.
Return a list of products with their titles and prices. If pagination
exists, only scrape the first page.
RecordingName: product_scrape
```
### Form Submission
```
TaskName: "Submitting Contact Form"
Task: Navigate to https://example.com/contact, fill in the form with:
- Name: "Test User"
- Email: "[email protected]"
- Message: "This is a test message"
Then click the submit button and wait for the confirmation message.
Return the confirmation message text.
RecordingName: contact_form
```
### Screenshot Capture
```
TaskName: "Capturing Homepage Design"
Task: Navigate to https://example.com, wait for complete page load including
all images, scroll to show the full page layout, capture a full-page
screenshot, and return confirmation that the screenshot was saved.
RecordingName: homepage_capture
```
### UI Verification
```
TaskName: "Verifying Responsive Layout"
Task: Navigate to https://example.com, resize the browser window to mobile
width (375px), capture a screenshot, then resize to desktop width (1920px),
capture another screenshot. Return the dimensions used and confirm both
screenshots were captured.
RecordingName: responsive_check
```
## Element Selectors
The browser subagent can find elements using:
- CSS selectors
- Text content
- ARIA labels
- Position/proximity
- Visual descriptions
Be specific when describing elements:
**Good**:
- "Click the blue 'Submit' button at the bottom of the form"
- "Type into the input field labeled 'Email Address'"
- "Click the first product card in the grid"
**Avoid**:
- "Click the button" (which button?)
- "Fill in the field" (which field?)
## Waiting Strategies
### Wait for Navigation
```
After clicking "Submit", wait for the page to navigate to the success page.
```
### Wait for Elements
```
Wait until the spinner disappears and the results table is visible.
```
### Wait for Content
```
Wait until the product count shows a number greater than 0.
```
### Fixed Delays (use sparingly)
```
Wait 3 seconds for animations to complete.
```
## Recording Best Practices
### Naming Convention
- Use lowercase with underscores
- Be descriptive but concise
- Maximum 3 words
- Examples:
- `login_flow`
- `checkout_test`
- `nav_demo`
- `form_submit`
### Recording Purpose
Recordings are automatically saved and useful for:
- Debugging failed automation
- Demonstrating user flows
- Documenting test results
- Creating tutorials
- Reviewing UI behavior
## Advanced Techniques
### Session State
The browser maintains state during a single subagent execution:
- Cookies persist across navigation
- Login sessions remain active
- Form data can carry forward
### Multiple Tabs
If needed, the subagent can work with multiple tabs:
```
Task: Open https://example.com in the current tab, then open a new tab
and navigate to https://example.com/compare. Switch between tabs to
compare data from both pages.
```
### File Downloads
```
Task: Navigate to https://example.com/downloads, click the "Download Report"
button, wait for the download to complete, and return confirmation.
```
### Iframes
```
Task: Navigate to https://example.com, locate the embedded iframe
containing the video player, switch context to that iframe, then click
the play button.
```
## Error Recovery
If the browser tool encounters issues:
1. The subagent will report what went wrong
2. Read the error message carefully
3. Adjust your Task instructions
4. Try again with more specific instructions or wait conditions
Common issues:
- Element not found: Be more specific about element description
- Timeout: Add explicit wait conditions or increase wait time
- Navigation failed: Check URL validity, network issues
## Performance Tips
1. **Be Specific**: Clear selectors are faster than vague descriptions
2. **Minimize Waits**: Only wait when necessary; don't add arbitrary delays
3. **Single Purpose**: One task per browser_subagent call
4. **Return Fast**: Return as soon as the required information is collected
## Examples
See `examples/` directory for complete working examples:
- `examples/login_test.md` - Authentication flow
- `examples/form_automation.md` - Form submission
- `examples/data_extraction.md` - Web scraping
- `examples/ui_testing.md` - Visual verification
## Limitations
- Each subagent call is independent (no session sharing between calls)
- Cannot execute arbitrary JavaScript (but can interact with page elements)
- Video recordings use system resources (keep sessions focused)
- Some sites may block automation (CAPTCHA, bot detection)
## Integration with Workflows
Browser automation pairs well with:
- **Testing workflows**: Automate E2E tests
- **Data collection**: Scrape and process information
- **Documentation**: Record user flows automatically
- **Verification**: Validate deployments
## When NOT to Use
Avoid browser automation when:
- Static HTML scraping is sufficient → use `read_url_content`
- API endpoints are available → use direct API calls
- File processing is needed → use file manipulation tools
- The task requires human judgment (CAPTCHA, visual verification)