home / skills / drshailesh88 / integrated_content_os / browser-automation

browser-automation skill

/skills/cardiology/browser-automation

npx playbooks add skill drshailesh88/integrated_content_os --skill browser-automation

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
8.2 KB
---
name: browser-automation
description: "Browser automation for ChatGPT Plus and Gemini Advanced web interfaces. Uses Playwright MCP to interact with your paid subscriptions without API costs. Supports both models for writing comparison."
---

# Browser Automation for AI Web Interfaces

Use your **ChatGPT Plus** and **Gemini Advanced** subscriptions through browser automation. No API costs - just your monthly subscription.

---

## How It Works

```
┌─────────────────────────────────────────────────────────────────┐
│                    BROWSER AUTOMATION FLOW                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Your Prompt ──► Playwright MCP ──► Browser Instance            │
│                                           │                      │
│                              ┌────────────┴────────────┐        │
│                              ▼                         ▼        │
│                      ┌─────────────┐           ┌─────────────┐  │
│                      │  ChatGPT    │           │   Gemini    │  │
│                      │  chat.openai│           │   gemini.   │  │
│                      │  .com       │           │   google.com│  │
│                      └──────┬──────┘           └──────┬──────┘  │
│                             │                         │         │
│                             ▼                         ▼         │
│                      Response captured & returned to you        │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```

---

## Prerequisites

### 1. Playwright MCP Must Be Active

You have Playwright MCP configured. Verify it's working:

```
Use browser_snapshot to check if browser is available
```

### 2. Login Sessions

The browser automation uses saved sessions. You need to log in once:

**First-time setup:**
1. Navigate to ChatGPT/Gemini
2. Log in with your credentials
3. Session is saved for future use

---

## ChatGPT Automation

### Step-by-Step Workflow

**Step 1: Navigate to ChatGPT**
```
browser_navigate → https://chat.openai.com
```

**Step 2: Check if logged in**
```
browser_snapshot → Look for chat input or login button
```

**Step 3: If not logged in, authenticate**
```
browser_click → "Log in" button
browser_type → Enter email
browser_click → Continue
browser_type → Enter password
browser_click → Log in
```

**Step 4: Start new chat**
```
browser_click → "New chat" button (or navigate to chat.openai.com)
```

**Step 5: Type your prompt**
```
browser_type → Your prompt text in the message input
```

**Step 6: Submit and wait**
```
browser_click → Send button
browser_wait_for → Wait for response to complete
```

**Step 7: Capture response**
```
browser_snapshot → Get the response text
```

### Example: ChatGPT Writing Task

```
I will now use browser automation to get ChatGPT's response:

1. browser_navigate to https://chat.openai.com
2. browser_snapshot to see current state
3. browser_type to enter prompt in textarea
4. browser_click to send
5. browser_wait_for response
6. browser_snapshot to capture output
```

---

## Gemini Automation

### Step-by-Step Workflow

**Step 1: Navigate to Gemini**
```
browser_navigate → https://gemini.google.com
```

**Step 2: Check if logged in**
```
browser_snapshot → Look for chat input
```

**Step 3: Type your prompt**
```
browser_type → Your prompt in the input area
```

**Step 4: Submit**
```
browser_press_key → Enter (or click send button)
```

**Step 5: Wait and capture**
```
browser_wait_for → Response generation
browser_snapshot → Get response
```

---

## Practical Commands

### For Claude Code Session

When you want me to use browser automation, say:

```
"Use browser automation to ask ChatGPT: [your prompt]"
"Get Gemini's take on: [your prompt]"
"Compare browser outputs for: [your prompt]"
```

I will then:
1. Use Playwright MCP tools
2. Navigate to the appropriate site
3. Enter your prompt
4. Capture and return the response

---

## Handling Authentication

### Session Persistence

Browser automation works best with persistent sessions:

```python
# The Playwright MCP maintains browser state
# Once logged in, sessions typically persist
```

### If Session Expires

If you see a login screen:

1. **ChatGPT**: Look for "Log in" button, click it
2. **Gemini**: Look for "Sign in" button, click it
3. Complete authentication flow
4. Resume automation

### Two-Factor Authentication

If 2FA is required:
1. Automation will pause at 2FA screen
2. You manually complete 2FA
3. Automation continues

---

## Limitations

### Browser Automation Caveats

| Limitation | Workaround |
|------------|------------|
| Slower than API | Use for comparison, not bulk |
| Can break if UI changes | Report issues, I'll adapt |
| Requires active session | Keep browser open |
| Rate limits still apply | Don't spam requests |
| CAPTCHAs possible | May need manual intervention |

### When NOT to Use Browser Automation

- Bulk content generation (use GLM-4.7 API instead)
- Time-critical tasks (APIs are faster)
- Fully automated pipelines (APIs more reliable)

### When TO Use Browser Automation

- Comparing writing styles
- Using features only in Plus/Advanced
- Testing latest model versions
- When APIs are down

---

## Comparison Workflow

### Get Same Prompt from Multiple Sources

```
Step 1: Write with Claude (default, in this conversation)
Step 2: browser_navigate to ChatGPT, get response
Step 3: browser_navigate to Gemini, get response
Step 4: Compare all three side-by-side
```

### Example Request

```
"Compare how you, ChatGPT, and Gemini would write a tweet about
the cardiovascular benefits of SGLT2 inhibitors"
```

I will:
1. Write my version (Claude)
2. Use browser automation to get ChatGPT's version
3. Use browser automation to get Gemini's version
4. Present all three for comparison

---

## Troubleshooting

### Browser Not Responding

```
browser_close → Close current browser
Then start fresh with browser_navigate
```

### Wrong Page Loaded

```
browser_snapshot → Check current state
browser_navigate → Go to correct URL
```

### Element Not Found

```
browser_snapshot → Get fresh page state
Look for correct element reference
Retry with updated reference
```

### Session Logged Out

```
browser_navigate → Go to login page
Complete login flow
Resume automation
```

---

## Integration with Multi-Model Writer

This skill works with `multi-model-writer`:

```
API Models:
- /write-glm → Z.AI API
- /write-gpt → OpenAI API
- /write-gemini → Google AI Studio API

Browser Models:
- /browser-chatgpt → ChatGPT Plus web
- /browser-gemini → Gemini Advanced web
```

Use APIs for speed and reliability.
Use browser for subscription-only features or comparison.

---

## Example Session

```
User: "Use browser to compare how ChatGPT writes about statins"

Claude: I'll get ChatGPT's perspective using browser automation.

[Uses browser_navigate to https://chat.openai.com]
[Uses browser_snapshot to verify page state]
[Uses browser_type to enter: "Write a patient-friendly explanation of how statins work"]
[Uses browser_click to send]
[Uses browser_wait_for to wait for response]
[Uses browser_snapshot to capture response]

Here's what ChatGPT wrote:
[Response text]

Compared to my approach:
[Claude's version]

Key differences:
- ChatGPT emphasized X while I focused on Y
- Tone: ChatGPT more conversational, mine more clinical
- Length: Similar word count
```

---

*Browser automation gives you access to your paid subscriptions programmatically, complementing the API-based models in your arsenal.*