
gemini-research-browser-use skill


npx playbooks add skill grasseed/google-search-browser-use --skill gemini-research-browser-use

---
name: gemini-research-browser-use
description: Use the Chrome DevTools Protocol (CDP) to let the AI "ask Gemini" or "research with Gemini" directly. It reuses the user's logged-in Chrome session, bypassing API limits and drawing on the web interface's reasoning capabilities.
---

# Gemini Research Browser Use

## Overview

Perform research or queries using Google Gemini via Chrome DevTools Protocol (CDP). This method reuses the user's **existing Chrome login session** to interact with the Gemini web interface (`https://gemini.google.com/`).

## Prerequisites

1. **Python + websockets**
   Verify:
   ```bash
   python3 --version
   python3 -m pip show websockets
   ```
   Install if missing:
   ```bash
   python3 -m pip install websockets
   ```

2. **Google Chrome**
   Verify:
   ```bash
   "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --version
   ```

3. **CDP Port Availability**
   Verify Chrome is listening (after launch in Step 2):
   ```bash
   curl -s http://localhost:9222/json | python3 -m json.tool
   ```

4. **Non-default user data directory (required by Chrome)**
   Chrome refuses to enable CDP on its default profile directory, so use a cloned profile to keep your login state. Quit Chrome before copying so its profile files are not changing mid-copy.
   ```bash
   rm -rf /tmp/chrome-gemini-profile
   rsync -a "$HOME/Library/Application Support/Google/Chrome/" /tmp/chrome-gemini-profile/
   ```

## Method Comparison

| Method | Pros | Cons | Recommended |
|--------|------|------|-------------|
| **Chrome Remote Debugging (CDP)** | Uses existing login, full automation, reliable | Requires Chrome restart with debugging flag | ✅ **Yes** |
| `browser-use --browser real` | Simple CLI | Opens new session without login | ❌ No |
| `browser_subagent` | Visual feedback | Rate limited, may fail | ❌ No |

---

## ✅ Recommended Method: Chrome Remote Debugging (CDP)

This is the **most reliable method**: it drives your system Chrome with your existing Google login.

### Prerequisites

1. **Python 3** with `websockets` library
2. **Google Chrome** installed at `/Applications/Google Chrome.app/`
3. **User logged into Google** in Chrome

### Step 1: Install websockets (if needed)

```bash
pip3 install websockets
# Or in virtual environment:
python3 -m venv .venv && ./.venv/bin/pip install websockets
```

### Step 2: Launch Chrome with Remote Debugging (Non-default profile)

**Important**: Close any existing Chrome windows first, or use a different debugging port if 9222 is already in use.

```bash
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --remote-debugging-port=9222 \
  --user-data-dir="/tmp/chrome-gemini-profile" \
  "https://gemini.google.com/" &
```

**Parameters explained:**
- `--remote-debugging-port=9222`: Enables CDP on port 9222
- `--user-data-dir`: Points to the cloned profile from Prerequisite 4 (a copy of your Chrome profile, so the login session carries over)
- The URL opens Gemini directly
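If an agent needs to start Chrome itself rather than from a terminal, the same arguments can be passed to `subprocess` as a list, which avoids shell-quoting the path with spaces. This is a minimal sketch assuming the macOS path and the defaults above; `build_chrome_args` and `launch_chrome` are hypothetical helper names:

```python
import subprocess

# macOS path used throughout this guide (assumption: default install location).
CHROME = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


def build_chrome_args(port=9222, profile_dir="/tmp/chrome-gemini-profile",
                      url="https://gemini.google.com/"):
    """Return the argv list equivalent to the Step 2 shell command."""
    return [
        CHROME,
        f"--remote-debugging-port={port}",  # enables CDP on this port
        f"--user-data-dir={profile_dir}",   # must not be Chrome's default directory
        url,                                # opens Gemini directly
    ]


def launch_chrome(**kwargs):
    """Start Chrome in the background with its console output discarded."""
    return subprocess.Popen(
        build_chrome_args(**kwargs),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```

`launch_chrome()` returns a `subprocess.Popen` handle, so a script can later call `proc.terminate()` on the instance it started.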

### Step 3: Verify Connection (CDP)

```bash
curl -s http://localhost:9222/json | python3 -m json.tool
```

Look for the Gemini page entry:
```json
{
  "title": "Google Gemini",
  "url": "https://gemini.google.com/app",
  "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/XXXXXXXX"
}
```

**Note**: If the URL shows `/app` instead of just `/`, you are **logged in**.
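The same lookup can be done from Python without shelling out to `curl`. This is a minimal standard-library sketch assuming the default port 9222; `fetch_targets` and `find_gemini_page` are hypothetical helper names. Filtering on `type == "page"` skips service workers and other non-tab targets that can also carry a Gemini URL.

```python
import json
import urllib.request


def fetch_targets(port=9222):
    """Return the list of debuggable targets Chrome reports at /json."""
    with urllib.request.urlopen(f"http://localhost:{port}/json", timeout=5) as resp:
        return json.load(resp)


def find_gemini_page(targets):
    """Return the first top-level tab on gemini.google.com, or None."""
    for target in targets:
        if target.get("type") == "page" and "gemini.google.com" in target.get("url", ""):
            return target
    return None
```

`find_gemini_page(fetch_targets())["webSocketDebuggerUrl"]` then gives the address to pass to `websockets.connect`.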

### Step 4: Send Query to Gemini

Save this as `gemini_query.py` or run inline:

```python
import asyncio
import websockets
import json
import subprocess
import sys

async def query_gemini(query_text, wait_seconds=30):
    # Get the Gemini page WebSocket URL
    result = subprocess.run(
        ["curl", "-s", "http://localhost:9222/json"],
        capture_output=True, text=True
    )
    pages = json.loads(result.stdout)
    
    # Find Gemini page
    gemini_page = None
    for page in pages:
        if page.get("type") == "page" and "gemini.google.com" in page.get("url", ""):
            gemini_page = page
            break
    
    if not gemini_page:
        print("Error: Gemini page not found. Make sure Chrome is open with Gemini.")
        return None
    
    ws_url = gemini_page["webSocketDebuggerUrl"]
    print(f"Connecting to: {ws_url}")
    
    async with websockets.connect(ws_url) as ws:
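        # Step 1: Input the query.
        # json.dumps turns the query into a valid JS string literal (escapes quotes,
        # backticks and newlines); the IIFE keeps `const` names out of the page's
        # global scope so the script can be run repeatedly on the same tab.
        input_js = f'''
        (() => {{
            const editor = document.querySelector('div[contenteditable="true"]');
            if (!editor) return 'editor not found';
            editor.focus();
            document.execCommand('insertText', false, {json.dumps(query_text)});
            editor.dispatchEvent(new Event('input', {{bubbles: true}}));
            return 'success';
        }})()
        '''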
        
        await ws.send(json.dumps({
            "id": 1,
            "method": "Runtime.evaluate",
            "params": {"expression": input_js}
        }))
        response = await ws.recv()
        result = json.loads(response)
        print(f"Input result: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")
        
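        # Step 2: Click the send button ("傳送訊息" is the Traditional Chinese
        # label; "Send message" is an assumed English-UI fallback)
        await asyncio.sleep(1)
        click_js = '''
        (() => {
            const btn = document.querySelector(
                'button[aria-label="傳送訊息"], button[aria-label="Send message"]');
            if (btn) { btn.click(); return 'clicked'; }
            return 'button not found';
        })()
        '''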
        
        await ws.send(json.dumps({
            "id": 2,
            "method": "Runtime.evaluate",
            "params": {"expression": click_js}
        }))
        response = await ws.recv()
        result = json.loads(response)
        print(f"Click result: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")
        
        # Step 3: Wait for response
        print(f"Waiting {wait_seconds} seconds for Gemini to respond...")
        await asyncio.sleep(wait_seconds)
        
        # Step 4: Extract the last response (the last `.markdown` element on the page)
        extract_js = '''
        (() => {
            const markdownEls = document.querySelectorAll('.markdown');
            if (markdownEls.length === 0) return 'No response found';
            return markdownEls[markdownEls.length - 1].innerText;
        })()
        '''
        
        await ws.send(json.dumps({
            "id": 3,
            "method": "Runtime.evaluate",
            "params": {"expression": extract_js}
        }))
        response = await ws.recv()
        result = json.loads(response)
        content = result.get('result', {}).get('result', {}).get('value', 'No content')
        
        return content

# Main execution
if __name__ == "__main__":
    query = sys.argv[1] if len(sys.argv) > 1 else "Example question: please explain what a blockchain is, in Traditional Chinese."
    result = asyncio.run(query_gemini(query, wait_seconds=30))
    print("\n" + "="*50)
    print("GEMINI RESPONSE:")
    print("="*50)
    print(result)
```
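The script reads exactly one message after each command and treats it as the reply. That normally holds while no CDP domains are enabled, because Chrome then has no events to push; once a domain such as `Runtime` is enabled, event messages (which carry no `id`) can arrive first. This is a minimal sketch of a helper that waits for the reply with the matching `id`; `cdp_call` is a hypothetical name and `ws` is any connection from `websockets.connect`:

```python
import json


async def cdp_call(ws, msg_id, method, params=None):
    """Send one CDP command and return the reply whose "id" matches msg_id."""
    await ws.send(json.dumps({"id": msg_id, "method": method, "params": params or {}}))
    while True:
        message = json.loads(await ws.recv())
        # Events have a "method" but no "id"; skip them and any stale replies.
        if message.get("id") == msg_id:
            return message
```

Inside the script, `reply = await cdp_call(ws, 1, "Runtime.evaluate", {"expression": input_js})` would replace each `send`/`recv` pair.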

### Step 5: Run the Query

```bash
python3 gemini_query.py "Example question: your query here"
```

Or inline for simple queries:

```bash
python3 << 'EOF'
import asyncio
import json
import subprocess

import websockets

async def send_to_gemini(query):
    # Get the WebSocket URL of the Gemini tab (top-level pages only)
    result = subprocess.run(["curl", "-s", "http://localhost:9222/json"], capture_output=True, text=True)
    pages = json.loads(result.stdout)
    ws_url = next(p["webSocketDebuggerUrl"] for p in pages
                  if p.get("type") == "page" and "gemini.google.com" in p.get("url", ""))

    async with websockets.connect(ws_url) as ws:
        # Input query: json.dumps makes a safe JS string literal, and the IIFE
        # avoids redeclaring `const editor` when this is run again on the same tab
        await ws.send(json.dumps({
            "id": 1,
            "method": "Runtime.evaluate",
            "params": {"expression": f'''(() => {{
                const editor = document.querySelector('div[contenteditable="true"]');
                editor.focus();
                document.execCommand('insertText', false, {json.dumps(query)});
                editor.dispatchEvent(new Event('input', {{bubbles: true}}));
            }})()'''}
        }))
        await ws.recv()

        # Click send ("傳送訊息" is the Traditional Chinese label; "Send message" is an assumed English label)
        await asyncio.sleep(1)
        await ws.send(json.dumps({
            "id": 2,
            "method": "Runtime.evaluate",
            "params": {"expression": '''document.querySelector('button[aria-label="傳送訊息"], button[aria-label="Send message"]').click()'''}
        }))
        await ws.recv()

        # Wait, then extract the last response
        await asyncio.sleep(30)
        await ws.send(json.dumps({
            "id": 3,
            "method": "Runtime.evaluate",
            "params": {"expression": '''
                document.querySelectorAll('.markdown')[document.querySelectorAll('.markdown').length - 1].innerText
            '''}
        }))
        response = await ws.recv()
        print(json.loads(response)['result']['result']['value'])

asyncio.run(send_to_gemini("Example question: analyze the future price trend of Bitcoin"))
EOF
```

---

## Alternative Method: browser-use CLI

This method is simpler but **does not use your existing Chrome login**. You'll need to log in manually each time.

### Prerequisites

```bash
# Create virtual environment
python3 -m venv .venv

# Install browser-use
./.venv/bin/pip install browser-use
```

### Workflow

#### 1) Open Gemini

```bash
./.venv/bin/browser-use --browser real open "https://gemini.google.com/"
```

#### 2) Get Page State

```bash
./.venv/bin/browser-use --browser real state
```

Look for:
- The input textbox: `contenteditable=true role=textbox`
- The send button: `aria-label=傳送訊息` (Traditional Chinese UI; the label differs in other interface languages)
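Because the label follows the interface language, a selector that lists several labels keeps a CDP click step working across locales. This is a minimal sketch; `傳送訊息` comes from this guide, while `Send message` is an assumed English label that should be confirmed with `state` before relying on it. The helper names are hypothetical:

```python
import json

# "傳送訊息" is the Traditional Chinese label shown above; "Send message" is an
# assumed English label. Add your own UI language's label after checking `state`.
SEND_BUTTON_LABELS = ["傳送訊息", "Send message"]


def send_button_selector(labels=SEND_BUTTON_LABELS):
    """CSS selector list matching a button with any of the given aria-labels."""
    return ", ".join(
        f"button[aria-label={json.dumps(label, ensure_ascii=False)}]" for label in labels
    )


def click_send_js(labels=SEND_BUTTON_LABELS):
    """JavaScript for Runtime.evaluate that clicks the first matching button."""
    selector = json.dumps(send_button_selector(labels), ensure_ascii=False)
    return (
        "(() => { const btn = document.querySelector(" + selector + ");"
        " if (btn) { btn.click(); return 'clicked'; }"
        " return 'button not found'; })()"
    )
```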

#### 3) Input Text via JavaScript eval

```bash
./.venv/bin/browser-use --browser real eval "const editor = document.querySelector('div[contenteditable=\"true\"]'); editor.focus(); document.execCommand('insertText', false, 'YOUR QUERY HERE'); editor.dispatchEvent(new Event('input', {bubbles: true}));"
```
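Typing the query into that one-line shell string breaks as soon as the query contains a quote. This is a minimal sketch that JSON-encodes the query as a JavaScript string literal and passes the command to `subprocess` as a list, so no shell quoting is involved; the helper names are hypothetical, and the CLI invocation is the one shown above:

```python
import json
import subprocess

BROWSER_USE = "./.venv/bin/browser-use"  # path from the setup above


def insert_text_js(query):
    """JavaScript that types `query` into Gemini's editor; json.dumps escapes it."""
    text = json.dumps(query, ensure_ascii=False)
    return (
        "(() => { const editor = document.querySelector('div[contenteditable=\"true\"]');"
        " if (!editor) return 'editor not found';"
        f" editor.focus(); document.execCommand('insertText', false, {text});"
        " editor.dispatchEvent(new Event('input', {bubbles: true})); return 'success'; })()"
    )


def eval_command(query):
    """argv for `browser-use --browser real eval`; a list bypasses the shell."""
    return [BROWSER_USE, "--browser", "real", "eval", insert_text_js(query)]


def type_query(query):
    """Run the eval command and capture its output."""
    return subprocess.run(eval_command(query), capture_output=True, text=True)
```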

#### 4) Click Send Button

```bash
# Get current state to find button index
./.venv/bin/browser-use --browser real state

# Click the send button (replace INDEX with actual number)
./.venv/bin/browser-use --browser real click INDEX
```

#### 5) Close Session

```bash
./.venv/bin/browser-use close
```

---

## Troubleshooting

### Chrome Remote Debugging Issues

| Problem | Cause | Solution |
|---------|-------|----------|
| `curl: (7) Failed to connect` | Chrome not running with debugging | Restart Chrome with `--remote-debugging-port=9222` |
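| WebSocket connection refused | Page ID changed | Re-fetch `/json` to get new WebSocket URL |
| "editor not found" | Page not fully loaded | Wait a few seconds before running script |
| "button not found" | Text not entered yet, or the send button's `aria-label` differs in your UI language | Check that the text was actually entered; if your Gemini UI is not in Traditional Chinese, update the `傳送訊息` selector |
| Login page instead of app | Profile not cloned, or cloned from the wrong path | Verify the `rsync` source is `"$HOME/Library/Application Support/Google/Chrome/"` and that `--user-data-dir` points to the clone |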
| `DevTools remote debugging requires a non-default data directory` | Chrome disallows default profile for CDP | Launch with a cloned profile: `/tmp/chrome-gemini-profile` |
| `curl` shows connection refused even though Chrome is running | CDP not listening due to profile path | Ensure `--user-data-dir` is **not** default and the port is free |
| `No Gemini page found via CDP` | Gemini not loaded or not logged in | Open `https://gemini.google.com/` in the launched Chrome and wait for `/app` |
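Several rows above can be told apart automatically from what `/json` returns. This is a minimal sketch assuming port 9222; `diagnose` and `classify_targets` are hypothetical names, and the `/app` check is the login heuristic from Step 3, not a guarantee:

```python
import json
import urllib.request


def classify_targets(targets):
    """Map the /json target list to the matching troubleshooting hint."""
    pages = [t for t in targets
             if t.get("type") == "page" and "gemini.google.com" in t.get("url", "")]
    if not pages:
        return "No Gemini page: open https://gemini.google.com/ in the launched Chrome"
    if "/app" not in pages[0].get("url", ""):
        return "Gemini page is not on /app: check that the cloned profile is logged in"
    return "OK: " + pages[0].get("webSocketDebuggerUrl", "")


def diagnose(port=9222):
    """Fetch /json and classify it; report when CDP is not reachable at all."""
    try:
        with urllib.request.urlopen(f"http://localhost:{port}/json", timeout=3) as resp:
            targets = json.load(resp)
    except (OSError, ValueError):  # URLError subclasses OSError; ValueError covers bad JSON
        return ("CDP not reachable: launch Chrome with --remote-debugging-port "
                "and a non-default --user-data-dir")
    return classify_targets(targets)
```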

### browser-use Issues

| Problem | Cause | Solution |
|---------|-------|----------|
| Not logged in | browser-use creates isolated session | Use Chrome Remote Debugging method instead |
| `Unknown key: "請"` error | CLI doesn't support Unicode | Use `eval` with JavaScript `execCommand` |
| Click doesn't work | Element index changed | Re-run `state` before each click |

---

## Best Practices

1. **Always use Chrome Remote Debugging** for queries requiring authentication
2. **Wait 30+ seconds** for complex queries (Gemini's "Deep Think" mode takes longer)
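3. **Check for `.markdown` elements** to confirm a response has appeared; a long answer may still be streaming, so make sure the text has stopped changing before treating it as complete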
4. **Use inline Python** for one-off queries; use the full script for automation
5. **Close Chrome debugging session** when done to avoid port conflicts
6. **Keep profile cloned** in `/tmp/chrome-gemini-profile` to avoid CDP blocking the default profile
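Rather than a fixed wait, the extraction can be repeated until the text stops changing. This is a minimal sketch under the assumption that an unchanged result over several polls means Gemini has finished; a long pause mid-answer (for example while "Deep Think" is reasoning) could end it early, so `stable_polls` and `interval` are tuning knobs. `wait_for_stable_text` is a hypothetical helper, and `fetch_text` stands for any async function returning the current response text, such as one that runs the `.markdown` extraction over CDP:

```python
import asyncio


async def wait_for_stable_text(fetch_text, interval=2.0, stable_polls=3, timeout=180.0):
    """Poll fetch_text() until its non-empty result repeats stable_polls times in a row.

    Returns the last text seen if the timeout expires first.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    last, unchanged = None, 0
    while loop.time() < deadline:
        text = await fetch_text()
        if text and text == last:
            unchanged += 1
            if unchanged >= stable_polls:
                return text
        else:
            unchanged = 0  # still streaming (or nothing rendered yet)
        last = text
        await asyncio.sleep(interval)
    return last
```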

---

## Complete Example: Crypto Price Analysis

### Complete Workflow

```bash
# Step 1: Copy the Chrome profile (avoids CDP's default-directory restriction)
rm -rf /tmp/chrome-gemini-profile
rsync -a "$HOME/Library/Application Support/Google/Chrome/" /tmp/chrome-gemini-profile/

# Step 2: Launch Chrome in remote debugging mode
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --remote-debugging-port=9222 \
  --user-data-dir="/tmp/chrome-gemini-profile" \
  "https://gemini.google.com/" > /dev/null 2>&1 &

# Step 3: Wait for the page to load and verify the connection
sleep 8
curl -s http://localhost:9222/json | python3 -c "import sys, json; pages = json.load(sys.stdin); gemini = [p for p in pages if p.get('type') == 'page' and 'gemini.google.com' in p.get('url', '')]; print(f\"Gemini page: {gemini[0]['url'] if gemini else 'not found'}\")"
```
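The fixed `sleep 8` may be too short on a slow machine and wasted time on a fast one. This is a minimal sketch that polls Chrome's `/json/version` endpoint until CDP answers; it only confirms that Chrome is listening, not that Gemini has loaded. `wait_for_cdp` and `fetch_version` are hypothetical names, and `fetch` is injectable so the loop can be exercised without a browser:

```python
import json
import time
import urllib.request


def fetch_version(port):
    """Read Chrome's /json/version endpoint; raises OSError while CDP is not up."""
    with urllib.request.urlopen(f"http://localhost:{port}/json/version", timeout=2) as resp:
        return json.load(resp)


def wait_for_cdp(port=9222, timeout=30.0, interval=0.5, fetch=fetch_version):
    """Poll until the CDP endpoint answers; return its JSON or None on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            return fetch(port)
        except (OSError, ValueError):
            # Connection refused while Chrome starts, or an incomplete JSON reply.
            time.sleep(interval)
    return None
```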

### Method 1: Full Query Script (query_gemini.py)

Save the following as `query_gemini.py`:

```python
import asyncio
import websockets
import json
import subprocess
import sys

async def query_gemini(query_text, wait_seconds=60):
    # Get the Gemini page WebSocket URL
    result = subprocess.run(
        ["curl", "-s", "http://localhost:9222/json"],
        capture_output=True, text=True
    )
    pages = json.loads(result.stdout)
    
    # Find Gemini page
    gemini_page = None
    for page in pages:
        if page.get("type") == "page" and "gemini.google.com" in page.get("url", ""):
            gemini_page = page
            break
    
    if not gemini_page:
        print("Error: Gemini page not found. Make sure Chrome is open with Gemini.")
        return None
    
    ws_url = gemini_page["webSocketDebuggerUrl"]
    print(f"Connecting to: {ws_url}")
    
    async with websockets.connect(ws_url) as ws:
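        # Step 1: Input the query.
        # json.dumps turns the query into a valid JS string literal (escapes quotes,
        # backticks and newlines); the IIFE keeps `const` names out of the page's
        # global scope so the script can be run repeatedly on the same tab.
        input_js = f'''
        (() => {{
            const editor = document.querySelector('div[contenteditable="true"]');
            if (!editor) return 'editor not found';
            editor.focus();
            document.execCommand('insertText', false, {json.dumps(query_text)});
            editor.dispatchEvent(new Event('input', {{bubbles: true}}));
            return 'success';
        }})()
        '''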
        
        await ws.send(json.dumps({
            "id": 1,
            "method": "Runtime.evaluate",
            "params": {"expression": input_js}
        }))
        response = await ws.recv()
        result = json.loads(response)
        print(f"Input result: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")
        
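        # Step 2: Click the send button ("傳送訊息" is the Traditional Chinese
        # label; "Send message" is an assumed English-UI fallback)
        await asyncio.sleep(1)
        click_js = '''
        (() => {
            const btn = document.querySelector(
                'button[aria-label="傳送訊息"], button[aria-label="Send message"]');
            if (btn) { btn.click(); return 'clicked'; }
            return 'button not found';
        })()
        '''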
        
        await ws.send(json.dumps({
            "id": 2,
            "method": "Runtime.evaluate",
            "params": {"expression": click_js}
        }))
        response = await ws.recv()
        result = json.loads(response)
        print(f"Click result: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")
        
        # Step 3: Wait for response
        print(f"Waiting {wait_seconds} seconds for Gemini to respond...")
        await asyncio.sleep(wait_seconds)
        
        # Step 4: Extract the last response, falling back to textContent if innerText is empty
        extract_js = '''
        (() => {
            const markdownEls = document.querySelectorAll('.markdown');
            if (markdownEls.length === 0) return 'No response found';
            const lastMarkdown = markdownEls[markdownEls.length - 1];
            return lastMarkdown.innerText || lastMarkdown.textContent || 'Empty response';
        })()
        '''
        
        await ws.send(json.dumps({
            "id": 3,
            "method": "Runtime.evaluate",
            "params": {"expression": extract_js}
        }))
        response = await ws.recv()
        result = json.loads(response)
        content = result.get('result', {}).get('result', {}).get('value', 'No content')
        
        return content

# Main execution
if __name__ == "__main__":
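    query = """Example question: give a detailed analysis of the price outlook for BTC and ETH.
Include the relevant technical indicators, and answer in Traditional Chinese."""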
    
    result = asyncio.run(query_gemini(query, wait_seconds=60))
    print("\n" + "="*50)
    print("GEMINI RESPONSE:")
    print("="*50)
    print(result)
```

**Usage:**

```bash
python3 query_gemini.py
```
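The script above hardcodes its query, while the Step 4 script reads it from `sys.argv`. This is a minimal sketch of accepting both the query and the wait time on the command line; `parse_args` and the `--wait` flag are hypothetical, not part of the original script:

```python
import argparse


def parse_args(argv=None):
    """Parse a positional query and an optional --wait in seconds (default 60)."""
    parser = argparse.ArgumentParser(description="Send one query to Gemini over CDP.")
    parser.add_argument("query", help="text to send to Gemini")
    parser.add_argument("--wait", type=int, default=60,
                        help="seconds to wait for the response (default: 60)")
    return parser.parse_args(argv)
```

With `args = parse_args()` in the `__main__` block and `asyncio.run(query_gemini(args.query, wait_seconds=args.wait))`, the script could be run as `python3 query_gemini.py "your question" --wait 120`.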

### Method 2: Fetch an Existing Response (get_gemini_response.py)

If the Gemini page already shows a response, use this script to extract it directly:

```python
import asyncio
import websockets
import json
import subprocess

async def get_all_gemini_content():
    # Get the Gemini page WebSocket URL
    result = subprocess.run(
        ["curl", "-s", "http://localhost:9222/json"],
        capture_output=True, text=True
    )
    pages = json.loads(result.stdout)
    
    # Find Gemini page
    gemini_page = None
    for page in pages:
        if page.get("type") == "page" and "gemini.google.com" in page.get("url", ""):
            gemini_page = page
            break
    
    if not gemini_page:
        print("Error: Gemini page not found.")
        return None
    
    ws_url = gemini_page["webSocketDebuggerUrl"]
    print(f"Connecting to: {ws_url}\n")
    
    async with websockets.connect(ws_url) as ws:
        # Extract all markdown content from the page
        extract_js = '''
        (function() {
            const markdownEls = document.querySelectorAll('.markdown');
            console.log('Found markdown elements:', markdownEls.length);
            
            if(markdownEls.length === 0) {
                return 'No markdown elements found';
            }
            
            // Get the last two `.markdown` elements (the most recently rendered blocks)
            const responses = [];
            const startIdx = Math.max(0, markdownEls.length - 2);
            
            for(let i = startIdx; i < markdownEls.length; i++) {
                const text = markdownEls[i].innerText || markdownEls[i].textContent || '';
                if(text.trim()) {
                    responses.push(`[Response ${i+1}]:\\n${text}`);
                }
            }
            
            return responses.join('\\n\\n' + '='.repeat(80) + '\\n\\n');
        })()
        '''
        
        await ws.send(json.dumps({
            "id": 1,
            "method": "Runtime.evaluate",
            "params": {"expression": extract_js, "returnByValue": True}
        }))
        response = await ws.recv()
        result = json.loads(response)
        content = result.get('result', {}).get('result', {}).get('value', 'No content')
        
        return content

# Main execution
if __name__ == "__main__":
    result = asyncio.run(get_all_gemini_content())
    print("="*80)
    print("GEMINI CONVERSATION:")
    print("="*80)
    print(result)
```

**Usage:**

```bash
python3 get_gemini_response.py
```
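Both scripts fall back to `'No content'` when the value is missing, which also hides JavaScript errors: in CDP, a script that throws still gets a normal reply, with the error reported in `exceptionDetails`. This is a minimal sketch of stricter unwrapping; `evaluate_value` is a hypothetical name:

```python
def evaluate_value(reply):
    """Return the value from a Runtime.evaluate reply, raising if the call failed."""
    if "error" in reply:
        # Protocol-level failure, e.g. an unknown method or malformed params.
        raise RuntimeError(f"CDP error: {reply['error'].get('message')}")
    result = reply.get("result", {})
    if "exceptionDetails" in result:
        # The JavaScript threw; "result" then describes the thrown value.
        details = result["exceptionDetails"]
        description = result.get("result", {}).get("description") or details.get("text")
        raise RuntimeError(f"JavaScript error: {description}")
    return result.get("result", {}).get("value")
```

`content = evaluate_value(json.loads(response))` would then replace the chained `.get(...)` lookup.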

### Real-World Example

```bash
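# Full workflow: copy the profile in the foreground
rm -rf /tmp/chrome-gemini-profile && \
rsync -a "$HOME/Library/Application Support/Google/Chrome/" /tmp/chrome-gemini-profile/

# Launch only Chrome in the background (a trailing & on the whole && chain
# would also background the copy, racing it against the sleep below)
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --remote-debugging-port=9222 \
  --user-data-dir="/tmp/chrome-gemini-profile" \
  "https://gemini.google.com/" > /dev/null 2>&1 &

# Wait, then run the query
sleep 8 && python3 query_gemini.py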
```

### Cleanup

After you finish querying, clean up the temporary files and resources:

```bash
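# 1. Close the Chrome debugging session (matching the cloned profile path
#    leaves your everyday Chrome running)
pkill -f "user-data-dir=/tmp/chrome-gemini-profile"

# 2. Remove the temporary profile (optional; frees disk space)
rm -rf /tmp/chrome-gemini-profile

# 3. Remove temporary scripts and output files created during testing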
rm -f query_gemini.py get_gemini_response.py get_all_gemini_content.py
rm -f gemini_response.txt gemini_full_response.txt
```

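**Best practices:**

1. **Close Chrome after each use** - frees port 9222
2. **Clean up the temporary profile regularly** - `/tmp/chrome-gemini-profile` can take up hundreds of MB
3. **Keep the working directory tidy** - delete test scripts and fold frequently used scripts into your project
4. **Keep the full script** - save `query_gemini.py` above as a project file instead of recreating it each time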

---

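## Notes

1. **Wait time** - for complex queries (such as in-depth analysis), use `wait_seconds=60` or longer
2. **Incomplete responses** - if a long response comes back cut off, extract it again once Gemini has finished (for example with `get_gemini_response.py`, which calls `get_all_gemini_content()`)
3. **Login state** - make sure a Google account is signed in within the cloned Chrome profile
4. **Network stability** - CDP itself runs over localhost, but Gemini needs a stable internet connection to finish generating its response
5. **Concurrency** - avoid running multiple Chrome debugging sessions on the same port at the same time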