home / skills / grasseed / google-search-browser-use / gemini-research-browser-use
npx playbooks add skill grasseed/google-search-browser-use --skill gemini-research-browser-useReview the files below or copy the command above to add this skill to your agents.
---
name: gemini-research-browser-use
description: Use Chrome DevTools Protocol to allow the AI to "ask Gemini" or "research with Gemini" directly. This uses the user's logged-in Chrome session, bypassing API limits and leveraging the web interface's reasoning capabilities.
---
# Gemini Research Browser Use
## Overview
Perform research or queries using Google Gemini via Chrome DevTools Protocol (CDP). This method reuses the user's **existing Chrome login session** to interact with the Gemini web interface (`https://gemini.google.com/`).
## Prerequisites
1. **Python + websockets**
Verify:
```bash
python3 --version
python3 -m pip show websockets
```
Install if missing:
```bash
python3 -m pip install websockets
```
2. **Google Chrome**
Verify:
```bash
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --version
```
3. **CDP Port Availability**
Verify Chrome is listening (after launch in Step 2):
```bash
curl -s http://localhost:9222/json | python3 -m json.tool
```
4. **Non-default user data directory (required by Chrome)**
Chrome CDP **requires** a non-default profile path. Use a cloned profile so you keep login state.
```bash
rm -rf /tmp/chrome-gemini-profile
rsync -a "$HOME/Library/Application Support/Google/Chrome/" /tmp/chrome-gemini-profile/
```
## Method Comparison
| Method | Pros | Cons | Recommended |
|--------|------|------|-------------|
| **Chrome Remote Debugging (CDP)** | Uses existing login, full automation, reliable | Requires Chrome restart with debugging flag | ✅ **Yes** |
| `browser-use --browser real` | Simple CLI | Opens new session without login | ❌ No |
| `browser_subagent` | Visual feedback | Rate limited, may fail | ❌ No |
---
## ✅ Recommended Method: Chrome Remote Debugging (CDP)
This is the **most reliable method** that uses your system Chrome with existing Google login.
### Prerequisites
1. **Python 3** with `websockets` library
2. **Google Chrome** installed at `/Applications/Google Chrome.app/`
3. **User logged into Google** in Chrome
### Step 1: Install websockets (if needed)
```bash
pip3 install websockets
# Or in virtual environment:
python3 -m venv .venv && ./.venv/bin/pip install websockets
```
### Step 2: Launch Chrome with Remote Debugging (Non-default profile)
**Important**: Close any existing Chrome windows first, or use a different debugging port.
```bash
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--remote-debugging-port=9222 \
--user-data-dir="/tmp/chrome-gemini-profile" \
"https://gemini.google.com/" &
```
**Parameters explained:**
- `--remote-debugging-port=9222`: Enables CDP on port 9222
- `--user-data-dir`: Points to your existing Chrome profile (with login session)
- The URL opens Gemini directly
### Step 3: Verify Connection (CDP)
```bash
curl -s http://localhost:9222/json | python3 -m json.tool
```
Look for the Gemini page entry:
```json
{
"title": "Google Gemini",
"url": "https://gemini.google.com/app",
"webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/XXXXXXXX"
}
```
**Note**: If URL shows `/app` instead of just `/`, it means you're **logged in**.
### Step 4: Send Query to Gemini
Save this as `gemini_query.py` or run inline:
```python
import asyncio
import websockets
import json
import subprocess
import sys
async def query_gemini(query_text, wait_seconds=30):
# Get the Gemini page WebSocket URL
result = subprocess.run(
["curl", "-s", "http://localhost:9222/json"],
capture_output=True, text=True
)
pages = json.loads(result.stdout)
# Find Gemini page
gemini_page = None
for page in pages:
if page.get("type") == "page" and "gemini.google.com" in page.get("url", ""):
gemini_page = page
break
if not gemini_page:
print("Error: Gemini page not found. Make sure Chrome is open with Gemini.")
return None
ws_url = gemini_page["webSocketDebuggerUrl"]
print(f"Connecting to: {ws_url}")
async with websockets.connect(ws_url) as ws:
# Step 1: Input the query
input_js = f'''
const editor = document.querySelector('div[contenteditable="true"]');
if(editor) {{
editor.focus();
document.execCommand('insertText', false, `{query_text}`);
editor.dispatchEvent(new Event('input', {{bubbles: true}}));
'success';
}} else {{
'editor not found';
}}
'''
await ws.send(json.dumps({
"id": 1,
"method": "Runtime.evaluate",
"params": {"expression": input_js}
}))
response = await ws.recv()
result = json.loads(response)
print(f"Input result: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")
# Step 2: Click send button
await asyncio.sleep(1)
click_js = '''
const btn = document.querySelector('button[aria-label="傳送訊息"]');
if(btn) { btn.click(); 'clicked'; } else { 'button not found'; }
'''
await ws.send(json.dumps({
"id": 2,
"method": "Runtime.evaluate",
"params": {"expression": click_js}
}))
response = await ws.recv()
result = json.loads(response)
print(f"Click result: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")
# Step 3: Wait for response
print(f"Waiting {wait_seconds} seconds for Gemini to respond...")
await asyncio.sleep(wait_seconds)
# Step 4: Extract the response
extract_js = '''
const markdownEls = document.querySelectorAll('.markdown');
if(markdownEls.length > 0) {
markdownEls[markdownEls.length - 1].innerText;
} else {
'No response found';
}
'''
await ws.send(json.dumps({
"id": 3,
"method": "Runtime.evaluate",
"params": {"expression": extract_js}
}))
response = await ws.recv()
result = json.loads(response)
content = result.get('result', {}).get('result', {}).get('value', 'No content')
return content
# Main execution
if __name__ == "__main__":
query = sys.argv[1] if len(sys.argv) > 1 else "範例問題:請用繁體中文回答什麼是區塊鏈?"
result = asyncio.run(query_gemini(query, wait_seconds=30))
print("\n" + "="*50)
print("GEMINI RESPONSE:")
print("="*50)
print(result)
```
### Step 5: Run the Query
```bash
python3 gemini_query.py "範例問題:你的查詢問題"
```
Or inline for simple queries:
```bash
python3 << 'EOF'
import asyncio
import websockets
import json
async def send_to_gemini():
# Get WebSocket URL
import subprocess
result = subprocess.run(["curl", "-s", "http://localhost:9222/json"], capture_output=True, text=True)
pages = json.loads(result.stdout)
ws_url = next(p["webSocketDebuggerUrl"] for p in pages if "gemini.google.com" in p.get("url", ""))
async with websockets.connect(ws_url) as ws:
# Input query
await ws.send(json.dumps({
"id": 1,
"method": "Runtime.evaluate",
"params": {"expression": '''
const editor = document.querySelector('div[contenteditable="true"]');
editor.focus();
document.execCommand('insertText', false, '範例問題:請分析比特幣未來的價格走勢');
editor.dispatchEvent(new Event('input', {bubbles: true}));
'''}
}))
await ws.recv()
# Click send
await asyncio.sleep(1)
await ws.send(json.dumps({
"id": 2,
"method": "Runtime.evaluate",
"params": {"expression": '''document.querySelector('button[aria-label="傳送訊息"]').click()'''}
}))
await ws.recv()
# Wait and extract
await asyncio.sleep(30)
await ws.send(json.dumps({
"id": 3,
"method": "Runtime.evaluate",
"params": {"expression": '''
document.querySelectorAll('.markdown')[document.querySelectorAll('.markdown').length - 1].innerText
'''}
}))
response = await ws.recv()
print(json.loads(response)['result']['result']['value'])
asyncio.run(send_to_gemini())
EOF
```
---
## Alternative Method: browser-use CLI
This method is simpler but **does not use your existing Chrome login**. You'll need to log in manually each time.
### Prerequisites
```bash
# Create virtual environment
python3 -m venv .venv
# Install browser-use
./.venv/bin/pip install browser-use
```
### Workflow
#### 1) Open Gemini
```bash
./.venv/bin/browser-use --browser real open "https://gemini.google.com/"
```
#### 2) Get Page State
```bash
./.venv/bin/browser-use --browser real state
```
Look for:
- The input textbox: `contenteditable=true role=textbox`
- The send button: `aria-label=傳送訊息`
#### 3) Input Text via JavaScript eval
```bash
./.venv/bin/browser-use --browser real eval "const editor = document.querySelector('div[contenteditable=\"true\"]'); editor.focus(); document.execCommand('insertText', false, 'YOUR QUERY HERE'); editor.dispatchEvent(new Event('input', {bubbles: true}));"
```
#### 4) Click Send Button
```bash
# Get current state to find button index
./.venv/bin/browser-use --browser real state
# Click the send button (replace INDEX with actual number)
./.venv/bin/browser-use --browser real click INDEX
```
#### 5) Close Session
```bash
./.venv/bin/browser-use close
```
---
## Troubleshooting
### Chrome Remote Debugging Issues
| Problem | Cause | Solution |
|---------|-------|----------|
| `curl: (7) Failed to connect` | Chrome not running with debugging | Restart Chrome with `--remote-debugging-port=9222` |
| WebSocket connection refused | Page ID changed | Re-fetch `/json` to get new WebSocket URL |
| "editor not found" | Page not fully loaded | Wait a few seconds before running script |
| "button not found" | Send button not visible | Check if text was actually input first |
| Login page instead of app | Wrong user-data-dir path | Verify path: `"$HOME/Library/Application Support/Google/Chrome"` |
| `DevTools remote debugging requires a non-default data directory` | Chrome disallows default profile for CDP | Launch with a cloned profile: `/tmp/chrome-gemini-profile` |
| `curl` shows connection refused even though Chrome is running | CDP not listening due to profile path | Ensure `--user-data-dir` is **not** default and the port is free |
| `No Gemini page found via CDP` | Gemini not loaded or not logged in | Open `https://gemini.google.com/` in the launched Chrome and wait for `/app` |
### browser-use Issues
| Problem | Cause | Solution |
|---------|-------|----------|
| Not logged in | browser-use creates isolated session | Use Chrome Remote Debugging method instead |
| `Unknown key: "請"` error | CLI doesn't support Unicode | Use `eval` with JavaScript `execCommand` |
| Click doesn't work | Element index changed | Re-run `state` before each click |
---
## Best Practices
1. **Always use Chrome Remote Debugging** for queries requiring authentication
2. **Wait 30+ seconds** for complex queries (Gemini's "Deep Think" mode takes longer)
3. **Check for `.markdown` elements** to verify response is complete
4. **Use inline Python** for one-off queries; use the full script for automation
5. **Close Chrome debugging session** when done to avoid port conflicts
6. **Keep profile cloned** in `/tmp/chrome-gemini-profile` to avoid CDP blocking the default profile
---
## Complete Example: Crypto Price Analysis
### 完整工作流程
```bash
# Step 1: 準備 Chrome 設定檔副本 (避免 CDP 預設目錄限制)
rm -rf /tmp/chrome-gemini-profile
rsync -a "$HOME/Library/Application Support/Google/Chrome/" /tmp/chrome-gemini-profile/
# Step 2: 啟動 Chrome 遠端除錯模式
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--remote-debugging-port=9222 \
--user-data-dir="/tmp/chrome-gemini-profile" \
"https://gemini.google.com/" > /dev/null 2>&1 &
# Step 3: 等待頁面載入並驗證連接
sleep 8
curl -s http://localhost:9222/json | python3 -c "import sys, json; pages = json.load(sys.stdin); gemini = [p for p in pages if p.get('type') == 'page' and 'gemini.google.com' in p.get('url', '')]; print(f\"找到 Gemini 頁面: {gemini[0]['url'] if gemini else '未找到'}\")"
```
### 方法 1: 完整查詢腳本 (query_gemini.py)
將以下內容儲存為 `query_gemini.py`:
```python
import asyncio
import websockets
import json
import subprocess
import sys
async def query_gemini(query_text, wait_seconds=60):
# Get the Gemini page WebSocket URL
result = subprocess.run(
["curl", "-s", "http://localhost:9222/json"],
capture_output=True, text=True
)
pages = json.loads(result.stdout)
# Find Gemini page
gemini_page = None
for page in pages:
if page.get("type") == "page" and "gemini.google.com" in page.get("url", ""):
gemini_page = page
break
if not gemini_page:
print("錯誤:找不到 Gemini 頁面。請確保 Chrome 已開啟 Gemini。")
return None
ws_url = gemini_page["webSocketDebuggerUrl"]
print(f"正在連接到: {ws_url}")
async with websockets.connect(ws_url) as ws:
# Step 1: Input the query
input_js = f'''
const editor = document.querySelector('div[contenteditable="true"]');
if(editor) {{
editor.focus();
document.execCommand('insertText', false, `{query_text}`);
editor.dispatchEvent(new Event('input', {{bubbles: true}}));
'success';
}} else {{
'editor not found';
}}
'''
await ws.send(json.dumps({
"id": 1,
"method": "Runtime.evaluate",
"params": {"expression": input_js}
}))
response = await ws.recv()
result = json.loads(response)
print(f"輸入結果: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")
# Step 2: Click send button
await asyncio.sleep(1)
click_js = '''
const btn = document.querySelector('button[aria-label="傳送訊息"]');
if(btn) { btn.click(); 'clicked'; } else { 'button not found'; }
'''
await ws.send(json.dumps({
"id": 2,
"method": "Runtime.evaluate",
"params": {"expression": click_js}
}))
response = await ws.recv()
result = json.loads(response)
print(f"點擊結果: {result.get('result', {}).get('result', {}).get('value', 'unknown')}")
# Step 3: Wait for response
print(f"等待 {wait_seconds} 秒讓 Gemini 回應...")
await asyncio.sleep(wait_seconds)
# Step 4: Extract the response - try to get complete content
extract_js = '''
const markdownEls = document.querySelectorAll('.markdown');
if(markdownEls.length > 0) {
const lastMarkdown = markdownEls[markdownEls.length - 1];
// Get all text content including nested elements
lastMarkdown.innerText || lastMarkdown.textContent || 'Empty response';
} else {
'No response found';
}
'''
await ws.send(json.dumps({
"id": 3,
"method": "Runtime.evaluate",
"params": {"expression": extract_js}
}))
response = await ws.recv()
result = json.loads(response)
content = result.get('result', {}).get('result', {}).get('value', 'No content')
return content
# Main execution
if __name__ == "__main__":
query = """範例問題:請詳細分析 BTC、ETH 的價格預測走勢。
需包含相關專業指標,並用繁體中文回答。"""
result = asyncio.run(query_gemini(query, wait_seconds=60))
print("\n" + "="*50)
print("GEMINI 回應:")
print("="*50)
print(result)
```
**執行方式:**
```bash
python3 query_gemini.py
```
### 方法 2: 獲取已存在的回應 (get_gemini_response.py)
如果 Gemini 頁面已經有回應,可以使用此腳本直接提取:
```python
import asyncio
import websockets
import json
import subprocess
async def get_all_gemini_content():
# Get the Gemini page WebSocket URL
result = subprocess.run(
["curl", "-s", "http://localhost:9222/json"],
capture_output=True, text=True
)
pages = json.loads(result.stdout)
# Find Gemini page
gemini_page = None
for page in pages:
if page.get("type") == "page" and "gemini.google.com" in page.get("url", ""):
gemini_page = page
break
if not gemini_page:
print("錯誤:找不到 Gemini 頁面。")
return None
ws_url = gemini_page["webSocketDebuggerUrl"]
print(f"正在連接到: {ws_url}\n")
async with websockets.connect(ws_url) as ws:
# Extract all markdown content from the page
extract_js = '''
(function() {
const markdownEls = document.querySelectorAll('.markdown');
console.log('Found markdown elements:', markdownEls.length);
if(markdownEls.length === 0) {
return 'No markdown elements found';
}
// Get the last two markdown elements (user query and AI response)
const responses = [];
const startIdx = Math.max(0, markdownEls.length - 2);
for(let i = startIdx; i < markdownEls.length; i++) {
const text = markdownEls[i].innerText || markdownEls[i].textContent || '';
if(text.trim()) {
responses.push(`[回應 ${i+1}]:\\n${text}`);
}
}
return responses.join('\\n\\n' + '='.repeat(80) + '\\n\\n');
})()
'''
await ws.send(json.dumps({
"id": 1,
"method": "Runtime.evaluate",
"params": {"expression": extract_js, "returnByValue": True}
}))
response = await ws.recv()
result = json.loads(response)
content = result.get('result', {}).get('result', {}).get('value', 'No content')
return content
# Main execution
if __name__ == "__main__":
result = asyncio.run(get_all_gemini_content())
print("="*80)
print("GEMINI 對話內容:")
print("="*80)
print(result)
```
**執行方式:**
```bash
python3 get_gemini_response.py
```
### 實際使用範例
```bash
# 完整流程
rm -rf /tmp/chrome-gemini-profile && \
rsync -a "$HOME/Library/Application Support/Google/Chrome/" /tmp/chrome-gemini-profile/ && \
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--remote-debugging-port=9222 \
--user-data-dir="/tmp/chrome-gemini-profile" \
"https://gemini.google.com/" > /dev/null 2>&1 &
# 等待並執行查詢
sleep 8 && python3 query_gemini.py
```
### 清理資源
完成查詢後,建議清理臨時文件和資源:
```bash
# 1. 關閉 Chrome 除錯會話
pkill -9 "Google Chrome"
# 2. 清理臨時設定檔 (可選,釋放磁碟空間)
rm -rf /tmp/chrome-gemini-profile
# 3. 清理測試過程中生成的臨時腳本和輸出文件
rm -f query_gemini.py get_gemini_response.py get_all_gemini_content.py
rm -f gemini_response.txt gemini_full_response.txt
```
**最佳實踐:**
1. **每次使用後關閉 Chrome** - 避免佔用 9222 端口
2. **定期清理臨時設定檔** - `/tmp/chrome-gemini-profile` 可能佔用數百 MB
3. **保持工作目錄整潔** - 刪除測試腳本,將常用腳本整合到專案中
4. **使用完整腳本** - 將上述 `query_gemini.py` 儲存為專案文件,而非每次重新建立
---
## 注意事項
1. **等待時間調整** - 複雜查詢(如深度分析)建議 `wait_seconds=60` 或更長
2. **回應截斷問題** - 如果回應很長,可能需要多次提取或使用 `get_all_gemini_content.py` 方法
3. **登入狀態** - 確保 Chrome 設定檔中已登入 Google 帳號
4. **網路穩定性** - CDP 連接需要穩定的網路環境
5. **並發限制** - 避免同時開啟多個 Chrome 除錯會話在同一端口