home / skills / d-oit / do-novelist-ai / gemini-websearch

gemini-websearch skill

safe

This skill performs fast, validated web searches with Gemini CLI and caching to deliver current information and reliable results.

npx playbooks add skill d-oit/do-novelist-ai --skill gemini-websearch

Review the files below or copy the command above to add this skill to your agents.

Files (4)

SKILL.md

6.6 KB

---
name: gemini-websearch
description: Performs web searches using Gemini CLI headless mode with google_web_search tool. Includes intelligent caching, result validation, and analytics. Use when searching for current information, documentation, or when the user explicitly requests web search.
capabilities: ["gemini-web-search", "content-extraction", "result-validation", "caching", "analytics"]
---

# Gemini Web Search

Advanced web search using Gemini CLI in headless mode with tool restriction. All searches use Gemini's `google_web_search` tool.

## Quick Start

**Basic search:**
```
python .claude/skills/gemini-websearch/scripts/search.py "search for React 19 features"
```

**With validation:**
```
python .claude/skills/gemini-websearch/scripts/search.py "search for TypeScript 5.4 new features" --validate
```

**Batch mode:**
```
python .claude/skills/gemini-websearch/scripts/search.py queries.txt --batch --output results/
```

**View analytics:**
```
python .claude/skills/gemini-websearch/scripts/search.py --show-analytics
```

## Search Workflow

Copy this checklist for research tasks:

```
Research Progress:
- [ ] Step 1: Check cache (1-hour TTL)
- [ ] Step 2: Formulate focused search query (prefix with "search ")
- [ ] Step 3: Execute headless Gemini search
- [ ] Step 4: Parse JSON and extract response content
- [ ] Step 5: Validate quality and relevance
- [ ] Step 6: Review search success and content quality
- [ ] Step 7: Log analytics
```

**Step 1: Check cache**
MD5-keyed cache with 1-hour TTL. Automatic cleanup on expiry. Tracks cache hit rates.

**Step 2: Formulate query**
Keep queries specific and focused. Break complex questions into multiple targeted searches.
**Important**: Always prefix queries with "search " for best results (e.g., "search for the React 19 best practice").

**Step 3: Execute headless search**
```
gemini -p "/tool:google_web_search query:\"your query\" raw:true" \
  --yolo --output-format json
```

The `--yolo` flag auto-approves tool usage. Returns structured JSON with comprehensive search results.

**Step 4: Parse response**
Extract search results and metadata from JSON output. Note: Gemini CLI doesn't provide citations/grounding metadata in JSON format.

**Step 5: Validate results**
Score based on:
- Content completeness and length
- Search tool success verification
- Response latency (bonus for faster searches)
- Relevance to original query

False positive detection prevents low-quality results. Note: Gemini CLI doesn't provide citations, so validation focuses on content quality metrics.

**Step 6: Review sources**
Verify that the search tool was called successfully and assess content quality directly from the response.

**Step 7: Log analytics**
Track cache hits, latency, quality scores, validation failures, and query patterns.

## Advanced Usage

For detailed examples see [examples.md](examples.md)

**Batch searches with validation:**
```
python .claude/skills/gemini-websearch/scripts/search.py queries.txt \
  --batch \
  --output results/ \
  --validate \
  --min-quality 0.7
```

**Research with retry logic:**
```
python .claude/skills/gemini-websearch/scripts/search.py "complex technical query" \
  --validate \
  --min-quality 0.7 \
  --min-relevance 0.6 \
  --retry-on-fail \
  --max-retries 2
```

**Disable cache for fresh results:**
```
python .claude/skills/gemini-websearch/scripts/search.py "breaking news topic" --no-cache
```

**Clear cache:**
```
python .claude/skills/gemini-websearch/scripts/search.py --clear-cache
```

## Configuration

### Required: Tool Restriction

Create/update `.gemini/settings.json` to restrict Gemini CLI to only `google_web_search`:

```
{
  "tools": {
    "exclude": [
      "file_read",
      "file_write",
      "file_search",
      "file_list",
      "web_fetch",
      "run_shell_command",
      "save_memory",
      "code_execution",
      "edit_file",
      "create_file",
      "delete_file",
      "list_directory",
      "move_file",
      "copy_file"
    ]
  }
}
```

This ensures Gemini ONLY uses the web search tool, not other capabilities.

### Optional: Authentication

```
# Option 1: API key (optional)
export GEMINI_API_KEY="your-api-key"

# Option 2: gcloud authentication
gcloud auth login

# Option 3: Application Default Credentials
gcloud auth application-default login
```

### Environment Variables

```
export SEARCH_CACHE_DIR=".cache/gemini-searches"
export SEARCH_CACHE_TTL="3600"  # 1 hour
export ANALYTICS_LOG="search_analytics.json"
export GEMINI_MODEL="gemini-2.5-flash"
```

### Config File

Optional `~/.gemini-search/config.json`:

```
{
  "model": "gemini-2.5-flash",
  "cache_enabled": true,
  "cache_ttl": 3600,
  "validation": {
    "enabled": true,
    "min_quality": 0.6,
    "min_citations": 0,
    "min_relevance": 0.5,
    "retry_on_fail": true,
    "max_retries": 2
  },
  "analytics": {
    "enabled": true,
    "log_file": "search_analytics.json",
    "track_cache_hits": true,
    "track_latency": true,
    "track_quality": true
  }
}
```

## Command Reference

**Single search:**
```
python .claude/skills/gemini-websearch/scripts/search.py "query" [options]
```

**Batch search:**
```
python .claude/skills/gemini-websearch/scripts/search.py queries.txt --batch [options]
```

**Show analytics:**
```
python .claude/skills/gemini-websearch/scripts/search.py --show-analytics
```

**Clear cache:**
```
python .claude/skills/gemini-websearch/scripts/search.py --clear-cache
```

### Options

- `--model MODEL` - Gemini model (default: gemini-2.5-flash)
- `--no-cache` - Disable caching
- `--validate` - Enable result validation
- `--min-quality FLOAT` - Minimum quality score (0-1, default: 0.6)
- `--min-citations INT` - Minimum citation count (default: 0, not applicable for Gemini CLI)
- `--min-relevance FLOAT` - Minimum relevance score (0-1, default: 0.5)
- `--retry-on-fail` - Retry if validation fails
- `--max-retries INT` - Maximum retry attempts (default: 2)
- `--output PATH` - Output file (single) or directory (batch)
- `--batch` - Batch mode: query arg is file path

## Best Practices

- Use headless mode for programmatic searches
- Restrict tools in settings.json for security
- Enable caching for repeated/related queries
- Validate critical searches before using results
- Monitor analytics to optimize query patterns
- Use batch mode for multi-query research
- Set quality thresholds based on use case
- Always prefix queries with "search " for optimal results
- Assess content quality directly from response text and metadata

Overview

This skill performs advanced web searches using the Gemini CLI in headless mode with the restricted google_web_search tool. It adds intelligent caching, automated result validation, retry logic, and analytics to make programmatic web research faster and more reliable. Use it when you need current information, documentation, or repeatable search workflows.

How this skill works

The skill invokes the Gemini CLI with a tool restriction so only google_web_search runs, then parses the JSON output produced by the headless call. It checks a MD5-keyed cache (default 1-hour TTL) before running queries, scores results for quality and relevance, and logs analytics like latency and cache hits. Validation flags and retry parameters can enforce minimum quality and relevance thresholds.

When to use it

Looking up current events, breaking news, or recently published docs
Automating repeated or batch research tasks
When you need programmatic, reproducible web searches with caching
When you require basic quality validation and analytics on search results
When you want to restrict tool usage to web search only

Best practices

Prefix queries with "search " and keep queries focused and specific
Enable caching for repeated queries; disable for one-off breaking news
Set quality and relevance thresholds appropriate to your use case
Use batch mode for large multi-query research and enable validation for critical tasks
Restrict tools in .gemini/settings.json to ensure only google_web_search is used
Monitor analytics and adjust cache TTL, min-quality, and retry settings based on patterns

Example use cases

Single quick lookup: run a headless search for a library or API change and validate results
Batch research: run hundreds of targeted queries from a file and export structured results
Quality-gated research: require min-quality and retry on validation failures for critical findings
Analytics-driven optimization: use logged latency and cache hit rates to tune queries and caching
Breaking news: disable cache to force fresh web searches for time-sensitive topics

FAQ

Do searches include source citations in output?

Gemini CLI JSON output does not include citations or grounding metadata; validation relies on content quality metrics, completeness, and relevance scoring.

How do I force fresh results instead of cached ones?

Use the --no-cache option or set cache_enabled to false in the config to bypass the MD5 cache and always run a live Gemini search.