home / skills / melodic-software / claude-code-plugins / gemini-token-optimization
npx playbooks add skill melodic-software/claude-code-plugins --skill gemini-token-optimizationReview the files below or copy the command above to add this skill to your agents.
---
name: gemini-token-optimization
description: Optimize token usage when delegating to Gemini CLI. Covers token caching, batch queries, model selection (Flash vs Pro), and cost tracking. Use when planning bulk Gemini operations.
allowed-tools: Read, Skill
---
# Gemini Token Optimization
## 🚨 MANDATORY: Invoke gemini-cli-docs First
> **STOP - Before providing ANY response about Gemini token usage:**
>
> 1. **INVOKE** `gemini-cli-docs` skill
> 2. **QUERY** for the specific token or pricing topic
> 3. **BASE** all responses EXCLUSIVELY on official documentation loaded
## Overview
Skill for optimizing cost and token usage when delegating to Gemini CLI. Essential for efficient bulk operations and cost-conscious workflows.
## When to Use This Skill
**Keywords:** token usage, cost optimization, gemini cost, model selection, flash vs pro, caching, batch queries, reduce tokens
**Use this skill when:**
- Planning bulk Gemini operations
- Optimizing costs for large-scale analysis
- Choosing between Flash and Pro models
- Understanding token caching benefits
- Tracking usage across sessions
## Token Caching
Gemini CLI automatically caches context to reduce costs by reusing previously processed content.
### Availability
| Auth Method | Caching Available |
| --- | --- |
| API key (Gemini API) | YES |
| Vertex AI | YES |
| OAuth (personal/enterprise) | NO |
### How It Works
- System instructions and repeated context are cached
- Cached tokens don't count toward billing
- View savings via `/stats` command or JSON output
### Maximizing Cache Hits
1. **Use consistent system prompts** - Same prefix increases cache reuse
2. **Batch similar queries** - Group related analysis together
3. **Reuse context files** - Same files in same order
### Monitoring Cache Usage
```bash
result=$(gemini "query" --output-format json)
total=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
billable=$((total - cached))
savings=$((cached * 100 / total))
echo "Total: $total tokens"
echo "Cached: $cached tokens ($savings% savings)"
echo "Billable: $billable tokens"
```
## Model Selection
### Model Comparison
| Model | Context Window | Speed | Cost | Quality |
| --- | --- | --- | --- | --- |
| gemini-2.5-flash | Large | Fast | Lower | Good |
| gemini-2.5-pro | Very large | Slower | Higher | Best |
### Selection Criteria
**Use Flash (`-m gemini-2.5-flash`) when:**
- Processing large files (bulk analysis)
- Simple extraction tasks
- Cost is a primary concern
- Speed is critical
- Task is straightforward
**Use Pro (`-m gemini-2.5-pro`) when:**
- Complex reasoning required
- Quality is critical
- Nuanced analysis needed
- Task requires deep understanding
- Context exceeds 1M tokens
### Model Selection Examples
```bash
# Bulk file analysis - use Flash
for file in src/*.ts; do
gemini "List all exports" -m gemini-2.5-flash --output-format json < "$file"
done
# Security audit - use Pro for quality
gemini "Deep security analysis" -m gemini-2.5-pro --output-format json < critical-auth.ts
# Cost tracking with model info
result=$(gemini "query" --output-format json)
model=$(echo "$result" | jq -r '.stats.models | keys[0]')
tokens=$(echo "$result" | jq '.stats.models | to_entries[0].value.tokens.total')
echo "Used $model: $tokens tokens"
```
## Batching Strategy
### Why Batch?
- Reduces API overhead
- Increases cache hit rate
- Provides consistent context
### Batching Patterns
#### Pattern 1: Concatenate Files
```bash
# Instead of N separate calls
# Do one call with all files
cat src/*.ts | gemini "Analyze all TypeScript files for patterns" --output-format json
```
#### Pattern 2: Batch Prompts
```bash
# Combine related questions
gemini "Answer these questions about the codebase:
1. What is the main architecture pattern?
2. How is authentication handled?
3. What database is used?" --output-format json
```
#### Pattern 3: Staged Analysis
```bash
# First pass: Quick overview with Flash
overview=$(cat src/*.ts | gemini "List all modules" -m gemini-2.5-flash --output-format json)
# Second pass: Deep dive critical areas with Pro
echo "$overview" | jq -r '.response' | grep "auth\|security" | while read module; do
gemini "Deep analysis of $module" -m gemini-2.5-pro --output-format json
done
```
## Cost Tracking
### Per-Query Tracking
```bash
result=$(gemini "query" --output-format json)
# Extract all cost-relevant stats
total_tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
cached_tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
models_used=$(echo "$result" | jq -r '.stats.models | keys | join(", ")')
tool_calls=$(echo "$result" | jq '.stats.tools.totalCalls // 0')
latency=$(echo "$result" | jq '.stats.models | to_entries | map(.value.api.totalLatencyMs) | add // 0')
echo "$(date): tokens=$total_tokens cached=$cached_tokens models=$models_used tools=$tool_calls latency=${latency}ms" >> usage.log
```
### Session Tracking
```bash
# Track cumulative usage across a session
total_session_tokens=0
total_session_cached=0
total_session_calls=0
track_usage() {
local result="$1"
local tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
local cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
total_session_tokens=$((total_session_tokens + tokens))
total_session_cached=$((total_session_cached + cached))
total_session_calls=$((total_session_calls + 1))
}
# Use in workflow
result=$(gemini "query 1" --output-format json)
track_usage "$result"
result=$(gemini "query 2" --output-format json)
track_usage "$result"
echo "Session total: $total_session_tokens tokens ($total_session_cached cached) in $total_session_calls calls"
```
## Optimization Checklist
### Before Large Operations
- [ ] Choose appropriate model (Flash vs Pro)
- [ ] Check if caching is available (API key or Vertex)
- [ ] Plan batching strategy
- [ ] Set up usage tracking
### During Operations
- [ ] Monitor cache hit rates
- [ ] Track per-query costs
- [ ] Adjust model if quality insufficient
- [ ] Batch similar queries
### After Operations
- [ ] Review total usage
- [ ] Calculate effective cost
- [ ] Identify optimization opportunities
- [ ] Document learnings
## Quick Reference
### Cost-Saving Commands
```bash
# Use Flash for bulk
gemini "query" -m gemini-2.5-flash --output-format json
# Check cache effectiveness
gemini "query" --output-format json | jq '{total: .stats.models | to_entries | map(.value.tokens.total) | add, cached: .stats.models | to_entries | map(.value.tokens.cached) | add}'
# Minimal output (fewer output tokens)
gemini "Answer in one sentence: {question}" --output-format json
```
### Cost Estimation
Rough token estimates:
- 1 token ~ 4 characters (English)
- 1 page of code ~ 500-1000 tokens
- Typical source file ~ 200-2000 tokens
## Keyword Registry (Delegates to gemini-cli-docs)
| Topic | Query Keywords |
| --- | --- |
| Caching | `token caching`, `cached tokens`, `/stats` |
| Model selection | `model routing`, `flash vs pro`, `-m flag` |
| Costs | `quota pricing`, `token usage`, `billing` |
| Output control | `output format`, `json output` |
## Test Scenarios
### Scenario 1: Check Token Usage
**Query**: "How do I see how many tokens Gemini used?"
**Expected Behavior**:
- Skill activates on "token usage" or "gemini cost"
- Provides JSON stats extraction pattern
**Success Criteria**: User receives jq commands to extract token counts
### Scenario 2: Reduce Costs
**Query**: "How do I reduce Gemini CLI costs for bulk analysis?"
**Expected Behavior**:
- Skill activates on "cost optimization" or "reduce tokens"
- Recommends Flash model and batching
**Success Criteria**: User receives cost optimization strategies
### Scenario 3: Model Selection
**Query**: "Should I use Flash or Pro for this task?"
**Expected Behavior**:
- Skill activates on "flash vs pro" or "model selection"
- Provides decision criteria table
**Success Criteria**: User receives model comparison and recommendation
## References
Query `gemini-cli-docs` for official documentation on:
- "token caching"
- "model selection"
- "quota and pricing"
## Version History
- v1.1.0 (2025-12-01): Added Test Scenarios section
- v1.0.0 (2025-11-25): Initial release