home / skills / samhvw8 / dotfiles / ai-tools

ai-tools skill

/dot_ccp/hub/skills/ai-tools

This skill enables multimodal AI analysis and research by integrating Gemini API, Gemini CLI, and NotebookLM for transcription, extraction, and querying.

npx playbooks add skill samhvw8/dotfiles --skill ai-tools

Review the files below or copy the command above to add this skill to your agents.

Files (29)
SKILL.md
7.0 KB
---
name: ai-tools
description: "Google AI tools integration. Modules: Gemini API (multimodal: audio/image/video/PDF, 2M context), Gemini CLI (second opinions, Google Search, code review), NotebookLM (source-grounded Q&A). Capabilities: transcription, OCR, video analysis, image generation, web search, document queries. Actions: transcribe, analyze, extract, generate, query, search with Google AI. Keywords: Gemini, Gemini API, Gemini CLI, NotebookLM, audio transcription, image captioning, video analysis, PDF extraction, Google Search, second opinion, source-grounded, multimodal, web research. Use when: processing media files, needing second AI opinion, searching current web info, querying uploaded documents, generating images."
allowed-tools:
  - Bash
  - Read
  - Write
  - Edit
  - Grep
  - Glob
---

# Google AI Tools

Unified integration for Google's AI ecosystem: Gemini API (multimodal), Gemini CLI, and NotebookLM.

## Module Selection

| Need | Module | When to Use |
|------|--------|-------------|
| **Media Processing** | Gemini API | Audio/image/video/PDF analysis, generation |
| **Second Opinion** | Gemini CLI | Code review, cross-validation, alternative perspective |
| **Web Research** | Gemini CLI | Current info via Google Search grounding |
| **Doc-Grounded Q&A** | NotebookLM | Questions from uploaded documents |

---

## Gemini API (Multimodal)

Process audio, images, videos, documents, and generate images.

### Prerequisites

```bash
export GEMINI_API_KEY="your-key"  # Get from https://aistudio.google.com/apikey
pip install google-genai python-dotenv pillow
```

### Quick Commands

**Transcribe Audio:**
```bash
python scripts/gemini_batch_process.py --files audio.mp3 --task transcribe --model gemini-2.5-flash
```

**Analyze Image:**
```bash
python scripts/gemini_batch_process.py --files image.jpg --task analyze --prompt "Describe this" --output output.md
```

**Process Video:**
```bash
python scripts/gemini_batch_process.py --files video.mp4 --task analyze --prompt "Summarize with timestamps"
```

**Extract from PDF:**
```bash
python scripts/gemini_batch_process.py --files doc.pdf --task extract --prompt "Extract tables as JSON" --format json
```

**Generate Image:**
```bash
python scripts/gemini_batch_process.py --task generate --prompt "A futuristic city" --model gemini-2.5-flash-image
```

### Model Selection

| Model | Use Case | Context |
|-------|----------|---------|
| gemini-2.5-flash | General (best price/perf) | 1-2M tokens |
| gemini-2.5-pro | Highest quality | 1-2M tokens |
| gemini-2.5-flash-image | Image generation | - |

### Supported Formats

- **Audio:** WAV, MP3, AAC, FLAC, OGG (up to 9.5 hrs)
- **Images:** PNG, JPEG, WEBP, HEIC (up to 3,600 images)
- **Video:** MP4, MOV, AVI, WebM (up to 6 hrs)
- **Documents:** PDF (up to 1,000 pages)

**References:** `references/audio-processing.md`, `references/vision-understanding.md`, `references/video-analysis.md`, `references/document-extraction.md`, `references/image-generation.md`

---

## Gemini CLI

Orchestrate Gemini for code review, web search, and parallel tasks.

### Verify Installation

```bash
command -v gemini || which gemini
```

### Quick Commands

**Code Generation:**
```bash
gemini "Create [description]. Output complete file." --yolo -o text
```

**Code Review:**
```bash
gemini "Review [file] for bugs and security issues" -o text
```

**Web Research:**
```bash
gemini "What are the latest [topic]? Use Google Search." -o text
```

**Architecture Analysis:**
```bash
gemini "Use codebase_investigator to analyze this project" -o text
```

**Faster Model:**
```bash
gemini "[prompt]" -m gemini-2.5-flash -o text
```

### Key Flags

- `--yolo` / `-y`: Auto-approve tool calls
- `-o text`: Human-readable output
- `-o json`: Structured output
- `-m gemini-2.5-flash`: Faster model

### When to Use

✅ Second opinion on code
✅ Current web information
✅ Codebase architecture analysis
✅ Parallel code generation

❌ Simple quick tasks
❌ Interactive refinement

**References:** `references/gemini-reference.md`, `references/gemini-patterns.md`, `references/gemini-templates.md`, `references/gemini-tools.md`

---

## NotebookLM

Query uploaded documents with source-grounded answers.

### Prerequisites

```bash
python scripts/run.py auth_manager.py status  # Check auth
python scripts/run.py auth_manager.py setup   # One-time setup (browser visible)
```

### Quick Commands

**List Notebooks:**
```bash
python scripts/run.py notebook_manager.py list
```

**Add Notebook:**
```bash
python scripts/run.py notebook_manager.py add \
  --url "https://notebooklm.google.com/notebook/..." \
  --name "Name" --description "What it contains" --topics "topic1,topic2"
```

**Ask Question:**
```bash
python scripts/run.py ask_question.py --question "Your question" --notebook-id ID
```

**Search Notebooks:**
```bash
python scripts/run.py notebook_manager.py search --query "keyword"
```

### Critical Notes

1. **Always use `run.py` wrapper** - Handles venv automatically
2. **Browser visible for auth** - Required for Google login
3. **Follow-up questions** - Don't stop at first answer
4. **Rate limit:** 50 queries/day on free accounts

**References:** `references/notebooklm-api.md`, `references/notebooklm-troubleshooting.md`

---

## Scripts Overview

### Gemini API Scripts (in `scripts/`)

| Script | Purpose |
|--------|---------|
| `gemini_batch_process.py` | Batch process media files |
| `media_optimizer.py` | Prepare media for API limits |
| `document_converter.py` | Convert docs to PDF |

### NotebookLM Scripts (via `run.py`)

| Script | Purpose |
|--------|---------|
| `auth_manager.py` | Authentication management |
| `notebook_manager.py` | Library CRUD |
| `ask_question.py` | Query interface |
| `cleanup_manager.py` | Data cleanup |

---

## Cost Optimization

### Gemini API Pricing

| Model | Input | Output |
|-------|-------|--------|
| 2.5 Flash | $1.00/1M | $0.10/1M |
| 2.5 Pro | $3.00/1M | $12.00/1M |

### Token Rates

- Audio: 32 tokens/sec (1 min = 1,920 tokens)
- Video: ~300 tokens/sec
- PDF: 258 tokens/page
- Image: 258-1,548 tokens

### Best Practices

1. Use `gemini-2.5-flash` for most tasks
2. Use File API for files >20MB
3. Optimize media before upload
4. Process specific segments, not full videos

---

## Error Handling

| Error | Solution |
|-------|----------|
| 401 | Check API key |
| 429 | Rate limit - wait or use flash model |
| ModuleNotFoundError | Use `run.py` wrapper |
| Auth fails | Browser must be visible |

---

## References

### Gemini API
- `references/audio-processing.md`
- `references/vision-understanding.md`
- `references/video-analysis.md`
- `references/document-extraction.md`
- `references/image-generation.md`

### Gemini CLI
- `references/gemini-reference.md`
- `references/gemini-patterns.md`
- `references/gemini-templates.md`
- `references/gemini-tools.md`

### NotebookLM
- `references/notebooklm-api.md`
- `references/notebooklm-troubleshooting.md`
- `references/notebooklm-usage.md`

---

## Resources

- [Gemini API Key](https://aistudio.google.com/apikey)
- [Gemini API Docs](https://ai.google.dev/gemini-api)
- [NotebookLM](https://notebooklm.google.com)

Overview

This skill integrates Google AI tools—Gemini API, Gemini CLI, and NotebookLM—into a single workflow for multimodal processing, web-grounded research, and document Q&A. It handles audio, image, video, and PDF processing, plus image generation and Google-powered search/second opinions. Use it to transcribe, extract, analyze, generate, or query content with source grounding and current web context.

How this skill works

Gemini API performs multimodal tasks: transcription, OCR, video analysis, PDF extraction, and image generation using models like gemini-2.5-flash and gemini-2.5-pro. Gemini CLI orchestrates CLI-driven code review, Google Search grounding, and parallel tasks or second opinions. NotebookLM provides source-grounded Q&A over uploaded notebooks and documents, returning citations and follow-up context.

When to use it

  • Transcribing long audio or extracting captions and timestamps from video
  • Extracting structured data or tables from PDFs and documents
  • Analyzing images for descriptions, captions, or OCR
  • Generating images from prompts and iterating on visuals
  • Getting a second AI opinion on code, architecture, or research via Google Search grounding
  • Asking source-grounded questions over uploaded notebooks or corpora

Best practices

  • Prefer gemini-2.5-flash for cost-effective tasks and gemini-2.5-pro for highest quality
  • Optimize and trim media before upload; use File API for files >20MB
  • Process segments instead of entire long videos to save tokens and cost
  • Use the CLI for web-grounded queries and second opinions; use NotebookLM for document-sourced answers
  • Keep follow-up questions to refine NotebookLM answers and verify sources

Example use cases

  • Batch-transcribe a set of interviews, then extract speaker timestamps and summaries
  • Extract tables from a 200-page PDF as JSON for downstream analysis
  • Run an automated code review and request a second opinion with Google Search context
  • Generate concept art iterations with gemini-2.5-flash-image for creative briefs
  • Upload internal documentation to NotebookLM and run source-grounded Q&A for onboarding

FAQ

Which model should I pick for routine tasks?

Use gemini-2.5-flash for most routine multimodal tasks; switch to gemini-2.5-pro when you need maximal quality.

How do I handle large media files?

Optimize and compress media, process segments, and use the File API for uploads larger than 20MB.

Can I get up-to-date web info?

Yes—use the Gemini CLI with Google Search grounding to retrieve current information and a second opinion.