/skills/shigo-45/gemini-reader
This skill analyzes local PDF, video, and audio files via the Gemini API to read, summarize, or transcribe their content.

To add this skill to your agents, run `npx playbooks add skill openclaw/skills --skill gemini-reader`.

---
name: gemini-reader
description: Understand local non-text files (PDF, video, audio) using the Gemini API. Use when the user asks to read, summarize, or analyze a PDF document, video file (mp4/mov/webm/avi/mkv), or audio file (mp3/wav/m4a/ogg), including audio transcription. NOT for images; the main model already has vision capabilities, so use it directly for image understanding.
metadata:
  {
    "openclaw":
      {
        "emoji": "📄",
        "requires": { "env": ["GEMINI_API_KEY"], "pip": ["google-genai"] },
      },
  }
---
# Gemini Reader
Analyze local PDF, video, and audio files via the Gemini API, using the Python SDK `google-genai`.
## Prerequisites
- `google-genai` Python package installed (`pip install google-genai`)
- `GEMINI_API_KEY` environment variable set
- Supported formats: PDF, video (mp4/webm/mov/avi/mkv), audio (mp3/wav/m4a/ogg)
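
A quick preflight check for these prerequisites might look like the sketch below. `check_prerequisites` is a hypothetical helper, not part of the skill; it only checks that `GEMINI_API_KEY` is non-empty and that the `google.genai` module can be found, not that the key is valid.

```python
import importlib.util
import os
from typing import List, Mapping, Optional


def check_prerequisites(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable missing prerequisites (empty list if all present).

    Hypothetical helper: it checks presence only and never calls the API.
    """
    env = os.environ if env is None else env
    missing = []
    # The skill metadata declares GEMINI_API_KEY as a required env var.
    if not env.get("GEMINI_API_KEY"):
        missing.append("GEMINI_API_KEY environment variable")
    # find_spec on a submodule imports its parent package ("google") first,
    # which raises ModuleNotFoundError when that parent is not installed.
    try:
        found = importlib.util.find_spec("google.genai") is not None
    except ModuleNotFoundError:
        found = False
    if not found:
        missing.append("google-genai package (pip install google-genai)")
    return missing
```

Calling `check_prerequisites()` with no argument inspects the real environment; an empty list means both prerequisites appear to be in place.
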
## Usage
```bash
python3 scripts/gemini_read.py <file> "<prompt>" [--model MODEL] [--output PATH]
```
### Examples
```bash
# Summarize a PDF
python3 scripts/gemini_read.py paper.pdf "Summarize the key findings of this paper"
# Analyze a video
python3 scripts/gemini_read.py lecture.mp4 "List the main topics covered in this video"
# Transcribe audio
python3 scripts/gemini_read.py recording.m4a "Transcribe this audio verbatim"
# Save output to file
python3 scripts/gemini_read.py report.pdf "Extract all data tables" --output tables.txt
```
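
The usage line above corresponds to a small `argparse` interface. The parser below is a hypothetical reconstruction, not the script's source; it assumes `-m` is the short form of `--model` (both spellings appear in this document) and that the default is the `3-flash` alias from the model table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented CLI:
    #   gemini_read.py <file> "<prompt>" [--model MODEL] [--output PATH]
    parser = argparse.ArgumentParser(
        prog="gemini_read.py",
        description="Read a local PDF, video, or audio file with the Gemini API.",
    )
    parser.add_argument("file", help="path to a local PDF, video, or audio file")
    parser.add_argument("prompt", help="instruction for the model, e.g. 'Summarize this'")
    # Assumption: -m and --model are the same option.
    parser.add_argument("-m", "--model", default="3-flash",
                        help="model alias from the model selection table")
    # When omitted, the result is left as None (e.g. print to stdout instead).
    parser.add_argument("--output", metavar="PATH",
                        help="write the generated text to this file")
    return parser
```

For example, `build_parser().parse_args(["report.pdf", "Extract all data tables", "--output", "tables.txt"])` gives `args.model == "3-flash"` and `args.output == "tables.txt"`.
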
### Model selection
| Alias | Full name | Best for |
|-------|-----------|----------|
| `3-flash` (default) | gemini-3-flash-preview | Fast, cheap, everyday use |
| `2.5-flash` | gemini-2.5-flash | Stable, good balance |
| `2.5-pro` | gemini-2.5-pro | Deep analysis, long docs |
| `3-pro` | gemini-3-pro-preview | Advanced reasoning |
| `3.1-pro` | gemini-3.1-pro-preview | Latest pro capabilities |

Pass the alias with `-m`: `python3 scripts/gemini_read.py file.pdf "prompt" -m 2.5-pro`
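
Alias resolution amounts to a lookup in the model table. In this sketch, `resolve_model` is a hypothetical name, and the pass-through of full model IDs beginning with `gemini-` is an assumption; the document does not say whether the script accepts full names.

```python
from typing import Optional

# Alias -> model ID, copied from the model selection table.
MODEL_ALIASES = {
    "3-flash": "gemini-3-flash-preview",  # default
    "2.5-flash": "gemini-2.5-flash",
    "2.5-pro": "gemini-2.5-pro",
    "3-pro": "gemini-3-pro-preview",
    "3.1-pro": "gemini-3.1-pro-preview",
}
DEFAULT_ALIAS = "3-flash"


def resolve_model(name: Optional[str] = None) -> str:
    """Turn a CLI alias into the model ID sent to the Gemini API."""
    if not name:
        return MODEL_ALIASES[DEFAULT_ALIAS]
    if name in MODEL_ALIASES:
        return MODEL_ALIASES[name]
    # Assumption: full model IDs are passed through unchanged.
    if name.startswith("gemini-"):
        return name
    choices = ", ".join(MODEL_ALIASES)
    raise ValueError(f"unknown model alias {name!r}; expected one of: {choices}")
```

`resolve_model("2.5-pro")` returns `"gemini-2.5-pro"`; calling it with no argument falls back to `"gemini-3-flash-preview"`.
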
## Notes
- Files are uploaded to Google's Gemini API for processing and deleted after use. Do not use with confidential or sensitive files.
- The script enforces a file extension whitelist (PDF/video/audio only), blocks known sensitive paths, and rejects symlinks.
- All files go through the File Upload API (upload -> generate -> cleanup), using the same flow regardless of file size.
- For files on a remote node (e.g. a Mac), first copy them to the VM where the script runs, using Tailscale or `scp`.
- The script auto-detects the MIME type from the file extension.
- The script calls the API directly through the Python SDK, so there are no sandbox restrictions and no CLI overhead.
- Requires the `GEMINI_API_KEY` environment variable or other authentication configured for `google-genai`.
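
The safety checks and MIME detection in these notes can be sketched as a single validation step. This is an illustration under stated assumptions, not the script's implementation: the MIME table uses common registered types (the script's own mapping is not shown), `BLOCKED_DIRS` is a hypothetical stand-in for its undocumented list of sensitive paths, and `validate_and_detect_mime` is an illustrative name. `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path

# Assumed MIME types for the whitelisted extensions; the real script's
# table may use different but equivalent values.
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".mp4": "video/mp4",
    ".webm": "video/webm",
    ".mov": "video/quicktime",
    ".avi": "video/x-msvideo",
    ".mkv": "video/x-matroska",
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".m4a": "audio/mp4",
    ".ogg": "audio/ogg",
}

# Hypothetical blocklist; the script's actual sensitive paths are not documented.
BLOCKED_DIRS = [Path("~/.ssh"), Path("~/.aws"), Path("~/.gnupg"), Path("/etc")]


def validate_and_detect_mime(path: str) -> str:
    """Return the MIME type for an allowed file, or raise ValueError."""
    p = Path(path).expanduser()
    # Reject the file itself if it is a symlink (resolve() below still
    # follows symlinked parent directories before the blocklist check).
    if p.is_symlink():
        raise ValueError(f"symlinks are not allowed: {p}")
    # Extension whitelist doubles as MIME detection.
    mime = MIME_TYPES.get(p.suffix.lower())
    if mime is None:
        raise ValueError(f"unsupported file type: {p.suffix or '(none)'}")
    resolved = p.resolve()
    for blocked in BLOCKED_DIRS:
        if resolved.is_relative_to(blocked.expanduser().resolve()):
            raise ValueError(f"refusing to read sensitive path: {resolved}")
    if not resolved.is_file():
        raise ValueError(f"not a regular file: {resolved}")
    return mime
```

Checking the extension before touching the filesystem keeps rejections cheap; the symlink test runs first so a link named `*.pdf` cannot slip through the whitelist.
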

This skill helps you read, summarize, transcribe, and analyze local PDF, video, and audio files using the Gemini API through the `google-genai` Python SDK. It supports common formats (PDF; mp4/webm/mov/avi/mkv video; mp3/wav/m4a/ogg audio) and is aimed at quick, practical extraction and analysis, turning non-text media into concise, actionable text.

The tool uploads a local file through the Gemini File Upload API, sends a generation request with your prompt to the model selected by the alias, returns the generated text, and then deletes the temporary upload. It auto-detects the MIME type from the file extension and lets you choose a faster model or one suited to deeper analysis. Typical tasks include summarization, transcription, table extraction, and listing key points.
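
The upload, generate, and cleanup cycle just described can be sketched with the `google-genai` SDK as follows. This is an illustrative sketch, not the script's code: `read_with_gemini` is a hypothetical name, the client is passed in so the function can be exercised without network access, and the polling loop follows the SDK's usual pattern of waiting while an uploaded file's `state` is `PROCESSING` (mainly relevant for video).

```python
import time
from typing import Any


def _state_name(f: Any) -> str:
    # File.state is a FileState enum; fall back to str() if it is missing.
    return getattr(f.state, "name", str(f.state))


def read_with_gemini(client: Any, path: str, prompt: str, model: str,
                     mime_type: str, poll_seconds: float = 2.0,
                     timeout: float = 600.0) -> str:
    """Upload a local file, ask Gemini about it, and always delete the upload.

    `client` is expected to behave like google.genai.Client().
    """
    # 1. Upload the file.
    uploaded = client.files.upload(file=path, config={"mime_type": mime_type})
    try:
        # 2. Wait while the service is still processing the file.
        deadline = time.monotonic() + timeout
        while _state_name(uploaded) == "PROCESSING":
            if time.monotonic() > deadline:
                raise TimeoutError(f"{uploaded.name} still processing after {timeout}s")
            time.sleep(poll_seconds)
            uploaded = client.files.get(name=uploaded.name)
        if _state_name(uploaded) == "FAILED":
            raise RuntimeError(f"Gemini could not process {path}")
        # 3. Generate: the uploaded file and the prompt go in one request.
        response = client.models.generate_content(model=model, contents=[uploaded, prompt])
        return response.text or ""
    finally:
        # 4. Cleanup: delete the upload even if an earlier step raised.
        client.files.delete(name=uploaded.name)
```

With a real client, import `genai` via `from google import genai`, create `client = genai.Client()` (which reads `GEMINI_API_KEY` from the environment), then call `read_with_gemini(client, "paper.pdf", "Summarize the key findings of this paper", model="gemini-2.5-flash", mime_type="application/pdf")`.
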

Which file formats are supported?

PDF, video (mp4, webm, mov, avi, mkv), and audio (mp3, wav, m4a, ogg). Images are out of scope for this skill; the main model's own vision capabilities handle them.

How do I change models?

Pass a model alias with `-m` (e.g. `-m 2.5-pro`). Each alias maps to a Gemini model, trading speed against depth of reasoning.

Do I need an API key?

Yes. Set `GEMINI_API_KEY` in your environment or configure authentication for the `google-genai` SDK.