This skill orchestrates multi-model research using a council pattern to synthesize insights from several frontier models for comprehensive analysis.
To add this skill to your agents, run `npx playbooks add skill krishagel/geoffrey --skill multi-model-research`.
---
name: multi-model-research
description: Orchestrate multiple frontier LLMs (Claude, GPT-5.1, Gemini 3.0 Pro, Perplexity Sonar, Grok 4.1) for comprehensive research using LLM Council pattern with peer review and synthesis
triggers:
- "deep dive"
- "research council"
- "multi-model research"
- "comprehensive research"
- "council research"
allowed-tools: Bash, Read, mcp__obsidian-vault__create_vault_file
version: 0.1.0
---
# Multi-Model Research Agent
Implements Karpathy's LLM Council pattern for superior research through parallel queries, peer review, and chairman synthesis.
## Architecture
**Geoffrey/Claude (Native Council Member):**
- Routes simple vs complex queries
- Calls external API orchestrator (`research.py`)
- Provides my own research response
- Conducts peer review phase
- Requests GPT-5.1 synthesis (chairman)
- Saves final report to Obsidian
**Python External API Orchestrator:**
- Fetches responses from GPT-5.1, Gemini 3.0 Pro, Perplexity Sonar, Grok 4.1
- Returns JSON with all external responses
- Does no orchestration or synthesis itself; I handle the orchestration and request the final synthesis
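The parallel fan-out described above can be sketched as follows. This is a minimal illustration, not the actual `research.py`: `fetch_stub` and `gather_responses` are hypothetical names, the stub stands in for real HTTP calls (which the dependency list suggests are made with httpx), and the JSON shape is an assumption.

```python
import asyncio
import json


async def fetch_stub(model: str, query: str) -> dict:
    """Hypothetical stand-in for a real provider call (no network involved)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"model": model, "content": f"[{model}] answer to: {query}"}


async def gather_responses(query: str, models: list[str]) -> dict:
    """Query all models concurrently; keep successes and failures apart."""
    results = await asyncio.gather(
        *(fetch_stub(m, query) for m in models),
        return_exceptions=True,  # one failing model should not sink the council
    )
    responses, errors = {}, {}
    for model, result in zip(models, results):
        if isinstance(result, Exception):
            errors[model] = str(result)
        else:
            responses[model] = result["content"]
    # Assumed output shape; the real script's JSON schema may differ.
    return {"query": query, "responses": responses, "errors": errors}


if __name__ == "__main__":
    out = asyncio.run(
        gather_responses("What is RAG?", ["gpt", "gemini", "perplexity", "grok"])
    )
    print(json.dumps(out, indent=2))
```

Collecting errors per model, rather than raising on the first failure, lets the council proceed with whichever members answered.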
## When to Use This Skill
Use multi-model research when:
- **Complex analysis needed** - Multiple perspectives valuable
- **Factual verification critical** - Cross-model validation
- **Comprehensive coverage required** - No single model sufficient
- **Current information essential** - Perplexity provides web grounding
- **Contested topics** - Benefit from diverse model perspectives
## Simple vs Council Mode
**Simple Mode** (Perplexity only):
- Factual lookups
- Current events
- Quick research with citations
- Completes in <15 seconds
**Council Mode** (Full council):
- Comparative analysis
- Deep research
- Multiple perspectives needed
- Strategic questions
- Completes in <90 seconds
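As a rough illustration of the routing decision, the sketch below sends a query to Council mode when it contains comparative or strategic cue words and to Simple mode otherwise. The cue list and the `choose_mode` function are assumptions made for this example; the skill's real routing rules live in `config.yaml` and in my own judgment.

```python
import re

# Hypothetical cue words that suggest comparative or strategic questions.
COUNCIL_CUES = re.compile(
    r"\b(compare|versus|vs|trade-?offs?|strateg(?:y|ies)|pros and cons)\b"
)


def choose_mode(query: str) -> str:
    """Return "council" for comparative/strategic queries, else "simple"."""
    return "council" if COUNCIL_CUES.search(query.lower()) else "simple"


if __name__ == "__main__":
    for q in (
        "What is RAG in AI?",
        "Compare serverless vs containers for production ML workloads",
    ):
        print(f"{choose_mode(q):8} {q}")
```

A keyword heuristic like this is only a first pass; ambiguous queries would still need a judgment call before committing to the slower, costlier Council mode.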
## Workflow
### Simple Query
```
User: "What are the latest developments in quantum computing?"
↓
I decide: Simple query (factual, current)
↓
I call: uv run scripts/research.py --query "..." --models perplexity
↓
I read: JSON response from Perplexity
↓
I format: Markdown report with citations
↓
I save: To Obsidian Geoffrey/Research folder
↓
I return: Summary to user with Obsidian link
```
### Council Query
```
User: "Compare the AI strategies of OpenAI, Anthropic, and Google"
↓
I decide: Council query (comparative, complex)
↓
I call: uv run scripts/research.py --query "..." --models gpt,gemini,perplexity,grok
↓
I read: JSON with all external responses
↓
I provide: My own (Claude) research response
↓
I conduct: Peer review (each model ranks others)
↓
I request: GPT-5.1 chairman synthesis
↓
I format: Comprehensive markdown report
↓
I save: To Obsidian Geoffrey/Research folder
↓
I return: Summary with Obsidian link
```
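One way to combine the peer-review step ("each model ranks others") into a single ordering is to average each model's rank position across reviewers. The sketch below assumes each reviewer returns an ordered list, best first; `aggregate_rankings` is a hypothetical helper and mean rank is just one possible scoring rule, not necessarily the one this skill uses.

```python
from collections import defaultdict


def aggregate_rankings(rankings: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Average each model's rank across reviewers (lower is better).

    `rankings` maps a reviewer name to its ordered list of other models,
    best first. Self-votes are dropped before positions are counted.
    """
    positions = defaultdict(list)
    for reviewer, ordered in rankings.items():
        others = [m for m in ordered if m != reviewer]
        for position, model in enumerate(others, start=1):
            positions[model].append(position)
    averages = {m: sum(p) / len(p) for m, p in positions.items()}
    # Sort by mean rank, then by name so ties are deterministic.
    return sorted(averages.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    example = {
        "claude": ["gpt", "gemini"],
        "gpt": ["claude", "gemini"],
        "gemini": ["gpt", "claude"],
    }
    for model, mean_rank in aggregate_rankings(example):
        print(f"{model}: {mean_rank:.2f}")
```

The aggregated order can then be passed to the chairman alongside the raw responses, so the synthesis can weight the better-reviewed answers more heavily.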
## Output Format
All research reports saved to Obsidian include:
- **Executive Summary** (2-3 paragraphs)
- **Key Findings** (organized by theme, inline citations)
- **Confidence Assessment** (what's certain vs debated)
- **References Section** (all sources with URLs and dates)
Citations use numeric format: [1], [2], etc.
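The report layout above could be assembled along these lines. This is a sketch under assumptions: `render_report`, its parameters, and the reference dictionary keys (`title`, `url`, `date`) are hypothetical, and the exact heading levels in the saved reports may differ.

```python
def render_report(
    title: str,
    summary: str,
    findings: dict[str, str],
    confidence: str,
    references: list[dict[str, str]],
) -> str:
    """Build a markdown report with the four sections and numeric citations."""
    lines = [f"# {title}", "", "## Executive Summary", summary, "", "## Key Findings"]
    for theme, text in findings.items():
        lines.append(f"- **{theme}**: {text}")
    lines += ["", "## Confidence Assessment", confidence, "", "## References"]
    # Number references from 1 so they match inline [1], [2] citations.
    for number, ref in enumerate(references, start=1):
        lines.append(f"[{number}] {ref['title']}. {ref['url']} ({ref['date']})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        render_report(
            title="Example report",
            summary="Short summary citing a source [1].",
            findings={"Theme A": "A finding supported by [1]."},
            confidence="Well established: theme A. Debated: nothing in this example.",
            references=[
                {"title": "Example source", "url": "https://example.com", "date": "2025-01-01"}
            ],
        )
    )
```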
## Technical Details
**Python Script:**
```bash
cd skills/multi-model-research
uv run scripts/research.py --query "Your question" --models perplexity --output /tmp/responses.json
```
**Config:**
- `config.yaml` - Model settings, routing rules
- `prompts/system_prompts.yaml` - Per-model system prompts
- `prompts/peer_review.md` - Peer review template
- `prompts/chairman_synthesis.md` - GPT-5.1 synthesis template
**Dependencies:**
- httpx (async HTTP client)
- pyyaml (config parsing)
- python-dotenv (env vars)
- python-frontmatter (Obsidian frontmatter)
**API Keys Required:**
- OPENAI_API_KEY (GPT-5.1)
- GEMINI_API_KEY (Gemini 3.0 Pro)
- PERPLEXITY_API_KEY (Sonar Pro)
- XAI_API_KEY (Grok 4.1)
All keys are configured in the `~/.env` file.
## Examples
**Simple Research:**
```
User: "What is RAG in AI?"
I route to: Simple mode (Perplexity)
Output: Concise explanation with current examples and citations
Time: ~10 seconds
```
**Council Research:**
```
User: "Compare serverless vs containers for production ML workloads"
I route to: Council mode (all 4 external + me)
Process:
1. GPT-5.1: Provides comprehensive technical comparison
2. Gemini 3.0: Analyzes cost and performance trade-offs
3. Perplexity: Current industry trends and case studies
4. Grok 4.1: Developer sentiment from X/Twitter
5. Claude (me): Provides my own research response with nuanced analysis
6. Peer review: Each model ranks others
7. GPT-5.1 (chairman): Final synthesis
Output: Multi-perspective analysis with citations
Time: ~60 seconds
```
## Limitations
- **Cost**: Council mode uses 4-5 API calls per query
- **Latency**: Council mode takes 60-90 seconds
- **API Limits**: Rate limits may throttle parallel requests
- **Citation Quality**: Non-Perplexity models require URL extraction
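The URL extraction mentioned under Citation Quality could look roughly like this: pull URLs out of a model's free-text answer, drop duplicates, and number them in order of first appearance so they fit the numeric citation format. The regular expression and the `extract_citations` helper are illustrative assumptions and will not catch every URL shape.

```python
import re

# Simple URL matcher: stops at whitespace, quotes, angle brackets, ")" and "]".
URL_PATTERN = re.compile(r"""https?://[^\s<>"')\]]+""")


def extract_citations(text: str) -> list[tuple[int, str]]:
    """Return (number, url) pairs, numbered by first appearance, without duplicates."""
    numbered: dict[str, int] = {}
    for match in URL_PATTERN.finditer(text):
        url = match.group().rstrip(".,;:")  # trim trailing sentence punctuation
        numbered.setdefault(url, len(numbered) + 1)
    return [(number, url) for url, number in numbered.items()]


if __name__ == "__main__":
    sample = "See https://a.example/x. Also (https://b.example/y) and https://a.example/x again."
    for number, url in extract_citations(sample):
        print(f"[{number}] {url}")
```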
## Future Enhancements
- Streaming responses during deliberation
- Cost tracking and budget limits
- Query history and versioning
- Custom model weights based on topic
- Integration with Geoffrey's knowledge base
---
*This skill implements Karpathy's LLM Council pattern released November 22, 2025.*
This skill orchestrates multiple frontier LLMs (Claude, GPT-5.1, Gemini 3.0 Pro, Perplexity Sonar, Grok 4.1) to deliver rigorous, multi-perspective research using an LLM Council pattern. It runs parallel inquiries, performs model peer review, and requests a dedicated chairman synthesis to produce a single coherent report saved to an Obsidian vault.
A Python orchestrator queries selected external models in parallel and returns structured JSON responses. Claude performs a native analysis, runs a peer-review phase where models evaluate each other, and asks GPT-5.1 to act as chairman for final synthesis. The final output is formatted into an executive summary, key findings, confidence assessment, and references, then saved to Obsidian.
## FAQ
**What is the difference between Simple and Council modes?**
Simple Mode queries Perplexity for fast, web-grounded lookups. Council Mode queries multiple models in parallel, conducts peer review, and requests a chairman synthesis for deeper comparative research.
**How long do reports take and what are the costs?**
Simple queries typically complete in under 15 seconds and are low cost. Council queries usually complete in 60–90 seconds and incur multiple API calls, so expect higher cost and potential rate-limit constraints.