
multi-model-research skill


This skill orchestrates multi-model research using a council pattern to synthesize insights from several frontier models for comprehensive analysis.

npx playbooks add skill krishagel/geoffrey --skill multi-model-research

Review the files below or copy the command above to add this skill to your agents.

Files (10)
SKILL.md
---
name: multi-model-research
description: Orchestrate multiple frontier LLMs (Claude, GPT-5.1, Gemini 3.0 Pro, Perplexity Sonar, Grok 4.1) for comprehensive research using LLM Council pattern with peer review and synthesis
triggers:
  - "deep dive"
  - "research council"
  - "multi-model research"
  - "comprehensive research"
  - "council research"
allowed-tools: Bash, Read, mcp__obsidian-vault__create_vault_file
version: 0.1.0
---

# Multi-Model Research Agent

Implements Karpathy's LLM Council pattern for superior research through parallel queries, peer review, and chairman synthesis.

## Architecture

**Geoffrey/Claude (Native Council Member):**
- Routes simple vs complex queries
- Calls external API orchestrator (`research.py`)
- Provides my own research response
- Conducts peer review phase
- Requests GPT-5.1 synthesis (chairman)
- Saves final report to Obsidian

**Python External API Orchestrator:**
- Fetches responses from GPT-5.1, Gemini 3.0 Pro, Perplexity Sonar, Grok 4.1
- Returns JSON with all external responses
- Geoffrey/Claude handles all remaining orchestration and synthesis
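The split above hinges on the JSON the orchestrator returns. The exact schema of `research.py`'s output is not shown here, so the shape below is a plausible sketch, not the script's real contract; field names like `mode` and `sources` are assumptions.

```python
import json

# Hypothetical shape of the orchestrator's JSON output; the real schema
# produced by scripts/research.py may differ.
sample_response = {
    "query": "Compare the AI strategies of OpenAI, Anthropic, and Google",
    "mode": "council",
    "responses": {
        "gpt": {"model": "gpt-5.1", "text": "...", "sources": []},
        "gemini": {"model": "gemini-3.0-pro", "text": "...", "sources": []},
        "perplexity": {"model": "sonar-pro", "text": "...", "sources": ["https://example.com"]},
        "grok": {"model": "grok-4.1", "text": "...", "sources": []},
    },
}

# Round-trip through JSON, as Claude would when reading the script's output.
parsed = json.loads(json.dumps(sample_response, indent=2))
print(sorted(parsed["responses"]))
```

Keeping one entry per external model keyed by provider name makes the peer-review and synthesis steps straightforward to drive from a single dict.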

## When to Use This Skill

Use multi-model research when:
- **Complex analysis needed** - Multiple perspectives valuable
- **Factual verification critical** - Cross-model validation
- **Comprehensive coverage required** - No single model sufficient
- **Current information essential** - Perplexity provides web grounding
- **Contested topics** - Benefit from diverse model perspectives

## Simple vs Council Mode

**Simple Mode** (Perplexity only):
- Factual lookups
- Current events
- Quick research with citations
- Completes in <15 seconds

**Council Mode** (Full council):
- Comparative analysis
- Deep research
- Multiple perspectives needed
- Strategic questions
- Completes in <90 seconds
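The actual routing rules live in `config.yaml`, which is not reproduced here. As a rough illustration of how simple-vs-council routing might look, this sketch keys off comparative and strategic keywords; the hint list is invented for the example.

```python
# Illustrative keyword hints; the skill's real routing rules are in config.yaml.
COUNCIL_HINTS = ("compare", " vs ", "versus", "strategy", "trade-off", "deep dive")

def route(query: str) -> str:
    """Return 'council' for comparative/strategic queries, else 'simple'."""
    q = query.lower()
    if any(hint in q for hint in COUNCIL_HINTS):
        return "council"
    return "simple"

print(route("What is RAG in AI?"))          # simple
print(route("Compare serverless vs containers"))  # council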

## Workflow

### Simple Query

```
User: "What are the latest developments in quantum computing?"
     ↓
I decide: Simple query (factual, current)
     ↓
I call: uv run scripts/research.py --query "..." --models perplexity
     ↓
I read: JSON response from Perplexity
     ↓
I format: Markdown report with citations
     ↓
I save: To Obsidian Geoffrey/Research folder
     ↓
I return: Summary to user with Obsidian link
```
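The final save step writes a markdown note into the vault's `Geoffrey/Research` folder. The skill uses `python-frontmatter` for this; the sketch below builds the note with plain strings instead so it runs dependency-free, and the frontmatter fields and filename scheme are illustrative assumptions.

```python
from datetime import date

def build_note(title: str, body: str, mode: str) -> tuple[str, str]:
    """Return a vault-relative path and a markdown note with YAML frontmatter.

    The Geoffrey/Research folder comes from the skill; the frontmatter
    fields and date-slug filename are invented for this example.
    """
    slug = "-".join(title.lower().split())
    path = f"Geoffrey/Research/{date.today().isoformat()}-{slug}.md"
    note = (
        "---\n"
        f"title: {title}\n"
        f"mode: {mode}\n"
        f"created: {date.today().isoformat()}\n"
        "---\n\n"
        f"# {title}\n\n{body}\n"
    )
    return path, note

path, note = build_note("Quantum Computing Update", "Key findings...", "simple")
print(path)
```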

### Council Query

```
User: "Compare the AI strategies of OpenAI, Anthropic, and Google"
     ↓
I decide: Council query (comparative, complex)
     ↓
I call: uv run scripts/research.py --query "..." --models gpt,gemini,perplexity,grok
     ↓
I read: JSON with all external responses
     ↓
I provide: My own (Claude) research response
     ↓
I conduct: Peer review (each model ranks others)
     ↓
I request: GPT-5.1 chairman synthesis
     ↓
I format: Comprehensive markdown report
     ↓
I save: To Obsidian Geoffrey/Research folder
     ↓
I return: Summary with Obsidian link
```
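In the peer-review step, each model ranks the others' responses before the chairman synthesizes. The actual prompt lives in `prompts/peer_review.md`; as one plausible way to combine the ballots, the sketch below averages each model's rank across reviewers (all ballot values are made up):

```python
from statistics import mean

# Hypothetical ballots: each reviewer ranks the responses, 1 = best.
ballots = {
    "claude":     {"gpt": 1, "gemini": 2, "perplexity": 3, "grok": 4},
    "gpt":        {"claude": 1, "perplexity": 2, "gemini": 3, "grok": 4},
    "gemini":     {"perplexity": 1, "gpt": 2, "claude": 3, "grok": 4},
    "perplexity": {"gpt": 1, "claude": 2, "gemini": 3, "grok": 4},
    "grok":       {"gpt": 1, "claude": 2, "perplexity": 3, "gemini": 4},
}

def average_ranks(ballots: dict) -> list[str]:
    """Order models best-first by their mean rank across all reviewers."""
    scores: dict[str, list[int]] = {}
    for ranking in ballots.values():
        for model, rank in ranking.items():
            scores.setdefault(model, []).append(rank)
    return sorted(scores, key=lambda m: mean(scores[m]))

print(average_ranks(ballots))
```

An ordering like this gives the chairman a compact signal about which responses the council itself found most credible.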

## Output Format

All research reports saved to Obsidian include:

- **Executive Summary** (2-3 paragraphs)
- **Key Findings** (organized by theme, inline citations)
- **Confidence Assessment** (what's certain vs debated)
- **References Section** (all sources with URLs and dates)

Citations use numeric format: [1], [2], etc.
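The numbered citation style maps naturally onto a generated References section. This helper is a sketch of that rendering; the `(url, accessed)` pair shape is an assumption, not the skill's actual data model.

```python
def render_references(sources: list[tuple[str, str]]) -> str:
    """Render a numbered References section matching the [1], [2] style.

    Each source is an illustrative (url, accessed-date) pair.
    """
    lines = ["## References", ""]
    for i, (url, accessed) in enumerate(sources, start=1):
        lines.append(f"[{i}] {url} (accessed {accessed})")
    return "\n".join(lines)

refs = render_references([
    ("https://example.com/quantum", "2025-11-22"),
    ("https://example.org/qubits", "2025-11-22"),
])
print(refs)
```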

## Technical Details

**Python Script:**
```bash
cd skills/multi-model-research
uv run scripts/research.py --query "Your question" --models perplexity --output /tmp/responses.json
```

**Config:**
- `config.yaml` - Model settings, routing rules
- `prompts/system_prompts.yaml` - Per-model system prompts
- `prompts/peer_review.md` - Peer review template
- `prompts/chairman_synthesis.md` - GPT-5.1 synthesis template

**Dependencies:**
- httpx (async HTTP client)
- pyyaml (config parsing)
- python-dotenv (env vars)
- python-frontmatter (Obsidian frontmatter)

**API Keys Required:**
- OPENAI_API_KEY (GPT-5.1)
- GEMINI_API_KEY (Gemini 3.0 Pro)
- PERPLEXITY_API_KEY (Sonar Pro)
- XAI_API_KEY (Grok 4.1)

All keys configured in `~/.env` file.
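The orchestrator fans out to all providers concurrently via httpx. To keep this sketch runnable without network access or the httpx dependency, the HTTP call is replaced by a stub coroutine; only the fan-out pattern with `asyncio.gather` reflects what `research.py` is described as doing.

```python
import asyncio

async def fetch_stub(provider: str, query: str) -> dict:
    """Stand-in for an httpx.AsyncClient request to one provider's API."""
    await asyncio.sleep(0)  # the real code awaits the HTTP round trip here
    return {"provider": provider, "text": f"answer to: {query}"}

async def gather_responses(query: str, providers: list[str]) -> dict:
    # Fan out to every provider at once rather than querying sequentially.
    results = await asyncio.gather(*(fetch_stub(p, query) for p in providers))
    return {r["provider"]: r for r in results}

responses = asyncio.run(
    gather_responses("What is RAG?", ["gpt", "gemini", "perplexity", "grok"])
)
print(sorted(responses))
```

Parallel fan-out is what keeps council mode near the slowest single provider's latency instead of the sum of all four.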

## Examples

**Simple Research:**
```
User: "What is RAG in AI?"

I route to: Simple mode (Perplexity)
Output: Concise explanation with current examples and citations
Time: ~10 seconds
```

**Council Research:**
```
User: "Compare serverless vs containers for production ML workloads"

I route to: Council mode (all 4 external + me)
Process:
  1. GPT-5.1: Provides comprehensive technical comparison
  2. Gemini 3.0: Analyzes cost and performance trade-offs
  3. Perplexity: Current industry trends and case studies
  4. Grok 4.1: Developer sentiment from X/Twitter
  5. Claude (me): Synthesize with nuanced analysis
  6. Peer review: Each model ranks others
  7. GPT-5.1 (chairman): Final synthesis

Output: Multi-perspective analysis with citations
Time: ~60 seconds
```

## Limitations

- **Cost**: Council mode uses 4-5 API calls per query
- **Latency**: Council mode takes 60-90 seconds
- **API Limits**: Rate limits may throttle parallel requests
- **Citation Quality**: Non-Perplexity models require URL extraction
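The URL-extraction limitation above can be sketched with a regex pass over a model's free-text response. This heuristic is illustrative only: a pattern this simple will miss or over-match some edge cases (trailing punctuation, markdown link syntax).

```python
import re

# Rough URL matcher; stops at whitespace and common closing delimiters.
URL_RE = re.compile(r"https?://[^\s)\]>]+")

def extract_urls(text: str) -> list[str]:
    """Pull bare URLs out of a response that lacks structured citations."""
    return URL_RE.findall(text)

sample = "See https://arxiv.org/abs/2401.00001 and (https://example.com/post)."
print(extract_urls(sample))
```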

## Future Enhancements

- Streaming responses during deliberation
- Cost tracking and budget limits
- Query history and versioning
- Custom model weights based on topic
- Integration with Geoffrey's knowledge base

---

*This skill implements Karpathy's LLM Council pattern released November 22, 2025.*

Overview

This skill orchestrates multiple frontier LLMs (Claude, GPT-5.1, Gemini 3.0 Pro, Perplexity Sonar, Grok 4.1) to deliver rigorous, multi-perspective research using an LLM Council pattern. It runs parallel inquiries, performs model peer review, and requests a dedicated chairman synthesis to produce a single coherent report saved to an Obsidian vault.

How this skill works

A Python orchestrator queries selected external models in parallel and returns structured JSON responses. Claude performs a native analysis, runs a peer-review phase where models evaluate each other, and asks GPT-5.1 to act as chairman for final synthesis. The final output is formatted into an executive summary, key findings, confidence assessment, and references, then saved to Obsidian.

When to use it

  • Deep comparative analysis across models
  • High-stakes or contested topics requiring cross-checks
  • Research that needs current web-grounded facts
  • Comprehensive reports with citations and confidence levels
  • Situations where diverse model perspectives reduce bias

Best practices

  • Start with a clear query and indicate desired depth (simple vs council).
  • Use Simple Mode (Perplexity only) for quick factual lookups and current events.
  • Use Council Mode for strategic, technical, or multi-angle research despite higher cost and latency.
  • Limit council runs to questions that benefit from multiple perspectives to control expense.
  • Review the confidence assessment and source URLs before acting on contested conclusions.

Example use cases

  • Compare corporate AI strategies, synthesizing technical, business, and public sentiment perspectives.
  • Produce a literature-style report on emerging technology trends with web-grounded citations.
  • Validate contested claims by cross-checking model outputs and highlighting disagreements.
  • Generate a strategic briefing for executives with an executive summary, key findings, and confidence levels.
  • Create a developer-facing comparison of architectures (e.g., serverless vs containers) with pros/cons and references.

FAQ

What is the difference between Simple and Council modes?

Simple Mode queries Perplexity for fast, web-grounded lookups. Council Mode queries multiple models in parallel, conducts peer review, and requests a chairman synthesis for deeper comparative research.

How long do reports take and what are the costs?

Simple queries typically complete in under 15 seconds and are low cost. Council queries usually complete in 60–90 seconds and incur multiple API calls, so expect higher cost and potential rate-limit constraints.