This skill synthesizes multi-model outputs into a verified assessment by cross-checking claims against the codebase and resolving conflicts with evidence.
To add this skill to your agents: `npx playbooks add skill petekp/claude-code-setup --skill multi-model-meta-analysis`
---
name: multi-model-meta-analysis
description: |
  Synthesize outputs from multiple AI models into a comprehensive, verified assessment. Use when: (1) User pastes feedback/analysis from multiple LLMs (Claude, GPT, Gemini, etc.) about code or a project, (2) User wants to consolidate model outputs into a single reliable document, (3) User needs conflicting model claims resolved against actual source code. This skill verifies model claims against the codebase, resolves contradictions with evidence, and produces a more reliable assessment than any single model.
---
# Multi-Model Synthesis
Combine outputs from multiple AI models into a verified, comprehensive assessment by cross-referencing claims against the actual codebase.
## Core Principle
Models hallucinate and contradict each other. The source code is the source of truth. Every significant claim must be verified before inclusion in the final assessment.
## Process
### 1. Extract Claims
Parse each model's output and extract discrete claims:
- Factual assertions about the code ("function X does Y", "there's no error handling in Z")
- Recommendations ("should add validation", "refactor this pattern")
- Identified issues ("bug in line N", "security vulnerability")
Tag each claim with its source model.
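For example, extracted claims might be recorded like this (the models, function names, and endpoints are illustrative, not from any real review):
```
[GPT]    FACT:  "parseConfig() silently swallows JSON parse errors"
[Claude] ISSUE: "No rate limiting on the /login endpoint"
[Gemini] REC:   "Add input validation to the signup form handler"
```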
### 2. Deduplicate
Group semantically equivalent claims:
- "Lacks input validation" = "No sanitization" = "User input not checked"
- "Should use async/await" = "Convert to promises" = "Make asynchronous"
Create canonical phrasing. Track which models mentioned each.
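A deduplicated group might look like this (phrasing is illustrative):
```
CANONICAL: "User input is not validated before use"
  - Claude: "Lacks input validation"
  - GPT:    "No sanitization of request parameters"
  - Gemini: "User input not checked"
MODELS: Claude, GPT, Gemini
```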
### 3. Verify Against Source
For each factual claim or identified issue:
```
CLAIM: "The auth middleware doesn't check token expiry"
VERIFY: Read the auth middleware file
FINDING: [Confirmed | Refuted | Partially true | Cannot verify]
EVIDENCE: [Quote relevant code or explain why claim is wrong]
```
Use Grep, Glob, and Read tools to locate and examine relevant code. Do not trust model claims without verification.
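A completed record for the claim above might read as follows (the file path and line number are illustrative):
```
CLAIM: "The auth middleware doesn't check token expiry"
VERIFY: Read middleware/auth.js
FINDING: Refuted
EVIDENCE: Line 34 calls jwt.verify(token, secret), which rejects expired tokens by default
```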
### 4. Resolve Conflicts
When models contradict each other:
1. Identify the specific disagreement
2. Examine the actual code
3. Determine which model (if any) is correct
4. Document the resolution with evidence
```
CONFLICT: Model A says "uses SHA-256", Model B says "uses MD5"
INVESTIGATION: Read crypto.js lines 45-60
RESOLUTION: Model B is correct - line 52 shows MD5 usage
EVIDENCE: `const hash = crypto.createHash('md5')`
```
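The same format covers the case where neither model matches the code (file names and values are illustrative):
```
CONFLICT: Model A says "passwords are bcrypt-hashed", Model B says "passwords are stored in plaintext"
INVESTIGATION: Read auth/hash.js lines 10-25
RESOLUTION: Neither is correct - argon2id is used
EVIDENCE: `const digest = await argon2.hash(password)`
```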
### 5. Synthesize Assessment
Produce a final document that:
- States verified facts (not model opinions)
- Cites evidence for significant claims
- Notes where verification wasn't possible
- Preserves valuable insights that don't require verification (e.g., design suggestions)
## Output Format
```markdown
# Synthesized Assessment: [Topic]
## Summary
[2-3 sentences describing the verified findings]
## Verified Findings
### Confirmed Issues
| Issue | Severity | Evidence | Models |
|-------|----------|----------|--------|
| [Issue] | High/Med/Low | [file:line or quote] | Claude, GPT |
### Refuted Claims
| Claim | Source | Reality |
|-------|--------|---------|
| [What model said] | GPT-4 | [What code actually shows] |
### Unverifiable Claims
| Claim | Source | Why Unverifiable |
|-------|--------|------------------|
| [Claim] | Claude | [Requires runtime testing / external system / etc.] |
## Consensus Recommendations
[Items where 2+ models agree AND verification supports the suggestion]
## Unique Insights Worth Considering
[Valuable suggestions from single models that weren't contradicted]
## Conflicts Resolved
| Topic | Model A | Model B | Verdict | Evidence |
|-------|---------|---------|---------|----------|
| [Topic] | [Position] | [Position] | [Which is correct] | [Code reference] |
## Action Items
### Critical (Verified, High Impact)
- [ ] [Item] — Evidence: [file:line]
### Important (Verified, Medium Impact)
- [ ] [Item] — Evidence: [file:line]
### Suggested (Unverified but Reasonable)
- [ ] [Item] — Source: [Models]
```
## Verification Guidelines
**Always verify:**
- Bug reports and security issues
- Claims about what code does or doesn't do
- Assertions about missing functionality
- Performance or complexity claims
**Trust but note source:**
- Style and readability suggestions
- Architectural recommendations
- Best practice suggestions
**Mark as unverifiable:**
- Runtime behavior claims (without tests)
- Performance benchmarks (without profiling)
- External API behavior
- User experience claims
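A quick triage of three hypothetical claims against these buckets:
```
"getUser() never handles a null response"  -> Always verify (read the file)
"Consider extracting the retry logic"      -> Trust but note source (e.g., GPT)
"This endpoint is slow under load"         -> Mark as unverifiable (needs profiling)
```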
## Anti-Patterns
- Blindly merging model outputs without checking code
- Treating model consensus as proof (all models can be wrong)
- Omitting refuted claims (document what was wrong - it's valuable)
- Skipping verification because claims "sound right"