
network-meta-analysis-appraisal skill

/network-meta-analysis-appraisal

This skill systematically appraises network meta-analysis papers using integrated checklists and automated extraction to deliver reproducible, high-quality appraisals.

npx playbooks add skill zpankz/mcp-skillset --skill network-meta-analysis-appraisal

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: network-meta-analysis-appraisal
description: Systematically appraise network meta-analysis papers using integrated 200-point checklist (PRISMA-NMA, NICE DSU TSD 7, ISPOR-AMCP-NPC, CINeMA) with triple-validation methodology, automated PDF extraction, semantic evidence matching, and concordance analysis. Use when evaluating NMA quality for peer review, guideline development, HTA, or reimbursement decisions.
---

# Network Meta-Analysis Comprehensive Appraisal

## Overview

This skill enables systematic, reproducible appraisal of network meta-analysis (NMA) papers through:

1. **Automated PDF intelligence** - Extract text, tables, and statistical content from NMA PDFs
2. **Semantic evidence matching** - Map 200+ checklist criteria to PDF content using semantic similarity scoring
3. **Triple-validation methodology** - Two independent concurrent appraisals + meta-review consensus
4. **Comprehensive frameworks** - PRISMA-NMA, NICE DSU TSD 7, ISPOR-AMCP-NPC, CINeMA integration
5. **Professional reports** - Generate markdown checklists and structured YAML outputs

The skill transforms a complex, time-intensive manual process (~6-8 hours) into a systematic, partially-automated workflow (~2-3 hours).

## When to Use This Skill

Apply this skill when:
- Conducting peer review for journal submissions containing NMA
- Evaluating evidence for clinical guideline development
- Assessing NMA for health technology assessment (HTA)
- Reviewing NMA for reimbursement/formulary decisions
- Training on systematic NMA critical appraisal methodology
- Comparing Bayesian vs Frequentist NMA approaches

## Workflow: PDF to Appraisal Report

Follow this sequential 5-step workflow for comprehensive appraisal:

### Step 1: Setup & Prerequisites

**Install Required Libraries:**
```bash
cd scripts/
pip install -r requirements.txt

# Download semantic model (first time only)
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2')"
```

**Verify Checklist Availability:**
Confirm all 8 checklist sections are in `references/checklist_sections/`:
- SECTION I - STUDY RELEVANCE and APPLICABILITY.md
- SECTION II - REPORTING TRANSPARENCY and COMPLETENESS - PRISMA-NMA.md
- SECTION III - METHODOLOGICAL RIGOR - NICE DSU TSD 7.md
- SECTION IV - CREDIBILITY ASSESSMENT - ISPOR-AMCP-NPC.md
- SECTION V - CERTAINTY OF EVIDENCE - CINeMA Framework.md
- SECTION VI - SYNTHESIS and OVERALL JUDGMENT.md
- SECTION VII - APPRAISER INFORMATION.md
- SECTION VIII - APPENDICES.md
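
A quick presence check might look like this (a minimal sketch using the filenames listed above):

```python
from pathlib import Path

# The 8 checklist sections expected in references/checklist_sections/
expected = [
    "SECTION I - STUDY RELEVANCE and APPLICABILITY.md",
    "SECTION II - REPORTING TRANSPARENCY and COMPLETENESS - PRISMA-NMA.md",
    "SECTION III - METHODOLOGICAL RIGOR - NICE DSU TSD 7.md",
    "SECTION IV - CREDIBILITY ASSESSMENT - ISPOR-AMCP-NPC.md",
    "SECTION V - CERTAINTY OF EVIDENCE - CINeMA Framework.md",
    "SECTION VI - SYNTHESIS and OVERALL JUDGMENT.md",
    "SECTION VII - APPRAISER INFORMATION.md",
    "SECTION VIII - APPENDICES.md",
]

missing = [f for f in expected if not (Path("references/checklist_sections") / f).exists()]
print("All checklist sections present" if not missing else f"Missing sections: {missing}")
```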

**Select Framework Scope:**
Choose based on appraisal purpose (see `references/frameworks_overview.md` for details):
- `comprehensive`: All 4 frameworks (~200 items, 4-6 hours)
- `reporting`: PRISMA-NMA only (~90 items, 2-3 hours)
- `methodology`: NICE + CINeMA (~30 items, 2-3 hours)
- `decision`: Relevance + ISPOR + CINeMA (~30 items, 2-3 hours)

### Step 2: Extract PDF Content

Run `pdf_intelligence.py` to extract structured content from the NMA paper:

```bash
python scripts/pdf_intelligence.py path/to/nma_paper.pdf --output pdf_extraction.json
```

**What This Does:**
- Extracts text with section detection (abstract, methods, results, discussion)
- Parses tables using multiple libraries (Camelot, pdfplumber)
- Extracts metadata (title, page count, etc.)
- Calculates extraction quality scores

**Outputs:**
- `pdf_extraction.json` - Structured PDF content for evidence matching

**Quality Check:**
- Verify `extraction_quality` scores ≥ 0.6 for text_coverage and sections_detected
- Low scores indicate poor PDF quality and may require manual supplementation
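
A minimal check of the extraction output might look like this (a sketch assuming `pdf_extraction.json` exposes an `extraction_quality` object with `text_coverage` and `sections_detected` scores, as described above):

```python
import json

# Load the Step 2 output and flag weak extraction before evidence matching.
with open("pdf_extraction.json") as f:
    extraction = json.load(f)

quality = extraction.get("extraction_quality", {})
for metric in ("text_coverage", "sections_detected"):
    score = quality.get(metric)
    status = "OK" if isinstance(score, (int, float)) and score >= 0.6 else "REVIEW MANUALLY"
    print(f"{metric}: {score} -> {status}")
```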

### Step 3: Match Evidence to Checklist Criteria

**Prepare Checklist Criteria JSON:**
Extract checklist items from the markdown sections into a machine-readable format (the parsing below is a sketch that assumes a simple table layout; adapt it to the actual section structure):

```python
import json
import re
from pathlib import Path

# Example: extract criteria from Section II.
# Assumes checklist items appear as markdown table rows whose first two columns
# are the item ID (e.g. "4.1") and the criterion text; adjust to the actual layout.
criteria = []
section_file = Path("references/checklist_sections/SECTION II - REPORTING TRANSPARENCY and COMPLETENESS - PRISMA-NMA.md")
for line in section_file.read_text(encoding="utf-8").splitlines():
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) >= 2 and re.fullmatch(r"\d+\.\d+", cells[0]):
        criteria.append({"id": cells[0], "text": cells[1]})

# Target format: [{"id": "4.1", "text": "Does the title identify the study as a systematic review and network meta-analysis?"}, ...]
Path("checklist_criteria.json").write_text(json.dumps(criteria, indent=2))
```

**Run Semantic Evidence Matching:**
```bash
python scripts/semantic_search.py pdf_extraction.json checklist_criteria.json --output evidence_matches.json
```

**What This Does:**
- Encodes each checklist criterion as a semantic vector
- Searches PDF sections for matching paragraphs
- Calculates similarity scores (0.0-1.0)
- Assigns confidence levels (high/moderate/low/unable)

**Outputs:**
- `evidence_matches.json` - Evidence mapped to each criterion with confidence scores
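
Before moving to Step 4, it can help to list criteria that came back with weak evidence (a sketch assuming `evidence_matches.json` is a JSON list of match objects with `id`, `similarity`, and `confidence` fields; adjust the keys to the actual output):

```python
import json

with open("evidence_matches.json") as f:
    matches = json.load(f)

# Criteria whose match fell below the manual-review threshold used in this skill (<0.45),
# or that came back with low/unable confidence.
weak = [m for m in matches
        if m.get("similarity", 0.0) < 0.45 or m.get("confidence") in ("low", "unable")]

print(f"{len(weak)} of {len(matches)} criteria need a manual evidence search:")
for m in weak:
    print(f"  item {m.get('id')}: confidence={m.get('confidence')}, similarity={m.get('similarity')}")
```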

### Step 4: Conduct Triple-Validation Appraisal

**Manual Appraisal with Evidence Support:**

For each checklist section:

1. Load evidence matches for that section's criteria
2. Review PDF content highlighted by semantic search
3. Apply triple-validation methodology (see `references/triple_validation_methodology.md`):

   **Appraiser #1 (Critical Reviewer)**:
   - Evidence threshold: 0.75 (high)
   - Stance: Skeptical, conservative
   - For each item: Assign rating (✓/⚠/✗/N/A) based on evidence quality

   **Appraiser #2 (Methodologist)**:
   - Evidence threshold: 0.70 (moderate)
   - Stance: Technical rigor emphasis
   - For each item: Assign rating independently

4. **Meta-Review Concordance Analysis:**
   - Compare ratings between appraisers
   - Calculate agreement levels (perfect/minor/major discordance)
   - Apply resolution strategy (evidence-weighted by default)
   - Flag major discordances for manual review
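
The classification step might be scripted along these lines (a sketch, not the reference implementation described in `references/triple_validation_methodology.md`; the adjacent-versus-opposite rule for minor versus major discordance is an assumption):

```python
# Ordered rating scale; N/A is handled separately. The adjacent-vs-opposite rule
# below is an assumption about what counts as minor vs major discordance.
SCALE = ["✗", "⚠", "✓"]

def concordance(rating_1: str, rating_2: str) -> str:
    """Classify agreement between Appraiser #1 and Appraiser #2 on one item."""
    if rating_1 == rating_2:
        return "perfect"
    if "N/A" in (rating_1, rating_2):
        return "major"  # only one appraiser judged the item applicable
    distance = abs(SCALE.index(rating_1) - SCALE.index(rating_2))
    return "minor" if distance == 1 else "major"

print(concordance("✓", "✓"))  # perfect
print(concordance("✓", "⚠"))  # minor
print(concordance("✓", "✗"))  # major -> flag for manual review
```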

**Structure Appraisal Results:**
```json
{
  "pdf_metadata": {...},
  "appraisal": {
    "sections": [
      {
        "id": "section_ii",
        "name": "REPORTING TRANSPARENCY & COMPLETENESS",
        "items": [
          {
            "id": "4.1",
            "criterion": "Title identification...",
            "rating": "✓",
            "confidence": "high",
            "evidence": "The title explicitly states...",
            "source": "methods section",
            "appraiser_1_rating": "✓",
            "appraiser_2_rating": "✓",
            "concordance": "perfect"
          },
          ...
        ]
      },
      ...
    ]
  }
}
```

Save as `appraisal_results.json`.

### Step 5: Generate Reports

**Create Markdown and YAML Reports:**
```bash
python scripts/report_generator.py appraisal_results.json --format both --output-dir ./reports
```

**Outputs:**
- `reports/nma_appraisal_report.md` - Human-readable checklist with ratings, evidence, concordance
- `reports/nma_appraisal_report.yaml` - Machine-readable structured data

**Report Contents:**
- Executive summary with overall quality ratings
- Detailed checklist tables (all 8 sections)
- Concordance analysis summary
- Recommendations for decision-makers and authors
- Evidence citations and confidence scores

**Quality Validation:**
- Review major discordance items flagged in concordance analysis
- Verify evidence confidence ≥ moderate for ≥50% of items
- Check overall agreement rate ≥ 65%
- Manually review any critical items with low confidence
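
These thresholds can be checked directly against `appraisal_results.json` (a sketch assuming the structure shown in Step 4; counting only perfect concordance as agreement is an assumption):

```python
import json

with open("appraisal_results.json") as f:
    results = json.load(f)

items = [item
         for section in results["appraisal"]["sections"]
         for item in section["items"]]

agreement = sum(i["concordance"] == "perfect" for i in items) / len(items)
adequate_evidence = sum(i["confidence"] in ("high", "moderate") for i in items) / len(items)

print(f"Agreement rate: {agreement:.0%} (target >= 65%)")
print(f"Items with >= moderate evidence confidence: {adequate_evidence:.0%} (target >= 50%)")
print("Major discordances:", [i["id"] for i in items if i["concordance"] == "major"])
```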

## Methodological Decision Points

### Bayesian vs Frequentist Detection

The skill automatically detects statistical approach by scanning for keywords:

**Bayesian Indicators**: MCMC, posterior, prior, credible interval, WinBUGS, JAGS, Stan, burn-in, convergence diagnostic
**Frequentist Indicators**: confidence interval, p-value, I², τ², netmeta, prediction interval
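
A keyword scan in that spirit might look like the following sketch (this is not the actual detection logic in `pdf_intelligence.py`; naive substring matching is used purely for illustration):

```python
BAYESIAN_TERMS = ["mcmc", "posterior", "prior", "credible interval", "winbugs",
                  "jags", "stan", "burn-in", "convergence diagnostic"]
FREQUENTIST_TERMS = ["confidence interval", "p-value", "i²", "τ²", "netmeta",
                     "prediction interval"]

def detect_approach(methods_text: str) -> str:
    """Return 'bayesian', 'frequentist', or 'unclear' from a methods-section string."""
    # Naive substring counting; a real implementation would use word boundaries.
    text = methods_text.lower()
    bayes_hits = sum(term in text for term in BAYESIAN_TERMS)
    freq_hits = sum(term in text for term in FREQUENTIST_TERMS)
    if bayes_hits > freq_hits:
        return "bayesian"
    if freq_hits > bayes_hits:
        return "frequentist"
    return "unclear"

print(detect_approach("Posterior distributions were estimated via MCMC in JAGS with a burn-in of 20,000."))
```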

Apply the appropriate checklist items based on the detected approach:
- Item 18.3 (Bayesian specifications) - only if Bayesian detected
- Items on heterogeneity metrics (I², τ²) - primarily Frequentist
- Convergence diagnostics - only Bayesian

### Handling Missing Evidence

When semantic search returns low confidence (<0.45):

1. Manually search PDF for the criterion
2. Check supplementary materials (if accessible)
3. If truly absent, rate as ⚠ or ✗ depending on item criticality
4. Document "No evidence found in main text" in the evidence field

### Resolution Strategy Selection

Choose concordance resolution strategy based on appraisal purpose:

- **Evidence-weighted** (default): Most objective, prefers stronger evidence
- **Conservative**: For high-stakes decisions (regulatory submissions)
- **Optimistic**: For formative assessments or educational purposes
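
Their intent could be sketched roughly as follows (an illustration only, not the documented algorithm; the rating ordering and confidence weighting are assumptions):

```python
RATING_ORDER = {"✗": 0, "⚠": 1, "✓": 2}       # worse -> better
CONFIDENCE_WEIGHT = {"low": 0, "moderate": 1, "high": 2}

def resolve(r1: str, r2: str, conf1: str, conf2: str, strategy: str = "evidence-weighted") -> str:
    """Pick a final rating when the two appraisers disagree."""
    if r1 == r2:
        return r1
    if strategy == "conservative":   # high-stakes decisions: keep the worse rating
        return min(r1, r2, key=RATING_ORDER.get)
    if strategy == "optimistic":     # formative/educational use: keep the better rating
        return max(r1, r2, key=RATING_ORDER.get)
    # evidence-weighted (default): prefer the rating backed by stronger evidence
    return r1 if CONFIDENCE_WEIGHT[conf1] >= CONFIDENCE_WEIGHT[conf2] else r2

print(resolve("✓", "⚠", "high", "moderate"))                  # evidence-weighted -> ✓
print(resolve("✓", "⚠", "high", "moderate", "conservative"))  # -> ⚠
```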

See `references/triple_validation_methodology.md` for detailed guidance.

## Resources

### scripts/

Production-ready Python scripts for automated tasks:

- **pdf_intelligence.py** - Multi-library PDF extraction (PyMuPDF, pdfplumber, Camelot)
- **semantic_search.py** - AI-powered evidence-to-criterion matching
- **report_generator.py** - Markdown + YAML report generation
- **requirements.txt** - Python dependencies

**Usage:** Scripts can be run standalone via CLI or orchestrated programmatically.
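
For example, the CLI steps above could be chained from a small driver script (a sketch that uses only the commands documented in this workflow; the input path is illustrative):

```python
import subprocess

pdf = "path/to/nma_paper.pdf"

# Step 2: extract structured content from the PDF
subprocess.run(["python", "scripts/pdf_intelligence.py", pdf,
                "--output", "pdf_extraction.json"], check=True)

# Step 3: match checklist criteria to the extracted evidence
subprocess.run(["python", "scripts/semantic_search.py", "pdf_extraction.json",
                "checklist_criteria.json", "--output", "evidence_matches.json"], check=True)

# Step 5: render reports once appraisal_results.json has been assembled (Step 4 is manual)
subprocess.run(["python", "scripts/report_generator.py", "appraisal_results.json",
                "--format", "both", "--output-dir", "./reports"], check=True)
```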

### references/

Comprehensive documentation for appraisal methodology:

- **checklist_sections/** - All 8 integrated checklist sections (PRISMA/NICE/ISPOR/CINeMA)
- **frameworks_overview.md** - Framework selection guide, rating scales, key references
- **triple_validation_methodology.md** - Appraiser roles, concordance analysis, resolution strategies

**Usage:** Load relevant references when conducting specific appraisal steps or interpreting results.

## Best Practices

1. **Always run pdf_intelligence.py first** - Extraction quality affects all downstream steps
2. **Review low-confidence matches manually** - Semantic search is not perfect
3. **Document resolution rationale** - For major discordances, explain meta-review decision
4. **Maintain appraiser independence** - Conduct Appraiser #1 and #2 evaluations without cross-reference
5. **Validate critical items** - Manually verify evidence for high-impact methodological criteria
6. **Use appropriate framework scope** - Comprehensive for peer review, targeted for specific assessments

## Limitations

- **PDF quality dependent**: Poor scans or complex layouts reduce extraction accuracy
- **Semantic matching not perfect**: May miss evidence phrased in unexpected ways
- **No external validation**: Cannot verify PROSPERO registration or check author COI databases
- **Language**: Optimized for English-language papers
- **Human oversight required**: Final appraisal should be reviewed by domain expert

Overview

This skill systematically appraises network meta-analysis (NMA) papers using an integrated 200-point checklist drawn from PRISMA-NMA, NICE DSU TSD 7, ISPOR-AMCP-NPC, and CINeMA. It combines automated PDF extraction, AI semantic evidence matching, and a triple-validation appraisal workflow to produce reproducible, structured reports for decision-making. The output includes human-readable markdown and machine-readable YAML for integration into HTA or guideline processes.

How this skill works

The skill extracts structured text, tables, and metadata from NMA PDFs and scores extraction quality. It encodes checklist criteria as semantic vectors and matches them to PDF content, producing evidence snippets with similarity and confidence scores. Two independent appraisers rate each item and a meta-reviewer conducts concordance analysis using an evidence-weighted resolution (or alternate strategies) to generate final ratings and recommendations.

When to use it

  • Peer review of manuscripts reporting network meta-analyses
  • Developing clinical practice guidelines that incorporate NMA evidence
  • Health technology assessment (HTA) or reimbursement decision appraisals
  • Formulary or coverage decision reviews requiring structured NMA quality assessment
  • Training or calibration exercises for systematic NMA critical appraisal

Best practices

  • Run automated PDF extraction first and confirm extraction_quality ≥ 0.6 for reliable downstream matching
  • Select an appropriate framework scope (comprehensive, reporting, methodology, decision) based on appraisal purpose
  • Manually review matches with low confidence and check supplementary materials when needed
  • Keep Appraiser #1 and Appraiser #2 independent before meta-review to avoid bias
  • Document rationale for resolution decisions, especially for major discordances and high-stakes items
  • Validate critical methodological items (e.g., Bayesian convergence, heterogeneity metrics) manually even when automated evidence is available

Example use cases

  • Rapidly appraise an NMA submitted to a journal and produce reviewer-ready checklist and commentary
  • Provide structured evidence of NMA quality for an HTA dossier or reimbursement submission
  • Support guideline panels by summarizing certainty of NMA evidence and methodological concerns
  • Compare Bayesian versus Frequentist NMA implementations across a set of studies
  • Train junior reviewers in checklist-driven appraisal using dual-appraiser exercises and concordance reports

FAQ

How accurate is the semantic evidence matching?

Semantic matching flags relevant passages with similarity and confidence scores; it speeds review but is not perfect. Low-confidence matches (<0.45) require manual search and verification.

Can the tool handle Bayesian and Frequentist NMAs?

Yes. The skill detects statistical approach via keywords and applies relevant checklist items (e.g., Bayesian convergence diagnostics or Frequentist heterogeneity metrics) accordingly.