getting-started skill

This skill guides you through systematic literature search and review, enabling targeted screening, data extraction, and citation traversal to support

npx playbooks add skill kthorn/research-superpower --skill getting-started

Review the files below or copy the command above to add this skill to your agents.

---
name: Getting Started with Research Superpowers
description: Introduction to literature search & review skills - systematic paper finding, screening, extraction, and citation traversal
when_to_use: At start of each Claude Code session. When user asks literature search questions. When searching scientific literature. When reviewing papers or citations.
version: 1.1.0
---

# Getting Started with Research Superpowers

Research Superpowers gives Claude Code systematic workflows for **literature searching and review**.

**Focus:** Finding, screening, and extracting data from published papers. NOT for analyzing experimental data or designing experiments.

## What You Can Do

Use these skills for **systematic literature reviews**:

- **Search literature** - PubMed and Semantic Scholar integration
- **Build screening rubrics** - Define and test relevance criteria collaboratively
- **Screen papers** - Two-stage screening (abstract → deep dive) with scoring
- **Extract data** - Find specific methods, results, measurements from papers
- **Traverse citations** - Smart backward/forward citation following
- **Large-scale screening** - Parallel subagent processing for 50+ papers
- **Track findings** - Organized research sessions with summaries, PDFs, and deduplication

## Available Skills

**Literature Search & Review Skills** (`skills/research/`)
- **answering-research-questions** - Main orchestration workflow (search → screen → extract → synthesize)
- **building-screening-rubrics** - Collaborative rubric design with test-driven refinement
- **searching-literature** - PubMed search with keyword optimization
- **evaluating-paper-relevance** - Two-stage screening (abstract → deep dive)
- **subagent-driven-review** - Parallel screening for large searches (50+ papers)
- **checking-chembl** - Check if medicinal chemistry papers have curated SAR data in ChEMBL
- **traversing-citations** - Semantic Scholar citation network traversal
- **finding-open-access-papers** - Unpaywall API to find free versions of paywalled papers
- **cleaning-up-research-sessions** - Safe cleanup of intermediate files after research complete

## Basic Workflow

When user asks a **literature search question**:

1. **Read answering-research-questions skill** - Main orchestration
2. **Announce**: "I'm using the Answering Research Questions skill"
3. **Parse query** - Extract keywords, data types, constraints
4. **Create research folder** - Propose name, initialize tracking
5. **Optional: Build rubric** - For large searches (50+ papers), use building-screening-rubrics skill
6. **Search → Screen → Extract → Traverse** - Follow the workflow
7. **Check in regularly** - Report progress every 10 papers; ask whether to continue every 50
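
The check-in cadence in the last step can be sketched as a small helper. This is only an illustration: the function name `checkpoint_action` and its return labels are hypothetical, and only the 10-paper and 50-paper intervals come from this skill.

```python
def checkpoint_action(papers_done: int, report_every: int = 10, ask_every: int = 50) -> str:
    """Return 'ask', 'report', or 'continue' for the number of papers processed so far.

    The labels are hypothetical; only the 10/50 intervals come from the skill text.
    """
    if papers_done > 0 and papers_done % ask_every == 0:
        return "ask"       # pause and ask the user whether to continue
    if papers_done > 0 and papers_done % report_every == 0:
        return "report"    # brief progress update, then keep going
    return "continue"
```

Checking the 50-paper boundary first means a count such as 100 produces a single "ask" instead of both a report and a question.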

## Research Session Folders

Each query creates a folder in `research-sessions/`:

```
research-sessions/YYYY-MM-DD-query-description/
├── SUMMARY.md              # Main findings
├── papers-reviewed.json    # Deduplication tracking (DOI → status)
├── papers/                 # Downloaded PDFs and supplementary data
└── citations/              # Citation graph tracking
```
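
A minimal sketch of creating this layout, assuming Python is available. The function name `init_session`, the slug rule, and the initial file contents are assumptions for illustration; only the folder and file names come from the tree above.

```python
import json
import re
from datetime import date
from pathlib import Path


def init_session(root: Path, description: str, today: date) -> Path:
    """Create a session folder matching the layout above and return its path."""
    # Hypothetical slug rule: lowercase, runs of non-alphanumerics become hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    session = root / "research-sessions" / f"{today.isoformat()}-{slug}"
    (session / "papers").mkdir(parents=True, exist_ok=True)
    (session / "citations").mkdir(exist_ok=True)

    summary = session / "SUMMARY.md"
    if not summary.exists():
        summary.write_text(f"# {description}\n", encoding="utf-8")

    tracker = session / "papers-reviewed.json"
    if not tracker.exists():
        # Empty DOI -> status map; entries are added as papers are reviewed.
        tracker.write_text(json.dumps({}) + "\n", encoding="utf-8")
    return session
```

Existing files are left untouched, so calling the helper again on a session in progress does not reset its tracking.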

## Core Principles

For **systematic literature review**:

- **Precision over breadth** - Find papers with specific data you need, not just topical matches
- **Test-driven screening** - Build and validate rubrics before bulk processing
- **Smart citation following** - Only traverse relevant citations to avoid exponential explosion
- **Deduplicate aggressively** - Track ALL reviewed papers by DOI (even non-relevant)
- **Cache abstracts** - Save for re-screening when rubrics change
- **Report progress** - Update user every 10 papers as work proceeds
- **Checkpoint frequently** - Ask to continue or stop every 50 papers
- **Reproducible** - Save rubrics, queries, and methodology with research sessions
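
The "smart citation following" principle can be illustrated with a breadth-first walk that expands only papers judged relevant. This is a sketch under stated assumptions, not the traversing-citations skill itself: `neighbors` and `is_relevant` are hypothetical callables standing in for a citation lookup and a screening decision, and `max_depth` is an illustrative limit.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def traverse_citations(
    seeds: Iterable[str],
    neighbors: Callable[[str], Iterable[str]],
    is_relevant: Callable[[str], bool],
    max_depth: int = 2,
) -> List[str]:
    """Breadth-first walk that only follows citations of relevant papers."""
    seen: Set[str] = set()
    relevant: List[str] = []
    queue = deque((doi, 0) for doi in seeds)
    while queue:
        doi, depth = queue.popleft()
        if doi in seen:
            continue
        seen.add(doi)  # tracked whether or not the paper turns out relevant
        if not is_relevant(doi):
            continue   # pruning here is what keeps the frontier from exploding
        relevant.append(doi)
        if depth < max_depth:
            queue.extend((n, depth + 1) for n in neighbors(doi))
    return relevant
```

Because irrelevant papers are never expanded, the number of lookups grows with the size of the relevant subgraph, not with the full citation network.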

## API Information

**PubMed E-utilities** (no key required):
- Search: `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi`
- Details: `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi`
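- Full records: `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi` (returns abstracts for `db=pubmed`; full text is available only through `db=pmc` for articles in PubMed Central)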

**Semantic Scholar** (free tier works, optional key for higher limits):
- Paper: `https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}`
- References: `https://api.semanticscholar.org/graph/v1/paper/{id}/references`
- Citations: `https://api.semanticscholar.org/graph/v1/paper/{id}/citations`
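
As a rough illustration of how these endpoints are addressed, the sketch below only builds request URLs and makes no network calls. The helper names are hypothetical. The query parameters (`db`, `term`, `retmax`, `retmode` for E-utilities; `fields` and `limit` for Semantic Scholar) are taken from the public API documentation, not from this skill, so check them against the current docs before relying on them.

```python
from urllib.parse import quote, urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
S2_GRAPH = "https://api.semanticscholar.org/graph/v1"


def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL that asks PubMed for matching IDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def s2_references_url(doi: str, fields: str = "title,externalIds", limit: int = 100) -> str:
    """Build a Semantic Scholar URL listing the papers that `doi` cites."""
    paper_id = quote(f"DOI:{doi}", safe=":/")  # keep the DOI's colon and slash readable
    query = urlencode({"fields": fields, "limit": limit})
    return f"{S2_GRAPH}/paper/{paper_id}/references?{query}"
```

Swapping `references` for `citations` in the second helper would address the forward-citation endpoint listed above.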

## Finding Skills

Use the find-skills script to search for relevant skills:

```bash
# From project directory
./scripts/find-skills              # List all skills
./scripts/find-skills literature   # Search for "literature"
./scripts/find-skills 'cite|ref'   # Regex search
```

## Remember

- **Always start** by reading the relevant research skill
- **Announce skill usage** when you begin
- **Track everything** in the research folder
- **Check in with user** regularly during long searches
- **Deduplicate** using papers-reviewed.json (DOI as key)
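
A minimal sketch of DOI-keyed deduplication against `papers-reviewed.json`, assuming the file holds a flat DOI → status map. The function names and the status strings used in the example are hypothetical, and the prefix-stripping rule is an assumption about how DOIs might arrive from different sources.

```python
import json
from pathlib import Path


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip common prefixes so one paper maps to one key."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def mark_reviewed(tracker: Path, doi: str, status: str) -> bool:
    """Record a paper's status; return False if the DOI was already tracked."""
    records = json.loads(tracker.read_text(encoding="utf-8")) if tracker.exists() else {}
    key = normalize_doi(doi)
    if key in records:
        return False  # already reviewed: skip reprocessing
    records[key] = status  # non-relevant papers are recorded too
    tracker.write_text(json.dumps(records, indent=2, sort_keys=True), encoding="utf-8")
    return True
```

For example, `mark_reviewed(path, "https://doi.org/10.1000/XYZ123", "not_relevant")` followed by `mark_reviewed(path, "doi:10.1000/xyz123", "relevant")` records the paper once and returns `False` on the second call.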

Overview

This skill introduces systematic literature search and review workflows for finding, screening, extracting, and following citations in published papers. It focuses on reproducible, test-driven methods to locate papers with the exact data you need rather than broad topical discovery. The skill coordinates searches, screening rubrics, data extraction, and organized research sessions.

How this skill works

The workflow orchestrates search → screen → extract → traverse: run targeted searches (PubMed, Semantic Scholar), build and validate screening rubrics, perform two-stage screening (abstract then full text), extract methods/results, and follow citations selectively. It manages research session folders with summaries, deduplication tracking by DOI, cached abstracts, and checkpointed progress updates during large reviews.

When to use it

  • Conducting a systematic literature review focused on extracting specific measurements, methods, or results
  • Running large-scale screening of 50+ papers using parallel subagents
  • Building and validating screening criteria before bulk screening
  • Tracing backward/forward citations to expand a focused evidence base
  • Finding open-access copies of paywalled papers for extraction and archiving

Best practices

  • Prioritize precision: craft queries to find papers likely to contain the exact data you need
  • Design and test screening rubrics on a small set before bulk processing
  • Deduplicate by DOI at the start and log all reviewed papers in papers-reviewed.json
  • Cache abstracts and saved metadata to allow re-screening when rubrics change
  • Checkpoint progress regularly (report every 10 papers, ask to continue every 50)

Example use cases

  • Extracting numeric outcome measures and methods from clinical trial reports for a meta-analysis
  • Screening thousands of abstracts with parallel subagents, then deep-diving on selected papers
  • Building a validated rubric to identify studies that report a specific assay or measurement
  • Following key citations from a seminal paper to build a focused citation network
  • Finding free PDFs of paywalled articles via Unpaywall before extracting supplementary data

FAQ

Which data sources are used?

Primary integrations are PubMed (E-utilities) and Semantic Scholar; Unpaywall is used to locate open-access PDFs.
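
As a hedged illustration of the Unpaywall lookup, the sketch below builds a request URL and reads a PDF link from a response record. The endpoint shape (`/v2/{doi}` with an `email` parameter) and the `best_oa_location` / `url_for_pdf` fields follow the public Unpaywall v2 API as I understand it; they are not specified by this skill, and the helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def unpaywall_url(doi: str, email: str) -> str:
    """Build an Unpaywall v2 lookup URL for one DOI (no request is made here)."""
    return f"https://api.unpaywall.org/v2/{quote(doi, safe='/')}?{urlencode({'email': email})}"


def best_pdf_url(record: dict) -> Optional[str]:
    """Return the PDF link from a response record, or None if no open copy is listed."""
    location = record.get("best_oa_location") or {}  # field may be missing or null
    return location.get("url_for_pdf")
```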

How are duplicates handled?

All papers are deduplicated by DOI and tracked in a papers-reviewed.json file to prevent reprocessing.

When should I build a screening rubric?

Create and test rubrics before bulk screening, especially for searches that will return 50+ papers.