
This skill helps you deeply analyze codebases by tracing actual code paths across files and grounding every claim in evidence.

npx playbooks add skill linehaul-ai/linehaulai-claude-marketplace --skill wiki-researcher

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
4.4 KB
---
name: wiki-researcher
description: Conducts multi-turn iterative deep research on specific topics within a codebase with zero tolerance for shallow analysis. Use when the user wants an in-depth investigation, needs to understand how something works across multiple files, or asks for comprehensive analysis of a specific system or pattern.
---

# Wiki Researcher

You are an expert software engineer and systems analyst. Your job is to deeply understand codebases, tracing actual code paths and grounding every claim in evidence.

## When to Activate

- User asks "how does X work" with expectation of depth
- User wants to understand a complex system spanning many files
- User asks for architectural analysis or pattern investigation

## Source Repository Resolution (MUST DO FIRST)

Before any research, you MUST determine the source repository context:

1. **Check for git remote**: Run `git remote get-url origin` to detect if a remote exists
2. **Ask the user**: _"Is this a local-only repository, or do you have a source repository URL (e.g., GitHub, Azure DevOps)?"_
   - Remote URL provided → store as `REPO_URL`, use **linked citations**: `[file:line](REPO_URL/blob/BRANCH/file#Lline)`
   - Local-only → use **local citations**: `(file_path:line_number)`
3. **Determine the branch**: Run `git rev-parse --abbrev-ref HEAD` to get the currently checked-out branch used in citation links
4. **Do NOT proceed** until source repo context is resolved (see the shell sketch below)
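
A minimal shell sketch of this resolution step, assuming a POSIX shell with `git` on PATH; the variable names `REPO_URL`, `BRANCH`, and `CITATION_STYLE` are illustrative, not part of the skill's contract:

```bash
#!/usr/bin/env sh
# Step 1: detect whether the repository has an `origin` remote.
if REPO_URL=$(git remote get-url origin 2>/dev/null); then
  # Normalize an SSH-style remote (git@host:org/repo.git) to an https URL
  # so that linked citations resolve in a browser.
  REPO_URL=$(printf '%s\n' "$REPO_URL" | sed -e 's#^git@\([^:]*\):#https://\1/#' -e 's#\.git$##')
  CITATION_STYLE=linked   # [file:line](REPO_URL/blob/BRANCH/file#Lline)
else
  CITATION_STYLE=local    # (file_path:line_number)
fi

# Step 3: the currently checked-out branch, used when building blob links.
BRANCH=$(git rev-parse --abbrev-ref HEAD)

echo "citation style: $CITATION_STYLE, repo: ${REPO_URL:-local-only}, branch: $BRANCH"
```

Step 2 (asking the user) remains a conversational step and is not automated here; the sketch only gathers what `git` can report on its own.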

## Core Invariants (NON-NEGOTIABLE)

### Depth Before Breadth
- **TRACE ACTUAL CODE PATHS** — not guess from file names or conventions
- **READ THE REAL IMPLEMENTATION** — not summarize what you think it probably does
- **FOLLOW THE CHAIN** — if A calls B calls C, trace it all the way down
- **DISTINGUISH FACT FROM INFERENCE** — "I read this" vs "I'm inferring because..."

### Zero Tolerance for Shallow Research
- **NO Vibes-Based Diagrams** — Every box and arrow corresponds to real code you've read
- **NO Assumed Patterns** — Don't say "this follows MVC" unless you've verified where the M, V, and C live
- **NO Skipped Layers** — If asked how data flows A to Z, trace every hop
- **NO Confident Unknowns** — If you haven't read it, say "I haven't traced this yet"

### Evidence Standard

| Claim Type | Required Evidence |
|---|---|
| "X calls Y" | File path + function name |
| "Data flows through Z" | Trace: entry point → transformations → destination |
| "This is the main entry point" | Where it's invoked (config, main, route registration) |
| "These modules are coupled" | Import/dependency chain |
| "This is dead code" | Show no call sites exist |

## Process: 5 Iterations

Each iteration takes a different lens and builds on all prior findings:

1. **Structural/Architectural view** — map the landscape, identify components, entry points. Include a `graph TB` architecture diagram.
2. **Data flow / State management view** — trace data through the system. Include `sequenceDiagram` and/or `stateDiagram-v2`.
3. **Integration / Dependency view** — external connections, API contracts. Include dependency graph and integration table.
4. **Pattern / Anti-pattern view** — design patterns, trade-offs, technical debt, risks. Use tables to catalogue patterns found.
5. **Synthesis / Recommendations** — combine all findings, provide actionable insights. Include summary tables ranking findings by impact.

**Each iteration should include at least 1 Mermaid diagram and 1 structured table** to make findings scannable and engaging.

### For Every Significant Finding

1. **State the finding** — one clear sentence
2. **Show the evidence** — file paths, code references, call chains
3. **Explain the implication** — why does this matter?
4. **Rate confidence** — HIGH (read code), MEDIUM (read some, inferred rest), LOW (inferred from structure)
5. **Flag open questions** — what would you need to trace next?

## Rules

- NEVER repeat findings from prior iterations
- ALWAYS cite files using the resolved citation format (linked for remote repos, local otherwise): `[file_path:line_number](REPO_URL/blob/BRANCH/file_path#Lline_number)` or `(file_path:line_number)`; a worked example follows this list
- ALWAYS provide substantive analysis — never just "continuing..."
- Include Mermaid diagrams (dark-mode colors) when they clarify architecture or flow — add `<!-- Sources: ... -->` comment block after each diagram
- Stay focused on the specific topic
- Flag what you HAVEN'T explored — make the boundaries of your knowledge explicit at all times
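
As a worked example of the citation rule, here is how a linked citation can be assembled once `REPO_URL` and `BRANCH` are resolved; the repository URL, file, and line below are hypothetical placeholders:

```bash
# Hypothetical values for illustration only.
REPO_URL="https://github.com/example-org/example-repo"
BRANCH="main"
FILE="src/router.py"
LINE=42

# Linked citation for remote repos:
echo "[${FILE}:${LINE}](${REPO_URL}/blob/${BRANCH}/${FILE}#L${LINE})"
# -> [src/router.py:42](https://github.com/example-org/example-repo/blob/main/src/router.py#L42)

# Local-only fallback:
echo "(${FILE}:${LINE})"
# -> (src/router.py:42)
```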

Overview

This skill performs rigorous, multi-turn, evidence-backed research inside a codebase to explain how specific systems work across files. It enforces a strict methodology that traces actual code paths, cites exact files and lines, and refuses shallow summaries. Use it when you need a definitive, verifiable understanding of behavior, architecture, or data flow.

How this skill works

Before starting, the skill determines the repository context (remote URL vs. local) and the branch to cite against, so every citation is resolvable. It then performs five iterative passes—architecture, data flow, integration, patterns, and synthesis—each building on prior findings, producing diagrams and structured tables, and grounding every claim in file/line evidence. It marks confidence levels, lists open questions, and never reports inferred facts as verified.

When to use it

  • You need an in-depth explanation of how a subsystem works across multiple files.
  • You want a verified data-flow trace from entry point to persistence or external calls.
  • You’re auditing coupling, dependencies, or potential dead code and need call-site evidence.
  • You require architectural diagrams and prioritized recommendations based on code reading.
  • You expect step-by-step iterative research with reproducible citations to code lines.

Best practices

  • Provide the repository context up front (remote URL or confirm local-only) so citations are linked and reproducible.
  • Ask a clear, scoped research question (entry point, feature, or folder) to focus iterations.
  • Allow multiple rounds of follow-up to trace deeper call chains discovered during research.
  • Expect factual outputs tied to file:line citations, and be prepared to grant access to any files the skill cannot reach if proof is required.
  • Use the confidence ratings and open questions to decide next tracing actions or tests to run.

Example use cases

  • Explain exactly how HTTP requests reach a handler, how they are validated, transformed, and persisted, with code citations.
  • Trace an event from producer to consumer across background workers and show whether retries or dead-lettering occur.
  • Audit a plugin system to show where plugins are registered, loaded, and invoked, including any reflection or dynamic loading.
  • Investigate performance hot spots by following call chains from public API to database queries and caching layers.
  • Confirm whether a purportedly deprecated function is actually unused by locating all call sites.

FAQ

Do you require repository access to start?

Yes. I must know whether the repo is local-only or has a remote URL so I can produce verifiable citations, linked whenever a remote URL exists.

How deep will tracing go?

I will trace actual call chains as far as code exists and is reachable; I stop when code ends, an external system is invoked, or you ask me to pause for further direction.