
This skill guides formal root cause analysis by documenting RCA trails, applying elimination techniques, and delivering prevention-focused reports.

npx playbooks add skill outfitter-dev/agents --skill find-root-causes


SKILL.md
---
name: find-root-causes
description: This skill should be used when diagnosing failures, investigating incidents, finding root causes, or when "root cause", "diagnosis", "investigate", or "--rca" are mentioned.
agent: debugger
context: fork
metadata:
  version: "2.0.0"
  related-skills:
    - debugging
    - codebase-recon
    - report-findings
---

# Root Cause Analysis

Delegated investigation: symptom → hypothesis → elimination → root cause → prevention.

## Steps

1. Load the `outfitter:debugging` skill for systematic investigation
2. Apply elimination techniques from this skill's references
3. Document investigation trail using RCA templates
4. Deliver root cause report with prevention recommendations

<when_to_use>

- Diagnosing system failures or unexpected behavior
- Investigating incidents or outages
- Finding the actual cause rather than surface symptoms
- Preventing recurrence through understanding
- Post-incident reviews requiring formal documentation

NOT for: known issues with documented fixes, simple configuration errors, or routine debugging (use the `debugging` skill directly)

</when_to_use>

<rca_focus>

This skill extends `debugging` with formal RCA practices:

| Aspect | Debugging | Root Cause Analysis |
|--------|-----------|---------------------|
| Scope | Fix the immediate issue | Understand why it happened |
| Output | Working code | RCA report + prevention |
| Documentation | Investigation notes | Formal templates |
| Goal | Resolution | Prevention of recurrence |

Use `debugging` for day-to-day bug fixes. Use `find-root-causes` for incidents requiring formal investigation and documentation.

</rca_focus>

<elimination_techniques>

Three core techniques for narrowing to root cause:

| Technique | When to Use | Method |
|-----------|-------------|--------|
| **Binary Search** | Large problem space, ordered changes | Bisect the change range |
| **Variable Isolation** | Multiple variables, need causation | Control all but one |
| **Process of Elimination** | Finite set of possible causes | Rule out systematically |

See [elimination-techniques.md](references/elimination-techniques.md) for detailed methods and examples.
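
The binary search row can be sketched in a few lines of Python. This is a minimal illustration, not taken from the skill's references: `first_bad`, `is_bad`, and the change identifiers are hypothetical names, and the sketch assumes the changes are ordered, the last change is known to be bad, and once a change is bad every later change is bad too.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(changes: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the earliest change for which is_bad() is true.

    Assumes: changes are ordered, the last change is bad, and badness
    is monotonic (once bad, every later change is bad too).
    """
    if not changes:
        raise ValueError("changes must not be empty")
    lo, hi = 0, len(changes) - 1  # invariant: first bad change is in changes[lo..hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(changes[mid]):
            hi = mid        # first bad change is at mid or earlier
        else:
            lo = mid + 1    # first bad change is after mid
    return changes[lo]


# Hypothetical change range: the failure first appears at "c6".
changes = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
culprit = first_bad(changes, lambda c: int(c[1:]) >= 6)
print(culprit)  # c6
```

Each check halves the remaining range, so eight changes need three checks instead of up to eight. For commit ranges, `git bisect` automates the same search.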

</elimination_techniques>

<documentation>

## Investigation Trail

Log every step for handoff and pattern recognition:

```
[TIME] STAGE: Action → Result
[10:15] DISCOVERY: Gathered error logs → Found NullPointerException
[10:22] HYPOTHESIS: User object not initialized
[10:28] TEST: Added null check logging → Confirmed user is null
```
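
If the trail is recorded programmatically, a small helper can keep every entry in the `[TIME] STAGE: Action → Result` shape shown above. This is only a sketch: `TrailEntry` and `render_trail` are hypothetical names, not part of the skill, and the result is treated as optional because a hypothesis entry has no result yet.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TrailEntry:
    time: str                     # "HH:MM", kept as text so entries are reproducible
    stage: str                    # e.g. DISCOVERY, HYPOTHESIS, TEST
    action: str
    result: Optional[str] = None  # omitted until the step has produced a result

    def render(self) -> str:
        line = f"[{self.time}] {self.stage.upper()}: {self.action}"
        if self.result is not None:
            line += f" → {self.result}"
        return line


def render_trail(entries: Iterable[TrailEntry]) -> str:
    """Join entries into one log, one line per investigation step."""
    return "\n".join(entry.render() for entry in entries)


trail = [
    TrailEntry("10:15", "discovery", "Gathered error logs", "Found NullPointerException"),
    TrailEntry("10:22", "hypothesis", "User object not initialized"),
]
print(render_trail(trail))
```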

## RCA Report Structure

1. **Summary** — one-sentence root cause
2. **Timeline** — events leading to incident
3. **Impact** — what was affected, duration
4. **Root Cause** — why it happened (not just what)
5. **Contributing Factors** — conditions that enabled it
6. **Prevention** — changes to prevent recurrence
7. **Detection** — how to catch it earlier next time

See [documentation-templates.md](references/documentation-templates.md) for full templates.
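
The seven sections can also be held in a small data structure so that an incomplete report is easy to spot before it is delivered. The sketch below is an assumption for illustration, not the format used in documentation-templates.md; `RcaReport` and `missing_sections` are hypothetical names.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class RcaReport:
    # Field order mirrors the report structure listed above.
    summary: str = ""
    timeline: List[str] = field(default_factory=list)
    impact: str = ""
    root_cause: str = ""
    contributing_factors: List[str] = field(default_factory=list)
    prevention: List[str] = field(default_factory=list)
    detection: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical draft: only the summary and root cause are filled in so far.
draft = RcaReport(
    summary="User object was not initialized before use",
    root_cause="Initialization was skipped on one code path",
)
print(draft.missing_sections())
```

A check like this only confirms that each section has content; whether the root cause explains why the incident happened still needs human review.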

</documentation>

<common_pitfalls>

| Trap | Counter |
|------|---------|
| "I already looked at that" | Re-examine with fresh evidence |
| "That can't be the issue" | Test anyway, let evidence decide |
| "We need to fix this quickly" | Methodical investigation usually reaches a fix sooner than guessing |
| Confirmation bias | Actively seek disconfirming evidence |
| Treating correlation as causation | Test the direct causal mechanism |

See [pitfalls.md](references/pitfalls.md) for detailed resistance patterns and recovery.

</common_pitfalls>

<rules>

ALWAYS:
- Load debugging skill for systematic investigation methodology
- Use elimination techniques to narrow root cause
- Document investigation trail as you go
- Produce formal RCA report for incidents
- Include prevention recommendations
- Identify contributing factors, not just root cause

NEVER:
- Skip formal documentation for incidents
- Stop at "what happened" without "why"
- Propose fixes without understanding root cause
- Omit prevention recommendations
- Blame individuals (focus on systems)

</rules>

<references>

- [elimination-techniques.md](references/elimination-techniques.md) — binary search, variable isolation, process of elimination
- [pitfalls.md](references/pitfalls.md) — cognitive biases and resistance patterns
- [documentation-templates.md](references/documentation-templates.md) — investigation logs and RCA reports

</references>

Overview

This skill helps teams conduct formal root cause analysis for incidents, outages, and complex failures. It extends everyday debugging with structured elimination techniques, a documented investigation trail, and a formal RCA report that includes prevention recommendations. Use it when you need to move from symptom handling to preventing recurrence.

How this skill works

The skill guides a delegated investigation flow: symptom → hypothesis → elimination → root cause → prevention. It recommends loading the systematic debugging methodology, applying three core elimination techniques (binary search, variable isolation, process of elimination), and logging each step. The output is an RCA report with a one-line summary, timeline, impact, root cause, contributing factors, prevention steps, and detection guidance.
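
As a rough illustration of the elimination step in this flow, process of elimination can be expressed as one evidence check per candidate cause. This is a sketch under stated assumptions: `eliminate` is a hypothetical helper, the candidate causes are invented for the example, and each check is assumed to return `True` only when the evidence rules that cause out.

```python
from typing import Callable, Dict, List, Tuple


def eliminate(
    candidates: Dict[str, Callable[[], bool]],
) -> Tuple[List[str], List[str]]:
    """Run one check per candidate cause.

    Each check returns True when evidence rules the cause OUT.
    Returns (remaining, ruled_out), both in the order given.
    """
    remaining: List[str] = []
    ruled_out: List[str] = []
    for cause, is_ruled_out in candidates.items():
        if is_ruled_out():
            ruled_out.append(cause)
        else:
            remaining.append(cause)
    return remaining, ruled_out


# Hypothetical candidates; the lambdas stand in for real evidence checks.
remaining, ruled_out = eliminate({
    "expired certificate": lambda: True,   # certificate verified as valid
    "full disk": lambda: True,             # disk usage verified as normal
    "bad deploy": lambda: False,           # not yet ruled out
})
print(remaining)   # ['bad deploy']
print(ruled_out)   # ['expired certificate', 'full disk']
```

A cause that survives elimination is a remaining suspect, not a confirmed root cause; it still needs a direct test of the causal mechanism.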

When to use it

  • Diagnosing system failures or unexpected, unexplained behavior
  • Investigating incidents or outages that require post-incident review
  • Distinguishing surface symptoms from the actual cause
  • Preparing formal documentation and prevention recommendations after an incident
  • Any situation where recurrence must be prevented and systemic factors identified

Best practices

  • Always pair this skill with a systematic debugging method at the start of investigation
  • Log an investigation trail with time-stamped stages (discovery, hypothesis, tests, results)
  • Apply elimination techniques: bisect changes, isolate variables, and rule out finite causes
  • Explicitly document contributing factors, not just the triggering defect
  • Deliver an RCA report that includes detection strategies and concrete prevention actions
  • Avoid assigning blame; focus on system and process improvements

Example use cases

  • Major outage where root cause and prevention are required for stakeholders
  • Intermittent failures with multiple plausible causes that need systematic narrowing
  • Post-incident reviews to produce a timeline, impact assessment, and prevention plan
  • Handing off an investigation to another team with a clear, time-stamped trail
  • Incident analyses where regulatory or organizational audits require formal RCA documentation

FAQ

When should I use regular debugging instead of this skill?

Use day-to-day debugging for simple fixes and known issues. Use this skill for incidents that need formal investigation, documentation, and prevention recommendations.

What elimination technique should I start with?

Choose based on problem shape: use binary search for ordered change ranges, variable isolation when many variables may interact, and process of elimination for a finite candidate set.
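
Variable isolation, for instance, can be sketched as follows: start from a known-good configuration and change one variable at a time to its failing value. The names `isolate_variables` and `fails`, and the configuration keys, are hypothetical, and the sketch assumes both configurations have the same keys and that the failure can be reproduced on demand.

```python
from typing import Callable, Dict, List, Mapping


def isolate_variables(
    baseline: Mapping[str, object],
    failing: Mapping[str, object],
    fails: Callable[[Dict[str, object]], bool],
) -> List[str]:
    """Return variables that reproduce the failure when changed alone.

    baseline is a known-good configuration, failing a known-bad one;
    both are assumed to share the same keys.
    """
    culprits: List[str] = []
    for name in baseline:
        if baseline[name] == failing[name]:
            continue                    # unchanged, cannot explain the failure
        trial = dict(baseline)          # control everything...
        trial[name] = failing[name]     # ...except this one variable
        if fails(trial):
            culprits.append(name)
    return culprits


# Hypothetical configurations; the lambda stands in for a real reproduction run.
good = {"runtime": "3.11", "cache": "on", "region": "us"}
bad = {"runtime": "3.12", "cache": "off", "region": "us"}
print(isolate_variables(good, bad, lambda cfg: cfg["cache"] == "off"))  # ['cache']
```

Changing one variable at a time does not reveal failures that need two changes together; an empty result suggests an interaction between variables and calls for testing combinations.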

What must the RCA report always include?

A one-sentence summary, a timeline, the impact, a clear explanation of why it happened, contributing factors, prevention actions, and detection methods.