find-bug-root-cause skill

/plugins/app-dev/skills/find-bug-root-cause

This skill helps identify the true root cause of failures across UI, service, and backend, not just superficial symptoms.

npx playbooks add skill willsigmon/sigstack --skill find-bug-root-cause

SKILL.md

---
name: find-bug-root-cause
description: Deep investigation to find actual root cause (not just symptoms)
model: sonnet
---

# Root Cause Analysis Protocol

When a feature is broken, don't just add logging - find and fix the ACTUAL problem.

## Methodology

1. **Understand the Symptom:**
   - What does the user see/experience?
   - What SHOULD happen?
   - What ACTUALLY happens?

2. **Trace the Data Flow:**
   - Follow code execution from UI → ViewModel → Service → Backend
   - Check each layer for failures
   - Step through the code mentally, as if pausing at a breakpoint in each layer

3. **Check Dependencies:**
   - Is the service initialized? (`DIContainer.shared.service`)
   - Are databases/indexes ready?
   - Are async operations completing?
   - Are errors being swallowed?

4. **Find the Bug:**
   - Don't stop at "add try/catch" or "add logging"
   - Find the EXACT line where logic fails
   - Understand WHY it fails

5. **Fix It Properly:**
   - Fix root cause, not symptoms
   - Add defensive checks only AFTER fixing core issue
   - Test the fix against the original failing scenario

6. **Verify:**
   - Does the fix address the root cause?
   - Are there edge cases?
   - Will this prevent recurrence?
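
Step 3 asks whether errors are being swallowed. The sketch below shows one way that can look in Swift and how propagating the error exposes the real failure. `SearchIndex`, `IndexError`, and the three functions are hypothetical names invented for illustration; they are not part of the original skill or of any real codebase.

```swift
// Hypothetical error and index types, for illustration only.
enum IndexError: Error {
    case notBuilt
}

struct SearchIndex {
    var isBuilt: Bool

    func lookup(_ query: String) throws -> [String] {
        guard isBuilt else { throw IndexError.notBuilt }
        return ["result for \(query)"]
    }
}

// Swallowing version: `try?` turns the thrown error into nil, and the
// `?? []` fallback makes "index not built" look the same as "no matches".
func searchSwallowing(_ query: String, in index: SearchIndex) -> [String] {
    return (try? index.lookup(query)) ?? []
}

// Propagating version: the failure reaches a place where it can be named.
func diagnose(_ query: String, in index: SearchIndex) -> String {
    do {
        let hits = try index.lookup(query)
        return "ok: \(hits.count) result(s)"
    } catch IndexError.notBuilt {
        return "root cause: index not built"
    } catch {
        return "unexpected: \(error)"
    }
}

let unbuilt = SearchIndex(isBuilt: false)
print(searchSwallowing("grace", in: unbuilt))  // empty array, cause hidden
print(diagnose("grace", in: unbuilt))          // names the actual failure
```

The swallowing version is the kind of code that produces a "returns no results" symptom with nothing in the logs, which is why it is worth checking for before assuming the data is missing.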

## Example: "Search returns no results"

❌ **Bad approach:**
```swift
// Just add logging
AppLog.info("Search started")
let results = await search(query)
AppLog.info("Search returned \(results.count) results")
```

✅ **Good approach:**
```swift
// Find WHY search returns nothing:
// 1. Is ScriptureDatabase initialized? → Check init
// 2. Are indexes built? → Check index creation
// 3. Is query being processed? → Check query transformation
// 4. FIX the actual issue (e.g., index not built on first launch)

// THEN add logging so future issues are easier to diagnose
```
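
As one possible illustration of item 4, the sketch below fixes an "index not built on first launch" bug at its source. The `ScriptureDatabase` class here is a hypothetical stand-in: the real type's API is not shown in this skill, so the initializer, `buildIndex()`, `isIndexBuilt`, and `search(_:)` are all assumptions.

```swift
// Hypothetical stand-in for the ScriptureDatabase named in the example;
// every member below is an assumption made for illustration.
final class ScriptureDatabase {
    private let verses: [String]
    private var index: [String: [Int]] = [:]  // word -> verse positions
    private(set) var isIndexBuilt = false

    init(verses: [String]) {
        self.verses = verses
    }

    // Builds an inverted index over lowercased, space-separated words.
    func buildIndex() {
        index.removeAll()
        for (position, verse) in verses.enumerated() {
            // Set avoids recording a verse twice when a word repeats in it.
            for word in Set(verse.lowercased().split(separator: " ")) {
                index[String(word), default: []].append(position)
            }
        }
        isIndexBuilt = true
    }

    // Root-cause fix: search guarantees the index exists instead of
    // silently querying an empty one on first launch.
    func search(_ query: String) -> [String] {
        if !isIndexBuilt { buildIndex() }
        let positions = index[query.lowercased()] ?? []
        return positions.map { verses[$0] }
    }
}

let db = ScriptureDatabase(verses: ["In the beginning", "The beginning of wisdom"])
let results = db.search("beginning")
print(results)
```

Building lazily inside `search(_:)` is only one option. Building the index during initialization, or making callers wait for an explicit readiness signal, would address the same root cause.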

**Return the root cause and the actual fix that solves it.**

Overview

This skill performs disciplined root-cause analysis for broken features, focusing on the actual fault rather than superficial symptoms. It guides investigators through reproducible steps to trace behavior, inspect dependencies, and deliver a corrective fix that prevents recurrence.

How this skill works

The skill walks you from symptom to root cause by clarifying expected vs actual behavior, tracing data and control flow across layers, and validating external dependencies. It prioritizes finding the exact failing line or condition, documenting why it fails, applying a targeted fix, and then verifying edge cases and resilience.
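
Tracing across layers can be sketched as a small harness that pushes one input through each boundary and reports the first layer that fails to produce output. The `Layer` type, `firstFailingLayer(for:through:)`, and the simulated fault in the service layer are hypothetical and exist only to illustrate the idea.

```swift
// Hypothetical model of one layer: it either passes a value on or fails (nil).
struct Layer {
    let name: String
    let handle: (String) -> String?
}

// Walks the layers in order and returns the name of the first one that
// produces no output, or nil if the input makes it all the way through.
func firstFailingLayer(for input: String, through layers: [Layer]) -> String? {
    var value = input
    for layer in layers {
        guard let output = layer.handle(value) else {
            return layer.name
        }
        value = output
    }
    return nil
}

let layers = [
    Layer(name: "UI", handle: { $0.isEmpty ? nil : $0 }),
    Layer(name: "ViewModel", handle: { $0.lowercased() }),
    Layer(name: "Service", handle: { _ in nil }),  // simulated fault
    Layer(name: "Backend", handle: { "rows for \($0)" }),
]

let culprit = firstFailingLayer(for: "Grace", through: layers)
print(culprit ?? "no failing layer")
```

In a real investigation the "layers" are actual calls, and the check at each boundary is whatever state you expect to see there. The point is to narrow the search to one boundary before reading code line by line.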

When to use it

- A feature fails intermittently despite added logging or retries
- Bug reports describe symptoms, but the root cause is unclear
- A quick patch was applied, but the bug recurs or the patch causes regressions
- A new deployment introduces unexpected behavior
- Tests pass, but users still observe incorrect results

Best practices

- Start by reproducing the symptom and writing a minimal failing test or scenario
- Trace execution across UI → ViewModel → Service → Backend; inspect state at each boundary
- Validate all dependencies (initialization, DB/index readiness, async completion)
- Avoid quick catch-all fixes; find and fix the exact failing condition first
- Add defensive checks and logging only after the core bug is corrected
- Verify fixes against edge cases and document the preventive measures
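
A minimal failing scenario can be very small. The sketch below assumes a hypothetical query-normalization bug (whitespace is not trimmed) and checks the buggy and fixed versions against the same scenarios. `Scenario`, `buggyNormalize`, `fixedNormalize`, and `failures(of:in:)` are illustrative names, not part of the skill.

```swift
import Foundation

// One input and the output we expect for it; no UI or database involved.
struct Scenario {
    let input: String
    let expected: String
}

// Hypothetical buggy transformation: lowercases but forgets to trim.
func buggyNormalize(_ raw: String) -> String {
    return raw.lowercased()
}

// Hypothetical fix: trim surrounding whitespace, then lowercase.
func fixedNormalize(_ raw: String) -> String {
    return raw.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
}

// Returns the inputs for which `normalize` does not produce the expected value.
func failures(of normalize: (String) -> String, in scenarios: [Scenario]) -> [String] {
    return scenarios.compactMap { (scenario: Scenario) -> String? in
        normalize(scenario.input) == scenario.expected ? nil : scenario.input
    }
}

let scenarios = [
    Scenario(input: "grace", expected: "grace"),
    Scenario(input: " Grace ", expected: "grace"),
]

print(failures(of: buggyNormalize, in: scenarios))  // reproduces the symptom
print(failures(of: fixedNormalize, in: scenarios))  // passes after the fix
```

Keeping the scenario list around after the fix gives you a regression check for the same condition.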

Example use cases

- Search returns zero results on first launch: check index build timing and fix initialization
- Intermittent auth failures: trace the token refresh flow and identify the race condition
- Background job not processing items: inspect queue initialization and worker startup order
- Configuration value missing in certain environments: trace DI container registration and load order
- Silent error swallowed in a promise chain: locate the unhandled rejection and repair error propagation
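
For the DI registration case, one way to expose a load-order problem is a container that records every lookup made before the matching registration. The `Container` class below is a hypothetical sketch, not the `DIContainer` referenced in the skill; `APIConfig` and the placeholder URL are likewise invented.

```swift
// Hypothetical minimal container that remembers lookups it could not satisfy.
final class Container {
    private var factories: [String: () -> Any] = [:]
    private(set) var missedLookups: [String] = []

    func register<T>(_ type: T.Type, factory: @escaping () -> T) {
        factories[String(describing: type)] = { factory() }
    }

    func resolve<T>(_ type: T.Type) -> T? {
        let key = String(describing: type)
        guard let factory = factories[key] else {
            // Evidence for the investigation: something asked too early.
            missedLookups.append(key)
            return nil
        }
        return factory() as? T
    }
}

struct APIConfig {
    let baseURL: String
}

let container = Container()

// Resolved before registration: this is the load-order bug.
let early = container.resolve(APIConfig.self)

container.register(APIConfig.self) { APIConfig(baseURL: "https://example.invalid") }
let late = container.resolve(APIConfig.self)

print(container.missedLookups)
```

The recorded misses point at the caller that ran too early, which is the thing to fix. Returning a default value from `resolve` would only hide the ordering problem.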

FAQ

Should I add logging first?

Log strategically to aid tracing, but don’t rely on logging alone; use it to confirm hypotheses while you follow the data flow to the failing logic.

When are defensive checks appropriate?

Add them after you fix the root cause to guard against regressions or external faults, not as a substitute for resolving the underlying bug.