find-bug-root-cause skill

/plugins/app-dev/skills/find-bug-root-cause

This skill helps identify the true root cause of failures across the UI, service, and backend layers instead of stopping at surface symptoms.

npx playbooks add skill willsigmon/sigstack --skill find-bug-root-cause

SKILL.md
---
name: find-bug-root-cause
description: Deep investigation to find actual root cause (not just symptoms)
model: sonnet
---

# Root Cause Analysis Protocol

When a feature is broken, don't just add logging - find and fix the ACTUAL problem.

## Methodology

1. **Understand the Symptom:**
   - What does the user see/experience?
   - What SHOULD happen?
   - What ACTUALLY happens?

2. **Trace the Data Flow:**
   - Follow code execution from UI → ViewModel → Service → Backend
   - Check each layer for failures
   - Step through the code mentally, as if pausing at a breakpoint on each line, and note the values at each step

3. **Check Dependencies:**
   - Is the service initialized? (`DIContainer.shared.service`)
   - Are databases/indexes ready?
   - Are async operations completing?
   - Are errors being swallowed?

4. **Find the Bug:**
   - Don't stop at "add try/catch" or "add logging"
   - Find the EXACT line where logic fails
   - Understand WHY it fails

5. **Fix It Properly:**
   - Fix root cause, not symptoms
   - Add defensive checks only AFTER fixing core issue
   - Test the fix logic

6. **Verify:**
   - Does the fix address the root cause?
   - Are there edge cases?
   - Will this prevent recurrence?
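
The "Are errors being swallowed?" check in step 3 is easiest to see side by side. The following is a minimal sketch, assuming a hypothetical `SearchService` and `SearchError` that are not part of any real project: `try?` combined with `?? []` turns a real failure into "no results", while a throwing call keeps the failure visible to the caller.

```swift
// Hypothetical error type for this sketch.
enum SearchError: Error, Equatable {
    case indexNotReady
}

// Hypothetical service whose index may not be built yet.
struct SearchService {
    var indexReady: Bool

    func search(_ query: String) throws -> [String] {
        guard indexReady else { throw SearchError.indexNotReady }
        return ["result for \(query)"]
    }
}

// Swallowing: `try?` converts the thrown error to nil and `?? []` converts
// nil to an empty array, so a failure looks exactly like an empty match.
func searchSwallowing(_ service: SearchService, _ query: String) -> [String] {
    return (try? service.search(query)) ?? []
}

// Propagating: the error reaches the caller and names the failing condition.
func searchPropagating(_ service: SearchService, _ query: String) throws -> [String] {
    return try service.search(query)
}

let notReady = SearchService(indexReady: false)

// The failure is hidden: this just looks like "no results".
print(searchSwallowing(notReady, "grace"))

// The failure is visible and points at the dependency that is not ready.
do {
    _ = try searchPropagating(notReady, "grace")
} catch {
    print("search failed: \(error)")
}
```

When a trace reaches a `try?`, an empty `catch`, or a default value like `?? []`, it is worth checking what error would have been reported there before assuming the data is simply empty.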

## Example: "Search returns no results"

❌ **Bad approach:**
```swift
// Just add logging
AppLog.info("Search started")
let results = await search(query)
AppLog.info("Search returned \(results.count) results")
```

✅ **Good approach:**
```swift
// Find WHY search returns nothing:
// 1. Is ScriptureDatabase initialized? → Check init
// 2. Are indexes built? → Check index creation
// 3. Is query being processed? → Check query transformation
// 4. FIX the actual issue (e.g., index not built on first launch)

// THEN add logging so a recurrence is easy to diagnose
```
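
To make the "index not built on first launch" case concrete, here is a minimal sketch of what such a fix could look like. The class below is a hypothetical stand-in that borrows the `ScriptureDatabase` name from the comments above; its storage, its `buildIndex()` method, and the assumption that the index was previously built only on some other code path are all invented for illustration.

```swift
// Hypothetical stand-in for the database in the example; not the project's real type.
final class ScriptureDatabase {
    private let verses: [String]
    private var index: [String: [Int]] = [:]   // lowercased word -> verse positions
    private(set) var isIndexBuilt = false

    init(verses: [String]) {
        self.verses = verses
    }

    // Builds the word index. Assumed root cause for this sketch: nothing
    // called this on first launch, so search read an empty index.
    func buildIndex() {
        index.removeAll()
        for (position, verse) in verses.enumerated() {
            let words = verse.lowercased().split(whereSeparator: { !$0.isLetter })
            // Use a Set so a verse is listed once per word, even if the word repeats.
            for word in Set(words.map { String($0) }) {
                index[word, default: []].append(position)
            }
        }
        isIndexBuilt = true
    }

    // Root-cause fix: search guarantees the index exists before reading it.
    func search(_ query: String) -> [String] {
        if !isIndexBuilt {
            buildIndex()
        }
        let positions = index[query.lowercased()] ?? []
        return positions.sorted().map { verses[$0] }
    }
}

// Simulated first launch: nothing has built the index before the first search.
let database = ScriptureDatabase(verses: [
    "In the beginning God created the heaven and the earth.",
    "And God said, Let there be light: and there was light."
])
print(database.search("light"))
```

The fix lives where the invariant is needed (search requires an index), rather than in a log statement around the call site.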

**Return the root cause and the actual fix that solves it.**

Overview

This skill performs disciplined root-cause analysis for broken features, focusing on the actual fault rather than superficial symptoms. It guides investigators through reproducible steps to trace behavior, inspect dependencies, and deliver a corrective fix that prevents recurrence.

How this skill works

The skill walks you from symptom to root cause by clarifying expected vs actual behavior, tracing data and control flow across layers, and validating external dependencies. It prioritizes finding the exact failing line or condition, documenting why it fails, applying a targeted fix, and then verifying edge cases and resilience.

When to use it

  • A feature intermittently fails despite added logging or retries
  • Bug reports describe symptoms but the root cause is unclear
  • Incidents where the bug recurs after a quick patch, or the patch causes regressions
  • New deployments that introduce unexpected behavior
  • When tests pass but users still observe incorrect results

Best practices

  • Start by reproducing the symptom and writing a minimal failing test or scenario
  • Trace execution across UI → ViewModel → Service → Backend; inspect state at each boundary
  • Validate all dependencies (initialization, DB/index readiness, async completion)
  • Avoid quick catch-all fixes; find and fix the exact failing condition first
  • Add defensive checks and logging only after the core bug is corrected
  • Verify fixes against edge cases and document the preventive measures
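
The first two practices above can be sketched together: a minimal failing scenario, plus a look at the value crossing each boundary. Everything in this sketch is hypothetical (the function names, the tiny data set, and the whitespace bug itself); it only illustrates the shape of the investigation.

```swift
import Foundation

// Hypothetical ViewModel step as it behaves today: lowercases but never trims.
func buggyQuery(from rawInput: String) -> String {
    return rawInput.lowercased()
}

// Candidate fix: normalize whitespace as well as case.
func fixedQuery(from rawInput: String) -> String {
    return rawInput.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
}

// Hypothetical Service step: exact match against a tiny in-memory data set.
func matches(for query: String) -> [String] {
    let knownTerms = ["grace", "light"]
    return knownTerms.filter { $0 == query }
}

// Minimal failing scenario: the user typed a term with stray spaces.
let rawInput = " Grace "

// Inspect the value at each boundary to see where it first goes wrong.
let sentToService = buggyQuery(from: rawInput)
print("ViewModel -> Service: [\(sentToService)]")
print("Service -> UI: \(matches(for: sentToService))")

// The same scenario doubles as the regression check for the fix.
let fixedResults = matches(for: fixedQuery(from: rawInput))
print("After fix: \(fixedResults)")
```

Printing the query in brackets makes the padding visible, which shows that the service behaves correctly and the fault sits one layer earlier.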

Example use cases

  • Search returns zero results on first launch — find index build timing and fix initialization
  • Intermittent auth failures — trace token refresh flow and identify race condition
  • Background job not processing items — inspect queue initialization and worker startup order
  • Configuration value missing in certain environments — trace DI container registration and load order
  • Silent error swallowed in promise chain — locate the unhandled rejection and repair error propagation
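
For the DI registration and load-order case, one way to make the failure point obvious is a container whose lookup throws and names the missing type instead of returning nil. This is a minimal sketch with a hypothetical `Container`, `ResolveError`, and `APIConfig`; real DI frameworks expose different APIs.

```swift
// Hypothetical error for this sketch: carries the name of the missing type.
enum ResolveError: Error, Equatable {
    case notRegistered(String)
}

// Hypothetical minimal container keyed by type name.
final class Container {
    private var factories: [String: () -> Any] = [:]

    func register<T>(_ type: T.Type, factory: @escaping () -> T) {
        let key = String(describing: type)
        factories[key] = { factory() }
    }

    // Throws instead of returning nil, so a load-order bug surfaces at the
    // exact call that ran before registration.
    func resolve<T>(_ type: T.Type) throws -> T {
        let key = String(describing: type)
        guard let factory = factories[key], let value = factory() as? T else {
            throw ResolveError.notRegistered(key)
        }
        return value
    }
}

// Hypothetical configuration type.
struct APIConfig {
    let baseURL: String
}

let container = Container()

// Resolving before registration reproduces the "missing value" symptom,
// but now with an error that identifies the unregistered type.
do {
    _ = try container.resolve(APIConfig.self)
} catch {
    print("resolve failed: \(error)")
}

// With registration moved ahead of first use, the same call succeeds.
container.register(APIConfig.self) { APIConfig(baseURL: "https://example.invalid") }
if let config = try? container.resolve(APIConfig.self) {
    print(config.baseURL)
}
```

The throwing lookup is only the diagnostic aid; the actual fix in a case like this is reordering startup so that registration runs before the first consumer.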

FAQ

Should I add logging first?

Log strategically to aid tracing, but don’t rely on logging alone; use it to confirm hypotheses while you follow the data flow to the failing logic.

When are defensive checks appropriate?

Add them after you fix the root cause to guard against regressions or external faults, not as a substitute for resolving the underlying bug.