systematic-debugging skill

/plugins/core/skills/systematic-debugging

This skill guides you through systematic debugging: identify the root cause before writing a fix, which reduces guesswork and avoids introducing new issues.

npx playbooks add skill technickai/ai-coding-config --skill systematic-debugging

SKILL.md

---
name: systematic-debugging
# prettier-ignore
description: "Use when debugging bugs, test failures, or unexpected behavior, or when you need to find the root cause before fixing"
version: 1.2.0
category: debugging
triggers:
  - "debug"
  - "investigate"
  - "root cause"
  - "why is this"
  - "not working"
  - "test failing"
  - "unexpected behavior"
  - "error"
---

<objective>
Find the root cause before writing fixes. Understanding why something breaks leads to correct fixes. Guessing wastes time and creates new problems.

Core principle: If you can't explain WHY it's broken, you're not ready to fix it. Every fix must address a specific, understood root cause.
</objective>

<when-to-use>
Use for any technical issue: test failures, build errors, bugs, unexpected behavior, performance problems. Especially valuable when previous attempts haven't worked or when tempted to try a "quick fix."
</when-to-use>

<start-with-evidence>
Read error messages completely. Stack traces, line numbers, and error codes contain valuable information. The error message often points directly to the problem.

Work to reproduce the issue reliably. If you can't trigger it consistently, gather more
data before proposing solutions. Document the exact steps that trigger the failure.

Check what changed recently. Review commits, new dependencies, configuration changes, and environmental differences. Many bugs trace back to a recent change.
</start-with-evidence>
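
When the list of recent changes is long, the search can be made mechanical. The sketch below shows the binary-search idea behind tools such as `git bisect`. It assumes an ordered history whose oldest entry is known good and whose newest entry is known bad. The names `first_bad_change` and `reproduces_bug` are hypothetical and exist only for illustration.

```python
from typing import Callable, Sequence


def first_bad_change(changes: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest change for which is_bad() is true.

    Assumes changes are ordered oldest to newest, the oldest is good,
    the newest is bad, and every change after the first bad one is also bad.
    """
    lo, hi = 0, len(changes) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(changes[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return changes[hi]


# Hypothetical history: the bug was introduced in "c5".
history = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
checked = []


def reproduces_bug(change: str) -> bool:
    checked.append(change)  # record which changes had to be tested
    return history.index(change) >= history.index("c5")


culprit = first_bad_change(history, reproduces_bug)
print(culprit, checked)
```

In practice `is_bad` would check out the change and run the documented reproduction steps, which is why a reliable reproduction has to come first.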

<trace-the-problem>
Follow the data flow backward from the error. Where does the bad value originate? Work through the call stack until you find the source. Understanding the complete path from source to symptom reveals the true problem.

When multiple components interact, add diagnostic output at each boundary to identify
which component fails. This narrows the investigation to the specific failing layer.
</trace-the-problem>
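
One possible way to add diagnostic output at each boundary is a small decorator that records what goes into and comes out of every stage. This is a minimal sketch, not a prescribed implementation: the pipeline (`parse`, `normalize`, `total`), the `traced` decorator, and the rule that negative numbers are invalid are all assumptions made for the example.

```python
import functools
from typing import Any, Callable

trace: list[tuple[str, Any, Any]] = []  # (stage name, input, output)


def traced(func: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Record the input and output of a single-argument pipeline stage."""

    @functools.wraps(func)
    def wrapper(value: Any) -> Any:
        result = func(value)
        trace.append((func.__name__, value, result))
        return result

    return wrapper


# Hypothetical three-stage pipeline; normalize() contains the bug.
@traced
def parse(raw: str) -> list[int]:
    return [int(part) for part in raw.split(",")]


@traced
def normalize(values: list[int]) -> list[int]:
    return [v - max(values) for v in values]  # bug: should subtract min(values)


@traced
def total(values: list[int]) -> int:
    return sum(values)


result = total(normalize(parse("3,5,10")))


def has_negative(output: Any) -> bool:
    """Validity check assumed for this example: no negative numbers."""
    items = output if isinstance(output, list) else [output]
    return any(item < 0 for item in items)


# Walk the trace in call order and report the first stage with invalid output.
first_failing_stage = next(name for name, _, output in trace if has_negative(output))
print(first_failing_stage)
```

The symptom shows up in `total`, but the trace points at `normalize` as the first boundary where the data goes wrong, which is where the investigation should continue.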

<compare-with-working-code>
Find similar code that works correctly. Compare the working and broken versions systematically. Every difference matters until proven otherwise.

When implementing a pattern, read reference implementations thoroughly. Understand their dependencies, settings, and environmental requirements.
</compare-with-working-code>
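
A systematic comparison can be as simple as listing every setting that differs between the working and broken cases. The sketch below assumes both sides can be expressed as flat dictionaries. `diff_settings` and the sample settings are hypothetical.

```python
from typing import Any

_MISSING = "<missing>"


def diff_settings(working: dict[str, Any], broken: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (working value, broken value)} for every key that differs."""
    differences = {}
    for key in sorted(working.keys() | broken.keys()):
        left = working.get(key, _MISSING)
        right = broken.get(key, _MISSING)
        if left != right:
            differences[key] = (left, right)
    return differences


# Hypothetical settings captured from a working and a failing environment.
working_env = {"timeout_s": 30, "retries": 3, "tls": True}
broken_env = {"timeout_s": 30, "retries": 0, "tls": True, "proxy": "http://localhost:3128"}

differences = diff_settings(working_env, broken_env)
for key, (good, bad) in differences.items():
    print(f"{key}: working={good!r} broken={bad!r}")
```

Each reported key is a candidate cause until an experiment rules it out.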

<test-understanding>
Form a clear hypothesis: "X causes the problem because Y." Test with the smallest possible change. Change one variable at a time to isolate the cause.

When a hypothesis proves wrong, form a new one based on what you learned. Don't layer fixes on top of failed attempts.
</test-understanding>
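
Changing one variable at a time can be sketched as a loop that applies each suspected change, alone, to a known-working baseline. The names `isolate_cause` and `request_succeeds` and the settings are hypothetical, and the example assumes the system can be exercised through a function that returns whether it works.

```python
from typing import Any, Callable


def isolate_cause(
    baseline: dict[str, Any],
    suspects: dict[str, Any],
    works: Callable[[dict[str, Any]], bool],
) -> list[str]:
    """Apply each suspect change alone to a working baseline.

    Returns the keys whose change, on its own, makes works() return False.
    """
    if not works(baseline):
        raise ValueError("the baseline must work before the experiment starts")
    culprits = []
    for key, value in suspects.items():
        trial = {**baseline, key: value}  # exactly one variable differs
        if not works(trial):
            culprits.append(key)
    return culprits


# Hypothetical system under test: it fails whenever retries are disabled.
def request_succeeds(settings: dict[str, Any]) -> bool:
    return settings["retries"] > 0


baseline = {"timeout_s": 30, "retries": 3, "proxy": None}
suspects = {"timeout_s": 5, "retries": 0, "proxy": "http://localhost:3128"}

culprits = isolate_cause(baseline, suspects, request_succeeds)
print(culprits)
```

If no single change reproduces the failure, that result is itself evidence: the cause may be an interaction between changes, which calls for a new hypothesis rather than a stack of speculative fixes.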

<implement-fix>
Create a test that reproduces the issue before fixing it. This ensures you understand the problem and can verify the fix works.

Apply a single, focused fix that addresses the root cause. Resist bundling other
improvements or refactoring.

Verify the fix resolves the issue without breaking other functionality.
</implement-fix>
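
As an illustration of the test-first sequence, the sketch below uses Python's standard `unittest` module. The function `parse_price` and its bug are hypothetical: the assumed buggy version called `float(text)` directly, which raises `ValueError` on input such as `"1,299.00"`.

```python
import unittest


def parse_price(text: str) -> float:
    """Hypothetical function under repair.

    The assumed buggy version returned float(text), which fails on
    inputs that contain a thousands separator.
    """
    return float(text.replace(",", ""))  # the single, focused fix


class ParsePriceRegressionTest(unittest.TestCase):
    def test_thousands_separator(self):
        # Written first: this reproduced the bug before the fix was applied.
        self.assertEqual(parse_price("1,299.00"), 1299.0)

    def test_plain_number_still_works(self):
        # Guards behavior that already worked, to catch regressions.
        self.assertEqual(parse_price("42.50"), 42.5)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParsePriceRegressionTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

The first test documents the root cause, and the second checks that the fix did not change inputs that were already handled correctly.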

<recognizing-architectural-problems>
When multiple fix attempts fail in different ways, the architecture might be the problem. Signs include:
- Each fix reveals new coupling or shared state issues
- Fixes require extensive refactoring to work properly
- Each attempted fix creates new symptoms elsewhere

These patterns suggest reconsidering the fundamental approach rather than continuing to patch symptoms.
</recognizing-architectural-problems>

<warning-signs>
Stop and investigate properly when you catch yourself thinking:
- "Try this and see if it works"
- "Quick fix for now, investigate later"
- "I don't fully understand but this might help"
- "Here are several things to try"

These thoughts signal you're guessing rather than debugging systematically.
</warning-signs>

<when-stuck>
If you don't understand something, say so clearly. Ask for help or research more. Understanding the problem before attempting fixes saves time and prevents introducing new bugs.

Systematic debugging finds and fixes the real problem. Random attempts waste time and create new issues.
</when-stuck>

Overview

This skill teaches a disciplined approach to debugging: find the root cause before writing fixes. It emphasizes evidence, reproducible steps, and minimal, test-backed changes so fixes resolve the true problem and avoid introducing new regressions.

How this skill works

Start by collecting concrete evidence: full error messages, stack traces, and exact reproduction steps. Trace data flow backward through the call stack and component boundaries to locate the origin of the bad value. Form clear hypotheses, test them with the smallest possible change, and create a regression test before applying a focused fix.

When to use it

- Investigating failing tests, build errors, or crashes
- Diagnosing unexpected behavior or performance regressions
- When multiple quick fixes have already failed
- Before applying hotfixes to production systems
- When uncertain about the root cause and tempted to guess

Best practices

- Read error messages and stack traces completely; they often point directly to the issue
- Reproduce the problem reliably and document exact steps before changing code
- Compare broken code with a known-working implementation and treat every difference as suspect
- Test one hypothesis at a time; change only a single variable to isolate the cause
- Write a failing test for the bug first, then implement a single focused fix and verify no regressions

Example use cases

- A unit test intermittently fails: trace upstream data and add diagnostic logs to find where the state diverges
- A new dependency causes runtime errors: compare working and broken environments and inspect recent commits
- Performance spike after a release: reproduce the load profile, add metrics at component boundaries, and identify the bottleneck
- Complex integration failure: inject test doubles at interfaces to determine which service introduces the bad input

FAQ

What if I can't reproduce the bug reliably?

Gather more data: logs, environment snapshots, and timing information. Add lightweight instrumentation or increase sampling to capture the failure context before attempting fixes.
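
For failures that depend on randomness or ordering, one way to capture the failure context is to run the scenario many times with recorded seeds, so that any failing run can be replayed exactly. This is a sketch under that assumption. `collect_failures` and `scenario` are hypothetical names, and the scenario's failure condition is invented for the example.

```python
import random
from typing import Callable


def collect_failures(run: Callable[[random.Random], None], seeds: range) -> dict[int, str]:
    """Run a scenario once per seed and record the error message of each failure."""
    failures = {}
    for seed in seeds:
        try:
            run(random.Random(seed))  # a seeded generator makes each run repeatable
        except Exception as exc:
            failures[seed] = f"{type(exc).__name__}: {exc}"
    return failures


# Hypothetical flaky scenario: fails only when the shuffled order starts with 0.
def scenario(rng: random.Random) -> None:
    order = [0, 1, 2, 3]
    rng.shuffle(order)
    if order[0] == 0:
        raise AssertionError(f"unexpected order {order}")


failures = collect_failures(scenario, range(200))
print(f"{len(failures)} of 200 runs failed; first failing seeds: {sorted(failures)[:5]}")
```

Replaying a recorded seed with `scenario(random.Random(seed))` turns an intermittent failure into a consistent one, which is the precondition for the rest of the process.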

When should I consider architecture changes instead of fixes?

If multiple focused fixes keep revealing shared-state coupling or cascading failures, or require extensive refactoring to succeed, reassess the architecture rather than layering more patches.