---
model: opus
name: debugging-methodology
description: Systematic debugging approach with tool recommendations for memory, performance, and system-level issues.
allowed-tools: Bash, Read, Grep, Glob
created: 2025-12-27
modified: 2025-12-27
reviewed: 2025-12-27
---

# Debugging Methodology

Systematic approach to finding and fixing bugs.

## Core Principles

1. **Occam's Razor** - Start with the simplest explanation
2. **Binary Search** - Isolate the problem area systematically
3. **Preserve Evidence** - Understand state before making changes
4. **Document Hypotheses** - Track what was tried and didn't work
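Applied to a change history, the binary-search principle is what `git bisect` automates; a minimal sketch in Python (the `is_bad` predicate and the ten-change history are hypothetical):

```python
def find_first_bad(changes, is_bad):
    """Bisect an ordered history for the first change where is_bad turns True.
    Assumes is_bad is monotone: once a change is bad, all later ones are too
    (the same assumption git bisect makes)."""
    lo, hi = 0, len(changes) - 1
    first = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_bad(changes[mid]):
            first = mid          # candidate; keep searching earlier
            hi = mid - 1
        else:
            lo = mid + 1
    return first

# Hypothetical ten-change history where change 6 introduced the bug.
history = list(range(10))
assert find_first_bad(history, lambda c: c >= 6) == 6
```

Each probe halves the suspect range, so even a thousand-commit history needs only about ten checks.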

## Debugging Workflow

```
1. Understand → What is expected vs actual behavior?
2. Reproduce → Can you trigger the bug reliably?
3. Locate → Where in the code does it happen?
4. Diagnose → Why does it happen? (root cause)
5. Fix → Minimal change to resolve
6. Verify → Confirm fix works, no regressions
```

## Common Bug Patterns

| Symptom | Likely Cause | Check First |
|---------|--------------|-------------|
| TypeError/null | Missing null check | Input validation |
| Off-by-one | Loop bounds, array index | Boundary conditions |
| Race condition | Async timing | Await/promise handling |
| Import error | Path/module resolution | File paths, exports |
| Type mismatch | Wrong type passed | Function signatures |
| Flaky test | Timing, shared state | Test isolation |
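As a concrete instance of the off-by-one row, a hypothetical `last_n` helper shows how a naive slice breaks at the zero boundary:

```python
def last_n(items, n):
    """Return the last n items. The naive items[-n:] fails at the n == 0
    boundary, because items[-0:] is items[0:], i.e. the whole list."""
    return items[max(len(items) - n, 0):] if n > 0 else []

assert [1, 2, 3][-0:] == [1, 2, 3]   # the surprising boundary behavior itself
assert last_n([1, 2, 3], 2) == [2, 3]
assert last_n([1, 2, 3], 0) == []
```

Checking the boundary conditions (0, 1, length, length + 1) first is usually the fastest way to confirm or rule out this whole class of bug.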

## System-Level Tools

### Memory Analysis
```bash
# Valgrind (C/C++/Rust)
valgrind --leak-check=full --show-leak-kinds=all ./program
valgrind --tool=massif ./program  # Heap profiling

# Python
python -m memory_profiler script.py
```
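For pure-Python leaks, the standard library's `tracemalloc` is often enough before reaching for external profilers; a minimal sketch (the retained list simulates the leak):

```python
import tracemalloc

tracemalloc.start()

retained = [bytes(1_000) for _ in range(1_000)]  # simulate ~1 MB that is never freed

snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics("lineno")[0]   # allocation sites, biggest first
print(top)                               # file:line plus size of live memory
assert top.size > 500_000                # the retained list dominates
```

Taking two snapshots and diffing them with `snapshot.compare_to(earlier, "lineno")` pinpoints which line's allocations grow between iterations.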

### Performance Profiling
```bash
# Linux perf
perf record -g ./program
perf report
perf top  # Real-time CPU usage

# Python
python -m cProfile -s cumtime script.py
```
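`cProfile` can also be driven programmatically and filtered with `pstats`, which is handy inside tests; a sketch with a hypothetical `hot_loop` workload:

```python
import cProfile
import io
import pstats

def hot_loop():
    # Deliberately heavy work so it dominates the profile.
    return sum(i * i for i in range(200_000))

profiler = cProfile.Profile()
profiler.enable()
hot_loop()
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
report = buf.getvalue()
assert "hot_loop" in report   # the hotspot is visible near the top
```

Sorting by `"cumulative"` surfaces expensive call trees; sorting by `"tottime"` surfaces the individual functions burning CPU.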

### System Tracing (Traditional)
```bash
# System calls (ptrace-based, high overhead)
strace -f -e trace=all -p PID

# Library calls
ltrace -f -S ./program

# Open files/sockets
lsof -p PID

# Memory mapping
pmap -x PID
```

### eBPF Tracing (Modern, Production-Safe)

eBPF is the modern replacement for strace/ptrace-based tracing. Key advantages:
- **Low overhead**: Safe for production use
- **No recompilation**: Works on running binaries
- **Non-intrusive**: Doesn't stop program execution
- **Kernel-verified**: Bounded execution, can't crash the system

```bash
# BCC tools (install: apt install bpfcc-tools; Debian/Ubuntu suffix the
# binaries with -bpfcc, e.g. syscount-bpfcc)
# Trace syscalls (like strace, but far lower overhead)
sudo syscount -p PID              # Count syscalls
sudo opensnoop -p PID             # Trace file opens
sudo execsnoop                    # Trace new processes
sudo tcpconnect                   # Trace TCP connections
sudo funccount 'vfs_*'            # Count kernel function calls

# bpftrace (install: apt install bpftrace)
# One-liner tracing scripts
# open() is routed through openat() on modern kernels, so trace that tracepoint
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'
sudo bpftrace -e 'uprobe:/bin/bash:readline { printf("readline\n"); }'

# Trace function arguments in Go/other languages
sudo bpftrace -e 'uprobe:./myapp:main.handleRequest { printf("called\n"); }'
```

**eBPF Tool Hierarchy**:
| Level | Tool | Use Case |
|-------|------|----------|
| High | BCC tools | Pre-built tracing scripts |
| Medium | bpftrace | One-liner custom traces |
| Low | libbpf/gobpf | Custom eBPF programs |

**When to use eBPF over strace**:
- Production systems (strace adds 10-100x overhead)
- Long-running traces
- High-frequency syscalls
- When you can't afford to slow down the process

### Network Debugging
```bash
# Packet capture
tcpdump -i any port 8080

# Connection status
ss -tuln       # modern replacement for netstat
netstat -tuln  # legacy, still common on older systems
```
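Alongside the commands above, a quick programmatic reachability probe can rule out basic connectivity before reaching for packet capture; a sketch using only the standard library (`port_open` and the throwaway listener are illustrative):

```python
import socket

def port_open(host, port, timeout=1.0):
    """TCP reachability probe: the programmatic cousin of `nc -z` / `ss`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Deterministic check against a listener we control.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(1)
host, port = listener.getsockname()
assert port_open(host, port)
listener.close()
assert not port_open(host, port)
```

If the probe succeeds but the application still fails, the problem is above the transport layer (TLS, auth, protocol), which narrows the search considerably.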

## Language-Specific Debugging

### Python
```python
# Quick debug (Python 3.7+; honors the PYTHONBREAKPOINT env var)
breakpoint()

# Classic equivalent
import pdb; pdb.set_trace()

# Better: ipdb or pudb (third-party)
import ipdb; ipdb.set_trace()

# Print with context
print(f"{var=}")  # Python 3.8+
```

### JavaScript/TypeScript
```javascript
// Browser/Node
debugger;

// Structured logging
console.log({ var1, var2, context: 'function_name' });
```

### Rust
```rust
// Debug print
dbg!(&variable);

// Backtrace on panic (run from the shell):
// RUST_BACKTRACE=1 cargo run
```

## Debugging Questions

When stuck, ask:
1. What changed recently that could cause this?
2. Does it happen in all environments or just one?
3. Is the bug in my code or a dependency?
4. What assumptions am I making that might be wrong?
5. Can I write a minimal reproduction?

## Effective Debugging Practices

- **Targeted changes**: Form a hypothesis, change one thing at a time
- **Use proper debuggers**: Step through code with breakpoints when possible
- **Find root causes**: Trace issues to their origin, fix the source
- **Reproduce first**: Create a minimal reproduction before attempting a fix
- **Verify the fix**: Confirm the fix resolves the issue and passes tests

Overview

This skill codifies a practical, systematic debugging methodology for memory, performance, and system-level issues and recommends tools for each stage. It combines time-tested principles (Occam's Razor, binary search, evidence preservation) with concrete commands for memory profilers, tracers, eBPF tools, and language-specific tips. The goal is faster root-cause discovery and safer diagnostics in production and development environments.

How this skill works

The skill walks you through a repeatable workflow: understand the expected behavior, reproduce the issue, locate the failing component, diagnose the root cause, apply a minimal fix, and verify no regressions. It pairs investigative steps with tool recommendations: valgrind and memory_profiler for leaks, perf and cProfile for CPU hotspots, strace/ltrace and eBPF (BCC/bpftrace) for syscall and kernel-level tracing. It also highlights targeted diagnostics for networks and common language ecosystems like Python, JavaScript, and Rust.

When to use it

  • When a bug can be reproduced but its location or cause is unclear
  • When memory leaks, excessive CPU usage, or high syscall volume appear
  • When troubleshooting production systems where overhead must be minimal
  • When tests are flaky or race conditions are suspected
  • When you need a structured, auditable debugging process

Best practices

  • Reproduce reliably and preserve state (logs, dumps) before changing behavior
  • Form one hypothesis at a time and document what you try and why
  • Use binary search to isolate the problem area quickly
  • Prefer non-intrusive tools (eBPF) in production to avoid high overhead
  • Verify fixes with regression tests and minimal reproducible examples

Example use cases

  • Find a memory leak in a C or Rust service using valgrind or massif
  • Trace high-frequency syscalls in production with bpftrace or BCC tools
  • Profile Python CPU hotspots with cProfile and memory usage with memory_profiler
  • Diagnose intermittent races by adding targeted logging, breakpoints, and controlled replays
  • Investigate network connection failures with tcpdump and ss/netstat

FAQ

When should I choose eBPF over strace?

Use eBPF for production or long-running traces and high-frequency syscalls because it has low overhead and is non-intrusive; use strace for quick, local debugging when overhead is acceptable.

How do I avoid making things worse while debugging?

Preserve evidence (logs, dumps), change one thing at a time, document hypotheses and steps, and verify each change with tests or a reproducible scenario before deploying.