---
model: opus
name: debugging-methodology
description: Systematic debugging approach with tool recommendations for memory, performance, and system-level issues.
allowed-tools: Bash, Read, Grep, Glob
created: 2025-12-27
modified: 2025-12-27
reviewed: 2025-12-27
---
# Debugging Methodology
Systematic approach to finding and fixing bugs.
## Core Principles
1. **Occam's Razor** - Start with the simplest explanation
2. **Binary Search** - Isolate the problem area systematically
3. **Preserve Evidence** - Understand state before making changes
4. **Document Hypotheses** - Track what was tried and didn't work
## Debugging Workflow
```
1. Understand → What is expected vs actual behavior?
2. Reproduce → Can you trigger the bug reliably?
3. Locate → Where in the code does it happen?
4. Diagnose → Why does it happen? (root cause)
5. Fix → Minimal change to resolve
6. Verify → Confirm fix works, no regressions
```
## Common Bug Patterns
| Symptom | Likely Cause | Check First |
|---------|--------------|-------------|
| TypeError/null | Missing null check | Input validation |
| Off-by-one | Loop bounds, array index | Boundary conditions |
| Race condition | Async timing | Await/promise handling |
| Import error | Path/module resolution | File paths, exports |
| Type mismatch | Wrong type passed | Function signatures |
| Flaky test | Timing, shared state | Test isolation |
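The off-by-one row is worth a concrete boundary check. A small sketch of the classic mistake — mixing inclusive bounds with a zero-based count (the count `n=5` is illustrative):

```shell
# n is a count of items; inclusive 0..n produces one item too many.
n=5
seq 0 "$n" | wc -l          # 6 lines: 0..n inclusive is n+1 items
seq 0 $((n - 1)) | wc -l    # 5 lines: 0..n-1 matches a zero-based count
seq 1 "$n" | wc -l          # 5 lines: 1..n also matches the count
```

When checking boundary conditions, test the first index, the last index, and one past the last — those three cases catch most off-by-one bugs.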
## System-Level Tools
### Memory Analysis
```bash
# Valgrind (C/C++/Rust)
valgrind --leak-check=full --show-leak-kinds=all ./program
valgrind --tool=massif ./program # Heap profiling
# Python (pip install memory-profiler; decorate target functions with @profile)
python -m memory_profiler script.py
```
### Performance Profiling
```bash
# Linux perf
perf record -g ./program
perf report
perf top # Real-time CPU usage
# Python
python -m cProfile -s cumtime script.py
```
### System Tracing (Traditional)
```bash
# System calls (ptrace-based, high overhead)
strace -f -e trace=all -p PID
# Library calls
ltrace -f -S ./program
# Open files/sockets
lsof -p PID
# Memory mapping
pmap -x PID
```
### eBPF Tracing (Modern, Production-Safe)
eBPF is the modern replacement for strace/ptrace-based tracing. Key advantages:
- **Low overhead**: Safe for production use
- **No recompilation**: Works on running binaries
- **Non-intrusive**: Doesn't stop program execution
- **Kernel-verified**: Bounded execution, can't crash the system
```bash
# BCC tools (install: apt install bpfcc-tools)
# Trace syscalls with timing (like strace but faster)
sudo syscount -p PID # Count syscalls
sudo opensnoop -p PID # Trace file opens
sudo execsnoop # Trace new processes
sudo tcpconnect # Trace TCP connections
sudo funccount 'vfs_*' # Count kernel function calls
# bpftrace (install: apt install bpftrace)
# One-liner tracing scripts
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'  # modern libcs route open() through openat
sudo bpftrace -e 'uprobe:/bin/bash:readline { printf("readline\n"); }'
# Trace function arguments in Go/other languages
sudo bpftrace -e 'uprobe:./myapp:main.handleRequest { printf("called\n"); }'
```
**eBPF Tool Hierarchy**:
| Level | Tool | Use Case |
|-------|------|----------|
| High | BCC tools | Pre-built tracing scripts |
| Medium | bpftrace | One-liner custom traces |
| Low | libbpf/gobpf | Custom eBPF programs |
**When to use eBPF over strace**:
- Production systems (strace adds 10-100x overhead)
- Long-running traces
- High-frequency syscalls
- When you can't afford to slow down the process
### Network Debugging
```bash
# Packet capture
tcpdump -i any port 8080
# Connection status
ss -tuln
netstat -tuln  # legacy equivalent of ss
```
## Language-Specific Debugging
### Python
```python
# Quick debug (Python 3.7+, honors PYTHONBREAKPOINT)
breakpoint()
# Classic equivalent
import pdb; pdb.set_trace()
# Better: ipdb or pudb (pip install ipdb)
import ipdb; ipdb.set_trace()
# Print with context
print(f"{var=}")  # Python 3.8+, shows name and value
```
### JavaScript/TypeScript
```javascript
// Browser/Node
debugger;
// Structured logging
console.log({ var1, var2, context: 'function_name' });
```
### Rust
```rust
// Debug print (shows file, line, and value; returns the value)
dbg!(&variable);
// Backtrace on panic - run from the shell:
//   RUST_BACKTRACE=1 cargo run
```
## Debugging Questions
When stuck, ask:
1. What changed recently that could cause this?
2. Does it happen in all environments or just one?
3. Is the bug in my code or a dependency?
4. What assumptions am I making that might be wrong?
5. Can I write a minimal reproduction?
## Effective Debugging Practices
- **Targeted changes**: Form a hypothesis, change one thing at a time
- **Use proper debuggers**: Step through code with breakpoints when possible
- **Find root causes**: Trace issues to their origin, fix the source
- **Reproduce first**: Create a minimal reproduction before attempting a fix
- **Verify the fix**: Confirm the fix resolves the issue and passes tests