---
model: opus
name: debugging-methodology
description: Systematic debugging approach with tool recommendations for memory, performance, and system-level issues.
allowed-tools: Bash, Read, Grep, Glob
created: 2025-12-27
modified: 2025-12-27
reviewed: 2025-12-27
---

# Debugging Methodology

Systematic approach to finding and fixing bugs.

## Core Principles

1. **Occam's Razor** - Start with the simplest explanation
2. **Binary Search** - Isolate the problem area systematically
3. **Preserve Evidence** - Understand state before making changes
4. **Document Hypotheses** - Track what was tried and didn't work
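Applied to a change history, the binary-search principle is what `git bisect` automates; a minimal sketch in Python (the `is_bad` predicate and the ten-change history are hypothetical):

```python
def find_first_bad(changes, is_bad):
    """Bisect an ordered history for the first change where is_bad turns True.
    Assumes is_bad is monotone: once a change is bad, all later ones are too
    (the same assumption git bisect makes)."""
    lo, hi = 0, len(changes) - 1
    first = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if is_bad(changes[mid]):
            first = mid          # candidate; keep searching earlier
            hi = mid - 1
        else:
            lo = mid + 1
    return first

# Hypothetical ten-change history where change 6 introduced the bug.
history = list(range(10))
assert find_first_bad(history, lambda c: c >= 6) == 6
```

Each probe halves the suspect range, so even a thousand-commit history needs only about ten checks.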

## Debugging Workflow

```
1. Understand → What is expected vs actual behavior?
2. Reproduce → Can you trigger the bug reliably?
3. Locate → Where in the code does it happen?
4. Diagnose → Why does it happen? (root cause)
5. Fix → Minimal change to resolve
6. Verify → Confirm fix works, no regressions
```

## Common Bug Patterns

| Symptom | Likely Cause | Check First |
|---------|--------------|-------------|
| TypeError/null | Missing null check | Input validation |
| Off-by-one | Loop bounds, array index | Boundary conditions |
| Race condition | Async timing | Await/promise handling |
| Import error | Path/module resolution | File paths, exports |
| Type mismatch | Wrong type passed | Function signatures |
| Flaky test | Timing, shared state | Test isolation |
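As a concrete instance of the off-by-one row, a hypothetical `last_n` helper shows how a naive slice breaks at the zero boundary:

```python
def last_n(items, n):
    """Return the last n items. The naive items[-n:] fails at the n == 0
    boundary, because items[-0:] is items[0:], i.e. the whole list."""
    return items[max(len(items) - n, 0):] if n > 0 else []

assert [1, 2, 3][-0:] == [1, 2, 3]   # the surprising boundary behavior itself
assert last_n([1, 2, 3], 2) == [2, 3]
assert last_n([1, 2, 3], 0) == []
```

Checking the boundary conditions (0, 1, length, length + 1) first is usually the fastest way to confirm or rule out this whole class of bug.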

## System-Level Tools

### Memory Analysis
```bash
# Valgrind (C/C++/Rust)
valgrind --leak-check=full --show-leak-kinds=all ./program
valgrind --tool=massif ./program  # Heap profiling

# Python
python -m memory_profiler script.py
```
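For pure-Python leaks, the standard library's `tracemalloc` is often enough before reaching for external profilers; a minimal sketch (the retained list simulates the leak):

```python
import tracemalloc

tracemalloc.start()

retained = [bytes(1_000) for _ in range(1_000)]  # simulate ~1 MB that is never freed

snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics("lineno")[0]   # allocation sites, biggest first
print(top)                               # file:line plus size of live memory
assert top.size > 500_000                # the retained list dominates
```

Taking two snapshots and diffing them with `snapshot.compare_to(earlier, "lineno")` pinpoints which line's allocations grow between iterations.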

### Performance Profiling
```bash
# Linux perf
perf record -g ./program
perf report
perf top  # Real-time CPU usage

# Python
python -m cProfile -s cumtime script.py
```
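`cProfile` can also be driven programmatically and filtered with `pstats`, which is handy inside tests; a sketch with a hypothetical `hot_loop` workload:

```python
import cProfile
import io
import pstats

def hot_loop():
    # Deliberately heavy work so it dominates the profile.
    return sum(i * i for i in range(200_000))

profiler = cProfile.Profile()
profiler.enable()
hot_loop()
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
report = buf.getvalue()
assert "hot_loop" in report   # the hotspot is visible near the top
```

Sorting by `"cumulative"` surfaces expensive call trees; sorting by `"tottime"` surfaces the individual functions burning CPU.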

### System Tracing (Traditional)
```bash
# System calls (ptrace-based, high overhead)
strace -f -e trace=all -p PID

# Library calls
ltrace -f -S ./program

# Open files/sockets
lsof -p PID

# Memory mapping
pmap -x PID
```

### eBPF Tracing (Modern, Production-Safe)

eBPF is the modern replacement for strace/ptrace-based tracing. Key advantages:
- **Low overhead**: Safe for production use
- **No recompilation**: Works on running binaries
- **Non-intrusive**: Doesn't stop program execution
- **Kernel-verified**: Bounded execution, can't crash the system

```bash
# BCC tools (install: apt install bpfcc-tools; Debian/Ubuntu suffix the
# binaries with -bpfcc, e.g. syscount-bpfcc)
# Trace syscalls (like strace, but far lower overhead)
sudo syscount -p PID              # Count syscalls
sudo opensnoop -p PID             # Trace file opens
sudo execsnoop                    # Trace new processes
sudo tcpconnect                   # Trace TCP connections
sudo funccount 'vfs_*'            # Count kernel function calls

# bpftrace (install: apt install bpftrace)
# One-liner tracing scripts
# open() is routed through openat() on modern kernels, so trace that tracepoint
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'
sudo bpftrace -e 'uprobe:/bin/bash:readline { printf("readline\n"); }'

# Trace function arguments in Go/other languages
sudo bpftrace -e 'uprobe:./myapp:main.handleRequest { printf("called\n"); }'
```

**eBPF Tool Hierarchy**:
| Level | Tool | Use Case |
|-------|------|----------|
| High | BCC tools | Pre-built tracing scripts |
| Medium | bpftrace | One-liner custom traces |
| Low | libbpf/gobpf | Custom eBPF programs |

**When to use eBPF over strace**:
- Production systems (strace adds 10-100x overhead)
- Long-running traces
- High-frequency syscalls
- When you can't afford to slow down the process

### Network Debugging
```bash
# Packet capture
tcpdump -i any port 8080

# Connection status
ss -tuln       # modern replacement for netstat
netstat -tuln  # legacy, still common on older systems
```
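Alongside the commands above, a quick programmatic reachability probe can rule out basic connectivity before reaching for packet capture; a sketch using only the standard library (`port_open` and the throwaway listener are illustrative):

```python
import socket

def port_open(host, port, timeout=1.0):
    """TCP reachability probe: the programmatic cousin of `nc -z` / `ss`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Deterministic check against a listener we control.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(1)
host, port = listener.getsockname()
assert port_open(host, port)
listener.close()
assert not port_open(host, port)
```

If the probe succeeds but the application still fails, the problem is above the transport layer (TLS, auth, protocol), which narrows the search considerably.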

## Language-Specific Debugging

### Python
```python
# Quick debug (Python 3.7+; honors the PYTHONBREAKPOINT env var)
breakpoint()

# Classic equivalent
import pdb; pdb.set_trace()

# Better: ipdb or pudb (third-party)
import ipdb; ipdb.set_trace()

# Print with context
print(f"{var=}")  # Python 3.8+
```

### JavaScript/TypeScript
```javascript
// Browser/Node
debugger;

// Structured logging
console.log({ var1, var2, context: 'function_name' });
```

### Rust
```rust
// Debug print
dbg!(&variable);

// Backtrace on panic (run from the shell):
// RUST_BACKTRACE=1 cargo run
```

## Debugging Questions

When stuck, ask:
1. What changed recently that could cause this?
2. Does it happen in all environments or just one?
3. Is the bug in my code or a dependency?
4. What assumptions am I making that might be wrong?
5. Can I write a minimal reproduction?

## Effective Debugging Practices

- **Targeted changes**: Form a hypothesis, change one thing at a time
- **Use proper debuggers**: Step through code with breakpoints when possible
- **Find root causes**: Trace issues to their origin, fix the source
- **Reproduce first**: Create a minimal reproduction before attempting a fix
- **Verify the fix**: Confirm the fix resolves the issue and passes tests

Overview

This skill codifies a practical, systematic debugging methodology for memory, performance, and system-level issues and recommends tools for each stage. It combines time-tested principles (Occam's Razor, binary search, evidence preservation) with concrete commands for memory profilers, tracers, eBPF tools, and language-specific tips. The goal is faster root-cause discovery and safer diagnostics in production and development environments.

How this skill works

The skill walks you through a repeatable workflow: understand the expected behavior, reproduce the issue, locate the failing component, diagnose the root cause, apply a minimal fix, and verify no regressions. It pairs investigative steps with tool recommendations: valgrind and memory_profiler for leaks, perf and cProfile for CPU hotspots, strace/ltrace and eBPF (BCC/bpftrace) for syscall and kernel-level tracing. It also highlights targeted diagnostics for networks and common language ecosystems like Python, JavaScript, and Rust.

When to use it

  • When a bug can be reproduced but its location or cause is unclear
  • When memory leaks, excessive CPU usage, or high syscall volume appear
  • When troubleshooting production systems where overhead must be minimal
  • When tests are flaky or race conditions are suspected
  • When you need a structured, auditable debugging process

Best practices

  • Reproduce reliably and preserve state (logs, dumps) before changing behavior
  • Form one hypothesis at a time and document what you try and why
  • Use binary search to isolate the problem area quickly
  • Prefer non-intrusive tools (eBPF) in production to avoid high overhead
  • Verify fixes with regression tests and minimal reproducible examples

Example use cases

  • Find a memory leak in a C or Rust service using valgrind or massif
  • Trace high-frequency syscalls in production with bpftrace or BCC tools
  • Profile Python CPU hotspots with cProfile and memory usage with memory_profiler
  • Diagnose intermittent races by adding targeted logging, breakpoints, and controlled replays
  • Investigate network connection failures with tcpdump and ss/netstat

FAQ

When should I choose eBPF over strace?

Use eBPF for production or long-running traces and high-frequency syscalls because it has low overhead and is non-intrusive; use strace for quick, local debugging when overhead is acceptable.

How do I avoid making things worse while debugging?

Preserve evidence (logs, dumps), change one thing at a time, document hypotheses and steps, and verify each change with tests or a reproducible scenario before deploying.