root-cause-tracing skill

/plugins/root-cause-tracing/skills/root-cause-tracing

This skill helps you trace bugs backward through the call stack to identify the original trigger and fix it at the source.

npx playbooks add skill secondsky/claude-skills --skill root-cause-tracing

SKILL.md

---
name: root-cause-tracing
description: Systematically trace bugs backward through call stack to find original trigger. Use when errors occur deep in execution and you need to trace back to find the original trigger.
version: 1.1.0
---

# Root Cause Tracing

## Overview

Bugs often manifest deep in the call stack (git init in wrong directory, file created in wrong location, database opened with wrong path). Your instinct is to fix where the error appears, but that's treating a symptom.

**Core principle:** Trace backward through the call chain until you find the original trigger, then fix at the source.

## When to Use

**Use when:**
- Error happens deep in execution (not at entry point)
- Stack trace shows long call chain
- Unclear where invalid data originated
- Need to find which test/code triggers the problem

## The Tracing Process

### 1. Observe the Symptom
```
Error: git init failed in ~/project/packages/core
```

### 2. Find Immediate Cause
**What code directly causes this?**
```typescript
await execFileAsync('git', ['init'], { cwd: projectDir });
```

### 3. Ask: What Called This?
```
WorktreeManager.createSessionWorktree(projectDir, sessionId)
  → called by Session.initializeWorkspace()
  → called by Session.create()
  → called by test at Project.create()
```

### 4. Keep Tracing Up
**What value was passed?**
- `projectDir = ''` (empty string!)
- Empty string as `cwd` resolves to `process.cwd()`
- That's the source code directory!

### 5. Find Original Trigger
**Where did empty string come from?**
```typescript
const context = setupCoreTest(); // Returns { tempDir: '' }
Project.create('name', context.tempDir); // Accessed before beforeEach!
```
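
To make the timing problem concrete, here is a minimal simulation of the two phases a test runner goes through. Everything in it is an assumption for illustration: `registerBeforeEach` and `pendingHooks` stand in for the runner's real hook registry, the body of `setupCoreTest` is a guess at its shape, and the path is a placeholder.

```typescript
// Stand-in for a test runner's hook registry (hypothetical; real runners manage this internally).
const pendingHooks: Array<() => void> = [];
function registerBeforeEach(hook: () => void): void {
  pendingHooks.push(hook);
}

// Assumed shape of setupCoreTest: tempDir starts empty and is filled in by the hook.
function setupCoreTest(): { tempDir: string } {
  const context = { tempDir: '' };
  registerBeforeEach(() => {
    context.tempDir = '/tmp/core-test-1'; // placeholder path
  });
  return context;
}

const context = setupCoreTest();

// Collection phase: the describe body runs now, before any hook has fired.
const capturedTooEarly = context.tempDir;

// Execution phase: the runner fires the hooks, then runs the test body.
for (const hook of pendingHooks) hook();
const readInsideTest = context.tempDir;

console.log({ capturedTooEarly, readInsideTest });
```

The value read during collection is the empty string, while the value read after the hooks have run is the real directory. Any argument computed at the top level of a test file is frozen at the first value.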

## Adding Stack Traces

When you can't trace manually, add instrumentation:

```typescript
// Before the problematic operation
async function gitInit(directory: string) {
  const stack = new Error().stack;
  console.error('DEBUG git init:', {
    directory,
    cwd: process.cwd(),
    nodeEnv: process.env.NODE_ENV,
    stack,
  });

  await execFileAsync('git', ['init'], { cwd: directory });
}
```

**Critical:** Use `console.error()` in tests, not the logger, whose output may be suppressed.

**Run and capture:**
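```bash
# The logged object spans several lines, so include trailing context
# (adjust the count to cover the full stack).
bun test 2>&1 | grep -A 20 'DEBUG git init'
```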

**Analyze stack traces:**
- Look for test file names
- Find the line number triggering the call
- Identify the pattern (same test? same parameter?)
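
When the output is long, the first two checks can be automated. The sketch below is a hypothetical helper, not part of the skill: `findTestFrames` and its default file pattern are assumptions, and it expects frames in the common `at fn (file:line:col)` or `at file:line:col` layout.

```typescript
// Hypothetical helper: pull the frames that point at test files out of a stack string.
interface Frame {
  file: string;
  line: number;
}

function findTestFrames(
  stack: string,
  testPattern: RegExp = /\.(test|spec)\.[cm]?[jt]sx?$/,
): Frame[] {
  const frames: Frame[] = [];
  for (const raw of stack.split('\n')) {
    // Matches "at fn (/path/file.ts:12:5)" as well as "at /path/file.ts:12:5".
    const match = raw.match(/\(?([^()\s]+):(\d+):\d+\)?\s*$/);
    if (!match) continue;
    const [, file, line] = match;
    if (testPattern.test(file)) {
      frames.push({ file, line: Number(line) });
    }
  }
  return frames;
}
```

Calling `findTestFrames(new Error().stack ?? '')` inside the instrumented function returns only the test-file frames, which is usually enough to see whether the same test or the same line keeps triggering the call.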

## Finding Which Test Causes Pollution

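If something appears during tests but you don't know which test creates it, start with a fail-fast run and keep the output:

```bash
# Example: find which test creates .git in the wrong place
bun test --bail 2>&1 | tee test-output.log
```

`--bail` stops the run at the first failing test, so this pinpoints the polluter only when the pollution makes a test fail. Otherwise, run the test files one at a time and check for the artifact after each.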
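
A one-file-at-a-time search can be sketched as follows. This is an illustration under assumptions, not the skill's own script: `findPolluter` and `findGitPolluter` are hypothetical names, and the wiring assumes each test file can be run on its own with `bun test <file>`.

```typescript
import { spawnSync } from 'node:child_process';
import { existsSync } from 'node:fs';

type RunTest = (file: string) => void;
type ArtifactExists = () => boolean;

// Hypothetical helper: run test files one at a time and report the first one
// after which the unwanted artifact (for example a stray .git directory) exists.
function findPolluter(
  testFiles: string[],
  artifactExists: ArtifactExists,
  runTest: RunTest,
): string | null {
  if (artifactExists()) {
    throw new Error('Artifact already exists; remove it before searching.');
  }
  for (const file of testFiles) {
    runTest(file);
    if (artifactExists()) return file;
  }
  return null; // No single file produced the artifact.
}

// Example wiring (assumed layout): each file runs in its own `bun test` process.
function findGitPolluter(testFiles: string[], artifactPath = '.git'): string | null {
  return findPolluter(
    testFiles,
    () => existsSync(artifactPath),
    (file) => {
      spawnSync('bun', ['test', file], { stdio: 'ignore' });
    },
  );
}
```

Passing the runner and the artifact check as functions keeps the search loop independent of the test command, so the same loop works for stray files, directories, or leftover database rows.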

## Real Example: Empty projectDir

**Symptom:** `.git` created in `packages/core/` (source code)

**Trace chain:**
1. `git init` runs in `process.cwd()` ← empty cwd parameter
2. WorktreeManager called with empty projectDir
3. Session.create() passed empty string
4. Test accessed `context.tempDir` before beforeEach
5. setupCoreTest() returns `{ tempDir: '' }` initially

**Root cause:** Top-level variable initialization accessing empty value

**Fix:** Made tempDir a getter that throws if accessed before beforeEach
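
A minimal sketch of such a getter, assuming a context object that the test setup fills in. The names `createTestContext`, `setUp`, and `tearDown` are hypothetical; the real `setupCoreTest()` may be structured differently.

```typescript
// Hypothetical test-context factory: tempDir throws until setup has run.
interface TestContext {
  readonly tempDir: string;
  setUp(dir: string): void;
  tearDown(): void;
}

function createTestContext(): TestContext {
  let current: string | undefined;
  return {
    get tempDir(): string {
      if (!current) {
        // Fail loudly at the point of misuse instead of passing '' downstream.
        throw new Error('tempDir accessed before beforeEach ran');
      }
      return current;
    },
    setUp(dir: string): void {
      current = dir; // Call from beforeEach.
    },
    tearDown(): void {
      current = undefined; // Call from afterEach.
    },
  };
}
```

With this shape, a top-level `context.tempDir` read fails immediately with a stack trace that points at the offending test file, rather than surfacing later as a misplaced `git init`.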

**Also added defense-in-depth:**
- Layer 1: Project.create() validates directory
- Layer 2: WorkspaceManager validates not empty
- Layer 3: NODE_ENV guard refuses git init outside tmpdir
- Layer 4: Stack trace logging before git init
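
The validation layers might look roughly like this. It is a sketch under assumptions: `requireDirectory` and `assertSafeForGitInit` are hypothetical names, and the guard compares resolved paths without following symlinks, so a symlinked temp root would need `fs.realpathSync` on both sides.

```typescript
import { tmpdir } from 'node:os';
import { resolve, sep } from 'node:path';

// Layers 1 and 2 (sketch): reject empty or whitespace-only directories.
function requireDirectory(dir: string, caller: string): string {
  if (dir.trim() === '') {
    throw new Error(`${caller}: directory must not be empty`);
  }
  return resolve(dir);
}

// Layer 3 (sketch): under NODE_ENV=test, refuse paths outside the temp root.
function assertSafeForGitInit(
  dir: string,
  nodeEnv: string | undefined = process.env.NODE_ENV,
  tmpRoot: string = tmpdir(),
): string {
  const target = requireDirectory(dir, 'gitInit');
  const root = resolve(tmpRoot);
  const insideTmp = target === root || target.startsWith(root + sep);
  if (nodeEnv === 'test' && !insideTmp) {
    throw new Error(`Refusing git init outside ${root} during tests: ${target}`);
  }
  return target;
}
```

Each layer catches the same mistake at a different depth, so a caller that bypasses one check still hits the next before anything touches the source tree.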

## Key Principle

**NEVER fix just where the error appears.** Trace back to find the original trigger.

## Stack Trace Tips

- **In tests:** Use `console.error()`, not the logger, because logger output may be suppressed
- **Before operation:** Log before the dangerous operation, not after it fails
- **Include context:** Directory, cwd, environment variables, timestamps
- **Capture stack:** `new Error().stack` shows the call chain, up to the runtime's `Error.stackTraceLimit` (10 frames by default in Node.js)

## Real-World Impact

From a debugging session:
- Found root cause through 5-level trace
- Fixed at source (getter validation)
- Added 4 layers of defense
- 1847 tests passed, zero pollution

Overview

This skill systematically traces bugs backward through the call stack to identify the original trigger instead of treating symptoms. It guides you to inspect immediate failures, follow call chains up to the true source, and add targeted instrumentation to capture stack context. The outcome is fixes at the origin and layered defenses to prevent recurrence.

How this skill works

The skill inspects runtime failures and their stack traces, locating the code that directly caused the error and then asking "what called this?" repeatedly until the initial trigger is found. When manual tracing is insufficient, it prescribes lightweight instrumentation (console.error with new Error().stack and contextual data) and targeted test bisection to identify which test or initialization produced the bad input. Finally, it recommends fixes at the source plus defensive validation layers.

When to use it

- An error appears deep in the call stack with unclear origin
- Long or nested stack traces point to indirect causes
- You see incorrect state (wrong directory, path, or resource) propagated from tests or setup
- You need to find which test or initialization pollutes shared state
- You want to avoid patch fixes that hide the underlying bug

Best practices

- Trace upward through callers: fix at the origin, not at the symptom
- Log before the risky operation using console.error in tests to ensure output visibility
- Include directory, cwd, NODE_ENV and new Error().stack in debug logs
- Add defensive validation at API entry points (validate not-empty paths)
- Introduce environment guards (refuse destructive ops outside temp dirs) and layered checks

Example use cases

- A git init runs in the source tree because an empty cwd resolved to process.cwd(); trace callers to find where the empty parameter originated
- A test creates files in the repo root; bisect tests to find the polluting test and instrument stack traces to locate premature access to tempDir
- A database opens the wrong path; add pre-operation logs with stack to show which constructor passed the bad path
- A CI flake is caused by top-level initialization; convert the top-level value to a guarded getter that throws if used before setup
- Pollution is hard to reproduce across many tests; run test files one at a time and check for the artifact after each (--bail helps only when the pollution makes a test fail)

FAQ

What if the stack trace is large or minified?

Instrument with new Error().stack placed just before the risky operation; capture and filter for test file names and line numbers to identify callers.

Should I always add defensive checks after fixing the root cause?

Yes. Add API input validation and environment guards so a single mistake cannot reintroduce the issue; layered defenses reduce blast radius.