home / skills / thebushidocollective / han / legacy-code-safety

legacy-code-safety skill

/plugins/core/skills/legacy-code-safety

This skill guides you through the legacy code safety workflow, adding characterization tests before modifying untested code to reduce risk.

npx playbooks add skill thebushidocollective/han --skill legacy-code-safety

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
14.2 KB
---
name: legacy-code-safety
user-invocable: false
description: Use when modifying, removing, or refactoring code that lacks test coverage. Emphasizes the danger of untested changes and the RGR workflow to add characterization tests before modifications.
allowed-tools:
  - Read
  - Write
  - Edit
  - Bash
  - Grep
  - Glob
---

# Legacy Code Safety

> "Changing untested code is like performing surgery blindfolded."

## DANGER: The Risk of Modifying Untested Code

**Modifying code without tests is one of the HIGHEST-RISK activities in software development.**

When you change untested code:

- You CANNOT know what behaviors you're breaking
- You have NO safety net to catch mistakes
- You're relying on hope instead of evidence
- You're creating technical debt on top of technical debt

**In dynamically-typed languages (Python, JavaScript, Ruby, PHP), this risk is EXPONENTIALLY higher:**

- No compiler to catch type errors
- No static analysis to verify correctness
- Tests are your ONLY safety mechanism
- Runtime is when you discover your mistakes

## The Golden Rule

**NEVER modify untested code without adding tests first.**

If code lacks tests:

1. STOP
2. Add characterization tests
3. Verify tests pass
4. THEN make your changes

## The RGR Workflow for Legacy Code

This is NOT standard TDD. This is **characterization testing** - capturing current behavior before changing it.

### Standard TDD (New Features)

1. RED: Write failing test for NEW behavior
2. GREEN: Implement to make test pass
3. REFACTOR: Clean up implementation

### Legacy Code RGR (Existing Code)

1. RED: Write test that SHOULD pass (captures current behavior)
2. GREEN: Verify it passes (confirms you captured reality correctly)
3. REFACTOR: NOW you can safely modify with a safety net

### The Critical Difference

**TDD:** You know what the behavior SHOULD be, write test for ideal

**Legacy RGR:** You DON'T know all behaviors, capture what EXISTS

### Why "RED" for Passing Tests?

The "RED" step means "you're in the red zone" - working with dangerous untested code. Even though you expect the test to pass, you're at risk until you have that passing test as proof.

## Characterization Testing Process

### Step 1: Identify All Behaviors

Before writing tests, understand what the code currently does:

```typescript
// Legacy function - no tests
function calculateShipping(order) {
  let cost = 9.99

  if (order.total > 100) {
    cost = 0
  }

  if (order.weight > 50) {
    cost += 25
  }

  if (order.destination === 'HI' || order.destination === 'AK') {
    cost += 15
  }

  return cost
}
```

**Questions to answer:**

- What inputs does it accept?
- What outputs does it produce?
- What are the edge cases?
- Are there side effects?
- What happens with invalid input?

### Step 2: Write Tests for Current Behavior

**Document what EXISTS, not what should exist:**

```typescript
describe('calculateShipping (characterization)', () => {
  // Capture base case
  test('standard shipping', () => {
    const order = { total: 50, weight: 10, destination: 'CA' }
    expect(calculateShipping(order)).toBe(9.99)
  })

  // Capture free shipping threshold
  test('free shipping over $100', () => {
    const order = { total: 150, weight: 10, destination: 'CA' }
    expect(calculateShipping(order)).toBe(0)
  })

  // Capture weight surcharge
  test('heavy package surcharge', () => {
    const order = { total: 50, weight: 75, destination: 'CA' }
    expect(calculateShipping(order)).toBe(34.99) // 9.99 + 25
  })

  // Capture regional surcharge
  test('Hawaii surcharge', () => {
    const order = { total: 50, weight: 10, destination: 'HI' }
    expect(calculateShipping(order)).toBe(24.99) // 9.99 + 15
  })

  // Discovered edge case: free shipping + weight
  test('free shipping with heavy package', () => {
    const order = { total: 150, weight: 75, destination: 'CA' }
    expect(calculateShipping(order)).toBe(25) // 0 + 25 (is this right?)
  })

  // Discovered edge case: all conditions
  test('all surcharges with free shipping threshold', () => {
    const order = { total: 150, weight: 75, destination: 'HI' }
    expect(calculateShipping(order)).toBe(40) // 0 + 25 + 15
  })
})
```

### Step 3: Verify Tests Pass

**These tests MUST pass immediately:**

```bash
npm test
```

If they don't pass:

- You misunderstood the current behavior
- Fix the test, not the code
- The test should document reality

### Step 4: NOW You Can Refactor

**With tests in place, you have a safety net:**

```typescript
// NOW we can safely refactor
function calculateShipping(order: Order): number {
  const BASE_COST = 9.99
  const FREE_SHIPPING_THRESHOLD = 100
  const HEAVY_PACKAGE_WEIGHT = 50
  const HEAVY_PACKAGE_SURCHARGE = 25
  const REMOTE_DESTINATIONS = ['HI', 'AK']
  const REMOTE_SURCHARGE = 15

  let cost = order.total > FREE_SHIPPING_THRESHOLD ? 0 : BASE_COST

  if (order.weight > HEAVY_PACKAGE_WEIGHT) {
    cost += HEAVY_PACKAGE_SURCHARGE
  }

  if (REMOTE_DESTINATIONS.includes(order.destination)) {
    cost += REMOTE_SURCHARGE
  }

  return cost
}
```

**Run tests again:**

```bash
npm test  # All tests still pass
```

You've improved the code WITHOUT breaking existing behavior.

## When This Skill Applies

Use this skill when:

- [ ] Deleting code that has no tests
- [ ] Modifying functions without test coverage
- [ ] Refactoring untested modules
- [ ] Changing behavior in legacy systems
- [ ] Test coverage is below 80% for code you're touching
- [ ] You inherit code from previous developers
- [ ] You're working in a codebase with poor test discipline

**Even if the change seems "simple", if there are no tests, use this workflow.**

## Language-Specific Risk Levels

### HIGH RISK: Dynamic/Weakly-Typed Languages

**Python, JavaScript, Ruby, PHP:**

- NO compile-time type checking
- NO static analysis of type correctness
- Tests are your ONLY safety net
- Runtime is when errors appear
- User discovers bugs, not compiler

**Extra caution required:**

- Write MORE tests
- Test edge cases thoroughly
- Check for None/null/undefined
- Verify types explicitly
- Add type hints/annotations when possible

### MEDIUM RISK: Static/Strongly-Typed Languages

**Go, Rust, TypeScript (strict mode), Java:**

- Type system catches some errors
- Compiler validates correctness
- Still need tests for BEHAVIOR
- Types don't test business logic

**Type safety helps, but:**

- Types verify structure, not logic
- Tests still required for correctness
- Don't rely solely on type system

## The Strangler Fig Pattern

For large legacy systems, don't try to test everything at once:

### What Is It?

Named after the strangler fig vine that grows around a tree:

1. Vine grows alongside tree
2. Eventually replaces tree entirely
3. Original tree can be removed

Applied to code:

1. Wrap legacy code with new interface
2. Add tests to the wrapper
3. Gradually move logic to new code
4. Eventually remove legacy code

### Implementation

**Phase 1: Wrap**

```typescript
// Legacy code (untested, scary)
function legacyCalculatePrice(item) {
  // ... 500 lines of spaghetti ...
}

// New wrapper (tested)
function calculatePrice(item: Item): number {
  // Add validation and tests here
  validateItem(item)

  // Call legacy for now
  return legacyCalculatePrice(item)
}
```

**Phase 2: Test the Boundary**

```typescript
describe('calculatePrice wrapper', () => {
  test('validates input', () => {
    expect(() => calculatePrice(null)).toThrow()
  })

  test('handles normal items', () => {
    const item = { id: 1, price: 10 }
    expect(calculatePrice(item)).toBe(10)
  })

  // Characterization tests for legacy behavior
  test('matches legacy for discounts', () => {
    const item = { id: 1, price: 10, discount: 0.2 }
    expect(calculatePrice(item)).toBe(8)
  })
})
```

**Phase 3: Replace Internals**

```typescript
function calculatePrice(item: Item): number {
  validateItem(item)

  // New implementation (tested, clean)
  const basePrice = item.price
  const discount = item.discount ?? 0
  return basePrice * (1 - discount)

  // Legacy code removed!
}
```

**Phase 4: Expand**

Now that one function is safe, repeat for next function.

### Benefits

- Incremental progress
- Always have working system
- Tests added gradually
- Risk minimized
- Can stop anytime

### When to Use

- Large untested codebase
- Can't stop for months to add tests
- Need to deliver features while improving
- High-risk production system

## Pre-Modification Checklist

Before touching ANY untested code:

- [ ] Identified all entry points (who calls this code?)
- [ ] Identified all exit points (what does this code call?)
- [ ] Understood current behavior (what does it do now?)
- [ ] Written characterization tests for current behavior
- [ ] All characterization tests pass
- [ ] Documented any surprising behaviors found
- [ ] Considered using Strangler Fig for large changes

**ONLY AFTER ALL CHECKS PASS:**

- [ ] Make your intended changes
- [ ] Run tests (verify behavior preserved or intentionally changed)
- [ ] Add tests for NEW behavior if applicable
- [ ] Document breaking changes if any

## Red Flags

Stop immediately if:

- "I'll just make a quick change, no need for tests"
- "It's a small change, what could go wrong?"
- "I don't have time to write tests"
- "The code is too complex to test"
- "I'll add tests later" (you won't)

**These are rationalizations for dangerous behavior.**

## Common Objections (and Responses)

### "But tests will take longer than the change!"

**Response:** The initial change is fast. Finding and fixing the bugs you introduced takes much longer. Tests save time.

### "The code is too complex to test!"

**Response:** If it's too complex to test, it's too complex to change safely. Simplify first, or use Strangler Fig pattern.

### "We need to ship this feature NOW!"

**Response:** Shipping a broken feature is slower than shipping a tested feature late. Technical debt compounds.

### "This code has worked fine without tests for years!"

**Response:** Until now. The moment you change it, that safety is gone. Past stability doesn't predict future stability.

### "I understand the code, I won't break it!"

**Response:** Confidence is not a substitute for evidence. Even experts make mistakes.

## Success Metrics

How to know you've done this right:

**Green flags:**

- All characterization tests pass before changes
- All tests still pass after changes
- New behavior has new tests
- Test coverage increased
- Code is more readable
- Confidence is high

**Red flags:**

- Tests fail after changes (and you don't know why)
- Skipped edge cases "to save time"
- Changed behavior without updating tests
- Test coverage decreased or stagnant
- More confused than before

## Integration with Other Skills

Use these related skills:

- **test-driven-development**: For new behavior, use TDD after characterization tests exist
- **refactoring**: With tests in place, refactor safely
- **boy-scout-rule**: While adding tests, improve code quality
- **proof-of-work**: Document test results as evidence
- **debugging**: When characterization tests reveal unexpected behavior

## Tools and Techniques

### Code Coverage Tools

Use coverage tools to identify untested code:

```bash
# JavaScript/TypeScript
npm test -- --coverage

# Python
pytest --cov=mypackage

# Ruby
bundle exec rspec --require spec_helper

# Go
go test -cover ./...
```

**Target coverage:**

- Critical paths: 100%
- Business logic: 90%+
- Overall codebase: 80%+

### Mutation Testing

Verify tests actually catch bugs:

```bash
# JavaScript
npm install -D stryker

# Python
pip install mutmut
```

Mutation testing changes your code slightly and verifies tests fail. If tests still pass after mutation, they're not effective.

### Approval Testing

For complex outputs (HTML, JSON, reports):

```typescript
test('generates invoice', () => {
  const invoice = generateInvoice(order)
  expect(invoice).toMatchSnapshot()
})
```

First run captures current output. Future runs verify it hasn't changed.

## Real-World Example

### Scenario

Legacy payment processor, 2000 lines, no tests, needs to add new payment method.

### Step 1: Identify Scope

Don't test everything - test what you'll touch:

```typescript
// Will modify this
function processPayment(payment, method) {
  // ... many lines ...
}

// Will NOT modify these
function logTransaction(transaction) { }
function sendEmail(user, content) { }
```

### Step 2: Add Characterization Tests

```typescript
describe('processPayment (characterization)', () => {
  test('processes credit card payment', () => {
    const payment = { amount: 100, currency: 'USD' }
    const method = { type: 'credit_card', number: '4111...' }

    const result = processPayment(payment, method)

    expect(result.success).toBe(true)
    expect(result.transactionId).toBeDefined()
    expect(result.amount).toBe(100)
  })

  // More tests for existing payment methods...
})
```

### Step 3: Tests Pass

```bash
npm test
# ✅ All tests passing
```

### Step 4: Add New Feature

```typescript
function processPayment(payment, method) {
  // Existing code preserved
  if (method.type === 'credit_card') {
    return processCreditCard(payment, method)
  }

  // NEW: Add cryptocurrency support
  if (method.type === 'crypto') {
    return processCrypto(payment, method)
  }

  throw new Error(`Unknown payment method: ${method.type}`)
}
```

### Step 5: Add Tests for New Behavior

```typescript
test('processes cryptocurrency payment', () => {
  const payment = { amount: 100, currency: 'USD' }
  const method = { type: 'crypto', address: '0x123...' }

  const result = processPayment(payment, method)

  expect(result.success).toBe(true)
  expect(result.transactionId).toBeDefined()
})
```

### Step 6: All Tests Pass

```bash
npm test
# ✅ All tests passing (old + new)
```

### Result

- New feature added safely
- Existing behavior preserved
- Tests ensure no regression
- Future changes safer

## Remember

1. **Untested code is dangerous code** - especially in dynamic languages
2. **Characterize before changing** - Capture current behavior first
3. **RGR workflow for legacy** - Different from TDD, specific to untested code
4. **Strangler Fig for large systems** - Gradual improvement beats big bang
5. **Tests are proof** - Confidence without evidence is just hope
6. **No shortcuts** - "Quick changes" create slow bugs

**Legacy code is not an excuse for reckless changes. It's a reason for EXTRA caution.**

Overview

This skill helps you safely modify, remove, or refactor legacy code that lacks test coverage. It emphasizes the high risk of changing untested code and guides you through the RGR (Red, Green, Refactor) characterization-testing workflow so you add a safety net before any modification. The outcome is reduced regressions, clearer behavior documentation, and safer incremental improvement.

How this skill works

The skill inspects the target code and guides you to identify inputs, outputs, edge cases, and side effects. You write characterization tests that capture current behavior, verify they pass, then perform refactoring or feature changes with the tests protecting regressions. It also offers strategies like the Strangler Fig pattern and tooling suggestions (coverage, mutation, approval testing).

When to use it

  • Deleting or removing code with no tests
  • Modifying functions or modules that lack tests
  • Refactoring legacy modules that are untested
  • Adding features to code you inherited from others
  • When test coverage is low for the code you will touch
  • When working in dynamically-typed languages with higher runtime risk

Best practices

  • Stop and characterize — never change untested code without tests first
  • Write tests that document existing behavior, not what you assume it should be
  • Verify characterization tests pass before making changes; fix tests if they miscapture behavior
  • Use the Strangler Fig pattern for large systems: wrap, test the boundary, replace internally
  • Use coverage and mutation testing to ensure tests are meaningful
  • After changes, add tests for new behavior and keep coverage improved

Example use cases

  • Refactor a legacy calculateShipping function: add characterization tests, then simplify constants and logic
  • Add a new payment method to a large untested payment processor by testing only the touched boundary first
  • Wrap an untested pricing function with a validated wrapper and add tests to the wrapper before replacing internals
  • Safely remove dead code by first writing tests that capture callers and expected outputs
  • Incrementally modernize a monolith using the Strangler Fig approach while preserving behavior

FAQ

Why write tests that just assert current, possibly incorrect behavior?

Characterization tests record existing behavior so you can change code safely; they prevent accidental regressions and reveal surprising cases to address deliberately.

Is this the same as TDD?

No. TDD targets desired behavior for new code. RGR characterizes unknown existing behavior: write tests that should pass, confirm they pass, then refactor.