This skill guides test-driven development with red-green-refactor loops, enabling you to write integration tests first and drive minimal code changes.
```
npx playbooks add skill petekp/claude-code-setup --skill tdd
```
Review the files below or copy the command above to add this skill to your agents.
---
name: tdd
description: Test-driven development with red-green-refactor loop. Use when user wants to build features or fix bugs using TDD, mentions "red-green-refactor", wants integration tests, or asks for test-first development.
---
# Test-Driven Development
## Philosophy
**Core principle**: Tests should verify behavior through public interfaces, not implementation details. Code can change entirely; tests shouldn't.
**Good tests** are integration-style: they exercise real code paths through public APIs. They describe _what_ the system does, not _how_ it does it. A good test reads like a specification - "user can checkout with valid cart" tells you exactly what capability exists. These tests survive refactors because they don't care about internal structure.
**Bad tests** are coupled to implementation. They mock internal collaborators, test private methods, or verify through external means (like querying a database directly instead of using the interface). The warning sign: your test breaks when you refactor, but behavior hasn't changed. If you rename an internal function and tests fail, those tests were testing implementation, not behavior.
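To make the contrast concrete, here is a minimal sketch (not from tests.md; `createCart`, its methods, and the Vitest setup are all hypothetical), assuming a cart module whose only public surface is `add` and `checkout`:
```typescript
import { describe, it, expect, vi } from "vitest";
import { createCart } from "./cart"; // hypothetical module under test

describe("checkout", () => {
  // GOOD: drives the public interface and asserts observable behavior
  it("user can checkout with valid cart", () => {
    const cart = createCart();
    cart.add({ sku: "book", price: 12 });
    expect(cart.checkout().total).toBe(12);
  });

  // BAD: spies on a private collaborator - breaks on rename,
  // and still passes even if the computed total is wrong
  it("calls recalculateTotals on checkout", () => {
    const cart = createCart();
    const spy = vi.spyOn(cart as any, "recalculateTotals");
    cart.checkout();
    expect(spy).toHaveBeenCalled();
  });
});
```
The first test survives any internal rewrite of the cart; the second fails the moment `recalculateTotals` is renamed or inlined, even though behavior is unchanged.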
See [tests.md](tests.md) for examples and [mocking.md](mocking.md) for mocking guidelines.
## Anti-Pattern: Horizontal Slices
**DO NOT write all tests first, then all implementation.** This is "horizontal slicing" - treating RED as "write all tests" and GREEN as "write all code."
This produces **crap tests**:
- Tests written in bulk test _imagined_ behavior, not _actual_ behavior
- You end up testing the _shape_ of things (data structures, function signatures) rather than user-facing behavior
- Tests become insensitive to real changes - they pass when behavior breaks, fail when behavior is fine
- You outrun your headlights, committing to test structure before understanding the implementation
**Correct approach**: Vertical slices via tracer bullets. One test → one implementation → repeat. Each test responds to what you learned from the previous cycle. Because you just wrote the code, you know exactly what behavior matters and how to verify it.
```
WRONG (horizontal):
RED: test1, test2, test3, test4, test5
GREEN: impl1, impl2, impl3, impl4, impl5
RIGHT (vertical):
RED→GREEN: test1→impl1
RED→GREEN: test2→impl2
RED→GREEN: test3→impl3
...
```
## Workflow
### 1. Planning
Before writing any code:
- [ ] Confirm with user what interface changes are needed
- [ ] Confirm with user which behaviors to test (prioritize)
- [ ] Identify opportunities for [deep modules](deep-modules.md) (small interface, deep implementation)
- [ ] Design interfaces for [testability](interface-design.md)
- [ ] List the behaviors to test (not implementation steps)
- [ ] Get user approval on the plan
Ask: "What should the public interface look like? Which behaviors are most important to test?"
**You can't test everything.** Confirm with the user exactly which behaviors matter most. Focus testing effort on critical paths and complex logic, not every possible edge case.
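The plan artifact can be as small as the agreed interface plus a prioritized behavior list. A sketch, continuing the hypothetical cart feature from above:
```typescript
// Agreed public interface - everything else stays private
export interface Cart {
  add(item: { sku: string; price: number }): void;
  checkout(): Receipt;
}

export interface Receipt {
  total: number;
}

// Behaviors to test, in priority order (confirmed with user):
// 1. user can checkout with valid cart
// 2. checkout rejects an empty cart
// 3. total sums multiple items
```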
### 2. Tracer Bullet
Write ONE test that confirms ONE thing about the system:
```
RED: Write test for first behavior → test fails
GREEN: Write minimal code to pass → test passes
```
This is your tracer bullet - it proves the path works end-to-end.
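Continuing the hypothetical cart example, a sketch of one full RED→GREEN cycle (file boundaries shown as comments):
```typescript
// cart.test.ts - RED: written first, fails because cart.ts doesn't exist yet
import { it, expect } from "vitest";
import { createCart } from "./cart";

it("user can checkout with valid cart", () => {
  const cart = createCart();
  cart.add({ sku: "book", price: 12 });
  expect(cart.checkout().total).toBe(12);
});

// cart.ts - GREEN: the least code that makes the test pass
export function createCart() {
  const items: { sku: string; price: number }[] = [];
  return {
    add: (item: { sku: string; price: number }) => items.push(item),
    checkout: () => ({ total: items.reduce((sum, i) => sum + i.price, 0) }),
  };
}
```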
### 3. Incremental Loop
For each remaining behavior:
```
RED: Write next test → fails
GREEN: Minimal code to pass → passes
```
Rules:
- One test at a time
- Only enough code to pass current test
- Don't anticipate future tests
- Keep tests focused on observable behavior
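One more cycle of the same hypothetical example, to show how each test adds exactly one increment:
```typescript
// cart.test.ts - RED: next behavior, chosen from what the last cycle taught us
it("checkout rejects an empty cart", () => {
  const cart = createCart();
  expect(() => cart.checkout()).toThrow("cart is empty");
});

// cart.ts - GREEN: add only the guard this test demands
export function createCart() {
  const items: { sku: string; price: number }[] = [];
  return {
    add: (item: { sku: string; price: number }) => items.push(item),
    checkout: () => {
      if (items.length === 0) throw new Error("cart is empty"); // new this cycle
      return { total: items.reduce((sum, i) => sum + i.price, 0) };
    },
  };
}
```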
### 4. Refactor
After all tests pass, look for [refactor candidates](refactoring.md):
- [ ] Extract duplication
- [ ] Deepen modules (move complexity behind simple interfaces)
- [ ] Apply SOLID principles where natural
- [ ] Consider what new code reveals about existing code
- [ ] Run tests after each refactor step
**Never refactor while RED.** Get to GREEN first.
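For instance, once the cart tests are green, a small extraction is safe (a sketch continuing the hypothetical example; run the tests after the step):
```typescript
// cart.ts - refactor with tests GREEN: name the total calculation, behavior unchanged
const sumPrices = (items: { price: number }[]) =>
  items.reduce((sum, i) => sum + i.price, 0);

export function createCart() {
  const items: { sku: string; price: number }[] = [];
  return {
    add: (item: { sku: string; price: number }) => items.push(item),
    checkout: () => {
      if (items.length === 0) throw new Error("cart is empty");
      return { total: sumPrices(items) };
    },
  };
}
// Re-run the suite: both earlier tests should still pass, proving behavior is intact.
```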
## Checklist Per Cycle
```
[ ] Test describes behavior, not implementation
[ ] Test uses public interface only
[ ] Test would survive internal refactor
[ ] Code is minimal for this test
[ ] No speculative features added
```
## FAQ

**What counts as a good test in this workflow?**
A good test verifies observable behavior through the public interface, reads like a spec, and continues to pass after internal refactors.

**Should I mock dependencies?**
Prefer real integrations for behavior tests; only mock external systems when isolation or speed is required, and avoid mocking internal collaborators.
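A sketch of that boundary rule, assuming the hypothetical cart now takes an injected payment gateway (the one genuinely external dependency) and returns a `paid` flag on the receipt:
```typescript
import { it, expect, vi } from "vitest";
import { createCart } from "./cart"; // hypothetical module under test

it("user can checkout with valid cart", async () => {
  // Fake only the external system; the cart's real logic still runs.
  const gateway = { charge: vi.fn().mockResolvedValue({ ok: true }) };
  const cart = createCart({ gateway });
  cart.add({ sku: "book", price: 12 });
  const receipt = await cart.checkout();
  expect(receipt.paid).toBe(true); // assert observable behavior, not the fake
});
```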