home / skills / rshankras / claude-code-apple-skills / characterization-test-generator
This skill generates characterization tests that capture current code behavior to safely guide refactoring and protect legacy code.
npx playbooks add skill rshankras/claude-code-apple-skills --skill characterization-test-generatorReview the files below or copy the command above to add this skill to your agents.
---
name: characterization-test-generator
description: Generates tests that capture current behavior of existing code before refactoring. Use when you need a safety net before AI-assisted refactoring or modifying legacy code.
allowed-tools: [Read, Write, Edit, Glob, Grep, Bash, AskUserQuestion]
---
# Characterization Test Generator
Generate tests that document what existing code **actually does** — not what it should do. These tests capture current behavior so you can refactor with confidence, especially when using AI to modify code.
## When This Skill Activates
Use this skill when the user:
- Says "I need to refactor this" or "help me refactor"
- Wants to "add tests before changing code"
- Asks for "characterization tests" or "golden master tests"
- Says "I want to safely modify this class/module"
- Mentions "legacy code" or "untested code"
- Wants AI to refactor but is worried about breaking things
## Why This Matters for AI-Generated Code
AI refactoring is powerful but risky:
- AI doesn't remember **why** code works a certain way
- AI may "improve" code and break subtle behavior
- Characterization tests freeze current behavior as a regression suite
- If any test fails after refactoring, you know something changed
## Process
### Phase 1: Discover Code Under Test
```
Glob: **/*.swift (in the target module)
Grep: "class |struct |enum |protocol |func " (public API surface)
```
Identify:
- [ ] Public types and their public methods
- [ ] Input types and return types
- [ ] Side effects (network, disk, UserDefaults, notifications)
- [ ] Dependencies (injected or created internally)
- [ ] State mutations (properties that change)
### Phase 2: Classify Behavior
For each public method, classify:
| Category | Example | Test Strategy |
|----------|---------|---------------|
| Pure computation | `calculate(a, b) -> c` | Input/output pairs |
| State mutation | `addItem(_:)` modifies array | Before/after state checks |
| Async operation | `fetchData() async -> [Item]` | Mock dependencies, verify results |
| Side effect | `save()` writes to disk | Verify mock was called |
| Event emission | `delegate?.didUpdate()` | Capture delegate calls |
| Error path | Throws on invalid input | Verify correct error type |
### Phase 3: Generate Tests
#### Test Naming Convention
Characterization tests should be clearly labeled:
```swift
import Testing
@testable import YourApp
@Suite("Characterization: ItemManager")
struct ItemManagerCharacterizationTests {
// Tests document CURRENT behavior, not ideal behavior
}
```
#### Template: Pure Computation
```swift
@Test("current behavior: calculates total with tax")
func calculatesTotal() {
let calculator = PriceCalculator()
let result = calculator.total(subtotal: 100.0, taxRate: 0.08)
// Document actual behavior — even if the rounding seems wrong
#expect(result == 108.0)
}
```
#### Template: State Mutation
```swift
@Test("current behavior: addItem appends and sorts by date")
func addItemSortsbyDate() {
let manager = ItemManager()
let older = Item(title: "A", date: .distantPast)
let newer = Item(title: "B", date: .now)
manager.addItem(older)
manager.addItem(newer)
// Document actual ordering behavior
#expect(manager.items.first?.title == "A")
#expect(manager.items.last?.title == "B")
#expect(manager.items.count == 2)
}
```
#### Template: Async with Dependencies
```swift
@Test("current behavior: loadItems returns cached data when offline")
func loadItemsOffline() async throws {
let mockNetwork = MockNetworkClient(shouldFail: true)
let mockCache = MockCache(items: [Item.sample])
let service = ItemService(network: mockNetwork, cache: mockCache)
let items = try await service.loadItems()
// Document: falls back to cache when network fails
#expect(items.count == 1)
#expect(items.first?.title == Item.sample.title)
}
```
#### Template: Side Effects
```swift
@Test("current behavior: save writes to UserDefaults")
func saveWritesToDefaults() {
let defaults = MockUserDefaults()
let settings = SettingsManager(defaults: defaults)
settings.setTheme(.dark)
settings.save()
// Document: saves as string, not as enum raw value
#expect(defaults.lastSetValue as? String == "dark")
#expect(defaults.lastSetKey == "app_theme")
}
```
#### Template: Error Paths
```swift
@Test("current behavior: throws on empty title")
func throwsOnEmptyTitle() {
let validator = ItemValidator()
#expect(throws: ValidationError.emptyField("title")) {
try validator.validate(Item(title: "", date: .now))
}
}
```
#### Template: Edge Cases
```swift
@Test("current behavior: handles nil optional gracefully")
func handlesNilOptional() {
let parser = DataParser()
let result = parser.parse(data: nil)
// Document: returns empty array on nil, doesn't crash
#expect(result.isEmpty)
}
@Test("current behavior: handles empty collection")
func handlesEmptyCollection() {
let aggregator = StatsAggregator()
let stats = aggregator.compute(values: [])
// Document: returns zeroes, not NaN or crash
#expect(stats.average == 0.0)
#expect(stats.count == 0)
}
```
### Phase 4: Fill in Actual Values
This is the key step. For each test:
1. **Read the source code** to understand what the method actually returns
2. **Trace the logic** for the given inputs
3. **Write the assertion with the actual value**, even if it seems wrong
```swift
// ❌ Wrong — this is what you WANT it to do
#expect(result == expectedCorrectValue)
// ✅ Right — this is what it ACTUALLY does
#expect(result == actualCurrentValue) // Note: off-by-one, but current behavior
```
**Add comments for surprising behavior:**
```swift
// CHARACTERIZATION: This returns 11 not 10 due to inclusive range.
// Don't "fix" this until intentionally changing behavior.
#expect(range.count == 11)
```
### Phase 5: Verify and Lock
1. **Run all characterization tests** — they must ALL pass
2. **If any fail**, adjust assertions to match actual behavior
3. **Tag them** so they're easy to find later:
```swift
@Suite("Characterization: ItemManager")
@Tag(.characterization)
struct ItemManagerCharacterizationTests { ... }
// Define the tag
extension Tag {
@Tag static var characterization: Self
}
```
4. **Run selectively:**
```bash
# Run only characterization tests
xcodebuild test -scheme YourApp \
-only-testing "YourAppTests/ItemManagerCharacterizationTests"
```
## Output Format
```markdown
## Characterization Tests Generated
**Module**: [Module name]
**Classes tested**: [List]
**Tests generated**: [Count]
### Coverage Summary
| Class | Methods Covered | Edge Cases | Notes |
|-------|----------------|------------|-------|
| ItemManager | 5/7 | 3 | 2 private methods skipped |
| PriceCalculator | 3/3 | 2 | All public API covered |
### Files Created
- `Tests/CharacterizationTests/ItemManagerCharacterizationTests.swift`
- `Tests/CharacterizationTests/PriceCalculatorCharacterizationTests.swift`
### Surprising Behaviors Found
- `PriceCalculator.total()` rounds DOWN, not to nearest cent
- `ItemManager.sort()` is unstable — equal dates may reorder
### Ready to Refactor
All [X] characterization tests passing. Safe to refactor with AI.
Run `xcodebuild test` after each change to verify no behavior changed.
```
## When NOT to Use This
- Code is already well-tested (check coverage first)
- You're deleting the code entirely (no need to characterize)
- The code is trivially simple (single-line computed properties)
- You **want** to change the behavior (use `tdd-bug-fix` instead)
## References
- Michael Feathers, *Working Effectively with Legacy Code* — coined "characterization test"
- `generators/test-generator/` — for standard test generation (not characterization)
- `testing/tdd-refactor-guard/` — pre-refactor checklist that uses these tests
This skill generates characterization tests that document the current behavior of Swift code before refactoring. It creates focused tests that capture actual outputs, side effects, and state changes so you have a regression safety net when using AI-assisted or manual refactors.
The skill scans the target module for public types and methods, identifies inputs, outputs, side effects, and state mutations, then classifies each method (pure, stateful, async, side-effecting, error path). It emits test templates with concrete assertions that reflect the code's current behavior and recommends running and locking those tests before any changes.
Are characterization tests the same as unit tests?
They use the same mechanics, but their purpose is to record current behavior (including bugs or oddities) rather than assert intended design. Treat them as a safety net.
Should I keep characterization tests after refactoring?
Yes if you want to preserve behavior. If you intentionally change behavior, update or replace the tests to reflect the new contract.