
mutation-testing skill



This skill helps evaluate test quality through mutation testing, identifying weak tests and strengthening suites to reliably catch bugs.

npx playbooks add skill proffesor-for-testing/agentic-qe --skill mutation-testing

SKILL.md

---
name: mutation-testing
description: "Test quality validation through mutation testing, assessing test suite effectiveness by introducing code mutations and measuring kill rate. Use when evaluating test quality, identifying weak tests, or proving tests actually catch bugs."
category: specialized-testing
priority: high
tokenEstimate: 900
agents: [qe-test-generator, qe-coverage-analyzer, qe-quality-analyzer]
implementation_status: optimized
optimization_version: 1.0
last_optimized: 2025-12-02
dependencies: []
quick_reference_card: true
tags: [mutation, stryker, test-quality, kill-rate, assertions, effectiveness]
---

# Mutation Testing

<default_to_action>
When validating test quality or improving test effectiveness:
1. MUTATE code (change + to -, >= to >, remove statements)
2. RUN tests against each mutant
3. VERIFY tests catch mutations (kill mutants)
4. IDENTIFY surviving mutants (tests need improvement)
5. STRENGTHEN tests to kill surviving mutants

**Quick Mutation Metrics:**
- Mutation Score = Killed / (Killed + Survived)
- Target: > 80% mutation score
- Surviving mutants usually point to weak or missing assertions

**Critical Success Factors:**
- High coverage ≠ good tests (a suite can reach 100% coverage with zero assertions)
- Mutation testing proves tests actually catch bugs
- Focus on critical code paths first
</default_to_action>

## Quick Reference Card

### When to Use
- Evaluating test suite quality
- Finding gaps in test assertions
- Proving tests catch bugs
- Before critical releases

### Mutation Score Interpretation
| Score | Interpretation |
|-------|----------------|
| **90%+** | Excellent test quality |
| **80-90%** | Good, minor improvements |
| **60-80%** | Needs attention |
| **< 60%** | Significant gaps |
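
As a rough sketch, the score formula and the bands in the table above fit in a few lines. The names `mutationScore` and `interpretScore` are illustrative and not part of any tool, and assigning exact boundary values (for example, exactly 90%) to the higher band is an assumption, since the table leaves that open.

```javascript
// Mutation Score = Killed / (Killed + Survived), as defined in this skill.
function mutationScore(killed, survived) {
  const total = killed + survived;
  // With no killed or surviving mutants there is nothing to score.
  return total === 0 ? null : killed / total;
}

// Map a score (0..1) to the interpretation bands from the table.
// Assumption: a boundary value belongs to the higher band.
function interpretScore(score) {
  if (score === null) return 'No mutants to score';
  const percent = score * 100;
  if (percent >= 90) return 'Excellent test quality';
  if (percent >= 80) return 'Good, minor improvements';
  if (percent >= 60) return 'Needs attention';
  return 'Significant gaps';
}

const score = mutationScore(124, 18);
console.log((score * 100).toFixed(1) + '%', interpretScore(score));
// prints "87.3% Good, minor improvements"
```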

### Common Mutation Operators
| Category | Original | Mutant |
|----------|----------|--------|
| **Arithmetic** | `a + b` | `a - b` |
| **Relational** | `x >= 18` | `x > 18` |
| **Logical** | `a && b` | `a \|\| b` |
| **Conditional** | `if (x)` | `if (true)` |
| **Statement** | `return x` | *(removed)* |
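
To make the operators above concrete, here is a deliberately naive sketch that produces one mutant per operator occurrence using plain text substitution. `OPERATORS` and `generateMutants` are hypothetical names. Real tools such as Stryker mutate the parsed syntax tree rather than raw text, so treat this only as an illustration of the idea; a text match like `+` would also hit `++` or `+=`.

```javascript
// Three of the operator categories from the table, as text substitutions.
const OPERATORS = [
  { category: 'Arithmetic', from: '+', to: '-' },
  { category: 'Relational', from: '>=', to: '>' },
  { category: 'Logical', from: '&&', to: '||' },
];

// Return one mutant per occurrence of each operator; each mutant
// differs from the source in exactly one place.
function generateMutants(source) {
  const mutants = [];
  for (const op of OPERATORS) {
    let index = source.indexOf(op.from);
    while (index !== -1) {
      mutants.push({
        category: op.category,
        index,
        code: source.slice(0, index) + op.to + source.slice(index + op.from.length),
      });
      index = source.indexOf(op.from, index + op.from.length);
    }
  }
  return mutants;
}

const mutants = generateMutants('return a + b >= limit && enabled;');
for (const m of mutants) console.log(m.category + ': ' + m.code);
// prints:
// Arithmetic: return a - b >= limit && enabled;
// Relational: return a + b > limit && enabled;
// Logical: return a + b >= limit || enabled;
```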

---

## How Mutation Testing Works

```javascript
// Original code
function isAdult(age) {
  return age >= 18; // ← Mutant: change >= to >
}

// Strong test (catches mutation)
test('18 is adult', () => {
  expect(isAdult(18)).toBe(true); // Kills mutant!
});

// Weak test (mutation survives)
test('19 is adult', () => {
  expect(isAdult(19)).toBe(true); // Doesn't catch >= vs >
});
// Surviving mutant → Test needs boundary value
```
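
The same idea can be run end to end without a test framework. The sketch below is a hand-rolled illustration, not how Stryker is implemented: the mutants are written by hand as functions, each "test" is a predicate that receives the implementation under test, and `runMutationTest` is a hypothetical helper. A mutant is killed when at least one test fails against it.

```javascript
// Hand-written mutants of `age >= 18`.
const mutants = [
  { name: '>= to >', fn: (age) => age > 18 },
  { name: '>= to <', fn: (age) => age < 18 },
  { name: 'always true', fn: () => true },
];

// Each test takes an implementation and returns true when it passes.
const weakSuite = [(isAdult) => isAdult(19) === true];
const strongSuite = [
  (isAdult) => isAdult(19) === true,
  (isAdult) => isAdult(18) === true, // boundary value
  (isAdult) => isAdult(17) === false, // just below the boundary
];

// A mutant survives when every test in the suite still passes against it.
function runMutationTest(mutantList, suite) {
  const survived = mutantList.filter((m) => suite.every((t) => t(m.fn)));
  const killed = mutantList.length - survived.length;
  return {
    killed,
    survived: survived.map((m) => m.name),
    score: killed / mutantList.length,
  };
}

const weak = runMutationTest(mutants, weakSuite);
const strong = runMutationTest(mutants, strongSuite);
console.log(weak.killed, weak.survived.join(', ')); // prints "1 >= to >, always true"
console.log(strong.killed, strong.score); // prints "3 1"
```

The weak suite kills only the mutant that flips the comparison entirely; the boundary test at 18 and the negative test at 17 are what kill the other two.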

---

## Using Stryker

```bash
# Install
npm install --save-dev @stryker-mutator/core @stryker-mutator/jest-runner

# Initialize
npx stryker init
```

**Configuration:**
```json
{
  "packageManager": "npm",
  "reporters": ["html", "clear-text", "progress"],
  "testRunner": "jest",
  "coverageAnalysis": "perTest",
  "mutate": [
    "src/**/*.ts",
    "!src/**/*.spec.ts"
  ],
  "thresholds": {
    "high": 90,
    "low": 70,
    "break": 60
  }
}
```
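
The `thresholds` block is easy to misread, so here is a small sketch of how the three values are generally applied: `high` and `low` only affect how the score is rated in reports, while `break` is the value below which the run fails. `evaluateScore` is a hypothetical helper, and the exact comparison at each boundary (`>=` versus `>`) is an assumption; check the Stryker documentation for the precise rules.

```javascript
// Threshold values copied from the configuration above.
const thresholds = { high: 90, low: 70, break: 60 };

// Rate a score (in percent) and decide whether the run should fail.
// Assumption: a score equal to a threshold counts as meeting it.
function evaluateScore(scorePercent, { high, low, break: breakAt }) {
  let rating;
  if (scorePercent >= high) rating = 'high';
  else if (scorePercent >= low) rating = 'warning';
  else rating = 'danger';

  // `break` is optional; without it the run never fails on score alone.
  const failBuild = breakAt != null && scorePercent < breakAt;
  return { rating, failBuild };
}

console.log(evaluateScore(87.3, thresholds)); // rating 'warning', failBuild false
console.log(evaluateScore(55, thresholds)); // rating 'danger', failBuild true
```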

**Run:**
```bash
npx stryker run
```

**Output:**
```
Mutation Score: 87.3%
Killed: 124
Survived: 18
No Coverage: 3
Timeout: 1
```

---

## Fixing Surviving Mutants

```javascript
// Surviving mutant: >= changed to >
function calculateDiscount(quantity) {
  if (quantity >= 10) { // Mutant survives!
    return 0.1;
  }
  return 0;
}

// Original weak test
test('large order gets discount', () => {
  expect(calculateDiscount(15)).toBe(0.1); // Doesn't test boundary
});

// Fixed: Add boundary test
test('exactly 10 gets discount', () => {
  expect(calculateDiscount(10)).toBe(0.1); // Kills mutant!
});

test('9 does not get discount', () => {
  expect(calculateDiscount(9)).toBe(0); // Tests below boundary
});
```

---

## Agent-Driven Mutation Testing

```typescript
// Analyze mutation score and generate fixes
await Task("Mutation Analysis", {
  targetFile: 'src/payment.ts',
  generateMissingTests: true,
  minScore: 80
}, "qe-test-generator");

// Returns:
// {
//   mutationScore: 0.65,
//   survivedMutations: [
//     { line: 45, operator: '>=', mutant: '>', killedBy: null }
//   ],
//   generatedTests: [
//     'test for boundary at line 45'
//   ]
// }

// Coverage + mutation correlation
await Task("Coverage Quality Analysis", {
  coverageData: coverageReport,
  mutationData: mutationReport,
  identifyWeakCoverage: true
}, "qe-coverage-analyzer");
```
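
Outside the agent workflow, surviving mutants can also be pulled out of a JSON report directly. The sketch below assumes a report shaped like the mutation testing report schema that Stryker's `json` reporter emits (a `files` map whose entries hold a `mutants` array with `status`, `mutatorName`, `replacement`, and `location`). The sample data is hand-written for illustration, `listSurvivors` is a hypothetical helper, and the configuration shown earlier would need the `json` reporter added to produce such a file.

```javascript
// Hand-written sample in the assumed report shape; not real tool output.
const report = {
  files: {
    'src/payment.ts': {
      mutants: [
        {
          id: '1',
          mutatorName: 'EqualityOperator',
          replacement: 'amount > limit',
          status: 'Survived',
          location: { start: { line: 45, column: 7 }, end: { line: 45, column: 22 } },
        },
        {
          id: '2',
          mutatorName: 'ArithmeticOperator',
          replacement: 'subtotal - tax',
          status: 'Killed',
          location: { start: { line: 52, column: 10 }, end: { line: 52, column: 24 } },
        },
        {
          id: '3',
          mutatorName: 'BlockStatement',
          replacement: '{}',
          status: 'NoCoverage',
          location: { start: { line: 60, column: 20 }, end: { line: 63, column: 2 } },
        },
      ],
    },
  },
};

// Collect survivors with the file/line/operator context needed to write a test.
function listSurvivors(mutationReport) {
  const survivors = [];
  for (const [file, { mutants }] of Object.entries(mutationReport.files)) {
    for (const m of mutants) {
      if (m.status === 'Survived') {
        survivors.push({
          file,
          line: m.location.start.line,
          mutator: m.mutatorName,
          replacement: m.replacement,
        });
      }
    }
  }
  return survivors;
}

for (const s of listSurvivors(report)) {
  console.log(`${s.file}:${s.line} ${s.mutator} -> ${s.replacement}`);
}
// prints "src/payment.ts:45 EqualityOperator -> amount > limit"
```

Mutants with status `NoCoverage` are left out on purpose: they indicate untested code rather than weak assertions, and are better handled alongside coverage data.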

---

## Agent Coordination Hints

### Memory Namespace
```
aqe/mutation-testing/
├── mutation-results/*   - Stryker reports
├── surviving/*          - Surviving mutants
├── generated-tests/*    - Tests to kill mutants
└── trends/*             - Mutation score over time
```

### Fleet Coordination
```typescript
const mutationFleet = await FleetManager.coordinate({
  strategy: 'mutation-testing',
  agents: [
    'qe-test-generator',     // Generate tests for survivors
    'qe-coverage-analyzer',  // Coverage correlation
    'qe-quality-analyzer'    // Quality assessment
  ],
  topology: 'sequential'
});
```

---

## Related Skills
- [tdd-london-chicago](../tdd-london-chicago/) - Write effective tests first
- [test-design-techniques](../test-design-techniques/) - Boundary value analysis
- [quality-metrics](../quality-metrics/) - Measure test effectiveness

---

## Remember

**High code coverage ≠ good tests.** A suite with 100% coverage but weak assertions can still miss real bugs. Mutation testing proves tests actually catch bugs.

**Focus on critical paths first.** Don't mutation-test everything; prioritize payment, authentication, and data-integrity code.

**With Agents:** Agents run mutation analysis, identify surviving mutants, and generate the missing test cases to kill them, automating the improvement of test quality.

Overview

This skill validates test quality using mutation testing by introducing small code changes (mutants) and measuring whether the test suite detects them. It reports a mutation score and surfaces surviving mutants so you can pinpoint weak or missing assertions. Use it to prove tests actually catch bugs and to prioritize strengthening tests on critical code paths. The skill integrates with common tools (e.g., Stryker) and agents that can auto-generate suggested tests.

How this skill works

The skill mutates target source files with common operators (arithmetic, relational, logical, conditional, statement removal) and runs the test suite against each mutant. It classifies mutants as killed, survived, no-coverage, or timed out and computes a mutation score (Killed / (Killed + Survived)). Surviving mutants are returned with file/line/operator context so you can add or improve tests (agents can also propose or generate boundary tests).

When to use it

- Evaluating overall test suite effectiveness beyond coverage numbers
- Finding gaps in assertions and boundary tests
- Before critical releases or high-risk feature launches
- Prioritizing improvements on payment, auth, and data integrity code paths
- Validating that CI tests will catch realistic regressions

Best practices

- Prioritize mutation testing for critical or high-risk modules rather than the entire codebase
- Use per-test coverage mode to map mutants to the tests that exercise them
- Aim for a mutation score target (commonly >80%) but focus on meaningful surviving mutants
- Add boundary and negative tests to kill relational and conditional mutants
- Combine mutation results with coverage data to separate untested code from weak tests
- Filter out equivalent mutants and noise before changing production logic

Example use cases

- Run Stryker on src/**/*.ts to get a mutation report and a list of surviving mutants
- Have an agent analyze mutation results and generate boundary tests for surviving relational mutants
- Use mutation testing as a CI gate for core modules, failing the build when the score drops below the break threshold (60% in the sample configuration)
- Correlate mutation and coverage reports to find files with coverage but weak assertions
- Track mutation score trends over time in the memory namespace for quality dashboards

FAQ

What is a good mutation score?

Use >80% as a practical target; 90%+ indicates excellent test quality. Interpret scores with context—some surviving mutants may be equivalent or low-value.
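
An equivalent mutant is one that no test can kill because it behaves exactly like the original. The sketch below is one hand-picked illustration with hypothetical function names: changing `>` to `>=` in a maximum function only alters which argument is returned when both are equal, so the result is the same either way.

```javascript
// Original and mutant differ only in the relational operator.
const maxOriginal = (a, b) => (a > b ? a : b);
const maxMutant = (a, b) => (a >= b ? a : b);

// Compare both versions over a grid of integer inputs.
let differences = 0;
for (let a = -5; a <= 5; a++) {
  for (let b = -5; b <= 5; b++) {
    if (maxOriginal(a, b) !== maxMutant(a, b)) differences++;
  }
}

console.log(differences); // prints 0
```

A survivor like this does not indicate a weak test, which is why a score below 100% is not automatically a problem.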

Does high coverage mean good tests?

No. High line coverage can coexist with weak assertions. Mutation testing proves whether tests actually detect injected faults.

How do I act on surviving mutants?

Review surviving mutants by file/line/operator, add boundary and negative tests that exercise the mutated behavior, and re-run until the mutant is killed.