
mutation-test skill


/agentic/code/addons/testing-quality/skills/mutation-test

This skill runs mutation testing to evaluate test effectiveness and reveal weak tests, improving test quality beyond coverage.

npx playbooks add skill jmagly/aiwg --skill mutation-test


---
name: mutation-test
description: Run mutation testing to validate test quality beyond code coverage. Use when assessing test effectiveness, finding weak tests, or validating test suite quality.
version: 1.0.0
---

# Mutation Test Skill

## Purpose

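Run mutation testing to measure test suite effectiveness. Mutation testing introduces small changes (mutants) to code and checks whether the tests catch them. A mutant that makes at least one test fail is *killed*; one that passes every test *survives*. High coverage with a low mutation score indicates weak tests.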

## Research Foundation

| Concept | Source | Reference |
|---------|--------|-----------|
| Mutation Testing Theory | Advances in Computers (2019) | Papadakis et al. "Mutation Testing Advances: An Analysis and Survey" |
| ICST Mutation Workshop | IEEE Annual | [Mutation 2024](https://conf.researchr.org/home/icst-2024/mutation-2024) |
| Stryker Mutator | Industry Tool | [stryker-mutator.io](https://stryker-mutator.io/) |
| PITest | Java Tool | [pitest.org](https://pitest.org/) |
| mutmut | Python Tool | [github.com/boxed/mutmut](https://github.com/boxed/mutmut) |

## When This Skill Applies

- User asks to "validate test quality" or "check test effectiveness"
- User mentions "mutation testing" or "mutation score"
- User wants to know if tests are "actually testing anything"
- High coverage but bugs still escaping
- Assessing test suite health
- Pre-release quality validation

## Trigger Phrases

| Natural Language | Action |
|------------------|--------|
| "Run mutation testing" | Execute mutation analysis |
| "Check if my tests are effective" | Run mutation + analyze |
| "Validate test quality" | Mutation score report |
| "Are my tests catching real bugs?" | Mutation analysis |
| "Find weak tests" | List surviving mutants by file |
| "Why did this bug escape tests?" | Mutation analysis on module |

## Mutation Testing Concepts

### What is a Mutant?

A mutant is a small code change that should cause tests to fail:

```javascript
// Original
if (age >= 18) { return "adult"; }

// Mutant 1: Changed >= to >
if (age > 18) { return "adult"; }

// Mutant 2: Changed >= to ==
if (age == 18) { return "adult"; }

// Mutant 3: Changed "adult" to ""
if (age >= 18) { return ""; }
```
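The kill/survive logic can be shown end to end in Python. This is a minimal sketch in which `classify`, the hand-written mutants, and both test suites are hypothetical illustrations, not part of any tool. It shows that a suite that never checks `age == 18` lets the `>=` → `>` mutant survive:

```python
from typing import Callable

# Original function plus three hand-written mutants mirroring the JavaScript example
def classify(age: int) -> str:
    return "adult" if age >= 18 else "minor"

MUTANTS: dict[str, Callable[[int], str]] = {
    ">= to >": lambda age: "adult" if age > 18 else "minor",
    ">= to ==": lambda age: "adult" if age == 18 else "minor",
    '"adult" to ""': lambda age: "" if age >= 18 else "minor",
}

def weak_suite(fn: Callable[[int], str]) -> bool:
    """Passes on the original but never checks the age == 18 boundary."""
    return fn(30) == "adult" and fn(5) == "minor"

def strong_suite(fn: Callable[[int], str]) -> bool:
    """Same checks plus the boundary case age == 18."""
    return weak_suite(fn) and fn(18) == "adult"

def survivors(suite: Callable[[Callable[[int], str]], bool]) -> list[str]:
    # A mutant survives when the suite still passes against it
    assert suite(classify), "suite must pass on the original code"
    return [name for name, mutant in MUTANTS.items() if suite(mutant)]

print(survivors(weak_suite))    # ['>= to >']
print(survivors(strong_suite))  # []
```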

### Mutation Operators

| Operator | Example | What It Probes |
|----------|---------|-------|
| Arithmetic | `+` → `-` | Math operations |
| Relational | `>=` → `>` | Boundary conditions |
| Logical | `&&` → `\|\|` | Boolean logic |
| Literal | `true` → `false` | Constant handling |
| Return | `return x` → `return null` | Return value handling |

### Mutation Score

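```
Mutation Score = (Killed Mutants / Total Mutants) × 100
```

This is the simplified form used in this document's examples. Tools refine it. Stryker, for example, counts timed-out mutants as detected and scores only against valid mutants (excluding compile and runtime errors), so its reported score can be somewhat higher.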

| Score | Quality | Interpretation |
|-------|---------|----------------|
| 90%+ | Excellent | Tests are highly effective |
| 80-89% | Good | Target for production |
| 60-79% | Adequate | Room for improvement |
| <60% | Poor | Tests need significant work |
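The formula and the interpretation table translate directly into code. This small sketch uses the simplified Killed / Total definition above and takes the band boundaries from the table. `mutation_score` and `quality_band` are illustrative names, not part of any tool:

```python
def mutation_score(killed: int, total: int) -> float:
    """Simplified score: killed / total, as a percentage."""
    if total == 0:
        raise ValueError("no mutants were generated")
    return 100 * killed / total

def quality_band(score: float) -> str:
    """Map a score onto the interpretation table."""
    if score >= 90:
        return "Excellent"
    if score >= 80:
        return "Good"
    if score >= 60:
        return "Adequate"
    return "Poor"

score = mutation_score(killed=120, total=150)
print(f"{score:.0f}% -> {quality_band(score)}")  # 80% -> Good
```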

## Implementation Process

### 1. Detect Project and Install Tool

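```python
def setup_mutation_tool(project_type: str) -> str:
    """Return the setup step for the mutation tool matching `project_type`."""
    if project_type == "javascript":
        # Stryker: interactive setup that creates a Stryker config file
        return "npx stryker init"
    if project_type == "python":
        # mutmut: install from PyPI
        return "pip install mutmut"
    if project_type == "java":
        # PITest: no install command; add the pitest-maven plugin to pom.xml (see step 2)
        return "add the pitest-maven plugin to pom.xml"
    raise ValueError(f"unsupported project type: {project_type}")
```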
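Step 1 also requires detecting the project type. One plausible approach, which is an assumption here and not taken from the skill's own scripts, is to look for the build manifest each ecosystem conventionally uses. `detect_project_type` and the marker order are hypothetical:

```python
from pathlib import Path
from typing import Optional

# Marker files checked in order; the first match wins (an assumption; adjust for monorepos)
MARKERS = [
    ("javascript", ("package.json",)),
    ("python", ("pyproject.toml", "setup.py", "setup.cfg")),
    ("java", ("pom.xml", "build.gradle", "build.gradle.kts")),
]

def detect_project_type(root: str = ".") -> Optional[str]:
    """Return "javascript", "python", or "java" based on files in `root`, else None."""
    base = Path(root)
    for project_type, files in MARKERS:
        if any((base / name).is_file() for name in files):
            return project_type
    return None
```

The function returns `None` when no marker matches, leaving the caller to ask the user instead of guessing.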

### 2. Configure Mutation Testing

**Stryker (JavaScript/TypeScript)**, in `stryker.config.json` (the `vitest` test runner requires the `@stryker-mutator/vitest-runner` plugin):
```json
{
  "mutate": ["src/**/*.ts", "!src/**/*.test.ts"],
  "testRunner": "vitest",
  "reporters": ["html", "progress"],
  "coverageAnalysis": "perTest",
  "thresholds": {
    "high": 80,
    "low": 60,
    "break": 50
  }
}
```

**mutmut (Python)**:
```ini
# setup.cfg
[mutmut]
paths_to_mutate=src/
tests_dir=tests/
runner=pytest
```

**PITest (Java)**:
```xml
<!-- pom.xml -->
<plugin>
    <groupId>org.pitest</groupId>
    <artifactId>pitest-maven</artifactId>
    <version>1.15.0</version>
    <configuration>
        <targetClasses>
            <param>com.example.*</param>
        </targetClasses>
        <mutationThreshold>80</mutationThreshold>
    </configuration>
</plugin>
```

### 3. Run Mutation Analysis

```bash
# JavaScript
npx stryker run

# Python
mutmut run

# Java
mvn org.pitest:pitest-maven:mutationCoverage
```
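A runner script could dispatch on project type. The `mutation_runner.py` listed under Script Reference may do this, but its contents are not shown here. This hedged sketch reuses exactly the commands above via `subprocess`; `RUN_COMMANDS` and `run_mutation_analysis` are hypothetical names:

```python
import subprocess

# The commands above, split into argv lists for subprocess
RUN_COMMANDS = {
    "javascript": ["npx", "stryker", "run"],
    "python": ["mutmut", "run"],
    "java": ["mvn", "org.pitest:pitest-maven:mutationCoverage"],
}

def run_mutation_analysis(project_type: str, cwd: str = ".") -> int:
    """Run the mutation tool for `project_type` and return its exit code."""
    try:
        command = RUN_COMMANDS[project_type]
    except KeyError:
        raise ValueError(f"unsupported project type: {project_type}") from None
    # check=False: tools may exit non-zero when mutants survive or a threshold
    # is missed, and the caller may still want to parse the report afterwards
    return subprocess.run(command, cwd=cwd, check=False).returncode
```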

### 4. Parse and Report Results

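A sketch for Stryker's JSON report (`--reporters json`, written to `reports/mutation/mutation.json` by default), which follows the mutation-testing-report-schema. mutmut and PITest reports need their own parsers:

```python
import json
from pathlib import Path

# Status values from the mutation-testing-report-schema emitted by Stryker's JSON reporter
DETECTED = {"Killed", "Timeout"}
UNDETECTED = {"Survived", "NoCoverage"}

def parse_mutation_results(report_path):
    """Summarize a Stryker JSON mutation report and list the surviving mutants."""
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    counts, survivors = {}, []
    for file_name, file_result in report["files"].items():
        for mutant in file_result["mutants"]:
            status = mutant["status"]
            counts[status] = counts.get(status, 0) + 1
            if status in UNDETECTED:
                survivors.append({
                    "file": file_name,
                    "line": mutant["location"]["start"]["line"],
                    "mutator": mutant["mutatorName"],
                    "mutant": mutant.get("replacement"),  # optional in the schema
                    "status": status,
                })
    detected = sum(counts.get(s, 0) for s in DETECTED)
    valid = detected + sum(counts.get(s, 0) for s in UNDETECTED)
    return {
        "total_mutants": sum(counts.values()),
        "killed": counts.get("Killed", 0),
        "survived": counts.get("Survived", 0),
        "timeout": counts.get("Timeout", 0),
        "no_coverage": counts.get("NoCoverage", 0),
        # Stryker-style score: timeouts count as detected; CompileError,
        # RuntimeError, and Ignored mutants are left out of the denominator
        "mutation_score": round(100 * detected / valid, 1) if valid else 0.0,
        "survivors": survivors,
    }
```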

## Output Format

````markdown
## Mutation Testing Report

**Module**: src/auth/
**Test Suite**: test/auth/

### Summary

| Metric | Value |
|--------|-------|
| Total Mutants | 150 |
| Killed | 120 (80%) |
| Survived | 25 (17%) |
| Timeout | 5 (3%) |
| **Mutation Score** | **80%** |

### Status: PASSED (threshold: 80%)

### Survived Mutants (Highest Priority)

#### 1. `src/auth/validate.ts:45`
```diff
- if (age >= 18) { return "adult"; }
+ if (age > 18) { return "adult"; }
```
**Problem**: Boundary condition not tested
**Fix**: Add test case for `age = 18`

#### 2. `src/auth/login.ts:23`
```diff
- if (attempts < maxAttempts) { allow(); }
+ if (attempts <= maxAttempts) { allow(); }
```
**Problem**: Off-by-one boundary not tested
**Fix**: Add test for `attempts = maxAttempts`

### Recommended Test Improvements

1. **Add boundary tests** for `validate.ts` (3 survivors)
2. **Add error path tests** for `login.ts` (2 survivors)
3. **Test null/undefined cases** in `session.ts` (1 survivor)

### Coverage vs Mutation Score

| File | Line Coverage | Mutation Score | Gap |
|------|--------------|----------------|-----|
| validate.ts | 95% | 72% | 23% |
| login.ts | 88% | 85% | 3% |
| session.ts | 100% | 91% | 9% |

*High coverage with low mutation score indicates weak assertions*
````
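The Summary table can be generated from counts rather than written by hand. This sketch assumes a dict with the `total_mutants`, `killed`, `survived`, `timeout`, and `mutation_score` keys that `parse_mutation_results` returns. `render_summary` is a hypothetical helper:

```python
def render_summary(result: dict) -> str:
    """Render the report's Summary table from a results dict."""
    total = result["total_mutants"]

    def pct(n: int) -> str:
        # Share of all mutants, rounded to a whole percent as in the example report
        return f"{n} ({round(100 * n / total)}%)" if total else str(n)

    rows = [
        ("Total Mutants", str(total)),
        ("Killed", pct(result["killed"])),
        ("Survived", pct(result["survived"])),
        ("Timeout", pct(result["timeout"])),
        ("**Mutation Score**", f"**{result['mutation_score']:.0f}%**"),
    ]
    lines = ["| Metric | Value |", "|--------|-------|"]
    lines += [f"| {name} | {value} |" for name, value in rows]
    return "\n".join(lines)

print(render_summary({"total_mutants": 150, "killed": 120, "survived": 25,
                      "timeout": 5, "mutation_score": 80.0}))
```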

## Integration with CI

### GitHub Actions Integration

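```yaml
# Alternatively, set "break": 80 under "thresholds" in stryker.config.json;
# Stryker then exits with a non-zero code on its own when the score is lower.
- name: Run mutation testing
  run: npx stryker run --reporters json

- name: Check mutation threshold
  run: |
    # The JSON reporter writes reports/mutation/mutation.json by default. The report
    # stores per-mutant statuses, so derive the score as detected / valid * 100.
    SCORE=$(jq '[.files[].mutants[].status] as $s
      | ($s | map(select(. == "Killed" or . == "Timeout")) | length) as $detected
      | ($s | map(select(. == "Survived" or . == "NoCoverage")) | length) as $undetected
      | if $detected + $undetected == 0 then 0 else $detected * 100 / ($detected + $undetected) end' \
      reports/mutation/mutation.json)
    if (( $(echo "$SCORE < 80" | bc -l) )); then
      echo "::error::Mutation score $SCORE% below 80% threshold"
      exit 1
    fi
```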

## Optimization Tips

### Incremental Mutation Testing

Re-test only what changed since the previous run; results for unchanged code and tests are reused:
```bash
# Stryker incremental
npx stryker run --incremental

# PITest history
mvn pitest:mutationCoverage -DwithHistory
```
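Another way to scope a run to a branch's changes, which is an assumption here and not something the skill prescribes, is to pass only the changed source files to Stryker's `--mutate` option (comma-separated globs). In this sketch, the `origin/main` base branch and the `src/` and `.test.ts` conventions are assumptions that mirror the `mutate` globs in the config above. The helper names are hypothetical:

```python
import subprocess
from typing import Iterable, List

def mutate_targets(changed: Iterable[str]) -> List[str]:
    """Keep TypeScript sources under src/ and drop test files, mirroring the mutate globs."""
    return [
        path for path in changed
        if path.startswith("src/") and path.endswith(".ts") and not path.endswith(".test.ts")
    ]

def changed_files(base: str = "origin/main") -> List[str]:
    """Files changed on this branch since its merge base with `base`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]

def stryker_command(targets: List[str]) -> List[str]:
    """argv for a Stryker run limited to `targets`."""
    return ["npx", "stryker", "run", "--mutate", ",".join(targets)]
```

A CI step could then call `subprocess.run(stryker_command(targets), check=True)` whenever `targets = mutate_targets(changed_files())` is non-empty.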

### Target Critical Modules First

```json
{
  "mutate": [
    "src/auth/**/*.ts",
    "src/payment/**/*.ts",
    "src/validation/**/*.ts"
  ]
}
```
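The same list of critical paths can also order the survivor report, so that gaps in auth and payment code are fixed first. This sketch reuses the prefixes from the globs above and expects survivor dicts with the `file` and `line` keys that `parse_mutation_results` produces. `prioritize` is a hypothetical helper:

```python
from typing import Dict, List, Sequence

# Highest-risk areas first, in the order of the "mutate" list above
CRITICAL_PREFIXES = ("src/auth/", "src/payment/", "src/validation/")

def prioritize(survivors: List[Dict], prefixes: Sequence[str] = CRITICAL_PREFIXES) -> List[Dict]:
    """Sort survivors: critical modules (in prefix order) first, then by file and line."""
    def rank(survivor: Dict) -> tuple:
        path = survivor["file"]
        for index, prefix in enumerate(prefixes):
            if path.startswith(prefix):
                return (index, path, survivor["line"])
        return (len(prefixes), path, survivor["line"])
    return sorted(survivors, key=rank)
```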

## Related Skills

- `tdd-enforce` - Enforce test-first development
- `flaky-detect` - Identify unreliable tests
- `test-sync` - Maintain test-code alignment

## Script Reference

### mutation_runner.py
Run mutation testing for a module:
```bash
python scripts/mutation_runner.py --module src/auth
```

### mutation_analyzer.py
Analyze and prioritize survivors:
```bash
python scripts/mutation_analyzer.py --report stryker-report.json
```

Overview

This skill runs mutation testing to measure how effective a test suite really is, not just how much code it covers. It introduces small code changes (mutants), runs tests, and reports which mutants survive so you can find weak assertions and untested behaviors. Use it to raise test quality, prioritize test improvements, and gate releases by mutation score.

How this skill works

The skill detects the project type, installs and configures an appropriate mutation testing tool (Stryker for TypeScript/JavaScript, mutmut for Python, PITest for Java), runs mutation analysis, and parses the resulting report. It calculates the mutation score, lists survived mutants with file, line, and mutator details, and produces prioritized remediation suggestions and CI checks.

When to use it

Validating test effectiveness beyond line coverage
Investigating why bugs escape the test suite
Assessing test suite health before releases
Finding weak or missing assertions and boundary cases
Targeting critical modules for deeper testing

Best practices

Run mutation testing against critical modules first to reduce runtime and focus effort
Use per-test coverage or incremental mode to speed feedback
Set a realistic threshold (e.g., 80%) and fail CI when score drops below it
Prioritize surviving mutants by impact (security, payment, auth) and fix tests with clear examples
Combine mutation reports with coverage data to find high-coverage, low-quality tests
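Combining the two reports comes down to a per-file subtraction, as in the Coverage vs Mutation Score table of the example report. In this sketch the inputs are dicts mapping file to percentage, and the 15-point flagging cutoff is an assumption. `coverage_gaps` is a hypothetical helper:

```python
from typing import Dict, List, Tuple

def coverage_gaps(
    line_coverage: Dict[str, float],
    mutation_scores: Dict[str, float],
    min_gap: float = 15.0,  # assumed cutoff in percentage points; tune per project
) -> List[Tuple[str, float]]:
    """Files whose line coverage exceeds their mutation score by at least `min_gap` points."""
    gaps = [
        (path, line_coverage[path] - mutation_scores[path])
        for path in line_coverage.keys() & mutation_scores.keys()
    ]
    # Largest gap first; ties broken by file name for a stable order
    return sorted((g for g in gaps if g[1] >= min_gap), key=lambda g: (-g[1], g[0]))

print(coverage_gaps(
    {"validate.ts": 95, "login.ts": 88, "session.ts": 100},
    {"validate.ts": 72, "login.ts": 85, "session.ts": 91},
))  # [('validate.ts', 23)]
```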

Example use cases

Add mutation testing to CI to prevent regressions in test quality
Audit a codebase where high coverage still lets bugs through, and identify the missing assertions
Run incremental mutation testing during feature development for fast feedback
Produce a prioritized remediation plan showing survived mutants and exact test cases to add
Validate test-suite improvements by tracking mutation score over time

FAQ

How long does mutation testing take?

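Mutation testing is slow because every mutant needs its own test run, so runtime grows with the number of mutants and the time taken by the tests that cover each one. Focus on critical modules and use incremental mode or per-test coverage analysis to shorten feedback loops.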

What does a low mutation score mean?

A low score indicates tests are not catching realistic code changes—often missing boundary cases, weak assertions, or untested error paths.

Can I run mutation testing in CI?

Yes. Export JSON reports and fail a CI step when the mutation score falls below your threshold (example: 80%). Use incremental runs for regular CI to limit cost.