
This skill analyzes test quality to detect smells, flaky tests, and coverage gaps, guiding improvements in correctness, reliability, and maintainability.

```bash
npx playbooks add skill secondsky/claude-skills --skill test-quality-analysis
```

Copy the command above to add this skill to your agents.

---
name: test-quality-analysis
description: Detect test smells, overmocking, flaky tests, and coverage issues. Analyze test effectiveness, maintainability, and reliability. Use when reviewing tests or improving test quality.
allowed-tools: Bash, Read, Edit, Write, Grep, Glob, TodoWrite
---

# Test Quality Analysis

Expert knowledge for analyzing and improving test quality - detecting test smells, overmocking, insufficient coverage, and testing anti-patterns.

## Core Dimensions

- **Correctness**: Tests verify the right behavior
- **Reliability**: Tests are deterministic, not flaky
- **Maintainability**: Tests are easy to understand
- **Performance**: Tests run quickly
- **Coverage**: Tests cover critical code paths
- **Isolation**: Tests don't depend on external state (see the sketch below)
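
In practice, isolation usually means resetting shared state between tests rather than relying on test order. A minimal Vitest sketch; the in-memory `store` is a hypothetical stand-in for whatever state your tests share:

```typescript
import { beforeEach, expect, test } from 'vitest'

// Hypothetical shared state: recreated per test so tests stay order-independent
let store: Map<string, number>

beforeEach(() => {
  store = new Map()
})

test('increments a counter', () => {
  store.set('hits', 1)
  expect(store.get('hits')).toBe(1)
})

test('starts empty regardless of test order', () => {
  expect(store.size).toBe(0)
})
```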

## Test Smells

### Overmocking

**Problem**: Mocking too many dependencies makes tests fragile.

```typescript
// ❌ BAD: Overmocked
test('calculate total', () => {
  const mockAdd = vi.fn(() => 10)
  const mockMultiply = vi.fn(() => 20)
  // Testing implementation, not behavior
})

// ✅ GOOD: Mock only external dependencies
test('calculate order total', () => {
  const order = { items: [{ price: 20 }, { price: 15 }] } // subtotal: 35
  const mockPricingAPI = vi.fn(() => ({ tax: 0.1 }))
  const total = calculateTotal(order, mockPricingAPI)
  expect(total).toBe(38.5) // 35 + 10% tax
})
```

**Detection**: More than 3-4 mocks, mocking pure functions, complex mock setup.

**Fix**: Mock only I/O boundaries (APIs, databases, filesystem).
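
In Vitest this typically means mocking the module at the I/O boundary and leaving everything above it real. A minimal sketch, assuming hypothetical `./users` and `./db` modules:

```typescript
import { expect, test, vi } from 'vitest'
import { getUser } from './users' // hypothetical module under test

// Mock only the database boundary; getUser's logic runs for real
vi.mock('./db', () => ({
  query: vi.fn(async () => [{ id: 1, name: 'Ada' }]),
}))

test('getUser returns the first matching row', async () => {
  const user = await getUser(1)
  expect(user).toMatchObject({ id: 1, name: 'Ada' })
})
```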

### Fragile Tests

**Problem**: Tests break with unrelated code changes.

```typescript
// ❌ BAD: Tests implementation details
await page.locator('.form-container > div:nth-child(2) > button').click()

// ✅ GOOD: Semantic selector
await page.getByRole('button', { name: 'Submit' }).click()
```

### Flaky Tests

**Problem**: Tests pass or fail non-deterministically.

```typescript
// ❌ BAD: Race condition
test('loads data', async () => {
  let data
  fetchData().then(result => { data = result })
  await new Promise(resolve => setTimeout(resolve, 1000))
  expect(data).toBeDefined() // passes only if the fetch beat the timer
})

// ✅ GOOD: Proper async handling
test('loads data', async () => {
  const data = await fetchData()
  expect(data).toBeDefined()
})
```

### Poor Assertions

```typescript
// ❌ BAD: Weak assertion
test('returns users', async () => {
  const users = await getUsers()
  expect(users).toBeDefined() // Too vague!
})

// ✅ GOOD: Strong, specific assertions
test('creates user with correct attributes', async () => {
  const user = await createUser({ name: 'John' })
  expect(user).toMatchObject({
    id: expect.any(Number),
    name: 'John',
  })
})
```

## Analysis Tools

```bash
# Vitest coverage (run via bun if preferred)
bunx vitest run --coverage
open coverage/index.html

# Check thresholds
bunx vitest run --coverage --coverage.thresholds.lines=80

# pytest-cov (Python)
uv run pytest --cov --cov-report=html
open htmlcov/index.html
```
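
Thresholds can also live in config so CI fails automatically when coverage drops. A minimal `vitest.config.ts` sketch for recent Vitest versions; the numbers are illustrative, not prescriptive:

```typescript
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      thresholds: {
        lines: 80,      // fail the run if line coverage drops below 80%
        branches: 80,
        functions: 80,
      },
    },
  },
})
```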

## Best Practices Checklist

### Unit Test Quality (FIRST)
- [ ] **Fast**: Tests run in milliseconds
- [ ] **Isolated**: No dependencies between tests
- [ ] **Repeatable**: Same results every time
- [ ] **Self-validating**: Clear pass/fail
- [ ] **Timely**: Written alongside code

### Mock Guidelines
- [ ] Mock only external dependencies
- [ ] Don't mock business logic or pure functions
- [ ] Use real implementations when possible
- [ ] Limit to 3-4 mocks per test maximum

### Coverage Goals
- [ ] 80%+ line coverage for business logic
- [ ] 100% for critical paths (auth, payment)
- [ ] All error paths tested
- [ ] Boundary conditions tested (see the sketch below)
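
Error paths and boundary conditions are the tests most often skipped. A minimal sketch, assuming a hypothetical `divide` helper:

```typescript
import { expect, test } from 'vitest'
import { divide } from './math' // hypothetical helper

test('throws on zero divisor (error path)', () => {
  expect(() => divide(10, 0)).toThrow()
})

test('handles negative operands (boundary)', () => {
  expect(divide(-10, 2)).toBe(-5)
})
```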

### Test Structure (AAA Pattern)

```typescript
test('user registration', async () => {
  // Arrange
  const userData = { email: 'user@example.com' }

  // Act
  const user = await registerUser(userData)

  // Assert
  expect(user.email).toBe('user@example.com')
})
```

## Code Review Checklist

- [ ] Tests verify behavior, not implementation
- [ ] Assertions are specific and meaningful
- [ ] No flaky tests (timing, ordering issues)
- [ ] Proper async/await usage
- [ ] Test names clearly describe behavior
- [ ] Minimal code duplication
- [ ] Critical paths have tests
- [ ] Both happy path and error cases covered

## Common Anti-Patterns

### Testing Implementation Details

```typescript
// ❌ BAD
const spy = vi.spyOn(Math, 'sqrt')
calculateDistance()
expect(spy).toHaveBeenCalled() // Testing how, not what

// ✅ GOOD
const distance = calculateDistance({ x: 0, y: 0 }, { x: 3, y: 4 })
expect(distance).toBe(5) // Testing output
```

### Mocking Too Much

```typescript
// ❌ BAD
const mockAdd = vi.fn((a, b) => a + b)

// ✅ GOOD: Use real implementations
import { add } from './utils'
// Only mock external services
const mockPaymentGateway = vi.fn()
```

## See Also

- `vitest-testing` - TypeScript/JavaScript testing
- `playwright-testing` - E2E testing
- `mutation-testing` - Validate test effectiveness

## Overview

This skill analyzes test suites to detect test smells, overmocking, flaky tests, and coverage gaps. It evaluates effectiveness, maintainability, and reliability of tests and provides actionable guidance to improve test quality. Use it when reviewing tests, preparing for releases, or enforcing testing standards.

## How this skill works

The analyzer inspects test code and runtime artifacts to identify anti-patterns like excessive mocks, fragile selectors, weak assertions, and improper async usage. It checks coverage reports, flags missing error-path and boundary tests, and detects flaky patterns such as implicit timing waits or shared state. The output summarizes problems, pinpoints offending files/lines, and recommends concrete fixes and threshold suggestions.
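
As an illustration of the kind of check involved (not the skill's actual implementation), a naive scan for implicit timing waits in test files might look like this, assuming the `glob` package is available:

```typescript
import { readFileSync } from 'node:fs'
import { globSync } from 'glob'

// Flag setTimeout-based waits in test files: a common flakiness signal
for (const file of globSync('**/*.test.ts', { ignore: 'node_modules/**' })) {
  readFileSync(file, 'utf8')
    .split('\n')
    .forEach((line, i) => {
      if (/setTimeout\s*\(/.test(line)) {
        console.log(`${file}:${i + 1}: implicit wait; prefer await or fake timers`)
      }
    })
}
```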

## When to use it

- During pull request reviews to validate new or changed tests
- When reducing CI flakiness or debugging intermittent failures
- Before releases to ensure critical paths are covered
- When setting or auditing coverage and mocking guidelines
- As part of a test quality improvement initiative

## Best practices

- Mock only external I/O boundaries (APIs, DB, filesystem); avoid mocking pure business logic
- Keep tests isolated, fast, and repeatable; aim for millisecond unit tests
- Write strong, specific assertions rather than vague existence checks
- Use semantic selectors for UI tests (roles/labels) to avoid fragility
- Limit complex mock setups to 3–4 mocks per test; prefer real implementations where practical
- Establish coverage goals: ~80%+ for business logic and 100% for critical flows

## Example use cases

- Scan a repository to find overmocked tests and replace mocks with real implementations
- Analyze flaky CI jobs and locate race conditions or shared state dependencies
- Evaluate coverage reports to highlight untested error paths and boundary conditions
- Review E2E selectors and recommend semantic replacements to reduce fragility
- Enforce a checklist for unit test quality (FIRST: Fast, Isolated, Repeatable, Self-validating, Timely) during code review

## FAQ

### How does the tool detect overmocking?

It flags tests with many mocks (typically >3–4), mocks of pure functions or internal modules, and complex mock scaffolding. It recommends mocking only external I/O boundaries.

### What patterns indicate flaky tests?

Common signals are implicit timeouts (setTimeout waits), nondeterministic ordering, reliance on shared global state, and missing await on async calls. The analyzer reports occurrences and shows replacement patterns.
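
One common replacement is Vitest's fake timers, which make time-based code deterministic. A sketch, assuming a hypothetical `scheduleRetry` helper that sets a timeout:

```typescript
import { afterEach, expect, test, vi } from 'vitest'
import { scheduleRetry } from './retry' // hypothetical helper

afterEach(() => {
  vi.useRealTimers()
})

test('retries after one second without real waiting', async () => {
  vi.useFakeTimers()
  const onRetry = vi.fn()
  scheduleRetry(onRetry, 1000)
  await vi.advanceTimersByTimeAsync(1000) // advances mocked time deterministically
  expect(onRetry).toHaveBeenCalledTimes(1)
})
```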