
ln-630-test-auditor skill

/ln-630-test-auditor

This skill coordinates a comprehensive test suite audit, delegating to five workers and delivering a unified report with actionable results.

npx playbooks add skill levnikolaevich/claude-code-skills --skill ln-630-test-auditor

---
name: ln-630-test-auditor
description: Test suite audit coordinator (L2). Delegates to 5 workers (Business Logic, E2E, Value, Coverage, Isolation). Aggregates results, creates Linear task in Epic 0.
allowed-tools: Read, Grep, Glob, Bash, mcp__Ref, mcp__context7, mcp__linear-server, Skill
---

> **Paths:** File paths (`shared/`, `references/`, `../ln-*`) are relative to the skills repo root. If they are not found from the CWD, locate the directory containing this SKILL.md and go up one level to reach the repo root.

# Test Suite Auditor (L2 Coordinator)

Coordinates a comprehensive test suite audit across 6 quality categories using 5 specialized workers.

## Purpose & Scope

- **L2 Coordinator** that delegates to L3 specialized audit workers
- Audits all tests against 6 quality categories (via 5 workers)
- Calculates **Usefulness Score** for each test (Keep/Remove/Refactor)
- Identifies missing tests for critical business logic
- Detects anti-patterns and isolation issues
- Aggregates results into unified report
- Creates single Linear task in Epic 0
- Manual invocation by user; not part of Story pipeline

## Core Philosophy

> "Write tests. Not too many. Mostly integration." — Guillermo Rauch
> "Test based on risk, not coverage." — ISO 29119

**Key Principles:**
1. **Test business logic, not frameworks** — bcrypt/Prisma/Express already tested
2. **No performance/load/stress tests** — Tests infrastructure, not code correctness (use k6/JMeter separately)
3. **Risk-based prioritization** — Priority ≥15 or remove
4. **E2E for critical paths only** — Money/Security/Data (Priority ≥20)
5. **Usefulness over quantity** — One useful test > 10 useless tests
6. **Every test must justify existence** — Impact × Probability ≥15
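Principles 3 and 6 reduce to one calculation. The sketch below is a minimal illustration, assuming Impact and Probability are each rated as integers from 1 to 5 (an assumption here, consistent with the Priority 25 maximum used in the report examples; the actual rubric lives in `risk_based_testing_guide.md`). It applies the ln-633 thresholds (≥15 KEEP, 10-14 REVIEW, <10 REMOVE). `usefulnessDecision` is a hypothetical name.

```javascript
// Hypothetical helper: classify a test by Usefulness Score = Impact × Probability.
// Assumes Impact and Probability are integers on a 1-5 scale (max score 25).
function usefulnessDecision(impact, probability) {
  for (const v of [impact, probability]) {
    if (!Number.isInteger(v) || v < 1 || v > 5) {
      throw new RangeError(`expected an integer in 1..5, got ${v}`);
    }
  }
  const score = impact * probability;
  // Thresholds from ln-633: >=15 KEEP, 10-14 REVIEW, <10 REMOVE
  let decision;
  if (score >= 15) decision = "KEEP";
  else if (score >= 10) decision = "REVIEW";
  else decision = "REMOVE";
  return { score, decision };
}

console.log(usefulnessDecision(5, 4)); // { score: 20, decision: 'KEEP' }
console.log(usefulnessDecision(2, 2)); // { score: 4, decision: 'REMOVE' }
```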

## Workflow

### Phase 1: Discovery (Automated)

**Inputs:** Codebase root directory

**Actions:**
1. Find all test files using Glob:
   - `**/*.test.*` (Jest, Vitest)
   - `**/*.spec.*` (Mocha, Jasmine)
   - `**/__tests__/**/*` (Jest convention)
2. Parse test file structure (test names, assertion counts)
3. Auto-discover Team ID from [docs/tasks/kanban_board.md](../docs/tasks/kanban_board.md)

**Output:** `testFilesMetadata` — list of test files with basic stats
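For readers who want to reproduce Phase 1 outside the agent's Glob tool, here is a minimal sketch using only Node's standard library. It assumes a JavaScript/TypeScript project; the three glob patterns are approximated by a path check, skipping `node_modules` and `.git` is an added assumption, and the test/assertion counts come from regexes, so they are rough heuristics rather than a real parse. All function names are hypothetical.

```javascript
const fs = require("fs");
const path = require("path");

// Approximates **/*.test.*, **/*.spec.*, and **/__tests__/**/*
function isTestFile(filePath) {
  const parts = filePath.split(/[\\/]/);
  const base = parts[parts.length - 1];
  return /\.(?:test|spec)\./.test(base) || parts.slice(0, -1).includes("__tests__");
}

// Iterative directory walk (assumption: dependency and VCS folders are skipped)
function findTestFiles(root) {
  const results = [];
  const stack = [root];
  while (stack.length > 0) {
    const dir = stack.pop();
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      if (entry.name === "node_modules" || entry.name === ".git") continue;
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) stack.push(full);
      else if (entry.isFile() && isTestFile(full)) results.push(full);
    }
  }
  return results.sort();
}

// Rough per-file stats: test cases (it/test calls) and assertions (expect/assert calls)
function fileStats(source) {
  const tests = (source.match(/\b(?:it|test)\s*\(/g) || []).length;
  const assertions = (source.match(/\b(?:expect|assert)\s*[.(]/g) || []).length;
  return { tests, assertions };
}

// Builds a testFilesMetadata-like list: [{ file, tests, assertions }, ...]
function discover(root) {
  return findTestFiles(root).map((file) => ({
    file,
    ...fileStats(fs.readFileSync(file, "utf8")),
  }));
}
```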

### Phase 2: Research Best Practices (ONCE)

**Goal:** Gather testing best practices context ONCE, share with all workers

**Actions:**
1. Use MCP Ref/Context7 to research testing best practices for detected tech stack
2. Load [../shared/references/risk_based_testing_guide.md](../shared/references/risk_based_testing_guide.md)
3. Build `contextStore` with:
   - Testing philosophy (E2E primary, Unit supplementary)
   - Usefulness Score formulas (Impact × Probability)
   - Anti-patterns catalog
   - Framework detection patterns

**Output:** `contextStore` — shared context for all workers

**Key Benefit:** Context gathered ONCE → passed to all workers → token-efficient

### Phase 3: Domain Discovery (NEW)

**Purpose:** Detect project domains from production code folder structure for domain-aware coverage analysis.

**Algorithm:** (same as ln-360-codebase-auditor)

1. **Priority 1: Explicit domain folders**
   - Check for: `src/domains/*/`, `src/features/*/`, `src/modules/*/`
   - Monorepo patterns: `packages/*/`, `libs/*/`, `apps/*/`
   - If found (>1 match) → use as domains

2. **Priority 2: Top-level src/* folders**
   - List folders: `src/users/`, `src/orders/`, `src/payments/`
   - Exclude infrastructure: `utils`, `shared`, `common`, `lib`, `helpers`, `config`, `types`, `interfaces`, `constants`, `middleware`, `infrastructure`, `core`
   - If remaining >1 → use as domains

3. **Priority 3: Fallback to global mode**
   - If <2 domains detected → `domain_mode = "global"`
   - All workers scan entire codebase (backward-compatible behavior)
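The three priority rules can be sketched as a pure function. The input shape is an assumption made so the logic can be shown without touching disk: `children` maps a parent folder to the names of its immediate subfolders (for example, built from `fs.readdirSync`). Matches from all explicit parents are pooled before the ">1 match" check, which is one reading of Priority 1. The heuristics table (file counts, MVC structure, barrel exports) is not applied here. `detectDomains` is a hypothetical name.

```javascript
const EXPLICIT_PARENTS = ["src/domains", "src/features", "src/modules", "packages", "libs", "apps"];
const INFRA = new Set(["utils", "shared", "common", "lib", "helpers", "config", "types",
  "interfaces", "constants", "middleware", "infrastructure", "core"]);

function detectDomains(children) {
  // Priority 1: explicit domain folders, pooled across all known parents
  const explicit = EXPLICIT_PARENTS.flatMap((parent) =>
    (children[parent] || []).map((name) => ({ name, path: `${parent}/${name}` })));
  if (explicit.length > 1) return { domain_mode: "domain-aware", all_domains: explicit };

  // Priority 2: top-level src/* folders minus infrastructure folders
  const top = (children["src"] || [])
    .filter((name) => !INFRA.has(name))
    .map((name) => ({ name, path: `src/${name}` }));
  if (top.length > 1) return { domain_mode: "domain-aware", all_domains: top };

  // Priority 3: fewer than 2 domains -> global mode (workers scan everything)
  return { domain_mode: "global", all_domains: [] };
}
```

For example, `detectDomains({ src: ["users", "orders", "utils"] })` yields two domains (`users`, `orders`), while `{ src: ["users", "utils"] }` falls back to global mode.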

**Heuristics for domain detection:**

| Heuristic | Indicator | Example |
|-----------|-----------|---------|
| File count | >5 files in folder | `src/users/` with 12 files |
| Structure | controllers/, services/, models/ present | MVC/Clean Architecture |
| Barrel export | index.ts/index.js exists | Module pattern |
| README | README.md describes domain | Domain documentation |

**Output:**
```json
{
  "domain_mode": "domain-aware",
  "all_domains": [
    {"name": "users", "path": "src/users", "file_count": 45},
    {"name": "orders", "path": "src/orders", "file_count": 32},
    {"name": "shared", "path": "src/shared", "file_count": 15, "is_shared": true}
  ]
}
```

**Shared folder handling:**
- Folders named `shared`, `common`, `utils`, `lib`, `core` → mark `is_shared: true`
- Shared code audited but grouped separately in report
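A minimal sketch of the shared-folder rule, applied to whatever domain list detection produced (for instance, a monorepo `packages/shared` picked up by Priority 1). The helper names are hypothetical; the name set is taken from the rule above.

```javascript
const SHARED_NAMES = new Set(["shared", "common", "utils", "lib", "core"]);

// Tag shared folders without mutating the input objects
function markShared(domains) {
  return domains.map((d) => (SHARED_NAMES.has(d.name) ? { ...d, is_shared: true } : d));
}

// Split into business domains and shared code so the report can group them separately
function splitForReport(domains) {
  const marked = markShared(domains);
  return {
    business: marked.filter((d) => !d.is_shared),
    shared: marked.filter((d) => d.is_shared),
  };
}
```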

### Phase 4: Delegate to Workers

> **CRITICAL:** All delegations use Task tool with `subagent_type: "general-purpose"` for context isolation.

**Prompt template:**
```
Task(description: "Test audit via ln-63X",
     prompt: "Execute ln-63X-{worker}. Read skill from ln-63X-{worker}/SKILL.md. Context: {contextStore}",
     subagent_type: "general-purpose")
```

**Anti-Patterns:**
- ❌ Direct Skill tool invocation without Task wrapper
- ❌ Any execution bypassing subagent context isolation

#### Phase 4a: Global Workers (PARALLEL)

**Global workers** scan entire test suite (not domain-aware):

| # | Worker | Category | What It Audits |
|---|--------|----------|----------------|
| 1 | [ln-631-test-business-logic-auditor](../ln-631-test-business-logic-auditor/) | Business Logic Focus | Framework/Library tests (Prisma, Express, bcrypt, JWT, axios, React hooks) → REMOVE |
| 2 | [ln-632-test-e2e-priority-auditor](../ln-632-test-e2e-priority-auditor/) | E2E Priority | E2E baseline (2/endpoint), Pyramid validation, Missing E2E tests |
| 3 | [ln-633-test-value-auditor](../ln-633-test-value-auditor/) | Risk-Based Value | Usefulness Score = Impact × Probability<br>Decisions: ≥15 KEEP, 10-14 REVIEW, <10 REMOVE |
| 5 | [ln-635-test-isolation-auditor](../ln-635-test-isolation-auditor/) | Isolation + Anti-Patterns | Isolation (6 categories), Determinism, Anti-Patterns (6 types) |

**Invocation (4 workers in PARALLEL):**
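```javascript
// Task(...) is the agent's Task tool; issue all four calls in one message so they run in parallel
const globalWorkers = ["ln-631-test-business-logic-auditor", "ln-632-test-e2e-priority-auditor",
                       "ln-633-test-value-auditor", "ln-635-test-isolation-auditor"];
for (const worker of globalWorkers) {
  Task({ description: `Test audit via ${worker}`,
         prompt: `Execute ${worker}. Read skill from ${worker}/SKILL.md. Context: ${JSON.stringify(contextStore)}`,
         subagent_type: "general-purpose" });
}
```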

#### Phase 4b: Domain-Aware Worker (PARALLEL per domain)

**Domain-aware worker** runs once per domain:

| # | Worker | Category | What It Audits |
|---|--------|----------|----------------|
| 4 | [ln-634-test-coverage-auditor](../ln-634-test-coverage-auditor/) | Coverage Gaps | Missing tests for critical paths per domain (Money 20+, Security 20+, Data 15+, Core Flows 15+) |

**Invocation:**
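```javascript
// Wrap ln-634 in Task (not a direct Skill call) to keep subagent context isolation
const coverageTask = (context, label) =>
  Task({ description: `Test audit via ln-634 (${label})`,
         prompt: `Execute ln-634-test-coverage-auditor. Read skill from ln-634-test-coverage-auditor/SKILL.md. Context: ${JSON.stringify(context)}`,
         subagent_type: "general-purpose" });

if (domain_mode === "domain-aware") {
  // One invocation per domain, all issued in the same message
  for (const domain of all_domains) {
    coverageTask({ ...contextStore, domain_mode: "domain-aware",
                   current_domain: { name: domain.name, path: domain.path } }, domain.name);
  }
} else {
  // Fallback: invoke once for entire codebase (global mode)
  coverageTask(contextStore, "global");
}
```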

**Parallelism strategy:**
- Phase 4a: All 4 global workers run in PARALLEL
- Phase 4b: All N domain-aware invocations run in PARALLEL
- Example: 3 domains → 3 ln-634 invocations in single message

**Worker Output Contract (Unified):**

All workers MUST return JSON with this structure:
```json
{
  "category": "Category Name",
  "score": 3.1,
  "total_issues": 12,
  "critical": 0,
  "high": 3,
  "medium": 7,
  "low": 2,
  "findings": [
    {
      "severity": "HIGH",
      "location": "path/file.ts:123",
      "issue": "Description of the issue",
      "principle": "Category / Sub-principle",
      "recommendation": "How to fix",
      "effort": "S"
    }
  ]
}
```
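Before aggregating, the coordinator can reject malformed worker output. This is a minimal sketch whose checks mirror the example contract above (required fields, severity values, and `total_issues` equal to the sum of the four counts); `shared/references/audit_output_schema.md` may define more. It deliberately does not require `findings.length` to equal `total_issues`, since the example lists only one finding. `validateWorkerResult` is a hypothetical name.

```javascript
const SEVERITIES = ["CRITICAL", "HIGH", "MEDIUM", "LOW"];

// Returns a list of human-readable problems; an empty list means the result is usable
function validateWorkerResult(r) {
  const errors = [];
  if (typeof r.category !== "string" || r.category === "") errors.push("category must be a non-empty string");
  if (!Number.isFinite(r.score) || r.score < 0 || r.score > 10) errors.push("score must be a number in 0..10");
  const counts = ["critical", "high", "medium", "low", "total_issues"];
  const badCounts = counts.filter((k) => !Number.isInteger(r[k]) || r[k] < 0);
  badCounts.forEach((k) => errors.push(`${k} must be a non-negative integer`));
  if (badCounts.length === 0 && r.critical + r.high + r.medium + r.low !== r.total_issues) {
    errors.push("total_issues must equal critical + high + medium + low");
  }
  if (!Array.isArray(r.findings)) {
    errors.push("findings must be an array");
    return errors;
  }
  r.findings.forEach((f, i) => {
    if (!SEVERITIES.includes(f.severity)) errors.push(`findings[${i}].severity is invalid`);
    for (const key of ["location", "issue", "principle", "recommendation", "effort"]) {
      if (typeof f[key] !== "string") errors.push(`findings[${i}].${key} must be a string`);
    }
  });
  return errors;
}
```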

**Unified Scoring Formula (all workers):**
```
penalty = (critical × 2.0) + (high × 1.0) + (medium × 0.5) + (low × 0.2)
score = max(0, 10 - penalty)
```
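The formula translates directly into code. This sketch applies no rounding, since the document does not say whether workers round the result; `auditScore` is a hypothetical name. Note that the clamp matters: six critical issues already produce a penalty of 12, which would otherwise give a negative score.

```javascript
// Unified scoring formula: weighted penalty per severity, clamped at 0
function auditScore({ critical = 0, high = 0, medium = 0, low = 0 }) {
  const penalty = critical * 2.0 + high * 1.0 + medium * 0.5 + low * 0.2;
  return Math.max(0, 10 - penalty);
}

console.log(auditScore({ high: 2, medium: 2 })); // 7
console.log(auditScore({ critical: 6 }));        // 0 (penalty 12, clamped)
```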

**Domain-aware workers** add optional fields: `domain`, `scan_path`

### Phase 5: Aggregate Results

**Goal:** Merge all worker results into unified Test Suite Audit Report

**Aggregation Algorithm:**
```
1. Collect JSON from all workers (4 global results + one ln-634 result per domain)
2. Merge findings from all workers into single array
3. Sum severity counts:
   total_critical = sum(worker.critical for all workers)
   total_high = sum(worker.high for all workers)
   total_medium = sum(worker.medium for all workers)
   total_low = sum(worker.low for all workers)
4. Calculate Overall Score:
   overall_score = average(worker.score for all workers)
5. Sort findings by severity: CRITICAL → HIGH → MEDIUM → LOW
6. Group findings by category for report sections
```
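A minimal sketch of the aggregation steps. One point is an explicit assumption not specified by this skill: in domain-aware mode the N per-domain ln-634 scores are first averaged into a single Coverage Gaps score, so the overall score remains an average over 5 categories. Findings are tagged with their worker's `category` (and `domain` when present) before a stable severity sort. `aggregate` is a hypothetical name.

```javascript
const SEVERITY_ORDER = { CRITICAL: 0, HIGH: 1, MEDIUM: 2, LOW: 3 };
const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

function aggregate(globalResults, coverageResults) {
  const all = [...globalResults, ...coverageResults];

  // Step 3: sum severity counts across every result (global + per-domain)
  const totals = { critical: 0, high: 0, medium: 0, low: 0 };
  for (const r of all) for (const k of Object.keys(totals)) totals[k] += r[k];

  // Step 4: one score per category (assumption: per-domain ln-634 scores are averaged first)
  const categoryScores = globalResults.map((r) => r.score);
  if (coverageResults.length > 0) categoryScores.push(mean(coverageResults.map((r) => r.score)));
  const overall_score = mean(categoryScores);

  // Steps 2 and 5: flatten, tag, and sort (Array.prototype.sort is stable in modern engines)
  const findings = all
    .flatMap((r) => r.findings.map((f) => ({ ...f, category: r.category, ...(r.domain ? { domain: r.domain } : {}) })))
    .sort((a, b) => SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity]);

  const total = totals.critical + totals.high + totals.medium + totals.low;
  return { ...totals, total, overall_score, findings };
}
```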

**Actions:**
1. **Collect results** from all workers (global + domain-aware)
2. **Merge findings** into single flat array (all workers use unified format)
3. **Sum severity counts** across all workers
4. **Calculate overall score** = average of 5 worker scores
5. **Domain-aware worker (ln-634)** → group by domain.name if domain_mode="domain-aware"
6. **Generate Executive Summary** (2-3 sentences)
7. **Create Linear task** in Epic 0 with full report (see Output Format below)
8. **Return summary** to user

**Findings grouping:**
- Categories 1-3, 5-6 (Business Logic, E2E, Value, Isolation, Anti-Patterns) → single tables (global)
- Category 4 (Coverage Gaps) → subtables per domain (if domain_mode="domain-aware")
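The grouping rule can be sketched as follows, assuming each merged finding has already been tagged with its worker's `category` (and `domain` for ln-634 results). In global mode, Coverage Gaps falls back to a single table like the other categories. `groupForReport` is a hypothetical name.

```javascript
// One table per category; Coverage Gaps split into per-domain subtables when domain-aware
function groupForReport(findings, domainMode) {
  const byCategory = {};
  const coverageByDomain = {};
  for (const f of findings) {
    if (f.category === "Coverage Gaps" && domainMode === "domain-aware") {
      const key = f.domain || "unknown";
      (coverageByDomain[key] = coverageByDomain[key] || []).push(f);
    } else {
      (byCategory[f.category] = byCategory[f.category] || []).push(f);
    }
  }
  return { byCategory, coverageByDomain };
}
```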

## Output Format

```markdown
## Test Suite Audit Report - [DATE]

### Executive Summary
[2-3 sentences: test suite health, major issues, key recommendations]

### Severity Summary

| Severity | Count |
|----------|-------|
| Critical | X |
| High | X |
| Medium | X |
| Low | X |
| **Total** | **X** |

### Compliance Score

| Category | Score | Notes |
|----------|-------|-------|
| Business Logic Focus | X/10 | X framework tests found |
| E2E Critical Coverage | X/10 | X critical paths missing E2E |
| Risk-Based Value | X/10 | X low-value tests |
| Coverage Gaps | X/10 | X critical paths untested |
| Isolation & Anti-Patterns | X/10 | X isolation + anti-pattern issues |
| **Overall** | **X/10** | Average of 5 categories |

### Domain Coverage Summary (NEW - if domain_mode="domain-aware")

| Domain | Critical Paths | Tested | Coverage % | Gaps |
|--------|---------------|--------|------------|------|
| users | 8 | 6 | 75% | 2 |
| orders | 12 | 8 | 67% | 4 |
| payments | 6 | 5 | 83% | 1 |
| **Total** | **26** | **19** | **73%** | **7** |

### Audit Findings

| Severity | Location | Issue | Principle | Recommendation | Effort |
|----------|----------|-------|-----------|----------------|--------|
| **CRITICAL** | routes/payment.ts:45 | Missing E2E for payment processing (Priority 25) | E2E Critical Coverage / Money Flow | Add E2E: successful payment + discount edge cases | M |
| **HIGH** | auth.test.ts:45-52 | Test 'bcrypt hashes password' validates library behavior | Business Logic Focus / Crypto Testing | Delete — bcrypt already tested by maintainers | S |
| **HIGH** | db.test.ts:78-85 | Test 'Prisma findMany returns array' validates ORM | Business Logic Focus / ORM Testing | Delete — Prisma already tested | S |
| **HIGH** | user.test.ts:45 | Anti-pattern 'The Liar' — no assertions | Anti-Patterns / The Liar | Add specific assertions or delete test | S |
| **MEDIUM** | utils.test.ts:23-27 | Test 'validateEmail' has Usefulness Score 4 | Risk-Based Value / Low Priority | Delete — likely covered by E2E registration | S |
| **MEDIUM** | order.test.ts:200-350 | Anti-pattern 'The Giant' — 150 lines | Anti-Patterns / The Giant | Split into focused tests | M |
| **LOW** | payment.test.ts | Anti-pattern 'Happy Path Only' — no error tests | Anti-Patterns / Happy Path | Add negative tests | M |

### Coverage Gaps by Domain (if domain_mode="domain-aware")

#### Domain: users (src/users/)

| Severity | Category | Missing Test | Location | Priority | Effort |
|----------|----------|--------------|----------|----------|--------|
| CRITICAL | Money | E2E: processRefund() | services/user.ts:120 | 20 | M |
| HIGH | Security | Unit: validatePermissions() | middleware/auth.ts:45 | 18 | S |

#### Domain: orders (src/orders/)

| Severity | Category | Missing Test | Location | Priority | Effort |
|----------|----------|--------------|----------|----------|--------|
| CRITICAL | Money | E2E: applyDiscount() | services/order.ts:45 | 25 | M |
| HIGH | Data | Integration: orderTransaction() | repositories/order.ts:78 | 16 | M |
```
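The Domain Coverage Summary columns are related arithmetically: Gaps = Critical Paths − Tested, and Coverage % = Tested / Critical Paths. The Total row in the example (19/26 ≈ 73%) is computed from summed counts, not by averaging the per-domain percentages (which would give 75%). A minimal sketch reproducing the example table; returning `null` for a domain with zero critical paths is an assumption. `coverageSummary` is a hypothetical name.

```javascript
function coverageSummary(rows) {
  // Rounded percentage; null when there is nothing to cover (assumption)
  const pct = (tested, total) => (total === 0 ? null : Math.round((tested / total) * 100));
  const out = rows.map(({ domain, criticalPaths, tested }) => ({
    domain, criticalPaths, tested, coveragePct: pct(tested, criticalPaths), gaps: criticalPaths - tested,
  }));
  // Total row from summed counts, not averaged percentages
  const criticalPaths = rows.reduce((s, r) => s + r.criticalPaths, 0);
  const tested = rows.reduce((s, r) => s + r.tested, 0);
  out.push({ domain: "Total", criticalPaths, tested, coveragePct: pct(tested, criticalPaths), gaps: criticalPaths - tested });
  return out;
}

// Reproduces the example: users 75%, orders 67%, payments 83%, Total 73% with 7 gaps
console.table(coverageSummary([
  { domain: "users", criticalPaths: 8, tested: 6 },
  { domain: "orders", criticalPaths: 12, tested: 8 },
  { domain: "payments", criticalPaths: 6, tested: 5 },
]));
```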

## Worker Architecture

Each worker:
- Receives `contextStore` with testing best practices
- Receives `testFilesMetadata` with test file list
- Loads full test file contents when analyzing
- Returns structured JSON with category findings
- Operates independently (failure in one doesn't block others)

**Token Efficiency:**
- Coordinator: metadata only (~1000 tokens)
- Workers: full test file contents when needed (~5000-10000 tokens each)
- Context gathered ONCE, shared with all workers
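"Failure in one doesn't block others" corresponds to settling all worker calls and keeping whatever succeeded. A minimal sketch with `Promise.allSettled`; `runWorker` is a hypothetical async function standing in for a Task call that resolves to the worker's JSON, not an API provided by this skill.

```javascript
// Run every worker concurrently; collect successes and failures separately
async function runAll(workers, runWorker) {
  const settled = await Promise.allSettled(workers.map((w) => runWorker(w)));
  const results = [];
  const failures = [];
  settled.forEach((s, i) => {
    if (s.status === "fulfilled") results.push(s.value);
    else failures.push({ worker: workers[i], error: s.reason instanceof Error ? s.reason.message : String(s.reason) });
  });
  return { results, failures };
}
```

The coordinator can then aggregate `results` and list `failures` in the report instead of aborting the audit.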

## Critical Rules

- **Two-stage delegation:** Global workers (4) + Domain-aware worker (ln-634 × N domains)
- **Domain discovery:** Auto-detect domains from folder structure; fallback to global mode if <2 domains
- **Parallel execution:** All workers (global + domain-aware) run in PARALLEL
- **Domain-grouped output:** Coverage Gaps findings grouped by domain (if domain_mode="domain-aware")
- **Delete > Archive:** Remove useless tests, don't comment out
- **E2E baseline:** Every endpoint needs 2 E2E (positive + negative)
- **Justify each test:** If can't explain Priority ≥15, remove it
- **Trust frameworks:** Don't test Express/Prisma/bcrypt behavior
- **No performance/load tests:** Flag and REMOVE tests measuring throughput/latency/memory (DevOps Epic territory)
- **Code is truth:** If test contradicts code behavior, update test
- **Language preservation:** Report in project's language (EN/RU)

## Definition of Done

- All test files discovered via Glob
- Context gathered from testing best practices (MCP Ref/Context7)
- Domain discovery completed (domain_mode determined)
- contextStore built with test metadata + domain info
- Global workers (4) invoked in PARALLEL
- Domain-aware worker (ln-634) invoked per domain in PARALLEL
- All workers completed successfully (or reported errors)
- Results aggregated with domain grouping (if domain_mode="domain-aware")
- Domain Coverage Summary built (if domain_mode="domain-aware")
- Compliance scores calculated (6 categories, reported as 5 scores; Isolation and Anti-Patterns share one)
- Keep/Remove/Refactor decisions for each test
- Missing tests identified with Priority (grouped by domain if applicable)
- Anti-patterns catalogued
- Linear task created in Epic 0 with full report
- Summary returned to user

## Reference Files

- **Orchestrator lifecycle:** `shared/references/orchestrator_pattern.md`
- **Risk-based testing methodology:** `shared/references/risk_based_testing_guide.md`
- **Task delegation pattern:** `shared/references/task_delegation_pattern.md`
- **Audit scoring formula:** `shared/references/audit_scoring.md`
- **Audit output schema:** `shared/references/audit_output_schema.md`

## Related Skills

- **Workers:**
  - [ln-631-test-business-logic-auditor](../ln-631-test-business-logic-auditor/) — Framework tests detection
  - [ln-632-test-e2e-priority-auditor](../ln-632-test-e2e-priority-auditor/) — E2E baseline validation
  - [ln-633-test-value-auditor](../ln-633-test-value-auditor/) — Usefulness Score calculation
  - [ln-634-test-coverage-auditor](../ln-634-test-coverage-auditor/) — Coverage gaps identification
  - [ln-635-test-isolation-auditor](../ln-635-test-isolation-auditor/) — Isolation + Anti-Patterns

- **Reference:**
  - [../shared/references/risk_based_testing_guide.md](../shared/references/risk_based_testing_guide.md) — Risk-Based Testing Guide
  - [../ln-620-codebase-auditor](../ln-620-codebase-auditor/) — Codebase audit coordinator (similar pattern)

---
**Version:** 4.0.0
**Last Updated:** 2025-12-23

Overview

This skill is a Test Suite Audit Coordinator (L2) that orchestrates five specialized L3 workers to evaluate test quality across six categories and produce a unified audit report. It discovers test files, builds shared testing context, detects project domains, runs parallel workers (global + domain-aware), aggregates findings, and creates a Linear task in Epic 0. Designed for manual invocation, it returns an executive summary, severity breakdown, per-category scores, and actionable recommendations.

How this skill works

The coordinator scans the repository for test files and extracts metadata, then builds a single contextStore of testing best practices to share with workers. It auto-detects project domains and runs four global auditors in parallel plus the coverage auditor once per domain (or once globally if domain_mode is "global"). Each worker returns structured JSON; the coordinator aggregates scores, sums severities, groups domain coverage gaps, and generates the final report and a Linear task in Epic 0.

When to use it

  • Before a release to validate test quality and reduce maintenance cost
  • When business-critical flows (payments, auth, data) need an audit
  • To prioritize test refactors or removals in a large legacy codebase
  • During sprint planning to identify high-risk gaps that need E2E tests
  • When preparing a QA/engineering debt reduction epic

Best practices

  • Gather testing context once and share it with all workers to save tokens and ensure consistency
  • Prefer domain-aware coverage when repository shows clear domain folders; fall back to global mode if not
  • Run global auditors and domain coverage checks in parallel to reduce wall time
  • Treat framework/library behavior tests as removable; test business logic instead
  • Use a unified JSON worker contract so aggregation and Linear task creation are deterministic

Example use cases

  • Audit all tests to find redundant ORM or bcrypt assertions and remove them
  • Identify missing E2E coverage for payment flows and create tasks to add them
  • Score tests by Usefulness (Impact × Probability) to decide KEEP/REVIEW/REMOVE
  • Detect isolation and determinism anti-patterns and recommend fixes
  • Produce a single Linear task in Epic 0 that summarizes findings and remediation steps

FAQ

How are domains detected?

Domains are detected by scanning for explicit domain folders (src/domains, packages, etc.), then top-level src/* folders, and falling back to global mode if fewer than two domains are found.

What happens if a worker fails?

Workers operate independently; failures are reported and the coordinator aggregates available results. A failed worker does not block the overall audit.

How is the overall score computed?

Each worker computes a 0–10 score using a unified penalty formula; the coordinator averages the five worker scores to produce the overall score.