
test-performance skill

/dotclaude/skills/test-performance

This skill helps you design and report performance tests, benchmarks, and load scenarios to guard against regressions and guide optimizations.

npx playbooks add skill shotaiuchi/dotclaude --skill test-performance

Review the files below or copy the command above to add this skill to your agents.

Files (1): SKILL.md (2.0 KB)
---
name: test-performance
description: >-
  Performance test creation. Apply when writing benchmarks, load tests, or
  performance regression tests, or when measuring execution time, memory
  allocation, or throughput under load.
user-invocable: false
---

# Performance Tests

Write performance tests that measure and guard against performance regressions.

## Test Creation Checklist

### Benchmark Design
- Establish clear metrics (latency, throughput, memory) for each benchmark
- Use warm-up iterations to eliminate JIT and cache cold-start effects
- Run sufficient iterations for statistically significant results
- Isolate benchmarked code from measurement overhead
- Document hardware and environment assumptions for reproducibility
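
A minimal sketch of the warm-up-then-measure pattern above, using only the Python standard library (the `benchmark` helper and its defaults are illustrative, not part of this skill):

```python
import statistics
import time

def benchmark(fn, *, warmup=50, iterations=500):
    """Time fn() repeatedly, excluding warm-up runs from the reported stats."""
    for _ in range(warmup):
        fn()  # let JIT compilation and caches settle before measuring
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "stdev_s": statistics.stdev(samples),
        "min_s": min(samples),
    }

if __name__ == "__main__":
    print(benchmark(lambda: sorted(range(10_000), reverse=True)))
```

Timing each call individually keeps measurement overhead visible and bounded; for sub-microsecond operations, batch many calls per sample instead so clock resolution does not dominate the result.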

### Load Testing
- Define realistic load profiles based on production traffic patterns
- Test with sustained load, spike patterns, and gradual ramp-up
- Measure response time percentiles (p50, p95, p99) under load
- Verify graceful degradation when load exceeds capacity
- Check resource utilization (CPU, memory, connections) during load
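
Dedicated load tools are the usual choice here; the sketch below only illustrates percentile measurement with standard-library threads, assuming an I/O-bound `do_request` callable (hypothetical) that exercises the system under test:

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def load_test(do_request, *, concurrency=20, total_requests=2000):
    """Fire requests from a thread pool and report latency percentiles."""
    def timed(_):
        start = time.perf_counter()
        do_request()  # e.g. one HTTP call against the system under test
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed, range(total_requests)))

    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile
    cuts = statistics.quantiles(latencies, n=100)
    return {"p50_s": cuts[49], "p95_s": cuts[94], "p99_s": cuts[98]}
```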

### Memory & Resource Profiling
- Measure allocation rates for hot code paths
- Detect memory leaks by monitoring heap growth over time
- Check for unclosed resources (file handles, connections, streams)
- Verify garbage collection pause times remain acceptable
- Profile object retention to identify unnecessary caching
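
One way to detect heap growth over time, sketched with Python's built-in `tracemalloc` (the `rounds` and threshold values are arbitrary examples):

```python
import gc
import tracemalloc

def check_heap_growth(workload, *, rounds=5, max_growth_bytes=1_000_000):
    """Run workload repeatedly and fail if the traced heap keeps growing."""
    tracemalloc.start()
    workload()  # first run pays one-time costs (caches, lazy imports)
    gc.collect()
    baseline, _ = tracemalloc.get_traced_memory()
    for _ in range(rounds):
        workload()
    gc.collect()  # exclude collectable garbage from the final reading
    current, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    growth = current - baseline
    assert growth < max_growth_bytes, (
        f"heap grew by {growth} bytes over {rounds} rounds; possible leak"
    )
```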

### Regression Baselines
- Capture baseline metrics from stable release builds
- Define acceptable variance thresholds for each metric
- Automate comparison between current and baseline results
- Alert on regressions exceeding defined thresholds
- Store historical performance data for trend analysis
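
A sketch of an automated baseline comparison suitable for a CI gate; the metric names, threshold ratios, and JSON file layout are assumptions, not a prescribed format:

```python
import json
import sys
from pathlib import Path

# Allowed ratio of current/baseline per metric: >1 bounds a metric that
# regresses upward (latency), <1 one that regresses downward (throughput).
THRESHOLDS = {"p95_latency_ms": 1.10, "throughput_rps": 0.95}

def compare(baseline_path, current_path):
    """Exit non-zero when any metric regresses beyond its threshold."""
    baseline = json.loads(Path(baseline_path).read_text())
    current = json.loads(Path(current_path).read_text())
    failures = []
    for metric, limit in THRESHOLDS.items():
        ratio = current[metric] / baseline[metric]
        regressed = ratio > limit if limit > 1 else ratio < limit
        if regressed:
            failures.append(f"{metric}: {baseline[metric]} -> {current[metric]}")
    if failures:
        print("Performance regression detected:")
        print("\n".join(failures))
        sys.exit(1)

if __name__ == "__main__":
    compare(sys.argv[1], sys.argv[2])
```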

## Output Format

Report test plan with priority ratings:

| Priority | Description |
|----------|-------------|
| Must | Benchmarks for critical hot paths and SLA-bound operations |
| Should | Load tests covering primary user-facing workflows |
| Could | Profiling for optimization candidates and memory analysis |
| Won't | Micro-benchmarks with negligible real-world impact |

Overview

This skill helps create performance tests and guardrails to detect regressions in latency, throughput, and resource usage. It provides a checklist and templates for benchmarks, load tests, memory profiling, and regression baselines. Use it to turn performance goals into repeatable, automated tests tied to CI and release workflows.

How this skill works

The skill guides you to design benchmarks with clear metrics, warm-up runs, and isolation from measurement overhead. It outlines load testing scenarios, memory and resource profiling steps, and procedures to capture and compare regression baselines. Finally, it recommends automation and storage of historical results so changes trigger alerts when thresholds are exceeded.

When to use it

  • When adding benchmarks for critical hot paths or SLA-bound operations
  • Before merging changes that could affect latency, throughput, or memory
  • When validating capacity planning with sustained or spike load tests
  • To detect memory leaks or unacceptable allocation growth over time
  • When establishing automated regression comparisons in CI

Best practices

  • Define clear metrics (p50, p95, p99, throughput, allocation rate) and acceptable variance
  • Use warm-up iterations and sufficient samples for statistically meaningful results
  • Isolate benchmarked code from measurement and minimize tooling overhead
  • Document hardware, OS, JVM/runtime, and environment variables for reproducibility
  • Automate baseline capture, comparison, and alerts; retain historical data for trends

Example use cases

  • Create a microbenchmark for a database access hot path with warm-up and iteration control
  • Run a load test that ramps to peak production traffic and measures p95/p99 latencies
  • Profile memory allocation for a service under sustained load to spot leaks
  • Set a CI job that compares current run to a stored baseline and fails on regressions
  • Simulate spike traffic to verify graceful degradation and resource exhaustion behavior

FAQ

How many iterations do I need for a benchmark?

There is no fixed count: include warm-up runs, then keep sampling until the metrics converge, using statistical analysis (for example, a coefficient-of-variation threshold) to confirm stability.
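
A minimal convergence loop, assuming a coefficient-of-variation cutoff (the specific limits are illustrative):

```python
import statistics
import time

def run_until_stable(fn, *, min_samples=30, max_samples=5000, cv_target=0.02):
    """Sample until the coefficient of variation drops below cv_target."""
    samples = []
    while len(samples) < max_samples:
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
        if len(samples) >= min_samples:
            cv = statistics.stdev(samples) / statistics.mean(samples)
            if cv < cv_target:
                break  # metrics have converged
    return samples
```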

What baselines should I store?

Store stable-release metrics for key latency percentiles, throughput, CPU and memory usage, and allocation rates, along with environment metadata (hardware, OS, runtime versions) for reproducibility.
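
For example, one stored baseline record might look like the JSON below (field names and values are illustrative, not a required schema):

```json
{
  "version": "2.3.1",
  "p50_latency_ms": 12.4,
  "p95_latency_ms": 38.9,
  "p99_latency_ms": 71.0,
  "throughput_rps": 1450,
  "max_rss_mb": 512,
  "environment": {
    "cpu": "8-core x86_64",
    "os": "Linux 6.1",
    "runtime": "CPython 3.12"
  }
}
```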