This skill benchmarks SDK performance across versions, detects regressions, and generates visual reports to guide optimization and release decisions.
npx playbooks add skill a5c-ai/babysitter --skill performance-benchmark-suite
---
name: performance-benchmark-suite
description: SDK performance benchmarking and regression detection
allowed-tools:
- Read
- Write
- Edit
- Glob
- Grep
- Bash
---
# Performance Benchmark Suite Skill
## Overview
This skill implements comprehensive SDK performance benchmarking: it tracks latency, throughput, and memory usage, and detects performance regressions across versions.
## Capabilities
- Measure latency percentiles (p50, p95, p99)
- Track memory usage and allocation patterns
- Detect performance regressions automatically
- Generate visual benchmark reports
- Compare performance across SDK versions
- Implement microbenchmarks for critical paths
- Configure continuous benchmarking in CI
- Support load testing scenarios
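To illustrate the first capability, latency percentiles can be derived from raw samples with the nearest-rank method. This is a minimal sketch rather than the skill's own implementation; the names `percentile` and `latency_summary` and the sample data are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Return the p50, p95, and p99 latencies for a list of samples in milliseconds."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}


# Hypothetical latency samples (assumption): 1 ms, 2 ms, ..., 100 ms
samples = [float(i) for i in range(1, 101)]
summary = latency_summary(samples)
print(summary)
```

Nearest-rank always returns a value that was actually observed, which keeps tail latencies easy to trace back to individual requests. Interpolating methods are an equally valid choice.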
## Target Processes
- Performance Benchmarking
- SDK Testing Strategy
- SDK Versioning and Release Management
## Integration Points
- k6 for load testing
- Artillery for HTTP benchmarking
- hyperfine for CLI benchmarking
- Benchmark.js for JavaScript
- pytest-benchmark for Python
- Continuous benchmark systems (Bencher)
## Input Requirements
- Performance requirements (SLOs)
- Benchmark scenarios
- Baseline versions for comparison
- Environment specifications
- Reporting requirements
## Output Artifacts
- Benchmark test suite
- Performance baseline data
- Regression detection rules
- Visual benchmark reports
- CI benchmark configuration
- Historical trend analysis
## Usage Example
```yaml
skill:
  name: performance-benchmark-suite
  context:
    tool: k6
    scenarios:
      - name: basic-crud
        operations: ["create", "read", "update", "delete"]
        vus: 10
        duration: "30s"
      - name: high-load
        vus: 100
        duration: "5m"
    slos:
      p95_latency: "100ms"
      p99_latency: "500ms"
      error_rate: "0.1%"
    compareWith: "v1.0.0"
    regressionThreshold: "10%"
```
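The `slos` values in the example are strings such as `"100ms"` and `"0.1%"`. The sketch below shows one way such values might be parsed and checked against measured results. It assumes every SLO is an upper bound, that latencies are measured in milliseconds, and that the error rate is measured in percent. The helper names and the `measured` data are hypothetical, and how the skill itself evaluates SLOs is not specified here.

```python
def parse_duration_ms(text):
    """Convert strings such as "100ms", "30s", or "5m" to milliseconds."""
    # "ms" is listed first so that "100ms" is not mistaken for seconds.
    units = (("ms", 1.0), ("s", 1000.0), ("m", 60000.0))
    for suffix, factor in units:
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * factor
    raise ValueError(f"unrecognized duration: {text!r}")


def parse_percent(text):
    """Convert a string such as "0.1%" to a number of percent (0.1)."""
    return float(text.rstrip("%"))


def check_slos(slos, measured):
    """Return a list of violations; an empty list means every SLO holds."""
    violations = []
    for key, limit_text in slos.items():
        if limit_text.endswith("%"):
            limit = parse_percent(limit_text)
        else:
            limit = parse_duration_ms(limit_text)
        # Assumption: each SLO is an upper bound on the measured value.
        if measured[key] > limit:
            violations.append(f"{key}: {measured[key]} exceeds {limit_text}")
    return violations


slos = {"p95_latency": "100ms", "p99_latency": "500ms", "error_rate": "0.1%"}
# Hypothetical measurements: latencies in ms, error rate in percent.
measured = {"p95_latency": 87.0, "p99_latency": 612.0, "error_rate": 0.05}
violations = check_slos(slos, measured)
print(violations)
```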
## Best Practices
1. Establish baselines before optimization
2. Track percentiles, not just averages
3. Run benchmarks in consistent environments
4. Automate regression detection in CI
5. Monitor memory alongside latency
6. Document benchmark methodology
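Practices 2 and 5 can be combined in a small harness that records per-call latency and peak memory for the same workload. The following is a minimal sketch using only the Python standard library; `measure` and `build_payload` are hypothetical names, and dedicated tools such as pytest-benchmark provide more rigorous timing.

```python
import time
import tracemalloc


def measure(func, iterations=100):
    """Return per-call latencies in ms and the peak traced memory in bytes for one call."""
    latencies_ms = []
    # Timing pass: tracemalloc is off here so its overhead does not skew latencies.
    for _ in range(iterations):
        start = time.perf_counter()
        func()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    # Memory pass: trace Python allocations made during a single call.
    tracemalloc.start()
    try:
        func()
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return latencies_ms, peak_bytes


def build_payload():
    """Hypothetical workload standing in for an SDK call."""
    return [str(i) for i in range(10_000)]


latencies_ms, peak_bytes = measure(build_payload, iterations=50)
print(f"calls: {len(latencies_ms)}, slowest: {max(latencies_ms):.3f} ms, peak: {peak_bytes} bytes")
```

Timing and memory tracing run in separate passes because tracing slows allocation-heavy code. `tracemalloc` only sees memory allocated through Python's allocator, so native allocations made by extension modules may not be counted.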
This skill implements an SDK performance benchmarking and regression detection suite for JavaScript projects. It measures latency, throughput, and memory usage, and generates visual reports that compare SDK versions. The suite is designed to run locally or in CI and to alert on regressions against established baselines.
The skill runs microbenchmarks and load tests using tools like Benchmark.js, k6, Artillery, and hyperfine, collecting percentiles (p50, p95, p99), throughput, and memory metrics. It stores baseline data, applies configurable regression rules, and produces trend reports and CI-friendly pass/fail outputs. Visual reports and historical analysis help pinpoint regressions and performance hotspots.
What inputs are needed to run the suite?
Provide benchmark scenarios, SLOs, baseline version, environment specs, and any CI configuration.
How are regressions detected?
The suite compares current results to baseline data using configurable thresholds and flags a failure when a metric regresses by more than the configured threshold.
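A threshold comparison of this kind can be sketched as follows. The sketch assumes that lower is better for every metric and that the threshold is a relative change such as `"10%"`. The function names, metric names, and numbers are hypothetical, not the suite's actual rules.

```python
def parse_threshold(text):
    """Convert a string such as "10%" to a fraction (0.10)."""
    return float(text.rstrip("%")) / 100.0


def find_regressions(baseline, current, threshold="10%"):
    """Return {metric: relative_change} for metrics that grew by more than the threshold.

    Assumption: lower is better for every metric. A higher-is-better metric
    such as throughput would need the comparison inverted.
    """
    limit = parse_threshold(threshold)
    regressions = {}
    for name, base in baseline.items():
        if name not in current or base <= 0:
            continue  # skip metrics that cannot be compared
        change = (current[name] - base) / base
        if change > limit:
            regressions[name] = change
    return regressions


# Hypothetical results for a baseline version and the current build.
baseline = {"p95_ms": 80.0, "p99_ms": 200.0, "peak_mem_mb": 50.0}
current = {"p95_ms": 84.0, "p99_ms": 250.0, "peak_mem_mb": 50.0}

regressions = find_regressions(baseline, current, threshold="10%")
exit_code = 1 if regressions else 0  # a CI job could exit with this status
for name, change in regressions.items():
    print(f"REGRESSION {name}: +{change:.0%} vs baseline")
```

Here `p95_ms` grew by 5%, which is within the threshold, while `p99_ms` grew by 25% and is flagged.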