
This skill benchmarks SDK performance across versions, detects regressions, and generates visual reports to guide optimization and release decisions.

npx playbooks add skill a5c-ai/babysitter --skill performance-benchmark-suite


SKILL.md
---
name: performance-benchmark-suite
description: SDK performance benchmarking and regression detection
allowed-tools:
  - Read
  - Write
  - Edit
  - Glob
  - Grep
  - Bash
---

# Performance Benchmark Suite Skill

## Overview

This skill implements comprehensive SDK performance benchmarking: it tracks latency, throughput, and memory usage, and detects performance regressions across versions.

## Capabilities

- Measure latency percentiles (p50, p95, p99)
- Track memory usage and allocation patterns
- Detect performance regressions automatically
- Generate visual benchmark reports
- Compare performance across SDK versions
- Implement microbenchmarks for critical paths
- Configure continuous benchmarking in CI
- Support load testing scenarios

## Target Processes

- Performance Benchmarking
- SDK Testing Strategy
- SDK Versioning and Release Management

## Integration Points

- k6 for load testing
- Artillery for HTTP benchmarking
- hyperfine for CLI benchmarking
- Benchmark.js for JavaScript
- pytest-benchmark for Python
- Continuous benchmark systems (Bencher)

## Input Requirements

- Performance requirements (SLOs)
- Benchmark scenarios
- Baseline versions for comparison
- Environment specifications
- Reporting requirements

## Output Artifacts

- Benchmark test suite
- Performance baseline data
- Regression detection rules
- Visual benchmark reports
- CI benchmark configuration
- Historical trend analysis

## Usage Example

```yaml
skill:
  name: performance-benchmark-suite
  context:
    tool: k6
    scenarios:
      - name: basic-crud
        operations: ["create", "read", "update", "delete"]
        vus: 10
        duration: "30s"
      - name: high-load
        vus: 100
        duration: "5m"
    slos:
      p95_latency: "100ms"
      p99_latency: "500ms"
      error_rate: "0.1%"
    compareWith: "v1.0.0"
    regressionThreshold: "10%"
```
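
The SLO values in the example are strings with units (`"100ms"`, `"0.1%"`), so something has to turn them into numbers before they can be compared with measurements. The sketch below shows one way to do that in Python. The helper names (`parse_duration_ms`, `parse_percent`, `check_slos`) and the shape of the `measured` dictionary are assumptions made for illustration; they are not part of the skill or of k6.

```python
import re

# Hypothetical helpers: convert SLO strings from the config above into numbers.
_UNITS_MS = {"us": 0.001, "ms": 1.0, "s": 1000.0, "m": 60_000.0}


def parse_duration_ms(text: str) -> float:
    """Convert a duration such as "100ms", "30s", or "5m" to milliseconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(us|ms|s|m)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS_MS[unit]


def parse_percent(text: str) -> float:
    """Convert a percentage such as "0.1%" to a fraction (0.001)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*%\s*", text)
    if match is None:
        raise ValueError(f"unrecognised percentage: {text!r}")
    return float(match.group(1)) / 100.0


def check_slos(slos: dict, measured: dict) -> list:
    """Return a list of SLO violations (empty when every SLO is met).

    Assumes `measured` holds latencies in milliseconds and rates as fractions,
    keyed by the same names as `slos`. Metrics missing from `measured` are skipped.
    """
    violations = []
    for key, limit_text in slos.items():
        if key not in measured:
            continue
        if limit_text.strip().endswith("%"):
            limit = parse_percent(limit_text)
        else:
            limit = parse_duration_ms(limit_text)
        if measured[key] > limit:
            violations.append(f"{key}: {measured[key]} exceeds {limit_text}")
    return violations
```

Calling `check_slos` with the `slos` block from the example and a dictionary of measured values returns an empty list when all limits hold, which makes it easy to use as a pass/fail gate.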

## Best Practices

1. Establish baselines before optimization
2. Track percentiles, not just averages
3. Run benchmarks in consistent environments
4. Automate regression detection in CI
5. Monitor memory alongside latency
6. Document benchmark methodology
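
To illustrate practice 5, here is a minimal Python sketch that records wall-clock timings and peak memory for one function. It uses only the standard library (`time.perf_counter` and `tracemalloc`); the `measure` function and its return shape are hypothetical. Timing and memory tracing run in separate phases because `tracemalloc` slows execution and would distort the latency numbers.

```python
import time
import tracemalloc


def measure(func, *args, repeats=5):
    """Time `func` over several runs, then trace peak memory in one extra run.

    Hypothetical helper: returns per-run timings in milliseconds and the peak
    number of bytes allocated by Python objects during the traced run.
    """
    # Phase 1: timing without tracing, so allocation hooks do not add overhead.
    timings_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    # Phase 2: a single traced run to capture peak memory.
    tracemalloc.start()
    try:
        func(*args)
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()

    return {"timings_ms": timings_ms, "peak_bytes": peak_bytes}
```

Note that `tracemalloc` only sees allocations made through Python's memory allocator, so memory used by native extensions may not appear in `peak_bytes`.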

Overview

This skill implements an SDK performance benchmarking and regression detection suite for JavaScript projects. It measures latency, throughput, and memory usage, and generates visual reports that compare SDK versions. The suite is designed to run locally or in CI and to alert on regressions against established baselines.

How this skill works

The skill runs microbenchmarks and load tests using tools like Benchmark.js, k6, Artillery, and hyperfine, collecting latency percentiles (p50, p95, p99), throughput, and memory metrics. It stores baseline data, applies configurable regression rules, and produces trend reports and CI-friendly pass/fail outputs. Visual reports and historical analysis help pinpoint regressions and performance hotspots.
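
As a rough illustration of the metrics described above, the following Python sketch computes p50, p95, and p99 from raw latency samples using the nearest-rank method, plus a simple throughput figure. The function names and the choice of nearest-rank are assumptions; the tools named above each have their own percentile implementations, which may interpolate differently.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of data at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing to avoid rounding errors such as 0.57 * 100.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(latencies_ms, duration_s):
    """Summarize one benchmark run (hypothetical result shape)."""
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
        "throughput_rps": len(latencies_ms) / duration_s,
    }
```

A summary like this can be stored per version as baseline data and compared against later runs.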

When to use it

  • Before and after SDK releases to detect regressions
  • When validating performance impact of code changes or refactors
  • To establish and track performance baselines and SLO compliance
  • During CI runs for automated guardrails against slowdowns
  • For load testing critical API paths and client workflows

Best practices

  • Define clear SLOs and regression thresholds before benchmarking
  • Record percentiles (p50/p95/p99) and memory metrics, not just averages
  • Run benchmarks in consistent, isolated environments to reduce noise
  • Automate benchmark execution and regression checks in CI pipelines
  • Keep baseline versions and methodology documented for reproducibility

Example use cases

  • Compare p95 latency between v1.0.0 and current branch to detect regressions
  • Run k6 scenarios for CRUD operations to validate load behavior
  • Use hyperfine to benchmark CLI command performance after optimization
  • Integrate Benchmark.js microbenchmarks for hot-path function profiling
  • Fail CI when p99 latency regresses by more than a configured percentage

FAQ

What inputs are needed to run the suite?

Provide benchmark scenarios, SLOs, a baseline version for comparison, environment specs, reporting requirements, and any existing CI configuration.

How are regressions detected?

The suite compares current results to baseline data using configurable thresholds and flags a failure when a metric degrades by more than the regression threshold (for example, 10%).
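
A threshold comparison of this kind can be sketched in a few lines of Python. The function names and metric keys below are hypothetical, and the sketch assumes every metric is "higher is worse" (latency, memory); a metric such as throughput would need the sign flipped.

```python
def percent_change(baseline, current):
    """Relative change from baseline to current, in percent."""
    return (current - baseline) * 100.0 / baseline


def detect_regressions(baseline, current, threshold_pct):
    """Return {metric: percent change} for metrics that grew by more than threshold_pct.

    Assumes higher values are worse. Metrics missing from `current`, or with a
    non-positive baseline, are skipped rather than guessed at.
    """
    regressions = {}
    for name, base in baseline.items():
        if name not in current or base <= 0:
            continue
        change = percent_change(base, current[name])
        if change > threshold_pct:
            regressions[name] = change
    return regressions


def exit_code(regressions):
    """CI-friendly result: 1 when any regression was found, otherwise 0."""
    return 1 if regressions else 0
```

In a CI job, the value from `exit_code` could be passed to `sys.exit` so the pipeline fails when any metric crosses the threshold.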