
performance-testing skill

/plugins/ork/skills/performance-testing

This skill helps validate system performance under load using k6 and Locust, establishing baselines and identifying bottlenecks.

npx playbooks add skill yonatangross/orchestkit --skill performance-testing

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: performance-testing
description: Performance and load testing with k6 and Locust. Use when validating system performance under load, stress testing, identifying bottlenecks, or establishing performance baselines.
tags: [testing, performance, load, stress]
context: fork
agent: metrics-architect
version: 1.0.0
author: OrchestKit
user-invocable: false
---

# Performance Testing

Validate system behavior under load.

## k6 Load Test (JavaScript)

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 20 },  // Ramp up
    { duration: '1m', target: 20 },   // Steady
    { duration: '30s', target: 0 },   // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],  // 95% under 500ms
    http_req_failed: ['rate<0.01'],    // <1% errors
  },
};

export default function () {
  const res = http.get('http://localhost:8500/api/health');

  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 200ms': (r) => r.timings.duration < 200,
  });

  sleep(1);
}
```
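
Run the script with `k6 run tests/load/api.js` (the path matches the CI example later in this document). k6 prints check results and threshold pass/fail status at the end of the run, and exits with a non-zero status if any threshold is breached.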

## Locust Load Test (Python)

```python
from locust import HttpUser, task, between

class APIUser(HttpUser):
    wait_time = between(1, 3)

    @task(3)
    def get_analyses(self):
        self.client.get("/api/analyses")

    @task(1)
    def create_analysis(self):
        self.client.post(
            "/api/analyses",
            json={"url": "https://example.com"}
        )

    def on_start(self):
        """Login before tasks."""
        self.client.post("/api/auth/login", json={
            "email": "[email protected]",
            "password": "password"
        })
```
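
A typical headless run is `locust -f locustfile.py --headless -u 50 -r 5 --run-time 5m --host http://localhost:8500` (the file name, user count, spawn rate, and host are illustrative). Omit `--headless` to drive the test from the Locust web UI instead.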

## Test Types

### Load Test
```javascript
// Normal expected load
export const options = {
  vus: 50,           // Virtual users
  duration: '5m',    // Duration
};
```

### Stress Test
```javascript
// Find breaking point
export const options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '2m', target: 200 },
    { duration: '2m', target: 300 },
    { duration: '2m', target: 400 },
  ],
};
```

### Spike Test
```javascript
// Sudden traffic surge
export const options = {
  stages: [
    { duration: '10s', target: 10 },
    { duration: '1s', target: 1000 },  // Spike!
    { duration: '3m', target: 1000 },
    { duration: '10s', target: 10 },
  ],
};
```

### Soak Test
```javascript
// Sustained load to surface memory leaks and slow resource degradation
export const options = {
  vus: 50,
  duration: '4h',
};
```

## Metrics to Track

```javascript
import http from 'k6/http';
import { Trend, Counter, Rate } from 'k6/metrics';

// Custom metrics supplement k6's built-ins (http_req_duration, http_req_failed).
const responseTime = new Trend('response_time');
const errors = new Counter('errors');
const successRate = new Rate('success_rate');

export default function () {
  const res = http.get('http://localhost:8500/api/data');

  responseTime.add(res.timings.duration);  // server response time in ms

  if (res.status !== 200) {
    errors.add(1);
    successRate.add(false);
  } else {
    successRate.add(true);
  }
}
```

## CI Integration

```yaml
# GitHub Actions
- name: Run k6 load test
  # k6 exits with a non-zero status when any threshold fails,
  # so this step fails the job automatically on a breach.
  run: |
    k6 run --out json=results.json tests/load/api.js

- name: Upload raw k6 results
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: k6-results
    path: results.json
```

## Key Decisions

| Decision | Recommendation |
|----------|----------------|
| Tool | k6 (JS), Locust (Python) |
| Load profile | Start with expected traffic |
| Thresholds | p95 < 500ms, errors < 1% |
| Duration | 5-10 min for load, 4h+ for soak |
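
As a starting point, the defaults above translate directly into k6 options. This is a minimal sketch; the 50-VU, 5-minute profile is a baseline recommendation, not a requirement:

```javascript
export const options = {
  vus: 50,            // expected concurrent traffic
  duration: '5m',     // 5-10 min is enough for a load test
  thresholds: {
    http_req_duration: ['p(95)<500'],  // p95 latency budget
    http_req_failed: ['rate<0.01'],    // <1% errors
  },
};
```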

## Common Mistakes

- Testing against production without protection
- No warmup period (see the sketch below)
- Unrealistic load profiles
- Missing error rate thresholds
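
The sketch below addresses the last three mistakes: it ramps through an explicit warmup stage before holding a realistic load, and it sets an error-rate threshold that aborts the run early when breached. The endpoint and numbers are illustrative:

```javascript
import http from 'k6/http';

export const options = {
  stages: [
    { duration: '1m', target: 10 },   // warmup: caches, connection pools
    { duration: '2m', target: 50 },   // ramp to realistic expected load
    { duration: '5m', target: 50 },   // steady state
    { duration: '1m', target: 0 },    // ramp down
  ],
  thresholds: {
    // Abort the test early if more than 1% of requests fail.
    http_req_failed: [{ threshold: 'rate<0.01', abortOnFail: true }],
    http_req_duration: ['p(95)<500'],
  },
};

export default function () {
  http.get('http://localhost:8500/api/health');
}
```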

## Related Skills

- `observability-monitoring` - Metrics collection
- `performance-optimization` - Fixing bottlenecks
- `e2e-testing` - Functional validation

## Capability Details

### load-testing
**Keywords:** load test, concurrent users, k6, Locust, ramp up
**Solves:**
- Simulate concurrent user load
- Configure ramp-up patterns
- Test system under expected load

### stress-testing
**Keywords:** stress test, breaking point, peak load, overload
**Solves:**
- Find system breaking points
- Test beyond expected capacity
- Identify failure modes under stress

### latency-measurement
**Keywords:** latency, response time, p95, p99, percentile
**Solves:**
- Measure response time percentiles
- Track latency distribution
- Set latency SLO thresholds
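
Percentile-based SLOs map directly onto k6 thresholds. A sketch (the endpoint tag and budgets are illustrative) that tracks p95 and p99 globally plus a tighter budget for one tagged endpoint:

```javascript
import http from 'k6/http';

export const options = {
  thresholds: {
    // Global latency SLOs across all requests.
    http_req_duration: ['p(95)<500', 'p(99)<1000'],
    // Tighter budget for the health endpoint only (tag-based sub-metric).
    'http_req_duration{name:health}': ['p(99)<200'],
  },
};

export default function () {
  http.get('http://localhost:8500/api/health', { tags: { name: 'health' } });
}
```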

### throughput-testing
**Keywords:** throughput, requests per second, RPS, TPS
**Solves:**
- Measure maximum throughput
- Test transactions per second
- Verify capacity requirements
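
Throughput targets are easier to verify with an arrival-rate executor, which drives a fixed request rate instead of a fixed VU count. A minimal sketch; the 100 req/s target and VU limits are illustrative assumptions:

```javascript
import http from 'k6/http';

export const options = {
  scenarios: {
    steady_rps: {
      executor: 'constant-arrival-rate',
      rate: 100,             // target iterations (requests) per second
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 50,   // VUs k6 keeps ready to sustain the rate
      maxVUs: 200,           // hard cap if responses slow down
    },
  },
  thresholds: {
    http_reqs: ['rate>=95'],  // verify the achieved requests per second
  },
};

export default function () {
  http.get('http://localhost:8500/api/health');
}
```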

### bottleneck-identification
**Keywords:** bottleneck, profiling, hot path, performance issue
**Solves:**
- Identify performance bottlenecks
- Profile critical code paths
- Diagnose slow operations
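
Load tools cannot profile server code, but splitting response time into phases helps localize a bottleneck (DNS/connect vs. server processing vs. payload transfer) before reaching for a profiler. A sketch using k6's per-request timings, with the endpoint as an illustrative assumption:

```javascript
import http from 'k6/http';
import { Trend } from 'k6/metrics';

// Phase-level timing trends: compare their p95s to see where time goes.
const connecting = new Trend('phase_connecting', true);
const waiting = new Trend('phase_waiting', true);      // time-to-first-byte (server work)
const receiving = new Trend('phase_receiving', true);  // payload transfer

export default function () {
  const res = http.get('http://localhost:8500/api/data');

  connecting.add(res.timings.connecting);
  waiting.add(res.timings.waiting);
  receiving.add(res.timings.receiving);
}
```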

Overview

This skill provides performance and load testing patterns using k6 and Locust to validate system behavior under load. It helps teams establish baselines, find breaking points, and measure latency, throughput, and error rates. Use it to run load, stress, spike, and soak tests with actionable thresholds and CI integration.

How this skill works

The skill supplies example k6 scripts (JavaScript) and Locust scenarios (Python) to simulate virtual users and traffic patterns. It includes recommended test profiles (ramp-up, steady, ramp-down), metrics collection (response time, errors, success rate), and threshold checks (p95 latency, error rate). CI snippets show how to run tests and fail builds on threshold breaches.

When to use it

  • Before major releases to validate performance under expected traffic
  • When investigating stability or finding system breaking points
  • To establish performance SLOs and measure p95/p99 latency
  • During capacity planning to verify throughput and RPS
  • To detect memory leaks or resource degradation with soak tests

Best practices

  • Start with a warmup period and realistic ramp-up profiles
  • Define clear thresholds (e.g., p95 < 500ms, errors < 1%) and fail CI when exceeded
  • Test in a controlled environment or behind production protections
  • Monitor application and infrastructure metrics alongside load tests
  • Run multiple test types (load, stress, spike, soak) to cover scenarios

Example use cases

  • Run a 5–10 minute load test with 50 virtual users to validate normal traffic performance
  • Execute a stress test that increases VUs gradually to find the breaking point
  • Simulate a sudden spike to validate autoscaling and failure modes
  • Run a 4-hour soak test to uncover memory leaks and long-term resource issues
  • Integrate k6 into GitHub Actions to block merges when thresholds fail

FAQ

Which tool should I pick: k6 or Locust?

Use k6 for JavaScript-friendly, CI-native load tests and metrics integration. Use Locust for Python-based test code or when you need complex user behavior modeling.

What thresholds are recommended?

A practical starting point is p95 < 500ms and error rate < 1%, then tighten based on SLAs and real user expectations.