testing-load-designer skill

This skill helps design realistic load testing scenarios across k6, JMeter, Gatling, or Locust with ramp-up, think time, and SLI validation.

npx playbooks add skill williamzujkowski/cognitive-toolworks --skill testing-load-designer

SKILL.md
---
name: "Load Testing Scenario Designer"
slug: testing-load-designer
description: "Design load testing scenarios using k6, JMeter, Gatling, or Locust with ramp-up patterns, think time modeling, and performance SLI validation."
capabilities:
  - generate_load_test_scripts
  - design_ramp_up_patterns
  - model_think_time
  - validate_performance_sli
  - configure_distributed_load
inputs:
  target_service:
    type: string
    description: "URL or endpoint to test (e.g., https://api.example.com/checkout)"
    required: true
  test_type:
    type: enum
    description: "Type of load test to design"
    enum: ["load", "stress", "spike", "soak"]
    required: true
  sli_requirements:
    type: object
    description: "Performance SLI thresholds (p95_latency_ms, throughput_rps, error_rate_percent)"
    required: true
  tool:
    type: enum
    description: "Load testing tool to generate script for"
    enum: ["k6", "jmeter", "gatling", "locust"]
    required: false
    default: "k6"
  scenario_details:
    type: object
    description: "User count, duration, ramp-up time, think time distribution"
    required: false
outputs:
  test_script:
    type: code
    description: "Executable load test script for specified tool"
  test_config:
    type: json
    description: "Test configuration with VUs, duration, ramp-up, stages"
  assertions:
    type: json
    description: "SLI validation thresholds and success criteria"
  execution_plan:
    type: markdown
    description: "How to execute the test with prerequisites and validation steps"
keywords:
  - load-testing
  - performance-testing
  - k6
  - jmeter
  - gatling
  - locust
  - sli-validation
  - ramp-up
  - stress-testing
  - spike-testing
  - soak-testing
  - performance-engineering
version: 1.0.0
owner: william@cognitive-toolworks
license: MIT
security:
  pii: false
  secrets: false
  sandbox: recommended
links:
  - https://k6.io/docs/
  - https://grafana.com/docs/k6/latest/using-k6/
  - https://gatling.io/docs/
  - https://jmeter.apache.org/usermanual/
  - https://docs.locust.io/
  - https://sre.google/sre-book/monitoring-distributed-systems/
---

## Purpose & When-To-Use

**Trigger conditions:**

- Validating application performance before production deployment
- Establishing performance baselines and capacity planning
- Testing system behavior under peak load, stress, or spike conditions
- Validating SLI/SLO compliance for latency, throughput, and error rates
- Simulating realistic user behavior with ramp-up and think time
- Testing distributed system resilience under sustained load (soak testing)

**Use this skill when** you need to design realistic, repeatable load testing scenarios with clear performance thresholds, appropriate ramp-up patterns, and tool-specific implementations for k6, JMeter, Gatling, or Locust.

---

## Pre-Checks

**Before execution, verify:**

1. **Time normalization**: `NOW_ET = 2025-10-26T02:31:21-04:00` (NIST/time.gov semantics, America/New_York)
2. **Input schema validation**:
   - `target_service` is a valid URL with protocol (http/https)
   - `test_type` is one of: load, stress, spike, soak
   - `sli_requirements` contains numeric values for at least one metric
   - `tool` (if provided) is one of: k6, jmeter, gatling, locust
   - `scenario_details` (if provided) has valid numeric ranges
3. **Source freshness**: All cited sources accessed on `NOW_ET`; verify links resolve
4. **Tool compatibility**: Confirm target service is accessible and testable

**Abort conditions:**

- Target service URL is unreachable or requires complex authentication not specified
- SLI requirements are contradictory (e.g., "10ms p95 latency" for external API)
- Test type and scenario details conflict (e.g., "spike test" with gradual ramp-up)
- Tool selection is incompatible with test requirements (e.g., complex distributed scenarios in basic Locust setup)

---

## Procedure

### T1: Fast Path (≤2k tokens)

**Goal**: Generate basic load test script with simple ramp-up and assertions.

1. **Parse inputs and apply defaults**:
   - Determine tool (default: k6)
   - Extract test type and map to pattern:
     - **load**: Gradual ramp-up to target VUs, sustain, ramp-down
     - **stress**: Gradual ramp-up beyond capacity to find breaking point
     - **spike**: Rapid jump to high VUs, sustain briefly, drop
     - **soak**: Low/moderate VUs sustained for extended duration
   - Parse SLI requirements (p95_latency_ms, throughput_rps, error_rate_percent)
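The test-type-to-pattern mapping above can be sketched as a simple lookup. This is an illustration, not tool output: the template names, fractions, and the `stages_for` helper are assumptions chosen to match the heuristics listed (targets are fractions of peak VUs, durations are fractions of total test time).

```python
# Illustrative mapping from test type to a normalized stage shape.
# Each entry: (fraction of total duration, multiple of peak VUs).
STAGE_TEMPLATES = {
    "load":   [(0.2, 1.0), (0.6, 1.0), (0.2, 0.0)],      # ramp, sustain, ramp-down
    "stress": [(0.15, 0.5), (0.15, 0.75), (0.15, 1.0),
               (0.15, 1.25), (0.15, 1.5), (0.25, 0.0)],  # step past capacity
    "spike":  [(0.05, 1.0), (0.15, 1.0), (0.05, 0.0)],   # jump, brief hold, drop
    "soak":   [(0.1, 0.6), (0.8, 0.6), (0.1, 0.0)],      # moderate VUs, long hold
}

def stages_for(test_type: str, peak_vus: int, total_minutes: int):
    """Expand a template into concrete (duration_minutes, target_vus) stages."""
    return [(round(frac * total_minutes, 1), round(mult * peak_vus))
            for frac, mult in STAGE_TEMPLATES[test_type]]
```

For example, `stages_for("load", 100, 10)` produces the same 2m/6m/2m shape the k6 script below encodes by hand.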

2. **Generate basic test script** (k6 example per [k6 docs](https://k6.io/docs/, accessed 2025-10-26)):
   ```javascript
   import http from 'k6/http';
   import { check, sleep } from 'k6';

   export const options = {
     stages: [
       { duration: '2m', target: 100 },  // Ramp-up
       { duration: '5m', target: 100 },  // Sustain
       { duration: '2m', target: 0 },    // Ramp-down
     ],
     thresholds: {
       http_req_duration: ['p(95)<500'],  // 95% <500ms
       http_req_failed: ['rate<0.01'],     // <1% errors
     },
   };

   export default function () {
     const res = http.get('https://api.example.com/checkout');
     check(res, { 'status 200': (r) => r.status === 200 });
     sleep(1);  // Think time
   }
   ```

3. **Output initial configuration**:
   ```json
   {
     "test_config": {
       "tool": "k6",
       "virtual_users": 100,
       "duration_minutes": 9,
       "ramp_up_minutes": 2,
       "think_time_seconds": 1
     },
     "assertions": {
       "p95_latency_ms": 500,
       "error_rate_percent": 1
     }
   }
   ```

**Token budget**: ≤2k tokens

---

### T2: Extended Analysis (≤6k tokens)

**Goal**: Generate realistic scenarios with advanced patterns, distributed load, and comprehensive assertions.

4. **Design realistic ramp-up pattern** based on test type:
   - **Load test** (per [k6 Load Testing](https://grafana.com/docs/k6/latest/using-k6/, accessed 2025-10-26)):
     - Gradual ramp-up: 0 → target VUs over 10-20% of total test time
     - Sustain at target: 60-70% of total test time
     - Gradual ramp-down: 10-20% of total test time
   - **Stress test**:
     - Multi-stage ramp: 0 → 50% → 75% → 100% → 125% → 150% → find breaking point
     - Shorter sustain periods at each stage (2-3 minutes)
   - **Spike test**:
     - Instant jump: 0 → peak VUs in <30 seconds
     - Brief sustain: 1-2 minutes at peak
     - Instant drop: Return to baseline
   - **Soak test**:
     - Moderate VUs (50-70% of capacity)
     - Extended duration (2-24 hours)
     - Monitor for memory leaks, degradation
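The multi-stage stress ramp described above can be generated programmatically. A sketch under stated assumptions: the step percentages and ramp/sustain minutes are the heuristics from the list, and `stress_stages` is a hypothetical helper, not part of any tool's API.

```python
def stress_stages(peak_vus: int, step_percents=(50, 75, 100, 125, 150),
                  ramp_minutes: int = 2, sustain_minutes: int = 3):
    """Build (duration_minutes, target_vus) pairs stepping past nominal capacity.

    Each step ramps to the next load level, then sustains for a short period
    so the system settles before the next increase; the final stage ramps
    back down to zero.
    """
    stages = []
    for pct in step_percents:
        target = round(peak_vus * pct / 100)
        stages.append((ramp_minutes, target))     # ramp to next level
        stages.append((sustain_minutes, target))  # hold and observe
    stages.append((ramp_minutes, 0))              # ramp-down
    return stages
```

The resulting pairs map directly onto k6 `stages` entries or Gatling injection steps.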

5. **Model think time distribution** (per [Google SRE Book - Load Testing](https://sre.google/sre-book/monitoring-distributed-systems/, accessed 2025-10-26)):
   - Use realistic user behavior patterns, not uniform sleep()
   - Apply randomization: `sleep(Math.random() * 3 + 1)` for 1-4s range
   - Consider page type: landing (5-10s), checkout (30-60s), browse (2-5s)
   - Add variance with percentile-based think time (p50: 3s, p90: 10s, p99: 30s)
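One way to get the percentile-shaped think time above is a lognormal sampler fitted to the target median and p90; its heavy tail naturally produces occasional long pauses like the p99 figure. This is a sketch: the lognormal choice and the `think_time` helper are assumptions, not part of any load-testing tool.

```python
import math
import random

def think_time(p50: float = 3.0, p90: float = 10.0, rng=None) -> float:
    """Sample a lognormal think time whose median and 90th percentile
    match the given targets (heavier tail than a uniform sleep())."""
    rng = rng or random.Random()
    mu = math.log(p50)                    # lognormal median = e^mu
    sigma = math.log(p90 / p50) / 1.2816  # 1.2816 = z-score of the 90th percentile
    return rng.lognormvariate(mu, sigma)
```

With the defaults, p99 lands near 26-27s, close to the 30s heuristic above.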

6. **Map SLI requirements to tool-specific assertions**:
   - **k6**: Use `thresholds` object with percentile syntax
   - **JMeter**: Configure Assertions (Response Assertion, Duration Assertion)
   - **Gatling**: Use `assertions` DSL with percentile checks
   - **Locust**: Custom stats collection and failure conditions
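For k6 specifically, the SLI-to-threshold mapping can be mechanized. A sketch, assuming the input key names follow this skill's `sli_requirements` schema (`to_k6_thresholds` is a hypothetical helper):

```python
def to_k6_thresholds(sli: dict) -> dict:
    """Translate sli_requirements keys into k6 `thresholds` expressions."""
    thresholds = {}
    if "p95_latency_ms" in sli:
        thresholds["http_req_duration"] = [f"p(95)<{sli['p95_latency_ms']}"]
    if "error_rate_percent" in sli:
        # k6 expresses error rate as a fraction, not a percentage
        thresholds["http_req_failed"] = [f"rate<{sli['error_rate_percent'] / 100}"]
    if "throughput_rps" in sli:
        thresholds["http_reqs"] = [f"rate>{sli['throughput_rps']}"]
    return thresholds
```

The returned dict serializes directly into the `options.thresholds` object of a generated k6 script.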

7. **Generate tool-specific advanced script**:
   - Add request tagging/grouping for multi-endpoint scenarios
   - Include custom metrics (business transactions, funnel completion)
   - Configure distributed execution parameters if needed
   - Add data parameterization (CSV for users, JSON for payloads)
   - Reference [JMeter User Manual](https://jmeter.apache.org/usermanual/, accessed 2025-10-26) for JMeter-specific patterns
   - Reference [Gatling Documentation](https://gatling.io/docs/, accessed 2025-10-26) for Gatling DSL
   - Reference [Locust Documentation](https://docs.locust.io/, accessed 2025-10-26) for Locust class-based tests

**Token budget**: ≤6k tokens total (including T1)

---

### T3: Deep Dive (≤12k tokens)

**Goal**: Advanced patterns including distributed load, custom protocols, and comprehensive monitoring integration.

8. **Design distributed load generation**:
   - **k6 Cloud/Enterprise**: Configure multiple load zones (US-East, US-West, EU-West)
   - **JMeter Distributed**: Controller/worker configuration with RMI
   - **Gatling Enterprise**: Inject distribution across multiple nodes
   - **Locust Distributed**: Master-worker architecture with load distribution

9. **Add advanced test patterns**:
   - **Breakpoint testing**: Incrementally increase load until system breaks
   - **Capacity testing**: Find maximum sustainable throughput
   - **Endurance patterns**: Multi-day soak with scheduled load variations
   - **Recovery testing**: Inject load spikes, measure recovery time

10. **Integrate with observability stack** (per [Google SRE - Monitoring](https://sre.google/sre-book/monitoring-distributed-systems/, accessed 2025-10-26)):
    - Configure Prometheus remote-write for k6 metrics
    - Set up Grafana dashboards for real-time visualization
    - Add CloudWatch/Datadog integration for cloud metrics correlation
    - Configure distributed tracing correlation (OpenTelemetry)

11. **Generate comprehensive execution plan**:
    - Pre-test validation: Smoke test, baseline collection
    - Test execution: Monitoring checklist, abort criteria
    - Post-test analysis: Report generation, SLI compliance validation
    - Iterative tuning: Adjust VUs/duration based on results

**Token budget**: ≤12k tokens total (including T1 + T2)

---

## Decision Rules

**Test type selection guidance:**

- **Load test**: Normal expected traffic + 20-50% headroom
- **Stress test**: 2-3x expected peak load to find breaking point
- **Spike test**: 5-10x sudden traffic surge (flash sale, DDoS simulation)
- **Soak test**: 50-70% capacity sustained 2-24 hours (memory leak detection)

**VU calculation** (requests per second → virtual users, via Little's law):

```
VUs = target_RPS × (response_time_seconds + think_time_seconds)

Example:
- Target: 1000 RPS
- Response time: 200ms (0.2s)
- Think time: 1s per iteration
- VUs = 1000 × (0.2 + 1.0) = 1200 VUs
```
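This sizing is Little's law: concurrency equals throughput times the full iteration time (response plus think time); carried at full precision the worked numbers yield 1200 VUs. A minimal sketch (`required_vus` is an illustrative helper):

```python
import math

def required_vus(target_rps: float, response_time_s: float, think_time_s: float) -> int:
    """Little's law: each VU completes one iteration every
    (response_time + think_time) seconds, so concurrency = RPS x iteration time."""
    return math.ceil(target_rps * (response_time_s + think_time_s))
```

Rounding up keeps the fleet at or above the target rate when the division is not exact.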

**Tool selection matrix:**

| Feature | k6 | JMeter | Gatling | Locust |
|---------|-----|---------|---------|--------|
| Ease of use | High | Medium | Medium | High |
| Protocol support | HTTP/WebSocket/gRPC | Any (plugins) | HTTP/WebSocket/JMS | HTTP/Custom |
| Distributed | Cloud/Enterprise | Built-in (RMI) | Enterprise | Built-in |
| Scripting | JavaScript | GUI + Groovy | Scala DSL | Python |
| Best for | Modern APIs, DevOps | Legacy/complex protocols | JVM apps, high load | Python devs, simple APIs |

**SLI threshold recommendations** (from [Google SRE Book](https://sre.google/sre-book/monitoring-distributed-systems/, accessed 2025-10-26)):

- **Latency**: p50 <100ms, p95 <500ms, p99 <1s (API endpoints)
- **Throughput**: Based on capacity planning (RPS per instance × instance count)
- **Error rate**: <0.1% (four nines reliability), <1% (three nines)
- **Availability**: 99.9% (43.2 min/month downtime), 99.95% (21.6 min/month)
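The downtime figures follow directly from the availability percentage over a 30-day month; a quick sketch (`monthly_downtime_minutes` is an illustrative helper):

```python
def monthly_downtime_minutes(availability_percent: float, days: int = 30) -> float:
    """Error budget in minutes for a given availability target."""
    return (1 - availability_percent / 100) * days * 24 * 60
```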

**Stop conditions:**

- If target service returns 5xx errors during smoke test: abort and fix service
- If SLI requirements are unattainable (require <10ms p95 for external API): renegotiate
- If test script complexity exceeds tool capabilities: recommend tool change

---

## Output Contract

**Required fields** (all outputs):

```typescript
interface LoadTestScript {
  tool: "k6" | "jmeter" | "gatling" | "locust";
  script_content: string;          // Executable test script
  script_language: string;         // "javascript", "xml", "scala", "python"
  entry_point: string;             // How to execute (e.g., "k6 run script.js")
}

interface TestConfig {
  tool: string;
  test_type: "load" | "stress" | "spike" | "soak";
  virtual_users: number | object;  // Number or stages array
  duration_minutes: number;
  ramp_up_pattern: Array<{
    stage: number;
    duration_seconds: number;
    target_vus: number;
  }>;
  think_time_config: {
    min_seconds: number;
    max_seconds: number;
    distribution: "uniform" | "normal" | "exponential";
  };
  distributed_config?: {
    enabled: boolean;
    load_zones?: string[];
    workers?: number;
  };
}

interface Assertions {
  latency_thresholds: {
    p50_ms?: number;
    p95_ms: number;
    p99_ms?: number;
  };
  throughput_threshold?: {
    min_rps: number;
  };
  error_rate_threshold: {
    max_percent: number;
  };
  custom_checks?: Array<{
    metric: string;
    operator: "lt" | "lte" | "gt" | "gte" | "eq";
    value: number;
  }>;
}

interface ExecutionPlan {
  prerequisites: string[];         // Required setup steps
  smoke_test_command: string;      // Pre-flight validation
  full_test_command: string;       // Main execution
  monitoring_checklist: string[];  // What to observe during test
  abort_criteria: string[];        // When to stop test early
  success_criteria: string[];      // How to validate results
  report_generation?: string;      // Post-test analysis steps
}
```

**Format**:

- `test_script`: Valid code for specified tool (JavaScript for k6, XML for JMeter, Scala for Gatling, Python for Locust)
- `test_config`: Valid JSON
- `assertions`: Valid JSON with numeric values
- `execution_plan`: Markdown with code blocks for commands

**Validation**:

- Script is syntactically valid for target tool
- VU counts and durations are positive integers
- Thresholds are achievable (p95 < p99, error_rate <100%)
- Think time min < max
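The validation rules above can be enforced with straightforward checks before emitting outputs. A sketch against the `TestConfig` and `Assertions` shapes defined earlier (`validate_outputs` is a hypothetical helper; it covers only the rules listed, not full schema validation):

```python
def validate_outputs(config: dict, assertions: dict) -> list:
    """Return a list of human-readable violations (empty list = valid)."""
    errors = []
    if config["duration_minutes"] <= 0:
        errors.append("duration must be positive")
    tt = config.get("think_time_config", {})
    if tt and tt["min_seconds"] >= tt["max_seconds"]:
        errors.append("think time min must be < max")
    lat = assertions["latency_thresholds"]
    p95, p99 = lat.get("p95_ms"), lat.get("p99_ms")
    if p95 is not None and p99 is not None and p95 >= p99:
        errors.append("p95 threshold must be < p99")
    if not (0 <= assertions["error_rate_threshold"]["max_percent"] < 100):
        errors.append("error rate must be in [0, 100)")
    return errors
```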

---

## Examples

### Example 1: k6 E-Commerce Checkout Load Test (T2)

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
  stages: [
    { duration: '3m', target: 1000 },
    { duration: '10m', target: 1000 },
    { duration: '2m', target: 0 },
  ],
  thresholds: {
    http_req_duration: ['p(95)<800'],
    http_req_failed: ['rate<0.005'],
    http_reqs: ['rate>500'],
  },
};
export default function () {
  const payload = JSON.stringify({
    cart_id: '123',
    payment: 'card'
  });
  const res = http.post(
    'https://api.example.com/checkout',
    payload,
    { headers: { 'Content-Type': 'application/json' } }
  );
  check(res, {
    'status 200': (r) => r.status === 200,
    'checkout success': (r) => r.json('success') === true,
  });
  sleep(Math.random() * 3 + 2);
}
```

---

## Quality Gates

**Token budgets** (mandatory):

- T1 ≤ 2k tokens (basic script + simple assertions)
- T2 ≤ 6k tokens (realistic scenarios + think time modeling)
- T3 ≤ 12k tokens (distributed load + monitoring integration)

**Safety checks**:

- [ ] No hardcoded credentials or API keys in test scripts
- [ ] No production data in test payloads (use synthetic/anonymized data)
- [ ] Load test targets are non-production environments (unless explicitly approved)
- [ ] Distributed tests include rate limiting to prevent accidental DDoS

**Auditability**:

- [ ] All sources cited with access date = `NOW_ET`
- [ ] VU calculations include methodology and assumptions
- [ ] SLI thresholds tied to business requirements or SRE standards
- [ ] Test results are reproducible with same script + config

**Determinism**:

- [ ] Same inputs produce same script structure (±10% VU variance acceptable)
- [ ] Ramp-up patterns follow documented heuristics
- [ ] Think time distributions use seeded randomness where possible

**Validation checklist**:

- [ ] Script executes without syntax errors
- [ ] Assertions align with SLI requirements
- [ ] VU count and duration are realistic for target infrastructure
- [ ] Think time modeling prevents unrealistic "robot" traffic

---

## Resources

**Primary sources** (accessed 2025-10-26):

1. **k6 Documentation**: https://k6.io/docs/
   Official k6 load testing tool documentation with test lifecycle, scripting, and thresholds.

2. **k6 Using Guide**: https://grafana.com/docs/k6/latest/using-k6/
   Comprehensive guide on test types, scenarios, executors, and distributed testing with k6.

3. **Gatling Documentation**: https://gatling.io/docs/
   Gatling load testing framework docs covering Scala DSL, simulation design, and reports.

4. **JMeter User Manual**: https://jmeter.apache.org/usermanual/
   Apache JMeter user manual with test plan creation, distributed testing, and protocols.

5. **Locust Documentation**: https://docs.locust.io/
   Locust Python-based load testing framework docs with distributed mode and custom tasks.

6. **Google SRE Book - Monitoring Distributed Systems**: https://sre.google/sre-book/monitoring-distributed-systems/
   Google SRE principles for SLI/SLO definition, load testing strategies, and performance validation.

**Additional templates**:

- See `examples/load-test-example.js` for complete k6 workflow example
- See `resources/jmeter-template.jmx` for JMeter test plan template
- See `resources/gatling-template.scala` for Gatling simulation template

**Related skills**:

- `observability-slo-calculator` (for defining SLI/SLO before load testing)
- `testing-chaos-designer` (for resilience testing under load)
- `observability-stack-configurator` (for monitoring during load tests)

---

**End of SKILL.md**

Overview

This skill designs repeatable load testing scenarios for k6, JMeter, Gatling, or Locust, including ramp-up patterns, think time modeling, and SLI validation. It produces executable test scripts, a concrete test configuration, assertion thresholds, and an execution plan tailored to the chosen tool and test type.

How this skill works

Provide a target service URL, desired test type (load, stress, spike, soak), SLI requirements, and optional tool choice. The skill validates inputs, selects sensible defaults (k6 if unspecified), maps test type to a ramp pattern, models realistic think time, and emits a tool-specific script plus JSON configs and an execution checklist. It also flags abort conditions and suggests distributed or observability integrations when relevant.

When to use it

  • Validate performance before a production release or major traffic event
  • Establish baselines and capacity planning for backend services
  • Simulate peak, spike, stress, or extended soak conditions
  • Verify SLI/SLO compliance for latency, throughput, and error rates
  • Exercise distributed system resilience under sustained load

Best practices

  • Verify target URL accessibility and avoid running against production without approval
  • Model think time with randomized distributions to mimic real users
  • Start with a smoke test and low VU counts, then scale via staged ramps
  • Map SLIs to tool-native assertions (k6 thresholds, JMeter assertions, Gatling assertions, Locust custom checks)
  • Use synthetic data and never include hardcoded credentials or production PII
  • Integrate metrics (Prometheus/Grafana or CloudWatch/Datadog) and set clear abort criteria

Example use cases

  • k6 scripted load test for e-commerce checkout with gradual ramp-up and p95 latency threshold
  • JMeter stress run to find breaking point using multi-stage ramps and duration assertions
  • Gatling capacity test for JVM-based service with percentile assertions and CSV parameterization
  • Locust soak test to detect memory leaks over multi-hour runs with master-worker distribution
  • Spike test for flash-sale simulation: instant jump to peak VUs, short sustain, and rapid drop

FAQ

Which tool should I pick for modern APIs?

k6 is recommended for modern APIs due to its JavaScript scripting, built-in thresholds, and easy CI integration. Choose Gatling for JVM-heavy environments, JMeter for complex protocol coverage, and Locust if you prefer Python.

How do I translate SLIs into VUs?

Apply Little's law: VUs = target_RPS × (response_time_seconds + think_time_seconds). For example, sustaining 1000 RPS with 0.2s responses and 1s think time needs 1000 × 1.2 = 1200 VUs. Validate assumptions with a small pilot run and adjust think time and response-time estimates.