
This skill designs and executes realistic load tests to validate performance, identify bottlenecks, and ensure SLA compliance under peak conditions.

npx playbooks add skill dasien/claudemultiagenttemplate --skill load-testing

Review the files below or copy the command above to add this skill to your agents.

Files (1): SKILL.md (5.4 KB)
---
name: "Load Testing"
description: "Design and execute load tests to validate performance under stress, identify scalability limits, and ensure SLA compliance"
category: "performance"
required_tools: ["Bash", "Write", "WebSearch"]
---

# Load Testing

## Purpose
Validate application performance under realistic and peak load conditions, identify scalability bottlenecks, and ensure systems meet performance SLAs.

## When to Use
- Before production deployment
- After performance optimizations
- Capacity planning
- Validating scalability
- Testing under expected peak loads

## Key Capabilities

1. **Load Test Design** - Create realistic test scenarios and user patterns
2. **Performance Validation** - Measure response times, throughput, and resource usage
3. **Scalability Analysis** - Identify breaking points and bottlenecks

## Approach

1. **Define Performance Requirements**
   - Target response times (p50, p95, p99)
   - Expected throughput (requests/second)
   - Concurrent users
   - Resource limits (CPU, memory)
   (These targets map directly onto k6 thresholds; see the first sketch after this list.)

2. **Design Test Scenarios**
   - User workflows (login → browse → checkout)
   - Traffic patterns (gradual ramp, spike, sustained)
   - Data variations (small/large payloads)
   - Think time between requests

3. **Select Tools**
   - **k6**: Modern, JavaScript-based, great reporting
   - **JMeter**: Feature-rich, GUI-based
   - **Locust**: Python-based, distributed testing
   - **Gatling**: Scala-based, detailed reports
   - **ab/wrk**: Simple command-line tools

4. **Execute Tests**
   - Start with baseline (single user)
   - Ramp up gradually
   - Test at target load
   - Spike test (sudden traffic increase)
   - Endurance test (sustained load)
   (Spike and endurance stage configurations are sketched after this list.)

5. **Analyze Results**
   - Response time percentiles
   - Error rates
   - Throughput degradation
   - Resource utilization
   - Database connection pool exhaustion
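
To make step 1 concrete, here is a minimal sketch of encoding those targets as k6 thresholds, so a run fails automatically when an SLA is missed. All numbers and the endpoint are illustrative placeholders:

```javascript
// thresholds.js - performance requirements expressed as pass/fail criteria.
import http from 'k6/http';

export const options = {
  thresholds: {
    // Response-time targets at several percentiles
    http_req_duration: ['p(50)<200', 'p(95)<500', 'p(99)<1000'],
    // Error budget: fewer than 1% failed requests
    http_req_failed: ['rate<0.01'],
    // Throughput floor: at least 100 requests/second on average
    http_reqs: ['rate>100'],
  },
};

export default function () {
  http.get('https://api.example.com/products'); // placeholder workload
}
```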
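
And a sketch of the spike and endurance patterns from step 4, complementing the gradual ramp shown in the example below (durations and targets are illustrative):

```javascript
// spike-test.js - sudden traffic surge instead of a gradual ramp.
export const options = {
  stages: [
    { duration: '1m', target: 50 },   // warm-up at normal load
    { duration: '10s', target: 500 }, // sudden spike
    { duration: '3m', target: 500 },  // hold the spike
    { duration: '10s', target: 50 },  // drop back to normal
    { duration: '3m', target: 50 },   // observe recovery
  ],
  // Endurance variant: one long sustained stage, e.g.
  // stages: [{ duration: '24h', target: 100 }],
};
```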

## Example

**Context**: Load testing an e-commerce API

**Test Scenario (k6)**:
```javascript
// load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate } from 'k6/metrics';

const errorRate = new Rate('errors');

export const options = {
  stages: [
    { duration: '2m', target: 100 },  // Ramp to 100 users
    { duration: '5m', target: 100 },  // Stay at 100
    { duration: '2m', target: 200 },  // Ramp to 200
    { duration: '5m', target: 200 },  // Stay at 200
    { duration: '2m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% under 500ms
    http_req_failed: ['rate<0.01'],    // <1% errors
  },
};

export default function () {
  // Browse products
  let res = http.get('https://api.example.com/products');
  let ok = check(res, {
    'products loaded': (r) => r.status === 200,
    'response < 500ms': (r) => r.timings.duration < 500,
  });
  errorRate.add(!ok); // record every request so the rate reflects the true error %

  sleep(1); // Think time

  // View product detail
  res = http.get('https://api.example.com/products/123');
  ok = check(res, {
    'product detail loaded': (r) => r.status === 200,
  });
  errorRate.add(!ok);

  sleep(2);

  // Add to cart
  res = http.post('https://api.example.com/cart', JSON.stringify({
    product_id: 123,
    quantity: 1
  }), {
    headers: { 'Content-Type': 'application/json' },
  });
  ok = check(res, {
    'added to cart': (r) => r.status === 201,
  });
  errorRate.add(!ok);

  sleep(1);
}
```

**Run Test**:
```bash
k6 run --out json=results.json load-test.js
```
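
For a machine-readable report without streaming every sample, k6 also supports an exported `handleSummary` function added to the test script; k6 calls it once the run finishes (the file name below is an arbitrary choice):

```javascript
// Optional addition to load-test.js: write a custom end-of-test report.
// Note: providing a 'stdout' key replaces the default console summary.
export function handleSummary(data) {
  return {
    'summary.json': JSON.stringify(data, null, 2),                   // full summary
    stdout: JSON.stringify(data.metrics.http_req_duration, null, 2), // key metric only
  };
}
```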

**Results Analysis**:
```
scenarios: (100.00%) 1 scenario, 200 max VUs, 16m30s max duration

     ✓ products loaded
     ✓ response < 500ms
     ✓ product detail loaded
     ✓ added to cart

     checks.........................: 98.50% ✓ 103425 ✗ 1575
     data_received..................: 177 MB 184 kB/s
     data_sent......................: 33 MB  34 kB/s
     http_req_blocked...............: avg=1.2ms    min=0s     med=0s      max=150ms   p(95)=5ms    p(99)=25ms
     http_req_duration..............: avg=245ms    min=45ms   med=180ms   max=2.5s    p(95)=480ms  p(99)=850ms
     http_req_failed................: 1.50%  ✓ 1181   ✗ 77569
     http_reqs......................: 78750  82.0/s
     iteration_duration.............: avg=4.8s     min=4s     med=4.5s    max=8s
     iterations.....................: 26250  27.3/s
     vus............................: 200    min=0    max=200
     vus_max........................: 200    min=200  max=200
```
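
The streamed JSON output supports deeper offline analysis than the console summary. A minimal Node sketch, assuming k6's NDJSON format where each latency sample is a line like `{"type":"Point","metric":"http_req_duration","data":{"value":...}}`:

```javascript
// percentiles.js - recompute latency percentiles from results.json.
const fs = require('fs');

const durations = fs.readFileSync('results.json', 'utf8')
  .split('\n')
  .filter((line) => line.trim())
  .map((line) => JSON.parse(line))
  .filter((e) => e.type === 'Point' && e.metric === 'http_req_duration')
  .map((e) => e.data.value)
  .sort((a, b) => a - b);

// Nearest-rank percentile over the sorted samples
const pct = (p) =>
  durations[Math.min(durations.length - 1, Math.floor((p / 100) * durations.length))];

console.log(`samples: ${durations.length}`);
console.log(`p50=${pct(50).toFixed(1)}ms p95=${pct(95).toFixed(1)}ms p99=${pct(99).toFixed(1)}ms`);
```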

**Analysis**:
- ✅ p95 response time: 480ms (meets the <500ms threshold)
- ⚠️ p99 response time: 850ms (long tail well above the p95 target)
- ⚠️ Error rate: 1.5% (violates the <1% threshold)
- Throughput: ~82 requests/second on average, ~125/s at peak load
- Performance degrades as the test approaches 200 concurrent users

**Bottleneck Investigation**:
- Errors occur during "add to cart" operation
- Database connection pool exhausted at peak load
- CPU usage spikes to 95% at 200 users

**Recommendations**:
1. Increase database connection pool size (sketched below)
2. Add caching for product catalog
3. Optimize cart operations
4. Consider horizontal scaling
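
For the first recommendation, a minimal sketch of what a larger pool can look like, assuming a Node service using the `pg` driver (adjust for your actual database client; the numbers are illustrative):

```javascript
// db.js - size the connection pool for peak concurrency, not the default.
const { Pool } = require('pg');

const pool = new Pool({
  max: 50,                       // raised from the default of 10
  idleTimeoutMillis: 30000,      // reclaim idle connections
  connectionTimeoutMillis: 2000, // fail fast under exhaustion instead of queueing forever
});

module.exports = pool;
```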

## Best Practices

- ✅ Test in production-like environment
- ✅ Use realistic user scenarios, not just endpoint hammering
- ✅ Ramp up gradually (don't spike immediately)
- ✅ Monitor system metrics during tests (CPU, memory, DB connections)
- ✅ Test at 2-3x expected peak load
- ✅ Include think time between requests
- ✅ Vary request payloads and parameters (see the sketch after this list)
- ✅ Run endurance tests (24+ hours for stability)
- ❌ Avoid: Testing from same network as application
- ❌ Avoid: Ignoring failed requests in results
- ❌ Avoid: Testing only happy paths
- ❌ Avoid: Testing with empty databases
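
A minimal sketch of payload variation using k6's SharedArray, which loads test data once in the init context and shares it read-only across virtual users. The products.json file and its fields are illustrative assumptions:

```javascript
// data-variation.js - rotate through realistic payloads instead of
// posting the same product on every iteration.
import http from 'k6/http';
import { SharedArray } from 'k6/data';

const products = new SharedArray('products', () =>
  JSON.parse(open('./products.json')) // e.g. [{ "id": 123, "qty": 1 }, ...]
);

export default function () {
  const p = products[Math.floor(Math.random() * products.length)];
  http.post(
    'https://api.example.com/cart',
    JSON.stringify({ product_id: p.id, quantity: p.qty }),
    { headers: { 'Content-Type': 'application/json' } }
  );
}
```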

Overview

This skill designs and runs load tests to validate performance under realistic and peak conditions, identify scalability limits, and verify SLA compliance. It helps teams find bottlenecks, quantify capacity, and produce actionable recommendations for scaling and optimization.

How this skill works

Define performance targets (p50/p95/p99, throughput, concurrent users) and model realistic user workflows and traffic patterns. Select a tool (k6, Locust, JMeter, Gatling, wrk), execute baseline, ramp, spike, and endurance tests, then analyze percentiles, error rates, throughput, and resource metrics to pinpoint failure modes. Produce remediation steps and retest after fixes.

When to use it

  • Before deploying major features or a new release to production
  • After performance optimizations to validate gains
  • During capacity planning and scaling exercises
  • To verify systems meet SLA/SLI targets under expected peak traffic
  • When diagnosing intermittent failures under load

Best practices

  • Test in a production-like environment with realistic data
  • Design user workflows and include think time between actions
  • Ramp traffic gradually and include spike and endurance scenarios
  • Monitor system metrics (CPU, memory, DB connections, I/O) alongside test results
  • Run tests at 2–3x expected peak load and include data variation
  • Treat failed requests as first-class results and investigate root causes

Example use cases

  • Load test an e-commerce API to validate checkout flow at peak shopping hours
  • Verify microservice scaling behavior under concurrent user spikes
  • Measure response-time percentiles and error rate after a database migration
  • Run endurance tests to detect memory leaks or connection-pool exhaustion
  • Compare performance impact of cache layer changes or horizontal scaling

FAQ

Which tool should I pick for load testing?

Choose based on workflow and environment: k6 for modern scripting and CI integration, Locust for Python-based scenarios, JMeter for rich GUI features, Gatling for detailed reports, and wrk/ab for quick CLI checks.

How do I know test results are valid?

Ensure tests run in a production-like environment with realistic data, monitor system metrics, include warm-up and cool-down phases, and validate that error rates and percentiles meet defined thresholds.