
This skill helps you plan, test, and auto-scale Clay integrations with load testing, capacity planning, and scaling strategies.

npx playbooks add skill jeremylongshore/claude-code-plugins-plus-skills --skill clay-load-scale

---
name: clay-load-scale
description: |
  Implement Clay load testing, auto-scaling, and capacity planning strategies.
  Use when running performance tests, configuring horizontal scaling,
  or planning capacity for Clay integrations.
  Trigger with phrases like "clay load test", "clay scale",
  "clay performance test", "clay capacity", "clay k6", "clay benchmark".
allowed-tools: Read, Write, Edit, Bash(k6:*), Bash(kubectl:*)
version: 1.0.0
license: MIT
author: Jeremy Longshore <[email protected]>
---

# Clay Load & Scale

## Overview
Load testing, scaling strategies, and capacity planning for Clay integrations.

## Prerequisites
- k6 load testing tool installed
- Kubernetes cluster with HPA configured
- Prometheus for metrics collection
- Test environment API keys

## Load Testing with k6

### Basic Load Test
```javascript
// clay-load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 10 },   // Ramp up
    { duration: '5m', target: 10 },   // Steady state
    { duration: '2m', target: 50 },   // Ramp to peak
    { duration: '5m', target: 50 },   // Stress test
    { duration: '2m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  const response = http.post(
    'https://api.clay.com/v1/resource',
    JSON.stringify({ test: true }),
    {
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${__ENV.CLAY_API_KEY}`,
      },
    }
  );

  check(response, {
    'status is 200': (r) => r.status === 200,
    'latency < 500ms': (r) => r.timings.duration < 500,
  });

  sleep(1);
}
```

### Run Load Test
```bash
# Install k6
brew install k6  # macOS
# Linux: k6 is not in the default apt repos; add Grafana's k6 package
# repository first (see the k6 installation docs for distro-specific steps)

# Run test
k6 run --env CLAY_API_KEY=${CLAY_API_KEY} clay-load-test.js

# Run with output to InfluxDB
k6 run --out influxdb=http://localhost:8086/k6 clay-load-test.js
```

## Scaling Patterns

### Horizontal Scaling
```yaml
# kubernetes HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: clay-integration-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: clay-integration
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Pods-type metrics require the custom metrics API
    # (e.g. exposed via prometheus-adapter)
    - type: Pods
      pods:
        metric:
          name: clay_queue_depth
        target:
          type: AverageValue
          averageValue: 100
```

### Connection Pooling
```typescript
import { createPool } from 'generic-pool';

// generic-pool exposes createPool(factory, opts), not Pool.create:
// the factory (create/destroy) and the pool options are separate arguments.
const clayPool = createPool(
  {
    // ClayClient: your Clay SDK client wrapper.
    create: async () =>
      new ClayClient({
        apiKey: process.env.CLAY_API_KEY!,
      }),
    destroy: async (client: ClayClient) => {
      await client.close();
    },
  },
  {
    max: 20,
    min: 5,
    idleTimeoutMillis: 30000,
  }
);

async function withClayClient<T>(
  fn: (client: ClayClient) => Promise<T>
): Promise<T> {
  const client = await clayPool.acquire();
  try {
    return await fn(client);
  } finally {
    await clayPool.release(client);
  }
}
```

## Capacity Planning

### Metrics to Monitor
| Metric | Warning | Critical |
|--------|---------|----------|
| CPU Utilization | > 70% | > 85% |
| Memory Usage | > 75% | > 90% |
| Request Queue Depth | > 100 | > 500 |
| Error Rate | > 1% | > 5% |
| P95 Latency | > 1000ms | > 3000ms |
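The table above can be encoded as a small classifier so alerts stay consistent with the documented thresholds. This is an illustrative sketch; the metric keys mirror the table and are not part of any Clay API:

```typescript
type Severity = 'ok' | 'warning' | 'critical';

// Warning/critical thresholds from the table above.
const THRESHOLDS: Record<string, { warning: number; critical: number }> = {
  cpuPercent: { warning: 70, critical: 85 },
  memoryPercent: { warning: 75, critical: 90 },
  queueDepth: { warning: 100, critical: 500 },
  errorRatePercent: { warning: 1, critical: 5 },
  p95LatencyMs: { warning: 1000, critical: 3000 },
};

function classify(metric: string, value: number): Severity {
  const t = THRESHOLDS[metric];
  if (!t) throw new Error(`Unknown metric: ${metric}`);
  if (value > t.critical) return 'critical';
  if (value > t.warning) return 'warning';
  return 'ok';
}
```

For example, `classify('cpuPercent', 80)` returns `'warning'`, since 80% sits between the 70% warning and 85% critical levels.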

### Capacity Calculation
```typescript
interface CapacityEstimate {
  currentRPS: number;
  maxRPS: number;
  headroom: number;
  scaleRecommendation: string;
}

interface SystemMetrics {
  requestsPerSecond: number;
  cpuPercent: number;
}

function estimateClayCapacity(
  metrics: SystemMetrics
): CapacityEstimate {
  const currentRPS = metrics.requestsPerSecond;
  const cpuUtilization = metrics.cpuPercent;

  // Estimate max RPS based on current performance
  const maxRPS = currentRPS / (cpuUtilization / 100) * 0.7; // 70% target
  const headroom = ((maxRPS - currentRPS) / currentRPS) * 100;

  return {
    currentRPS,
    maxRPS: Math.floor(maxRPS),
    headroom: Math.round(headroom),
    scaleRecommendation: headroom < 30
      ? 'Scale up soon'
      : headroom < 50
      ? 'Monitor closely'
      : 'Adequate capacity',
  };
}
```

## Benchmark Results Template

```markdown
## Clay Performance Benchmark
**Date:** YYYY-MM-DD
**Environment:** [staging/production]
**SDK Version:** X.Y.Z

### Test Configuration
- Duration: 10 minutes
- Ramp: 10 → 100 → 10 VUs
- Target endpoint: /v1/resource

### Results
| Metric | Value |
|--------|-------|
| Total Requests | 50,000 |
| Success Rate | 99.9% |
| P50 Latency | 120ms |
| P95 Latency | 350ms |
| P99 Latency | 800ms |
| Max RPS Achieved | 150 |

### Observations
- [Key finding 1]
- [Key finding 2]

### Recommendations
- [Scaling recommendation]
```

## Instructions

### Step 1: Create Load Test Script
Write k6 test script with appropriate thresholds.

### Step 2: Configure Auto-Scaling
Set up HPA with CPU and custom metrics.

### Step 3: Run Load Test
Execute test and collect metrics.

### Step 4: Analyze and Document
Record results in benchmark template.

## Output
- Load test script created
- HPA configured
- Benchmark results documented
- Capacity recommendations defined

## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| k6 timeout | Rate limited | Reduce RPS |
| HPA not scaling | Wrong metrics | Verify metric name |
| Connection refused | Pool exhausted | Increase pool size |
| Inconsistent results | Warm-up needed | Add ramp-up phase |
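For the rate-limited and transient-connection cases in the table, a client-side retry with exponential backoff is a common mitigation alongside reducing RPS. A minimal sketch, assuming `fn` stands in for any Clay API call (the helper and its parameters are illustrative, not part of any Clay SDK):

```typescript
// Retry an async operation with exponential backoff and full jitter.
// Illustrative only: `fn` represents any Clay API call.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 200
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Full jitter: sleep a random fraction of the doubled base delay.
      const delay = Math.random() * baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Jitter spreads retries out so that many load-test VUs hitting a rate limit do not all retry at the same instant.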

## Examples

### Quick k6 Test
```bash
k6 run --vus 10 --duration 30s clay-load-test.js
```

### Check Current Capacity
```typescript
const metrics = await getSystemMetrics();
const capacity = estimateClayCapacity(metrics);
console.log('Headroom:', capacity.headroom + '%');
console.log('Recommendation:', capacity.scaleRecommendation);
```

### Scale HPA Manually
```bash
kubectl scale deployment clay-integration --replicas=5
kubectl get hpa clay-integration-hpa
```

## Resources
- [k6 Documentation](https://k6.io/docs/)
- [Kubernetes HPA](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/)
- [Clay Rate Limits](https://docs.clay.com/rate-limits)

## Next Steps
For reliability patterns, see `clay-reliability-patterns`.

## Overview

This skill implements load testing, auto-scaling, and capacity planning strategies for Clay integrations. It guides you to create k6 load tests, configure Kubernetes HPA with custom metrics, and produce benchmark reports that drive scaling decisions. Use it to validate performance, define safe headroom, and automate horizontal scaling in production-like environments.

## How this skill works

The skill provides a k6 test script template and run commands to generate traffic against Clay endpoints while exporting metrics to InfluxDB or Prometheus. It recommends an HPA configuration that combines CPU with a custom queue-depth metric and shows connection pooling patterns to reduce resource churn. Finally, it includes capacity estimation logic and a benchmark template to convert observed metrics into concrete scale recommendations.

## When to use it

- Before releasing a new integration or major feature that affects request volume
- When validating SLAs or P95/P99 latency targets under realistic load
- While configuring Kubernetes HPA to ensure scaling reacts to both CPU and queue depth
- During capacity planning to estimate headroom and trigger scale actions
- When troubleshooting intermittent errors or pool exhaustion under load

## Best practices

- Start with a ramp-up phase in k6 to warm caches and avoid false failures
- Export k6 metrics to InfluxDB/Prometheus for dashboards and long-term analysis
- Combine resource-based (CPU) and application-level (queue depth) HPA metrics
- Use connection pooling to limit client creation overhead and reduce spikes
- Define clear warning and critical thresholds for CPU, memory, latency, and error rate

## Example use cases

- Run a k6 script that ramps users from 10 to 50 and validates P95 latency under 500ms
- Configure an HPA that scales from 2 to 20 replicas using CPU and clay_queue_depth
- Estimate max RPS and headroom from Prometheus metrics to decide replica targets
- Benchmark a staging environment, record results using the template, and write recommendations
- Diagnose errors during load by checking pool size, HPA metrics, and rate limits

## FAQ

### What are the required prerequisites?

Install k6, have a Kubernetes cluster with HPA enabled, expose metrics to Prometheus/InfluxDB, and use test-environment API keys.

### How do I avoid false failures during tests?

Include a ramp-up phase to warm caches, monitor rate limits, and run multiple iterations to verify consistency.

### When should I adjust HPA thresholds?

Adjust thresholds after benchmark runs: target ~70% CPU utilization as steady state, and set queue-depth warning/critical levels based on observed headroom.
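The 70% utilization target can be checked with a quick back-of-the-envelope calculation. This self-contained sketch restates the skill's capacity-estimate logic with example numbers (100 RPS at 50% CPU):

```typescript
// Worked example of the capacity estimate at a 70% utilization target.
const currentRPS = 100;
const cpuUtilization = 50; // percent

// Extrapolate throughput at full CPU, then cap at the 70% target.
const maxRPS = (currentRPS / (cpuUtilization / 100)) * 0.7; // ≈ 140
const headroom = ((maxRPS - currentRPS) / currentRPS) * 100; // ≈ 40%

const recommendation =
  headroom < 30 ? 'Scale up soon'
  : headroom < 50 ? 'Monitor closely'
  : 'Adequate capacity';

console.log(
  `maxRPS=${Math.round(maxRPS)} headroom=${Math.round(headroom)}% -> ${recommendation}`
);
```

With ~40% headroom this lands in the "Monitor closely" band, so you would re-benchmark before the next traffic increase rather than scale immediately.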