k8-autoscaling skill

/skills/rohitg00/k8-autoscaling

This skill configures Kubernetes autoscaling using HPA, VPA, and KEDA to optimize resource use and scale workloads efficiently.

npx playbooks add skill openclaw/skills --skill k8-autoscaling

---
name: k8s-autoscaling
description: Configure Kubernetes autoscaling with HPA, VPA, and KEDA. Use for horizontal/vertical pod autoscaling, event-driven scaling, and capacity management.
---

# Kubernetes Autoscaling

Configure and inspect autoscaling with HPA, VPA, and KEDA using kubectl-mcp-server tools.

## Quick Reference

### HPA (Horizontal Pod Autoscaler)

Basic CPU-based scaling:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Apply and verify:
```
apply_manifest(hpa_yaml, namespace)
get_hpa(namespace)
```

### VPA (Vertical Pod Autoscaler)

Right-size resource requests:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
```
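Before enabling `Auto`, it can help to run the VPA in recommendation-only mode. The sketch below reuses the `my-app` Deployment from the example above and assumes the VPA components are installed in the cluster; the name `my-app-vpa-recommend` is illustrative.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-recommend   # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # compute recommendations only; never evict or resize pods
```

Recommendations then appear under `status.recommendation` of the VPA object and can be reviewed before switching modes.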

## KEDA (Event-Driven Autoscaling)

### Detect KEDA Installation
```
keda_detect_tool()
```

### List ScaledObjects
```
keda_scaledobjects_list_tool(namespace)
keda_scaledobject_get_tool(name, namespace)
```

### List ScaledJobs
```
keda_scaledjobs_list_tool(namespace)
```

### Trigger Authentication
```
keda_triggerauths_list_tool(namespace)
keda_triggerauth_get_tool(name, namespace)
```

### KEDA-Managed HPAs
```
keda_hpa_list_tool(namespace)
```

See [KEDA-TRIGGERS.md](KEDA-TRIGGERS.md) for trigger configurations.

## Common KEDA Triggers

### Queue-Based Scaling (AWS SQS)
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-scaler
spec:
  scaleTargetRef:
    name: queue-processor
  minReplicaCount: 0  # Scale to zero!
  maxReplicaCount: 100
  triggers:
  - type: aws-sqs-queue
    metadata:
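      queueURL: https://sqs.region.amazonaws.com/...
      queueLength: "5"    # target number of messages per replica
      awsRegion: region   # required by the scaler; placeholder, must match the queue's region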
```
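The ScaledObject above does not show how KEDA authenticates to AWS. One option, sketched here, is a TriggerAuthentication that reads static credentials from a Kubernetes Secret. The Secret name `aws-secret`, its keys, and the resource name `sqs-trigger-auth` are assumptions for illustration; pod identity is an alternative on clusters that support it.

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: sqs-trigger-auth            # hypothetical name
spec:
  secretTargetRef:
  - parameter: awsAccessKeyID       # parameter name expected by the aws-sqs-queue scaler
    name: aws-secret                # hypothetical Secret in the same namespace
    key: AWS_ACCESS_KEY_ID
  - parameter: awsSecretAccessKey
    name: aws-secret
    key: AWS_SECRET_ACCESS_KEY
```

To use it, add an `authenticationRef` with `name: sqs-trigger-auth` to the trigger, alongside its `metadata`.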

### Cron-Based Scaling
```yaml
triggers:
- type: cron
  metadata:
    timezone: America/New_York
    start: 0 8 * * 1-5   # 8 AM weekdays
    end: 0 18 * * 1-5    # 6 PM weekdays
    desiredReplicas: "10"
```

### Prometheus Metrics
```yaml
triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus:9090
    metricName: http_requests_total
    query: sum(rate(http_requests_total{app="myapp"}[2m]))
    threshold: "100"
```

## Scaling Strategies

| Strategy | Tool | Use Case |
|----------|------|----------|
| CPU/Memory | HPA | Steady traffic patterns |
| Custom metrics | HPA v2 | Business metrics |
| Event-driven | KEDA | Queue processing, cron |
| Vertical | VPA | Right-size requests |
| Scale to zero | KEDA | Cost savings, idle workloads |

## Cost-Optimized Autoscaling

### Scale to Zero with KEDA
Reduce costs for idle workloads:
```
keda_scaledobjects_list_tool(namespace)
# ScaledObjects with minReplicaCount: 0 can scale to zero
```
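A minimal sketch of a scale-to-zero ScaledObject, combining the Prometheus trigger shown earlier with an activation threshold and a cooldown. The name `myapp-scaler`, the target `myapp`, and the specific numbers are assumptions, not recommendations.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-scaler              # hypothetical name
spec:
  scaleTargetRef:
    name: myapp                   # hypothetical Deployment
  minReplicaCount: 0              # allow scale to zero
  maxReplicaCount: 20
  cooldownPeriod: 300             # seconds after the last active trigger before scaling to zero
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus:9090
      query: sum(rate(http_requests_total{app="myapp"}[2m]))
      threshold: "100"            # target value per replica once active
      activationThreshold: "5"    # stay at zero until the query result exceeds this value
```

A longer `cooldownPeriod` trades some cost savings for fewer cold starts when traffic arrives in bursts.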

### Right-Size with VPA
Get recommendations and apply:
```
get_resource_recommendations(namespace)
# Apply VPA recommendations
```

### Predictive Scaling
Use cron triggers for known patterns:
```yaml
# Scale up before traffic spike
triggers:
- type: cron
  metadata:
    timezone: America/New_York  # required by the cron scaler; set to your own zone
    start: 0 7 * * *  # 7 AM
    end: 0 9 * * *    # 9 AM
    desiredReplicas: "20"
```

## Multi-Cluster Autoscaling

Inspect KEDA ScaledObjects in each cluster by passing its kubeconfig context:
```
keda_scaledobjects_list_tool(namespace, context="production")
keda_scaledobjects_list_tool(namespace, context="staging")
```

## Troubleshooting

### HPA Not Scaling
```
get_hpa(namespace)
get_pod_metrics(name, namespace)  # Metrics available?
describe_pod(name, namespace)     # Resource requests set?
```

### KEDA Not Triggering
```
keda_scaledobject_get_tool(name, namespace)  # Check status
get_events(namespace)                        # Check events
```

### Common Issues

| Symptom | Check | Resolution |
|---------|-------|------------|
| HPA targets show `<unknown>` | Metrics server | Install metrics-server; confirm resource requests are set |
| KEDA does not scale | Trigger auth | Check the TriggerAuthentication and the secret it references |
| VPA not updating pods | Update mode | Set updateMode: Auto |
| Scale-down is slow | Stabilization | Lower stabilizationWindowSeconds under behavior.scaleDown |
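
The stabilization setting from the last row lives under `spec.behavior` of an `autoscaling/v2` HPA. A sketch of that fragment follows, with illustrative values (the Kubernetes default scale-down window is 300 seconds):

```yaml
# Fragment of a HorizontalPodAutoscaler spec (autoscaling/v2)
behavior:
  scaleDown:
    stabilizationWindowSeconds: 120   # illustrative; a shorter window scales down sooner
    policies:
    - type: Percent
      value: 50                       # remove at most 50% of current replicas...
      periodSeconds: 60               # ...per 60-second period
```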

## Best Practices

1. **Always Set Resource Requests**
   - HPA requires requests to calculate utilization

2. **Use Multiple Metrics**
   - Combine CPU + custom metrics for accuracy

3. **Stabilization Windows**
   - Prevent flapping with scaleDown stabilization

4. **Scale to Zero Carefully**
   - Consider cold start time
   - Set an activation threshold on the trigger so brief, low-level activity does not wake the workload from zero
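
As an illustration of practice 2, the sketch below extends the `my-app-hpa` example with a second metric. The per-pod metric `http_requests_per_second` is hypothetical and assumes a custom metrics adapter is installed to serve it.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # hypothetical custom metric
      target:
        type: AverageValue
        averageValue: "100"              # illustrative per-pod target
```

With several metrics, the HPA computes a desired replica count for each one and uses the largest.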

## Related Skills
- [k8s-cost](../k8s-cost/SKILL.md) - Cost optimization
- [k8s-troubleshoot](../k8s-troubleshoot/SKILL.md) - Debug scaling issues

Overview

This skill configures Kubernetes autoscaling using HPA, VPA, and KEDA to manage pod counts and resource requests across workloads. It helps implement horizontal CPU/custom-metric scaling, vertical right-sizing, and event-driven or cron-based scaling with scale-to-zero support. Use it to reduce cost, improve responsiveness, and automate capacity management.

How this skill works

The skill applies and inspects HPA, VPA, and KEDA resources and retrieves metrics and recommendations from the cluster. It can create and list ScaledObjects/ScaledJobs, check KEDA trigger and authentication status, and fetch VPA recommendations for right-sizing. It also provides quick troubleshooting checks for missing metrics, trigger failures, and stabilization settings.

When to use it

  • Implement CPU- or memory-based horizontal scaling for steady traffic patterns
  • Apply VPA when pods require automated resource request tuning
  • Use KEDA for event-driven scaling (queues, cron, Prometheus metrics) and scale-to-zero scenarios
  • Combine HPA and custom metrics for business-driven autoscaling
  • Audit and troubleshoot autoscaling issues across clusters

Best practices

  • Always set pod resource requests so HPA can compute utilization
  • Combine multiple metrics (CPU, custom business metrics) to avoid noisy scaling
  • Use stabilization windows to prevent frequent flapping during scale-down
  • Test scale-to-zero paths and account for cold-start latency
  • Use VPA recommendations in controlled rollouts before enabling Auto updateMode

Example use cases

  • Add an HPA that scales a web deployment between 2 and 10 replicas based on CPU utilization
  • Deploy a VPA to collect recommendations and right-size long-running background workers
  • Create a KEDA ScaledObject to scale queue processors to zero when queues are empty
  • Schedule cron-based scaling for predictable daily traffic spikes (pre-scale before business hours)
  • Use KEDA with Prometheus triggers to scale microservices based on custom request metrics

FAQ

Do I need metrics-server for HPA to work?

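For CPU and memory metrics, yes. HPA reads pod resource usage from the resource metrics API, which metrics-server (or a compatible metrics pipeline) provides. Custom and external metrics are served by a separate adapter instead.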

Can VPA and HPA be used together?

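Use caution: VPA adjusts resource requests, and HPA computes utilization relative to those requests, so the two can work against each other when both act on CPU or memory. Prefer running VPA in recommendation mode, limit VPA to workloads where HPA is not required, or drive HPA from custom or external metrics.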
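
One way to combine them, sketched under the assumption that the HPA scales on CPU only, is to restrict the VPA to memory so the two controllers never act on the same resource. The name `my-app-vpa-memory` is illustrative.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-memory             # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"              # apply to every container in the pod
      controlledResources: ["memory"] # VPA leaves CPU requests untouched
```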