---
name: load-balancing-patterns
description: When distributing traffic across multiple servers or regions, use this skill to select and configure the appropriate load balancing solution (L4/L7, cloud-managed, self-managed, or Kubernetes ingress) with proper health checks and session management.
---
# Load Balancing Patterns
Distribute traffic across infrastructure using the appropriate load balancing approach, from simple round-robin to global multi-region failover.
## When to Use This Skill
Use load-balancing-patterns when:
- Distributing traffic across multiple application servers
- Implementing high availability and failover
- Routing traffic based on URLs, headers, or geographic location
- Managing session persistence across stateless backends
- Deploying applications to Kubernetes clusters
- Configuring global traffic management across regions
- Implementing zero-downtime deployments (blue-green, canary)
- Selecting between cloud-managed and self-managed load balancers
## Core Load Balancing Concepts
### Layer 4 vs Layer 7
**Layer 4 (L4) - Transport Layer:**
- Routes based on IP address and port (TCP/UDP packets)
- No application data inspection, lower latency, higher throughput
- Protocol agnostic, preserves client IP addresses
- Use for: Database connections, video streaming, gaming, financial transactions, non-HTTP protocols
**Layer 7 (L7) - Application Layer:**
- Routes based on HTTP URLs, headers, cookies, request body
- Full application data visibility, SSL/TLS termination, caching, WAF integration
- Content-based routing capabilities
- Use for: Web applications, REST APIs, microservices, GraphQL endpoints, complex routing logic
For detailed comparison including performance benchmarks and hybrid approaches, see `references/l4-vs-l7-comparison.md`.
### Load Balancing Algorithms
| Algorithm | Distribution Method | Use Case |
|-----------|-------------------|----------|
| **Round Robin** | Sequential | Stateless, similar servers |
| **Weighted Round Robin** | Capacity-based | Different server specs |
| **Least Connections** | Fewest active connections | Long-lived connections |
| **Least Response Time** | Fastest server | Performance-sensitive |
| **IP Hash** | Client IP-based | Session persistence |
| **Resource-Based** | CPU/memory metrics | Varying workloads |
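To make the distribution methods concrete, here is a minimal Python sketch of two of these algorithms (weighted round robin and least connections). The server names and weights are placeholders, and the weighted version uses a simple expansion rather than the smooth weighted algorithm real proxies such as NGINX implement; you would not normally hand-roll this.

```python
import itertools

# Illustrative only: production load balancers implement these algorithms natively.
servers = {"app1:8080": 3, "app2:8080": 2, "app3:8080": 1}  # name -> weight (hypothetical)

# Weighted round robin: repeat each server by its weight and cycle through the list.
weighted_cycle = itertools.cycle(
    [name for name, weight in servers.items() for _ in range(weight)]
)

def pick_weighted_round_robin() -> str:
    return next(weighted_cycle)

# Least connections: pick the server with the fewest active connections.
active_connections = {name: 0 for name in servers}

def pick_least_connections() -> str:
    return min(active_connections, key=active_connections.get)

if __name__ == "__main__":
    print([pick_weighted_round_robin() for _ in range(6)])  # app1 appears 3x, app2 2x, app3 1x
    print(pick_least_connections())
```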
### Health Check Types
**Shallow (Liveness):** Is the process alive?
- Endpoint: `/health/live` or `/live`
- Returns: 200 if process running
- Use for: Process monitoring, container health
**Deep (Readiness):** Can the service handle requests?
- Endpoint: `/health/ready` or `/ready`
- Validates: Database, cache, external API connectivity
- Use for: Load balancer routing decisions
**Health Check Hysteresis:** Different thresholds for marking up vs down to prevent flapping
- Example: 3 failures to mark down, 2 successes to mark up
For complete health check implementation patterns, see `references/health-check-strategies.md`.
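As an illustration of the liveness/readiness split, here is a minimal sketch of the two endpoints in a Flask service. The `check_database` and `check_cache` probes are hypothetical placeholders for real dependency checks.

```python
from flask import Flask, jsonify

app = Flask(__name__)

def check_database() -> bool:
    return True  # hypothetical probe; replace with a real connectivity check (e.g. SELECT 1)

def check_cache() -> bool:
    return True  # hypothetical probe; replace with a real PING against the cache

@app.route("/health/live")
def live():
    # Shallow check: the process is up and able to answer requests.
    return jsonify(status="alive"), 200

@app.route("/health/ready")
def ready():
    # Deep check: only report ready when dependencies are reachable,
    # so the load balancer stops routing traffic to a degraded instance.
    checks = {"database": check_database(), "cache": check_cache()}
    ok = all(checks.values())
    return jsonify(status="ready" if ok else "degraded", checks=checks), 200 if ok else 503
```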
## Cloud Load Balancers
### AWS Load Balancing
**Application Load Balancer (ALB) - Layer 7:**
- Use for: HTTP/HTTPS applications, microservices, WebSocket
- Features: Path/host/header routing, AWS WAF integration, Lambda targets
- Choose when: Content-based routing needed
**Network Load Balancer (NLB) - Layer 4:**
- Use for: Ultra-low latency (<1 ms), TCP/UDP traffic, static IPs, millions of requests per second
- Features: Preserves source IP, TLS termination
- Choose when: Non-HTTP protocols, performance critical
**Global Accelerator - Layer 4 Global:**
- Use for: Multi-region applications, global users, DDoS protection
- Features: Anycast IPs, automatic regional failover
### GCP Load Balancing
**Application LB (L7):** Global HTTPS LB, Cloud CDN integration, Cloud Armor (WAF/DDoS)
**Network LB (L4):** Regional TCP/UDP, pass-through balancing, session affinity
**Cloud Load Balancing:** Single anycast IP, global distribution, backend buckets
### Azure Load Balancing
**Application Gateway (L7):** WAF integration, URL-based routing, SSL termination, autoscaling
**Load Balancer (L4):** Basic and Standard SKUs, health probes, HA ports
**Traffic Manager (Global):** DNS-based routing (priority, weighted, performance, geographic)
For complete cloud provider configurations and Terraform examples, see `references/cloud-load-balancers.md`.
## Self-Managed Load Balancers
### NGINX
**Best for:** General-purpose HTTP/HTTPS load balancing, web application stacks
**Capabilities:**
- HTTP reverse proxy with multiple algorithms
- TCP/UDP stream load balancing
- SSL/TLS termination
- Passive health checks (open source), active health checks (NGINX Plus)
- Cookie-based sticky sessions (NGINX Plus)
**Basic configuration:**
```nginx
upstream backend {
    least_conn;
    server backend1.example.com:8080 weight=3;
    server backend2.example.com:8080 weight=2;
    keepalive 32;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        # Upstream keepalive only takes effect with HTTP/1.1 and a cleared Connection header
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```
For complete NGINX patterns and advanced configurations, see `references/nginx-patterns.md`.
### HAProxy
**Best for:** Maximum performance, database load balancing, resource efficiency
**Capabilities:**
- Highest raw throughput, lowest memory footprint
- 10+ load balancing algorithms
- Sophisticated health checks (HTTP, TCP, Redis, MySQL, etc.)
- Cookie or IP-based persistence
**Basic configuration:**
```haproxy
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend http_front
    bind *:80
    default_backend web_servers

backend web_servers
    balance roundrobin
    option httpchk GET /health
    server web1 192.168.1.101:8080 check
    server web2 192.168.1.102:8080 check
```
For complete HAProxy patterns, see `references/haproxy-patterns.md`.
### Envoy
**Best for:** Microservices, Kubernetes, service mesh integration
**Capabilities:**
- Cloud-native design with dynamic configuration (xDS APIs)
- Circuit breakers, retries, timeouts
- Advanced health checks (TCP, HTTP, gRPC)
- Excellent observability
For complete Envoy patterns, see `references/envoy-patterns.md`.
### Traefik
**Best for:** Docker/Kubernetes environments, dynamic configuration, ease of use
**Capabilities:**
- Automatic service discovery
- Native Kubernetes integration
- Built-in Let's Encrypt support
- Middleware system (auth, rate limiting)
For complete Traefik patterns, see `references/traefik-patterns.md`.
## Kubernetes Ingress Controllers
### Selection Guide
| Controller | Best For | Strengths |
|------------|----------|-----------|
| **NGINX Ingress** (F5) | General purpose | Stability, wide adoption, mature features |
| **Traefik** | Dynamic environments | Easy configuration, service discovery |
| **HAProxy Ingress** | High performance | Advanced L7 routing, reliability |
| **Envoy** (Contour/Gateway) | Service mesh | Rich L7 features, extensibility |
| **Kong** | API-heavy apps | JWT auth, rate limiting, plugins |
| **Cloud Provider** | Single-cloud | Native cloud integration |
### Basic Ingress Example
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/affinity: "cookie"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - app.example.com
      secretName: app-tls
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-service
                port:
                  number: 80
```
For complete Kubernetes ingress examples and Gateway API patterns, see `references/kubernetes-ingress.md`.
## Session Persistence
### Sticky Sessions (Use Sparingly)
**Cookie-Based:** Load balancer sets cookie to track server affinity
- Accurate routing, works with NAT/proxies
- HTTP only, adds cookie overhead
**IP Hash:** Hash client IP to select backend server
- No cookie required, works for non-HTTP
- Poor distribution with NAT/proxies
**Drawbacks:** Uneven load distribution, session lost on server failure, complicates scaling
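For illustration, a minimal Python sketch of IP-hash backend selection; the backend list is hypothetical. Because the mapping depends only on the client IP, many clients behind a single NAT gateway collapse onto one backend, which is the uneven-distribution drawback noted above.

```python
import hashlib

backends = ["app1.internal:8080", "app2.internal:8080", "app3.internal:8080"]  # placeholders

def pick_by_ip_hash(client_ip: str) -> str:
    # Deterministic: the same source IP always maps to the same backend.
    digest = hashlib.sha1(client_ip.encode()).hexdigest()
    return backends[int(digest, 16) % len(backends)]

# Every client sharing one corporate NAT egress IP lands on the same server.
print(pick_by_ip_hash("198.51.100.7"))
```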
### Shared Session Store (Recommended)
Architecture: Stateless application servers + centralized session storage (Redis, Memcached)
**Benefits:**
- No sticky sessions needed
- True load balancing
- Server failures don't lose sessions
- Horizontal scaling trivial
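A minimal sketch of the shared-store approach using the redis-py client; the Redis hostname and TTL are placeholder assumptions. Any server behind the load balancer can create or read a session, so no affinity is required.

```python
import json
import uuid

import redis  # assumes the redis-py client and a reachable Redis instance

r = redis.Redis(host="sessions.internal", port=6379, decode_responses=True)  # hypothetical host
SESSION_TTL_SECONDS = 1800

def create_session(user_id: str) -> str:
    session_id = str(uuid.uuid4())
    # Any app server behind the load balancer can read this session later.
    r.setex(f"session:{session_id}", SESSION_TTL_SECONDS, json.dumps({"user_id": user_id}))
    return session_id

def load_session(session_id: str) -> dict | None:
    data = r.get(f"session:{session_id}")
    return json.loads(data) if data else None
```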
### Client-Side Tokens (Best for APIs)
**JWT (JSON Web Tokens):** The server issues a signed token; the client stores it and sends it with each request
**Benefits:**
- Fully stateless servers
- Perfect load balancing
- No session storage needed
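A minimal sketch of the token approach using the PyJWT library; the secret and expiry are placeholders. Every backend holds the same signing secret, so any instance can verify a request without shared state.

```python
import datetime

import jwt  # assumes the PyJWT library

SECRET = "replace-with-a-real-secret"  # placeholder; shared by all app servers

def issue_token(user_id: str) -> str:
    payload = {
        "sub": user_id,
        "exp": datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(minutes=30),
    }
    return jwt.encode(payload, SECRET, algorithm="HS256")

def verify_token(token: str) -> dict:
    # Any backend can validate the token; no session storage lookup required.
    return jwt.decode(token, SECRET, algorithms=["HS256"])
```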
For complete session management patterns and code examples, see `references/session-persistence.md`.
## Global Load Balancing
### GeoDNS Routing
Route users to nearest server based on geographic location:
- DNS returns different IPs based on client location
- Reduces latency, supports compliance and regional content
- Implementation: AWS Route 53, GCP Cloud DNS, Azure Traffic Manager
### Multi-Region Failover
Primary/secondary region configuration:
- Health checks determine primary region health
- Automatic DNS failover to secondary
- Transparent to clients
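As one possible implementation of DNS-based failover, here is a boto3 sketch for AWS Route 53 failover records; the zone ID, health check ID, and addresses are placeholders. Route 53 answers with the primary record while its health check passes and fails over to the secondary otherwise.

```python
import boto3  # assumes AWS credentials plus an existing hosted zone and health check

route53 = boto3.client("route53")

ZONE_ID = "Z0000000000000000000"                              # placeholder
PRIMARY_HEALTH_CHECK_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

def upsert_failover_record(name: str, value: str, role: str,
                           health_check_id: str | None = None) -> None:
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": role.lower(),
        "Failover": role,  # "PRIMARY" or "SECONDARY"
        "TTL": 60,
        "ResourceRecords": [{"Value": value}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    route53.change_resource_record_sets(
        HostedZoneId=ZONE_ID,
        ChangeBatch={"Changes": [{"Action": "UPSERT", "ResourceRecordSet": record}]},
    )

# Serve the primary region while its health check passes, then fail over.
upsert_failover_record("app.example.com", "203.0.113.10", "PRIMARY", PRIMARY_HEALTH_CHECK_ID)
upsert_failover_record("app.example.com", "198.51.100.10", "SECONDARY")
```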
### CDN Integration
Combine load balancing with CDN:
- GeoDNS routes to closest CDN PoP
- CDN caches content globally
- Origin load balancing for cache misses
For complete global load balancing examples with Terraform, see `references/global-load-balancing.md`.
## Decision Frameworks
### L4 vs L7 Selection
Choose **L4** when:
- Protocol is TCP/UDP (not HTTP)
- Ultra-low latency critical (<1ms)
- High throughput required (millions of requests per second)
- Client source IP preservation needed
Choose **L7** when:
- Protocol is HTTP/HTTPS
- Content-based routing needed (URL, headers)
- SSL termination required
- WAF integration needed
- Microservices architecture
### Cloud vs Self-Managed
Choose **Cloud-Managed** when:
- Single cloud deployment
- Auto-scaling required
- Team lacks load balancer expertise
- Managed service preferred
Choose **Self-Managed** when:
- Multi-cloud or hybrid deployment
- Advanced routing requirements
- Cost optimization important
- Full control needed
- Vendor lock-in avoidance
### Self-Managed Selection
- **NGINX:** General-purpose, web stacks, HTTP/3 support
- **HAProxy:** Maximum performance, database LB, lowest resource usage
- **Envoy:** Microservices, service mesh, dynamic configuration
- **Traefik:** Docker/Kubernetes, automatic discovery, easy configuration
## Configuration Examples
Complete working examples available in `examples/` directory:
**Cloud Providers:**
- `examples/aws/alb-terraform.tf` - AWS ALB with path-based routing
- `examples/aws/nlb-terraform.tf` - AWS NLB for TCP load balancing
**Self-Managed:**
- `examples/nginx/http-load-balancing.conf` - NGINX HTTP reverse proxy
- `examples/haproxy/http-lb.cfg` - HAProxy configuration
- `examples/envoy/basic-lb.yaml` - Envoy cluster configuration
- `examples/traefik/kubernetes-ingress.yaml` - Traefik IngressRoute
**Kubernetes:**
- `examples/kubernetes/nginx-ingress.yaml` - NGINX Ingress with TLS
- `examples/kubernetes/traefik-ingress.yaml` - Traefik IngressRoute
- `examples/kubernetes/gateway-api.yaml` - Gateway API configuration
## Monitoring and Observability
### Key Metrics
**Throughput:** Requests per second, bytes transferred, connection rate
**Latency:** Request duration (p50, p95, p99), backend response time, SSL handshake time
**Errors:** HTTP error rates (4xx, 5xx), backend connection failures, health check failures
**Resource Utilization:** CPU, memory, active connections, connection queue depth
**Health:** Healthy/unhealthy backend count, health check success rate
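As a small illustration of the latency percentiles listed above, a Python sketch that computes p50/p95/p99 from a batch of hypothetical request durations; in practice the load balancer or metrics backend computes these for you.

```python
import statistics

# Hypothetical request durations in milliseconds, e.g. parsed from access logs.
latencies_ms = [12.1, 14.3, 13.8, 220.5, 15.0, 16.2, 13.1, 480.9, 14.7, 15.5]

cuts = statistics.quantiles(latencies_ms, n=100)  # 99 cut points
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"p50={p50:.1f}ms  p95={p95:.1f}ms  p99={p99:.1f}ms")
```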
### Load Balancer Logs
Enable access logs for request/response details, client IPs, response times, and error tracking:
- **AWS ALB:** Store in S3, analyze with Athena
- **NGINX:** Custom log format, ship to centralized logging
- **HAProxy:** Syslog integration, structured logging
## Troubleshooting
### Uneven Load Distribution
**Symptoms:** One server receives disproportionate traffic
**Causes:** Sticky sessions with few clients, IP hash with NAT concentration, long-lived connections
**Solutions:** Switch to least connections, disable sticky sessions, implement connection draining
### Health Check Flapping
**Symptoms:** Servers rapidly transition between healthy/unhealthy
**Causes:** Health check timeout too short, threshold too low, network instability
**Solutions:** Increase interval and timeout, implement hysteresis, use deep health checks
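A minimal sketch of the hysteresis idea in Python (thresholds are illustrative): a backend needs more consecutive failures to be marked down than successes to be marked up, which dampens flapping.

```python
class HealthState:
    """Track one backend's health with asymmetric thresholds (hysteresis)."""

    def __init__(self, down_threshold: int = 3, up_threshold: int = 2):
        self.down_threshold = down_threshold  # consecutive failures before marking down
        self.up_threshold = up_threshold      # consecutive successes before marking up
        self.healthy = True
        self._fail_streak = 0
        self._ok_streak = 0

    def record(self, check_passed: bool) -> bool:
        if check_passed:
            self._ok_streak += 1
            self._fail_streak = 0
            if not self.healthy and self._ok_streak >= self.up_threshold:
                self.healthy = True
        else:
            self._fail_streak += 1
            self._ok_streak = 0
            if self.healthy and self._fail_streak >= self.down_threshold:
                self.healthy = False
        return self.healthy
```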
### Session Loss After Failover
**Symptoms:** Users logged out when server fails
**Causes:** Sticky sessions without replication, in-memory sessions
**Solutions:** Implement shared session store (Redis), use client-side tokens (JWT)
## Integration Points
**Related Skills:**
- `infrastructure-as-code` - Deploy load balancers via Terraform/Pulumi
- `kubernetes-operations` - Ingress controllers for K8s traffic management
- `network-architecture` - Network design and topology for load balancing
- `deploying-applications` - Blue-green and canary deployments via load balancers
- `observability` - Load balancer metrics, access logs, distributed tracing
- `security-hardening` - WAF integration, rate limiting, DDoS protection
- `service-mesh` - Envoy as both ingress and service mesh proxy
- `implementing-tls` - TLS termination and certificate management
## Quick Reference
### Selection Matrix
| Use Case | Recommended Solution |
|----------|---------------------|
| HTTP web app (AWS) | ALB |
| Non-HTTP protocol (AWS) | NLB |
| Kubernetes HTTP ingress | NGINX Ingress or Traefik |
| Maximum performance | HAProxy |
| Service mesh | Envoy |
| Docker Swarm | Traefik |
| Multi-cloud portable | NGINX or HAProxy |
| Global distribution | Cloudflare, AWS Global Accelerator |
### Algorithm Selection
| Traffic Pattern | Algorithm |
|-----------------|-----------|
| Stateless, similar servers | Round Robin |
| Stateless, different capacity | Weighted Round Robin |
| Long-lived connections | Least Connections |
| Performance-sensitive | Least Response Time |
| Session persistence needed | IP Hash or Cookie |
| Varying server load | Resource-Based |
### Health Check Configuration
| Service Type | Check Type | Interval | Timeout |
|--------------|------------|----------|---------|
| Web app | HTTP /health | 10s | 3s |
| API | HTTP /health/ready | 10s | 5s |
| Database | TCP connect | 5s | 2s |
| Critical service | HTTP deep check | 5s | 3s |
| Background worker | HTTP /live | 30s | 5s |
## Summary
Load balancing is essential for distributing traffic, ensuring high availability, and enabling horizontal scaling. Choose L4 for raw performance and non-HTTP protocols, L7 for intelligent content-based routing. Prefer cloud-managed load balancers for simplicity and auto-scaling, self-managed for multi-cloud portability and advanced features. Implement proper health checks with hysteresis, avoid sticky sessions when possible, and monitor key metrics continuously.
For deployment patterns, see examples in `examples/aws/`, `examples/nginx/`, `examples/kubernetes/`, and other provider directories.
## FAQ

**When should I avoid sticky sessions?**
Avoid sticky sessions when you need even distribution, easy horizontal scaling, and resilience to server failure; prefer shared session stores or JWTs for stateless architectures.

**How do I prevent health check flapping?**
Use deeper readiness checks, longer timeouts, and hysteresis thresholds (more failures to mark a server down than successes to mark it up) to stabilize transitions.