This skill helps you implement robust health check endpoints for liveness and readiness to improve reliability and auto-scaling.

npx playbooks add skill secondsky/claude-skills --skill health-check-endpoints

---
name: health-check-endpoints
description: Health check endpoints for liveness, readiness, dependency monitoring. Use for Kubernetes, load balancers, auto-scaling, or encountering probe failures, startup delays, dependency checks, timeout configuration errors.
---

# Health Check Endpoints

Implement health checks for monitoring service availability and readiness.

## Probe Types

| Probe | Purpose | Failure Action |
|-------|---------|----------------|
| Liveness | Is process alive? | Restart container |
| Readiness | Can handle traffic? | Remove from LB |
| Startup | Has app finished starting? | Hold other probes; restart if it never succeeds |
| Deep | All deps healthy? | Trigger alerts |

## Implementation (Express)

```javascript
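// Assumes `app` (Express app), `db` (SQL client), and `redis` (Redis client)
// are created elsewhere in the service.
class HealthChecker {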
  async checkDatabase() {
    const start = Date.now();
    try {
      await db.query('SELECT 1');
      return { status: 'healthy', latency: Date.now() - start };
    } catch (err) {
      return { status: 'unhealthy', error: String(err?.message || err) };
    }
  }

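  async checkRedis() {
    const start = Date.now();
    try {
      await redis.ping();
      return { status: 'healthy', latency: Date.now() - start };
    } catch (err) {
      // Same defensive error formatting as checkDatabase: err may not be an Error
      return { status: 'unhealthy', error: String(err?.message || err) };
    }
  }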

  async getReadiness() {
    const checks = await Promise.all([
      this.checkDatabase(),
      this.checkRedis()
    ]);
    const healthy = checks.every(c => c.status === 'healthy');
    return { healthy, checks };
  }
}

// Single shared instance used by the readiness route
const healthChecker = new HealthChecker();

// Liveness: lightweight, no dependency calls
app.get('/health/live', (req, res) => {
  res.json({ status: 'ok', timestamp: new Date().toISOString() });
});

// Readiness: check critical dependencies, 503 if any is unhealthy
app.get('/health/ready', async (req, res) => {
  const health = await healthChecker.getReadiness();
  res.status(health.healthy ? 200 : 503).json(health);
});
```

## Kubernetes Configuration

```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 3000
  initialDelaySeconds: 15
  periodSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /health/ready
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10
```
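
To reason about the probe timings above, it can help to estimate how long a failure goes unnoticed. The sketch below is a rough approximation, not an exact model of kubelet scheduling: it assumes the failure starts just after a successful probe and that each probe attempt can take up to `timeoutSeconds`. The function name `worstCaseDetectionSeconds` is hypothetical; the default values match the documented Kubernetes probe defaults.

```javascript
// Rough upper bound, in seconds, between a container becoming unhealthy and
// the probe reaching its failure threshold. This is an approximation.
function worstCaseDetectionSeconds({
  periodSeconds = 10,   // Kubernetes default
  failureThreshold = 3, // Kubernetes default
  timeoutSeconds = 1,   // Kubernetes default
} = {}) {
  // Up to one full period before the first failing probe, then the
  // remaining failures one period apart, plus the last probe's timeout.
  return periodSeconds * failureThreshold + timeoutSeconds;
}

// Liveness settings from the config above: period 10s, threshold 3.
console.log(worstCaseDetectionSeconds({ periodSeconds: 10, failureThreshold: 3 })); // 31
```

If that window is too long for the service, lowering `periodSeconds` is usually a safer lever than lowering `failureThreshold`, since a threshold of 1 reacts to a single slow response.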

## Best Practices

- Keep liveness checks minimal (no external deps)
- Check only critical systems in readiness
- Return 200 for healthy, 503 for unhealthy
- Set a timeout on every dependency check so a hung dependency cannot stall the endpoint and cascade into probe failures
- Include per-check latency in the response
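
The timeout and latency points can be sketched as a small wrapper around any check function. This is a minimal illustration, assuming each check is a function returning a promise; `withTimeout` is a hypothetical helper name, not part of Express or of this skill's files.

```javascript
// Hypothetical helper: wraps a check so it resolves within timeoutMs and
// always reports latency. It never rejects; failures become 'unhealthy'.
function withTimeout(checkFn, timeoutMs) {
  return async function timedCheck() {
    const start = Date.now();
    let timer;
    const deadline = new Promise((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`timed out after ${timeoutMs}ms`)),
        timeoutMs
      );
    });
    try {
      // Whichever settles first wins: the check itself or the deadline.
      await Promise.race([checkFn(), deadline]);
      return { status: 'healthy', latency: Date.now() - start };
    } catch (err) {
      return {
        status: 'unhealthy',
        latency: Date.now() - start,
        error: String(err?.message || err),
      };
    } finally {
      clearTimeout(timer); // avoid leaving a pending timer behind
    }
  };
}
```

A database check could then be written as `withTimeout(() => db.query('SELECT 1'), 500)`, where the 500 ms budget is an illustrative value chosen to stay below the probe's own timeout.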

## Additional Implementations

See [references/implementations.md](references/implementations.md) for:
- Python Flask complete health checker
- Java Spring Boot Actuator
- Full Kubernetes deployment config

## Never Do

- Make liveness depend on external services
- Return 200 when dependencies are down
- Skip dependency checks in readiness

Overview

This skill describes health check endpoints for liveness, readiness, startup, and deep dependency monitoring. The example implementation is JavaScript for Express-style servers, with matching Kubernetes probe configuration. It helps keep containers healthy, coordinate load balancers, and surface dependency issues quickly.

How this skill works

The skill exposes a lightweight liveness endpoint that only confirms the process is running and returns a timestamp. The readiness endpoint runs targeted checks against critical dependencies (database, cache) and returns 200 when ready or 503 when not. Startup checks hold off the other probes until the app has started, and deep checks scan every dependency and report timing and error details for alerting and metrics.
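
The startup behavior described above is not shown in the Express example, so here is one way it might look. This is a sketch under assumptions: `createStartupGate` and the `/health/startup` path are hypothetical names, and the gate is kept framework-agnostic so it can be exercised without a server.

```javascript
// Hypothetical startup gate: reports 503 until the app marks itself started.
function createStartupGate() {
  let started = false;
  return {
    // Call once migrations, cache warm-up, and similar work have finished.
    markStarted() {
      started = true;
    },
    // Status code and JSON body a startup endpoint would send.
    response() {
      return started
        ? { code: 200, body: { status: 'started' } }
        : { code: 503, body: { status: 'starting' } };
    },
  };
}

const gate = createStartupGate();
console.log(gate.response().code); // 503
gate.markStarted();
console.log(gate.response().code); // 200
```

In an Express app this could be wired as `app.get('/health/startup', (req, res) => { const { code, body } = gate.response(); res.status(code).json(body); })`, with a Kubernetes `startupProbe` pointed at that path.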

When to use it

  • Deploying services to Kubernetes with liveness/readiness probes
  • Configuring load balancers or ingress to remove unhealthy pods
  • Implementing autoscaling that depends on readiness and startup behavior
  • Troubleshooting probe failures, startup delays, or cascading timeouts
  • Monitoring critical dependencies (DB, Redis, external APIs) before traffic

Best practices

  • Keep liveness checks minimal and never call external services
  • Limit readiness checks to critical systems that must be ready to serve traffic
  • Return 200 for healthy and 503 for unhealthy to align with probe semantics
  • Set conservative timeouts and sensible initialDelaySeconds, periodSeconds, and failureThreshold values
  • Include latency/response-time metrics and error details for deeper checks

Example use cases

  • Expose /health/live for Kubernetes livenessProbe to prevent unnecessary restarts
  • Use /health/ready to gate load balancer traffic until DB and cache are available
  • Run a startup probe to delay readiness checks while migrations or warm-ups finish
  • Trigger alerts when deep checks detect downstream dependency failures
  • Record check latencies to identify slow dependencies and tune timeouts
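
The deep-check and latency use cases can be combined in one aggregator that separates critical dependencies from optional ones. The following is a minimal sketch: `runChecks`, the `critical` flag, and the check names in the usage note are hypothetical, and each check is assumed to be a function that throws or rejects on failure.

```javascript
// Hypothetical aggregator: runs named checks in parallel and records latency.
// `ready` considers only critical checks; `allHealthy` considers every check.
async function runChecks(checks) {
  const entries = await Promise.all(
    Object.entries(checks).map(async ([name, { critical, run }]) => {
      const start = Date.now();
      try {
        await run();
        return [name, { status: 'healthy', critical, latency: Date.now() - start }];
      } catch (err) {
        return [name, {
          status: 'unhealthy',
          critical,
          latency: Date.now() - start,
          error: String(err?.message || err),
        }];
      }
    })
  );
  const results = Object.fromEntries(entries);
  const values = Object.values(results);
  return {
    ready: values.filter((r) => r.critical).every((r) => r.status === 'healthy'),
    allHealthy: values.every((r) => r.status === 'healthy'),
    checks: results,
  };
}
```

With a critical `database` check that passes and a non-critical `emailApi` check that fails, `ready` stays true so the pod keeps serving traffic, while `allHealthy` is false and can drive an alert from a deep-check endpoint.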

FAQ

Should liveness check external services?

No. Liveness should only confirm that the process itself is alive. If it depends on external services, an outage in one of them restarts containers that are otherwise healthy.

What HTTP status codes should probes return?

Return 200 for healthy, 503 for unhealthy. Use these codes so Kubernetes and load balancers act correctly.