
kafka skill

/.claude/skills/kafka

This skill helps you deploy Kafka on Kubernetes with Strimzi in KRaft mode, adapt resource configuration to the target environment, and troubleshoot common issues.

npx playbooks add skill mjunaidca/mjs-agent-skills --skill kafka


---
name: kafka
description: |
  Apache Kafka on Kubernetes with Strimzi (KRaft mode, no ZooKeeper).
  This skill should be used when users ask to deploy Kafka clusters, build
  producers/consumers, implement event-driven patterns, or debug Kafka issues.
  Includes tested manifests and Makefile for one-command deployment.
hooks:
  PreToolUse:
    - matcher: "Bash"
      hooks:
        - type: command
          command: "bash \"$CLAUDE_PROJECT_DIR\"/.claude/hooks/verify-kubectl-context.sh"
---

# Apache Kafka

Event streaming for Kubernetes. Strimzi operator, KRaft mode, no ZooKeeper.

## Quick Start (Tested)

```bash
make install    # Deploy Strimzi + Kafka
make test       # Verify everything works
make status     # Show resources
make uninstall  # Clean up
```

**Requirements:** Kubernetes cluster, Helm 3+

**Versions:** Strimzi 0.49+, Kafka 4.1.1

---

## Resource Detection & Adaptation

**Before generating manifests, detect the target environment:**

```bash
# Detect machine memory
sysctl -n hw.memsize 2>/dev/null | awk '{print $0/1024/1024/1024 " GB"}' || \
  grep MemTotal /proc/meminfo | awk '{print $2/1024/1024 " GB"}'

# Detect Docker Desktop allocation
docker info --format '{{.MemTotal}}' 2>/dev/null | awk '{print $0/1024/1024/1024 " GB"}'

# Detect Kubernetes node capacity
kubectl get nodes -o jsonpath='{.items[0].status.capacity.memory}' 2>/dev/null
```

**Adapt resource configuration based on detection:**

| Detected RAM | Profile | Kafka Memory | Action |
|--------------|---------|--------------|--------|
| < 12GB | Minimal | 512Mi-1Gi | Warn user about constraints |
| 12-24GB | Standard | 1Gi-2Gi | Default configuration |
| > 24GB | Production | 4Gi-8Gi | Enable full features |

### Adaptive Resource Templates

**Minimal (detected < 12GB):**
```yaml
resources:
  requests:
    memory: 512Mi
    cpu: 200m
  limits:
    memory: 1Gi
    cpu: 500m
```
⚠️ Agent should warn: "Limited resources detected. Kafka may be unstable under load."

**Standard (detected 12-24GB):**
```yaml
resources:
  requests:
    memory: 1Gi
    cpu: 250m
  limits:
    memory: 2Gi
    cpu: 1000m
```

**Production (detected > 24GB or real cluster):**
```yaml
resources:
  requests:
    memory: 4Gi
    cpu: 1000m
  limits:
    memory: 8Gi
    cpu: 4000m
```

### Agent Behavior

1. **Always detect** before generating manifests
2. **Adapt** resource configs to detected environment
3. **Warn** if resources are insufficient for requested workload
4. **Suggest** Docker Desktop settings if running locally
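
A minimal sketch of steps 1–3 (illustrative only, not the skill's shipped code; assumes Linux, where `/proc/meminfo` reports kB):

```python
# Illustrative detect → adapt → warn flow for picking a resource profile.
def detect_ram_gb() -> float:
    with open('/proc/meminfo') as f:
        for line in f:
            if line.startswith('MemTotal'):
                return int(line.split()[1]) / 1024 / 1024
    raise RuntimeError('could not detect total memory')

ram_gb = detect_ram_gb()
if ram_gb < 12:
    profile = 'minimal'
    print('Warning: limited resources detected. Kafka may be unstable under load.')
elif ram_gb <= 24:
    profile = 'standard'
else:
    profile = 'production'
print(f'Selected profile: {profile} ({ram_gb:.1f} GB RAM)')
```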

---

## What This Skill Does

| Task | How |
|------|-----|
| **Analyze coupling** | Identify temporal, availability, behavioral issues |
| **Explain eventual consistency** | Consistency windows, read-your-writes patterns |
| **Design events** | Domain events, CloudEvents, Avro schemas |
| Deploy Kafka | Helm (Strimzi) + kubectl (manifests) |
| Create topics | KafkaTopic CRD |
| Build producers | confluent-kafka-python templates |
| Build consumers | AIOConsumer for FastAPI |
| Debug issues | Runbooks in references/ |

## What This Skill Does NOT Do

- Deploy ZooKeeper (KRaft only)
- Manage Kafka Streams applications
- Configure multi-datacenter replication

---

## Deployment

### Install Strimzi Operator

```bash
helm repo add strimzi https://strimzi.io/charts
helm install strimzi-operator strimzi/strimzi-kafka-operator -n kafka --create-namespace --wait
```

### Deploy Kafka Cluster

```bash
kubectl apply -f manifests/kafka-cluster.yaml -n kafka
kubectl wait kafka/dev-cluster --for=condition=Ready --timeout=300s -n kafka
```

### Create Topic

```bash
kubectl apply -f manifests/kafka-topic.yaml -n kafka
```

### Verify

```bash
kubectl get kafka,kafkatopic,pods -n kafka
```

---

## Core Concepts

```
Topic      = Named stream (like a database table)
Partition  = Ordered log within topic (parallelism unit)
Consumer Group = Consumers sharing work (partition → one consumer)
Offset     = Consumer position (commit to track progress)
Broker     = Kafka server
Controller = Metadata manager (KRaft replaces ZooKeeper)
```
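
For example, records that share a key always hash to the same partition, which is what preserves per-key ordering (a minimal sketch; the topic and key are illustrative):

```python
from confluent_kafka import Producer

producer = Producer({'bootstrap.servers': 'localhost:30092'})

# Records with the same key land on the same partition, so events for
# 'customer-7' stay strictly ordered; ordering across keys is not guaranteed.
for i in range(3):
    producer.produce('orders', key='customer-7', value=f'event-{i}')
producer.flush()
```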

---

## Local Development

Connect from your host machine (no port-forward needed):

```python
# From your local machine (outside Kubernetes)
producer = Producer({'bootstrap.servers': 'localhost:30092'})
```

Connect from inside Kubernetes (pod-to-pod):

```python
# From another pod in the cluster
producer = Producer({'bootstrap.servers': 'dev-cluster-kafka-bootstrap.kafka:9092'})
```

| Location | Bootstrap Server |
|----------|------------------|
| Local machine | `localhost:30092` |
| Same namespace | `dev-cluster-kafka-bootstrap:9092` |
| Different namespace | `dev-cluster-kafka-bootstrap.kafka.svc.cluster.local:9092` |

---

## Producer/Consumer (Python)

```python
from confluent_kafka import Producer, Consumer

# Producer (production config)
producer = Producer({
    'bootstrap.servers': 'localhost:30092',  # Or K8s service for pods
    'acks': 'all',
    'enable.idempotence': True,
})
producer.produce('my-topic', key='key', value='message')
producer.flush()

# Consumer (manual commits for at-least-once delivery)
consumer = Consumer({
    'bootstrap.servers': 'localhost:30092',
    'group.id': 'my-group',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False,
})
consumer.subscribe(['my-topic'])
msg = consumer.poll(1.0)
if msg is not None and not msg.error():
    print(msg.value())               # Process the message here
    consumer.commit(message=msg)     # Commit only after processing succeeds
```

See `assets/templates/producer-consumer.py` for async FastAPI integration.
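
A minimal sketch of that pattern (the shipped template may differ). `confluent_kafka`'s `Consumer` is blocking, so the poll runs in a worker thread via `asyncio.to_thread`; topic, group, and bootstrap address are illustrative:

```python
import asyncio
from contextlib import asynccontextmanager

from confluent_kafka import Consumer
from fastapi import FastAPI

consumer = Consumer({
    'bootstrap.servers': 'localhost:30092',
    'group.id': 'my-group',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False,
})

async def consume_loop() -> None:
    consumer.subscribe(['my-topic'])
    while True:
        # Run the blocking poll off the event loop.
        msg = await asyncio.to_thread(consumer.poll, 1.0)
        if msg is None or msg.error():
            continue
        print(msg.value())            # Handle the message here
        consumer.commit(message=msg)  # Commit after processing

@asynccontextmanager
async def lifespan(app: FastAPI):
    task = asyncio.create_task(consume_loop())
    yield
    task.cancel()
    consumer.close()

app = FastAPI(lifespan=lifespan)
```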

---

## Debugging

```bash
# Check consumer lag
kubectl exec -n kafka dev-cluster-dual-role-0 -- \
  bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group <group-name>

# List topics
kubectl exec -n kafka dev-cluster-dual-role-0 -- \
  bin/kafka-topics.sh --bootstrap-server localhost:9092 --list

# Describe topic
kubectl exec -n kafka dev-cluster-dual-role-0 -- \
  bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --topic <topic-name>
```

See `references/debugging-runbooks.md` for detailed troubleshooting.

---

## Delivery Semantics

| Guarantee | Config | Use When |
|-----------|--------|----------|
| At-most-once | `acks=0` | Metrics, logs (may lose) |
| At-least-once | `acks=all` + manual commit | Most cases (may duplicate) |
| Exactly-once | Transactions | Financial (higher latency) |

**Default:** At-least-once with idempotent consumers.
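
For the exactly-once row, a hedged sketch of a transactional producer with confluent-kafka-python (topic, key, and `transactional.id` below are illustrative, not from this skill's manifests):

```python
from confluent_kafka import Producer

producer = Producer({
    'bootstrap.servers': 'localhost:30092',
    'transactional.id': 'payments-tx-1',  # Must be unique per producer instance
    'enable.idempotence': True,
})
producer.init_transactions()
producer.begin_transaction()
try:
    producer.produce('payments', key='order-42', value='debit:10.00')
    producer.commit_transaction()          # Atomic: all records or none
except Exception:
    producer.abort_transaction()
```

Consumers only see the exactly-once guarantee if they read with `isolation.level=read_committed`.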

---

## File Structure

```
kafka/
├── Makefile                 # Tested deployment commands
├── manifests/
│   ├── kafka-cluster.yaml   # KRaft cluster (tested)
│   └── kafka-topic.yaml     # Topic CRD
├── assets/templates/
│   └── producer-consumer.py # Python async templates
└── references/              # Deep knowledge
    ├── core-concepts.md
    ├── producers.md
    ├── consumers.md
    ├── debugging-runbooks.md
    ├── gotchas.md
    └── ... (15 files)
```

---

## Architecture Analysis

When analyzing synchronous architectures for coupling:

```
Scenario: Service A calls B, C, D directly (500ms each)

Temporal Coupling?
└── Does caller wait for all responses? → YES = coupled

Availability Coupling?
└── If B is down, does A fail? → YES = coupled

Behavioral Coupling?
└── Does A import B, C, D clients? → YES = coupled
```

**Solution:** Publish domain event, services consume independently.
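
A minimal sketch of that solution (event shape and topic name are illustrative):

```python
import json

from confluent_kafka import Producer

producer = Producer({'bootstrap.servers': 'localhost:30092', 'acks': 'all'})

def place_order(order_id: str, total: float) -> None:
    event = {'type': 'OrderPlaced', 'order_id': order_id, 'total': total}
    # B, C, and D each consume 'orders' in their own consumer group, so A
    # no longer waits for them (temporal), fails with them (availability),
    # or imports their clients (behavioral).
    producer.produce('orders', key=order_id, value=json.dumps(event))
    producer.flush()
```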

See `references/architecture-patterns.md` for detailed analysis templates.

---

## References

| File | When to Read |
|------|--------------|
| `references/architecture-patterns.md` | **Coupling analysis, eventual consistency, when to use Kafka** |
| `references/agent-event-patterns.md` | **AI agent coordination, correlation IDs, fanout** |
| `references/strimzi-deployment.md` | KRaft mode, CRDs, storage sizing |
| `references/producers.md` | Producer configuration, batching, tuning |
| `references/consumers.md` | Consumer groups, commits |
| `references/delivery-semantics.md` | At-most/least/exactly-once decision tree |
| `references/outbox-pattern.md` | Transactional outbox with Debezium CDC |
| `references/debugging-runbooks.md` | Lag, rebalancing issues |
| `references/monitoring.md` | Prometheus, alerts, Grafana |
| `references/gotchas.md` | Common mistakes |
| `references/security-patterns.md` | SCRAM, mTLS |

---

## Related Skills

| Skill | Use For |
|-------|---------|
| `/kubernetes` | Cluster operations |
| `/helm` | Chart customization |
| `/docker` | Local development |

Overview

This skill provides a tested, opinionated workflow to deploy and operate Apache Kafka on Kubernetes using the Strimzi operator in KRaft mode (no ZooKeeper). It includes manifests, a Makefile for one-command deployment, and Python producer/consumer templates for quick integration. The skill adapts resource profiles to target environments and includes runbooks for common debugging scenarios. Use it to deploy clusters, create topics, build clients, and diagnose streaming issues.

How this skill works

Before generating or applying manifests, the skill detects host and cluster resources (memory and CPU) and selects an adaptive resource profile (Minimal, Standard, Production). It installs Strimzi via Helm, applies a KRaft-mode Kafka custom resource and KafkaTopic resources, and exposes bootstrap endpoints for local and in-cluster access. The package contains Python templates for producers/consumers, runbooks for debugging (lag, rebalances, topic inspection), and a Makefile with install/test/status/uninstall targets for repeatable workflows.

When to use it

  • Deploy a Kafka cluster on Kubernetes where ZooKeeper is not desired (KRaft).
  • Develop rapidly against Docker Desktop or a small Kubernetes cluster using the working producer/consumer examples.
  • Implement event-driven patterns, domain events, or CloudEvents with schema guidance.
  • Debug consumer lag, topic metadata, or broker/controller issues in KRaft clusters.
  • Generate manifests adapted to the available node/container memory to avoid resource overcommit.

Best practices

  • Always run the resource detection step and choose the profile matching your environment.
  • Prefer the Standard profile for general workloads; use the Production profile only on real clusters with more than 24 GB of memory.
  • Use acks=all and enable.idempotence for at-least-once guarantees; use transactions for exactly-once where required.
  • Expose bootstrap.servers appropriately: localhost:30092 for host dev, cluster service DNS for pod-to-pod.
  • Automate install/test/status/uninstall via the provided Makefile for reproducible deployments.

Example use cases

  • One-command deploy of Strimzi + KRaft Kafka for a staging environment using make install.
  • Generate adaptive manifests for a low-RAM developer laptop and warn about constraints.
  • Build a FastAPI-based async consumer using the provided Python template for event-driven microservices.
  • Run debugging runbooks to identify consumer lag and describe topic partitions during an incident.
  • Create KafkaTopic resources to provision topics as part of CI/CD pipelines.

FAQ

Does this skill deploy ZooKeeper?

No. It uses Strimzi in KRaft mode and does not deploy ZooKeeper.

What environments are supported for local development?

It supports Docker Desktop and standard Kubernetes clusters; bootstrap endpoints include localhost:30092 for host access and the cluster service DNS name for in-cluster access.

How does it decide resource settings?

It detects host and node memory/CPU, maps the result to the Minimal, Standard, or Production profile, and warns when resources are insufficient.