kafka-devops skill

/plugins/specweave-kafka/skills/kafka-devops

This skill helps you deploy, monitor, and optimize Kafka infrastructure with best practices for CI/CD, incident response, and capacity planning.

This is most likely a fork of the sw-kafka-devops skill from openclaw.

npx playbooks add skill anton-abyzov/specweave --skill kafka-devops

Files (1)

SKILL.md

264 B

---
name: kafka-devops
description: Kafka DevOps and SRE specialist. Expert in infrastructure deployment, CI/CD, monitoring, incident response, capacity planning, and operational best practices for Apache Kafka.
model: opus
context: fork
---

# Kafka DevOps Agent

Overview

This skill is a Kafka DevOps and SRE specialist that helps design, deploy, and operate Apache Kafka clusters for production. It focuses on infrastructure automation, CI/CD integration, monitoring, incident response, and capacity planning. The skill provides prescriptive operational patterns and actionable runbooks to reduce downtime and improve throughput.

How this skill works

The agent inspects cluster configuration, deployment pipelines, and monitoring telemetry to recommend changes and detect risks. It generates IaC templates, CI/CD steps, alerting rules, and post-incident reports tailored to your environment. It also produces capacity projections and tuning suggestions based on workload characteristics.
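
As a rough illustration of the kind of capacity projection described above, the following minimal sketch estimates how many brokers are needed to hold retained data. It is not the skill's actual method: the function name `required_brokers`, its parameters, and the 70% disk-utilization ceiling are all assumptions made for this example, and it considers storage only (not network, CPU, or partition counts).

```python
import math


def required_brokers(ingress_mb_per_s, retention_hours, replication_factor,
                     disk_per_broker_gb, max_disk_utilization=0.7):
    """Estimate brokers needed to store retained data (storage only).

    All parameter names and the 0.7 utilization default are illustrative
    assumptions, not values taken from the skill.
    """
    # Data written during the retention window, before replication.
    retained_gb = ingress_mb_per_s * 3600 * retention_hours / 1024
    # Every replica stores a full copy of the data.
    total_gb = retained_gb * replication_factor
    # Leave headroom on each disk instead of planning to fill it.
    usable_gb = disk_per_broker_gb * max_disk_utilization
    # Each replica of a partition lives on a different broker, so the
    # cluster can never be smaller than the replication factor.
    return max(replication_factor, math.ceil(total_gb / usable_gb))


if __name__ == "__main__":
    # 10 MB/s ingress, 24 h retention, RF 3, 1000 GB disks.
    print(required_brokers(10, 24, 3, 1000))
```

A real projection would also account for throughput per broker and expected growth; this sketch only shows the storage arithmetic.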

When to use it

Setting up a new Kafka cluster or migrating to a managed service
Automating Kafka deployments and rolling upgrades via CI/CD
Designing monitoring, alerting, and runbooks for production incidents
Performing capacity planning for growth, retention, or throughput changes
Hardening Kafka security and access control across teams

Best practices

Store cluster configuration and deployment manifests in version control and run them through automated CI/CD pipelines
Define SLOs and SLIs for latency, throughput, and availability and instrument them with concrete alerts
Use rolling upgrades and drain procedures, taking one broker out of service at a time, to avoid client-facing disruption during maintenance
Automate partition rebalances and replica assignments, and monitor for under-replicated partitions
Keep schema registry policies and topic retention aligned with business requirements; retention settings in particular determine storage costs
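
The rolling-upgrade and under-replicated-partition practices above can be combined into a simple gate: restart one broker, then wait until no partitions are under-replicated before touching the next. The sketch below shows that control flow only. `restart_broker` and `under_replicated_count` are hypothetical callables you would supply (for example, wrappers around your orchestration tool and your metrics system); they are not part of Kafka or of this skill.

```python
import time


def rolling_restart(broker_ids, restart_broker, under_replicated_count,
                    poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Restart brokers one at a time, waiting for replicas to catch up.

    restart_broker(broker_id) and under_replicated_count() are
    hypothetical hooks supplied by the caller.
    """
    restarted = []
    for broker_id in broker_ids:
        restart_broker(broker_id)
        # Poll until the cluster reports zero under-replicated partitions.
        for _ in range(max_polls):
            if under_replicated_count() == 0:
                break
            sleep(poll_seconds)
        else:
            # Replicas never caught up: stop rather than risk data loss.
            raise RuntimeError(
                f"broker {broker_id}: replicas did not catch up; stopping rollout"
            )
        restarted.append(broker_id)
    return restarted


if __name__ == "__main__":
    # Fake hooks for demonstration: the cluster is always healthy.
    print(rolling_restart([1, 2, 3], lambda b: None, lambda: 0))
```

Stopping the rollout on a timeout is a deliberate choice in this sketch: continuing while partitions are under-replicated could take a second replica offline at the same time.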

Example use cases

Generate Terraform or ARM templates to provision Kafka clusters and associated networking/security
Create Azure DevOps pipeline steps for blue/green or canary Kafka deployments and schema migrations
Build monitoring dashboards and alert rules for broker health, consumer lag, and JVM metrics
Run a post-incident analysis that identifies root causes, remediation steps, and long-term fixes
Produce capacity forecasts and recommended broker sizing for projected traffic growth
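
For the consumer-lag alerting use case above, the underlying arithmetic is simple: per-partition lag is the log end offset minus the group's last committed offset. The sketch below works on plain dictionaries so it stays independent of any client library; the function names, the dictionary shapes, and the handling of partitions without a committed offset are assumptions for illustration.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Return per-partition lag from two {partition: offset} mappings."""
    lag = {}
    for partition, end in end_offsets.items():
        committed = committed_offsets.get(partition)
        if committed is None:
            # Assumption: with no committed offset, count the whole
            # partition as lag (this ignores the log start offset).
            lag[partition] = end
        else:
            # Clamp at zero in case the two readings were not atomic.
            lag[partition] = max(0, end - committed)
    return lag


def lagging_partitions(lag, threshold):
    """Return the sorted partitions whose lag exceeds the threshold."""
    return sorted(p for p, value in lag.items() if value > threshold)


if __name__ == "__main__":
    lag = consumer_lag({0: 100, 1: 50, 2: 10}, {0: 90, 1: 50})
    print(lag, lagging_partitions(lag, threshold=5))
```

In practice the offsets would come from your Kafka client or a metrics exporter, and an alert would typically fire only when lag stays above the threshold for a sustained period.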

FAQ

Does the agent handle both self-hosted and managed Kafka offerings?

Yes. It provides guidance for on-prem/self-hosted clusters and for managed services, with IaC and operational differences highlighted.

Can it integrate with existing CI/CD and observability tools?

Yes. It generates pipeline snippets and monitoring configurations for common tools like Azure DevOps, Prometheus, and Grafana.