
kafka-devops skill

/plugins/specweave-kafka/skills/kafka-devops

This skill helps you deploy, monitor, and optimize Kafka infrastructure with best practices for CI/CD, incident response, and capacity planning.

This is most likely a fork of the sw-kafka-devops skill from openclaw
npx playbooks add skill anton-abyzov/specweave --skill kafka-devops

SKILL.md
---
name: kafka-devops
description: Kafka DevOps and SRE specialist. Expert in infrastructure deployment, CI/CD, monitoring, incident response, capacity planning, and operational best practices for Apache Kafka.
model: opus
context: fork
---

# Kafka DevOps Agent

Overview

This skill is a Kafka DevOps and SRE specialist that helps design, deploy, and operate Apache Kafka clusters for production. It focuses on infrastructure automation, CI/CD integration, monitoring, incident response, and capacity planning. The skill provides prescriptive operational patterns and actionable runbooks to reduce downtime and improve throughput.

How this skill works

The agent inspects cluster configuration, deployment pipelines, and monitoring telemetry to recommend changes and detect risks. It generates IaC templates, CI/CD steps, alerting rules, and post-incident reports tailored to your environment. It also produces capacity projections and tuning suggestions based on workload characteristics.
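
As a rough illustration of the kind of capacity projection described here, the sketch below estimates cluster-wide disk needs from ingress rate, time-based retention, and replication factor. The function names, the headroom default, and the per-broker disk figure are assumptions made for this example, not values taken from the skill; the arithmetic also ignores compression and index overhead.

```python
import math


def estimate_cluster_storage_bytes(
    ingress_bytes_per_sec: float,
    retention_hours: float,
    replication_factor: int,
    headroom_fraction: float = 0.3,  # assumed safety margin, tune per environment
) -> float:
    """Estimate total disk needed across the cluster for time-based retention.

    Simplification: ignores compression, index files, and uneven partition
    distribution, so treat the result as a first-pass planning number.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    retained_bytes = ingress_bytes_per_sec * retention_hours * 3600
    replicated_bytes = retained_bytes * replication_factor
    return replicated_bytes * (1 + headroom_fraction)


def brokers_needed(
    total_bytes: float,
    usable_bytes_per_broker: float,
    min_brokers: int = 3,  # assumed floor so replication factor 3 stays possible
) -> int:
    """Number of brokers required to hold total_bytes, never below min_brokers."""
    if usable_bytes_per_broker <= 0:
        raise ValueError("usable_bytes_per_broker must be positive")
    return max(min_brokers, math.ceil(total_bytes / usable_bytes_per_broker))
```

Calling `estimate_cluster_storage_bytes` with a measured ingress rate and then passing the result to `brokers_needed` gives a storage-driven lower bound on broker count; network and CPU limits would need a separate check.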

When to use it

  • Setting up a new Kafka cluster or migrating to a managed service
  • Automating Kafka deployments and rolling upgrades via CI/CD
  • Designing monitoring, alerting, and runbooks for production incidents
  • Performing capacity planning for growth, retention, or throughput changes
  • Hardening Kafka security and access control across teams

Best practices

  • Store cluster configuration and deployment manifests in version control and run them through automated CI/CD pipelines
  • Define SLOs and SLIs for latency, throughput, and availability and instrument them with concrete alerts
  • Use rolling upgrades and broker drain procedures so that maintenance on one broker does not interrupt producers or consumers
  • Automate partition rebalances and replica assignments, and monitor for under-replicated partitions
  • Keep topic retention and schema registry policies aligned with business requirements; retention settings in particular determine storage costs
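
The rolling-upgrade and under-replicated-partition practices above can be combined into a simple pre-restart gate. The following is a minimal sketch, assuming partition state (assigned replicas and in-sync replicas) has already been collected by some other means; `PartitionState`, `blocking_partitions`, and `safe_to_restart` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class PartitionState:
    """Snapshot of one partition; how it is collected is out of scope here."""
    topic: str
    partition: int
    replicas: Tuple[int, ...]  # broker ids assigned to the partition
    isr: Tuple[int, ...]       # broker ids currently in sync


def blocking_partitions(
    partitions: Sequence[PartitionState],
    broker_id: int,
    min_insync_replicas: int,
) -> List[PartitionState]:
    """Partitions whose ISR would fall below min.insync.replicas without broker_id."""
    blocked = []
    for p in partitions:
        remaining_isr = [b for b in p.isr if b != broker_id]
        if len(remaining_isr) < min_insync_replicas:
            blocked.append(p)
    return blocked


def safe_to_restart(
    partitions: Sequence[PartitionState],
    broker_id: int,
    min_insync_replicas: int = 2,  # assumed value; use the cluster's real setting
) -> bool:
    """Allow a restart only if nothing is under-replicated and no ISR would shrink too far."""
    if any(len(p.isr) < len(p.replicas) for p in partitions):
        return False
    return not blocking_partitions(partitions, broker_id, min_insync_replicas)
```

A deployment pipeline could call `safe_to_restart` before each broker restart and pause the rollout while it returns `False`.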

Example use cases

  • Generate Terraform or ARM templates to provision Kafka clusters and associated networking/security
  • Create Azure DevOps pipeline steps for blue/green or canary Kafka deployments and schema migrations
  • Build monitoring dashboards and alert rules for broker health, consumer lag, and JVM metrics
  • Run a post-incident analysis that identifies root causes, remediation steps, and long-term fixes
  • Produce capacity forecasts and recommended broker sizing for projected traffic growth
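
For the consumer-lag monitoring use case, lag per partition is the log end offset minus the group's committed offset. The sketch below shows that calculation over plain dictionaries; the function names and the choice to report partitions without a committed offset as `None` are assumptions for this example, and fetching the offsets from the cluster is left out.

```python
from typing import Dict, List, Optional, Tuple

TopicPartition = Tuple[str, int]  # (topic name, partition number)


def consumer_lag(
    end_offsets: Dict[TopicPartition, int],
    committed_offsets: Dict[TopicPartition, int],
) -> Dict[TopicPartition, Optional[int]]:
    """Lag per partition; None when the group has no committed offset yet."""
    lag: Dict[TopicPartition, Optional[int]] = {}
    for tp, end in end_offsets.items():
        committed = committed_offsets.get(tp)
        # Clamp at zero in case the two snapshots were taken at slightly different times.
        lag[tp] = None if committed is None else max(0, end - committed)
    return lag


def partitions_over_threshold(
    lag: Dict[TopicPartition, Optional[int]],
    threshold: int,
) -> List[TopicPartition]:
    """Sorted list of partitions whose known lag exceeds the threshold."""
    return sorted(
        tp for tp, value in lag.items() if value is not None and value > threshold
    )
```

The threshold is workload-specific: a batch consumer may tolerate a large backlog that would be an incident for a low-latency service.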

FAQ

Does the agent handle both self-hosted and managed Kafka offerings?

Yes. It provides guidance for both on-prem/self-hosted clusters and managed services, and highlights where IaC and day-to-day operations differ between them.

Can it integrate with existing CI/CD and observability tools?

Yes. It generates pipeline snippets and monitoring configurations for common tools like Azure DevOps, Prometheus, and Grafana.
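
As an illustration of what a generated monitoring configuration might look like, the sketch below renders a single Prometheus alerting rule for under-replicated partitions as YAML text. The group name, alert name, default duration, and severity label are assumptions, and the metric name is passed in by the caller because it depends on how the JMX exporter in a given environment is configured.

```python
def under_replicated_alert_rule(
    metric: str,
    for_duration: str = "5m",   # assumed default; tune to the team's tolerance
    severity: str = "page",     # assumed label value used for routing
) -> str:
    """Render a one-rule Prometheus alerting group as YAML text.

    `metric` must be the name under which the cluster's exporter exposes the
    broker under-replicated-partitions gauge; it is not hard-coded because
    exporter configurations differ.
    """
    lines = [
        "groups:",
        "  - name: kafka-broker-health",
        "    rules:",
        "      - alert: KafkaUnderReplicatedPartitions",
        f"        expr: sum({metric}) > 0",
        f"        for: {for_duration}",
        "        labels:",
        f"          severity: {severity}",
        "        annotations:",
        '          summary: "One or more partitions are under-replicated"',
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned text to a rules file and validating it with the monitoring stack's own tooling before deployment keeps the template honest about the metric name actually in use.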