devops-expert skill

/skills/missing-skills/devops/devops-expert

This skill turns Claude Code into a professional DevOps engineer that designs and automates modern cloud infrastructure and deployment pipelines.

npx playbooks add skill dy9759/text2knowledgecards --skill devops-expert

Review the files below or copy the command above to add this skill to your agents.

Files (1)

SKILL.md

3.9 KB

---
name: devops-expert
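description: Comprehensive DevOps and cloud-native expert, proficient in CI/CD pipelines, containerization, Kubernetes deployments, and infrastructure automation. Helps organizations modernize operations and achieve continuous delivery.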
---

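# DevOps Cloud-Native Expert

This skill turns your Claude Code into a professional DevOps engineer that can design, implement, and maintain modern cloud infrastructure and deployment pipelines.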

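## When to Use This Skill

- Setting up CI/CD pipelines
- Containerizing applications and deploying them to Kubernetes
- Designing cloud-native architectures
- Automating infrastructure deployment
- Building monitoring and logging systems
- Troubleshooting and performance tuning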

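## What This Skill Does

### CI/CD Pipeline Design
- GitHub Actions, GitLab CI, and Jenkins configuration
- Multi-environment deployment strategies (development, testing, production)
- Automated test integration
- Rollback mechanisms and blue-green deployments
- Build optimization and caching strategies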

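### Containerization and Orchestration
- Docker best practices and multi-stage builds
- Kubernetes cluster management
- Service discovery and load balancing
- Configuration and secrets management
- Resource limits and autoscaling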
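
To make the autoscaling item concrete, here is a minimal Python sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current replicas × current metric / target metric). The function name, parameter names, and the min/max bounds are illustrative assumptions and not part of the skill. The 0.1 tolerance mirrors the documented default, which is configurable on a real cluster.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Sketch of the HPA formula: ceil(current * currentMetric / targetMetric).

    All names and default bounds here are hypothetical, chosen for illustration.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current count to avoid flapping.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For instance, `desired_replicas(4, 90, 60)` evaluates to 6, because average utilization of 90 against a target of 60 gives a ratio of 1.5.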

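### Infrastructure as Code
- Modular Terraform design
- Ansible configuration management
- Cloud resource tagging and cost optimization
- Multi-cloud deployment strategies
- Disaster recovery and backup plans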
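
As one way to picture the tagging item, the sketch below checks a set of resources against a required-tag policy, the kind of check that could run in CI before infrastructure is applied. The tag names, the `missing_tags` helper, and the input shape (resource name mapped to a tag dictionary) are all assumptions made for illustration; they are not taken from Terraform or from the skill.

```python
# Hypothetical tagging policy; adjust to your organization's conventions.
REQUIRED_TAGS = {"owner", "environment", "cost-center"}


def missing_tags(resources, required=REQUIRED_TAGS):
    """Return {resource_name: [missing tag names]} for non-compliant resources.

    `resources` maps a resource name to its tag dictionary. A tag whose value
    is empty or whitespace counts as missing.
    """
    report = {}
    for name, tags in resources.items():
        present = {key for key, value in tags.items() if value.strip()}
        missing = sorted(set(required) - present)
        if missing:
            report[name] = missing
    return report
```

A resource that carries every required tag is left out of the report, so an empty result means the whole set complies with the policy.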

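### Monitoring and Observability
- Prometheus + Grafana monitoring stack
- ELK/Loki log aggregation
- Distributed tracing (Jaeger, Zipkin)
- Alerting policies and on-call rotation
- SLA/SLO monitoring and reporting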
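
The SLO item comes down to simple arithmetic, sketched below: an availability target implies an error budget, the amount of downtime allowed within a reporting window. The function names and the 30-day default window are assumptions for illustration, not something the skill prescribes.

```python
def error_budget_minutes(slo, window_days=30):
    """Minutes of allowed downtime for an availability SLO over a window.

    `slo` is a fraction such as 0.999; the 30-day window is an assumed default.
    """
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1")
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo, downtime_minutes, window_days=30):
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget
```

With a 99.9% target over 30 days the budget is about 43.2 minutes, so 10.8 minutes of downtime leaves roughly three quarters of it unspent.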

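### Security Best Practices
- Container security scanning
- Network policies and RBAC
- Secrets management (Vault, AWS Secrets Manager)
- Compliance checks and audit logging
- Vulnerability management and patching strategy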

## Supported Tools and Platforms

### Cloud Providers
- **AWS**: ECS, EKS, Lambda, CloudFormation
- **Azure**: AKS, Azure DevOps, ARM Templates
- **Google Cloud**: GKE, Cloud Build, Deployment Manager
- **Multi-cloud**: Kubernetes, Terraform, Crossplane

### CI/CD Tools
- **GitHub Actions**: Workflows, Actions, Self-hosted runners
- **GitLab CI**: GitLab CI/CD, Auto DevOps
- **Jenkins**: Pipeline as Code, plugin ecosystem
- **CircleCI**: Orbs, parallel builds, resource classes

### Containers and Orchestration
- **Docker**: multi-stage builds, security best practices
- **Kubernetes**: Deployment, Service, Ingress, Helm
- **Container runtimes**: containerd, CRI-O, Podman

### Monitoring Tools
- **Prometheus**: metrics collection, PromQL, alerting rules
- **Grafana**: dashboards, visualization, alerting integration
- **Jaeger**: distributed tracing, service topology
- **Fluentd/Fluent Bit**: log collection, transformation, routing

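## Usage Examples

### 1. Setting Up a CI/CD Pipeline
```
"Help me set up a GitHub Actions CI/CD pipeline for this React app,
including automated tests, building a Docker image, and deploying to a Kubernetes cluster."
```

### 2. Kubernetes Deployment
```
"Design a highly available microservices architecture deployed on Kubernetes,
including autoscaling, service discovery, load balancing, and monitoring."
```

### 3. Infrastructure Automation
```
"Use Terraform to create a complete AWS infrastructure,
including a VPC, an EKS cluster, an RDS database, and S3 storage."
```

### 4. Monitoring System
```
"Set up a complete monitoring stack, including Prometheus metrics collection,
Grafana dashboards, alerting rules, and log aggregation."
```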

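## Best Practice Guidance

### 1. Infrastructure Design
- Follow cloud-native best practices
- Apply the principle of least privilege
- Design for high availability and fault tolerance
- Account for cost optimization and resource efficiency

### 2. Security Integration
- Integrate security scanning into the CI/CD pipeline
- Implement network segmentation and access control
- Run regular security audits and vulnerability assessments
- Meet compliance requirements (SOC2, GDPR, HIPAA)

### 3. Performance Optimization
- Implement caching strategies
- Optimize database queries
- Configure a CDN and optimize static assets
- Run load tests and establish performance baselines

### 4. Operations Automation
- Automate failure recovery
- Use smart alerting and noise reduction
- Schedule regular backups and restore tests
- Plan and forecast capacity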

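## Related Skill Integrations

- **architecture-skill**: system architecture design
- **security-skill**: security audits and compliance
- **backend-dev-skill**: microservice architecture
- **webapp-testing**: automated testing strategy

---

**With this skill, your Claude Code becomes a professional DevOps engineer that can design and manage modern cloud infrastructure.**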

Overview

This skill turns the agent into a practical DevOps and cloud-native expert who designs, implements, and operates modern CI/CD pipelines, containerized workloads, Kubernetes deployments, and infrastructure automation. It focuses on actionable guidance for production-ready systems, reliability, security, and cost-efficient cloud operations. Use it to accelerate delivery, improve observability, and automate repetitive operational work.

How this skill works

The skill inspects your application architecture, CI/CD needs, and target cloud platform to propose concrete pipelines, infrastructure-as-code, and deployment manifests. It generates recommended configurations for GitHub Actions, GitLab CI, and Jenkins; Dockerfiles and Kubernetes manifests (including Helm charts); Terraform modules; and monitoring stacks such as Prometheus and Grafana. It also provides security hardening guidance, backup and disaster-recovery plans, and operational runbooks tailored to your environment.
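
As a rough illustration of the kind of deployment manifest described above, the following Python sketch builds a Kubernetes `apps/v1` Deployment as a plain dictionary and prints it as JSON, a format `kubectl apply -f` accepts alongside YAML. The helper name, the default resource limits, and the `registry.example.com` image are placeholders assumed for this example; this is not the skill's actual output.

```python
import json


def deployment_manifest(name, image, replicas=2,
                        cpu_limit="500m", memory_limit="256Mi"):
    """Build a minimal apps/v1 Deployment with resource limits.

    Helper name and default values are hypothetical, for illustration only.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "resources": {
                            "limits": {"cpu": cpu_limit, "memory": memory_limit},
                        },
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder image reference; replace with your own registry and tag.
    manifest = deployment_manifest("web", "registry.example.com/web:1.0.0")
    print(json.dumps(manifest, indent=2))
```

Generating the manifest from one `labels` value keeps the selector and the pod template in sync, which is a common source of rejected Deployments when the two are edited by hand.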

When to use it

- Setting up or improving CI/CD pipelines for automated build-test-deploy
- Containerizing applications and deploying to Kubernetes or managed clusters
- Designing cloud-native architectures and multi-environment strategies
- Automating cloud infrastructure with Terraform, Ansible, or ARM/CloudFormation
- Implementing monitoring, logging, and distributed tracing for production systems

Best practices

- Adopt least-privilege IAM and secrets management (Vault, AWS Secrets Manager)
- Use multi-stage Docker builds, image scanning, and minimal base images
- Treat infrastructure as code with modular, reusable Terraform modules and state management
- Design observability from day one: metrics, logs, and tracing with alerting and runbooks
- Implement progressive deployment strategies (blue/green, canary) and automated rollbacks
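
The progressive-deployment practice can be sketched as a small decision function: compare the canary's error rate with the stable baseline and decide whether to wait for more traffic, promote, or roll back. The thresholds, the `CanaryMetrics` type, and the function name are illustrative assumptions; real rollout tools apply their own analysis rules.

```python
from dataclasses import dataclass


@dataclass
class CanaryMetrics:
    requests: int
    errors: int


def canary_decision(baseline, canary, min_requests=500,
                    max_error_ratio=1.5, max_error_rate=0.05):
    """Return "wait", "promote", or "rollback" for a canary release.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    if canary.requests < min_requests:
        return "wait"  # not enough traffic to judge yet
    canary_rate = canary.errors / canary.requests
    baseline_rate = baseline.errors / baseline.requests if baseline.requests else 0.0
    if canary_rate > max_error_rate:
        return "rollback"  # absolute ceiling exceeded
    if baseline_rate > 0 and canary_rate > baseline_rate * max_error_ratio:
        return "rollback"  # noticeably worse than the stable version
    return "promote"
```

Requiring a minimum request count before judging keeps a handful of early errors from triggering a rollback. When the baseline has no errors at all, this sketch falls back to the absolute ceiling alone.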

Example use cases

- Create a GitHub Actions pipeline that runs tests, builds Docker images, pushes them to a registry, and deploys to Kubernetes
- Design a highly available microservices platform on EKS/GKE/AKS with autoscaling, an optional service mesh, and ingress routing
- Write Terraform to provision a VPC, an EKS cluster, RDS, and S3 with tagging and cost controls
- Assemble a monitoring stack: Prometheus for metrics, Grafana dashboards, Loki/ELK for logs, and Jaeger for tracing
- Audit an existing CI/CD pipeline for security gaps, flaky deployments, and performance bottlenecks

FAQ

Which cloud providers and CI tools are supported?

The skill covers AWS, Azure, Google Cloud, and multi-cloud patterns. CI/CD guidance includes GitHub Actions, GitLab CI, Jenkins, CircleCI and related best practices.

Can it produce ready-to-run manifests and IaC?

Yes. It provides Terraform modules, Kubernetes manifests or Helm charts, and CI workflow examples tailored to your stack. Validate the generated output and wire in your own secret handling before using it in production.