This skill guides the design and operation of cloud infrastructure with IaC, CI/CD, observability, and SRE practices for reliable deployments.
To add this skill to your agents, run `npx playbooks add skill first-fluke/fullstack-starter --skill devops-iac-engineer`.
---
name: devops-iac-engineer
description: Expert guidance for designing, implementing, and maintaining cloud infrastructure using Infrastructure as Code (IaC) principles. Use this skill for architecting cloud solutions, setting up CI/CD pipelines, implementing observability, and following SRE best practices.
---
# DevOps IaC Engineer
This skill provides expertise in designing and managing cloud infrastructure using Infrastructure as Code (IaC) and DevOps/SRE best practices.
## When to Use
- Designing cloud architecture (AWS, GCP, Azure)
- Implementing or refactoring CI/CD pipelines
- Setting up observability (logging, metrics, tracing)
- Creating Kubernetes clusters and container orchestration strategies
- Implementing security controls and compliance checks
- Improving system reliability (SLO/SLA, Disaster Recovery)
## Infrastructure as Code (IaC) Principles
- **Declarative Code**: Use Terraform/OpenTofu to define the desired state.
- **GitOps**: The code repository is the single source of truth. Changes are applied through PRs and automated pipelines.
- **Immutable Infrastructure**: Replace servers/containers rather than patching them in place.
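
The declarative and immutable principles above can be illustrated with a toy reconciliation step. This is only a sketch, not how Terraform or OpenTofu computes a plan: the resource names, the `image` attribute, and the `plan_changes` helper are all hypothetical. It compares a desired state with the observed state and emits replace actions instead of in-place patches.

```python
from typing import Dict, List, Tuple

# Hypothetical resource model: attribute name -> value.
Resource = Dict[str, str]


def plan_changes(
    desired: Dict[str, Resource], actual: Dict[str, Resource]
) -> List[Tuple[str, str]]:
    """Return (action, resource_name) pairs needed to reach the desired state."""
    changes: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            changes.append(("create", name))
        elif desired[name] != actual[name]:
            # Immutable style: replace the resource rather than patch it.
            changes.append(("replace", name))
    for name in sorted(actual):
        if name not in desired:
            # Anything not declared in code is removed.
            changes.append(("delete", name))
    return changes


# Illustrative states; names and versions are made up.
desired = {
    "web": {"image": "web:2.0"},
    "worker": {"image": "worker:1.4"},
}
actual = {
    "web": {"image": "web:1.9"},
    "cron": {"image": "cron:0.3"},
}

for action, name in plan_changes(desired, actual):
    print(f"{action}: {name}")
```

Running the same function with identical desired and actual states yields an empty plan, which is the property that makes repeated applies safe.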
## Core Domains
### 1. Terraform & IaC
- Use modules for reusability.
- Separate state by environment (dev, stage, prod) and region.
- Automate `plan` and `apply` in CI/CD.
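
As a minimal sketch of the state-separation and automation bullets above, the helper below builds one state key per environment, region, and stack, and interprets the exit code of `terraform plan -detailed-exitcode` (0 means no changes, 1 means error, 2 means changes present). The key layout and the function names are assumptions for illustration, not a convention the skill prescribes.

```python
ENVIRONMENTS = ("dev", "stage", "prod")


def state_key(environment: str, region: str, stack: str) -> str:
    """Build a per-environment, per-region state path (hypothetical layout)."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment}")
    return f"{environment}/{region}/{stack}/terraform.tfstate"


def should_apply(plan_exit_code: int) -> bool:
    """Interpret the exit code of `terraform plan -detailed-exitcode`.

    0 = succeeded with no changes, 1 = error, 2 = succeeded with changes.
    """
    if plan_exit_code == 0:
        return False
    if plan_exit_code == 2:
        return True
    raise RuntimeError(f"terraform plan failed with exit code {plan_exit_code}")


print(state_key("prod", "us-east-1", "network"))
print(should_apply(2))
```

A CI job could call `should_apply` on the plan step's exit code and skip the apply step when nothing changed.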
### 2. Kubernetes & Containers
- Build small, stateless containers.
- Use Helm or Kustomize for resource management.
- Implement resource limits and requests.
- Use namespaces for isolation.
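
The requests/limits and namespace bullets above lend themselves to a simple pre-deploy check. The sketch below assumes the manifest has already been parsed into a dict shaped like an `apps/v1` Deployment. The `lint_deployment` helper and the example workload are hypothetical, and a real pipeline would more likely rely on an admission policy or an existing linter.

```python
from typing import Any, Dict, List


def lint_deployment(manifest: Dict[str, Any]) -> List[str]:
    """Return human-readable findings for a Deployment-shaped dict."""
    findings: List[str] = []

    # Namespaces for isolation: flag workloads left in "default".
    namespace = manifest.get("metadata", {}).get("namespace", "default")
    if namespace == "default":
        findings.append("metadata.namespace: use a dedicated namespace")

    # Every container should declare both requests and limits.
    containers = manifest["spec"]["template"]["spec"]["containers"]
    for container in containers:
        resources = container.get("resources", {})
        for field in ("requests", "limits"):
            if not resources.get(field):
                findings.append(f"{container['name']}: missing resources.{field}")
    return findings


# Illustrative manifest; names, images, and sizes are made up.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "api", "namespace": "payments"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "api",
                        "image": "api:1.0",
                        "resources": {
                            "requests": {"cpu": "100m", "memory": "128Mi"},
                            "limits": {"cpu": "500m", "memory": "256Mi"},
                        },
                    },
                    {
                        "name": "sidecar",
                        "image": "proxy:1.0",
                        "resources": {"requests": {"cpu": "50m"}},
                    },
                ]
            }
        }
    },
}

for finding in lint_deployment(deployment):
    print(finding)
```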
### 3. CI/CD Pipelines
- **CI**: Lint, test, build, and scan (security) on every commit.
- **CD**: Automated deployment to lower environments; manual approval for production.
- Use tools like GitHub Actions, Cloud Build, or ArgoCD.
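
The promotion rule above (automatic deployment to lower environments, manual approval for production) can be expressed as a small gate. This is a sketch under assumed names: `Release`, `can_deploy`, and the `approved_by` field are hypothetical and do not correspond to any specific CI tool's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Release:
    version: str
    ci_passed: bool  # lint, test, build, and security scan all green
    approved_by: Optional[str] = None  # set once a human approves


def can_deploy(release: Release, environment: str) -> bool:
    """Lower environments deploy automatically; prod needs an approval."""
    if not release.ci_passed:
        return False
    if environment == "prod":
        return release.approved_by is not None
    return True


candidate = Release(version="1.4.2", ci_passed=True)
print(can_deploy(candidate, "stage"))
print(can_deploy(candidate, "prod"))
```

In GitHub Actions or Cloud Build the same rule is usually configured in the platform itself, so a function like this is mainly useful for reasoning about or testing the policy.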
### 4. Observability
- **Logs**: Centralized logging (e.g., Cloud Logging, ELK).
- **Metrics**: Prometheus/Grafana or Cloud Monitoring.
- **Tracing**: OpenTelemetry for distributed tracing.
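
Centralized logging backends generally work best with structured log lines, and carrying a trace identifier in each line helps correlate logs with traces. The sketch below uses only Python's standard `logging` module. The `trace_id` field is passed in by hand as an assumption, whereas a real setup would take it from the tracing library (for example OpenTelemetry) in use.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "severity": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Present only if the caller passed extra={"trace_id": ...}.
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("checkout")  # hypothetical service name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order created", extra={"trace_id": "abc123"})
```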
### 5. Security (DevSecOps)
- Scan IaC for misconfigurations (e.g., Checkov, Trivy).
- Manage secrets with Secret Manager or Vault; never store them in code.
- Grant least-privilege IAM roles.
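
To make the least-privilege bullet concrete, the sketch below flags wildcard grants in a policy document. It assumes the AWS-style JSON policy layout (`Statement`, `Effect`, `Action`, `Resource`). The `wildcard_statements` helper and the bucket name are hypothetical, and this is no substitute for a scanner such as Checkov or Trivy.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[str]:
    """Policy fields may be a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def wildcard_statements(policy: Dict[str, Any]) -> List[str]:
    """Flag Allow statements that grant wildcard actions or resources."""
    findings: List[str] = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append(f"statement {index}: wildcard action")
        if "*" in resources:
            findings.append(f"statement {index}: wildcard resource")
    return findings


# Illustrative policy; the bucket name is made up.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    ],
}

for finding in wildcard_statements(policy):
    print(finding)
```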
## SRE Practices
- **SLI/SLO**: Define Service Level Indicators and Objectives for critical user journeys.
- **Error Budgets**: Use error budgets to balance innovation and reliability.
- **Post-Mortems**: Conduct blameless post-mortems for incidents.
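
An error budget follows directly from the SLO: it is the fraction of time or requests that may fail, namely one minus the target. As a rough sketch (the function names and the 30-day window are assumptions), the helpers below compute a time-based budget and the remaining share of a request-based budget.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1")
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent for a request-based SLI.

    A negative result means the budget is exhausted.
    """
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1")
    if total_events == 0:
        return 1.0
    allowed_failures = (1 - slo) * total_events
    actual_failures = total_events - good_events
    return 1 - actual_failures / allowed_failures


print(round(error_budget_minutes(0.999), 1))
print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))
```

A 99.9% SLO over 30 days allows about 43.2 minutes of downtime. When the remaining budget approaches zero, a team would typically slow feature rollouts and prioritize reliability work.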
This skill provides hands-on guidance for designing, implementing, and operating cloud infrastructure using Infrastructure as Code (IaC) and SRE/DevOps best practices. It focuses on practical patterns for Terraform, Kubernetes, CI/CD, observability, and security across cloud providers. Use it to move from ad hoc scripts to reproducible, production-ready infrastructure and pipelines.
The skill inspects your architecture goals and current practices, then recommends declarative IaC patterns, module boundaries, and state isolation strategies. It outlines CI/CD flows that automate plan/apply, security scans, and promotion gates, and it prescribes observability and SRE controls like SLOs, error budgets, and incident processes. It includes concrete implementation advice for Terraform, container orchestration, and secret management.
**Which IaC tool should I choose, Terraform or OpenTofu?**

Choose the one whose community and provider support match your needs; both use declarative configuration and modules. Prefer the ecosystem that fits your organizational requirements and CI integrations.

**How do I manage secrets without exposing them in code?**

Use a dedicated secret manager or Vault, grant minimal access via IAM roles, and inject secrets at runtime through the CI/CD or orchestration platform.
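
One common way to inject secrets at runtime is through environment variables that the platform populates from the secret store. The sketch below assumes that pattern. The `load_secret` helper and the `DB_PASSWORD` name are hypothetical, and the example passes a plain dict so it does not depend on the real environment.

```python
import os
from typing import Mapping, Optional


class MissingSecretError(RuntimeError):
    """Raised when an expected secret was not injected at runtime."""


def load_secret(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Read a secret injected by the platform; fail fast if it is absent."""
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        # Report the name only; never log the secret value itself.
        raise MissingSecretError(f"secret {name!r} was not provided at runtime")
    return value


# Stand-in for the process environment; in production, omit the second argument.
injected = {"DB_PASSWORD": "example-only"}
password = load_secret("DB_PASSWORD", injected)
print(f"loaded secret of length {len(password)}")
```

Failing at startup when a secret is missing tends to surface misconfiguration earlier than failing on first use.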