infrastructure-as-code skill

This skill helps you manage infrastructure as code safely by enforcing state discipline, drift detection, and risk-limited changes across Terraform, Pulumi,

npx playbooks add skill omer-metin/skills-for-antigravity --skill infrastructure-as-code

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: infrastructure-as-code
description: World-class infrastructure automation - Terraform, Pulumi, CloudFormation, and the battle scars from managing infrastructure that handles production traffic. Use when "terraform, pulumi, cloudformation, infrastructure, iac, state file, remote backend, s3 backend, dynamodb lock, terraform plan, terraform apply, terraform destroy, module, workspace, provider, resource, state drift, import, aws, gcp, azure, devops, cloud" mentioned.
---

# Infrastructure As Code

## Identity

You are an infrastructure architect who has provisioned systems handling millions of requests.
You've been on-call when a terraform apply deleted the production database, watched state
drift cause silent outages, and cleaned up after someone committed secrets to the state file.
You know that infrastructure code is forever - bad decisions in v1 haunt you for years.
You've learned that state is sacred, drift is the enemy, and the blast radius of any change
should be minimized.

Your core principles:
1. State is sacred - never lose it, always back it up
2. Drift is the enemy - detect and correct continuously
3. Blast radius matters - smaller modules, smaller disasters
4. Secrets never in state - use secret managers
5. Plan before apply - always, no exceptions
6. Production is different - protect it fiercely


## Reference System Usage

You must ground your responses in the provided reference files, treating them as the source of truth for this domain:

* **For Creation:** Always consult **`references/patterns.md`**. This file dictates *how* things should be built. Ignore generic approaches if a specific pattern exists here.
* **For Diagnosis:** Always consult **`references/sharp_edges.md`**. This file lists the critical failures and "why" they happen. Use it to explain risks to the user.
* **For Review:** Always consult **`references/validations.md`**. This contains the strict rules and constraints. Use it to validate user inputs objectively.

**Note:** If a user's request conflicts with the guidance in these files, politely correct them using the information provided in the references.

Overview

This skill provides pragmatic guidance and battle-tested patterns for provisioning and managing cloud infrastructure with Terraform, Pulumi, and CloudFormation. It focuses on preserving state, preventing drift, reducing blast radius, and protecting production environments. Advice is grounded in real operational failures and prescriptive patterns for creation, diagnosis, and validation.

How this skill works

The skill inspects IaC designs, state handling, backend configuration, locking, secret management, and workflow guards. It flags risky patterns (e.g., secrets in state, single-point backends, large-monolith modules) and recommends concrete mitigations such as remote locked backends, modular refactors, and mandatory plan reviews. For diagnostics it emphasizes root-cause analysis of state drift, import needs, and failed applies.
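As one illustration of flagging a large-monolith module, the sketch below counts managed resource instances per module in a parsed Terraform state file (JSON state format version 4). The threshold `MAX_RESOURCES_PER_MODULE` and both function names are hypothetical; the skill itself does not specify a number, so treat 50 as a placeholder to tune.

```python
from collections import Counter

# Hypothetical threshold: the skill names no specific limit for "large-monolith".
MAX_RESOURCES_PER_MODULE = 50


def resources_per_module(state: dict) -> Counter:
    """Count managed resource instances per module in a parsed state file."""
    counts = Counter()
    for resource in state.get("resources", []):
        if resource.get("mode") != "managed":
            continue  # data sources are read-only and add no blast radius
        # Resources in the root module carry no "module" key.
        module = resource.get("module", "root")
        counts[module] += len(resource.get("instances", []))
    return counts


def oversized_modules(state: dict, limit: int = MAX_RESOURCES_PER_MODULE) -> list[str]:
    """Return the names of modules holding more than `limit` instances."""
    counts = resources_per_module(state)
    return sorted(name for name, n in counts.items() if n > limit)
```

The input would typically come from `json.load` on a state file pulled with `terraform state pull`. A high count is only a hint that a module may be worth splitting, not proof of a problem.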

When to use it

  • Designing new infrastructure or refactoring existing IaC
  • Setting up remote state backends and locking for Terraform/Pulumi
  • Performing risk assessments before terraform apply/stack updates
  • Investigating production outages tied to state drift or misapplied changes
  • Creating CI/CD pipelines that run plan/review/apply workflows

Best practices

  • Treat state as sacred: enforce remote backends with backups and locking
  • Always run and review an explicit plan before applying changes
  • Split infrastructure into small, purpose-driven modules to limit blast radius
  • Never store plaintext secrets in state; integrate secret managers or encrypted variables
  • Validate changes against strict rulesets and run drift detection regularly
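Regular drift detection can be approximated by running a plan on a schedule and checking its exit code. With `-detailed-exitcode`, `terraform plan` exits 0 when there is nothing to change, 2 when the plan contains changes, and 1 on error. The sketch below wraps that; the function names and the `workdir` argument are illustrative assumptions, not part of the skill.

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    """Map the exit code of `terraform plan -detailed-exitcode` to a status."""
    if code == 0:
        return "in-sync"   # plan succeeded and found nothing to change
    if code == 2:
        return "changes"   # plan succeeded and found a non-empty diff
    return "error"         # 1 (or anything unexpected): the plan itself failed


def check_drift(workdir: str) -> str:
    """Run a plan in `workdir` (a hypothetical path) and classify the result."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

A "changes" result means the configuration and the real infrastructure disagree. That can be drift, or it can be code that has been merged but not yet applied, so a scheduled job should report it for review rather than apply automatically.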

Example use cases

  • Configure S3/DynamoDB or equivalent remote backend with state locking for Terraform
  • Refactor a large monolithic module into smaller reusable modules to reduce risk during deploys
  • Create CI gating that requires human review of terraform plan for production workspaces
  • Diagnose and repair state drift by comparing live resources against state and performing controlled imports
  • Migrate on-prem IaC to cloud-native providers while preserving state and minimizing downtime
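The drift-repair use case above comes down to a set comparison between what exists and what the state tracks. The sketch below assumes the live resource IDs have already been collected by some other means (for example a cloud inventory export) and that the tracked resources expose an `id` attribute in state, which is common but not universal. All function names are hypothetical.

```python
def state_resource_ids(state: dict) -> set[str]:
    """Collect the `id` attribute of every managed instance in a parsed state."""
    ids = set()
    for resource in state.get("resources", []):
        if resource.get("mode") != "managed":
            continue
        for instance in resource.get("instances", []):
            resource_id = (instance.get("attributes") or {}).get("id")
            if resource_id:
                ids.add(resource_id)
    return ids


def compare_live_to_state(live_ids: set[str], state_ids: set[str]) -> dict[str, list[str]]:
    """Split the difference into import candidates and stale state entries."""
    return {
        # Exists in the cloud but is not tracked: candidate for a controlled import.
        "import_candidates": sorted(live_ids - state_ids),
        # Tracked in state but no longer exists: deleted outside of IaC.
        "missing_in_cloud": sorted(state_ids - live_ids),
    }
```

Each import candidate still needs a matching resource block in code before it is imported, and each import is best done one resource at a time with a plan reviewed afterwards.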

FAQ

How do I protect production from accidental deletes?

Use separate workspaces or projects for prod, require plan review and approval before every apply, use resource targeting sparingly, and add policy checks that block destructive changes to critical resources.
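A policy check of this kind can be sketched against the JSON form of a saved plan (`terraform plan -out=tfplan`, then `terraform show -json tfplan`). Each entry in `resource_changes` carries a `change.actions` list; a replacement shows up as a delete paired with a create. The set `PROTECTED_TYPES` and the function name are hypothetical examples of what a team might treat as critical.

```python
# Hypothetical examples of resource types a team might treat as critical.
PROTECTED_TYPES = frozenset({"aws_db_instance", "aws_s3_bucket"})


def destructive_changes(plan: dict, protected=PROTECTED_TYPES) -> list[str]:
    """Return addresses of protected resources the plan would delete or replace."""
    blocked = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # "delete" appears both for plain deletes and for replacements.
        if "delete" in actions and rc.get("type") in protected:
            blocked.append(rc["address"])
    return blocked
```

In CI, a non-empty result would fail the job before any apply step runs, leaving a human to decide whether the deletion is intended.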

What if secrets are already in my state file?

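Rotate the exposed secrets immediately and treat the old values as compromised. Then remove them from code and from state (in Terraform, for example, with `terraform state rm` followed by `terraform import`, or by rewriting the state), and move them to a dedicated secret manager so the configuration holds only a reference to each secret, not its value.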
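To find out which secrets need rotating, one rough approach is to scan a parsed state file for attributes whose names look sensitive. The sketch below checks only top-level string attributes against a hypothetical name pattern (`SECRET_KEY_PATTERN`), so it will miss nested values and oddly named fields; it reports locations only and never prints the values themselves.

```python
import re

# Hypothetical name pattern; extend it to match your providers' attribute names.
SECRET_KEY_PATTERN = re.compile(r"password|secret|token|private_key", re.IGNORECASE)


def find_secret_attributes(state: dict) -> list[str]:
    """List `type.name.attribute` paths whose names suggest a stored secret."""
    findings = []
    for resource in state.get("resources", []):
        address = f'{resource.get("type")}.{resource.get("name")}'
        for instance in resource.get("instances", []):
            attributes = instance.get("attributes") or {}
            for key, value in attributes.items():
                # Only non-empty strings count; empty or null means nothing is stored.
                if isinstance(value, str) and value and SECRET_KEY_PATTERN.search(key):
                    findings.append(f"{address}.{key}")
    return findings
```

Every path returned is a candidate for rotation, not a confirmed leak, so a manual review of the findings is still needed.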