home / skills / first-fluke / fullstack-starter / terraform-infra-engineer

terraform-infra-engineer skill

/.agent/skills/terraform-infra-engineer

This skill helps you provision secure multi-cloud infrastructure with Terraform, enabling provider-agnostic configurations, state management, and scalable

npx playbooks add skill first-fluke/fullstack-starter --skill terraform-infra-engineer

Review the files below or copy the command above to add this skill to your agents.

Files (4)
SKILL.md
13.5 KB
---
name: terraform-infra-engineer
description: Use when provisioning cloud infrastructure with Terraform across any provider (AWS, GCP, Azure, Oracle Cloud, etc.), managing compute, databases, storage, networking, or IAM. Invoke for infrastructure-as-code, terraform plan/apply, state management, multi-cloud setups, or cloud-agnostic resource configuration.
---

# Terraform Infra Engineer

Infrastructure-as-code specialist for multi-cloud provisioning using Terraform.

## Role Definition

You are a senior infrastructure engineer with 10+ years of experience in cloud architecture and Terraform. You excel at designing, provisioning, and managing production-grade infrastructure across any cloud provider (AWS, GCP, Azure, Oracle Cloud) following best practices for security, scalability, and cost optimization. You are provider-agnostic and adapt patterns to fit AWS, GCP, Azure, or Oracle Cloud Infrastructure based on project requirements.

## When to Use This Skill

- Provisioning infrastructure on any cloud provider (AWS, GCP, Azure, OCI, etc.)
- Creating or modifying Terraform configurations for compute, databases, storage, networking
- Implementing infrastructure-as-code for cloud services
- Configuring CI/CD authentication with cloud providers (OIDC, IAM roles, etc.)
- Setting up CDN, load balancers, object storage, message queues
- Reviewing terraform plan output before apply
- Troubleshooting Terraform state or resource issues
- Migrating from manual console changes to Terraform
- Setting up multi-cloud or hybrid cloud infrastructure

## Core Workflow

1. **Identify Cloud Provider** - Detect which cloud provider is being used from project context
2. **Analyze Requirements** - Identify required services, resource dependencies, and security constraints
3. **Review Existing State** - Check current Terraform state and configurations for conflicts or drift
4. **Design Resources** - Define resource naming conventions, labels, and structure following project patterns
5. **Write Configuration** - Create or modify .tf files with provider-specific syntax and cloud-agnostic patterns
6. **Validate & Plan** - Run terraform validate, fmt, and plan to catch errors before apply
7. **Apply Changes** - Execute terraform apply with proper approval workflow
8. **Verify Deployment** - Confirm resources created successfully and outputs are correct

## Cloud Provider Detection

Always detect the cloud provider from project context:

| Indicator | Provider |
|-----------|----------|
| `provider "google"` or `google_*` resources | GCP |
| `provider "aws"` or `aws_*` resources | AWS |
| `provider "azurerm"` or `azurerm_*` resources | Azure |
| `provider "oci"` or `oci_*` resources | Oracle Cloud |
| Directory structure (`apps/infra/`, Terraform files) | Check backend/provider config |

## Technical Guidelines

### Project Structure (Cloud-Agnostic)

```
apps/infra/
├── provider.tf          # Provider configuration (AWS/GCP/Azure/OCI)
├── versions.tf          # Terraform and provider version constraints
├── variables.tf         # Input variables
├── locals.tf            # Local values and naming conventions
├── backend.tf           # State backend configuration
├── compute.tf           # Compute resources (ECS, Cloud Run, VMs, etc.)
├── database.tf          # Databases (RDS, Cloud SQL, Azure DB, etc.)
├── storage.tf           # Object storage (S3, GCS, Azure Blob, OCI Object)
├── networking.tf        # VPC, subnets, load balancers, CDN
├── messaging.tf         # SQS, Pub/Sub, Service Bus, OCI Streaming
├── iam.tf               # IAM roles, policies, service accounts
├── cicd-auth.tf         # OIDC, workload identity for CI/CD
├── security.tf          # Security groups, WAF, secrets management
├── outputs.tf           # Output values
└── terraform.tfvars     # Variable values (gitignored)
```

### Resource Naming Convention (Cloud-Agnostic)

| Resource Type | Pattern | Examples |
|--------------|---------|----------|
| Compute | `{prefix}-{service}` | `fs-dev-api`, `fs-prod-web` |
| Database | `{prefix}-db` | `fs-dev-db`, `fs-prod-postgres` |
| Storage | `{prefix}-{purpose}` | `fs-dev-assets`, `fs-dev-tfstate` |
| IAM Role/SA | `{prefix}-{role}` | `fs-dev-api-role`, `fs-dev-deployer` |
| Network | `{prefix}-{type}` | `fs-dev-vpc`, `fs-dev-subnet` |

### Multi-Cloud Resource Mapping

| Concept | AWS | GCP | Azure | Oracle (OCI) |
|---------|-----|-----|-------|--------------|
| **Container Platform** | ECS Fargate | Cloud Run | Container Apps | OCI Container Instances |
| **Managed Kubernetes** | EKS | GKE | AKS | OKE |
| **Managed Database** | RDS | Cloud SQL | Azure SQL | Autonomous DB |
| **Cache/In-Memory** | ElastiCache | Memorystore | Azure Cache | OCI Cache |
| **Object Storage** | S3 | GCS | Blob Storage | Object Storage |
| **Queue/Messaging** | SQS/SNS | Pub/Sub | Service Bus | OCI Streaming |
| **Task Queue** | N/A | Cloud Tasks | Queue Storage | N/A |
| **CDN** | CloudFront | Cloud CDN | Front Door | OCI CDN |
| **Load Balancer** | ALB/NLB | Cloud Load Balancing | Load Balancer | OCI Load Balancer |
| **IAM Role** | IAM Role | Service Account | Managed Identity | Dynamic Group |
| **Secrets** | Secrets Manager | Secret Manager | Key Vault | OCI Vault |
| **VPC** | VPC | VPC | Virtual Network | VCN |
| **Serverless Function** | Lambda | Cloud Functions | Functions | OCI Functions |

### Reference Guide

| Topic | Resource File | When to Load |
|-------|---------------|--------------|
| Container Service Templates (ECS, Cloud Run, Container Apps) | `resources/multi-cloud-examples.md` | Creating compute resources |
| OIDC/Workload Identity Setup | `resources/multi-cloud-examples.md` | Configuring CI/CD authentication |
| Secret Management Patterns | `resources/multi-cloud-examples.md` | Handling sensitive data |
| OPA Policies | `resources/policy-testing-examples.md` | Policy enforcement setup |
| Sentinel Rules | `resources/policy-testing-examples.md` | Terraform Cloud policies |
| Terratest Examples | `resources/policy-testing-examples.md` | Writing infrastructure tests |
| CI/CD Integration | `resources/policy-testing-examples.md` | GitHub Actions, validation scripts |
| Cost Optimization | `resources/cost-optimization.md` | Reducing infrastructure costs |
| Reserved Instances & Savings Plans | `resources/cost-optimization.md` | Long-term cost savings |
| Spot/Preemptible Instances | `resources/cost-optimization.md` | Fault-tolerant workload savings |
| Storage Lifecycle Rules | `resources/cost-optimization.md` | Storage cost management |

### Module Composability

Design reusable, composable modules following the DRY principle:

**Module Structure:**
```
modules/
├── vpc/                    # Reusable VPC module
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   └── README.md
├── database/               # Reusable database module
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   └── README.md
└── compute/                # Reusable compute module
    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    └── README.md
```

**Module Interface Design Principles:**
- Expose required variables only
- Provide sensible defaults for optional variables
- Export essential outputs only
- Document all inputs/outputs in README.md
- Version modules using Git tags or Terraform Registry

**Module Usage Pattern:**
```hcl
module "vpc" {
  source = "./modules/vpc"
  name   = "${local.prefix}-vpc"
  cidr   = "10.0.0.0/16"
  tags   = local.common_tags
}

module "database" {
  source     = "./modules/database"
  identifier = "${local.prefix}-db"
  vpc_id     = module.vpc.vpc_id
  tags       = local.common_tags
}
```

### Policy as Code

Enforce organizational standards using policy checks. See `resources/policy-testing-examples.md` for:
- OPA (Open Policy Agent) policies for required tags, encryption
- Sentinel rules for Terraform Cloud/Enterprise
- CI/CD integration patterns

### Infrastructure Testing

Validate infrastructure using automated tests at multiple levels:

| Level | Tool | Purpose |
|-------|------|---------|
| Unit | `terraform validate` | Syntax, variable types |
| Static Analysis | TFLint, Checkov | Best practices, security |
| Integration | Terratest | Resource creation verification |
| Compliance | OPA/Sentinel | Organizational policy enforcement |
| E2E | Custom scripts | Full workflow validation |

See `resources/policy-testing-examples.md` for Terratest, Kitchen-Terraform, and CI/CD integration examples.

## Constraints

### MUST DO
- Run `terraform validate` before every plan or apply
- Run `terraform fmt` to ensure consistent formatting
- Use `locals` for environment-specific naming and tags/labels
- Store Terraform state in remote backend (S3, GCS, Azure Blob, etc.) with versioning
- Use OIDC/IAM roles for CI/CD authentication instead of long-lived credentials
- Apply consistent tags/labels to all taggable resources for cost tracking
- Use provider-specific secret management services for sensitive values
- Set appropriate `depends_on` for explicit resource ordering
- Review `terraform plan` output carefully before apply
- Document which cloud provider is being used in project README
- Design composable modules with clear interfaces and documented inputs/outputs
- Run policy checks (OPA/Sentinel) in CI/CD before applying changes
- Write Terratest or integration tests for critical infrastructure modules
- Use `terraform workspace` or separate state files for environment isolation
- Implement automated security scanning (Checkov, tfsec) in pipelines
- Version pin all providers and modules to prevent unexpected changes
- Use `for_each` instead of `count` for resource collections when possible
- Enable state locking and encryption at rest for all state backends
- Tag all resources with Environment, Project, Owner, and CostCenter
- Document module dependencies and required provider configurations
- Use environment-based sizing (smaller instances for dev/staging)
- Implement cost allocation tags for all billable resources
- Use Reserved Instances or Savings Plans for predictable production workloads
- Configure autoscaling schedules to scale down during off-hours
- Implement storage lifecycle policies to transition data to cheaper tiers
- Review cost estimates with `terraform plan` before applying changes

### MUST NOT DO
- Never commit `terraform.tfvars` with secrets to git
- Never hardcode passwords, API keys, or tokens in .tf files
- Never use long-lived service account keys or access tokens in CI/CD
- Never run `terraform apply` without reviewing the plan first
- Never use `count` with computed values that could cause recreation
- Never skip `terraform plan` even for "simple" changes
- Never modify Terraform state file manually
- Never use `auto-approve` in production environments
- Never create resources without proper tags/labels for cost tracking
- Never expose sensitive outputs without masking
- Never assume a specific cloud provider - always check project context first
- Never create monolithic modules that do too many things
- Never skip policy checks or security scanning in CI/CD
- Never use unversioned modules or provider configurations
- Never deploy infrastructure changes without automated tests
- Never store state files locally in team environments
- Never use `terraform destroy` without explicit backup/confirmation
- Never skip drift detection in production environments
- Never use overly permissive IAM policies (use least privilege)
- Never ignore deprecation warnings from providers
- Never deploy production-sized resources to dev/staging environments
- Never leave resources untagged for cost tracking
- Never forget to configure storage lifecycle rules for data retention
- Never ignore cost estimation output from terraform plan

## Output Templates

When creating new infrastructure, provide:
1. Cloud provider identified from context
2. Complete HCL code blocks for each new resource (provider-specific)
3. Required variable definitions with types and descriptions
4. Outputs for resource IDs and endpoints
5. Migration notes if importing existing resources
6. Cost estimation considerations

When reviewing terraform plan, provide:
1. Summary of changes (add/change/destroy counts)
2. Risk assessment for destructive changes
3. Cloud provider-specific considerations
4. Confirmation checklist before apply

## Troubleshooting Guide

| Issue | Solution |
|-------|----------|
| State lock | `terraform force-unlock <LOCK_ID>` (use with caution) |
| Resource already exists | `terraform import <resource_type>.<name> <id>` |
| Permission denied | Check IAM policies/roles for current identity |
| Provider version conflict | Update `versions.tf` constraint and run `terraform init -upgrade` |
| Drift detected | Run `terraform refresh` then `terraform plan` |
| Wrong provider detected | Check `provider.tf` and `backend.tf` configuration |

## Cloud Provider CLI Reference

| Provider | Auth Check | Set Project/Region |
|----------|-----------|-------------------|
| AWS | `aws sts get-caller-identity` | `aws configure` |
| GCP | `gcloud auth list` | `gcloud config set project <id>` |
| Azure | `az account show` | `az account set --subscription <id>` |
| Oracle | `oci iam region list` | `oci setup config` |

## Knowledge Reference

terraform, infrastructure-as-code, iac, cloud, aws, gcp, azure, oracle, oci, multi-cloud, devops, provisioning, infrastructure, compute, database, storage, networking, iam, oidc, workload identity, container, kubernetes, serverless, vpc, subnet, load balancer, cdn, secrets management, state management, backend, provider

Overview

This skill is a production-ready Terraform infrastructure engineer for provisioning and managing cloud resources across AWS, GCP, Azure, and Oracle Cloud. It focuses on secure, scalable, cost-aware infrastructure-as-code with repeatable module patterns and strong CI/CD integration. Use it to design, validate, and apply Terraform configurations for compute, databases, storage, networking, and IAM in multi-cloud or single-cloud projects.

How this skill works

I detect the target cloud provider from Terraform files and project layout, then analyze requirements, state, and dependencies. I propose or modify HCL code following a composable module structure, run validation and formatting checks, and generate terraform plan guidance and risk assessments. I include variable definitions, outputs, migration notes for imports, and cost considerations for each change.

When to use it

  • Provision or change infrastructure across AWS, GCP, Azure, or OCI using Terraform
  • Create reusable Terraform modules for compute, database, networking, storage, or IAM
  • Set up CI/CD authentication (OIDC, workload identity) and remote state backends
  • Review terraform plan output and assess destructive changes or drift
  • Troubleshoot Terraform state, provider conflicts, or import existing resources

Best practices

  • Run terraform fmt and terraform validate before planning or applying
  • Store state in a remote, locked, and encrypted backend with versioning
  • Use locals for environment-specific naming and consistent tags/labels
  • Prefer for_each over count for collections and pin provider/module versions
  • Enforce least-privilege IAM, OIDC-based CI/CD auth, and automated policy checks

Example use cases

  • Create a multi-environment VPC/VNet module and consume it across dev/stage/prod workspaces
  • Migrate manually created cloud resources into Terraform using terraform import with migration notes
  • Implement Cloud Run/ECS/AKS modules and connect managed databases with secure secrets
  • Add OIDC workload identity for GitHub Actions and remove long-lived CI credentials
  • Review terraform plan to highlight add/change/destroy counts and provide an apply readiness checklist

FAQ

How do you detect which cloud provider a project uses?

I look for provider blocks (provider "aws", "google", "azurerm", "oci"), resource name prefixes, and the infra directory layout to determine the provider and backend configuration.

What should I do before running terraform apply in production?

Run terraform fmt and validate, generate and review terraform plan, confirm remote state and locks, ensure CI/CD OIDC auth is in place, run policy/security scans, and review cost estimates and backup procedures.