
azure-infra-engineer-skill skill

/azure-infra-engineer-skill

This skill provides expert Azure infrastructure guidance for Bicep, CAF, and landing zones to design secure, scalable cloud environments.

npx playbooks add skill 404kidwiz/claude-supercode-skills --skill azure-infra-engineer-skill

---
name: azure-infra-engineer
description: Expert in Microsoft Azure cloud services, specializing in Bicep/ARM templates, Enterprise Landing Zones, and Cloud Adoption Framework (CAF).
---

# Azure Infrastructure Engineer

## Purpose

Provides Microsoft Azure cloud expertise specializing in Bicep/ARM templates, Enterprise Landing Zones, and Cloud Adoption Framework (CAF) implementations. Designs and deploys enterprise-grade Azure environments with governance, networking, and infrastructure as code.

## When to Use

- Deploying Azure resources using Bicep or ARM templates
- Designing Hub-and-Spoke network topologies (Virtual WAN, ExpressRoute)
- Implementing Azure Policy and Management Groups (Governance)
- Migrating workloads to Azure (ASR, Azure Migrate)
- Automating Azure DevOps pipelines for infrastructure
- Configuring Azure RBAC and Privileged Identity Management (PIM) with Microsoft Entra ID (formerly Azure Active Directory)

---

## Decision Framework

### IaC Tool Selection (Azure Context)

| Tool | Status | Recommendation |
|------|--------|----------------|
| **Bicep** | **Recommended** | Native, first-class support, concise syntax. |
| **Terraform** | **Alternative** | Best for multi-cloud strategies. |
| **ARM Templates** | **Legacy** | Verbose JSON. Avoid for new projects; author in Bicep, which compiles to ARM JSON. |
| **PowerShell/CLI** | **Scripting** | Use for ad-hoc tasks or pipeline glue, not state management. |

### Networking Architecture

```
What is the connectivity need?
│
├─ Hub-and-Spoke (standard default)
│  ├─ Central hub: Azure Firewall, VPN Gateway, Bastion
│  └─ Spokes: workload VNets peered to the hub
│
├─ Virtual WAN (global scale), choose it when you need:
│  ├─ Multi-region connectivity
│  └─ Branch-to-branch connectivity (SD-WAN)
│
└─ Private access to PaaS
   ├─ Preferred: Private Link / Private Endpoints
   └─ Service Endpoints: older and simpler; the service keeps its public endpoint (prefer Private Link where possible)
```
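
The hub-and-spoke branch can be sketched in Bicep as a spoke VNet peered to an existing hub. This is a minimal sketch, not a complete landing-zone module. It assumes the hub VNet already exists in another resource group of the same subscription. Every name, address prefix, and parameter below is hypothetical. Peering is not transitive and must be created in both directions, so a matching hub-to-spoke peering is still needed.

```bicep
// Hypothetical names and address ranges; adjust to your environment.
param location string = resourceGroup().location
param spokeVnetName string = 'vnet-app1-prod-eus-001'
param spokeAddressPrefix string = '10.1.0.0/16'
param appSubnetPrefix string = '10.1.1.0/24'
param hubVnetName string = 'vnet-hub-prod-eus-001'
param hubResourceGroup string = 'rg-connectivity'

// Existing hub VNet in another resource group of the same subscription.
resource hubVnet 'Microsoft.Network/virtualNetworks@2023-04-01' existing = {
  name: hubVnetName
  scope: resourceGroup(hubResourceGroup)
}

resource spokeVnet 'Microsoft.Network/virtualNetworks@2023-04-01' = {
  name: spokeVnetName
  location: location
  properties: {
    addressSpace: {
      addressPrefixes: [
        spokeAddressPrefix
      ]
    }
    subnets: [
      {
        name: 'snet-app'
        properties: {
          addressPrefix: appSubnetPrefix
        }
      }
    ]
  }
}

// Spoke -> hub half of the peering; the hub -> spoke half is deployed separately.
resource spokeToHub 'Microsoft.Network/virtualNetworks/virtualNetworkPeerings@2023-04-01' = {
  parent: spokeVnet
  name: 'peer-to-hub'
  properties: {
    remoteVirtualNetwork: {
      id: hubVnet.id
    }
    allowVirtualNetworkAccess: true
    allowForwardedTraffic: true   // accept traffic forwarded by the hub firewall
    useRemoteGateways: false      // set true once the hub has a gateway and allowGatewayTransit
  }
}
```

`useRemoteGateways` is left `false` so the template deploys before the hub has a gateway. Set it to `true` once both of these hold:

- the hub-side peering sets `allowGatewayTransit: true`;
- a VPN or ExpressRoute gateway exists in the hub.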

### Governance Strategy (CAF)

1.  **Management Groups:** Hierarchy for policy inheritance (Root > Geo > Landing Zones).
2.  **Azure Policy:** Use the "Deny" effect to block non-compliant resources (e.g., allow deployments only in East US).
3.  **RBAC:** Least privilege access via Entra ID Groups.
4.  **Blueprints:** Rapid deployment of compliant environments. Azure Blueprints is being deprecated; use Template Specs and Deployment Stacks for new work.
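
Item 2 can be illustrated by assigning the built-in **Allowed locations** policy at management group scope. Its definition ID is `e56962a6-4747-49cd-b67b-bf8b01975c4c`, and it denies resources created in other locations. This is a minimal sketch. It assumes deployment with something like `az deployment mg create --management-group-id <mg-id> --location eastus --template-file policy.bicep`. The file name, assignment name, and location list are placeholders.

```bicep
targetScope = 'managementGroup'

// Locations to allow; 'eastus' mirrors the "only East US" example above.
param allowedLocations array = [
  'eastus'
]

// Built-in "Allowed locations" policy definition (tenant-level resource).
var allowedLocationsPolicyId = tenantResourceId('Microsoft.Authorization/policyDefinitions', 'e56962a6-4747-49cd-b67b-bf8b01975c4c')

resource allowedLocationsAssignment 'Microsoft.Authorization/policyAssignments@2022-06-01' = {
  // Assignment names at management group scope are limited to 24 characters.
  name: 'allowed-locations'
  properties: {
    displayName: 'Allowed locations'
    policyDefinitionId: allowedLocationsPolicyId
    parameters: {
      listOfAllowedLocations: {
        value: allowedLocations
      }
    }
  }
}
```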

**Red Flags → Escalate to `security-engineer`:**
- Public access enabled on Storage Accounts or SQL Databases
- Management Ports (RDP/SSH) open to internet
- Subscription Owner permissions granted to individual users (use Contributor, or just-in-time elevation through PIM, instead)
- No cost controls/budgets configured
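
Governance item 3 (least privilege through Entra ID groups) and the Owner red flag both point the same way: assign narrow roles to groups, not individuals. This is a minimal sketch. It assumes a resource-group-scoped deployment and a hypothetical `groupObjectId` parameter holding the group's object ID. It grants the built-in Contributor role (`b24988ac-6180-42a0-ab88-20f7382dd24c`).

```bicep
// Object ID of the Entra ID security group receiving access (hypothetical parameter).
param groupObjectId string

// Built-in Contributor role definition ID.
var contributorRoleId = 'b24988ac-6180-42a0-ab88-20f7382dd24c'

resource groupContributor 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  // Role assignment names must be GUIDs; guid() keeps the name stable across redeployments.
  name: guid(resourceGroup().id, groupObjectId, contributorRoleId)
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', contributorRoleId)
    principalId: groupObjectId
    principalType: 'Group'
  }
}
```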

---

## Core Workflows

### Workflow 1: Bicep Resource Deployment

**Goal:** Deploy a secure Storage Account with Private Endpoint.

**Steps:**

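1.  **Define Bicep Module (`modules/storage.bicep`)**
    ```bicep
    param location string = resourceGroup().location

    // Storage account names: 3-24 characters, lowercase letters and digits only, globally unique.
    @minLength(3)
    @maxLength(24)
    param name string

    resource stg 'Microsoft.Storage/storageAccounts@2023-01-01' = {
      name: name
      location: location
      sku: {
        name: 'Standard_LRS'
      }
      kind: 'StorageV2'
      properties: {
        minimumTlsVersion: 'TLS1_2'
        supportsHttpsTrafficOnly: true
        allowBlobPublicAccess: false     // no anonymous access to blobs or containers
        publicNetworkAccess: 'Disabled'  // secure by default: reachable only via private endpoints
      }
    }

    output id string = stg.id
    ```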

2.  **Main Deployment (`main.bicep`)**
    ```bicep
    module storage './modules/storage.bicep' = {
      name: 'deployStorage'
      params: {
        name: 'stappprod001'
      }
    }
    ```

3.  **Deploy via CLI**
    ```bash
    az deployment group create --resource-group rg-prod --template-file main.bicep
    ```
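
The steps above stop at disabling public network access. The storage account is therefore not yet reachable through a Private Endpoint, as the goal states. The module below is one possible companion, and it makes these assumptions:

- a hypothetical `modules/storage-pe.bicep` file;
- an existing subnet for the endpoint;
- an existing `privatelink.blob.core.windows.net` private DNS zone linked to the VNet.

It covers only the `blob` sub-resource.

```bicep
// modules/storage-pe.bicep (hypothetical file name)
param location string = resourceGroup().location
param storageAccountId string   // e.g. the `id` output of storage.bicep
param subnetId string           // subnet that will host the private endpoint NIC
param blobDnsZoneId string      // existing privatelink.blob zone, linked to the VNet

resource pe 'Microsoft.Network/privateEndpoints@2023-04-01' = {
  name: 'pe-${last(split(storageAccountId, '/'))}-blob'
  location: location
  properties: {
    subnet: {
      id: subnetId
    }
    privateLinkServiceConnections: [
      {
        name: 'blob'
        properties: {
          privateLinkServiceId: storageAccountId
          groupIds: [
            'blob'   // storage sub-resource; others include file, queue, table, web, dfs
          ]
        }
      }
    ]
  }
}

// Registers the endpoint's private IP as an A record in the private DNS zone.
resource dnsGroup 'Microsoft.Network/privateEndpoints/privateDnsZoneGroups@2023-04-01' = {
  parent: pe
  name: 'default'
  properties: {
    privateDnsZoneConfigs: [
      {
        name: 'blob'
        properties: {
          privateDnsZoneId: blobDnsZoneId
        }
      }
    ]
  }
}
```

From `main.bicep`, it could be called as a second module. Pass `storageAccountId: storage.outputs.id`, plus the subnet and zone IDs as parameters.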

---

### Workflow 2: Landing Zone Setup (CAF)

**Goal:** Establish the foundational hierarchy.

**Steps:**

1.  **Create Management Groups**
    -   `MG-Root`
        -   `MG-Platform` (Identity, Connectivity, Management)
        -   `MG-LandingZones` (Online, Corp)
        -   `MG-Sandbox` (Playground)

2.  **Assign Policies**
    -   Assign "Allowed Locations" to `MG-Root`.
    -   Assign "Enable Azure Monitor" to `MG-LandingZones`.

3.  **Deploy Hub Network**
    -   Deploy VNet in connectivity subscription.
    -   Deploy Azure Firewall and VPN Gateway.
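
Step 1 can also be captured as code. This is a minimal sketch. It assumes you hold the permissions needed for tenant-scope deployments, for example via `az deployment tenant create --location eastus --template-file mg.bicep`. It reuses the group names from the list. The sub-groups under `MG-Platform` and `MG-LandingZones` are omitted for brevity.

```bicep
targetScope = 'tenant'

// Top-level group from the hierarchy above; with no details.parent it is
// created under the tenant root group.
resource mgRoot 'Microsoft.Management/managementGroups@2021-04-01' = {
  name: 'MG-Root'
  properties: {
    displayName: 'MG-Root'
  }
}

var childGroupNames = [
  'MG-Platform'
  'MG-LandingZones'
  'MG-Sandbox'
]

// Child groups; referencing mgRoot.id also makes Bicep deploy the parent first.
resource childGroups 'Microsoft.Management/managementGroups@2021-04-01' = [for mgName in childGroupNames: {
  name: mgName
  properties: {
    displayName: mgName
    details: {
      parent: {
        id: mgRoot.id
      }
    }
  }
}]
```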

---

## Anti-Patterns & Gotchas

### ❌ Anti-Pattern 1: "ClickOps"

**What it looks like:**
-   Creating resources manually in the Azure Portal.

**Why it fails:**
-   Unrepeatable.
-   Configuration drift.
-   Disaster recovery is slow and error-prone (no code to redeploy from).

**Correct approach:**
-   **Everything as Code:** Even if prototyping, export the ARM template or write basic Bicep.

### ❌ Anti-Pattern 2: One Giant Resource Group

**What it looks like:**
-   `rg-production` contains VNets, VMs, Databases, and Web Apps for 5 different projects.

**Why it fails:**
-   IAM nightmare (cannot grant resource-group-level access to Project A without also granting it to Project B).
-   Tagging and cost analysis become difficult.
-   Larger blast radius: deleting the resource group deletes every project in it.

**Correct approach:**
-   **Lifecycle Grouping:** Group resources that share a lifecycle (e.g., `rg-network`, `rg-app1-prod`, `rg-app1-dev`).
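
Lifecycle grouping can be declared at subscription scope. This is a minimal sketch. It assumes deployment with `az deployment sub create --location eastus --template-file rgs.bicep`. The group names come from the example above, and the `Environment` tag values are illustrative.

```bicep
targetScope = 'subscription'

param location string = 'eastus'

// One resource group per lifecycle; the environment tag is illustrative.
var resourceGroupDefs = [
  {
    name: 'rg-network'
    environment: 'shared'
  }
  {
    name: 'rg-app1-prod'
    environment: 'prod'
  }
  {
    name: 'rg-app1-dev'
    environment: 'dev'
  }
]

resource rgs 'Microsoft.Resources/resourceGroups@2022-09-01' = [for def in resourceGroupDefs: {
  name: def.name
  location: location
  tags: {
    Environment: def.environment
  }
}]
```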

### ❌ Anti-Pattern 3: Ignoring Naming Conventions

**What it looks like:**
-   `myvm1`, `test-storage`, `sql-server`.

**Why it fails:**
-   Cannot identify resource type, environment, or region from name.
-   Name collisions (Storage accounts must be globally unique).

**Correct approach:**
-   **CAF Naming Standard:** `[Resource Type]-[Workload]-[Environment]-[Region]-[Instance]`
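-   Example: `vnet-myapp-prod-eus-001` (Virtual Network, MyApp, Prod, East US, 001). Storage account names allow only 3-24 lowercase letters and digits, so drop the hyphens: `stmyappprodeus001`.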
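
The convention can be encoded once and reused across templates. This sketch uses hypothetical inputs (`workload`, `env`, `regionAbbr`, `instance`). The region abbreviation is a team convention, not an Azure value. The storage account name strips the hyphens and is trimmed to 24 characters.

```bicep
// Hypothetical naming inputs.
param workload string = 'myapp'
param env string = 'prod'
param regionAbbr string = 'eus'
param instance string = '001'

var suffix = '${workload}-${env}-${regionAbbr}-${instance}'

// Most resource types accept hyphens.
var vnetName = 'vnet-${suffix}'
var keyVaultName = 'kv-${suffix}'

// Storage accounts: lowercase letters and digits only, 3-24 characters.
var storageAccountName = take(toLower(replace('st${suffix}', '-', '')), 24)

output names object = {
  vnet: vnetName
  keyVault: keyVaultName
  storageAccount: storageAccountName
}
```

With the defaults shown, `names.storageAccount` evaluates to `stmyappprodeus001`.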

---

## Quality Checklist

**Governance:**
-   [ ] **Naming:** Resources follow CAF naming conventions.
-   [ ] **Tagging:** Resources tagged with `CostCenter`, `Environment`, `Owner`.
-   [ ] **Policies:** Azure Policy enforces compliance (e.g., allowed SKUs).

**Security:**
-   [ ] **Network:** No public IPs on backend resources (VMs, DBs).
-   [ ] **Identity:** Managed Identities used instead of Service Principals/Keys where possible.
-   [ ] **Encryption:** CMK (Customer Managed Keys) enabled for sensitive data.

**Reliability:**
-   [ ] **Availability Zones:** Critical resources deployed zone-redundant (ZRS).
-   [ ] **Backup:** Azure Backup enabled for VMs and SQL.
-   [ ] **Locks:** Resource Locks (`CanNotDelete`) on critical production resources.

**Cost:**
-   [ ] **Sizing:** Resources right-sized based on metrics.
-   [ ] **Reservations:** Reserved Instances purchased for steady workloads.
-   [ ] **Cleanup:** Unused resources (orphaned disks/NICs) deleted.
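
The `CanNotDelete` lock from the Reliability checklist can be sketched against an existing storage account. `storageAccountName` is a hypothetical parameter. The same `scope:` pattern should work for other resource types.

```bicep
// Name of an existing production storage account (hypothetical parameter).
param storageAccountName string

resource stg 'Microsoft.Storage/storageAccounts@2023-01-01' existing = {
  name: storageAccountName
}

// CanNotDelete still allows reads and updates but blocks deletion until the lock is removed.
resource deleteLock 'Microsoft.Authorization/locks@2020-05-01' = {
  name: 'lock-${storageAccountName}'
  scope: stg
  properties: {
    level: 'CanNotDelete'
    notes: 'Critical production resource; remove this lock deliberately before deleting.'
  }
}
```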

## Examples

### Example 1: Multi-Subscription Landing Zone Setup

**Scenario:** A healthcare company needs to deploy a compliant landing zone for HIPAA-regulated workloads across three environments (dev, staging, prod).

**Architecture:**
1. **Management Group Hierarchy**: Root > Organization > Environments > Workloads
2. **Network Design**: Hub-and-spoke with Azure Firewall, separate VNets per environment
3. **Policy Enforcement**: Azure Policy to enforce HIPAA compliance (encryption, backup, private endpoints)
4. **CI/CD Pipeline**: Azure DevOps pipeline with approval gates for prod deployments

**Key Components:**
- Azure Firewall Manager for centralized policy
- Private DNS Zones for app-internal resolution
- Azure Backup with immutable vaults for compliance
- Cost Management tags for departmental chargebacks

### Example 2: Zero-Trust Network Architecture

**Scenario:** A financial services firm needs to replace their VPN-based access with a Zero Trust architecture using Azure Private Link and Conditional Access.

**Implementation:**
1. **Private Endpoints**: All PaaS services accessed via Private Endpoints (SQL, Storage, Key Vault)
2. **Identity-Based Access**: Conditional Access policies requiring compliant device and MFA
3. **Micro-segmentation**: NSG rules denying all traffic by default, allowing only required flows
4. **Monitoring**: Microsoft Sentinel (formerly Azure Sentinel) for security analytics and anomaly detection

**Security Controls:**
- Microsoft Entra Conditional Access with device compliance
- Just-In-Time VM access for administration
- Microsoft Defender for Cloud threat protection
- Comprehensive audit logging to Log Analytics
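
The micro-segmentation item (deny by default, allow only required flows) might look like the NSG below. This is a sketch only: the allowed source prefix, port, and names are assumptions. The explicit deny at priority 4096 is needed because the built-in default rules would otherwise admit traffic from the whole VNet. Those rules are `AllowVnetInBound` and `AllowAzureLoadBalancerInBound`.

```bicep
param location string = resourceGroup().location
// Hypothetical source range allowed to reach the workload (e.g. an application gateway subnet).
param allowedSourcePrefix string = '10.0.1.0/24'

resource nsg 'Microsoft.Network/networkSecurityGroups@2023-04-01' = {
  name: 'nsg-app1-prod-eus-001'
  location: location
  properties: {
    securityRules: [
      {
        name: 'Allow-HTTPS-From-Frontend'
        properties: {
          priority: 100
          direction: 'Inbound'
          access: 'Allow'
          protocol: 'Tcp'
          sourceAddressPrefix: allowedSourcePrefix
          sourcePortRange: '*'
          destinationAddressPrefix: 'VirtualNetwork'
          destinationPortRange: '443'
        }
      }
      {
        // Lowest custom priority; evaluated before the default rules (65000+).
        name: 'Deny-All-Inbound'
        properties: {
          priority: 4096
          direction: 'Inbound'
          access: 'Deny'
          protocol: '*'
          sourceAddressPrefix: '*'
          sourcePortRange: '*'
          destinationAddressPrefix: '*'
          destinationPortRange: '*'
        }
      }
    ]
  }
}
```

This rule also blocks Azure Load Balancer health probes. If the workload sits behind a load balancer, add an allow rule for the `AzureLoadBalancer` service tag.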

### Example 3: Cost-Optimized Dev/Test Environment

**Scenario:** A software company wants to reduce their Azure dev/test environment costs by 60% while maintaining developer productivity.

**Optimization Strategy:**
1. **Auto-Shutdown**: Dev VMs shut down automatically on evenings and weekends via Automation runbooks
2. **Reserved Capacity**: Prod-like dev environments use Reserved Instances
3. **Dev-Optimized SKUs**: Development uses Dev/Test SKUs where available
4. **Tagging and Governance**: Required tags for cost allocation, orphaned resource cleanup

**Cost Savings Results:**
- 65% reduction in dev/test compute costs
- Automated cleanup of unused resources saving $2K/month
- Reserved Instance savings for stable environments
- Developer productivity maintained with auto-start capabilities
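
Example 3 uses Automation runbooks. For the shutdown half, a simpler alternative is the built-in VM auto-shutdown schedule, a `Microsoft.DevTestLab/schedules` resource. This is not the approach described above. The sketch assumes an existing VM passed in as the hypothetical `vmName` parameter. It stops the VM every day at 19:00 UTC but never starts it again, so auto-start still needs a runbook or a similar scheduler.

```bicep
// Name of an existing dev VM (hypothetical parameter).
param vmName string
param location string = resourceGroup().location

resource vm 'Microsoft.Compute/virtualMachines@2023-03-01' existing = {
  name: vmName
}

// The portal's per-VM auto-shutdown setting uses the 'shutdown-computevm-<vmName>' naming pattern.
resource autoShutdown 'Microsoft.DevTestLab/schedules@2018-09-15' = {
  name: 'shutdown-computevm-${vmName}'
  location: location
  properties: {
    status: 'Enabled'
    taskType: 'ComputeVmShutdownTask'
    dailyRecurrence: {
      time: '1900'   // 19:00 in the time zone below
    }
    timeZoneId: 'UTC'
    targetResourceId: vm.id
    notificationSettings: {
      status: 'Disabled'
    }
  }
}
```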

## Best Practices

### Infrastructure as Code

- **Everything as Code**: Every resource defined in Bicep, never manual portal changes
- **Module Library**: Create reusable Bicep modules for common patterns
- **Parameter Files**: Separate parameter files per environment (dev, staging, prod)
- **GitOps Workflow**: Infrastructure changes via PR and approval process
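- **State Management**: Bicep keeps no state file. Azure Resource Manager is the source of truth: preview changes with `what-if` and manage resource lifecycles with Deployment Stacks. Terraform needs a remote backend such as an Azure Storage account.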
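
Per-environment parameter files can use the `.bicepparam` format. This minimal sketch assumes a hypothetical `main.bicep` that declares `param storageName string`; the `main.bicep` in Workflow 1 hard-codes the name instead. The file name `main.dev.bicepparam` is also hypothetical.

```bicep
// main.dev.bicepparam (hypothetical file name).
// Assumes main.bicep declares `param storageName string`.
using './main.bicep'

param storageName = 'stappdev001'
```

Recent Azure CLI versions accept this file directly, e.g.:

`az deployment group create --resource-group rg-dev --template-file main.bicep --parameters main.dev.bicepparam`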

### Networking Excellence

- **Hub-and-Spoke Default**: Standard architecture for most workloads
- **Private by Default**: All PaaS access via Private Endpoints
- **DNS Planning**: Private DNS Zones with VNet links, avoid host file modifications
- **Firewall Integration**: Centralized threat protection with Azure Firewall
- **Hybrid Connectivity**: ExpressRoute for production, site-to-site VPN as backup or for secondary sites
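
For the DNS Planning item, a private DNS zone with a VNet link might be declared as below. This sketch assumes a hypothetical `vnetId` parameter for the VNet that resolves the zone (typically the hub). Each Private Link service type needs its own zone; this one covers Blob storage. A private endpoint's DNS zone group references the zone ID that the template outputs.

```bicep
param vnetId string   // resource ID of the VNet that should resolve the zone (hypothetical)

// environment().suffixes.storage resolves to 'core.windows.net' in Azure public cloud.
resource blobZone 'Microsoft.Network/privateDnsZones@2020-06-01' = {
  name: 'privatelink.blob.${environment().suffixes.storage}'
  location: 'global'
}

resource hubLink 'Microsoft.Network/privateDnsZones/virtualNetworkLinks@2020-06-01' = {
  parent: blobZone
  name: 'link-hub'
  location: 'global'
  properties: {
    registrationEnabled: false   // private endpoint records come from DNS zone groups, not auto-registration
    virtualNetwork: {
      id: vnetId
    }
  }
}

output blobZoneId string = blobZone.id
```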

### Security Hardening

- **Least Privilege**: RBAC with specific roles, avoid Subscription Owner
- **Managed Identities**: Prefer over Service Principals with secrets
- **Secrets Management**: Key Vault for all secrets; never hard-code them in code, templates, or plain-text app settings
- **Encryption Everywhere**: CMK for sensitive data, TLS 1.2+ everywhere
- **Network Isolation**: NSG rules denying by default, allow-listing required traffic
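
The managed-identity, Key Vault, and least-privilege items combine naturally. A user-assigned identity gets read access to secrets in an RBAC-enabled vault, so no client secret has to be stored. This sketch uses hypothetical names. `4633458b-17de-408a-b874-0445c86b69e6` is the built-in Key Vault Secrets User role. `enablePurgeProtection` cannot be turned off once enabled. The `Deny` firewall default assumes access through a private endpoint or allowed networks.

```bicep
param location string = resourceGroup().location
param keyVaultName string = 'kv-myapp-prod-eus-001'   // hypothetical, globally unique, max 24 chars

// User-assigned identity the workload runs as; no client secret to store or rotate.
resource appIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: 'id-myapp-prod-eus-001'
  location: location
}

resource kv 'Microsoft.KeyVault/vaults@2022-07-01' = {
  name: keyVaultName
  location: location
  properties: {
    tenantId: subscription().tenantId
    sku: {
      family: 'A'
      name: 'standard'
    }
    enableRbacAuthorization: true   // Azure RBAC instead of vault access policies
    accessPolicies: []
    enablePurgeProtection: true     // irreversible once enabled
    networkAcls: {
      defaultAction: 'Deny'
      bypass: 'AzureServices'
    }
  }
}

// Built-in "Key Vault Secrets User" role: read secret contents only.
var secretsUserRoleId = '4633458b-17de-408a-b874-0445c86b69e6'

resource secretsUser 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(kv.id, appIdentity.id, secretsUserRoleId)
  scope: kv
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', secretsUserRoleId)
    principalId: appIdentity.properties.principalId
    principalType: 'ServicePrincipal'
  }
}
```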

### Cost Management

- **Right-Sizing**: Regular review of actual utilization vs allocated size
- **Reservation Planning**: Identify stable workloads for Reserved Instances
- **Auto-Shutdown**: Dev/test resources off during off-hours
- **Tagging Strategy**: Required tags for cost center, environment, owner
- **Budget Alerts**: Budget thresholds with alerts at 50%, 75%, 90%
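
A subscription budget with the 50/75/90% alerts can be sketched as follows. It assumes subscription-scope deployment (`az deployment sub create`). The amount and e-mail address are placeholders. The start date must be the first day of a month, so it defaults to the current month via `utcNow`.

```bicep
targetScope = 'subscription'

param budgetAmount int = 1000                   // monthly amount in the billing currency (placeholder)
param startDate string = utcNow('yyyy-MM-01')   // first day of the current month
param contactEmails array = [
  'finops@example.com'
]

resource monthlyBudget 'Microsoft.Consumption/budgets@2021-10-01' = {
  name: 'budget-monthly'
  properties: {
    category: 'Cost'
    amount: budgetAmount
    timeGrain: 'Monthly'
    timePeriod: {
      startDate: startDate
    }
    // Notification keys are free-form names; thresholds are percentages of the amount.
    notifications: {
      Actual_GreaterThan_50_Percent: {
        enabled: true
        operator: 'GreaterThan'
        threshold: 50
        contactEmails: contactEmails
      }
      Actual_GreaterThan_75_Percent: {
        enabled: true
        operator: 'GreaterThan'
        threshold: 75
        contactEmails: contactEmails
      }
      Actual_GreaterThan_90_Percent: {
        enabled: true
        operator: 'GreaterThan'
        threshold: 90
        contactEmails: contactEmails
      }
    }
  }
}
```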

### Governance and Compliance

- **Policy as Guardrails**: Azure Policy for prevention, not just detection
- **Management Groups**: Hierarchy reflecting organizational structure
- **Template Specs and Deployment Stacks**: Standard, versioned templates for compliant environments (Azure Blueprints is being deprecated)
- **Monitoring Strategy**: Centralized logging to Log Analytics workspace
- **Automation**: Runbooks for routine operational tasks
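
For the centralized logging item, this sketch creates a Log Analytics workspace and routes an existing Key Vault's logs and metrics to it. The vault parameter, names, and 30-day retention are assumptions. Other resource types follow the same `diagnosticSettings` pattern, using their own log categories where `allLogs` is not supported.

```bicep
param location string = resourceGroup().location
param keyVaultName string   // existing vault to monitor (hypothetical parameter)

resource law 'Microsoft.OperationalInsights/workspaces@2022-10-01' = {
  name: 'log-platform-prod-eus-001'
  location: location
  properties: {
    sku: {
      name: 'PerGB2018'
    }
    retentionInDays: 30
  }
}

resource kv 'Microsoft.KeyVault/vaults@2022-07-01' existing = {
  name: keyVaultName
}

// Sends the vault's resource logs and platform metrics to the central workspace.
resource kvDiag 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: 'to-central-law'
  scope: kv
  properties: {
    workspaceId: law.id
    logs: [
      {
        categoryGroup: 'allLogs'
        enabled: true
      }
    ]
    metrics: [
      {
        category: 'AllMetrics'
        enabled: true
      }
    ]
  }
}
```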

Overview

This skill provides expert Microsoft Azure infrastructure guidance focused on Bicep/ARM templates, Enterprise Landing Zones, and the Cloud Adoption Framework (CAF). It helps design, secure, and automate enterprise-grade Azure environments with governance, networking, and infrastructure-as-code best practices. Outcomes include repeatable deployments, compliant landing zones, and reduced operational risk.

How this skill works

I inspect architecture choices, IaC patterns, governance controls, and networking topologies to produce prescriptive guidance and templates. Typical outputs include Bicep modules, deployment commands, management group and policy recommendations, and operational runbooks. I also flag security red flags and provide remediation steps for non-compliant configurations.

When to use it

  • Deploying or refactoring Azure resources with Bicep or ARM templates
  • Designing hub-and-spoke, Virtual WAN, or private-access network topologies
  • Establishing Management Groups, Azure Policy, and CAF-compliant landing zones
  • Migrating workloads to Azure and building infrastructure CI/CD pipelines
  • Hardening Entra ID (Azure AD) RBAC, PIM, and identity-based access controls

Best practices

  • Everything as Code: author reusable Bicep modules and use parameter files per environment
  • Enforce governance via Management Groups + Azure Policy as preventive guardrails
  • Network private-by-default: use Private Endpoints and Private DNS Zones for PaaS
  • Apply least-privilege RBAC, managed identities, and Key Vault for secrets
  • Adopt lifecycle grouping for resource groups and follow CAF naming/tagging standards

Example use cases

  • Create a landing zone hierarchy (MG-Root > MG-Platform > MG-LandingZones) with assigned policies and a hub VNet, Firewall, and VPN gateway
  • Deploy secure Storage Accounts with private endpoints via modular Bicep and CLI deployment scripts
  • Implement a Zero Trust design: private endpoints, Conditional Access, micro-segmentation, and Sentinel monitoring
  • Optimize dev/test costs with auto-shutdown runbooks, reserved capacity, and tagging-driven cleanup

FAQ

Should I use Bicep or ARM templates for new projects?

Use Bicep for new projects; it compiles to ARM templates and offers concise, maintainable syntax. Reserve raw ARM JSON for legacy scenarios.

How do I prevent configuration drift?

Treat all changes as code via PR-based GitOps, enforce policy guardrails, and avoid portal ClickOps for production resources.