cloud-kubernetes-integrator skill

This skill guides deploying Kubernetes workloads to AWS, Azure, and GCP with IAM, ingress, storage, autoscaling, and monitoring integrations.

npx playbooks add skill williamzujkowski/cognitive-toolworks --skill cloud-kubernetes-integrator

SKILL.md
---
name: "Cloud Platform Integrator"
slug: "cloud-kubernetes-integrator"
description: "Integrate Kubernetes workloads with AWS EKS, Azure AKS, and GCP GKE including IAM, ingress controllers, storage classes, and platform-specific features."
capabilities:
  - EKS, AKS, GKE cluster integration configuration
  - Cloud-native ingress controller setup (ALB, App Gateway, Cloud Load Balancing)
  - IAM roles for service accounts (AWS IRSA, Azure Workload Identity, GKE Workload Identity)
  - Cloud storage class integration (EBS, Azure Disk, Persistent Disk)
  - Autoscaling configuration (Cluster Autoscaler, Karpenter)
  - Cloud-specific monitoring and logging integration
inputs:
  - cloud_provider: "aws, azure, gcp (string)"
  - cluster_name: "Kubernetes cluster name (string)"
  - region: "cloud region (string)"
  - ingress_type: "alb, nginx, traefik, app-gateway, cloud-load-balancer (string, optional)"
  - storage_required: "requires persistent storage (boolean, default: false)"
  - autoscaling_enabled: "enable cluster autoscaling (boolean, default: false)"
outputs:
  - platform_config: "cloud-specific Kubernetes configuration"
  - iam_config: "service account to IAM role bindings"
  - ingress_controller: "ingress controller setup and configuration"
  - storage_classes: "cloud-native storage class definitions"
  - monitoring_integration: "CloudWatch/Azure Monitor/Cloud Logging setup"
keywords:
  - eks
  - aks
  - gke
  - cloud-kubernetes
  - irsa
  - workload-identity
  - alb-controller
  - cluster-autoscaler
  - karpenter
  - cloud-native
version: "1.0.0"
owner: "cognitive-toolworks"
license: "MIT"
security: "Public; no secrets or PII; safe for open repositories"
links:
  - https://aws.github.io/aws-eks-best-practices/
  - https://learn.microsoft.com/en-us/azure/aks/best-practices
  - https://cloud.google.com/kubernetes-engine/docs/best-practices
  - https://docs.aws.amazon.com/eks/latest/userguide/
---

## Purpose & When-To-Use

**Trigger conditions:**
- Deploying Kubernetes workloads to managed cloud clusters (EKS, AKS, GKE)
- Integrating cloud IAM with Kubernetes service accounts
- Setting up cloud-native ingress controllers
- Configuring persistent storage with cloud block storage
- Enabling cluster autoscaling for dynamic workloads
- Integrating cloud monitoring and logging services

**Not for:**
- Generic Kubernetes manifest generation (use kubernetes-manifest-generator)
- Helm chart creation (use kubernetes-helm-builder)
- Service mesh configuration (use kubernetes-servicemesh-configurator)
- Serverless deployments (use cloud-serverless-designer)
- Complete orchestration (use cloud-native-orchestrator agent)

---

## Pre-Checks

**Time normalization:**
- Compute `NOW_ET` using NIST/time.gov semantics (America/New_York, ISO-8601): 2025-10-26T01:33:54-04:00
- Use `NOW_ET` for all citation access dates

**Input validation:**
- `cloud_provider` must be: aws, azure, or gcp
- `cluster_name` must be valid for cloud provider naming rules
- `region` must be valid region for selected cloud provider
- `ingress_type` if specified must match cloud provider capabilities

**Source freshness:**
- AWS EKS Best Practices (accessed 2025-10-26T01:33:54-04:00): https://aws.github.io/aws-eks-best-practices/
- Azure AKS Best Practices (accessed 2025-10-26T01:33:54-04:00): https://learn.microsoft.com/en-us/azure/aks/best-practices
- GCP GKE Best Practices (accessed 2025-10-26T01:33:54-04:00): https://cloud.google.com/kubernetes-engine/docs/best-practices

**Decision thresholds:**
- T1 for basic cloud integration (IAM, storage classes)
- T2 for production integration (ingress, autoscaling, monitoring)
- T3 for advanced platform features (multi-AZ HA, advanced autoscaling, disaster recovery)

---

## Procedure

### T1: Basic Cloud Integration (≤2k tokens)

**Step 1: Configure IAM for Kubernetes**
- **AWS**: Create IRSA (IAM Roles for Service Accounts) configuration
- **Azure**: Configure Workload Identity (AAD Pod Identity is deprecated; keep it only on legacy clusters that have not migrated)
- **GCP**: Set up Workload Identity binding
- Generate ServiceAccount with cloud IAM annotation
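
For the GCP and Azure cases, the annotated ServiceAccount could be sketched as follows. The Google service account email, the client ID, and the resource names are placeholders chosen for illustration, not values defined by this skill.

```yaml
# GKE Workload Identity: link the Kubernetes ServiceAccount to a Google service account.
# The email below is a placeholder.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-service-account
  namespace: default
  annotations:
    iam.gke.io/gcp-service-account: app-gsa@my-project.iam.gserviceaccount.com
---
# AKS Workload Identity: reference the client ID of a managed identity or app registration.
# The all-zero GUID is a placeholder.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-service-account
  namespace: default
  annotations:
    azure.workload.identity/client-id: 00000000-0000-0000-0000-000000000000
```

The annotation alone is not sufficient on either platform. On GKE the Google service account also needs an IAM binding that grants `roles/iam.workloadIdentityUser` to the Kubernetes ServiceAccount. On AKS the identity needs a federated credential for the ServiceAccount, and pods must carry the label `azure.workload.identity/use: "true"`.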

**Step 2: Define storage classes**
- **AWS**: Create StorageClass for gp3 EBS volumes with encryption
- **Azure**: Create StorageClass for Azure Disk (Premium_LRS)
- **GCP**: Create StorageClass for Persistent Disk (pd-ssd)
- Add reclaim policy and volume expansion settings
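
A minimal sketch of the Azure and GCP StorageClasses, assuming the managed CSI drivers are enabled on the cluster. The class names are hypothetical.

```yaml
# AKS: Premium SSD managed disk through the Azure Disk CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-premium-ssd   # hypothetical name
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# GKE: SSD persistent disk through the Compute Engine PD CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gce-pd-ssd          # hypothetical name
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```

Neither class sets an `encrypted` parameter because Azure managed disks and GCP persistent disks are encrypted at rest by default. Customer-managed keys require additional, provider-specific parameters that are not shown here.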

**Output:**
- Cloud IAM to Kubernetes ServiceAccount binding config
- StorageClass definitions for persistent storage
- Basic integration validation steps

**Abort conditions:**
- Cloud region not supported by cluster
- IAM permissions insufficient for IRSA/Workload Identity setup

---

### T2: Production Cloud Integration (≤6k tokens)

**All T1 steps plus:**

**Step 1: Deploy cloud-native ingress controller**
- **AWS**: Install AWS Load Balancer Controller for ALB/NLB ingress
- **Azure**: Configure Application Gateway Ingress Controller (AGIC)
- **GCP**: Set up GKE Ingress for Cloud Load Balancing
- Configure ingress annotations for SSL, health checks, routing
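
On AWS, Ingress resources are usually tied to the AWS Load Balancer Controller through an IngressClass. A minimal sketch is shown below, assuming the class is named `alb`. The controller's Helm chart may already create this resource, so check before applying it to avoid duplicates.

```yaml
# IngressClass handled by the AWS Load Balancer Controller.
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: alb                      # assumed class name
spec:
  controller: ingress.k8s.aws/alb
```

An Ingress then selects this class with `spec.ingressClassName: alb`.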

**Step 2: Enable cluster autoscaling**
- **AWS**: Deploy Cluster Autoscaler or Karpenter for node provisioning
- **Azure**: Configure AKS cluster autoscaler with node pools
- **GCP**: Enable GKE cluster autoscaler with min/max node counts
- Set autoscaling policies based on CPU/memory utilization
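
If Karpenter is chosen on AWS, node provisioning is described by a NodePool. The sketch below assumes the Karpenter v1 API and an existing `EC2NodeClass` named `default`; field names differ in earlier Karpenter releases, and the CPU limit is an arbitrary illustrative cap.

```yaml
# Karpenter NodePool sketch (assumes the karpenter.sh/v1 API).
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default            # assumed to exist already
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  limits:
    cpu: "100"                   # illustrative upper bound on total provisioned vCPUs
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
```

The `limits` block plays the role of a maximum cluster size, which Cluster Autoscaler expresses as min/max node counts per node group.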

**Step 3: Integrate cloud monitoring**
- **AWS**: Configure Container Insights with CloudWatch
- **Azure**: Enable Azure Monitor for containers
- **GCP**: Set up Cloud Logging and Cloud Monitoring
- Add log aggregation and metrics collection configs

**Step 4: Configure container registry integration**
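- **AWS**: Grant the node IAM role ECR pull permissions (image pulls use node credentials rather than IRSA); use pull secrets for cross-account or external registries
- **Azure**: Configure ACR integration with AKS
- **GCP**: Grant the node service account read access to Artifact Registry (image pulls use node credentials rather than Workload Identity)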
- Add imagePullSecrets to ServiceAccount
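
When a pull secret is needed, it can be attached to the ServiceAccount so every pod that uses the account inherits it. This sketch assumes a `kubernetes.io/dockerconfigjson` Secret named `regcred` already exists in the same namespace; the name is hypothetical.

```yaml
# ServiceAccount that passes a registry pull secret to its pods.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-service-account
  namespace: default
imagePullSecrets:
  - name: regcred   # hypothetical Secret of type kubernetes.io/dockerconfigjson
```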

**Step 5: Network policy and security**
- Configure cloud-specific network policies
- Set up VPC/VNet integration for private clusters
- Add security group rules for ingress/egress
- Enforce Pod Security Standards through Pod Security Admission (PodSecurityPolicy was removed in Kubernetes 1.25)
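
A minimal sketch of the Kubernetes-side controls, assuming a namespace named `app`: namespace labels that apply the `restricted` Pod Security Standard, plus a default-deny ingress NetworkPolicy. Cloud-side rules such as security groups are configured outside the cluster and are not shown.

```yaml
# Namespace that enforces the "restricted" Pod Security Standard.
apiVersion: v1
kind: Namespace
metadata:
  name: app                      # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
---
# Deny all inbound traffic to pods in the namespace unless another policy allows it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: app
spec:
  podSelector: {}                # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
```

NetworkPolicy objects only take effect when the cluster's network plugin enforces them, so confirm that policy enforcement is enabled on the chosen platform.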

**Output:**
- Complete cloud-native ingress setup
- Cluster autoscaling configuration
- Monitoring and logging integration
- Container registry authentication
- Network and security configurations

**Abort conditions:**
- Ingress controller conflicts with existing setup
- Insufficient cloud quotas for autoscaling
- Network policy conflicts with cloud VPC rules

---

### T3: Advanced Cloud Platform Features (≤12k tokens)

**All T1 + T2 steps plus:**

**Step 1: Multi-AZ and high availability**
- Configure node pools across availability zones
- Set topology spread constraints for pod distribution
- Add pod disruption budgets for maintenance
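
The zone spreading and disruption budget could be sketched as below. The application name, image, and replica counts are placeholders.

```yaml
# Deployment whose replicas are spread evenly across availability zones.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                                # zones may differ by at most one pod
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: app
      containers:
        - name: app
          image: registry.example.com/app:1.0.0     # placeholder image
---
# Keep at least two replicas running during voluntary disruptions such as node drains.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: app
```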

**Step 2: Advanced autoscaling**
- Configure custom metrics autoscaling (KEDA)
- Set up predictive autoscaling based on schedules
- Add spot/preemptible instance integration
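
Schedule-based scaling can be approximated with KEDA's cron scaler. The sketch below assumes KEDA is installed and that a Deployment named `app` exists; the schedule and replica counts are illustrative.

```yaml
# KEDA ScaledObject that raises the replica count during business hours.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: app-business-hours
  namespace: default
spec:
  scaleTargetRef:
    name: app                    # hypothetical Deployment in the same namespace
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
    - type: cron
      metadata:
        timezone: America/New_York
        start: 0 8 * * 1-5       # scale up at 08:00, Monday to Friday
        end: 0 18 * * 1-5        # scale back at 18:00
        desiredReplicas: "6"
```

Additional triggers, such as queue length or Prometheus metrics, can be appended to the same `triggers` list to cover the custom-metrics case.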

**Step 3: Disaster recovery and backup**
- Configure Velero with cloud storage backend
- Set up cross-region cluster federation
- Add automated backup schedules
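
An automated backup schedule in Velero might look like this sketch. It assumes Velero is installed in the `velero` namespace with a backup storage location already configured; the namespace being backed up and the retention period are placeholders.

```yaml
# Velero Schedule: nightly backup of one namespace, retained for 30 days.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-app-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"          # every day at 02:00
  template:
    includedNamespaces:
      - app                      # hypothetical namespace
    ttl: 720h0m0s                # 30 days
```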

**Output:**
- Multi-AZ HA configuration
- Advanced autoscaling with custom metrics
- Disaster recovery and backup setup

---

## Decision Rules

**Cloud provider-specific features:**
- **AWS EKS**: IRSA for IAM, ALB Controller, Karpenter for autoscaling, EBS CSI driver
- **Azure AKS**: Workload Identity, AGIC, Virtual nodes, Azure Monitor
- **GCP GKE**: Workload Identity, GKE Ingress, Autopilot mode, Cloud Logging native

**Ingress controller selection:**
- **AWS ALB**: Native AWS integration, Layer 7 load balancing, WAF integration
- **Azure App Gateway**: Azure-native, WAF, SSL offload
- **GCP GKE Ingress**: Cloud Load Balancing, global load balancing, CDN integration
- **NGINX/Traefik**: Cloud-agnostic, advanced routing, middleware support

**Storage class types:**
- **AWS**: gp3 (general purpose SSD), io2 (high IOPS), efs (shared filesystem)
- **Azure**: Premium_LRS (SSD), Standard_LRS (HDD), Azure Files (shared)
- **GCP**: pd-ssd (SSD), pd-standard (HDD), Filestore (shared NFS)

**Autoscaling strategy:**
- **Cluster Autoscaler**: Standard, multi-cloud compatible
- **Karpenter (AWS)**: Fast, bin-packing optimization, spot instances
- **GKE Autopilot**: Fully managed, pay-per-pod
- **AKS Virtual Nodes**: Serverless node pool with ACI

**Ambiguity handling:**
- If ingress_type not specified → use cloud-native option (ALB, AGIC, GKE Ingress)
- If storage_required unclear → ask about stateful application needs
- If autoscaling_enabled unclear → recommend based on workload variability

---

## Output Contract

**Required fields (all tiers):**
```yaml
iam_config:
  platform: "aws-irsa | azure-workload-identity | gcp-workload-identity"
  service_account:
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: "app-sa"
      annotations:
        cloud_annotation: "arn:aws:iam::xxx | azure_client_id | gcp_sa_email"
  iam_policy: "cloud IAM policy or role definition"

storage_classes:
  - name: "cloud-storage"
    provisioner: "cloud-specific CSI driver"
    parameters:
      type: "gp3 | Premium_LRS | pd-ssd"
      encrypted: "true"
    reclaimPolicy: "Retain | Delete"
    allowVolumeExpansion: true
```

**Additional T2 fields:**
```yaml
ingress_controller:
  type: "alb | app-gateway | gke-ingress | nginx"
  installation: "Helm chart or manifest YAML"
  configuration: "controller-specific settings"
  ingress_class: "IngressClass resource YAML"

autoscaling:
  type: "cluster-autoscaler | karpenter | aks-autoscaler | gke-autoscaler"
  configuration: "autoscaler deployment or managed config"
  scaling_policies:
    min_nodes: integer
    max_nodes: integer
    target_cpu_utilization: integer

monitoring:
  platform: "cloudwatch | azure-monitor | cloud-logging"
  configuration: "monitoring agent DaemonSet or managed config"
  log_aggregation: "FluentBit/Fluentd configuration"
  metrics_collection: "Prometheus scraping or cloud metrics"

registry_auth:
  method: "irsa | workload-identity | image-pull-secret"
  configuration: "registry authentication setup"
```

**Additional T3 fields:**
```yaml
high_availability:
  multi_az: boolean
  topology_spread_constraints: "pod topology config"
  pod_disruption_budgets: "PDB YAML"

advanced_autoscaling:
  keda_scalers: ["custom metric scalers"]
  predictive_scaling: "schedule-based scaling rules"
  spot_instances:
    enabled: boolean
    fallback_to_on_demand: boolean

disaster_recovery:
  velero_config: "Velero installation with cloud storage"
  backup_schedule: "cron schedule for backups"
  cross_region_replication: boolean
```

---

## Examples

```yaml
# T1 Example: AWS IRSA ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-service-account
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/app-role
```

```yaml
# T1 Example: AWS EBS StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  iops: "3000"
  throughput: "125"
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```

```yaml
# T2 Example: AWS ALB Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:...
spec:
  # ingressClassName replaces the deprecated kubernetes.io/ingress.class annotation
  ingressClassName: alb
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-service
            port:
              number: 80
```

---

## Quality Gates

**Token budgets (enforced):**
- **T1**: ≤2,000 tokens - basic IAM and storage integration
- **T2**: ≤6,000 tokens - ingress, autoscaling, monitoring, registry auth
- **T3**: ≤12,000 tokens - HA, advanced autoscaling, disaster recovery

**Safety checks:**
- IAM policies follow least-privilege principle
- Storage encryption enabled by default
- Ingress configured with HTTPS/TLS (production)
- Network policies restrict unnecessary traffic

**Auditability:**
- All cloud resources cite official cloud provider documentation
- IAM role ARNs/IDs explicitly specified
- Storage classes specify encryption and reclaim policies
- Autoscaling policies include min/max node constraints

**Determinism:**
- Same inputs produce identical cloud integration configs
- Storage class parameters are explicit (not cloud defaults)
- IAM annotations use consistent format

**Validation requirements:**
- Cloud IAM configs validate against cloud provider schemas
- StorageClass manifests validate with kubectl
- Ingress resources validate against Kubernetes API
- T2+ configs include cost estimate for cloud resources

---

## Resources

**Official Documentation (accessed 2025-10-26T01:33:54-04:00):**
- AWS EKS User Guide: https://docs.aws.amazon.com/eks/latest/userguide/
- AWS EKS Best Practices: https://aws.github.io/aws-eks-best-practices/
- AWS Load Balancer Controller: https://kubernetes-sigs.github.io/aws-load-balancer-controller/
- Azure AKS Documentation: https://learn.microsoft.com/en-us/azure/aks/
- Azure AKS Best Practices: https://learn.microsoft.com/en-us/azure/aks/best-practices
- GCP GKE Documentation: https://cloud.google.com/kubernetes-engine/docs
- GCP GKE Best Practices: https://cloud.google.com/kubernetes-engine/docs/best-practices

**IAM and Identity:**
- AWS IRSA: https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
- Azure Workload Identity: https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview
- GCP Workload Identity: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity

**Storage and Networking:**
- AWS EBS CSI Driver: https://github.com/kubernetes-sigs/aws-ebs-csi-driver
- Azure Disk CSI Driver: https://github.com/kubernetes-sigs/azuredisk-csi-driver
- GCP Persistent Disk CSI Driver: https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver

**Autoscaling:**
- Cluster Autoscaler: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
- Karpenter: https://karpenter.sh/
- AKS Cluster Autoscaler: https://learn.microsoft.com/en-us/azure/aks/cluster-autoscaler
- GKE Cluster Autoscaler: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler

Overview

This skill integrates Kubernetes workloads with managed cloud platforms (AWS EKS, Azure AKS, GCP GKE), handling IAM, ingress, storage classes, autoscaling, and platform-specific features. It produces concrete Kubernetes manifests and cloud-binding configs for service accounts, storage, ingress controllers, autoscalers, monitoring, and registry authentication. The outputs are validation-ready YAML and a checklist for basic, production, and advanced integration tiers.

How this skill works

Given cloud_provider, cluster_name, region, and optional ingress_type or autoscaling preferences, the skill validates inputs and selects cloud-native defaults. It generates IAM-to-Kubernetes ServiceAccount bindings, storage class manifests, ingress controller installation/configuration, autoscaler definitions, monitoring integrations, and registry auth. Tiers control depth: T1 covers IAM and storage; T2 adds ingress, autoscaling, monitoring and registry auth; T3 extends to HA, advanced autoscaling and disaster recovery.

When to use it

  • Deploying stateful or stateless apps to managed clusters (EKS/AKS/GKE) requiring cloud IAM integration
  • Setting up cloud-native ingress with SSL, health checks, and routing
  • Configuring persistent storage with encryption and reclaim policies
  • Enabling cluster autoscaling and cluster-level scaling policies
  • Integrating cloud monitoring, logging, and container registry authentication

Best practices

  • Prefer cloud-native identity (IRSA / Workload Identity) over static credentials for least-privilege access
  • Use encrypted storage classes with explicit reclaimPolicy and allowVolumeExpansion set
  • Select the cloud-native ingress controller by default unless advanced routing requires NGINX/Traefik
  • Define min/max node counts and target utilization for autoscalers; include cost estimates for T2+
  • Apply network policies, private VPC/VNet integration, and pod security standards for production clusters

Example use cases

  • T1: Create an AWS IRSA-bound ServiceAccount and a gp3 EBS StorageClass for a stateful app
  • T2: Install AWS Load Balancer Controller, enable Cluster Autoscaler, and configure CloudWatch Container Insights
  • T2: Configure AKS with Workload Identity, Application Gateway Ingress Controller, and ACR integration
  • T3: Deploy multi-AZ node pools, KEDA-based custom-metric autoscaling, and Velero backups to cloud storage
  • Migration: Convert on-prem Kubernetes manifests to cloud-integrated manifests with registry auth and network rules

FAQ

What inputs are required to generate configs?

Provide cloud_provider (aws|azure|gcp), cluster_name, region, and optional ingress_type or storage requirements; the skill validates naming and region support.

Which tier should I choose for production workloads?

Use T2 for production-ready ingress, autoscaling, monitoring, and registry auth; choose T3 for HA, advanced autoscaling, and disaster recovery.