
This skill guides end-to-end cloud deployment to AKS/GKE/DOKS, including CI/CD, ingress, SSL, and Next.js build-time variable handling.

npx playbooks add skill mjunaidca/mjs-agent-skills --skill cloud-deploy-blueprint


---
name: cloud-deploy-blueprint
description: End-to-end cloud deployment skill for Kubernetes (AKS/GKE/DOKS) with CI/CD pipelines. Covers managed services integration (Neon, Upstash), ingress configuration, SSL certificates, GitHub Actions workflows with selective builds, and Next.js build-time vs runtime environment handling. Battle-tested learnings from a 9-hour deployment session.
version: 1.0.0
---

# Cloud Deploy Blueprint

## Overview

This skill captures the complete knowledge for deploying a multi-service application to cloud Kubernetes, based on battle-tested learnings from deploying TaskFlow (5 microservices) to Azure AKS.

## When to Use

- Deploying to AKS, GKE, or DOKS
- Setting up CI/CD with GitHub Actions
- Integrating managed services (Neon PostgreSQL, Upstash Redis)
- Configuring ingress with SSL certificates
- Handling Next.js `NEXT_PUBLIC_*` variables in Docker/K8s

## Architecture Pattern

```
                         INTERNET
                             │
                             ▼
                    ┌─────────────────┐
                    │  Load Balancer  │  (Single Public IP)
                    └────────┬────────┘
                             │
                    ┌────────▼────────┐
                    │ Ingress (Traefik│  Routes by subdomain
                    │   or nginx)     │
                    └────────┬────────┘
                             │
        ┌────────────────────┼────────────────────┐
        │                    │                    │
        ▼                    ▼                    ▼
  ┌──────────┐        ┌──────────┐        ┌──────────┐
  │   Web    │        │   SSO    │        │   MCP    │
  │ (PUBLIC) │        │ (PUBLIC) │        │ (PUBLIC) │
  └────┬─────┘        └────┬─────┘        └────┬─────┘
       │                   │                   │
       │              ┌────▼─────┐             │
       └──────────────►   API    ◄─────────────┘
                      │(INTERNAL)│
                      └────┬─────┘
                           │
              ┌────────────┴────────────┐
              ▼                         ▼
      ┌─────────────┐           ┌─────────────┐
      │    Neon     │           │   Upstash   │
      │ (Postgres)  │           │   (Redis)   │
      │  EXTERNAL   │           │  EXTERNAL   │
      └─────────────┘           └─────────────┘
```

## Critical Concept: Build-Time vs Runtime Variables

### The Problem

Next.js `NEXT_PUBLIC_*` variables are **embedded at build time**, not runtime. This means:

```dockerfile
# WRONG: Setting NEXT_PUBLIC_* at runtime does NOTHING
ENV NEXT_PUBLIC_API_URL=https://api.example.com

# RIGHT: Must be set as build ARG
ARG NEXT_PUBLIC_API_URL=https://api.example.com
ENV NEXT_PUBLIC_API_URL=$NEXT_PUBLIC_API_URL
```

### The Solution

1. **In Dockerfile**: Use ARG for NEXT_PUBLIC_* variables
2. **In CI/CD**: Pass --build-arg with domain-specific values
3. **In values.yaml**: Do not expect to set these here; they are NOT runtime configurable

### Build-Time Variables (Next.js)

| Service | Variable | Purpose |
|---------|----------|---------|
| Web | `NEXT_PUBLIC_SSO_URL` | SSO endpoint for browser OAuth |
| Web | `NEXT_PUBLIC_API_URL` | API endpoint for browser fetch |
| Web | `NEXT_PUBLIC_APP_URL` | App URL for redirects |
| SSO | `NEXT_PUBLIC_BETTER_AUTH_URL` | Better Auth URL for browser |
| SSO | `NEXT_PUBLIC_CONTINUE_URL` | Redirect after email verify |

### Runtime Variables (ConfigMaps/Secrets)

| Service | Variable | Source |
|---------|----------|--------|
| SSO | `DATABASE_URL` | Secret (Neon) |
| SSO | `BETTER_AUTH_SECRET` | Secret |
| API | `SSO_URL` | ConfigMap (internal K8s URL) |
| MCP | `TASKFLOW_SSO_URL` | ConfigMap (internal K8s URL) |
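
At deploy time these runtime values reach the containers through ordinary `valueFrom` references in the pod spec. A minimal sketch, assuming illustrative Secret and ConfigMap names (the chart's actual resource names may differ):

```yaml
# Container env fragment from a Deployment spec (names are illustrative)
env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: sso-platform-secrets   # hypothetical Secret created by Helm from --set values
        key: database-url
  - name: SSO_URL
    valueFrom:
      configMapKeyRef:
        name: taskflow-config        # hypothetical ConfigMap holding internal K8s URLs
        key: sso-url
```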

## Internal K8s Service Names

Services communicate via K8s service names, NOT public URLs:

```yaml
# CORRECT - Internal communication
SSO_URL: http://sso-platform:3001
API_URL: http://taskflow-api:8000

# WRONG - Don't use public URLs for internal traffic
SSO_URL: https://sso.example.com
```
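
The hostname in these URLs is simply the Kubernetes Service's `metadata.name`, resolved by cluster DNS within the namespace (or `sso-platform.taskflow.svc.cluster.local` from other namespaces). A minimal sketch of the Service behind `http://sso-platform:3001`, with the selector label assumed:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sso-platform        # this name is the internal hostname
  namespace: taskflow
spec:
  selector:
    app: sso-platform       # assumed pod label
  ports:
    - port: 3001            # the port in http://sso-platform:3001
      targetPort: 3001
```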

## GitHub Actions CI/CD Pattern

### Selective Builds with Path Filters

```yaml
jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      api: ${{ steps.filter.outputs.api }}
      web: ${{ steps.filter.outputs.web }}
    steps:
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            api:
              - 'apps/api/**'
            web:
              - 'apps/web/**'

  build-api:
    needs: changes
    if: needs.changes.outputs.api == 'true' || github.event_name == 'workflow_dispatch'
```

### Next.js Build Args Pattern

```yaml
- name: Build and push (web)
  uses: docker/build-push-action@v5
  with:
    build-args: |
      NEXT_PUBLIC_SSO_URL=https://sso.${{ vars.DOMAIN }}
      NEXT_PUBLIC_API_URL=https://api.${{ vars.DOMAIN }}
      NEXT_PUBLIC_APP_URL=https://${{ vars.DOMAIN }}
```

## GitHub Secrets & Variables

### Secrets (Sensitive)

```
NEON_SSO_DATABASE_URL
NEON_API_DATABASE_URL
NEON_CHATKIT_DATABASE_URL
NEON_NOTIFICATION_DATABASE_URL
UPSTASH_REDIS_HOST
UPSTASH_REDIS_PASSWORD
REDIS_URL
REDIS_TOKEN
BETTER_AUTH_SECRET
OPENAI_API_KEY
SMTP_USER
SMTP_PASSWORD
AZURE_CREDENTIALS (or GCP_CREDENTIALS)
```

### Variables (Non-sensitive)

```
DOMAIN=example.com
CLOUD_PROVIDER=azure
AZURE_RESOURCE_GROUP=myapp-rg
AZURE_CLUSTER_NAME=myapp-cluster
INGRESS_CLASS=traefik
```
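
In workflow files these are referenced as `${{ secrets.NAME }}` and `${{ vars.NAME }}` respectively. A sketch of how the Azure credentials and cluster variables above could be consumed to authenticate against AKS (action versions are assumptions, not prescribed by this skill):

```yaml
- uses: azure/login@v2
  with:
    creds: ${{ secrets.AZURE_CREDENTIALS }}          # service principal JSON
- uses: azure/aks-set-context@v3
  with:
    resource-group: ${{ vars.AZURE_RESOURCE_GROUP }}
    cluster-name: ${{ vars.AZURE_CLUSTER_NAME }}
```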

## Helm Values Pattern

### values-cloud.yaml (Committed, Non-sensitive defaults)

```yaml
global:
  domain: ""  # Set via --set
  namespace: taskflow
  imagePullPolicy: Always

managedServices:
  neon:
    enabled: true
    # Connection strings injected via --set from secrets
  upstash:
    enabled: true
    # Credentials injected via --set from secrets

sso:
  enabled: true
  name: sso-platform
  postgresql:
    enabled: false  # Using Neon
  env:
    NODE_ENV: production
    BETTER_AUTH_URL: ""  # Set via --set
```

### Helm --set Pattern

```bash
helm upgrade --install taskflow ./infrastructure/helm/taskflow \
  --values values-cloud.yaml \
  --set global.imageRegistry="ghcr.io/owner/repo" \
  --set global.imageTag="${{ github.sha }}" \
  --set "managedServices.neon.ssoDatabase=${{ secrets.NEON_SSO_DATABASE_URL }}" \
  --set "sso.env.BETTER_AUTH_SECRET=${{ secrets.BETTER_AUTH_SECRET }}"
```
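
Inside the chart, values injected with `--set` typically land in a templated Secret so nothing sensitive lives in committed files. A minimal sketch of such a template; the resource name and keys are assumptions, not the chart's actual templates:

```yaml
# templates/sso-secrets.yaml (illustrative)
apiVersion: v1
kind: Secret
metadata:
  name: sso-platform-secrets
  namespace: {{ .Values.global.namespace }}
type: Opaque
stringData:
  database-url: {{ .Values.managedServices.neon.ssoDatabase | quote }}
  better-auth-secret: {{ .Values.sso.env.BETTER_AUTH_SECRET | quote }}
```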

## CRITICAL: CPU Architecture Check

**BEFORE ANY DEPLOYMENT**, check your cluster's node architecture:

```bash
kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.architecture}'
```

- `amd64` → Use `platforms: linux/amd64`
- `arm64` → Use `platforms: linux/arm64`

**ARM64 is increasingly common** (Azure, AWS Graviton, Apple Silicon dev). Don't assume amd64!

### Docker Build for Correct Architecture

```yaml
- uses: docker/build-push-action@v5
  with:
    platforms: linux/arm64      # MATCH YOUR CLUSTER!
    provenance: false           # Avoid manifest issues
    no-cache: true              # When debugging
```

**Why `provenance: false`?**
Buildx attestation creates complex manifest lists that can cause "no match for platform" errors. Disabling it produces simple, reliable single-platform images.
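
One related assumption worth stating: if the runner's own architecture differs from the target platform (for example, building `linux/arm64` on a stock `ubuntu-latest` amd64 runner), QEMU emulation and Buildx must be configured before the build step. A sketch, with action versions assumed:

```yaml
- uses: docker/setup-qemu-action@v3    # enables cross-architecture emulation
- uses: docker/setup-buildx-action@v3  # provides the buildx builder used by build-push-action
```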

## Common Gotchas (Battle-Tested)

### 1. Logout Redirect to 0.0.0.0

**Problem:** In K8s, `request.url` resolves to the container bind address (`0.0.0.0`), not the public domain
**Solution:** Use the `NEXT_PUBLIC_APP_URL` env var for redirects

```typescript
// WRONG
const response = NextResponse.redirect(new URL("/", request.url));

// RIGHT
const APP_URL = process.env.NEXT_PUBLIC_APP_URL || "http://localhost:3000";
const response = NextResponse.redirect(new URL("/", APP_URL));
```

### 2. Email Verification Redirect to localhost

**Problem:** Missing `NEXT_PUBLIC_CONTINUE_URL` in SSO Dockerfile
**Solution:** Add to Dockerfile and CD pipeline:

```dockerfile
ARG NEXT_PUBLIC_CONTINUE_URL=http://localhost:3000
ENV NEXT_PUBLIC_CONTINUE_URL=$NEXT_PUBLIC_CONTINUE_URL
```

### 3. Browser Making Requests to localhost

**Problem:** `NEXT_PUBLIC_*` not passed as build arg
**Solution:** Check ALL `NEXT_PUBLIC_*` variables systematically:

```bash
grep -r "NEXT_PUBLIC_" apps/web/src --include="*.ts" --include="*.tsx" | \
  grep -oE "NEXT_PUBLIC_[A-Z_]+" | sort -u
```

### 4. Hardcoded Sensitive Data

**Problem:** Emails and passwords hardcoded in committed values files
**Solution:** Use `--set` from GitHub Secrets for ALL sensitive data
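
As a sketch (key name illustrative), the committed values file should carry only an empty placeholder, with the real value supplied at deploy time:

```yaml
# values-cloud.yaml (committed)
sso:
  env:
    SMTP_PASSWORD: ""   # injected in CD via --set "sso.env.SMTP_PASSWORD=${{ secrets.SMTP_PASSWORD }}"
```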

### 5. Missing Database Sections in values.yaml

**Problem:** Helm templates expect `database.host`, `postgresql.name` etc.
**Solution:** Include empty/default sections even for managed services:

```yaml
postgresql:
  enabled: false
  name: sso-platform-postgres

database:
  host: ""
  port: "5432"
  name: taskflow_sso
  user: postgres
```

## SSL Certificate Pattern (cert-manager)

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: traefik
```

## Ingress Annotations for TLS

```yaml
annotations:
  cert-manager.io/cluster-issuer: letsencrypt-prod
  traefik.ingress.kubernetes.io/router.tls: "true"
```
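
For context, this is how those annotations might sit on a complete Ingress resource; the host, service name, and port below are illustrative, not the chart's actual values:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: taskflow-web
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    traefik.ingress.kubernetes.io/router.tls: "true"
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - example.com
      secretName: taskflow-web-tls    # cert-manager writes the issued certificate here
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: taskflow-web
                port:
                  number: 3000
```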

## Pre-Deployment Checklist

### Code Changes
- [ ] All `NEXT_PUBLIC_*` vars documented and in Dockerfiles
- [ ] Redirect URLs use env vars, not `request.url`
- [ ] No hardcoded localhost in production code paths

### Dockerfiles
- [ ] All `NEXT_PUBLIC_*` as ARG and ENV
- [ ] Multi-stage build for slim production image
- [ ] Health check endpoint configured

### CI/CD Pipeline
- [ ] Build args for Next.js apps
- [ ] Path filters for selective builds
- [ ] All secrets listed and documented
- [ ] Helm --set for all sensitive values

### Helm Chart
- [ ] values-cloud.yaml has all required sections
- [ ] No sensitive data in committed files
- [ ] Internal service names for inter-service communication
- [ ] Ingress configured with correct class

### GitHub Setup
- [ ] All secrets created in repository settings
- [ ] All variables created in repository settings
- [ ] Azure/GCP credentials configured

## Related Skills

- `aks-deployment-troubleshooter` - Debug ImagePullBackOff, CrashLoopBackOff, architecture issues
- `containerize-apps` - Dockerization patterns
- `helm-charts` - Helm chart structure
- `kubernetes-essentials` - K8s fundamentals
- `better-auth-sso` - SSO integration

## Related Agents

- `impact-analyzer-agent` - Pre-containerization analysis

## Overview

This skill captures a battle-tested, end-to-end blueprint for deploying multi-service applications to Kubernetes (AKS, GKE, DOKS) with CI/CD. It bundles patterns for managed service integration, ingress and TLS, GitHub Actions selective builds, and correct Next.js build-time vs runtime handling. The guidance is distilled from a 9-hour real-world deployment and focuses on reliable, repeatable outcomes.

## How this skill works

The skill documents an opinionated architecture: a single public IP and load balancer, an ingress controller (Traefik or nginx) routing to public and internal services, and external managed services like Neon Postgres and Upstash Redis. It prescribes Dockerfile, Helm, and GitHub Actions patterns to pass Next.js NEXT_PUBLIC_* build args, inject runtime secrets via Helm --set from GitHub Secrets, and ensure images match cluster CPU architecture. It also includes cert-manager TLS setup, ingress annotations, and a pre-deployment checklist.

## When to use it

- Deploying multi-service apps to AKS, GKE, or DOKS with Kubernetes and Helm
- Setting up GitHub Actions CI/CD with selective builds and build-arg injection
- Integrating managed services such as Neon (Postgres) and Upstash (Redis)
- Configuring ingress, cert-manager TLS, and production-grade routing
- Ensuring Next.js build-time variables are embedded correctly and runtime secrets are secured

## Best practices

- Treat NEXT_PUBLIC_* variables as build-time: use ARG in the Dockerfile and pass --build-arg in CI
- Keep sensitive values out of committed values.yaml and inject them via Helm --set from GitHub Secrets
- Always check cluster node architecture and build images for matching platforms (amd64/arm64)
- Use internal Kubernetes service names for inter-service communication, not public URLs
- Use path filters in GitHub Actions to trigger only affected service builds and save CI time

## Example use cases

- Deploy a Next.js web, SSO, API, and worker set to Azure AKS with external Neon/Postgres and Upstash/Redis
- Create a CI pipeline that only rebuilds changed services and passes domain-specific NEXT_PUBLIC_* build args
- Configure Traefik ingress with cert-manager for Let's Encrypt TLS and per-subdomain routing
- Debug platform mismatches by verifying node architecture and building for linux/arm64 or linux/amd64
- Migrate database credentials to Neon and inject connection strings as Helm secrets at deploy time

## FAQ

**Why do NEXT_PUBLIC_* variables need build-time arguments?**

Next.js embeds NEXT_PUBLIC_* into the compiled client at build time, so setting them as runtime ENV does not change the built assets. Use ARG in the Dockerfile and pass --build-arg during the image build.

**How do I avoid image platform mismatches?**

Run kubectl to check node architectures and set docker buildx platforms to match (linux/amd64 or linux/arm64). Disable provenance in the build action to avoid complex manifest lists.