deployment-pipeline-design skill

This skill helps you design multi-stage CI/CD pipelines with approval gates, deployment strategies, and GitOps patterns to improve safety and speed.

This is most likely a fork of the deployment-pipeline-design skill from nilecui
npx playbooks add skill derklinke/codex-config --skill deployment-pipeline-design

---
name: deployment-pipeline-design
description: Design multi-stage CI/CD pipelines with approval gates, security checks, and deployment orchestration. Use when architecting deployment workflows, setting up continuous delivery, or implementing GitOps practices.
---

# Deployment Pipeline Design

Architecture patterns for multi-stage CI/CD pipelines with approval gates and deployment strategies.

## Purpose

Design robust, secure deployment pipelines that balance speed with safety through proper stage organization and approval workflows.

## When to Use

- Design CI/CD architecture
- Implement deployment gates
- Configure multi-environment pipelines
- Establish deployment best practices
- Implement progressive delivery

## Pipeline Stages

### Standard Pipeline Flow

```
┌─────────┐   ┌──────┐   ┌─────────┐   ┌────────┐   ┌──────────┐
│  Build  │ → │ Test │ → │ Staging │ → │ Approve│ → │Production│
└─────────┘   └──────┘   └─────────┘   └────────┘   └──────────┘
```

### Detailed Stage Breakdown

1. **Source** - Code checkout
2. **Build** - Compile, package, containerize
3. **Test** - Unit, integration, security scans
4. **Staging Deploy** - Deploy to staging environment
5. **Integration Tests** - E2E, smoke tests
6. **Approval Gate** - Manual approval required
7. **Production Deploy** - Canary, blue-green, rolling
8. **Verification** - Health checks, monitoring
9. **Rollback** - Automated rollback on failure
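The stage order above can be sketched as a small runner that stops at the first failing stage and rolls back only once production has been touched. This is a minimal illustration, not any CI system's API: the `Stage` class, the `touches_production` flag, and the lambda actions are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    action: Callable[[], bool]        # returns True on success
    touches_production: bool = False  # hypothetical marker for deploy stages


def run_pipeline(stages: List[Stage], rollback: Callable[[], None]) -> List[str]:
    """Run stages in order; stop at the first failure.

    Rollback is invoked only if a production-touching stage has already started.
    """
    log: List[str] = []
    production_touched = False
    for stage in stages:
        production_touched = production_touched or stage.touches_production
        if stage.action():
            log.append(f"{stage.name}: ok")
            continue
        log.append(f"{stage.name}: failed")
        if production_touched:
            rollback()
            log.append("rollback: done")
        break
    return log


# Example: verification fails after the production deploy, so rollback runs.
example = [
    Stage("build", lambda: True),
    Stage("test", lambda: True),
    Stage("production-deploy", lambda: True, touches_production=True),
    Stage("verification", lambda: False),
]
print(run_pipeline(example, rollback=lambda: None))
```

A failure in an early stage such as `build` ends the run without calling `rollback`, since nothing has been deployed yet.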

## Approval Gate Patterns

### Pattern 1: Manual Approval

```yaml
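# GitHub Actions
# The job waits for approval only if the "production" environment has
# required reviewers configured in the repository's environment settings.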
production-deploy:
  needs: staging-deploy
  environment:
    name: production
    url: https://app.example.com
  runs-on: ubuntu-latest
  steps:
    - name: Deploy to production
      run: |
        # Deployment commands
```

### Pattern 2: Time-Based Approval

```yaml
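# GitLab CI
# A delayed job starts automatically once the timer expires; the window
# gives reviewers time to cancel it, rather than requiring an explicit approval.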
deploy:production:
  stage: deploy
  script:
    - deploy.sh production
  environment:
    name: production
  when: delayed
  start_in: 30 minutes
  only:
    - main
```

### Pattern 3: Multi-Approver

```yaml
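# Azure Pipelines
stages:
  - stage: Production
    dependsOn: Staging
    jobs:
      # ManualValidation is supported only in agentless (server) jobs
      - job: WaitForApproval
        pool: server
        steps:
          - task: ManualValidation@0
            inputs:
              notifyUsers: "[email protected]"
              instructions: "Review staging metrics before approving"
              onTimeout: reject
      - deployment: Deploy
        dependsOn: WaitForApproval
        environment:
          name: production
          resourceType: Kubernetes
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "Deployment commands"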
```

**Reference:** See `assets/approval-gate-template.yml`
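The decision logic behind the time-based and multi-approver patterns can be sketched independently of any CI system. This is an illustrative sketch under assumptions: the function names, the rule that an author cannot approve their own change, and the reviewer names are hypothetical and not taken from GitLab or Azure Pipelines.

```python
from datetime import datetime, timedelta
from typing import Iterable, Set


def quorum_reached(approvals: Iterable[str], eligible: Set[str],
                   required: int, author: str) -> bool:
    """True when enough distinct eligible reviewers, excluding the author, approved."""
    valid = {name for name in approvals if name in eligible and name != author}
    return len(valid) >= required


def delay_elapsed(queued_at: datetime, now: datetime, delay: timedelta) -> bool:
    """True once the waiting window of a delayed deployment has passed."""
    return now - queued_at >= delay


reviewers = {"alice", "bob", "carol"}  # hypothetical reviewer group
# Duplicate approvals from the same person count once.
print(quorum_reached(["alice", "alice", "bob"], reviewers, required=2, author="carol"))
```

Counting distinct approvers through a set keeps a single reviewer from satisfying a two-person rule by approving twice.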

## Deployment Strategies

### 1. Rolling Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 1
```

**Characteristics:**

- Gradual rollout
- Zero downtime, provided readiness probes keep traffic away from pods that are not ready
- Easy rollback
- Best for most applications
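To see what `maxSurge` and `maxUnavailable` mean for capacity, the sketch below computes the pod-count bounds during a rolling update. The helper names are hypothetical; the rounding follows the Kubernetes convention that a percentage `maxSurge` rounds up and a percentage `maxUnavailable` rounds down.

```python
import math
from typing import Tuple, Union


def _resolve(value: Union[int, str], replicas: int, round_up: bool) -> int:
    """Convert an absolute count or a percentage string such as "25%" to pods."""
    if isinstance(value, str) and value.endswith("%"):
        raw = replicas * int(value[:-1]) / 100
        return math.ceil(raw) if round_up else math.floor(raw)
    return int(value)


def rolling_update_bounds(replicas: int,
                          max_surge: Union[int, str],
                          max_unavailable: Union[int, str]) -> Tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during the rollout."""
    surge = _resolve(max_surge, replicas, round_up=True)
    unavailable = _resolve(max_unavailable, replicas, round_up=False)
    return replicas - unavailable, replicas + surge


print(rolling_update_bounds(10, 2, 1))          # (9, 12)
print(rolling_update_bounds(10, "25%", "25%"))  # (8, 13)
```

For the manifest above, at least 9 pods stay available and at most 12 exist at any moment, so the cluster needs headroom for two extra pods.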

### 2. Blue-Green Deployment

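```bash
# Blue (current): the Service selector routes to pods labeled version=blue
kubectl apply -f blue-deployment.yaml
kubectl patch service my-app -p '{"spec":{"selector":{"version":"blue"}}}'

# Green (new)
kubectl apply -f green-deployment.yaml
# Test green environment, then switch the Service selector to green
kubectl patch service my-app -p '{"spec":{"selector":{"version":"green"}}}'

# Rollback if needed: point the selector back at blue
kubectl patch service my-app -p '{"spec":{"selector":{"version":"blue"}}}'
```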

**Characteristics:**

- Instant switchover
- Easy rollback
- Doubles infrastructure cost temporarily
- Good for high-risk deployments

### 3. Canary Deployment

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 10
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: { duration: 5m }
        - setWeight: 25
        - pause: { duration: 5m }
        - setWeight: 50
        - pause: { duration: 5m }
        - setWeight: 100
```

**Characteristics:**

- Gradual traffic shift
- Risk mitigation
- Real user testing
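- Precise traffic weights require a service mesh or ingress controller; without one, Argo Rollouts approximates the weights through replica counts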
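The step list in the Rollout manifest implies a timeline, which the sketch below makes explicit. It is a simplified model that assumes each weight change takes effect instantly (pod start-up and analysis time are ignored); the `STEPS` tuple format is a hypothetical encoding of the manifest, not an Argo Rollouts data structure.

```python
from typing import List, Tuple

# (traffic weight in percent, pause after the step in minutes), mirroring the manifest
STEPS: List[Tuple[int, int]] = [(10, 5), (25, 5), (50, 5), (100, 0)]


def canary_schedule(steps: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (minute at which the weight takes effect, weight) for each step."""
    schedule = []
    elapsed = 0
    for weight, pause in steps:
        schedule.append((elapsed, weight))
        elapsed += pause
    return schedule


print(canary_schedule(STEPS))  # [(0, 10), (5, 25), (10, 50), (15, 100)]
```

Under these assumptions the new version reaches full traffic no sooner than 15 minutes after the rollout starts, which is the minimum window available for catching a regression.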

### 4. Feature Flags

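```python
from flagsmith import Flagsmith

flagsmith = Flagsmith(environment_key="API_KEY")

# Flagsmith Python SDK v3+: fetch the environment's flags, then query them
flags = flagsmith.get_environment_flags()

# process_checkout_v1 / process_checkout_v2 are the application's own functions
if flags.is_feature_enabled("new_checkout_flow"):
    # New code path
    process_checkout_v2()
else:
    # Existing code path
    process_checkout_v1()
```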

**Characteristics:**

- Deploy without releasing
- A/B testing
- Instant rollback
- Granular control
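Granular control often means exposing a flag to a percentage of users. The sketch below shows one common way to do that with deterministic hashing; it is a generic illustration, not Flagsmith's implementation, and `in_rollout` is a hypothetical helper.

```python
import hashlib


def in_rollout(flag_name: str, user_id: str, percentage: int) -> bool:
    """Deterministically place a user in one of 100 buckets for a given flag.

    The same user always gets the same answer for the same flag, and raising
    the percentage only ever adds users, never removes them.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # 0..99
    return bucket < percentage


# Hypothetical flag and user identifiers
print(in_rollout("new_checkout_flow", "user-42", 100))  # True
print(in_rollout("new_checkout_flow", "user-42", 0))    # False
```

Including the flag name in the hash input keeps the same users from always landing in the early cohort of every flag.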

## Pipeline Orchestration

### Multi-Stage Pipeline Example

```yaml
name: Production Pipeline

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build application
        run: make build
      - name: Build Docker image
        run: docker build -t myapp:${{ github.sha }} .
      - name: Push to registry
        run: docker push myapp:${{ github.sha }}

  test:
    needs: build
    runs-on: ubuntu-latest
    steps:
      # Each job starts on a fresh runner, so the repository must be checked out again
      - uses: actions/checkout@v4
      - name: Unit tests
        run: make test
      - name: Security scan
        run: trivy image myapp:${{ github.sha }}

  deploy-staging:
    needs: test
    runs-on: ubuntu-latest
    environment:
      name: staging
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to staging
        run: kubectl apply -f k8s/staging/

  integration-test:
    needs: deploy-staging
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run E2E tests
        run: npm run test:e2e

  deploy-production:
    needs: integration-test
    runs-on: ubuntu-latest
    environment:
      name: production
    steps:
      - uses: actions/checkout@v4
      - name: Canary deployment
        run: |
          kubectl apply -f k8s/production/
          kubectl argo rollouts promote my-app

  verify:
    needs: deploy-production
    runs-on: ubuntu-latest
    steps:
      - name: Health check
        run: curl -f https://app.example.com/health
      - name: Notify team
        run: |
          curl -X POST ${{ secrets.SLACK_WEBHOOK }} \
            -d '{"text":"Production deployment successful!"}'
```

## Pipeline Best Practices

1. **Fail fast** - Run quick tests first
2. **Parallel execution** - Run independent jobs concurrently
3. **Caching** - Cache dependencies between runs
4. **Artifact management** - Store build artifacts
5. **Environment parity** - Keep environments consistent
6. **Secrets management** - Use secret stores (Vault, etc.)
7. **Deployment windows** - Schedule deployments appropriately
8. **Monitoring integration** - Track deployment metrics
9. **Rollback automation** - Auto-rollback on failures
10. **Documentation** - Document pipeline stages

## Rollback Strategies

### Automated Rollback

```yaml
deploy-and-verify:
  steps:
    - name: Deploy new version
      run: kubectl apply -f k8s/

    - name: Wait for rollout
      run: kubectl rollout status deployment/my-app

    - name: Health check
      id: health
      run: |
        for i in {1..10}; do
          if curl -sf https://app.example.com/health; then
            exit 0
          fi
          sleep 10
        done
        exit 1

    - name: Rollback on failure
      if: failure()
      run: kubectl rollout undo deployment/my-app
```
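The retry-then-rollback logic in the workflow above can also be expressed as a small testable function. This is a sketch under assumptions: `probe` and `rollback` are hypothetical callables supplied by the caller (for example, wrappers around an HTTP check and `kubectl rollout undo`), and the `sleep` parameter exists only so tests can skip real waiting.

```python
import time
from typing import Callable


def wait_until_healthy(probe: Callable[[], bool],
                       attempts: int = 10,
                       delay_seconds: float = 10.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll probe up to `attempts` times, sleeping between failed attempts."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


def verify_or_rollback(probe: Callable[[], bool],
                       rollback: Callable[[], None],
                       **kwargs) -> bool:
    """Return True if the deployment became healthy; otherwise roll back."""
    healthy = wait_until_healthy(probe, **kwargs)
    if not healthy:
        rollback()
    return healthy


# Example with a probe that succeeds on the third attempt and no real sleeping
outcomes = iter([False, False, True])
print(verify_or_rollback(lambda: next(outcomes), rollback=lambda: None,
                         attempts=5, delay_seconds=0, sleep=lambda s: None))
```

Injecting the probe keeps the retry policy separate from how health is measured, so the same function works for an HTTP endpoint, a metrics query, or a synthetic transaction.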

### Manual Rollback

```bash
# List revision history
kubectl rollout history deployment/my-app

# Rollback to previous version
kubectl rollout undo deployment/my-app

# Rollback to specific revision
kubectl rollout undo deployment/my-app --to-revision=3
```

## Monitoring and Metrics

### Key Pipeline Metrics

- **Deployment Frequency** - How often deployments occur
- **Lead Time** - Time from commit to production
- **Change Failure Rate** - Percentage of failed deployments
- **Mean Time to Recovery (MTTR)** - Time to recover from failure
- **Pipeline Success Rate** - Percentage of successful runs
- **Average Pipeline Duration** - Time to complete pipeline
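As a rough illustration of how the first four metrics could be derived from deployment records, consider the sketch below. The `Deployment` record and its field names are hypothetical; real data would come from the CI system or an incident tracker.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    recovered_at: Optional[datetime] = None  # set once a failed deploy is fixed


def summarize(deployments: List[Deployment], period_days: int) -> dict:
    """Compute deployment frequency, lead time, change failure rate, and MTTR."""
    total = len(deployments)
    failures = [d for d in deployments if d.failed]
    recoveries = [d.recovered_at - d.deployed_at for d in failures if d.recovered_at]
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    return {
        "deployments_per_day": total / period_days,
        "mean_lead_time": sum(lead_times, timedelta()) / total if total else None,
        "change_failure_rate": len(failures) / total if total else None,
        "mttr": sum(recoveries, timedelta()) / len(recoveries) if recoveries else None,
    }


t0 = datetime(2024, 1, 1, 9, 0)
records = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0, t0 + timedelta(hours=4), failed=True,
               recovered_at=t0 + timedelta(hours=4, minutes=30)),
]
print(summarize(records, period_days=2))
```

Metrics with no underlying data are reported as `None` rather than zero, so an empty period is not mistaken for a perfect one.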

### Integration with Monitoring

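```yaml
- name: Post-deployment verification
  run: |
    # Wait for metrics stabilization
    sleep 60

    # Check error rate. --data-urlencode escapes the brackets in the PromQL
    # expression; jq -r strips the quotes around the returned sample value.
    ERROR_RATE=$(curl -sG "$PROMETHEUS_URL/api/v1/query" \
      --data-urlencode 'query=rate(http_errors_total[5m])' \
      | jq -r '.data.result[0].value[1] // empty')

    # Fail closed when the query returns no sample
    if [ -z "$ERROR_RATE" ]; then
      echo "No error-rate sample returned"
      exit 1
    fi

    if (( $(echo "$ERROR_RATE > 0.01" | bc -l) )); then
      echo "Error rate too high: $ERROR_RATE"
      exit 1
    fi
```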
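The same gate can be written as a function over the JSON body that the Prometheus instant-query endpoint returns, which makes the edge cases easier to test. This is a minimal sketch: `extract_scalar` and `gate` are hypothetical helpers, and the choice to fail closed on missing data is an assumption about the desired policy.

```python
import json
from typing import Optional


def extract_scalar(response_body: str) -> Optional[float]:
    """Return the first sample value of an instant-query response, or None."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        return None
    results = payload.get("data", {}).get("result", [])
    if not results:
        return None
    # Prometheus encodes sample values as strings: [timestamp, "value"]
    return float(results[0]["value"][1])


def gate(response_body: str, threshold: float = 0.01) -> bool:
    """True when the deployment may proceed; missing data fails closed."""
    value = extract_scalar(response_body)
    return value is not None and value <= threshold


sample = ('{"status":"success","data":{"resultType":"vector",'
          '"result":[{"metric":{},"value":[1700000000,"0.004"]}]}}')
print(gate(sample))  # True
```

An empty result set usually means the metric name or label selector is wrong, so treating it as a failure avoids promoting a release on the basis of no evidence.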

## Reference Files

- `references/pipeline-orchestration.md` - Complex pipeline patterns
- `assets/approval-gate-template.yml` - Approval workflow templates

## Related Skills

- `github-actions-templates` - For GitHub Actions implementation
- `gitlab-ci-patterns` - For GitLab CI implementation
- `secrets-management` - For secrets handling

Overview

This skill designs multi-stage CI/CD pipelines that combine build, test, staging, approval gates, and production deployment strategies. It focuses on balancing delivery speed with safety by prescribing stage order, approval patterns, deployment strategies, and rollback mechanics. Use it to architect predictable, auditable deployment workflows across teams and environments.

How this skill works

The skill inspects pipeline requirements and maps them to a modular stage layout: source, build, test, staging, integration tests, approval gates, production deploy, verification, and rollback. It recommends approval gate patterns (manual, time-delayed, multi-approver), deployment strategies (rolling, blue-green, canary, feature flags), and orchestration examples for CI systems. It also defines monitoring, metrics, and automated rollback checks to close the loop on deployments.

When to use it

  • Design a new CI/CD architecture for microservices or monoliths
  • Add approval gates or compliance checks before production deployments
  • Configure multi-environment pipelines (dev → staging → prod)
  • Implement progressive delivery (canary, blue-green, feature flags)
  • Define rollback and verification procedures for production releases

Best practices

  • Fail fast: run quick unit and static scans early to catch issues before expensive steps
  • Parallelize independent jobs to reduce pipeline duration and improve feedback time
  • Manage artifacts and caches to ensure reproducible builds and faster runs
  • Enforce environment parity and use secret stores (Vault, cloud secrets) for credentials
  • Integrate monitoring and automated health checks to trigger rollback on regressions

Example use cases

  • GitHub Actions pipeline that builds, scans, deploys to staging, runs E2E tests, then manual approval for production
  • GitLab CI delayed deployment (time-based approval) for change freeze windows
  • Azure Pipelines multi-approver production stage for high-risk services and compliance reviews
  • Canary rollout with Argo Rollouts and progressive weight changes plus automated verification queries to Prometheus
  • Feature-flag driven releases to decouple deploy from release and enable instant rollback at runtime

FAQ

How do I choose between canary, blue-green, and rolling deployments?

Choose rolling for routine low-risk updates with limited capacity overhead; blue-green for instant switchover and easy rollback when infrastructure cost is acceptable; canary for gradual exposure and real-user validation when you can control traffic weights or use a service mesh.

What triggers an automated rollback in this design?

Automated rollback is triggered by failing deployment status, failed health checks, or monitored signal thresholds (error rate, latency) evaluated after a stabilization window; pipelines should include a verification step that fails the job and runs a rollback command when thresholds are exceeded.