---
name: ecs-deployment
description: ECS deployment strategies including rolling updates, blue-green with CodeDeploy, canary releases, and GitOps workflows. Covers deployment circuit breakers, rollback strategies, and production deployment patterns. Use when deploying ECS services, implementing blue-green deployments, setting up CI/CD pipelines, or managing production releases.
---

# ECS Deployment Strategies

Complete guide to deploying ECS services safely and efficiently, from rolling updates to blue-green deployments.

## Quick Reference

| Strategy | Downtime | Rollback Speed | Complexity | Best For |
|----------|----------|----------------|------------|----------|
| Rolling Update | Zero | Medium | Low | Most workloads |
| Blue-Green | Zero | Instant | High | Critical services |
| Canary | Zero | Fast | High | Risk mitigation |

## Rolling Updates (Default)

### Configuration

```hcl
resource "aws_ecs_service" "app" {
  deployment_maximum_percent         = 200  # Allow 2x during deployment
  deployment_minimum_healthy_percent = 100  # Keep 100% healthy

  deployment_circuit_breaker {
    enable   = true   # Auto-detect failures
    rollback = true   # Auto-rollback on failure
  }
}
```

### Behavior

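1. A new task definition revision is registered and the service is pointed at it
2. ECS launches new tasks, keeping the total at or below the maximum percent of the desired count
3. New tasks must pass their health checks before they count as healthy
4. Old tasks are drained and stopped, never dropping below the minimum healthy percent
5. Steps 2-4 repeat until every task runs the new revision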
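
The two percentages translate into concrete task counts. The sketch below is a minimal illustration, assuming ECS rounds the minimum healthy bound up and the maximum bound down as its documentation describes; `rolling_update_bounds` is a hypothetical helper name, not part of any SDK.

```python
def rolling_update_bounds(desired_count: int,
                          minimum_healthy_percent: int = 100,
                          maximum_percent: int = 200) -> dict:
    """Return the task-count window a rolling update has to stay within."""
    # Integer arithmetic avoids float rounding surprises.
    # Lower bound is rounded up, upper bound is rounded down.
    min_running = -(-desired_count * minimum_healthy_percent // 100)
    max_running = desired_count * maximum_percent // 100
    return {
        "min_running": min_running,
        "max_running": max_running,
        # New tasks that can start before any old task has to stop.
        "surge": max_running - desired_count,
        # Old tasks that may stop before any new task is healthy.
        "may_stop_first": desired_count - min_running,
    }

print(rolling_update_bounds(4))           # 100% / 200%
print(rolling_update_bounds(3, 50, 150))  # tighter capacity, some churn allowed
```

With 100% / 200% and 4 desired tasks, ECS may run up to 8 tasks and must keep 4 running, so the whole new fleet can start before any old task stops. If both `surge` and `may_stop_first` come out as 0, the deployment cannot make progress.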

### Boto3 Deployment

```python
import boto3

ecs = boto3.client('ecs')

def deploy_rolling_update(cluster: str, service: str,
                          new_image: str, container_name: str):
    """Deploy new image via rolling update"""

    # 1. Get current task definition
    svc = ecs.describe_services(cluster=cluster, services=[service])
    current_task_def = svc['services'][0]['taskDefinition']

    # 2. Create new task definition revision
    task_def = ecs.describe_task_definition(taskDefinition=current_task_def)
    new_task_def = task_def['taskDefinition'].copy()

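    # Remove response-only fields that register_task_definition rejects
    for field in ['taskDefinitionArn', 'revision', 'status',
                  'requiresAttributes', 'compatibilities',
                  'registeredAt', 'registeredBy', 'deregisteredAt']:
        new_task_def.pop(field, None)

    # Update image, failing loudly if the container name is wrong
    matched = False
    for container in new_task_def['containerDefinitions']:
        if container['name'] == container_name:
            container['image'] = new_image
            matched = True
    if not matched:
        raise ValueError(
            f"No container named {container_name!r} in {current_task_def}")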

    response = ecs.register_task_definition(**new_task_def)
    new_task_def_arn = response['taskDefinition']['taskDefinitionArn']

    # 3. Update service
    ecs.update_service(
        cluster=cluster,
        service=service,
        taskDefinition=new_task_def_arn,
        forceNewDeployment=True
    )

    print(f"Deploying {new_task_def_arn}")
    return new_task_def_arn

# Usage
deploy_rolling_update(
    cluster='production',
    service='api',
    new_image='123456789012.dkr.ecr.us-east-1.amazonaws.com/api:v2.0',
    container_name='api'
)
```

### Monitor Deployment

```python
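import time
from typing import Optional

def wait_for_deployment(cluster: str, service: str, timeout: int = 600,
                        expected_task_def: Optional[str] = None):
    """Wait for deployment to complete.

    Pass expected_task_def (the ARN returned by deploy_rolling_update) so a
    circuit-breaker rollback to the old revision is not reported as success.
    """
    start = time.time()
    while time.time() - start < timeout:
        response = ecs.describe_services(cluster=cluster, services=[service])
        svc = response['services'][0]

        for deployment in svc['deployments']:
            # rolloutState is only populated for the ECS rolling-update controller
            state = deployment.get('rolloutState', 'UNKNOWN')
            print(f"Deployment {deployment['id']}: {state} "
                  f"({deployment['runningCount']}/{deployment['desiredCount']})")

            if state == 'FAILED':
                print(f"Deployment failed: {deployment.get('rolloutStateReason')}")
                return False

            if deployment['status'] == 'PRIMARY' and state == 'COMPLETED':
                if expected_task_def and deployment['taskDefinition'] != expected_task_def:
                    print(f"Service settled on {deployment['taskDefinition']} (rolled back)")
                    return False
                print("Deployment successful!")
                return True

        time.sleep(15)

    print("Deployment timed out")
    return False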
```

## Blue-Green Deployments

### Architecture

```
                    ┌─────────────┐
                    │    ALB      │
                    └──────┬──────┘
                           │
           ┌───────────────┴───────────────┐
           │                               │
    ┌──────▼──────┐                 ┌──────▼──────┐
    │ Target Group│                 │ Target Group│
    │    (Blue)   │                 │   (Green)   │
    └──────┬──────┘                 └──────┬──────┘
           │                               │
    ┌──────▼──────┐                 ┌──────▼──────┐
    │  Task Set   │                 │  Task Set   │
    │   (Blue)    │                 │   (Green)   │
    └─────────────┘                 └─────────────┘
```

### Terraform with CodeDeploy

```hcl
# Two target groups
resource "aws_lb_target_group" "blue" {
  name        = "app-blue"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = module.vpc.vpc_id
  target_type = "ip"

  health_check {
    path = "/health"
  }
}

resource "aws_lb_target_group" "green" {
  name        = "app-green"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = module.vpc.vpc_id
  target_type = "ip"

  health_check {
    path = "/health"
  }
}

# ALB with two listeners
resource "aws_lb_listener" "prod" {
  load_balancer_arn = aws_lb.app.arn
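  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = aws_acm_certificate.app.arn  # Assumed ACM certificate; HTTPS listeners require one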

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.blue.arn
  }

  lifecycle {
    ignore_changes = [default_action]  # Managed by CodeDeploy
  }
}

resource "aws_lb_listener" "test" {
  load_balancer_arn = aws_lb.app.arn
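  port              = 8443
  protocol          = "HTTPS"
  certificate_arn   = aws_acm_certificate.app.arn  # Assumed ACM certificate; HTTPS listeners require one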

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.green.arn
  }

  lifecycle {
    ignore_changes = [default_action]
  }
}

# ECS Service with CodeDeploy
resource "aws_ecs_service" "app" {
  name            = "app"
  cluster         = module.ecs.cluster_id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 3

  deployment_controller {
    type = "CODE_DEPLOY"
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.blue.arn
    container_name   = "app"
    container_port   = 8080
  }

  lifecycle {
    ignore_changes = [task_definition, load_balancer]
  }
}

# CodeDeploy Application
resource "aws_codedeploy_app" "app" {
  compute_platform = "ECS"
  name             = "app-deploy"
}

# CodeDeploy Deployment Group
resource "aws_codedeploy_deployment_group" "app" {
  app_name               = aws_codedeploy_app.app.name
  deployment_group_name  = "app-dg"
  deployment_config_name = "CodeDeployDefault.ECSAllAtOnce"
  service_role_arn       = aws_iam_role.codedeploy.arn

  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_REQUEST"]
  }

  blue_green_deployment_config {
    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }

    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 5
    }
  }

  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }

  ecs_service {
    cluster_name = module.ecs.cluster_name
    service_name = aws_ecs_service.app.name
  }

  load_balancer_info {
    target_group_pair_info {
      prod_traffic_route {
        listener_arns = [aws_lb_listener.prod.arn]
      }

      test_traffic_route {
        listener_arns = [aws_lb_listener.test.arn]
      }

      target_group {
        name = aws_lb_target_group.blue.name
      }

      target_group {
        name = aws_lb_target_group.green.name
      }
    }
  }
}
```

### Trigger Blue-Green Deployment

```python
import boto3
import json

codedeploy = boto3.client('codedeploy')

def deploy_blue_green(app_name: str, deployment_group: str,
                      task_definition_arn: str, container_name: str,
                      container_port: int):
    """Trigger blue-green deployment via CodeDeploy"""

    app_spec = {
        "version": "0.0",
        "Resources": [{
            "TargetService": {
                "Type": "AWS::ECS::Service",
                "Properties": {
                    "TaskDefinition": task_definition_arn,
                    "LoadBalancerInfo": {
                        "ContainerName": container_name,
                        "ContainerPort": container_port
                    }
                }
            }
        }]
    }

    response = codedeploy.create_deployment(
        applicationName=app_name,
        deploymentGroupName=deployment_group,
        revision={
            'revisionType': 'AppSpecContent',
            'appSpecContent': {
                'content': json.dumps(app_spec)
            }
        }
    )

    deployment_id = response['deploymentId']
    print(f"Started deployment: {deployment_id}")
    return deployment_id

# Usage
deploy_blue_green(
    app_name='app-deploy',
    deployment_group='app-dg',
    task_definition_arn='arn:aws:ecs:us-east-1:123456789012:task-definition/app:5',
    container_name='app',
    container_port=8080
)
```
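
`create_deployment` returns as soon as the deployment is queued, so a pipeline usually needs to wait for the outcome. The sketch below polls `get_deployment` until CodeDeploy reports a terminal status. It is a minimal illustration: `wait_for_codedeploy` is a hypothetical helper, the client is passed in so it can be exercised without AWS, and `"TimedOut"` is this helper's own label rather than a CodeDeploy status.

```python
import time

# Statuses after which CodeDeploy makes no further progress.
TERMINAL_STATES = {"Succeeded", "Failed", "Stopped"}

def wait_for_codedeploy(client, deployment_id: str,
                        timeout: int = 1800, poll_seconds: int = 15,
                        sleep=time.sleep) -> str:
    """Poll a CodeDeploy deployment until it reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        info = client.get_deployment(deploymentId=deployment_id)["deploymentInfo"]
        status = info["status"]
        print(f"{deployment_id}: {status}")
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    # Not a CodeDeploy status: the deployment may still be running.
    return "TimedOut"
```

Typical use would be `wait_for_codedeploy(codedeploy, deployment_id)` right after the deployment is started, treating anything other than `"Succeeded"` as a failed release.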

## Canary Releases

### ALB Weighted Routing

```hcl
resource "aws_lb_listener_rule" "canary" {
  listener_arn = aws_lb_listener.prod.arn
  priority     = 100

  action {
    type = "forward"
    forward {
      target_group {
        arn    = aws_lb_target_group.stable.arn
        weight = 90
      }
      target_group {
        arn    = aws_lb_target_group.canary.arn
        weight = 10
      }
    }
  }

  condition {
    path_pattern {
      values = ["/*"]
    }
  }
}
```

### Gradual Traffic Shift

```python
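import boto3

elb = boto3.client('elbv2')

def shift_traffic(listener_rule_arn: str, stable_tg_arn: str,
                  canary_tg_arn: str, canary_weight: int):
    """Shift traffic percentage to canary"""
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")

    stable_weight = 100 - canary_weight

    elb.modify_rule(
        RuleArn=listener_rule_arn,
        Actions=[{
            'Type': 'forward',
            'ForwardConfig': {
                'TargetGroups': [
                    {
                        'TargetGroupArn': stable_tg_arn,
                        'Weight': stable_weight
                    },
                    {
                        'TargetGroupArn': canary_tg_arn,
                        'Weight': canary_weight
                    }
                ]
            }
        }]
    )

    print(f"Traffic: {stable_weight}% stable, {canary_weight}% canary")

# Progressive rollout. rule_arn, stable_tg_arn and canary_tg_arn are
# placeholders for the ARNs of the listener rule and its two target groups.
shift_traffic(rule_arn, stable_tg_arn, canary_tg_arn, 10)   # 10% to canary
# Monitor metrics...
shift_traffic(rule_arn, stable_tg_arn, canary_tg_arn, 25)   # 25% to canary
# Monitor metrics...
shift_traffic(rule_arn, stable_tg_arn, canary_tg_arn, 50)   # 50% to canary
# Monitor metrics...
shift_traffic(rule_arn, stable_tg_arn, canary_tg_arn, 100)  # 100% to canary (promote)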
```
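
The "Monitor metrics..." steps are where a canary either earns more traffic or gets pulled. One way to automate that gate is sketched below. Everything here is hypothetical scaffolding: `set_canary_weight` stands for a small wrapper around `shift_traffic`, and `is_healthy` for whatever check you trust (CloudWatch alarm state, error-rate query, synthetic probe).

```python
import time
from typing import Callable, Sequence

def progressive_rollout(set_canary_weight: Callable[[int], None],
                        is_healthy: Callable[[], bool],
                        steps: Sequence[int] = (10, 25, 50, 100),
                        bake_seconds: float = 300,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Step the canary weight up, reverting to 0% if a health check fails."""
    for weight in steps:
        set_canary_weight(weight)
        sleep(bake_seconds)          # let metrics accumulate at this weight
        if not is_healthy():
            set_canary_weight(0)     # send all traffic back to stable
            return False
    return True
```

The function returns `True` only after the final step has baked and passed its check, so the caller can safely retire the stable target group afterwards. Injecting `sleep` keeps the loop testable without waiting for real bake times.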

## Deployment Circuit Breaker

### How It Works

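1. ECS tracks how many tasks in the new deployment fail to start or fail health checks
2. Once failures reach a threshold that scales with the desired count, the deployment is marked `FAILED`
3. ECS stops launching tasks for the failed deployment
4. Optional: with `rollback = true`, ECS redeploys the last deployment that completed successfully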

### Configuration

```hcl
resource "aws_ecs_service" "app" {
  deployment_circuit_breaker {
    enable   = true
    rollback = true  # Auto-rollback on failure
  }
}
```

### Failure Detection

Circuit breaker triggers when:
- Tasks fail to reach RUNNING state
- Health checks fail repeatedly
- Tasks crash shortly after starting
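
How many failures count as "repeated" depends on service size. The sketch below assumes the rule described in the AWS documentation at the time of writing: half the desired count, rounded up, clamped between a minimum of 3 and a maximum of 200. Those bounds have changed in the past, so treat them as assumptions and check the current ECS docs; `circuit_breaker_threshold` is a hypothetical helper name.

```python
import math

def circuit_breaker_threshold(desired_count: int,
                              minimum: int = 3,     # assumed lower bound
                              maximum: int = 200    # assumed upper bound
                              ) -> int:
    """Approximate number of failed task launches that trips the breaker."""
    calculated = math.ceil(0.5 * desired_count)
    return max(minimum, min(maximum, calculated))

for count in (1, 25, 1000):
    print(count, circuit_breaker_threshold(count))
```

The practical consequence is that small services still need several failed launches before the breaker trips, so a broken image on a one-task service is not rolled back instantly.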

## GitOps Workflow

### GitHub Actions Example

```yaml
name: Deploy to ECS

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t "$ECR_REGISTRY/myapp:$IMAGE_TAG" .
          docker push "$ECR_REGISTRY/myapp:$IMAGE_TAG"

      - name: Update task definition
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: task-definition.json
          container-name: myapp
          image: ${{ steps.login-ecr.outputs.registry }}/myapp:${{ github.sha }}

      - name: Deploy to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: myapp-service
          cluster: production
          wait-for-service-stability: true
```

## Rollback Strategies

### Manual Rollback

```python
def rollback_to_previous(cluster: str, service: str):
    """Rollback to previous task definition"""

    # Get current task definition
    svc = ecs.describe_services(cluster=cluster, services=[service])
    current_td = svc['services'][0]['taskDefinition']

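    # Parse family and revision
    # arn:aws:ecs:region:account:task-definition/family:revision
    family, _, revision = current_td.split('/')[-1].rpartition(':')
    current_revision = int(revision)
    if current_revision <= 1:
        raise ValueError(f"{family} has no revision before {current_revision}")

    # Go back to the previous revision number. This assumes revision N-1 is
    # still ACTIVE and is the version you want; verify before relying on it.
    previous_td = f"{family}:{current_revision - 1}"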

    # Update service
    ecs.update_service(
        cluster=cluster,
        service=service,
        taskDefinition=previous_td
    )

    print(f"Rolling back to {previous_td}")

# Usage
rollback_to_previous('production', 'api')
```
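
Subtracting 1 from the revision number breaks down when the previous revision has been deregistered. A more defensive lookup asks ECS for the `ACTIVE` revisions of the family and picks the newest one older than the current revision. The sketch below is a minimal illustration using `list_task_definitions`; `previous_active_revision` is a hypothetical helper, the client is passed in for testability, and only the newest 100 revisions are inspected.

```python
def previous_active_revision(client, task_definition_arn: str):
    """Return the newest ACTIVE revision older than the given one, or None."""
    # Accepts a full ARN or the short "family:revision" form.
    family, _, revision = task_definition_arn.split('/')[-1].rpartition(':')
    current = int(revision)

    # Newest first; deregistered (INACTIVE) revisions are excluded.
    response = client.list_task_definitions(
        familyPrefix=family, status='ACTIVE', sort='DESC', maxResults=100)

    for arn in response['taskDefinitionArns']:
        fam, _, rev = arn.split('/')[-1].rpartition(':')
        if fam == family and int(rev) < current:
            return arn
    return None
```

The returned ARN can be passed straight to `update_service` as `taskDefinition`. This still picks the previous registered revision, not necessarily the previously deployed one, so record deployed revisions elsewhere if the two can differ.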

### Automatic Rollback (Circuit Breaker)

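Enabled by setting both `enable = true` and `rollback = true` in the `deployment_circuit_breaker` block. When a deployment fails, ECS redeploys the last deployment that completed successfully, with no manual step.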

## Best Practices

1. **Always enable circuit breaker** with rollback for production
2. **Use blue-green** for critical services requiring instant rollback
3. **Implement health checks** at container, task, and ALB levels
4. **Pin image digests** instead of tags for reproducibility
5. **Use immutable image tags** in ECR
6. **Monitor deployments** with CloudWatch alarms
7. **Test rollback procedures** regularly
8. **Keep previous task definitions** for quick rollback

## Progressive Disclosure

### Quick Start (This File)
- Rolling updates
- Blue-green basics
- Canary releases
- Circuit breaker

### Detailed References
- **[Blue-Green Setup](references/blue-green-setup.md)**: Complete CodeDeploy configuration
- **[CI/CD Pipelines](references/cicd-pipelines.md)**: GitHub Actions, CodePipeline
- **[Monitoring](references/deployment-monitoring.md)**: CloudWatch, alarms

## Related Skills

- **boto3-ecs**: SDK patterns
- **terraform-ecs**: Infrastructure as Code
- **ecs-troubleshooting**: Debugging deployments