home / skills / astronomer / agents / deploying-airflow

deploying-airflow skill

/skills/deploying-airflow

This skill helps you deploy Airflow DAGs and projects to production using Astro or Docker/Kubernetes CI/CD, streamlining deployment strategies.

npx playbooks add skill astronomer/agents --skill deploying-airflow

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
10.8 KB
---
name: deploying-airflow
description: Deploy Airflow DAGs and projects. Use when the user wants to deploy code, push DAGs, set up CI/CD, deploy to production, or asks about deployment strategies for Airflow.
---

# Deploying Airflow

This skill covers deploying Airflow DAGs and projects to production, whether using Astro (Astronomer's managed platform) or open-source Airflow on Docker Compose or Kubernetes.

**Choosing a path:** Astro is a good fit for managed operations and faster CI/CD. For open-source, use Docker Compose for dev and the Helm chart for production.

---

## Astro (Astronomer)

Astro provides CLI commands and GitHub integration for deploying Airflow projects.

### Deploy Commands

| Command | What It Does |
|---------|--------------|
| `astro deploy` | Full project deploy — builds Docker image and deploys DAGs |
| `astro deploy --dags` | DAG-only deploy — pushes only DAG files (fast, no image build) |
| `astro deploy --image` | Image-only deploy — pushes only the Docker image (for multi-repo CI/CD) |
| `astro deploy --dbt` | dbt project deploy — deploys a dbt project to run alongside Airflow |

### Full Project Deploy

Builds a Docker image from your Astro project and deploys everything (DAGs, plugins, requirements, packages):

```bash
astro deploy
```

Use this when you've changed `requirements.txt`, `Dockerfile`, `packages.txt`, plugins, or any non-DAG file.

### DAG-Only Deploy

Pushes only files in the `dags/` directory without rebuilding the Docker image:

```bash
astro deploy --dags
```

This is significantly faster than a full deploy since it skips the image build. Use this when you've only changed DAG files and haven't modified dependencies or configuration.

### Image-Only Deploy

Pushes only the Docker image without updating DAGs:

```bash
astro deploy --image
```

This is useful in multi-repo setups where DAGs are deployed separately from the image, or in CI/CD pipelines that manage image and DAG deploys independently.

### dbt Project Deploy

Deploys a dbt project to run with Cosmos on an Astro deployment:

```bash
astro deploy --dbt
```

### GitHub Integration

Astro supports branch-to-deployment mapping for automated deploys:

- Map branches to specific deployments (e.g., `main` -> production, `develop` -> staging)
- Pushes to mapped branches trigger automatic deploys
- Supports DAG-only deploys on merge for faster iteration

Configure this in the Astro UI under **Deployment Settings > CI/CD**.

### CI/CD Patterns

Common CI/CD strategies on Astro:

1. **DAG-only on feature branches**: Use `astro deploy --dags` for fast iteration during development
2. **Full deploy on main**: Use `astro deploy` on merge to main for production releases
3. **Separate image and DAG pipelines**: Use `--image` and `--dags` in separate CI jobs for independent release cycles

### Deploy Queue

When multiple deploys are triggered in quick succession, Astro processes them sequentially in a deploy queue. Each deploy completes before the next one starts.

### Reference

- [Astro Deploy Documentation](https://www.astronomer.io/docs/astro/deploy-code)

---

## Open-Source: Docker Compose

Deploy Airflow using the official Docker Compose setup. This is recommended for learning and exploration — for production, use Kubernetes with the Helm chart (see below).

### Prerequisites

- Docker and Docker Compose v2.14.0+
- The official `apache/airflow` Docker image

### Quick Start

Download the official Airflow 3 Docker Compose file:

```bash
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml'
```

This sets up the full Airflow 3 architecture:

| Service | Purpose |
|---------|---------|
| `airflow-apiserver` | REST API and UI (port 8080) |
| `airflow-scheduler` | Schedules DAG runs |
| `airflow-dag-processor` | Parses and processes DAG files |
| `airflow-worker` | Executes tasks (CeleryExecutor) |
| `airflow-triggerer` | Handles deferrable/async tasks |
| `postgres` | Metadata database |
| `redis` | Celery message broker |

### Minimal Setup

For a simpler setup with LocalExecutor (no Celery/Redis), create a `docker-compose.yaml`:

```yaml
x-airflow-common: &airflow-common
  image: apache/airflow:3  # Use the latest Airflow 3.x release
  environment: &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: LocalExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__CORE__DAGS_FOLDER: /opt/airflow/dags
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  depends_on:
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 10s
      retries: 5
      start_period: 5s

  airflow-init:
    <<: *airflow-common
    entrypoint: /bin/bash
    command:
      - -c
      - |
        airflow db migrate
        airflow users create \
          --username admin \
          --firstname Admin \
          --lastname User \
          --role Admin \
          --email [email protected] \
          --password admin
    depends_on:
      postgres:
        condition: service_healthy

  airflow-apiserver:
    <<: *airflow-common
    command: airflow api-server
    ports:
      - "8080:8080"
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s

  airflow-scheduler:
    <<: *airflow-common
    command: airflow scheduler

  airflow-dag-processor:
    <<: *airflow-common
    command: airflow dag-processor

  airflow-triggerer:
    <<: *airflow-common
    command: airflow triggerer

volumes:
  postgres-db-volume:
```

> **Airflow 3 architecture note**: The webserver has been replaced by the **API server** (`airflow api-server`), and the **DAG processor** now runs as a standalone process separate from the scheduler.

### Common Operations

```bash
# Start all services
docker compose up -d

# Stop all services
docker compose down

# View logs
docker compose logs -f airflow-scheduler

# Restart after requirements change
docker compose down && docker compose up -d --build

# Run a one-off Airflow CLI command
docker compose exec airflow-apiserver airflow dags list
```

### Installing Python Packages

Add packages to `requirements.txt` and rebuild:

```bash
# Add to requirements.txt, then:
docker compose down
docker compose up -d --build
```

Or use a custom Dockerfile:

```dockerfile
FROM apache/airflow:3  # Pin to a specific version (e.g., 3.1.7) for reproducibility
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
```

Update `docker-compose.yaml` to build from the Dockerfile:

```yaml
x-airflow-common: &airflow-common
  build:
    context: .
    dockerfile: Dockerfile
  # ... rest of config
```

### Environment Variables

Configure Airflow settings via environment variables in `docker-compose.yaml`:

```yaml
environment:
  # Core settings
  AIRFLOW__CORE__EXECUTOR: LocalExecutor
  AIRFLOW__CORE__PARALLELISM: 32
  AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG: 16

  # Email
  AIRFLOW__EMAIL__EMAIL_BACKEND: airflow.utils.email.send_email_smtp
  AIRFLOW__SMTP__SMTP_HOST: smtp.example.com

  # Connections (as URI)
  AIRFLOW_CONN_MY_DB: postgresql://user:pass@host:5432/db
```

---

## Open-Source: Kubernetes (Helm Chart)

Deploy Airflow on Kubernetes using the official Apache Airflow Helm chart.

### Prerequisites

- A Kubernetes cluster
- `kubectl` configured
- `helm` installed

### Installation

```bash
# Add the Airflow Helm repo
helm repo add apache-airflow https://airflow.apache.org
helm repo update

# Install with default values
helm install airflow apache-airflow/airflow \
  --namespace airflow \
  --create-namespace

# Install with custom values
helm install airflow apache-airflow/airflow \
  --namespace airflow \
  --create-namespace \
  -f values.yaml
```

### Key values.yaml Configuration

```yaml
# Executor type
executor: KubernetesExecutor  # or CeleryExecutor, LocalExecutor

# Airflow image (pin to your desired version)
defaultAirflowRepository: apache/airflow
defaultAirflowTag: "3"  # Or pin: "3.1.7"

# Git-sync for DAGs (recommended for production)
dags:
  gitSync:
    enabled: true
    repo: https://github.com/your-org/your-dags.git
    branch: main
    subPath: dags
    wait: 60  # seconds between syncs

# API server (replaces webserver in Airflow 3)
apiServer:
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      cpu: "500m"
      memory: "1Gi"
  replicas: 1

# Scheduler
scheduler:
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "1000m"
      memory: "2Gi"

# Standalone DAG processor
dagProcessor:
  enabled: true
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      cpu: "500m"
      memory: "1Gi"

# Triggerer (for deferrable tasks)
triggerer:
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      cpu: "500m"
      memory: "1Gi"

# Worker resources (CeleryExecutor only)
workers:
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "2000m"
      memory: "4Gi"
  replicas: 2

# Log persistence
logs:
  persistence:
    enabled: true
    size: 10Gi

# PostgreSQL (built-in)
postgresql:
  enabled: true

# Or use an external database
# postgresql:
#   enabled: false
# data:
#   metadataConnection:
#     user: airflow
#     pass: airflow
#     host: your-rds-host.amazonaws.com
#     port: 5432
#     db: airflow
```

### Upgrading

```bash
# Upgrade with new values
helm upgrade airflow apache-airflow/airflow \
  --namespace airflow \
  -f values.yaml

# Upgrade to a new Airflow version
helm upgrade airflow apache-airflow/airflow \
  --namespace airflow \
  --set defaultAirflowTag="<version>"
```

### DAG Deployment Strategies on Kubernetes

1. **Git-sync** (recommended): DAGs are synced from a Git repository automatically
2. **Persistent Volume**: Mount a shared PV containing DAGs
3. **Baked into image**: Include DAGs in a custom Docker image

### Useful Commands

```bash
# Check pod status
kubectl get pods -n airflow

# View scheduler logs
kubectl logs -f deployment/airflow-scheduler -n airflow

# Port-forward the API server
kubectl port-forward svc/airflow-apiserver 8080:8080 -n airflow

# Run a one-off CLI command
kubectl exec -it deployment/airflow-scheduler -n airflow -- airflow dags list
```

---

## Related Skills

- **setting-up-astro-project**: For initializing a new Astro project
- **managing-astro-local-env**: For local development with `astro dev`
- **authoring-dags**: For writing DAGs before deployment
- **testing-dags**: For testing DAGs before deployment

Overview

This skill helps deploy Airflow DAGs and complete Airflow projects to staging and production. It covers managed deployments with Astro and self-hosted deployments using Docker Compose for development or Kubernetes with the official Helm chart for production. The guidance focuses on practical commands, CI/CD patterns, and deployment strategies to minimize downtime and speed iteration.

How this skill works

The skill inspects your deployment context (Astro vs open-source) and recommends appropriate commands and pipelines. For Astro it shows CLI deployment modes (full, DAG-only, image-only, dbt) and branch-to-environment mapping. For self-hosted setups it outlines Docker Compose quick starts for local/dev and Helm values and git-sync patterns for Kubernetes production deployments. It also highlights common operational commands and upgrade steps.

When to use it

  • You need to push DAG changes quickly without rebuilding images (fast iteration).
  • You want a managed CI/CD path with branch-to-environment mapping (Astro).
  • You're setting up a local development environment with Docker Compose.
  • You need a production-grade Airflow on Kubernetes using the Helm chart.
  • You want to design CI jobs that separate image builds from DAG pushes.

Best practices

  • Use astro deploy --dags for DAG-only changes and astro deploy for dependency or image changes.
  • Map branches to environments (main->prod, develop->staging) and automate deploys from CI.
  • For Kubernetes production, prefer git-sync or PV mounts over baking DAGs into images.
  • Pin Airflow image versions for reproducible releases and use Helm values.yaml for config.
  • Keep requirements changes in CI image builds and avoid ad-hoc pip installs on running clusters.

Example use cases

  • Fast feature iteration: run astro deploy --dags on a feature branch to validate DAG logic quickly.
  • Production release: CI merges to main trigger astro deploy (full) or Helm upgrade with a pinned image tag.
  • Local testing: bring up an Airflow 3 stack with docker compose up -d for development and debugging.
  • Kubernetes deploy: install or upgrade Airflow via helm install/upgrade and use git-sync for DAG distribution.
  • Multi-repo CI: build and push an image from one pipeline (astro deploy --image) and push DAGs from another (astro deploy --dags).

FAQ

When should I rebuild the Docker image instead of doing a DAG-only deploy?

Rebuild when you change dependencies, requirements.txt, plugins, the Dockerfile, or any non-DAG code that must be inside the image.

Which DAG deployment method is recommended for production on Kubernetes?

Use git-sync for automatic, reliable DAG sync from a repository. Persistent volumes or baked images are alternatives but git-sync is typically easiest to manage.