
mlops-dag-builder skill

/plugins/specweave-ml/skills/mlops-dag-builder

This skill guides the design of DAG-based MLOps pipelines with Airflow, Dagster, Kubeflow, or Prefect for scalable, platform-agnostic workflows.

This is most likely a fork of the sw-mlops-dag-builder skill from openclaw.
npx playbooks add skill anton-abyzov/specweave --skill mlops-dag-builder

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
6.6 KB
---
name: mlops-dag-builder
description: Design DAG-based MLOps pipeline architectures with Airflow, Dagster, Kubeflow, or Prefect. Activates for DAG orchestration, workflow automation, pipeline design patterns, CI/CD for ML. Use for platform-agnostic MLOps infrastructure - NOT for SpecWeave increment-based ML (use ml-pipeline-orchestrator instead).
---

# MLOps DAG Builder

Design and implement DAG-based ML pipeline architectures using production orchestration tools.

## Overview

This skill provides guidance for building **platform-agnostic MLOps pipelines** using DAG orchestrators (Airflow, Dagster, Kubeflow, Prefect). It focuses on workflow architecture, not SpecWeave integration.

**When to use this skill vs ml-pipeline-orchestrator:**
- **Use this skill**: General MLOps architecture, Airflow/Dagster DAGs, cloud ML platforms
- **Use ml-pipeline-orchestrator**: SpecWeave increment-based ML development with experiment tracking

## When to Use This Skill

- Designing DAG-based workflow orchestration (Airflow, Dagster, Kubeflow, Prefect)
- Implementing platform-agnostic ML pipeline patterns
- Setting up CI/CD automation for ML training jobs
- Creating reusable pipeline templates for teams
- Integrating with cloud ML services (SageMaker, Vertex AI, Azure ML)

## What This Skill Provides

### Core Capabilities

1. **Pipeline Architecture**
   - End-to-end workflow design
   - DAG orchestration patterns (Airflow, Dagster, Kubeflow)
   - Component dependencies and data flow
   - Error handling and retry strategies (see the sketch after this list)

2. **Data Preparation**
   - Data validation and quality checks
   - Feature engineering pipelines
   - Data versioning and lineage
   - Train/validation/test splitting strategies

3. **Model Training**
   - Training job orchestration
   - Hyperparameter management
   - Experiment tracking integration
   - Distributed training patterns

4. **Model Validation**
   - Validation frameworks and metrics
   - A/B testing infrastructure
   - Performance regression detection
   - Model comparison workflows

5. **Deployment Automation**
   - Model serving patterns
   - Canary deployments
   - Blue-green deployment strategies
   - Rollback mechanisms
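
The retry and failure-handling patterns above map directly onto orchestrator features. Below is a minimal sketch of that pattern in Airflow, assuming Airflow 2.x (2.4+ for the `schedule` parameter); the DAG id, stage callables, and alert callback are illustrative placeholders, not a prescribed implementation.

```python
# Minimal sketch: retries, backoff, and a failure alert hook on a three-stage DAG.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_on_failure(context):
    # Placeholder alert hook: forward the failed task id to your alerting channel.
    print(f"Task failed: {context['task_instance'].task_id}")


default_args = {
    "retries": 3,                          # retry transient failures
    "retry_delay": timedelta(minutes=5),   # back off between attempts
    "on_failure_callback": notify_on_failure,
}

with DAG(
    dag_id="ml_training_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    ingest = PythonOperator(task_id="data_ingestion", python_callable=lambda: print("ingest"))
    validate = PythonOperator(task_id="data_validation", python_callable=lambda: print("validate"))
    train = PythonOperator(task_id="model_training", python_callable=lambda: print("train"))

    # Component dependencies / data flow: ingest -> validate -> train
    ingest >> validate >> train
```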

## Usage Patterns

### Basic Pipeline Setup

```python
# 1. Define pipeline stages
stages = [
    "data_ingestion",
    "data_validation",
    "feature_engineering",
    "model_training",
    "model_validation",
    "model_deployment",
]

# 2. Configure dependencies between stages (a simple linear chain here:
#    each stage depends on the one before it)
dependencies = {stage: [previous] for previous, stage in zip(stages, stages[1:])}
```

### Production Workflow

1. **Data Preparation Phase**
   - Ingest raw data from sources
   - Run data quality checks
   - Apply feature transformations
   - Version processed datasets

2. **Training Phase**
   - Load versioned training data
   - Execute training jobs
   - Track experiments and metrics
   - Save trained models

3. **Validation Phase**
   - Run validation test suite
   - Compare against baseline
   - Generate performance reports
   - Approve for deployment

4. **Deployment Phase**
   - Package model artifacts
   - Deploy to serving infrastructure
   - Configure monitoring
   - Validate production traffic
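
As a sketch of how these four phases can be wired together, here is a minimal Prefect 2.x flow; the task bodies, names, and return values are illustrative placeholders rather than a prescribed implementation.

```python
# Minimal sketch: the four phases as tasks in a single flow.
from prefect import flow, task


@task(retries=2)
def prepare_data() -> str:
    # Ingest, validate, transform, and version the dataset; return its version id.
    return "dataset-v1"


@task
def train_model(dataset_version: str) -> str:
    # Launch the training job against the versioned data; return a model reference.
    return f"model-trained-on-{dataset_version}"


@task
def validate_model(model_ref: str) -> bool:
    # Run the validation suite and compare against the current baseline.
    return True


@task
def deploy_model(model_ref: str) -> None:
    # Package artifacts and roll out to the serving infrastructure.
    print(f"deploying {model_ref}")


@flow
def training_pipeline():
    dataset_version = prepare_data()
    model_ref = train_model(dataset_version)
    if validate_model(model_ref):
        deploy_model(model_ref)


if __name__ == "__main__":
    training_pipeline()
```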

## Best Practices

### Pipeline Design

- **Modularity**: Each stage should be independently testable
- **Idempotency**: Re-running stages should be safe
- **Observability**: Log metrics at every stage
- **Versioning**: Track data, code, and model versions
- **Failure Handling**: Implement retry logic and alerting
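
To make the idempotency point concrete, here is a minimal sketch of a stage whose output is keyed by a deterministic partition (the run date) and which skips work when that output already exists; the paths and the transform are illustrative assumptions.

```python
# Idempotent stage: output is keyed by a deterministic partition (the run date)
# and the stage becomes a no-op if that partition was already produced.
from pathlib import Path


def run_feature_engineering(run_date: str, output_root: str = "data/features") -> Path:
    output_path = Path(output_root) / run_date / "features.parquet"
    if output_path.exists():
        # Safe re-run: completed partitions are skipped rather than recomputed.
        return output_path
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # ... compute features for this partition here ...
    output_path.write_bytes(b"")  # placeholder for the real (atomic) write
    return output_path
```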

### Data Management

- Use data validation libraries (Great Expectations, TFX)
- Version datasets with DVC or similar tools
- Document feature engineering transformations
- Maintain data lineage tracking
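
As an illustration of a lightweight data quality gate, the sketch below uses plain pandas checks; the column names and thresholds are assumptions, and in practice a dedicated library such as Great Expectations would typically own these expectations.

```python
# Lightweight quality gate run before training; raises to fail the pipeline stage.
import pandas as pd


def validate_training_data(df: pd.DataFrame) -> None:
    required_columns = {"user_id", "label", "signup_date"}
    missing = required_columns - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    if df["label"].isna().mean() > 0.01:
        raise ValueError("More than 1% of labels are missing")
    if df.duplicated(subset=["user_id"]).any():
        raise ValueError("Duplicate user_id rows detected")
```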

### Model Operations

- Separate training and serving infrastructure
- Use model registries (MLflow, Weights & Biases)
- Implement gradual rollouts for new models
- Monitor model performance drift
- Maintain rollback capabilities
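
Below is a minimal sketch of logging a run and promoting the resulting model into the MLflow model registry, assuming MLflow 2.x with a registry-capable tracking server; the tracking URI, experiment name, metric value, and model name are illustrative.

```python
# Minimal sketch: log a run, store the model artifact, and register a version.
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

mlflow.set_tracking_uri("http://mlflow.internal:5000")  # assumed registry-backed server
mlflow.set_experiment("churn-model")

with mlflow.start_run() as run:
    # Tiny stand-in model so the example is self-contained.
    model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])
    mlflow.log_metric("val_auc", 0.91)  # illustrative value
    mlflow.sklearn.log_model(model, artifact_path="model")

# Register a named version so deployment stages reference "churn-model" rather
# than a raw run id.
version = mlflow.register_model(f"runs:/{run.info.run_id}/model", "churn-model")
print(version.version)
```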

### Deployment Strategies

- Start with shadow deployments
- Use canary releases for validation
- Implement A/B testing infrastructure
- Set up automated rollback triggers
- Monitor latency and throughput
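
The sketch below illustrates one way an automated canary gate could be expressed: compare the canary's live metrics against the baseline model and return a promote or rollback decision. The metric fields and thresholds are assumptions, not a prescribed policy.

```python
# Automated canary gate: promote only if the canary stays within error and
# latency budgets relative to the baseline model.
from dataclasses import dataclass


@dataclass
class ServingMetrics:
    error_rate: float
    p95_latency_ms: float


def canary_decision(baseline: ServingMetrics, canary: ServingMetrics,
                    max_error_increase: float = 0.005,
                    max_latency_increase_ms: float = 50.0) -> str:
    if canary.error_rate - baseline.error_rate > max_error_increase:
        return "rollback"
    if canary.p95_latency_ms - baseline.p95_latency_ms > max_latency_increase_ms:
        return "rollback"
    return "promote"


# Example: small regressions within budget -> "promote"
print(canary_decision(ServingMetrics(0.010, 120.0), ServingMetrics(0.012, 130.0)))
```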

## Integration Points

### Orchestration Tools

- **Apache Airflow**: DAG-based workflow orchestration
- **Dagster**: Asset-based pipeline orchestration
- **Kubeflow Pipelines**: Kubernetes-native ML workflows
- **Prefect**: Modern dataflow automation

### Experiment Tracking

- MLflow for experiment tracking and model registry
- Weights & Biases for visualization and collaboration
- TensorBoard for training metrics

### Deployment Platforms

- AWS SageMaker for managed ML infrastructure
- Google Vertex AI for GCP deployments
- Azure ML for Azure cloud
- Kubernetes + KServe for cloud-agnostic serving

## Progressive Disclosure

Start with the basics and gradually add complexity:

1. **Level 1**: Simple linear pipeline (data → train → deploy)
2. **Level 2**: Add validation and monitoring stages
3. **Level 3**: Implement hyperparameter tuning
4. **Level 4**: Add A/B testing and gradual rollouts
5. **Level 5**: Multi-model pipelines with ensemble strategies

## Common Patterns

### Batch Training Pipeline

```yaml
stages:
  - name: data_preparation
    dependencies: []
  - name: model_training
    dependencies: [data_preparation]
  - name: model_evaluation
    dependencies: [model_training]
  - name: model_deployment
    dependencies: [model_evaluation]
```

### Real-time Feature Pipeline

```python
# Stream processing computes features online (e.g., into a feature store),
# while batch training reuses the same transformation logic offline.
# An illustrative sketch of this pattern follows below.
```
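
A purely illustrative sketch of this pattern follows, using an in-memory deque as a stand-in for a real stream (Kafka, Kinesis, Pub/Sub) and for an online feature store; the rolling click-count feature is an assumption chosen for brevity.

```python
# Illustrative only: the same feature logic serves the online (streaming) path
# and the offline (batch training) path, avoiding train/serve skew.
from collections import defaultdict, deque
from typing import Iterable, Tuple


class RollingClickCounter:
    """Per-user count of the most recent N click events."""

    def __init__(self, window_size: int = 100):
        self.events = defaultdict(lambda: deque(maxlen=window_size))

    def update(self, user_id: str, clicked: int) -> int:
        # Online path: called per streaming event, returns the fresh feature value.
        self.events[user_id].append(clicked)
        return sum(self.events[user_id])

    def batch_features(self, history: Iterable[Tuple[str, int]]) -> dict:
        # Offline path: replay historical events with the same logic for training data.
        latest = {}
        for user_id, clicked in history:
            latest[user_id] = self.update(user_id, clicked)
        return latest


online = RollingClickCounter(window_size=3)
print(online.update("user-1", 1))  # streaming event -> feature value 1

offline = RollingClickCounter(window_size=3)
print(offline.batch_features([("user-1", 1), ("user-1", 0)]))  # {'user-1': 1}
```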

### Continuous Training

```python
# Automated retraining on a schedule, gated by data drift detection.
# A hedged Airflow sketch of this pattern follows below.
```
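
Below is a hedged sketch of this pattern using Airflow's `ShortCircuitOperator`, again assuming Airflow 2.x (2.4+ for the `schedule` parameter); the drift metric, threshold, and task bodies are illustrative placeholders.

```python
# Weekly retraining gated by a drift check; downstream tasks are skipped when
# the ShortCircuitOperator's callable returns False.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator, ShortCircuitOperator


def data_has_drifted() -> bool:
    drift_score = 0.25  # placeholder: read the latest drift metric from monitoring
    return drift_score > 0.2


with DAG(
    dag_id="continuous_training",
    start_date=datetime(2024, 1, 1),
    schedule="@weekly",
    catchup=False,
) as dag:
    check_drift = ShortCircuitOperator(task_id="check_drift", python_callable=data_has_drifted)
    retrain = PythonOperator(task_id="retrain_model", python_callable=lambda: print("retraining"))

    check_drift >> retrain  # retrain runs only when drift is detected
```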

## Troubleshooting

### Common Issues

- **Pipeline failures**: Check dependencies and data availability
- **Training instability**: Review hyperparameters and data quality
- **Deployment issues**: Validate model artifacts and serving config
- **Performance degradation**: Monitor data drift and model metrics

### Debugging Steps

1. Check pipeline logs for each stage
2. Validate input/output data at boundaries
3. Test components in isolation
4. Review experiment tracking metrics
5. Inspect model artifacts and metadata
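
For step 3, stages can be exercised outside the orchestrator with plain unit tests. The sketch below assumes a hypothetical feature-engineering function `build_features` and shows how it could be tested in isolation with pytest.

```python
# Isolated unit test for a (hypothetical) feature-engineering stage.
import pandas as pd


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    # Stage under test: derive tenure in days relative to a fixed reference date.
    out = df.copy()
    out["tenure_days"] = (pd.Timestamp("2024-06-01") - out["signup_date"]).dt.days
    return out


def test_build_features_adds_tenure_column():
    df = pd.DataFrame({"signup_date": pd.to_datetime(["2024-01-01", "2024-05-31"])})
    result = build_features(df)
    assert "tenure_days" in result.columns
    assert result["tenure_days"].tolist() == [152, 1]
```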

## Next Steps

After setting up your pipeline:

1. Explore **hyperparameter-tuning** skill for optimization
2. Learn **experiment-tracking-setup** for MLflow/W&B
3. Review **model-deployment-patterns** for serving strategies
4. Implement monitoring with observability tools

## Related Skills

- **ml-pipeline-orchestrator**: SpecWeave-integrated ML development (use for increment-based ML)
- **experiment-tracker**: MLflow and Weights & Biases experiment tracking
- **automl-optimizer**: Automated hyperparameter optimization with Optuna/Hyperopt
- **ml-deployment-helper**: Model serving and deployment patterns

Overview

This skill designs DAG-based MLOps pipeline architectures that work across Airflow, Dagster, Kubeflow, and Prefect. It focuses on workflow architecture, reusable pipeline patterns, CI/CD for ML, and production-ready orchestration without SpecWeave-specific flows. Use it to plan, validate, and operationalize pipeline stages from ingestion to deployment.

How this skill works

The skill inspects pipeline goals, stage dependencies, data flow, and failure modes to produce platform-agnostic DAG designs and implementation guidance. It maps architecture decisions to concrete orchestration features (scheduling, retries, sensors), integration points (experiment tracking, model registry), and deployment strategies (canary, blue-green). It outputs stage definitions, dependency graphs, and recommended observability and versioning practices.

When to use it

  • Designing DAG-based workflow orchestration (Airflow, Dagster, Kubeflow, Prefect)
  • Creating platform-agnostic ML pipeline blueprints for teams
  • Setting up CI/CD automation for training, validation, and deployment
  • Defining reusable pipeline templates and modular components
  • Integrating pipelines with cloud ML services (SageMaker, Vertex AI, Azure ML)

Best practices

  • Design modular, independently testable stages with clear inputs/outputs
  • Ensure idempotency so stages can be safely re-run
  • Instrument observability: logs, metrics, lineage at every stage
  • Version data, code, and models consistently (DVC, model registry)
  • Implement retry logic, alerts, and automated rollback policies

Example use cases

  • Create an Airflow DAG that orchestrates ingestion, validation, feature engineering, train, evaluate, and deploy stages
  • Design a Dagster asset graph for shared feature pipelines and model assets
  • Implement a CI/CD flow to validate training jobs and auto-register models to MLflow
  • Build a Kubeflow pipeline for distributed training and hyperparameter sweeps
  • Define a Prefect flow for scheduled retraining triggered by data drift detection

FAQ

When should I choose a specific orchestrator?

Choose based on team needs: Airflow for mature scheduling and ecosystem, Dagster for asset-centric workflows, Kubeflow for Kubernetes-native ML, Prefect for modern Python-first flows and simpler orchestration.

How do I handle experiment tracking and model registry?

Integrate a tracking tool (MLflow, W&B) and a model registry. Record experiment metadata during training tasks and register approved models as part of the DAG to enable reproducible deployments.