
mlops-engineer-skill skill

/mlops-engineer-skill

This skill helps you design and operate end-to-end ML pipelines, version models, deploy model serving, and monitor production ML systems.

npx playbooks add skill 404kidwiz/claude-supercode-skills --skill mlops-engineer-skill

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
3.3 KB
---
name: mlops-engineer
description: Expert in Machine Learning Operations bridging data science and DevOps. Use when building ML pipelines, model versioning, feature stores, or production ML serving. Triggers include "MLOps", "ML pipeline", "model deployment", "feature store", "model versioning", "ML monitoring", "Kubeflow", "MLflow".
---

# MLOps Engineer

## Purpose
Provides expertise in Machine Learning Operations, bridging data science and DevOps practices. Specializes in the end-to-end ML lifecycle, from training pipelines to production serving, model versioning, and monitoring.

## When to Use
- Building ML training and serving pipelines
- Implementing model versioning and registry
- Setting up feature stores
- Deploying models to production
- Monitoring model performance and drift
- Automating ML workflows (CI/CD for ML)
- Implementing A/B testing for models
- Managing experiment tracking

## Quick Start
**Invoke this skill when:**
- Building ML pipelines and workflows
- Deploying models to production
- Setting up model versioning and registry
- Implementing feature stores
- Monitoring production ML systems

**Do NOT invoke when:**
- Model development and training → use `/ml-engineer`
- Data pipeline ETL → use `/data-engineer`
- Kubernetes infrastructure → use `/kubernetes-specialist`
- General CI/CD without ML → use `/devops-engineer`

## Decision Framework
```
ML Lifecycle Stage?
├── Experimentation
│   └── MLflow/Weights & Biases for tracking
├── Training Pipeline
│   └── Kubeflow/Airflow/Vertex AI
├── Model Registry
│   └── MLflow Registry/Vertex Model Registry
├── Serving
│   ├── Batch → Spark/Dataflow
│   └── Real-time → TF Serving/Seldon/KServe
└── Monitoring
    └── Evidently/Fiddler/custom metrics
```
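
For the experimentation branch, a minimal tracking sketch with MLflow (the tracking URI, experiment name, parameters, and metric values are illustrative placeholders):

```python
# Minimal MLflow experiment-tracking sketch; all values are placeholders.
import mlflow

# Assumes an MLflow tracking server is reachable; adjust the URI for your setup.
mlflow.set_tracking_uri("http://mlflow.example.internal:5000")
mlflow.set_experiment("churn-model")

with mlflow.start_run(run_name="baseline-xgb"):
    # Log hyperparameters and evaluation metrics for later comparison.
    mlflow.log_param("max_depth", 6)
    mlflow.log_param("learning_rate", 0.1)
    mlflow.log_metric("val_auc", 0.87)
    # Artifacts (plots, model files) can be logged alongside metrics.
    mlflow.log_artifact("reports/feature_importance.png")  # placeholder path
```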

## Core Workflows

### 1. ML Pipeline Setup
1. Define pipeline stages (data prep, training, eval)
2. Choose orchestrator (Kubeflow, Airflow, Vertex)
3. Containerize each pipeline step
4. Implement artifact storage
5. Add experiment tracking
6. Configure automated retraining triggers
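
A minimal sketch of the first few steps with the Kubeflow Pipelines v2 SDK (component bodies, images, and storage paths are placeholders; Airflow or Vertex AI Pipelines fill the same role):

```python
# Kubeflow Pipelines v2 sketch; step bodies and paths are placeholders.
from kfp import dsl, compiler


@dsl.component(base_image="python:3.11")
def prepare_data(output_path: str):
    # Placeholder: pull raw data, clean it, write features to output_path.
    pass


@dsl.component(base_image="python:3.11")
def train_model(data_path: str, model_path: str):
    # Placeholder: train on the prepared data and serialize the model.
    pass


@dsl.component(base_image="python:3.11")
def evaluate_model(model_path: str):
    # Placeholder: compute evaluation metrics and log them to the tracker.
    pass


@dsl.pipeline(name="training-pipeline")
def training_pipeline():
    prep = prepare_data(output_path="gs://bucket/features")
    train = train_model(data_path="gs://bucket/features",
                        model_path="gs://bucket/models/candidate")
    train.after(prep)
    evaluate_model(model_path="gs://bucket/models/candidate").after(train)


if __name__ == "__main__":
    # Compile to a pipeline spec that Kubeflow (or Vertex AI Pipelines) can run.
    compiler.Compiler().compile(training_pipeline, "training_pipeline.yaml")
```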

### 2. Model Deployment
1. Register model in model registry
2. Build serving container
3. Deploy to serving infrastructure
4. Configure autoscaling
5. Implement canary/shadow deployment
6. Set up monitoring and alerts
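
A sketch of step 1 with the MLflow Model Registry (assumes MLflow 2.x; the run ID, model name, and aliases are placeholders). Serving then resolves an alias such as `models:/churn-model@champion`, so canary promotion becomes a registry operation rather than a redeploy:

```python
# MLflow Model Registry sketch; names and run ID are placeholders.
from mlflow import MlflowClient, register_model

RUN_ID = "abc123"  # run that produced the candidate model

# Register the model logged by a training run as a new registry version.
mv = register_model(f"runs:/{RUN_ID}/model", name="churn-model")

client = MlflowClient()
# Point the "challenger" alias at the new version: the canary slice serves
# models:/churn-model@challenger while the stable path stays on
# models:/churn-model@champion until the canary passes.
client.set_registered_model_alias("churn-model", "challenger", mv.version)
```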

### 3. Model Monitoring
1. Define key metrics (latency, throughput, accuracy)
2. Implement data drift detection
3. Set up prediction monitoring
4. Create alerting thresholds
5. Build dashboards for visibility
6. Automate retraining triggers
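
A sketch of step 2 with Evidently's `DataDriftPreset` (assumes the 0.4-style Report API; file paths and the result key path may differ between versions). The resulting flag can feed the alerts and retraining triggers in steps 4-6:

```python
# Data-drift check with Evidently; dataframes and paths are placeholders.
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# reference: training-time feature sample; current: recent serving traffic.
reference = pd.read_parquet("features/training_sample.parquet")
current = pd.read_parquet("features/last_24h.parquet")

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)

result = report.as_dict()
# Key path follows the 0.4-style report layout; inspect as_dict() for your version.
drift_detected = result["metrics"][0]["result"]["dataset_drift"]
if drift_detected:
    # Hook point: page the on-call channel and/or enqueue a retraining run.
    print("Data drift detected; triggering retraining pipeline")
```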

## Best Practices
- Version everything: code, data, models, configs
- Use feature stores for consistency between training and serving
- Implement CI/CD specifically designed for ML workflows
- Monitor data drift and model performance continuously
- Use canary deployments for model rollouts
- Keep training and serving environments consistent

## Anti-Patterns
| Anti-Pattern | Problem | Correct Approach |
|--------------|---------|------------------|
| Manual deployments | Error-prone, slow | Automated ML CI/CD |
| Training-serving skew | Prediction errors | Feature stores |
| No model versioning | Can't reproduce or rollback | Model registry |
| Ignoring data drift | Silent degradation | Continuous monitoring |
| Notebook-to-production | Unmaintainable | Proper pipeline code |

Overview

This skill is an MLOps engineer that bridges data science and DevOps to deliver reliable production ML systems. It focuses on end-to-end ML lifecycle tasks including pipeline orchestration, model versioning, feature stores, deployment, and monitoring. Use it to design, implement, or review production-ready ML workflows and operational controls.

How this skill works

The skill inspects project goals and current infrastructure, then recommends concrete components and patterns: orchestrators (Kubeflow, Airflow), registries (MLflow, Vertex), serving options (KServe, TF Serving), and monitoring tooling (Evidently, custom metrics). It prescribes steps for pipeline construction, containerization, CI/CD for models, canary deployments, and automated retraining triggers. Outputs include implementation checklists, architecture diagrams, and prioritized tasks to reduce training-serving skew and enable reproducible rollouts.

When to use it

  • Building or refactoring ML training and serving pipelines
  • Implementing model versioning and a model registry
  • Designing or integrating a feature store for consistency
  • Deploying models to production with safety controls
  • Setting up monitoring, drift detection, and automated retraining
  • Automating ML workflows and model CI/CD

Best practices

  • Version everything: code, data, models, and configs for reproducibility
  • Use a feature store to eliminate training-serving skew and ensure feature consistency
  • Adopt ML-specific CI/CD that tests data, model performance, and infra changes
  • Instrument model serving with latency, throughput, accuracy, and drift metrics
  • Roll out changes via canary or shadow deployments to limit blast radius
  • Automate alerts and retraining triggers when drift or performance regressions occur
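
For the last point, a sketch of a drift-gated retraining trigger as an Airflow DAG (assumes Airflow 2.4+ with the TaskFlow API; the drift check and retraining submission are placeholders for your own monitoring and pipeline services):

```python
# Airflow sketch: daily drift check that conditionally triggers retraining.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def drift_guard():

    @task()
    def check_drift() -> bool:
        # Placeholder: query your monitoring store (e.g. Evidently output,
        # Prometheus) and return True when drift exceeds the agreed threshold.
        return False

    @task()
    def maybe_retrain(drift_detected: bool):
        if drift_detected:
            # Placeholder: submit the training pipeline (Kubeflow run,
            # Vertex pipeline job, etc.) and notify the owning team.
            print("Drift detected: retraining pipeline submitted")
        else:
            print("No significant drift; skipping retraining")

    maybe_retrain(check_drift())


drift_guard()
```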

Example use cases

  • Design a Kubeflow or Airflow pipeline that covers data prep, training, evaluation, and deployment
  • Set up an MLflow registry and CI pipeline that builds, tests, and versions model artifacts
  • Integrate a feature store so the same features are used in training and serving
  • Deploy a real-time model with KServe and configure autoscaling and canary rollout
  • Create monitoring dashboards and drift detection that trigger automated retraining workflows
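
For the KServe use case above, a minimal sketch that creates an InferenceService through the Kubernetes Python client (assumes KServe is installed in the cluster; names, namespace, and storage URI are placeholders); the closing comment notes how the canary split is applied on a later update:

```python
# KServe sketch: real-time serving applied via the Kubernetes Python client.
# Names, namespace, and storage URI are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "churn-model", "namespace": "ml-serving"},
    "spec": {
        "predictor": {
            "minReplicas": 2,   # autoscaling bounds for the predictor
            "maxReplicas": 10,
            "model": {
                "modelFormat": {"name": "sklearn"},
                "storageUri": "gs://bucket/models/churn/v42",
            },
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="ml-serving",
    plural="inferenceservices",
    body=inference_service,
)

# Canary rollout: update the existing InferenceService with the new
# storageUri plus spec.predictor.canaryTrafficPercent (e.g. 10) so only a
# slice of traffic hits the new revision before full promotion.
```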

FAQ

When should I use a feature store?

Use a feature store when you need consistent, low-latency feature access across training and serving to prevent training-serving skew and simplify feature engineering reuse.
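
A minimal sketch of that consistency with Feast (assumes a configured feature repository; the feature view, feature names, and entity keys are placeholders). The same feature references back both the offline training join and the online serving lookup:

```python
# Feast sketch: one set of feature definitions for training and serving.
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # assumes feature_store.yaml in this repo

FEATURES = [
    "customer_stats:avg_order_value",
    "customer_stats:orders_last_30d",
]

# Training: point-in-time-correct historical features joined to the entities.
entity_df = pd.DataFrame({
    "customer_id": [1001, 1002],
    "event_timestamp": pd.to_datetime(["2024-05-01", "2024-05-01"]),
})
training_df = store.get_historical_features(
    entity_df=entity_df, features=FEATURES
).to_df()

# Serving: the same features fetched from the low-latency online store.
online_features = store.get_online_features(
    features=FEATURES, entity_rows=[{"customer_id": 1001}]
).to_dict()
```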

How do I choose between batch and real-time serving?

Choose batch for large-volume, non-latency-sensitive tasks (e.g., nightly scoring) and real-time for low-latency predictions; hybrid approaches are common depending on use case and cost constraints.