
mlops-engineer skill

/plugins/specweave-ml/skills/mlops-engineer

This skill helps automate ML pipelines, experiment tracking, and deployment workflows, enabling scalable MLOps across cloud platforms with reliable monitoring.

This is most likely a fork of the sw-mlops-engineer skill from openclaw.
npx playbooks add skill anton-abyzov/specweave --skill mlops-engineer

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
953 B
---
name: mlops-engineer
description: MLOps expert - ML pipelines, experiment tracking, model registries with MLflow/Kubeflow. Use for automated training, deployment, and monitoring.
model: opus
context: fork
---

## ⚠️ Chunking for Large MLOps Platforms

When generating comprehensive MLOps platforms that exceed 1000 lines (e.g., complete ML infrastructure with MLflow, Kubeflow, automated training pipelines, model registry, and deployment automation), generate output **incrementally** to prevent crashes. Break large MLOps implementations into logical components (e.g., Experiment Tracking Setup → Model Registry → Training Pipelines → Deployment Automation → Monitoring) and ask the user which component to implement next. This ensures reliable delivery of MLOps infrastructure without overwhelming the system.

You are an MLOps engineer specializing in ML infrastructure, automation, and production ML systems across cloud platforms.

Overview

This skill is an MLOps engineer focused on building production-ready ML infrastructure: experiment tracking, model registries, automated training pipelines, deployment automation, and monitoring. It guides teams through designing and implementing end-to-end ML workflows using MLflow, Kubeflow, and cloud CI/CD patterns. Use it to create repeatable, testable systems that move models from research to production reliably.

How this skill works

I inspect your current ML stack, data sources, and deployment targets, then produce modular components: experiment tracking setup, model registry, training pipelines, deployment automation, and monitoring. For large platforms, I generate the system incrementally, breaking the work into logical components and asking which piece to implement next, so that delivery stays reliable and each piece can be validated iteratively. Output includes TypeScript-friendly automation, CI/CD hooks (Azure DevOps), and CLI-friendly deployment artifacts.

When to use it

  • You need a reproducible experiment tracking and model registry with MLflow or Kubeflow.
  • You want automated training pipelines and retraining schedules tied to data/metric drift.
  • You plan to deploy models to Kubernetes or serverless endpoints with CI/CD (Azure DevOps).
  • You require production monitoring, alerting, and model lineage for compliance.
  • You need to convert research notebooks into tested, deployable pipelines and services.
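The drift-triggered retraining mentioned above can be sketched as a simple statistical check. The sketch below uses the population stability index (PSI) in TypeScript; the 0.2 threshold, bin count, and helper names are illustrative assumptions, not part of this skill's output.

```typescript
// Population Stability Index (PSI): a common score for detecting
// distribution shift between a reference (training) sample and
// recent production data.
function histogramEdges(values: number[], bins: number): number[] {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const step = (max - min) / bins;
  return Array.from({ length: bins + 1 }, (_, i) => min + i * step);
}

function bucketShares(values: number[], edges: number[]): number[] {
  const counts = new Array(edges.length - 1).fill(0);
  for (const v of values) {
    // First edge (index > 0) that v falls at or below marks the bucket.
    let i = edges.findIndex((e, idx) => idx > 0 && v <= e) - 1;
    if (i < 0) i = counts.length - 1; // v above the last edge
    counts[i] += 1;
  }
  return counts.map((c) => c / values.length);
}

function psi(expected: number[], actual: number[], bins = 10): number {
  const edges = histogramEdges(expected, bins);
  const e = bucketShares(expected, edges);
  const a = bucketShares(actual, edges);
  let score = 0;
  for (let i = 0; i < bins; i++) {
    const ei = Math.max(e[i], 1e-6); // avoid log(0) on empty buckets
    const ai = Math.max(a[i], 1e-6);
    score += (ai - ei) * Math.log(ai / ei);
  }
  return score;
}

// A PSI above ~0.2 is often treated as drift significant enough to retrain on.
function shouldRetrain(reference: number[], recent: number[], threshold = 0.2): boolean {
  return psi(reference, recent) > threshold;
}
```

In a pipeline, a scheduled job would feed recent feature samples into `shouldRetrain` and, on a positive result, kick off the training pipeline and push the new candidate to the registry.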

Best practices

  • Design the platform in modular components (tracking → registry → pipelines → deployment → monitoring) and implement incrementally.
  • Use infrastructure-as-code and declarative CI pipelines (Azure DevOps/YAML) to keep environments reproducible.
  • Version everything: data, code, models, and pipeline definitions; store artifacts in a model registry.
  • Add automated tests and experiment reproducibility checks as part of pre-deploy CI gates.
  • Instrument models with monitoring and automated retraining triggers based on data or performance drift.
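As a sketch of the declarative CI gates described in the practices above, a minimal Azure Pipelines definition might look like the following. All stage names, script paths, and flags are hypothetical placeholders, not outputs this skill guarantees.

```yaml
# Illustrative Azure Pipelines stages; scripts and paths are placeholders.
trigger:
  branches:
    include: [main]

stages:
  - stage: QualityGates
    jobs:
      - job: Test
        steps:
          - script: pytest tests/ --maxfail=1
            displayName: Unit and pipeline tests
          - script: python scripts/check_reproducibility.py --tolerance 0.01
            displayName: Reproducibility check (re-run training, compare metrics)
  - stage: Deploy
    dependsOn: QualityGates
    jobs:
      - job: PromoteAndDeploy
        steps:
          - script: python scripts/promote_model.py --stage Staging
            displayName: Promote candidate in the model registry
```

Keeping the gates in versioned YAML means the same checks run for every environment, which is what makes the pre-deploy gate reproducible.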

Example use cases

  • Set up MLflow experiment tracking and a model registry, with TypeScript-based automation for artifact promotion.
  • Create Kubeflow pipelines for scheduled training and validation, plus CI/CD integration in Azure DevOps.
  • Automate model deployment to Kubernetes with rollout strategies and observability dashboards.
  • Implement automated retraining pipelines that trigger on data drift and push new candidates to the registry.
  • Translate research experiments into production-ready services with tests, docs, and deployment scripts.
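The artifact-promotion use case above can be sketched against MLflow's REST model-registry API (the model-version stage-transition endpoint). This is a minimal TypeScript sketch, assuming a reachable tracking server; the model name, promotion policy, and `archive_existing_versions` rule are illustrative assumptions.

```typescript
// Promote a registered model version via the MLflow model-registry REST API.
interface TransitionRequest {
  name: string;
  version: string;
  stage: "Staging" | "Production" | "Archived";
  archive_existing_versions: boolean;
}

// Pure helper: builds the endpoint path and JSON body for a stage transition.
function buildTransitionRequest(
  name: string,
  version: string,
  stage: TransitionRequest["stage"],
): { path: string; body: TransitionRequest } {
  return {
    path: "/api/2.0/mlflow/model-versions/transition-stage",
    body: {
      name,
      version,
      stage,
      // Illustrative policy: only a Production promotion archives older versions.
      archive_existing_versions: stage === "Production",
    },
  };
}

// Side-effecting caller; trackingUri points at your MLflow tracking server.
async function promoteModel(trackingUri: string, name: string, version: string): Promise<void> {
  const { path, body } = buildTransitionRequest(name, version, "Production");
  const res = await fetch(trackingUri + path, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Promotion failed: HTTP ${res.status}`);
}
```

Separating the pure request builder from the HTTP call keeps the promotion policy unit-testable without a live registry.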

FAQ

How do you handle very large MLOps platforms without overloading output?

I break the system into logical components and produce each incrementally, then ask which component to implement next so delivery remains reliable and reviewable.

Which platforms and tools are supported?

Primary patterns target MLflow and Kubeflow, with Azure DevOps CI/CD and Kubernetes deployments; artifacts and scripts are delivered as TypeScript-friendly automation and CLI-ready assets.