
ml-engineer skill

/skills/agents/data/ml-engineer

This skill helps you deploy production ML pipelines and model serving with monitoring and A/B testing to ensure reliable, low-latency predictions.

npx playbooks add skill sidetoolco/org-charts --skill ml-engineer

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
1.1 KB
---
name: ml-engineer
description: Implement ML pipelines, model serving, and feature engineering. Handles TensorFlow/PyTorch deployment, A/B testing, and monitoring. Use PROACTIVELY for ML model integration or production deployment.
license: Apache-2.0
metadata:
  author: edescobar
  version: "1.0"
  model-preference: sonnet
---

# ML Engineer

You are an ML engineer specializing in production machine learning systems.

## Focus Areas
- Model serving (TorchServe, TF Serving, ONNX)
- Feature engineering pipelines
- Model versioning and A/B testing
- Batch and real-time inference
- Model monitoring and drift detection
- MLOps best practices

## Approach
1. Start with simple baseline model
2. Version everything - data, features, models
3. Monitor prediction quality in production
4. Implement gradual rollouts
5. Plan for model retraining

## Output
- Model serving API with proper scaling
- Feature pipeline with validation
- A/B testing framework
- Model monitoring metrics and alerts
- Inference optimization techniques
- Deployment rollback procedures

Focus on production reliability over model complexity. Include latency requirements.

Overview

This skill implements production-ready ML pipelines, model serving, and robust feature engineering. It focuses on reliable deployment of TensorFlow, PyTorch, and ONNX models with A/B testing, monitoring, and drift detection. The emphasis is on production reliability, latency targets, and observability rather than on complex research prototypes.

How this skill works

This skill builds end-to-end pipelines that start with a simple baseline model and iterate toward production requirements. It versions data, features, and model artifacts, creates scalable serving endpoints (TorchServe, TF Serving, ONNX), and adds A/B testing with gradual rollouts. It also sets up monitoring, drift detection, and alerting, with retraining hooks for automated lifecycle management.
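
As a concrete illustration of the serving side, here is a minimal sketch of an ONNX Runtime model exposed through a FastAPI endpoint. The model path, the "input" tensor name, and the request schema are placeholder assumptions for this example, not anything prescribed by the skill.

```python
# Minimal serving sketch: an ONNX Runtime session behind a FastAPI endpoint.
# "model.onnx" and the "input" tensor name are placeholders for your artifact.
from typing import List

import numpy as np
import onnxruntime as ort
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

class PredictRequest(BaseModel):
    features: List[List[float]]  # a batch of feature vectors

@app.post("/predict")
def predict(req: PredictRequest):
    batch = np.asarray(req.features, dtype=np.float32)
    # Passing None returns all model outputs; we use the first one here.
    outputs = session.run(None, {"input": batch})
    return {"predictions": outputs[0].tolist()}
```

In production this would sit behind an autoscaler and a warmup step, per the latency practices below.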

When to use it

  • Deploying TensorFlow/PyTorch models to production with strict latency or throughput SLAs
  • Setting up feature pipelines with validation and version control
  • Running A/B tests or canary rollouts for new model versions
  • Implementing batch or real-time inference at scale
  • Adding monitoring, drift detection, and automated retraining triggers
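
For the last item, a rough sketch of the drift-to-retraining path is shown below. It assumes a two-sample Kolmogorov-Smirnov test is an acceptable drift signal for a numeric feature; the 0.2 threshold and the trigger_retraining() hook are illustrative placeholders.

```python
# Sketch: detect feature drift with a two-sample KS test and trigger retraining.
# Threshold and retraining hook are placeholder assumptions, not fixed values.
import numpy as np
from scipy.stats import ks_2samp

def check_drift(reference: np.ndarray, live: np.ndarray, threshold: float = 0.2) -> bool:
    """Return True if the live feature distribution drifted from the reference."""
    result = ks_2samp(reference, live)
    return result.statistic > threshold

def trigger_retraining() -> None:
    # Placeholder: kick off your retraining pipeline (workflow scheduler, CI job, etc.).
    print("drift detected - queueing retraining job")

reference = np.random.normal(0.0, 1.0, size=10_000)  # training-time feature sample
live = np.random.normal(0.5, 1.0, size=10_000)       # recent production sample
if check_drift(reference, live):
    trigger_retraining()
```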

Best practices

  • Start with a simple baseline and incrementally add complexity once production constraints are clear
  • Version everything: raw data, feature transforms, model artifacts, and serving configs
  • Define latency and throughput SLOs early and optimize inference paths (quantization, batching, ONNX)
  • Implement gradual rollouts and automated rollback procedures for safe deployments
  • Instrument prediction pipelines with quality metrics, logging, and alerts for drift
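
To make the instrumentation point concrete, the sketch below exposes a prediction counter and a latency histogram with prometheus_client; the metric names, label values, and port are illustrative assumptions, and alert rules would live on the Prometheus/Alertmanager side.

```python
# Sketch: basic prediction-path instrumentation with prometheus_client.
import time
from prometheus_client import Counter, Histogram, start_http_server

PREDICTIONS = Counter("predictions_total", "Predictions served", ["model_version"])
LATENCY = Histogram("prediction_latency_seconds", "Prediction latency")

def predict_with_metrics(model, features, model_version="v1"):
    start = time.perf_counter()
    prediction = model(features)
    LATENCY.observe(time.perf_counter() - start)           # record latency
    PREDICTIONS.labels(model_version=model_version).inc()  # count by version
    return prediction

# Expose /metrics for a Prometheus scraper on an example port.
start_http_server(8000)
```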

Example use cases

  • Serve a PyTorch image classifier via TorchServe with autoscaling and warmup to meet 100ms p95 latency
  • Create a streaming feature pipeline that validates inputs, materializes features, and records lineage
  • Run A/B experiments comparing two model versions with a traffic split and offline metrics aggregation (see the traffic-split sketch after this list)
  • Deploy an ONNX-optimized model for cost-efficient CPU inference with batching and latency constraints
  • Set up monitoring that detects label or feature drift and triggers retraining workflows
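
For the A/B experiment case, one common approach (assumed here for illustration, not mandated by the skill) is deterministic hash-based bucketing, so each user consistently sees the same model version for the duration of the experiment.

```python
# Sketch: deterministic traffic splitting for an A/B test.
# The 10% treatment share is an example value.
import hashlib

def assign_variant(user_id: str, treatment_share: float = 0.10) -> str:
    """Map a user to 'treatment' or 'control' deterministically."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 10_000 / 10_000  # uniform-ish value in [0, 1)
    return "treatment" if bucket < treatment_share else "control"

# Example: route the request to a model version based on the assignment.
model_version = {"treatment": "v2", "control": "v1"}[assign_variant("user-123")]
```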

FAQ

How do you ensure low latency in model serving?

Define p95/p99 targets, use model optimizations (quantization, pruning, ONNX), enable batching and warmup, and provision appropriate autoscaling.
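
As one example of those optimizations, dynamic quantization of Linear layers can cut latency and memory for CPU-bound PyTorch models. The tiny model below is only a stand-in for a real network.

```python
# Sketch: PyTorch dynamic quantization for CPU inference (int8 Linear weights).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.inference_mode():
    out = quantized(torch.randn(8, 512))  # a batch of 8 requests
```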

What does versioning mean here?

Version raw datasets, feature transforms, model binaries, and serving configs so any production inference can be traced back to inputs and code.
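
One way to realize this (an illustrative sketch, not the skill's required format) is to attach version identifiers to every prediction record; the field names and the JSON-lines sink are placeholder choices.

```python
# Sketch: log each prediction with the versions that produced it, so any
# inference can be traced back to data, feature code, and model artifact.
import json
import time
import uuid
from dataclasses import dataclass, asdict

@dataclass
class PredictionRecord:
    request_id: str
    model_version: str       # e.g. a registry tag or artifact hash
    feature_version: str     # e.g. a git SHA of the transform code
    dataset_version: str     # e.g. a data-versioning revision
    prediction: float
    timestamp: float

def log_prediction(prediction: float, sink_path: str = "predictions.jsonl") -> None:
    record = PredictionRecord(
        request_id=str(uuid.uuid4()),
        model_version="model:v1.3.0",
        feature_version="features:abc1234",
        dataset_version="data:2024-06-01",
        prediction=prediction,
        timestamp=time.time(),
    )
    with open(sink_path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```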