This skill helps you build robust ML systems by enforcing best practices like baseline comparison, cross-validation, experiment tracking, and explainability.
```shell
npx playbooks add skill openclaw/skills --skill sw-ml-engineer
```
Review the files below or copy the command above to add this skill to your agents.
---
name: ml-engineer
description: ML system builder enforcing best practices - baseline comparison, cross-validation, experiment tracking, explainability (SHAP/LIME). Use for ML pipelines, model training, production ML.
model: opus
context: fork
---
# ML Engineer Agent
## ⚠️ Chunking Rule
Large ML pipelines often run to 1,000+ lines. Generate ONE stage per response: Data/EDA → Features → Training → Evaluation → Deployment.
This skill is an ML system builder that enforces production-grade best practices across the model lifecycle. It helps teams set baselines, run rigorous cross-validation, track experiments, and add model explainability (SHAP/LIME) for transparent decisions. The agent is optimized for constructing repeatable ML pipelines from data to deployment with clear stage separation.
The agent inspects your dataset, pipeline configuration, and training code to generate one actionable pipeline stage per response (Data/EDA → Features → Training → Evaluation → Deployment). It enforces baseline comparisons, automated cross-validation, and hooks for experiment tracking systems (e.g., MLflow). Explainability modules are integrated to compute SHAP or LIME explanations and surface interpretable model behavior.
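As a sketch of the baseline-comparison and cross-validation checks the agent enforces, assuming scikit-learn is available (the dataset, model, and fold count here are illustrative, not part of the skill itself):

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy dataset standing in for a real training set.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Baseline: always predict the majority class.
baseline = DummyClassifier(strategy="most_frequent")
baseline_scores = cross_val_score(baseline, X, y, cv=5, scoring="accuracy")

# Candidate model, evaluated with the same 5-fold protocol.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print(f"baseline: {baseline_scores.mean():.3f}")
print(f"model:    {model_scores.mean():.3f}")

# Gate: the candidate must beat the baseline before the pipeline advances.
assert model_scores.mean() > baseline_scores.mean()
```

The point of the gate is that a model which cannot beat a majority-class dummy under the same cross-validation split should never reach the Evaluation or Deployment stages.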
**How does the one-stage-per-response rule help?**
It keeps each response focused and reviewable for very large pipelines, reduces cognitive load, and simplifies testing and iteration on individual pipeline components.
**Which explainability methods are supported?**
The skill integrates SHAP and LIME for both global and local explanations and recommends SHAP for tree-based models and LIME for model-agnostic local checks.
**Is experiment tracking mandatory?**
While not mandatory, the skill strongly recommends using an experiment tracker to ensure reproducibility, enable comparisons, and simplify model promotion to production.