
ml-engineer skill

/plugins/specweave-ml/skills/ml-engineer

This skill helps build and optimize ML pipelines by enforcing best practices, experiment tracking, cross-validation, and explainability throughout all pipeline stages.

This is most likely a fork of the sw-ml-engineer skill from openclaw.
npx playbooks add skill anton-abyzov/specweave --skill ml-engineer

Review the files below or copy the command above to add this skill to your agents.

Files (1): SKILL.md (431 B)
---
name: ml-engineer
description: ML system builder enforcing best practices - baseline comparison, cross-validation, experiment tracking, explainability (SHAP/LIME). Use for ML pipelines, model training, production ML.
model: opus
context: fork
---

# ML Engineer Agent

## ⚠️ Chunking Rule

Large ML pipelines = 1000+ lines. Generate ONE stage per response: Data/EDA → Features → Training → Evaluation → Deployment.

Overview

This skill builds ML systems that enforce engineering best practices for robust, production-ready models. It guides pipeline construction, baseline comparison, cross-validation, experiment tracking, and model explainability using SHAP/LIME. The implementation targets TypeScript environments and integrates with CI/CD and developer tooling.

How this skill works

The agent inspects project artifacts and generates one pipeline stage per response: Data/EDA → Features → Training → Evaluation → Deployment, keeping large pipelines manageable. For each stage it emits concrete specs, tests, and TypeScript scaffold code, plus configuration for experiment tracking and model explainability hooks. It validates choices against baseline models and cross-validation protocols, and adds CI/CD and documentation snippets suited to production ML.
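A minimal sketch of what a generated stage contract might look like in TypeScript. The interface and names here (StageResult, PipelineStage, the metric keys) are illustrative assumptions, not the skill's actual output:

```typescript
// Illustrative pipeline stage contract: each stage returns artifacts and
// metrics that can be handed to experiment tracking.
interface StageResult {
  artifacts: Record<string, unknown>; // e.g. cleaned data, trained model
  metrics: Record<string, number>;    // logged per run
}

interface PipelineStage {
  name: string;
  run(input: Record<string, unknown>): StageResult;
}

// Example: a tiny "Evaluation" stage that compares a model score to a baseline.
const evaluationStage: PipelineStage = {
  name: "evaluation",
  run(input) {
    const modelScore = input["modelScore"] as number;
    const baselineScore = input["baselineScore"] as number;
    return {
      artifacts: { beatsBaseline: modelScore > baselineScore },
      metrics: { modelScore, baselineScore, lift: modelScore - baselineScore },
    };
  },
};

const result = evaluationStage.run({ modelScore: 0.87, baselineScore: 0.75 });
console.log(result.metrics.lift.toFixed(2)); // prints "0.12"
```

Because each stage is a self-contained unit with an explicit input/output shape, the one-stage-per-response rule maps naturally onto one such object per response.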

When to use it

  • Starting a new ML project that must meet production standards and traceability
  • Converting a research notebook into a reliable TypeScript-based pipeline
  • Implementing experiment tracking, reproducible training, or automatic baseline checks
  • Adding explainability (SHAP/LIME) and validation to an existing model workflow
  • Preparing ML components for deployment with CI/CD and automated testing

Best practices

  • Follow the one-stage-per-response chunking rule for pipelines >1000 lines to keep reviews focused
  • Always include a simple baseline model and automated baseline comparison tests
  • Use k-fold cross-validation for performance estimates and log folds in experiment tracking
  • Integrate SHAP or LIME outputs into evaluation reports and store artifacts with experiments
  • Make every model step reproducible: fixed seeds, environment capture, and dataset versioning
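To illustrate the reproducibility and cross-validation practices above, here is a hedged sketch of deterministic k-fold splits with a fixed seed, so folds can be logged and replayed across runs. The PRNG (mulberry32) and helper names are our assumptions, not the skill's prescribed implementation:

```typescript
// Small deterministic PRNG so the same seed always yields the same folds.
function mulberry32(seed: number): () => number {
  return () => {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Split n sample indices into k folds after a seeded Fisher-Yates shuffle.
function kFoldIndices(n: number, k: number, seed: number): number[][] {
  const rand = mulberry32(seed);
  const idx = Array.from({ length: n }, (_, i) => i);
  for (let i = n - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  const folds: number[][] = Array.from({ length: k }, () => []);
  idx.forEach((v, i) => folds[i % k].push(v));
  return folds;
}

const folds = kFoldIndices(10, 3, 42);
console.log(folds.map(f => f.length)); // fold sizes: [4, 3, 3]
```

Logging the seed and the resulting fold assignments alongside each experiment run makes every reported cross-validation score reproducible.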

Example use cases

  • Generate a TypeScript training scaffold that performs feature engineering, trains models, and logs runs to an experiment server
  • Add an evaluation stage that runs cross-validation, compares to a baseline, and produces SHAP explanations for the top features
  • Produce deployment specs and CI pipelines to validate model contracts and serve a prediction API
  • Convert an end-to-end Jupyter notebook into modular pipeline stages with tests and docs
  • Create monitoring hooks that capture drift metrics and replay data for retraining
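As a sketch of the drift-monitoring use case, one common drift metric is the Population Stability Index (PSI) between a reference distribution and live data. The function name, binning scheme, and smoothing constant below are illustrative assumptions:

```typescript
// Population Stability Index between a reference sample and live data,
// using fixed-width bins derived from the reference distribution.
function psi(expected: number[], actual: number[], bins = 10): number {
  const min = Math.min(...expected);
  const max = Math.max(...expected);
  const width = (max - min) / bins || 1;
  const hist = (xs: number[]) => {
    const counts = new Array(bins).fill(0);
    for (const x of xs) {
      const b = Math.min(bins - 1, Math.max(0, Math.floor((x - min) / width)));
      counts[b]++;
    }
    // Smooth zero proportions to avoid log(0).
    return counts.map(c => Math.max(c / xs.length, 1e-6));
  };
  const e = hist(expected);
  const a = hist(actual);
  return e.reduce((s, ei, i) => s + (a[i] - ei) * Math.log(a[i] / ei), 0);
}

// Identical distributions yield PSI of 0; a shifted distribution scores higher.
const ref = Array.from({ length: 1000 }, (_, i) => (i % 100) / 100);
console.log(psi(ref, ref) < 1e-9); // prints "true"
```

A monitoring hook could compute PSI per feature on each batch of production inputs and trigger retraining when it exceeds a chosen threshold (0.2 is a commonly cited rule of thumb).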

FAQ

How does the one-stage-per-response rule affect workflow?

It forces small, reviewable increments: request a specific stage and receive focused specs, code, and tests for that stage only.

Which explainability tools are supported?

The workflow includes SHAP and LIME integrations for feature-level explanations and exports explainability artifacts alongside experiment logs.