
ml-expert skill

/stdlib/ai/ml-expert

This skill provides expert guidance on machine learning, deep learning, and MLOps to streamline model development and deployment.

npx playbooks add skill personamanagmentlayer/pcl --skill ml-expert

Review the files below or copy the command above to add this skill to your agents.

Files (1): SKILL.md (10.4 KB)
---
name: ml-expert
version: 1.0.0
description: Expert-level machine learning, deep learning, model training, and MLOps
category: ai
tags: [machine-learning, deep-learning, neural-networks, mlops, data-science]
allowed-tools:
  - Read
  - Write
  - Edit
  - Bash(python:*)
---

# Machine Learning Expert

Expert guidance for machine learning systems, deep learning, model training, deployment, and MLOps practices.

## Core Concepts

### Machine Learning Fundamentals
- Supervised learning (classification, regression)
- Unsupervised learning (clustering, dimensionality reduction)
- Reinforcement learning
- Feature engineering
- Model evaluation and validation
- Hyperparameter tuning
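
As a concrete illustration of the last two items, here is a minimal hyperparameter-tuning sketch using scikit-learn's `GridSearchCV`; the synthetic dataset and parameter grid are placeholders, not recommendations for any particular problem.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data purely for illustration
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Candidate parameters (placeholder grid)
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [5, 10, None],
}

# 5-fold cross-validated grid search; the best estimator is refit on the full data
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

For larger search spaces, `RandomizedSearchCV` or a dedicated tuning library is usually a better fit than an exhaustive grid.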

### Deep Learning
- Neural networks (CNNs, RNNs, Transformers)
- Transfer learning
- Fine-tuning pre-trained models (see the transfer-learning sketch after this list)
- Attention mechanisms
- GANs (Generative Adversarial Networks)
- Autoencoders
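
A minimal transfer-learning sketch with torchvision (the weights API from torchvision >= 0.13 is assumed): freeze an ImageNet-pretrained ResNet-18 backbone and train only a new head for a hypothetical 10-class task.

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone (torchvision >= 0.13 weights API assumed)
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained parameters
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final layer with a new head for a hypothetical 10-class task
backbone.fc = nn.Linear(backbone.fc.in_features, 10)

# Only the new head's parameters should be passed to the optimizer
trainable_params = [p for p in backbone.parameters() if p.requires_grad]
```

For full fine-tuning, unfreeze some or all backbone layers and use a lower learning rate for the pretrained weights.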

### MLOps
- Model versioning and tracking
- Experiment management
- Model deployment and serving
- Monitoring and retraining
- CI/CD for ML pipelines (see the smoke-test sketch after this list)
- A/B testing for models
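
As one concrete piece of CI/CD for ML, a hedged sketch of a smoke test (pytest is assumed as the runner) that loads a serialized artifact and checks basic behavior before promotion; the `model.pkl` path is hypothetical, and the artifact layout mirrors `MLPipeline.save_model` below.

```python
import joblib
import numpy as np

MODEL_PATH = "model.pkl"  # hypothetical artifact produced by a training job


def test_model_artifact_predicts():
    artifact = joblib.load(MODEL_PATH)
    model, scaler = artifact["model"], artifact["scaler"]

    # A single dummy row with the expected number of features
    row = np.zeros((1, len(artifact["feature_names"])))
    prediction = model.predict(scaler.transform(row))

    # The artifact should produce exactly one prediction for one row
    assert prediction.shape == (1,)
```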

## Supervised Learning

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
import joblib

class MLPipeline:
    def __init__(self):
        self.scaler = StandardScaler()
        self.model = None
        self.feature_names = None

    def prepare_data(self, X: pd.DataFrame, y: pd.Series, test_size: float = 0.2):
        """Split and scale data"""
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=42, stratify=y
        )

        # Scale features
        X_train_scaled = self.scaler.fit_transform(X_train)
        X_test_scaled = self.scaler.transform(X_test)

        self.feature_names = X.columns.tolist()

        return X_train_scaled, X_test_scaled, y_train, y_test

    def train_classifier(self, X_train, y_train, n_estimators: int = 100):
        """Train random forest classifier"""
        self.model = RandomForestClassifier(
            n_estimators=n_estimators,
            max_depth=10,
            random_state=42,
            n_jobs=-1
        )

        self.model.fit(X_train, y_train)

        # Cross-validation
        cv_scores = cross_val_score(self.model, X_train, y_train, cv=5)

        return {
            "cv_mean": cv_scores.mean(),
            "cv_std": cv_scores.std(),
            "feature_importance": dict(zip(
                self.feature_names,
                self.model.feature_importances_
            ))
        }

    def evaluate(self, X_test, y_test) -> dict:
        """Evaluate model performance"""
        y_pred = self.model.predict(X_test)
        y_proba = self.model.predict_proba(X_test)

        return {
            "predictions": y_pred,
            "probabilities": y_proba,
            "confusion_matrix": confusion_matrix(y_test, y_pred).tolist(),
            "classification_report": classification_report(y_test, y_pred, output_dict=True)
        }

    def save_model(self, path: str):
        """Save model and scaler"""
        joblib.dump({
            "model": self.model,
            "scaler": self.scaler,
            "feature_names": self.feature_names
        }, path)
```
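
A short usage sketch for the pipeline above, assuming a DataFrame `df` with a categorical `target` column:

```python
# Hypothetical usage: `df` is assumed to contain features plus a `target` column
pipeline = MLPipeline()
X_train, X_test, y_train, y_test = pipeline.prepare_data(
    df.drop(columns=["target"]), df["target"]
)

cv_results = pipeline.train_classifier(X_train, y_train)
metrics = pipeline.evaluate(X_test, y_test)
pipeline.save_model("model.pkl")
```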

## Deep Learning with PyTorch

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset

class NeuralNetwork(nn.Module):
    def __init__(self, input_size: int, hidden_size: int, num_classes: int):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.3)
        self.fc2 = nn.Linear(hidden_size, hidden_size // 2)
        self.fc3 = nn.Linear(hidden_size // 2, num_classes)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        return x

class Trainer:
    def __init__(self, model, device='cuda' if torch.cuda.is_available() else 'cpu'):
        self.model = model.to(device)
        self.device = device
        self.criterion = nn.CrossEntropyLoss()
        self.optimizer = optim.Adam(model.parameters(), lr=0.001)

    def train_epoch(self, dataloader: DataLoader) -> float:
        """Train for one epoch"""
        self.model.train()
        total_loss = 0

        for batch_idx, (data, target) in enumerate(dataloader):
            data, target = data.to(self.device), target.to(self.device)

            self.optimizer.zero_grad()
            output = self.model(data)
            loss = self.criterion(output, target)

            loss.backward()
            self.optimizer.step()

            total_loss += loss.item()

        return total_loss / len(dataloader)

    def evaluate(self, dataloader: DataLoader) -> dict:
        """Evaluate model"""
        self.model.eval()
        correct = 0
        total = 0

        with torch.no_grad():
            for data, target in dataloader:
                data, target = data.to(self.device), target.to(self.device)
                output = self.model(data)
                _, predicted = torch.max(output, 1)
                total += target.size(0)
                correct += (predicted == target).sum().item()

        return {
            "accuracy": 100 * correct / total,
            "total_samples": total
        }

    def train(self, train_loader: DataLoader, val_loader: DataLoader,
              epochs: int = 10):
        """Full training loop"""
        history = {"train_loss": [], "val_acc": []}

        for epoch in range(epochs):
            train_loss = self.train_epoch(train_loader)
            val_metrics = self.evaluate(val_loader)

            history["train_loss"].append(train_loss)
            history["val_acc"].append(val_metrics["accuracy"])

            print(f"Epoch {epoch+1}/{epochs} - Loss: {train_loss:.4f} - Val Acc: {val_metrics['accuracy']:.2f}%")

        return history
```
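
A usage sketch continuing from the block above, with synthetic tensors standing in for real data (the feature and class counts are arbitrary):

```python
from torch.utils.data import TensorDataset

# Synthetic data purely for illustration: 20 features, 3 classes
X = torch.randn(1000, 20)
y = torch.randint(0, 3, (1000,))

train_loader = DataLoader(TensorDataset(X[:800], y[:800]), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(X[800:], y[800:]), batch_size=32)

model = NeuralNetwork(input_size=20, hidden_size=64, num_classes=3)
trainer = Trainer(model)
history = trainer.train(train_loader, val_loader, epochs=5)
```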

## Model Deployment

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import numpy as np
import joblib

app = FastAPI()

class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: int
    probability: float
    model_version: str

class ModelServer:
    def __init__(self, model_path: str):
        self.model_data = joblib.load(model_path)
        self.model = self.model_data["model"]
        self.scaler = self.model_data["scaler"]
        self.version = "1.0.0"

    def predict(self, features: np.ndarray) -> dict:
        """Make prediction"""
        # Scale features
        features_scaled = self.scaler.transform(features.reshape(1, -1))

        # Predict
        prediction = self.model.predict(features_scaled)[0]
        probability = self.model.predict_proba(features_scaled)[0].max()

        return {
            "prediction": int(prediction),
            "probability": float(probability),
            "model_version": self.version
        }

# Global model instance
model_server = ModelServer("model.pkl")

@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    try:
        features = np.array(request.features)
        result = model_server.predict(features)
        return PredictionResponse(**result)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
async def health():
    return {"status": "healthy", "model_version": model_server.version}
```
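
To exercise the endpoint locally, FastAPI's `TestClient` works without starting a server; a saved `model.pkl` artifact is assumed to exist, and the feature values below are placeholders.

```python
from fastapi.testclient import TestClient

client = TestClient(app)

# The number of features must match what the model was trained on
response = client.post("/predict", json={"features": [0.1, 2.3, -1.0, 0.5]})
print(response.status_code, response.json())

print(client.get("/health").json())
```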

## MLOps with MLflow

```python
import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient

class MLflowExperiment:
    def __init__(self, experiment_name: str):
        mlflow.set_experiment(experiment_name)
        self.client = MlflowClient()

    def log_training_run(self, model, X_train, y_train, X_test, y_test,
                        params: dict):
        """Log training run with MLflow"""
        with mlflow.start_run():
            # Log parameters
            mlflow.log_params(params)

            # Train model
            model.fit(X_train, y_train)

            # Evaluate
            train_score = model.score(X_train, y_train)
            test_score = model.score(X_test, y_test)

            # Log metrics
            mlflow.log_metric("train_accuracy", train_score)
            mlflow.log_metric("test_accuracy", test_score)

            # Log model
            mlflow.sklearn.log_model(model, "model")

            # Log feature importance
            if hasattr(model, 'feature_importances_'):
                feature_importance = dict(enumerate(model.feature_importances_))
                mlflow.log_dict(feature_importance, "feature_importance.json")

            run_id = mlflow.active_run().info.run_id
            return run_id

    def register_model(self, run_id: str, model_name: str):
        """Register model in MLflow model registry"""
        model_uri = f"runs:/{run_id}/model"
        mlflow.register_model(model_uri, model_name)

    def promote_to_production(self, model_name: str, version: int):
        """Promote model version to production"""
        self.client.transition_model_version_stage(
            name=model_name,
            version=version,
            stage="Production"
        )
```
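
A usage sketch tying the pieces together, assuming the train/test splits from the supervised-learning example above; the experiment and model names are arbitrary. Note that recent MLflow releases favor model version aliases over stage transitions, so `promote_to_production` reflects the older stage-based workflow.

```python
from sklearn.ensemble import RandomForestClassifier

experiment = MLflowExperiment("churn-baseline")  # arbitrary experiment name

params = {"n_estimators": 200, "max_depth": 10}
run_id = experiment.log_training_run(
    RandomForestClassifier(**params, random_state=42),
    X_train, y_train, X_test, y_test,
    params=params,
)

experiment.register_model(run_id, "churn-classifier")
experiment.promote_to_production("churn-classifier", version=1)
```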

## Best Practices

### Data Preparation
- Handle missing values appropriately
- Scale/normalize features
- Encode categorical variables properly
- Split data before any preprocessing (see the pipeline sketch after this list)
- Use stratified splits for imbalanced data
- Create validation set for hyperparameter tuning
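
One way to keep preprocessing leakage-free, as the items above recommend, is to fit every transform inside a scikit-learn `Pipeline` so it only ever sees training data; the column names here are hypothetical.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column groups
numeric_cols = ["age", "income"]
categorical_cols = ["plan", "region"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Imputation, scaling, and encoding are fit only on the data passed to .fit()
clf = Pipeline([("preprocess", preprocess), ("model", LogisticRegression(max_iter=1000))])
```

Used with `cross_val_score` or `GridSearchCV`, each transform is refit on every training fold, which keeps validation-fold statistics out of preprocessing.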

### Model Training
- Start with simple baselines
- Use cross-validation
- Monitor training and validation metrics
- Implement early stopping (see the sketch after this list)
- Save best model checkpoints
- Track experiments systematically
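
A minimal early-stopping sketch, continuing the PyTorch `Trainer` example above; the patience value and checkpoint path are illustrative choices rather than library defaults.

```python
# Early stopping with best-checkpoint saving (patience and path are illustrative)
best_acc, patience, bad_epochs = 0.0, 3, 0

for epoch in range(50):
    train_loss = trainer.train_epoch(train_loader)
    val_acc = trainer.evaluate(val_loader)["accuracy"]

    if val_acc > best_acc:
        best_acc, bad_epochs = val_acc, 0
        torch.save(trainer.model.state_dict(), "best_model.pt")  # keep the best weights
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"Stopping early after epoch {epoch + 1}")
            break
```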

### Deployment
- Version models and datasets
- Monitor model performance in production (see the drift-check sketch after this list)
- Implement model A/B testing
- Set up retraining pipelines
- Log predictions for analysis
- Implement fallback mechanisms
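
One simple monitoring signal for retraining is feature drift between a training reference sample and recent production inputs. Below is a hedged sketch using SciPy's two-sample Kolmogorov-Smirnov test; the p-value threshold is an arbitrary example, not a universal rule.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_feature_drift(reference: np.ndarray, live: np.ndarray,
                         p_threshold: float = 0.01) -> list[int]:
    """Return indices of features whose live distribution differs from the reference."""
    drifted = []
    for i in range(reference.shape[1]):
        _, p_value = ks_2samp(reference[:, i], live[:, i])
        if p_value < p_threshold:
            drifted.append(i)
    return drifted
```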

## Anti-Patterns

❌ Training on test data (data leakage)
❌ No validation set for hyperparameter tuning
❌ Ignoring class imbalance
❌ Not scaling features
❌ Overfitting to training data
❌ No model versioning
❌ Missing monitoring in production

## Resources

- Scikit-learn: https://scikit-learn.org/
- PyTorch: https://pytorch.org/
- TensorFlow: https://www.tensorflow.org/
- MLflow: https://mlflow.org/
- Hugging Face: https://huggingface.co/

Overview

This skill provides expert-level guidance and practical code patterns for machine learning, deep learning, model training, deployment, and MLOps. It combines concise conceptual summaries with runnable examples for supervised learning, PyTorch training loops, FastAPI model serving, and MLflow experiment tracking. The focus is on reproducible pipelines, production readiness, and scalable model lifecycle management.

How this skill works

The skill inspects and explains end-to-end workflows: data preparation, model training, evaluation, and deployment. It provides code patterns for feature scaling, model training (classical and neural), evaluation metrics, model serialization, REST serving, and MLflow-based experiment tracking and registry operations. It highlights common pitfalls, monitoring needs, and automation points for CI/CD and retraining.

When to use it

  • Building a reproducible supervised learning pipeline for classification or regression
  • Training and fine-tuning neural networks with PyTorch for custom datasets
  • Packaging and serving models as a REST API with versioned artifacts
  • Instrumenting experiments and model lifecycle using MLflow for tracking and registry
  • Designing MLOps practices: CI/CD for models, monitoring, A/B testing, and retraining strategies

Best practices

  • Split data before preprocessing and use stratified splits for imbalanced classes
  • Start with simple baselines, use cross-validation, and track experiments systematically
  • Scale or normalize features and encode categorical variables consistently across train/serve
  • Save best checkpoints, implement early stopping, and log model artifacts and metrics
  • Version models and datasets, monitor production performance, and log predictions for analysis

Example use cases

  • Train a random forest baseline, log runs to MLflow, and promote a best run to production
  • Train a PyTorch model with a reusable Trainer class and export checkpoints for serving
  • Deploy a serialized model with a FastAPI endpoint that returns prediction, probability, and model version
  • Implement an automated retraining pipeline that triggers on data drift and promotes improved models via the MLflow registry
  • Run A/B tests between model versions and use monitoring to detect performance regressions

FAQ

What model types does this skill cover?

It covers classical supervised models (e.g., random forest), neural networks (CNNs, RNNs, transformers), transfer learning, GANs, and autoencoders.

How do I ensure production stability?

Version models and data, implement monitoring and fallback logic, run A/B tests, log predictions and metrics, and automate retraining when drift is detected.