
mlops-deployment skill

/skills/mlops-deployment

This skill helps deploy and monitor ML models in production with Docker, Kubernetes, and CI/CD pipelines across cloud platforms.

npx playbooks add skill pluginagentmarketplace/custom-plugin-ai-data-scientist --skill mlops-deployment

Review the files below or copy the command above to add this skill to your agents.

Files (4)
SKILL.md
7.3 KB
---
name: mlops-deployment
description: Docker, Kubernetes, CI/CD, model monitoring, and cloud platforms. Use for deploying ML models to production, setting up pipelines, or infrastructure.
sasmp_version: "1.3.0"
bonded_agent: 06-mlops-deployment
bond_type: PRIMARY_BOND
---

# MLOps & Deployment

Deploy and maintain ML models in production with robust infrastructure.

## Quick Start

### Dockerize ML Model
```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy model and code
COPY model.pkl .
COPY app.py .

# Expose port
EXPOSE 8000

# Health check (python:3.10-slim does not ship curl, so use the Python stdlib)
HEALTHCHECK --interval=30s --timeout=3s \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

# Run
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

### FastAPI Model Serving
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI()
model = joblib.load('model.pkl')

class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: float
    probability: float

@app.post('/predict', response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    try:
        features = np.array(request.features).reshape(1, -1)
        prediction = model.predict(features)[0]
        probability = model.predict_proba(features)[0].max()

        return {
            'prediction': float(prediction),
            'probability': float(probability)
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get('/health')
async def health():
    return {'status': 'healthy'}
```
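
A quick smoke test of the endpoint, assuming the service is running locally on port 8000 (the feature values below are illustrative):

```python
import requests

# Send a sample payload to the /predict endpoint
resp = requests.post(
    "http://localhost:8000/predict",
    json={"features": [5.1, 3.5, 1.4, 0.2]},
    timeout=5,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"prediction": 0.0, "probability": 0.97}
```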

## Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
      - name: ml-model
        image: myregistry/ml-model:v1.0.0
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
spec:
  selector:
    app: ml-model
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8000
  type: LoadBalancer
```

## CI/CD Pipeline (GitHub Actions)

```yaml
name: ML Pipeline

on:
  push:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4

    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: "3.10"

    - name: Install dependencies
      run: |
        pip install -r requirements.txt
        pip install pytest pytest-cov

    - name: Run tests
      run: |
        pytest tests/ --cov=src

  train:
    needs: test
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4

    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: "3.10"

    - name: Install dependencies
      run: pip install -r requirements.txt

    - name: Train model
      run: python src/train.py

    - name: Evaluate model
      run: python src/evaluate.py

  deploy:
    needs: train
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4

    # Authenticate to your container registry and configure kubectl access
    # to your cluster before these steps (details depend on your provider).

    - name: Build Docker image
      run: |
        docker build -t ${{ secrets.REGISTRY }}/ml-model:${{ github.sha }} .

    - name: Push to registry
      run: |
        docker push ${{ secrets.REGISTRY }}/ml-model:${{ github.sha }}

    - name: Deploy to Kubernetes
      run: |
        kubectl set image deployment/ml-model \
          ml-model=${{ secrets.REGISTRY }}/ml-model:${{ github.sha }}
```

## Model Monitoring

```python
from prometheus_client import Counter, Histogram, start_http_server
import time

# Metrics
prediction_counter = Counter(
    'model_predictions_total',
    'Total predictions'
)

prediction_latency = Histogram(
    'model_prediction_latency_seconds',
    'Prediction latency'
)

# Instrumented endpoint (extends the FastAPI serving app defined above)
@app.post('/predict')
async def predict(request: PredictionRequest):
    start_time = time.time()

    try:
        prediction = float(model.predict([request.features])[0])
        prediction_counter.inc()
        return {'prediction': prediction}
    finally:
        # Record latency even when prediction fails
        latency = time.time() - start_time
        prediction_latency.observe(latency)

# Start metrics server
start_http_server(9090)
```
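
As an alternative to running a separate metrics server, the same metrics can be exposed on the API itself; a minimal sketch using prometheus_client's ASGI app, assuming the FastAPI `app` from the serving example above:

```python
from prometheus_client import make_asgi_app

# Serve Prometheus metrics from /metrics on the same port as the API,
# instead of starting a separate HTTP server on 9090.
app.mount("/metrics", make_asgi_app())
```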

## Data Drift Detection

```python
import pandas as pd

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Reference data (training)
reference = pd.read_csv('training_data.csv')

# Current production data
current = pd.read_csv('production_data.csv')

# Generate drift report
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)

# Check drift
drift_detected = report.as_dict()['metrics'][0]['result']['dataset_drift']

if drift_detected:
    print("WARNING: Data drift detected!")
    trigger_retraining()
```
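
`trigger_retraining()` above is a placeholder; a minimal sketch of one way to implement it, re-running the training script from the CI/CD example (in practice this might dispatch a pipeline or CI job instead):

```python
import subprocess

def trigger_retraining():
    """Hypothetical retraining hook: re-run the training script locally.
    Swap this for a pipeline or CI trigger in a real deployment."""
    subprocess.run(["python", "src/train.py"], check=True)
```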

## MLflow Model Registry

```python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier

mlflow.set_tracking_uri("http://localhost:5000")

with mlflow.start_run():
    # Train model
    model = RandomForestClassifier(n_estimators=100)
    model.fit(X_train, y_train)

    # Log parameters
    mlflow.log_param("n_estimators", 100)

    # Log metrics
    accuracy = model.score(X_test, y_test)
    mlflow.log_metric("accuracy", accuracy)

    # Log model
    mlflow.sklearn.log_model(
        model,
        "model",
        registered_model_name="RandomForest"
    )

# Promote to production
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="RandomForest",
    version=1,
    stage="Production"
)
```
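
Once a version is in the Production stage, serving code can load it by registry URI instead of a local file path:

```python
import mlflow.pyfunc

# Load whichever model version is currently in the "Production" stage
production_model = mlflow.pyfunc.load_model("models:/RandomForest/Production")
predictions = production_model.predict(X_test)
```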

## A/B Testing

```python
import hashlib

from flask import Flask, request

app = Flask(__name__)
# model_a and model_b are assumed to be loaded elsewhere (e.g. joblib.load);
# this example uses Flask-style routing, but the same split works in FastAPI.

@app.route('/predict', methods=['POST'])
def predict():
    user_id = request.json['user_id']
    features = request.json['features']

    # Use a stable hash so each user always gets the same variant
    # (Python's built-in hash() is randomized per process)
    bucket = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16) % 100

    # Route 10% of traffic to model B
    if bucket < 10:
        model = model_b
        model_version = 'B'
    else:
        model = model_a
        model_version = 'A'

    prediction = float(model.predict([features])[0])

    # Log for analysis
    log_prediction(user_id, model_version, prediction)

    return {
        'prediction': prediction,
        'model_version': model_version
    }
```
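
`log_prediction` is referenced but not defined above; a minimal sketch that appends JSON lines to a local file (a production setup would typically write to a database or event stream):

```python
import json
import time

def log_prediction(user_id, model_version, prediction, path="predictions.log"):
    """Append one prediction record as a JSON line for later analysis."""
    record = {
        "timestamp": time.time(),
        "user_id": user_id,
        "model_version": model_version,
        "prediction": float(prediction),
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```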

## Cloud Deployment

### AWS SageMaker
```python
import sagemaker
from sagemaker.sklearn import SKLearn

estimator = SKLearn(
    entry_point='train.py',
    framework_version='1.0-1',
    instance_type='ml.m5.xlarge',
    role=sagemaker_role
)

estimator.fit({'training': 's3://bucket/data/train'})

# Deploy
predictor = estimator.deploy(
    initial_instance_count=2,
    instance_type='ml.m5.large'
)
```
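
After deployment, the returned predictor can be invoked directly, and the endpoint deleted when it is no longer needed (input shape depends on your model):

```python
# Invoke the hosted endpoint with a sample row
result = predictor.predict([[5.1, 3.5, 1.4, 0.2]])
print(result)

# Tear down the endpoint to stop incurring costs
predictor.delete_endpoint()
```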

### Google Cloud Vertex AI
```python
from google.cloud import aiplatform

aiplatform.init(project='my-project', location='us-central1')

model = aiplatform.Model.upload(
    display_name='sklearn-model',
    artifact_uri='gs://bucket/model',
    serving_container_image_uri='us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest'
)

endpoint = model.deploy(
    machine_type='n1-standard-2',
    min_replica_count=1,
    max_replica_count=3
)
```
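
The deployed Vertex AI endpoint can then be called with a list of instances:

```python
# Send instances to the deployed endpoint and read back predictions
response = endpoint.predict(instances=[[5.1, 3.5, 1.4, 0.2]])
print(response.predictions)
```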

## Best Practices

1. **Version everything**: Code, data, models
2. **Monitor continuously**: Performance, drift, errors
3. **Automate testing**: Unit, integration, performance
4. **Use feature flags**: Gradual rollouts
5. **Implement rollback**: Quick recovery from issues
6. **Scale horizontally**: Multiple replicas
7. **Log predictions**: For debugging and retraining

Overview

This skill provides practical tooling and examples to deploy and operate machine learning models in production using Docker, Kubernetes, CI/CD, model monitoring, and cloud platforms. It bundles patterns for containerized serving, orchestration, automated pipelines, monitoring, drift detection, A/B testing, and model registry workflows. Use it to move models from experiments into reliable, observable production services.

How this skill works

The skill shows concrete artifacts and code snippets that you can adapt: Dockerfiles and FastAPI apps for model serving, Kubernetes manifests for scaling and health checks, and GitHub Actions pipelines to build, test, and deploy images. It also includes monitoring with Prometheus metrics, data-drift detection using Evidently, model registry interactions with MLflow, and recipes for cloud deployment on SageMaker and Vertex AI. Together these pieces form a repeatable deployment lifecycle from training to production promotion and monitoring.

When to use it

  • Deploy a trained model as a scalable HTTP service
  • Automate training, evaluation, and rollout with CI/CD pipelines
  • Set up observability for prediction latency, volume, and errors
  • Detect data drift and trigger retraining workflows
  • Run controlled experiments with A/B testing and canary rollouts
  • Promote models to production using a registry and versioning

Best practices

  • Version code, data, and model artifacts consistently (Git, artifact registry, model registry)
  • Include health, liveness, and readiness probes to let orchestrators manage traffic
  • Automate tests: unit tests for model code and integration tests for endpoints
  • Instrument metrics (counters, histograms) and export to Prometheus or a monitoring stack
  • Implement gradual rollouts (feature flags, traffic splitting) and automated rollback paths
  • Log predictions and metadata for debugging and retraining, respecting privacy and compliance

Example use cases

  • Containerize a scikit-learn model with FastAPI, build a Docker image, and deploy to Kubernetes with autoscaling
  • Create a GitHub Actions workflow that runs tests, trains a model, pushes a Docker image, and updates a deployment
  • Expose Prometheus metrics for prediction count and latency, and alert on SLA violations or rising error rates
  • Use Evidently to compare production data against training data and trigger retraining when drift is detected
  • Register and promote a model via MLflow, then deploy the production version to SageMaker or Vertex AI

FAQ

Can I use this with frameworks other than scikit-learn?

Yes. The serving and deployment patterns are framework-agnostic; swap model loading and prediction code for TensorFlow, PyTorch, or custom frameworks.

How do I handle secrets and credentials in CI/CD?

Store registry credentials and cloud keys in the CI secret store, use least-privilege service roles, and avoid committing secrets to code or repo configuration.