This skill helps deploy and monitor ML models in production with Docker, Kubernetes, and CI/CD pipelines across cloud platforms.
```bash
npx playbooks add skill pluginagentmarketplace/custom-plugin-ai-data-scientist --skill mlops-deployment
```
Review the files below or copy the command above to add this skill to your agents.
---
name: mlops-deployment
description: Docker, Kubernetes, CI/CD, model monitoring, and cloud platforms. Use for deploying ML models to production, setting up pipelines, or infrastructure.
sasmp_version: "1.3.0"
bonded_agent: 06-mlops-deployment
bond_type: PRIMARY_BOND
---
# MLOps & Deployment
Deploy and maintain ML models in production with robust infrastructure.
## Quick Start
### Dockerize ML Model
```dockerfile
FROM python:3.10-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy model and code
COPY model.pkl .
COPY app.py .
# Expose port
EXPOSE 8000
# Health check (python:3.10-slim does not ship curl, so use Python's urllib)
HEALTHCHECK --interval=30s --timeout=3s \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
# Run
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```
### FastAPI Model Serving
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI()
model = joblib.load('model.pkl')

class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: float
    probability: float

@app.post('/predict', response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    try:
        features = np.array(request.features).reshape(1, -1)
        prediction = model.predict(features)[0]
        probability = model.predict_proba(features)[0].max()
        return {
            'prediction': float(prediction),
            'probability': float(probability)
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get('/health')
async def health():
    return {'status': 'healthy'}
```
## Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      containers:
      - name: ml-model
        image: myregistry/ml-model:v1.0.0
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
spec:
  selector:
    app: ml-model
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8000
  type: LoadBalancer
```
## CI/CD Pipeline (GitHub Actions)
```yaml
name: ML Pipeline
on:
  push:
    branches: [ main ]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.10"  # quoted so YAML does not read it as 3.1
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov
      - name: Run tests
        run: |
          pytest tests/ --cov=src
  train:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Train model
        run: python src/train.py
      - name: Evaluate model
        run: python src/evaluate.py
  deploy:
    needs: train
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Build Docker image
        run: |
          docker build -t ${{ secrets.REGISTRY }}/ml-model:${{ github.sha }} .
      - name: Push to registry
        run: |
          docker push ${{ secrets.REGISTRY }}/ml-model:${{ github.sha }}
      - name: Deploy to Kubernetes
        run: |
          kubectl set image deployment/ml-model \
            ml-model=${{ secrets.REGISTRY }}/ml-model:${{ github.sha }}
```
## Model Monitoring
```python
from prometheus_client import Counter, Histogram, start_http_server
import time

# Metrics
prediction_counter = Counter(
    'model_predictions_total',
    'Total predictions'
)
prediction_latency = Histogram(
    'model_prediction_latency_seconds',
    'Prediction latency'
)

# Instrumented version of the /predict endpoint from the serving example above
@app.post('/predict')
async def predict(request: PredictionRequest):
    start_time = time.time()
    try:
        prediction = model.predict([request.features])[0]
        prediction_counter.inc()
    finally:
        latency = time.time() - start_time
        prediction_latency.observe(latency)
    return {'prediction': float(prediction)}

# Start metrics server (Prometheus scrapes port 9090)
start_http_server(9090)
```
## Data Drift Detection
```python
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Reference data (training)
reference = pd.read_csv('training_data.csv')
# Current production data
current = pd.read_csv('production_data.csv')

# Generate drift report
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)

# Check drift
drift_detected = report.as_dict()['metrics'][0]['result']['dataset_drift']
if drift_detected:
    print("WARNING: Data drift detected!")
    trigger_retraining()  # hook into your retraining pipeline
```
## MLflow Model Registry
```python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier

mlflow.set_tracking_uri("http://localhost:5000")

with mlflow.start_run():
    # Train model
    model = RandomForestClassifier(n_estimators=100)
    model.fit(X_train, y_train)

    # Log parameters
    mlflow.log_param("n_estimators", 100)

    # Log metrics
    accuracy = model.score(X_test, y_test)
    mlflow.log_metric("accuracy", accuracy)

    # Log model
    mlflow.sklearn.log_model(
        model,
        "model",
        registered_model_name="RandomForest"
    )

# Promote to production
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="RandomForest",
    version=1,
    stage="Production"
)
```
## A/B Testing
```python
import hashlib
from flask import request

# Flask-style endpoint; assumes an existing Flask `app`, two loaded models
# `model_a`/`model_b`, and a `log_prediction` helper
@app.route('/predict', methods=['POST'])
def predict():
    user_id = request.json['user_id']
    features = request.json['features']

    # Stable hash so each user always gets the same variant; 10% of traffic goes to model B
    bucket = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16) % 100
    if bucket < 10:
        model = model_b
        model_version = 'B'
    else:
        model = model_a
        model_version = 'A'

    prediction = model.predict([features])[0]

    # Log for analysis
    log_prediction(user_id, model_version, prediction)

    return {
        'prediction': float(prediction),
        'model_version': model_version
    }
```
## Cloud Deployment
### AWS SageMaker
```python
import sagemaker
from sagemaker.sklearn import SKLearn

estimator = SKLearn(
    entry_point='train.py',
    framework_version='1.0-1',
    instance_type='ml.m5.xlarge',
    role=sagemaker_role
)
estimator.fit({'training': 's3://bucket/data/train'})

# Deploy
predictor = estimator.deploy(
    initial_instance_count=2,
    instance_type='ml.m5.large'
)
```
### Google Cloud Vertex AI
```python
from google.cloud import aiplatform

aiplatform.init(project='my-project', location='us-central1')

model = aiplatform.Model.upload(
    display_name='sklearn-model',
    artifact_uri='gs://bucket/model',
    serving_container_image_uri='us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest'
)

endpoint = model.deploy(
    machine_type='n1-standard-2',
    min_replica_count=1,
    max_replica_count=3
)
```
## Best Practices
1. **Version everything**: Code, data, models
2. **Monitor continuously**: Performance, drift, errors
3. **Automate testing**: Unit, integration, performance
4. **Use feature flags**: Gradual rollouts
5. **Implement rollback**: Quick recovery from issues
6. **Scale horizontally**: Multiple replicas
7. **Log predictions**: For debugging and retraining (see the sketch below)
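A minimal sketch of practice 7, assuming a local JSONL file as the logging sink and a `log_prediction` helper like the one referenced in the A/B testing example; the file name and record fields are illustrative, and production systems typically log to a database, queue, or object store instead:
```python
import json
import time
import uuid

def log_prediction(user_id, model_version, prediction, path="predictions.jsonl"):
    """Append one prediction record as a JSON line for later debugging and retraining."""
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "user_id": user_id,
        "model_version": model_version,
        "prediction": prediction,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```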
This skill provides practical tooling and examples to deploy and operate machine learning models in production using Docker, Kubernetes, CI/CD, model monitoring, and cloud platforms. It bundles patterns for containerized serving, orchestration, automated pipelines, monitoring, drift detection, A/B testing, and model registry workflows. Use it to move models from experiments into reliable, observable production services.
The skill shows concrete artifacts and code snippets that you can adapt: Dockerfiles and FastAPI apps for model serving, Kubernetes manifests for scaling and health checks, and GitHub Actions pipelines to build, test, and deploy images. It also includes monitoring with Prometheus metrics, data-drift detection using Evidently, model registry interactions with MLflow, and recipes for cloud deployment on SageMaker and Vertex AI. Together these pieces form a repeatable deployment lifecycle from training to production promotion and monitoring.
**Can I use this with frameworks other than scikit-learn?**
Yes. The serving and deployment patterns are framework-agnostic; swap the model loading and prediction code for TensorFlow, PyTorch, or a custom framework.
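For example, here is a hedged sketch of the same FastAPI serving pattern with a PyTorch model in place of joblib; the `model.pt` TorchScript artifact and the tensor shape are assumptions to adapt to your own model:
```python
import torch
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Assumption: a TorchScript artifact exported ahead of time as model.pt
model = torch.jit.load("model.pt")
model.eval()

class PredictionRequest(BaseModel):
    features: list[float]

@app.post("/predict")
async def predict(request: PredictionRequest):
    # Swap in your framework's inference call here
    with torch.no_grad():
        x = torch.tensor([request.features], dtype=torch.float32)
        output = model(x)
    return {"prediction": output.squeeze().tolist()}
```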
**How do I handle secrets and credentials in CI/CD?**
Store registry credentials and cloud keys in the CI secret store, use least-privilege service roles, and never commit secrets to code or repository configuration.
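As one illustrative sketch (not part of the pipeline above), application code can read injected secrets from environment variables at runtime rather than from the repository; the variable name `REGISTRY` mirrors the pipeline's `secrets.REGISTRY`, while `REGISTRY_TOKEN` is a hypothetical secret name:
```python
import os

# Secrets are injected by the CI/CD system (e.g. GitHub Actions secrets mapped
# to environment variables); nothing sensitive lives in the repository itself.
REGISTRY = os.environ["REGISTRY"]                      # mirrors the pipeline's secrets.REGISTRY
REGISTRY_TOKEN = os.environ.get("REGISTRY_TOKEN", "")  # hypothetical token secret

def registry_auth_header():
    """Build an auth header at runtime instead of hard-coding credentials."""
    return {"Authorization": f"Bearer {REGISTRY_TOKEN}"}
```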