---
name: ml-engineering
description: Use when "deploying ML models", "MLOps", "model serving", "feature stores", "model monitoring", or asking about "PyTorch deployment", "TensorFlow production", "RAG systems", "LLM integration", "ML infrastructure"
version: 1.0.0
---
<!-- Adapted from: claude-skills/engineering-team/senior-ml-engineer -->
# ML Engineering Guide
Production-grade ML/AI systems, MLOps, and model deployment.
## When to Use
- Deploying ML models to production
- Building ML platforms and infrastructure
- Implementing MLOps pipelines
- Integrating LLMs into production systems
- Setting up model monitoring and drift detection
## Tech Stack
| Category | Tools |
|----------|-------|
| ML Frameworks | PyTorch, TensorFlow, Scikit-learn, XGBoost |
| LLM Frameworks | LangChain, LlamaIndex, DSPy |
| Data Tools | Spark, Airflow, dbt, Kafka, Databricks |
| Deployment | Docker, Kubernetes, AWS/GCP/Azure |
| Monitoring | MLflow, Weights & Biases, Prometheus |
| Databases | PostgreSQL, BigQuery, Snowflake, Pinecone |
## Production Patterns
### Model Deployment Pipeline
```python
# Model serving with FastAPI
from fastapi import FastAPI
import torch

app = FastAPI()
model = torch.load("model.pth")  # expects a fully serialized model, not a bare state_dict
model.eval()  # disable dropout/batch-norm training behavior for inference

@app.post("/predict")
async def predict(data: dict):
    tensor = preprocess(data)  # preprocess() must map the JSON payload to a tensor
    with torch.no_grad():
        prediction = model(tensor)
    return {"prediction": prediction.tolist()}
```
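Once the service is running (e.g. via `uvicorn`), a quick smoke test might look like the sketch below; the payload fields are hypothetical and depend on what `preprocess()` expects.
```python
# Hypothetical smoke test for the /predict endpoint on a local server.
import requests

resp = requests.post(
    "http://localhost:8000/predict",
    json={"feature_a": 1.0, "feature_b": 2.0},  # placeholder feature payload
)
resp.raise_for_status()
print(resp.json())  # {"prediction": [...]}
```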
### Feature Store Integration
```python
# Feast feature store: fetch features for online inference
from feast import FeatureStore

store = FeatureStore(repo_path=".")
features = store.get_online_features(
    features=["user_features:age", "user_features:location"],
    entity_rows=[{"user_id": 123}],
).to_dict()
```
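A useful property of this setup is that the same feature definitions back both offline training (via `get_historical_features`) and online serving, which helps guard against training/serving skew.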
### Model Monitoring
```python
# Drift detection with Evidently: compare production data to a reference window
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# ref_df: training/reference DataFrame; curr_df: recent production DataFrame
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=ref_df, current_data=curr_df)
report.save_html("drift_report.html")  # or report.as_dict() for automated checks
```
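In production, run the report on a schedule against a rolling window of recent traffic and alert when drift metrics cross a threshold, rather than inspecting reports manually.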
## MLOps Best Practices
### Development
- Test-driven development for ML pipelines
- Version control models and data
- Reproducible experiments with MLflow
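
As a minimal sketch of the MLflow point above (the run name, parameter values, and artifact path are illustrative):
```python
# Track an experiment run with MLflow: parameters, metrics, and artifacts
# are tied to one run ID for reproducibility.
import mlflow

with mlflow.start_run(run_name="baseline"):
    mlflow.log_params({"epochs": 10, "lr": 1e-3})
    mlflow.log_metric("val_accuracy", 0.91)  # illustrative value
    mlflow.log_artifact("model.pth")         # version the trained weights with the run
```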
### Production
- A/B testing infrastructure
- Canary deployments for models (see the sketch after this list)
- Automated retraining pipelines
- Model monitoring and drift detection
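
At its simplest, a model canary is weighted routing between a stable and a candidate version at the serving layer. The sketch below is illustrative: the model stand-ins and the 5% split are assumptions, and real deployments often push the split into the ingress or service mesh instead.
```python
# Minimal canary routing sketch: send a small, configurable fraction of
# traffic to the candidate model so per-version metrics can be compared.
import random

CANARY_FRACTION = 0.05  # start small; ramp up as metrics hold

# Hypothetical stand-ins for two deployed model versions.
def model_v1(features: dict) -> dict:
    return {"score": 0.5, "version": "v1"}  # stable baseline

def model_v2(features: dict) -> dict:
    return {"score": 0.6, "version": "v2"}  # candidate

def predict_with_canary(features: dict) -> dict:
    model = model_v2 if random.random() < CANARY_FRACTION else model_v1
    return model(features)
```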
### Performance Targets
| Metric | Target |
|--------|--------|
| P50 Latency | < 50ms |
| P95 Latency | < 100ms |
| P99 Latency | < 200ms |
| Throughput | > 1000 RPS |
| Availability | 99.9% |
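
To validate a service against these targets, percentiles can be computed directly from recorded request latencies; a minimal sketch with NumPy (the latency data here is synthetic):
```python
# Compute latency percentiles from recorded request durations (milliseconds).
import numpy as np

latencies_ms = np.random.lognormal(mean=3.5, sigma=0.4, size=10_000)  # synthetic sample
p50, p95, p99 = np.percentile(latencies_ms, [50, 95, 99])
print(f"P50={p50:.1f}ms  P95={p95:.1f}ms  P99={p99:.1f}ms")
```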
## LLM Integration Patterns
### RAG System
```python
# Basic RAG with LangChain (legacy-style imports)
from langchain.vectorstores import Pinecone
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

vectorstore = Pinecone.from_existing_index(
    index_name="docs",
    embedding=OpenAIEmbeddings(),
)
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),  # any LangChain-compatible LLM works here
    retriever=vectorstore.as_retriever(),
)
```
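Retrieval quality usually dominates end-to-end RAG quality; tuning chunking and the number of retrieved documents (e.g. `vectorstore.as_retriever(search_kwargs={"k": 4})`) tends to be a better first lever than swapping LLMs.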
### Prompt Management
```python
# Structured prompts with DSPy
import dspy

class QA(dspy.Signature):
    """Answer questions based on context."""
    context = dspy.InputField()
    question = dspy.InputField()
    answer = dspy.OutputField()

qa = dspy.Predict(QA)
result = qa(context="...", question="...")  # result.answer holds the model's answer
```
## Common Commands
```bash
# Development
python -m pytest tests/ -v --cov
python -m black src/
python -m pylint src/
# Training
python scripts/train.py --config prod.yaml
mlflow run . -P epochs=10
# Deployment
docker build -t model:v1 .
kubectl apply -f k8s/model-serving.yaml
# Monitoring
mlflow ui --port 5000
```
## Security & Compliance
- Authentication for model endpoints
- Data encryption (at rest & in transit)
- PII handling and anonymization
- GDPR/CCPA compliance
- Model access audit logging
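
As one concrete example of the first point, an API-key dependency in FastAPI might look like the sketch below; the header name and in-memory key set are placeholders for a real secret store or identity provider.
```python
# Hypothetical API-key check as a FastAPI dependency; swap the in-memory
# set for your actual secret store or identity provider.
from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import APIKeyHeader

api_key_header = APIKeyHeader(name="X-API-Key")
VALID_KEYS = {"example-key"}  # placeholder; never hardcode keys in production

def verify_key(key: str = Depends(api_key_header)) -> str:
    if key not in VALID_KEYS:
        raise HTTPException(status_code=403, detail="Invalid API key")
    return key

app = FastAPI()

@app.post("/predict", dependencies=[Depends(verify_key)])
async def predict(data: dict):
    ...
```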
## Overview
This skill codifies practical guidance for building, deploying, and operating production-grade ML and AI systems, with a focus on MLOps patterns, model serving, feature stores, monitoring, and integrating large language models and retrieval systems into production. The content emphasizes reproducibility, performance, and secure operational practice.
It walks through common production workflows and provides patterns, example snippets, and tool recommendations for model serving, feature store access, monitoring, and RAG/LLM integration. It covers end-to-end concerns: development best practices, CI/CD for models, deployment topology (Docker/Kubernetes), and runtime monitoring with drift detection, and it highlights the security, compliance, and performance targets used to validate production readiness.
## FAQ
**What latency and availability targets should I aim for?**
Common targets are P50 < 50ms, P95 < 100ms, P99 < 200ms, and availability around 99.9%, but set SLOs based on user expectations and cost trade-offs.

**How do I secure model endpoints that handle sensitive data?**
Enforce authentication/authorization, TLS for encryption in transit, encryption at rest, input validation, PII redaction, and audit logging. Consider fine-grained access controls in the model registry.