This skill helps you manage end-to-end ML lifecycles with MLflow, including experiment tracking, model registry, and reproducible deployments across frameworks.
Add this skill to your agents:
```bash
npx playbooks add skill orchestra-research/ai-research-skills --skill mlflow
```
---
name: mlflow
description: Track ML experiments, manage model registry with versioning, deploy models to production, and reproduce experiments with MLflow - framework-agnostic ML lifecycle platform
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [MLOps, MLflow, Experiment Tracking, Model Registry, ML Lifecycle, Deployment, Model Versioning, PyTorch, TensorFlow, Scikit-Learn, HuggingFace]
dependencies: [mlflow, sqlalchemy, boto3]
---
# MLflow: ML Lifecycle Management Platform
## When to Use This Skill
Use MLflow when you need to:
- **Track ML experiments** with parameters, metrics, and artifacts
- **Manage model registry** with versioning and stage transitions
- **Deploy models** to various platforms (local, cloud, serving)
- **Reproduce experiments** with project configurations
- **Compare model versions** and performance metrics
- **Collaborate** on ML projects with team workflows
- **Integrate** with any ML framework (framework-agnostic)
**Users**: 20,000+ organizations | **GitHub Stars**: 23k+ | **License**: Apache 2.0
## Installation
```bash
# Install MLflow
pip install mlflow
# Install with extras (quoted to avoid shell globbing)
pip install 'mlflow[extras]'  # includes SQLAlchemy, boto3, etc.
# Start MLflow UI
mlflow ui
# Access at http://localhost:5000
```
## Quick Start
### Basic Tracking
```python
import mlflow
# Start a run
# Start a run
with mlflow.start_run():
    # Log parameters
    mlflow.log_param("learning_rate", 0.001)
    mlflow.log_param("batch_size", 32)

    # Your training code
    model = train_model()

    # Log metrics
    mlflow.log_metric("train_loss", 0.15)
    mlflow.log_metric("val_accuracy", 0.92)

    # Log model
    mlflow.sklearn.log_model(model, "model")
```
### Autologging (Automatic Tracking)
```python
import mlflow
from sklearn.ensemble import RandomForestClassifier
# Enable autologging
mlflow.autolog()
# Train (automatically logged)
model = RandomForestClassifier(n_estimators=100, max_depth=5)
model.fit(X_train, y_train)
# Metrics, parameters, and model logged automatically!
```
## Core Concepts
### 1. Experiments and Runs
**Experiment**: Logical container for related runs
**Run**: Single execution of ML code (parameters, metrics, artifacts)
```python
import mlflow
# Create/set experiment
mlflow.set_experiment("my-experiment")

# Start a run
with mlflow.start_run(run_name="baseline-model"):
    # Log params
    mlflow.log_param("model", "ResNet50")
    mlflow.log_param("epochs", 10)

    # Train
    model = train()

    # Log metrics
    mlflow.log_metric("accuracy", 0.95)

    # Log model
    mlflow.pytorch.log_model(model, "model")

    # Run ID is automatically generated
    print(f"Run ID: {mlflow.active_run().info.run_id}")
```
### 2. Logging Parameters
```python
with mlflow.start_run():
    # Single parameter
    mlflow.log_param("learning_rate", 0.001)

    # Multiple parameters
    mlflow.log_params({
        "batch_size": 32,
        "epochs": 50,
        "optimizer": "Adam",
        "dropout": 0.2
    })

    # Nested parameters (as dict)
    config = {
        "model": {
            "architecture": "ResNet50",
            "pretrained": True
        },
        "training": {
            "lr": 0.001,
            "weight_decay": 1e-4
        }
    }

    # Log as JSON string or individual params
    for key, value in config.items():
        mlflow.log_param(key, str(value))
```
### 3. Logging Metrics
```python
with mlflow.start_run():
    # Training loop
    for epoch in range(NUM_EPOCHS):
        train_loss = train_epoch()
        val_loss = validate()

        # Log metrics at each step
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("val_loss", val_loss, step=epoch)

        # Log multiple metrics at once
        mlflow.log_metrics({
            "train_accuracy": train_acc,
            "val_accuracy": val_acc
        }, step=epoch)

    # Log final metrics (no step)
    mlflow.log_metric("final_accuracy", final_acc)
```
### 4. Logging Artifacts
```python
import os
import matplotlib.pyplot as plt

with mlflow.start_run():
    # Log a single file
    model.save('model.pkl')
    mlflow.log_artifact('model.pkl')

    # Log a directory
    os.makedirs('plots', exist_ok=True)
    plt.savefig('plots/loss_curve.png')
    mlflow.log_artifacts('plots')

    # Log text
    with open('config.txt', 'w') as f:
        f.write(str(config))
    mlflow.log_artifact('config.txt')

    # Log a dict as JSON
    mlflow.log_dict({'config': config}, 'config.json')
```
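If the figure object is still in memory, `mlflow.log_figure` writes it to the artifact store directly, with no intermediate file. A minimal sketch:
```python
import matplotlib.pyplot as plt
import mlflow

with mlflow.start_run():
    # Plot illustrative loss values and log the figure as an artifact
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0.9, 0.5, 0.3], label="loss")
    ax.legend()
    mlflow.log_figure(fig, "plots/loss_curve.png")
```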
### 5. Logging Models
```python
# PyTorch
import mlflow.pytorch

with mlflow.start_run():
    model = train_pytorch_model()
    mlflow.pytorch.log_model(model, "model")

# Scikit-learn
import mlflow.sklearn

with mlflow.start_run():
    model = train_sklearn_model()
    mlflow.sklearn.log_model(model, "model")

# Keras/TensorFlow
import mlflow.keras

with mlflow.start_run():
    model = train_keras_model()
    mlflow.keras.log_model(model, "model")

# HuggingFace Transformers
import mlflow.transformers

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model={
            "model": model,
            "tokenizer": tokenizer
        },
        artifact_path="model"
    )
```
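For frameworks without a built-in flavor, inference logic can be wrapped in a custom `mlflow.pyfunc.PythonModel`. A minimal sketch, where `ThresholdModel` is a hypothetical wrapper:
```python
import mlflow
import mlflow.pyfunc

class ThresholdModel(mlflow.pyfunc.PythonModel):
    # Hypothetical model: binarize a numeric "score" column
    def predict(self, context, model_input):
        return (model_input["score"] > 0.5).astype(int)

with mlflow.start_run():
    mlflow.pyfunc.log_model(artifact_path="model", python_model=ThresholdModel())
```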
## Autologging
Automatically log metrics, parameters, and models for popular frameworks.
### Enable Autologging
```python
import mlflow
# Enable for all supported frameworks
mlflow.autolog()
# Or enable for specific framework
mlflow.sklearn.autolog()
mlflow.pytorch.autolog()
mlflow.keras.autolog()
mlflow.xgboost.autolog()
```
### Autologging with Scikit-learn
```python
import mlflow
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Enable autologging
mlflow.sklearn.autolog()
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Train (params, metrics, and the model are logged automatically)
with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
    model.fit(X_train, y_train)
    # Metrics such as accuracy and f1_score, the fitted model,
    # and the training duration are all logged automatically
```
### Autologging with PyTorch Lightning
```python
import mlflow
import pytorch_lightning as pl
# Enable autologging
mlflow.pytorch.autolog()
# Train
with mlflow.start_run():
    trainer = pl.Trainer(max_epochs=10)
    trainer.fit(model, datamodule=dm)
    # Hyperparameters, training metrics, and the best
    # model checkpoint are logged automatically
```
## Model Registry
Manage model lifecycle with versioning and stage transitions.
### Register Model
```python
import mlflow
# Log and register model
with mlflow.start_run():
    model = train_model()

    # Log model
    mlflow.sklearn.log_model(
        model,
        "model",
        registered_model_name="my-classifier"  # Register immediately
    )

# Or register an already-logged model later
run_id = "abc123"
model_uri = f"runs:/{run_id}/model"
mlflow.register_model(model_uri, "my-classifier")
```
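Attaching a signature and input example when logging makes the model's expected schema explicit and lets MLflow validate inputs at serving time. A minimal sketch, assuming `X_train` and `train_model()` from the earlier examples:
```python
import mlflow
from mlflow.models import infer_signature

with mlflow.start_run():
    model = train_model()
    # Infer the input/output schema from training data and predictions
    signature = infer_signature(X_train, model.predict(X_train))
    mlflow.sklearn.log_model(
        model,
        "model",
        signature=signature,
        input_example=X_train[:5],
        registered_model_name="my-classifier"
    )
```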
### Model Stages
Transition models between stages: **None** → **Staging** → **Production** → **Archived**
```python
from mlflow.tracking import MlflowClient
client = MlflowClient()

# Promote to staging
client.transition_model_version_stage(
    name="my-classifier",
    version=3,
    stage="Staging"
)

# Promote to production
client.transition_model_version_stage(
    name="my-classifier",
    version=3,
    stage="Production",
    archive_existing_versions=True  # Archive old production versions
)

# Archive a model version
client.transition_model_version_stage(
    name="my-classifier",
    version=2,
    stage="Archived"
)
```
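Note that recent MLflow releases (2.3+) favor free-form model version aliases over fixed stages, which are deprecated going forward. A sketch of the alias-based equivalent:
```python
from mlflow.tracking import MlflowClient
import mlflow.pyfunc

client = MlflowClient()

# Point the "champion" alias at version 3 (aliases replace stage transitions)
client.set_registered_model_alias(name="my-classifier", alias="champion", version="3")

# Load by alias instead of stage
model = mlflow.pyfunc.load_model("models:/my-classifier@champion")
```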
### Load Model from Registry
```python
import mlflow.pyfunc
# Load latest production model
model = mlflow.pyfunc.load_model("models:/my-classifier/Production")
# Load specific version
model = mlflow.pyfunc.load_model("models:/my-classifier/3")
# Load from staging
model = mlflow.pyfunc.load_model("models:/my-classifier/Staging")
# Use model
predictions = model.predict(X_test)
```
### Model Versioning
```python
client = MlflowClient()

# List all versions
versions = client.search_model_versions("name='my-classifier'")
for v in versions:
    print(f"Version {v.version}: {v.current_stage}")

# Get latest version by stage
latest_prod = client.get_latest_versions("my-classifier", stages=["Production"])
latest_staging = client.get_latest_versions("my-classifier", stages=["Staging"])

# Get model version details
version_info = client.get_model_version(name="my-classifier", version="3")
print(f"Run ID: {version_info.run_id}")
print(f"Stage: {version_info.current_stage}")
print(f"Tags: {version_info.tags}")
```
### Model Annotations
```python
client = MlflowClient()

# Add a description
client.update_model_version(
    name="my-classifier",
    version="3",
    description="ResNet50 classifier trained on 1M images with 95% accuracy"
)

# Add tags
client.set_model_version_tag(
    name="my-classifier",
    version="3",
    key="validation_status",
    value="approved"
)
client.set_model_version_tag(
    name="my-classifier",
    version="3",
    key="deployed_date",
    value="2025-01-15"
)
```
## Searching Runs
Find runs programmatically.
```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Search all runs in an experiment
experiment_id = client.get_experiment_by_name("my-experiment").experiment_id
runs = client.search_runs(
    experiment_ids=[experiment_id],
    filter_string="metrics.accuracy > 0.9",
    order_by=["metrics.accuracy DESC"],
    max_results=10
)

for run in runs:
    print(f"Run ID: {run.info.run_id}")
    print(f"Accuracy: {run.data.metrics['accuracy']}")
    print(f"Params: {run.data.params}")

# Search with complex filters
runs = client.search_runs(
    experiment_ids=[experiment_id],
    filter_string=(
        "metrics.accuracy > 0.9 AND "
        "params.model = 'ResNet50' AND "
        "tags.dataset = 'ImageNet'"
    ),
    order_by=["metrics.f1_score DESC"]
)
```
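For quick analysis, the top-level `mlflow.search_runs` helper returns the same results as a pandas DataFrame, one row per run. A minimal sketch, assuming the metrics and params from the example above were logged:
```python
import mlflow

# Metrics and params appear as "metrics.*" / "params.*" columns
df = mlflow.search_runs(
    experiment_names=["my-experiment"],
    filter_string="metrics.accuracy > 0.9",
    order_by=["metrics.accuracy DESC"]
)
print(df[["run_id", "metrics.accuracy", "params.model"]].head())
```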
## Integration Examples
### PyTorch
```python
import mlflow
import torch
import torch.nn as nn

# Enable autologging (takes effect for PyTorch Lightning training)
mlflow.pytorch.autolog()

with mlflow.start_run():
    # Log config
    config = {
        "lr": 0.001,
        "epochs": 10,
        "batch_size": 32
    }
    mlflow.log_params(config)

    # Train
    model = create_model()
    optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])

    for epoch in range(config["epochs"]):
        train_loss = train_epoch(model, optimizer, train_loader)
        val_loss, val_acc = validate(model, val_loader)

        # Log metrics
        mlflow.log_metrics({
            "train_loss": train_loss,
            "val_loss": val_loss,
            "val_accuracy": val_acc
        }, step=epoch)

    # Log model
    mlflow.pytorch.log_model(model, "model")
```
### HuggingFace Transformers
```python
import mlflow
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    report_to="mlflow"  # Trainer logs params and metrics via its built-in MLflow callback
)

# Start MLflow run
with mlflow.start_run():
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset
    )

    # Train (logged automatically via the MLflow callback)
    trainer.train()

    # Log final model to registry
    mlflow.transformers.log_model(
        transformers_model={
            "model": trainer.model,
            "tokenizer": tokenizer
        },
        artifact_path="model",
        registered_model_name="hf-classifier"
    )
```
### XGBoost
```python
import mlflow
import xgboost as xgb

# Enable autologging
mlflow.xgboost.autolog()

with mlflow.start_run():
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dval = xgb.DMatrix(X_val, label=y_val)

    params = {
        'max_depth': 6,
        'learning_rate': 0.1,
        'objective': 'binary:logistic',
        'eval_metric': ['logloss', 'auc']
    }

    # Train (params, metrics, and the model are logged automatically)
    model = xgb.train(
        params,
        dtrain,
        num_boost_round=100,
        evals=[(dtrain, 'train'), (dval, 'val')],
        early_stopping_rounds=10
    )
```
## Best Practices
### 1. Organize with Experiments
```python
# ✅ Good: Separate experiments for different tasks
mlflow.set_experiment("sentiment-analysis")
mlflow.set_experiment("image-classification")
mlflow.set_experiment("recommendation-system")
# ❌ Bad: Everything in one experiment
mlflow.set_experiment("all-models")
```
### 2. Use Descriptive Run Names
```python
# ✅ Good: Descriptive names
with mlflow.start_run(run_name="resnet50-imagenet-lr0.001-bs32"):
    train()

# ❌ Bad: No name (auto-generated)
with mlflow.start_run():
    train()
```
### 3. Log Comprehensive Metadata
```python
with mlflow.start_run():
    # Log hyperparameters
    mlflow.log_params({
        "learning_rate": 0.001,
        "batch_size": 32,
        "epochs": 50
    })

    # Log system info as tags
    mlflow.set_tags({
        "dataset": "ImageNet",
        "framework": "PyTorch 2.0",
        "gpu": "A100",
        "git_commit": get_git_commit()
    })

    # Log data info
    mlflow.log_param("train_samples", len(train_dataset))
    mlflow.log_param("val_samples", len(val_dataset))
```
### 4. Track Model Lineage
```python
# Link runs to understand lineage
with mlflow.start_run(run_name="preprocessing"):
    data = preprocess()
    mlflow.log_artifact("data.csv")
    preprocessing_run_id = mlflow.active_run().info.run_id

with mlflow.start_run(run_name="training"):
    # Reference the upstream run
    mlflow.set_tag("preprocessing_run_id", preprocessing_run_id)
    model = train(data)
```
### 5. Use Model Registry for Deployment
```python
# ✅ Good: Use registry for production
model_uri = "models:/my-classifier/Production"
model = mlflow.pyfunc.load_model(model_uri)
# ❌ Bad: Hard-code run IDs
model_uri = "runs:/abc123/model"
model = mlflow.pyfunc.load_model(model_uri)
```
## Deployment
### Serve Model Locally
```bash
# Serve registered model
mlflow models serve -m "models:/my-classifier/Production" -p 5001
# Serve from run
mlflow models serve -m "runs:/<RUN_ID>/model" -p 5001
# Test the endpoint
curl http://127.0.0.1:5001/invocations \
  -H 'Content-Type: application/json' \
  -d '{"inputs": [[1.0, 2.0, 3.0, 4.0]]}'
```
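The same endpoint can be called from Python. A minimal sketch using `requests` against the local server started above:
```python
import requests

# Score one row against the local MLflow model server
response = requests.post(
    "http://127.0.0.1:5001/invocations",
    json={"inputs": [[1.0, 2.0, 3.0, 4.0]]},
    timeout=10,
)
print(response.json())
```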
### Deploy to Cloud
```bash
# Deploy to AWS SageMaker via the deployments API
mlflow deployments create -t sagemaker --name my-classifier \
  -m "models:/my-classifier/Production" \
  -C region_name=us-west-2

# Deploy to Azure ML (requires the azureml-mlflow plugin;
# pass your workspace URI as the target)
mlflow deployments create -t <azureml-workspace-uri> --name my-classifier \
  -m "models:/my-classifier/Production"
```
## Configuration
### Tracking Server
```bash
# Start tracking server with backend store
mlflow server \
--backend-store-uri postgresql://user:password@localhost/mlflow \
--default-artifact-root s3://my-bucket/mlflow \
--host 0.0.0.0 \
--port 5000
```
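For local, single-user setups, a lighter configuration with a SQLite backend and a local artifact directory is enough. A minimal sketch:
```bash
# Lightweight local server: SQLite metadata store, artifacts on disk
mlflow server \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root ./mlruns \
  --host 127.0.0.1 \
  --port 5000
```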
### Client Configuration
```python
import mlflow
# Set tracking URI
mlflow.set_tracking_uri("http://localhost:5000")
# Or use environment variable
# export MLFLOW_TRACKING_URI=http://localhost:5000
```
## Resources
- **Documentation**: https://mlflow.org/docs/latest
- **GitHub**: https://github.com/mlflow/mlflow (23k+ stars)
- **Examples**: https://github.com/mlflow/mlflow/tree/master/examples
- **Community**: https://mlflow.org/community
## See Also
- `references/tracking.md` - Comprehensive tracking guide
- `references/model-registry.md` - Model lifecycle management
- `references/deployment.md` - Production deployment patterns
## FAQ
**How do I enable automatic logging for my framework?**
Call `mlflow.autolog()` or the framework-specific variant (`mlflow.sklearn.autolog()`, `mlflow.pytorch.autolog()`, etc.) before training.

**How should I load the production model in code?**
Use `mlflow.pyfunc.load_model` with a registry URI like `models:/my-model/Production` to load the latest Production version.