This skill helps you track ML experiments, metrics, and models across platforms like MLflow and W&B for better reproducibility and collaboration.
To add this skill to your agents, run: `npx playbooks add skill eyadsibai/ltk --skill experiment-tracking`
---
name: experiment-tracking
description: Use when "experiment tracking", "MLflow", "Weights & Biases", "wandb", "model registry", "hyperparameter logging", "ML experiments", "training metrics"
version: 1.0.0
---
# Experiment Tracking
Track ML experiments, metrics, and models.
## Comparison
| Platform | Best For | Self-hosted | Visualization |
|----------|----------|-------------|---------------|
| **MLflow** | Open-source, model registry | Yes | Basic |
| **W&B** | Collaboration, sweeps | Limited | Excellent |
| **Neptune** | Team collaboration | No | Good |
| **ClearML** | Full MLOps | Yes | Good |
---
## MLflow
Open-source platform from Databricks.
**Core components:**
- **Tracking**: Log parameters, metrics, artifacts (see the sketch after this list)
- **Projects**: Reproducible runs (MLproject file)
- **Models**: Package and deploy models
- **Registry**: Model versioning and staging
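A minimal sketch of the tracking API, assuming the default local `./mlruns` file store; the experiment name and logged values are illustrative:

```python
import mlflow

mlflow.set_experiment("demo-experiment")  # hypothetical experiment name

with mlflow.start_run():
    # Parameters: hyperparameters and config values, logged once per run
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("batch_size", 32)

    # Metrics: numeric values, optionally recorded per step/epoch
    for epoch in range(3):
        mlflow.log_metric("train_loss", 1.0 / (epoch + 1), step=epoch)

    # Artifacts: arbitrary files such as configs or checkpoints
    mlflow.log_text("lr: 0.01\nbatch_size: 32", "config.yaml")
```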
**Strengths**: Self-hosted, open-source, model registry, framework integrations
**Limitations**: Basic visualization, fewer collaboration features
**Key concept**: Autologging for major frameworks - automatic metric capture with one line.
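For example, with scikit-learn (other supported frameworks work the same way; the model and data below are illustrative):

```python
import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

mlflow.sklearn.autolog()  # or mlflow.autolog() for every supported framework

X, y = make_classification(n_samples=200, random_state=0)
with mlflow.start_run():
    RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
# Parameters, training metrics, and the fitted model are captured automatically.
```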
---
## Weights & Biases (W&B)
Cloud-first experiment tracking with excellent visualization.
**Core features:**
- **Experiment tracking**: Metrics, hyperparameters, system stats
- **Sweeps**: Hyperparameter search (grid, random, Bayesian); see the sweep sketch after this list
- **Artifacts**: Dataset and model versioning
- **Reports**: Shareable documentation
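A sweep is defined as a config dict and executed by agents. A minimal sketch, where the project name, metric, and parameter bounds are all illustrative:

```python
import wandb

def train():
    # Each agent call starts a run whose config is filled in by the sweep.
    run = wandb.init()
    lr = run.config.learning_rate
    wandb.log({"val_loss": 1.0 / lr})  # placeholder metric for illustration
    run.finish()

sweep_config = {
    "method": "bayes",  # also: "grid", "random"
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {"learning_rate": {"min": 0.001, "max": 0.1}},
}

sweep_id = wandb.sweep(sweep_config, project="demo-project")
wandb.agent(sweep_id, function=train, count=5)  # run 5 trials in this process
```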
**Strengths**: Beautiful visualizations, team collaboration, hyperparameter sweeps
**Limitations**: Cloud-dependent, limited self-hosting
**Key concept**: `wandb.init()` + `wandb.log()` - simple API, powerful features.
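A minimal sketch, assuming you are already authenticated (via `wandb login` or `WANDB_API_KEY`); the project name and values are illustrative:

```python
import wandb

run = wandb.init(
    project="demo-project",                       # hypothetical project
    config={"learning_rate": 0.01, "epochs": 3},  # hyperparameters to track
)

for epoch in range(run.config.epochs):
    train_loss = 1.0 / (epoch + 1)  # placeholder metric for illustration
    wandb.log({"epoch": epoch, "train_loss": train_loss})

run.finish()
```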
---
## What to Track
| Category | Examples |
|----------|----------|
| **Hyperparameters** | Learning rate, batch size, architecture |
| **Metrics** | Loss, accuracy, F1, per-epoch values |
| **Artifacts** | Model checkpoints, configs, datasets |
| **System** | GPU usage, memory, runtime |
| **Code** | Git commit, diff, requirements |
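A sketch of capturing the hyperparameter, code, and environment rows with MLflow (MLflow also records the source git commit itself when a run starts inside a git repo; the subprocess calls assume `git` and `pip` are on PATH):

```python
import subprocess
import mlflow

with mlflow.start_run():
    # Hyperparameters
    mlflow.log_params({"learning_rate": 0.01, "batch_size": 32})

    # Code: record the current git commit as a tag
    commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
    mlflow.set_tag("git_commit", commit)

    # Environment: freeze dependencies as a run artifact
    reqs = subprocess.check_output(["pip", "freeze"], text=True)
    mlflow.log_text(reqs, "requirements.txt")
```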
---
## Model Registry Concepts
| Stage | Purpose |
|-------|---------|
| **None** | Just logged, not registered |
| **Staging** | Testing, validation |
| **Production** | Serving live traffic |
| **Archived** | Deprecated, kept for reference |
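A sketch of the stage workflow with MLflow (the model name is hypothetical; note that recent MLflow versions favor registry aliases over stages):

```python
import mlflow
from mlflow import MlflowClient
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Use a database-backed store (e.g. SQLite) so the registry is available.
mlflow.set_tracking_uri("sqlite:///mlflow.db")

X, y = make_classification(n_samples=100, random_state=0)
with mlflow.start_run() as run:
    model = LogisticRegression(max_iter=1000).fit(X, y)
    mlflow.sklearn.log_model(model, "model")

# Register the logged model under a (hypothetical) registry name.
version = mlflow.register_model(f"runs:/{run.info.run_id}/model", "demo-model")

# Classic stage transition; newer MLflow prefers aliases
# (MlflowClient().set_registered_model_alias) instead of stages.
MlflowClient().transition_model_version_stage(
    name="demo-model", version=version.version, stage="Staging"
)
```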
---
## Decision Guide
| Scenario | Recommendation |
|----------|----------------|
| Self-hosted requirement | MLflow |
| Team collaboration | W&B |
| Model registry focus | MLflow |
| Hyperparameter sweeps | W&B |
| Beautiful dashboards | W&B |
| Full MLOps pipeline | MLflow + deployment tools |
## Resources
- MLflow: <https://mlflow.org/docs/latest/>
- W&B: <https://docs.wandb.ai/>
## Overview
This skill helps teams track machine learning experiments, log hyperparameters and metrics, manage artifacts, and maintain model versions across platforms like MLflow and Weights & Biases. It focuses on practical experiment-tracking choices, trade-offs between self-hosted and cloud options, and how to implement reliable logging for reproducible ML workflows. Use it to pick tools and patterns that fit your deployment, collaboration, and registry needs.

It compares the core experiment-tracking platforms and explains their primary components: tracking APIs for parameters and metrics, artifact storage for checkpoints and datasets, hyperparameter sweeps, and model registries for versioning. It outlines key integrations (autologging, `wandb.init()`/`wandb.log()`), shows what kinds of data to capture (hyperparameters, metrics, system stats, code), and maps the registry stages (Staging, Production, Archived) to common workflows.
## FAQ
**Which platform is best if I must self-host?**
MLflow is the preferred choice for self-hosting and includes a built-in model registry for versioning.

**When should I use Weights & Biases instead of MLflow?**
Use W&B if you prioritize collaborative dashboards, rich visualizations, and managed hyperparameter sweeps.