
This skill enables unified logging and querying of MLflow experiments from Python, exposing 70+ QuantStats metrics for evaluating backtests and trading strategies.

npx playbooks add skill terrylica/cc-skills --skill mlflow-python

---
name: mlflow-python
description: MLflow experiment tracking via Python API. TRIGGERS - MLflow metrics, log backtest, experiment tracking, search runs.
allowed-tools: Read, Bash, Grep, Glob
---

# MLflow Python Skill

Unified read/write MLflow operations via Python API with QuantStats integration for comprehensive trading metrics.

**ADR**: [2025-12-12-mlflow-python-skill](/docs/adr/2025-12-12-mlflow-python-skill.md)

> **Note**: This skill uses Pandas (MLflow API requires it). The `mlflow-python` path is auto-skipped by the Polars preference hook.

## When to Use This Skill

**CAN Do**:

- Log backtest metrics (Sharpe, max_drawdown, total_return, etc.)
- Log experiment parameters (strategy config, timeframes)
- Create and manage experiments
- Query runs with SQL-like filtering
- Calculate 70+ trading metrics via QuantStats
- Retrieve metric history (time-series data)

**CANNOT Do**:

- Direct database access to MLflow backend
- Artifact storage management (S3/GCS configuration)
- MLflow server administration

## Prerequisites

### Authentication Setup

MLflow uses separate environment variables for credentials (NOT embedded in URI):

```bash
# Option 1: mise + .env.local (recommended)
# Create .env.local in skill directory with:
MLFLOW_TRACKING_URI=http://mlflow.eonlabs.com:5000
MLFLOW_TRACKING_USERNAME=eonlabs
MLFLOW_TRACKING_PASSWORD=<password>

# Option 2: Direct environment variables
export MLFLOW_TRACKING_URI="http://mlflow.eonlabs.com:5000"
export MLFLOW_TRACKING_USERNAME="eonlabs"
export MLFLOW_TRACKING_PASSWORD="<password>"
```
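From Python, nothing beyond exporting these variables is required; the MLflow client reads them when it makes a request. A minimal sketch (the URI and username are the examples from this page; the helper name is illustrative, not part of the skill):

```python
import os

# MLflow's client reads these variables at request time, so exporting them
# before any tracking call is sufficient. Values here mirror the examples above.
os.environ.setdefault("MLFLOW_TRACKING_URI", "http://mlflow.eonlabs.com:5000")
os.environ.setdefault("MLFLOW_TRACKING_USERNAME", "eonlabs")
# MLFLOW_TRACKING_PASSWORD is left to .env.local; never hard-code it.

def tracking_config() -> dict:
    """Return the tracking settings MLflow will see (password omitted)."""
    return {
        "uri": os.environ.get("MLFLOW_TRACKING_URI"),
        "user": os.environ.get("MLFLOW_TRACKING_USERNAME"),
    }
```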

### Verify Connection

```bash
/usr/bin/env bash << 'SKILL_SCRIPT_EOF'
cd ${CLAUDE_PLUGIN_ROOT}/skills/mlflow-python
uv run scripts/query_experiments.py experiments
SKILL_SCRIPT_EOF
```
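The verification step presumably reduces to a single client call plus some output formatting; a hedged sketch (function names are illustrative, not the script's actual internals):

```python
def list_experiment_names() -> list:
    # Lazy import: requires MLFLOW_TRACKING_URI and credentials to be exported.
    import mlflow
    return [exp.name for exp in mlflow.search_experiments()]

def format_experiments(names: list) -> str:
    """Render experiment names as a bullet list, one per line."""
    return "\n".join(f"- {n}" for n in names) or "(no experiments)"
```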

## Quick Start Workflows

### A. Log Backtest Results (Primary Use Case)

```bash
/usr/bin/env bash << 'SKILL_SCRIPT_EOF_2'
cd ${CLAUDE_PLUGIN_ROOT}/skills/mlflow-python
uv run scripts/log_backtest.py \
  --experiment "crypto-backtests" \
  --run-name "btc_momentum_v2" \
  --returns path/to/returns.csv \
  --params '{"strategy": "momentum", "timeframe": "1h"}'
SKILL_SCRIPT_EOF_2
```
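The script's internals aren't shown on this page; as a rough sketch of the flow it implies (load returns, compute metrics, log params and metrics under a named run), with a hand-rolled annualized Sharpe standing in for the full QuantStats suite:

```python
import pandas as pd

def annualized_sharpe(returns: pd.Series, periods: int = 252) -> float:
    """Simplified stand-in for the QuantStats Sharpe (assumes zero risk-free rate)."""
    return float(returns.mean() / returns.std() * periods ** 0.5)

def log_backtest(experiment: str, run_name: str, returns: pd.Series, params: dict) -> None:
    # Lazy import so the metric math above runs without a tracking server.
    import mlflow
    mlflow.set_experiment(experiment)
    with mlflow.start_run(run_name=run_name):
        mlflow.log_params(params)
        mlflow.log_metric("sharpe", annualized_sharpe(returns))
```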

### B. Search Experiments

```bash
uv run scripts/query_experiments.py experiments
```

### C. Query Runs with Filter

```bash
uv run scripts/query_experiments.py runs \
  --experiment "crypto-backtests" \
  --filter "metrics.sharpe_ratio > 1.5" \
  --order-by "metrics.sharpe_ratio DESC"
```
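These flags presumably map onto `mlflow.search_runs`, which returns a pandas DataFrame; a sketch under that assumption (helper names are illustrative):

```python
def sharpe_filter(threshold: float) -> str:
    """Build the SQL-like filter string MLflow's search API accepts."""
    return f"metrics.sharpe_ratio > {threshold}"

def top_runs(experiment: str, threshold: float = 1.5):
    # Lazy import so the filter builder is usable without a live server.
    import mlflow
    return mlflow.search_runs(
        experiment_names=[experiment],
        filter_string=sharpe_filter(threshold),
        order_by=["metrics.sharpe_ratio DESC"],
    )
```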

### D. Create New Experiment

```bash
uv run scripts/create_experiment.py \
  --name "crypto-backtests-2025" \
  --description "Q1 2025 cryptocurrency trading strategy backtests"
```
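A plausible shape for this script: MLflow stores an experiment description under the reserved `mlflow.note.content` tag, and re-creating an existing experiment raises an error, so an idempotent sketch might look like this (function names are illustrative):

```python
def experiment_tags(description: str) -> dict:
    # MLflow stores an experiment's description under this reserved tag key.
    return {"mlflow.note.content": description}

def create_experiment_if_missing(name: str, description: str) -> str:
    import mlflow  # lazy import: only needed when actually creating
    existing = mlflow.get_experiment_by_name(name)
    if existing is not None:
        return existing.experiment_id
    return mlflow.create_experiment(name, tags=experiment_tags(description))
```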

### E. Get Metric History

```bash
uv run scripts/get_metric_history.py \
  --run-id abc123 \
  --metrics sharpe_ratio,cumulative_return
```
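The `--metrics` flag takes a comma-separated list; under the hood the client exposes per-metric history via `MlflowClient.get_metric_history`. A sketch (helper names are illustrative):

```python
def parse_metric_names(arg: str) -> list:
    """Split a comma-separated --metrics argument into clean names."""
    return [m.strip() for m in arg.split(",") if m.strip()]

def metric_history(run_id: str, metric: str) -> list:
    # Lazy import: the argument parsing above works without a server.
    from mlflow.tracking import MlflowClient
    client = MlflowClient()
    return [(m.step, m.value) for m in client.get_metric_history(run_id, metric)]
```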

## QuantStats Metrics Available

The `log_backtest.py` script calculates 70+ metrics via QuantStats, including:

| Category     | Metrics                                                           |
| ------------ | ----------------------------------------------------------------- |
| **Ratios**   | sharpe, sortino, calmar, omega, treynor                           |
| **Returns**  | cagr, total_return, avg_return, best, worst                       |
| **Drawdown** | max_drawdown, avg_drawdown, drawdown_days                         |
| **Trade**    | win_rate, profit_factor, payoff_ratio, consecutive_wins/losses    |
| **Risk**     | volatility, var, cvar, ulcer_index, serenity_index                |
| **Advanced** | kelly_criterion, recovery_factor, risk_of_ruin, information_ratio |

See [quantstats-metrics.md](./references/quantstats-metrics.md) for full list.
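For intuition about what one of these metrics measures, here is a hand-rolled max drawdown on a returns series (a simplified stand-in for illustration; the script itself delegates to QuantStats):

```python
import pandas as pd

def max_drawdown(returns: pd.Series) -> float:
    """Largest peak-to-trough decline of the compounded equity curve."""
    equity = (1 + returns).cumprod()          # compounded equity curve
    drawdown = equity / equity.cummax() - 1   # distance below running peak
    return float(drawdown.min())
```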

## Bundled Scripts

| Script                  | Purpose                                      |
| ----------------------- | -------------------------------------------- |
| `log_backtest.py`       | Log backtest returns with QuantStats metrics |
| `query_experiments.py`  | Search experiments and runs (replaces CLI)   |
| `create_experiment.py`  | Create new experiment with metadata          |
| `get_metric_history.py` | Retrieve metric time-series data             |

## Configuration

The skill uses mise `[env]` pattern for configuration. See `.mise.toml` for defaults.

Create `.env.local` (gitignored) for credentials:

```bash
MLFLOW_TRACKING_URI=http://mlflow.eonlabs.com:5000
MLFLOW_TRACKING_USERNAME=eonlabs
MLFLOW_TRACKING_PASSWORD=<password>
```

## Reference Documentation

- [Authentication Patterns](./references/authentication.md) - Idiomatic MLflow auth
- [QuantStats Metrics](./references/quantstats-metrics.md) - Full list of 70+ metrics
- [Query Patterns](./references/query-patterns.md) - DataFrame operations
- [Migration from CLI](./references/migration-from-cli.md) - CLI to Python API mapping

## Migration from mlflow-query

This skill replaces the CLI-based `mlflow-query` skill. Key differences:

| Feature        | mlflow-query (old) | mlflow-python (new)    |
| -------------- | ------------------ | ---------------------- |
| Log metrics    | Not supported      | `mlflow.log_metrics()` |
| Log params     | Not supported      | `mlflow.log_params()`  |
| Query runs     | CLI text parsing   | DataFrame output       |
| Metric history | Workaround only    | Native support         |
| Auth pattern   | Embedded in URI    | Separate env vars      |

See [migration-from-cli.md](./references/migration-from-cli.md) for detailed mapping.

---

## Troubleshooting

| Issue                   | Cause                        | Solution                                            |
| ----------------------- | ---------------------------- | --------------------------------------------------- |
| Connection refused      | MLflow server not running    | Verify MLFLOW_TRACKING_URI and server status        |
| Authentication failed   | Wrong credentials            | Check MLFLOW_TRACKING_USERNAME and PASSWORD in .env |
| Experiment not found    | Experiment name typo         | Run `query_experiments.py experiments` to list all  |
| QuantStats import error | Missing dependency           | `uv add quantstats` in skill directory              |
| Pandas import warning   | Expected for this skill      | Ignore - MLflow requires Pandas (hook-excluded)     |
| Run creation fails      | Experiment doesn't exist     | Use `create_experiment.py` to create first          |
| Metric history empty    | Wrong run_id or metric name  | Verify run_id with `query_experiments.py runs`      |
| Returns CSV parse error | Wrong date format or columns | Check CSV has date index and returns column         |

## Overview

This skill provides unified MLflow experiment tracking via the Python API, with built-in QuantStats integration that computes 70+ trading and risk metrics for backtests. It streamlines logging of backtest returns, parameters, and metrics, and offers DataFrame-based querying and metric-history retrieval. Authentication uses environment variables, and the skill bundles scripts for common workflows.

## How this skill works

The skill wraps MLflow Python client calls in scripts that create experiments, log params/metrics, and read runs. For backtests, it loads returns (CSV or DataFrame), computes QuantStats metrics, and logs both the metrics and their time series to MLflow. Query scripts return pandas DataFrames and support SQL-like filters, ordering, and metric-history extraction.

## When to use it

- Log and version backtest runs with full trading metrics (Sharpe, drawdown, CAGR).
- Record experiment parameters and strategy configuration for reproducible research.
- Search and filter runs using DataFrame-friendly queries and order-by clauses.
- Retrieve metric time series for plotting or downstream analysis.
- Create and manage MLflow experiments from automation scripts or CI jobs.

## Best practices

- Set `MLFLOW_TRACKING_URI`, `MLFLOW_TRACKING_USERNAME`, and `MLFLOW_TRACKING_PASSWORD` in `.env.local` or as environment variables before running scripts.
- Provide a returns CSV with a date index and a single returns column to avoid parse errors.
- Create experiments first when automating runs to avoid run-creation failures.
- Install QuantStats and pandas in the skill environment so all metrics are available.
- Treat artifact storage and MLflow server administration as separate concerns; this skill focuses on API-level tracking.

## Example use cases

- Log a nightly backtest pipeline that records 70+ QuantStats metrics and the returns series to MLflow.
- Compare multiple strategy runs by filtering on `metrics.sharpe_ratio > 1.5` and ordering by Sharpe descending.
- Automate experiment creation for quarterly strategy batches and capture experiment metadata.
- Fetch metric history for a run to produce cumulative-return and drawdown plots in a reporting job.
- Migrate legacy CLI-based MLflow workflows to programmatic, DataFrame-first queries and logging.

## FAQ

**Do I need pandas installed?**

Yes. The MLflow Python APIs used here expect pandas; install it in the skill environment.

**Can this skill configure MLflow remote storage or the server?**

No. This skill uses the MLflow client API for logging and querying; backend administration and artifact storage configuration must be handled separately.