
model-comparison-tool skill

/model-comparison-tool

This skill helps you compare multiple ML models using cross-validation, metrics, and auto-selection to pick the best classifier or regressor.

npx playbooks add skill dkyazzentwatwa/chatgpt-skills --skill model-comparison-tool

Review the files below or copy the command above to add this skill to your agents.

Files (3)
SKILL.md
1.9 KB
---
name: model-comparison-tool
description: Use when asked to compare multiple ML models, perform cross-validation, evaluate metrics, or select the best model for a classification/regression task.
---

# Model Comparison Tool

Compare multiple machine learning models systematically with cross-validation, metric evaluation, and automated model selection.

## Purpose

Model comparison for:
- Algorithm selection and benchmarking
- Hyperparameter tuning comparison
- Model performance validation
- Feature engineering evaluation
- Production model selection

## Features

- **Multi-Model Comparison**: Test 5+ algorithms simultaneously
- **Cross-Validation**: K-fold, stratified, time-series splits
- **Comprehensive Metrics**: Accuracy, F1, ROC-AUC, RMSE, MAE, R²
- **Statistical Testing**: Paired t-tests for significance
- **Visualization**: Performance charts, ROC curves, learning curves
- **Auto-Selection**: Recommend best model based on criteria

## Quick Start

```python
from model_comparison_tool import ModelComparisonTool

# Compare classifiers
comparator = ModelComparisonTool()
comparator.load_data(X_train, y_train, task='classification')

results = comparator.compare_models(
    models=['rf', 'gb', 'lr', 'svm'],
    cv_folds=5
)

best_model = comparator.get_best_model(metric='f1')
```

## CLI Usage

```bash
# Compare models on CSV data
python model_comparison_tool.py --data data.csv --target target --task classification

# Custom model comparison
python model_comparison_tool.py --data data.csv --target price --task regression --models rf,gb,lr --cv 10

# Export results
python model_comparison_tool.py --data data.csv --target y --output comparison_report.html
```

## Limitations

- Requires sufficient data for meaningful cross-validation
- Comparisons can be slow on large datasets
- Deep learning models not included (use dedicated frameworks)
- Feature engineering must be done beforehand

Overview

This skill compares multiple machine learning models systematically to identify the best performer for classification or regression tasks. It runs cross-validation, computes standard metrics, performs statistical tests, and can recommend a winning model. The tool supports common split strategies and produces visualizations for deeper inspection.

How this skill works

Load your dataset and specify the task type (classification or regression). The tool trains several algorithms in parallel using k-fold, stratified, or time-series cross-validation, collects metrics (accuracy, F1, ROC-AUC, RMSE, MAE, R²), and optionally runs paired statistical tests to check significance. It generates comparison tables, learning and ROC curves, and recommends a model automatically based on the chosen metric and selection criteria.
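For intuition, here is a minimal sketch of that comparison loop written directly against scikit-learn. The model shortcuts mirror the Quick Start names ('rf', 'gb', 'lr', 'svm'), but this is an illustration of the approach, not the skill's actual implementation.

```python
# Conceptual sketch of the multi-model comparison loop (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic binary-classification data stands in for your prepared features.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Candidate models keyed by the short names used in the Quick Start.
models = {
    "rf": RandomForestClassifier(random_state=42),
    "gb": GradientBoostingClassifier(random_state=42),
    "lr": LogisticRegression(max_iter=1000),
    "svm": SVC(),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="f1")
    for name, model in models.items()
}

# Rank models by mean F1 across folds.
for name, fold_scores in sorted(scores.items(), key=lambda kv: -kv[1].mean()):
    print(f"{name}: mean F1 = {fold_scores.mean():.3f} (+/- {fold_scores.std():.3f})")
```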

When to use it

  • Benchmark multiple candidate algorithms to choose the best baseline.
  • Validate model performance reliably with cross-validation before deployment.
  • Compare hyperparameter variants or feature-engineering pipelines.
  • Select a production-ready model based on specified business metrics.
  • Evaluate time-series models using proper split strategies.
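The last point deserves a concrete illustration. Below is a minimal sketch, assuming scikit-learn's TimeSeriesSplit, of how temporal order is preserved during evaluation; the skill's own split options may be configured differently.

```python
# Illustrative time-series evaluation with folds that respect temporal order.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X[:, 0].cumsum() + rng.normal(scale=0.1, size=300)  # trending target

tscv = TimeSeriesSplit(n_splits=5)  # each test fold comes strictly after its training data
scores = cross_val_score(GradientBoostingRegressor(), X, y, cv=tscv,
                         scoring="neg_root_mean_squared_error")
print("RMSE per fold:", -scores)
```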

Best practices

  • Ensure sufficient data per class or time window for reliable cross-validation.
  • Preprocess and engineer features before running comparisons; the tool assumes prepared inputs.
  • Choose evaluation metrics that reflect business goals (e.g., F1 for imbalanced classes).
  • Subsample or cap very large datasets to keep runtimes manageable.
  • Combine visual inspection of curves with statistical tests before final model choice.
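The significance-testing step in the last point can be approximated as follows. This is a sketch of the paired-test idea using scipy on per-fold scores, not the skill's exact "Statistical Testing" implementation.

```python
# Paired t-test on per-fold F1 scores from two models evaluated on the same folds.
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

rf_scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv, scoring="f1")
lr_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="f1")

# Both models see identical folds, so a paired test is appropriate.
stat, p_value = ttest_rel(rf_scores, lr_scores)
print(f"paired t-test: t={stat:.3f}, p={p_value:.3f}")
```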

Example use cases

  • Compare random forest, gradient boosting, logistic regression, and SVM on a classification dataset to pick the best F1 score.
  • Run 10-fold CV on regression models to identify the lowest RMSE and most stable predictions.
  • Evaluate the effect of two different feature sets by comparing model families on the same folds.
  • Use time-series split to compare forecasting algorithms while preserving temporal order.
  • Produce a shareable HTML report summarizing metrics and recommended model for stakeholders.
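A hypothetical run of the second use case is sketched below. The API shape follows the Quick Start above, but the metric key 'rmse' and the synthetic data are assumptions, not documented usage.

```python
# Hypothetical 10-fold regression comparison (API shape per Quick Start; 'rmse' key assumed).
from sklearn.datasets import make_regression
from model_comparison_tool import ModelComparisonTool

# Synthetic data stands in for your prepared features and target.
X_train, y_train = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=1)

comparator = ModelComparisonTool()
comparator.load_data(X_train, y_train, task='regression')

results = comparator.compare_models(
    models=['rf', 'gb', 'lr'],  # candidates from the CLI example
    cv_folds=10
)
best_model = comparator.get_best_model(metric='rmse')  # assumed metric key
```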

FAQ

Can the tool handle imbalanced classes?

Yes — use stratified CV and choose metrics like F1 or ROC-AUC; consider sampling or class weighting before comparison.
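A minimal sketch of that preparation, assuming scikit-learn is used upstream of the comparison: stratified folds keep class ratios per split, and class weighting counters the imbalance.

```python
# Handling class imbalance before comparison (assumed preprocessing, outside the skill).
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression

# 90/10 class imbalance
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)      # preserves class ratios per fold
clf = LogisticRegression(max_iter=1000, class_weight="balanced")    # reweights the minority class

print("mean ROC-AUC:", cross_val_score(clf, X, y, cv=cv, scoring="roc_auc").mean())
```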

Does it include deep learning models?

No — the tool focuses on classical ML algorithms. For deep learning, use a dedicated framework and export results for comparison.