---
name: feature-engineering-kit
description: Auto-generate features with encodings, scaling, polynomial features, and interaction terms for ML pipelines.
---
# Feature Engineering Kit
Automated feature engineering with encodings, scaling, and transformations.
## Features
- **Encodings**: One-hot, label, target encoding
- **Scaling**: Standard, min-max, robust scaling
- **Polynomial Features**: Generate interactions
- **Binning**: Discretize continuous features
- **Date Features**: Extract time-based features
- **Text Features**: TF-IDF, word counts
- **Missing Value Handling**: Imputation strategies
## CLI Usage
```bash
python feature_engineering.py --data train.csv --output engineered.csv --config config.json
```
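The config schema is not documented here; a hypothetical `config.json` illustrating the kinds of options such a tool might accept (field names are assumptions, not the skill's actual schema):

```json
{
  "target": "label",
  "numeric": {
    "columns": ["age", "income"],
    "scaler": "standard",
    "impute": "median"
  },
  "categorical": {
    "columns": ["city"],
    "encoding": "onehot"
  },
  "polynomial": {"degree": 2, "interaction_only": true}
}
```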
## Dependencies
- scikit-learn>=1.3.0
- pandas>=2.0.0
- numpy>=1.24.0
## Overview
This skill auto-generates machine learning features, including encodings, scaling, polynomial interactions, and other common transformations. It produces ready-to-use engineered datasets or pipeline components for training and inference. The toolkit supports categorical and numerical handling, date/text feature extraction, and configurable imputation strategies, reducing the manual work of feature preparation.
The skill inspects input columns and applies selected transformations such as one-hot, label, or target encoding for categoricals and standard/min-max/robust scaling for numerics. It can create polynomial and interaction terms, discretize continuous variables, extract date-time parts, and compute text features like TF-IDF or word counts. Missing values are handled with configurable imputers before transformations, and the output is written as an engineered dataset or pipeline artifact. Configuration is provided via a JSON file or CLI flags to control columns and transformation parameters.
## FAQ
**What input formats are supported?**

CSV files and pandas DataFrames; configuration is provided via a JSON file or CLI options.

**How does the skill avoid target leakage with target encoding?**

It supports cross-validated and holdout target encoding schemes, so per-category encodings are learned without using target values from the same fold.
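A minimal sketch of the cross-validated scheme, with out-of-fold category means computed via scikit-learn's `KFold` (the function name and signature are illustrative, not the skill's actual API):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

def cv_target_encode(cat: pd.Series, target: pd.Series, n_splits: int = 5) -> pd.Series:
    """Encode each row with its category's mean target, computed only
    from the *other* folds, so a row never sees its own target value."""
    encoded = pd.Series(np.nan, index=cat.index, dtype=float)
    global_mean = target.mean()
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, val_idx in kf.split(cat):
        # Category means learned on the training folds only
        fold_means = target.iloc[train_idx].groupby(cat.iloc[train_idx]).mean()
        encoded.iloc[val_idx] = cat.iloc[val_idx].map(fold_means).to_numpy()
    # Categories unseen in a training fold fall back to the global mean
    return encoded.fillna(global_mean)
```

The holdout variant is the same idea with a single split: encodings are fit on one partition and applied to the other.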