dataset-loader-creator skill

/skills/07-ml-training/dataset-loader-creator

This skill streamlines dataset-loader creation workflows by generating production-ready code, configurations, and best-practice guidance for ML training.

npx playbooks add skill jeremylongshore/claude-code-plugins-plus-skills --skill dataset-loader-creator


SKILL.md
---
name: "dataset-loader-creator"
description: |
  Create dataset loader code and configurations. Auto-activating skill for ML Training.
  Part of the ML Training skill category. Use when working with dataset loader functionality.
  Triggers on phrases like "dataset loader creator", "dataset creator", "dataset".
allowed-tools: "Read, Write, Edit, Bash(python:*), Bash(pip:*)"
version: 1.0.0
license: MIT
author: "Jeremy Longshore <[email protected]>"
---

# Dataset Loader Creator

## Overview

This skill provides automated assistance for dataset loader creation tasks within the ML Training domain.

## When to Use

This skill activates automatically when you:
- Mention "dataset loader creator" in your request
- Ask about dataset loader creator patterns or best practices
- Need help with ML training tasks such as data preparation, model training, hyperparameter tuning, or experiment tracking

## Instructions

1. Provide step-by-step guidance for dataset loader creation
2. Follow industry best practices and patterns
3. Generate production-ready code and configurations
4. Validate outputs against common standards

## Examples

**Example: Basic Usage**
Request: "Help me with dataset loader creator"
Result: Provides step-by-step guidance and generates appropriate configurations


## Prerequisites

- Relevant development environment configured
- Access to necessary tools and services
- Basic understanding of ML training concepts


## Output

- Generated configurations and code
- Best practice recommendations
- Validation results


## Error Handling

| Error | Cause | Solution |
|-------|-------|----------|
| Configuration invalid | Missing required fields | Check documentation for required parameters |
| Tool not found | Dependency not installed | Install required tools per prerequisites |
| Permission denied | Insufficient access | Verify credentials and permissions |


## Resources

- Official documentation for related tools
- Best practices guides
- Community examples and tutorials

## Related Skills

Part of the **ML Training** skill category.
Tags: ml, training, pytorch, tensorflow, sklearn

Overview

This skill automates the creation of dataset loaders for ML training workflows. It provides step-by-step guidance, generates production-ready loader code and configurations, and validates outputs against common standards to accelerate data preparation for model training.

How this skill works

The skill inspects user intent for dataset loader creation and responds by producing loader templates, configuration files, and usage examples for common frameworks (PyTorch, TensorFlow, scikit-learn). It enforces best-practice patterns such as batching, shuffling, augmentation hooks, and dataset validation, and returns runnable code plus notes on prerequisites and dependencies.
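To make the pattern concrete, here is a minimal, framework-agnostic sketch of the kind of loader the skill produces: a map-style dataset class (index-based access, mirroring the `__len__`/`__getitem__` contract of `torch.utils.data.Dataset`) plus a seeded, shuffled batch iterator. All names here are illustrative, not part of any framework API.

```python
import random
from typing import Any, Callable, Iterator, Optional, Sequence


class ListDataset:
    """Minimal map-style dataset: index-based access with an optional transform."""

    def __init__(self, samples: Sequence[Any],
                 transform: Optional[Callable[[Any], Any]] = None):
        self.samples = samples
        self.transform = transform

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> Any:
        item = self.samples[idx]
        return self.transform(item) if self.transform else item


def iter_batches(dataset, batch_size: int = 4,
                 shuffle: bool = True, seed: int = 0) -> Iterator[list]:
    """Yield fixed-size batches, optionally shuffled with a fixed seed."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # seeded for reproducibility
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


# Usage: 10 samples, doubled by the transform, batched in fours.
ds = ListDataset(list(range(10)), transform=lambda x: x * 2)
batches = list(iter_batches(ds, batch_size=4, seed=42))
```

In PyTorch the same structure maps onto a `Dataset` subclass wrapped in a `DataLoader`; in `tf.data` it corresponds to `from_tensor_slices(...).shuffle(...).batch(...)`.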

When to use it

  • When you need a reusable dataset loader for PyTorch, TensorFlow, or NumPy-based training pipelines
  • When preparing datasets that require splitting, augmentation, normalization, or on-the-fly transforms
  • When you want production-ready loader code and configuration for CI/CD or training pipelines
  • When you need validation checks for schema, missing values, and sample distribution
  • When you require examples and instructions to integrate loaders into training loops
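The last point — integrating a loader into a training loop — can be sketched with a toy example: fitting a single scalar weight by gradient descent over shuffled mini-batches of `(x, y)` pairs. The model and data here are hypothetical placeholders; only the loop structure (reshuffle each epoch, iterate batches, update parameters) reflects what a generated loader plugs into.

```python
import random


def train(dataset, epochs: int = 100, batch_size: int = 2, lr: float = 0.02) -> float:
    """Toy loop: fit scalar w to minimize (w*x - y)^2 over mini-batches."""
    w = 0.0
    for epoch in range(epochs):
        indices = list(range(len(dataset)))
        random.Random(epoch).shuffle(indices)  # reshuffle every epoch
        for start in range(0, len(indices), batch_size):
            batch = [dataset[i] for i in indices[start:start + batch_size]]
            # Mean gradient of the squared error over the batch.
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w


# Data generated by y = 2x, so w should converge to 2.
data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]
w = train(data)
```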

Best practices

  • Define clear schema and data contracts (types, ranges, expected shapes) before coding loaders
  • Implement streaming/batching for large datasets and keep transforms stateless where possible
  • Separate data ingestion, transformation, and augmentation into modular functions
  • Include validation steps for sample counts, label distribution, and missing values
  • Provide configuration-driven parameters (batch_size, shuffle, num_workers, seed) for reproducibility
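The configuration-driven point above can be sketched with a small dataclass that loads loader parameters from JSON and rejects unknown keys. The field names mirror the parameters listed above; the class itself is an illustrative pattern, not a fixed API.

```python
import json
from dataclasses import dataclass


@dataclass
class LoaderConfig:
    """Loader parameters kept in one place for reproducibility."""
    batch_size: int = 32
    shuffle: bool = True
    num_workers: int = 0
    seed: int = 1234

    @classmethod
    def from_json(cls, text: str) -> "LoaderConfig":
        raw = json.loads(text)
        unknown = set(raw) - set(cls.__dataclass_fields__)
        if unknown:  # fail fast on typos rather than silently using defaults
            raise ValueError(f"unknown config keys: {sorted(unknown)}")
        return cls(**raw)


# Usage: unspecified fields fall back to defaults.
cfg = LoaderConfig.from_json('{"batch_size": 16, "seed": 7}')
```

Keeping every tunable in one serializable object means the same config file can drive local runs, CI, and cloud training identically.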

Example use cases

  • Generate a PyTorch DataLoader with custom Dataset class that handles image augmentation and label mapping
  • Produce a TensorFlow tf.data pipeline for CSV tabular data with normalization and caching
  • Create a scikit-learn compatible loader that yields feature matrices and label arrays for cross-validation
  • Build configuration files for multiple environments (local dev, CI, cloud training) with dependency notes
  • Validate dataset schema and generate a small test harness to assert loader correctness
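The validation use case can be illustrated with a small check that covers the three failure modes mentioned above: missing fields, `None` values, and under-represented labels. The function and thresholds are assumptions for the sketch; real harnesses would encode a project-specific schema.

```python
from collections import Counter


def validate_samples(samples, required_keys: set, min_per_label: int = 1) -> list:
    """Return a list of human-readable issues; an empty list means the data passed."""
    issues = []
    for i, sample in enumerate(samples):
        missing = required_keys - sample.keys()
        if missing:
            issues.append(f"sample {i}: missing keys {sorted(missing)}")
        elif any(sample[k] is None for k in required_keys):
            issues.append(f"sample {i}: None value in a required field")
    # Check label distribution against a minimum count per class.
    labels = Counter(s.get("label") for s in samples if s.get("label") is not None)
    for label, count in labels.items():
        if count < min_per_label:
            issues.append(f"label {label!r}: only {count} sample(s)")
    return issues


# Usage: one None value, one missing key, and label "b" is under-represented.
samples = [
    {"x": 1.0, "label": "a"},
    {"x": None, "label": "a"},
    {"label": "b"},
]
issues = validate_samples(samples, required_keys={"x"}, min_per_label=2)
```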

FAQ

What frameworks does this skill support?

It covers common frameworks including PyTorch, TensorFlow, and scikit-learn, plus plain NumPy/pandas pipelines.

Will the generated loaders handle large datasets?

Yes. Generated code includes streaming, batching, and optional disk-backed caching patterns to handle large datasets efficiently.
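As a sketch of the streaming pattern, the generator below reads a CSV lazily and yields fixed-size batches, so only one batch is ever held in memory regardless of file size. It uses only the standard library; the function name is illustrative.

```python
import csv
import io
from itertools import islice


def stream_csv_batches(fileobj, batch_size: int = 2):
    """Lazily yield batches of row dicts from an open CSV file object."""
    reader = csv.DictReader(fileobj)
    while True:
        batch = list(islice(reader, batch_size))  # pulls at most batch_size rows
        if not batch:
            return
        yield batch


# Usage with an in-memory file; a real pipeline would pass open(path).
text = "x,y\n1,2\n3,4\n5,6\n"
batches = list(stream_csv_batches(io.StringIO(text), batch_size=2))
```

The same idea scales up in the frameworks themselves: PyTorch's `IterableDataset` and TensorFlow's `tf.data` pipelines both stream rather than materialize the full dataset.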