dataset-loader-creator skill

/skills/07-ml-training/dataset-loader-creator

This skill streamlines dataset-loader creation workflows by generating production-ready code, configurations, and best-practice guidance for ML training.

npx playbooks add skill jeremylongshore/claude-code-plugins-plus-skills --skill dataset-loader-creator


SKILL.md
---
name: "dataset-loader-creator"
description: |
  Create dataset loader code and configurations. Auto-activating skill for ML Training.
  Part of the ML Training skill category. Use when working with dataset loader functionality.
  Triggers on phrases like "dataset loader creator", "dataset creator", "dataset".
allowed-tools: "Read, Write, Edit, Bash(python:*), Bash(pip:*)"
version: 1.0.0
license: MIT
author: "Jeremy Longshore <[email protected]>"
---

# Dataset Loader Creator

## Overview

This skill provides automated assistance for dataset loader creation tasks within the ML Training domain.

## When to Use

This skill activates automatically when you:
- Mention "dataset loader creator" in your request
- Ask about dataset loader creator patterns or best practices
- Need help with ML training tasks such as data preparation, model training, hyperparameter tuning, or experiment tracking

## Instructions

1. Provide step-by-step guidance for dataset loader creation
2. Follow industry best practices and patterns
3. Generate production-ready code and configurations
4. Validate outputs against common standards

## Examples

**Example: Basic Usage**
Request: "Help me with dataset loader creator"
Result: Provides step-by-step guidance and generates appropriate configurations


## Prerequisites

- Relevant development environment configured
- Access to necessary tools and services
- Basic understanding of ML training concepts


## Output

- Generated configurations and code
- Best practice recommendations
- Validation results


## Error Handling

| Error | Cause | Solution |
|-------|-------|----------|
| Configuration invalid | Missing required fields | Check documentation for required parameters |
| Tool not found | Dependency not installed | Install required tools per prerequisites |
| Permission denied | Insufficient access | Verify credentials and permissions |


## Resources

- Official documentation for related tools
- Best practices guides
- Community examples and tutorials

## Related Skills

Part of the **ML Training** skill category.
Tags: ml, training, pytorch, tensorflow, sklearn

Overview

This skill automates the creation of dataset loaders for ML training workflows. It provides step-by-step guidance, generates production-ready loader code and configurations, and validates outputs against common standards to accelerate data preparation for model training.

How this skill works

The skill inspects user intent for dataset loader creation and responds by producing loader templates, configuration files, and usage examples for common frameworks (PyTorch, TensorFlow, scikit-learn). It enforces best-practice patterns such as batching, shuffling, augmentation hooks, and dataset validation, and returns runnable code plus notes on prerequisites and dependencies.
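To make the pattern concrete, here is a minimal, framework-agnostic sketch of the kind of loader the skill produces: a map-style dataset class (index-based access, mirroring the `__len__`/`__getitem__` contract of `torch.utils.data.Dataset`) plus a seeded, shuffled batch iterator. All names here are illustrative, not part of any framework API.

```python
import random
from typing import Any, Callable, Iterator, Optional, Sequence


class ListDataset:
    """Minimal map-style dataset: index-based access with an optional transform."""

    def __init__(self, samples: Sequence[Any],
                 transform: Optional[Callable[[Any], Any]] = None):
        self.samples = samples
        self.transform = transform

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> Any:
        item = self.samples[idx]
        return self.transform(item) if self.transform else item


def iter_batches(dataset, batch_size: int = 4,
                 shuffle: bool = True, seed: int = 0) -> Iterator[list]:
    """Yield fixed-size batches, optionally shuffled with a fixed seed."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # seeded for reproducibility
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


# Usage: 10 samples, doubled by the transform, batched in fours.
ds = ListDataset(list(range(10)), transform=lambda x: x * 2)
batches = list(iter_batches(ds, batch_size=4, seed=42))
```

In PyTorch the same structure maps onto a `Dataset` subclass wrapped in a `DataLoader`; in `tf.data` it corresponds to `from_tensor_slices(...).shuffle(...).batch(...)`.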

When to use it

  • When you need a reusable dataset loader for PyTorch, TensorFlow, or NumPy-based training pipelines
  • When preparing datasets that require splitting, augmentation, normalization, or on-the-fly transforms
  • When you want production-ready loader code and configuration for CI/CD or training pipelines
  • When you need validation checks for schema, missing values, and sample distribution
  • When you require examples and instructions to integrate loaders into training loops
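The last point — integrating a loader into a training loop — can be sketched with a toy example: fitting a single scalar weight by gradient descent over shuffled mini-batches of `(x, y)` pairs. The model and data here are hypothetical placeholders; only the loop structure (reshuffle each epoch, iterate batches, update parameters) reflects what a generated loader plugs into.

```python
import random


def train(dataset, epochs: int = 100, batch_size: int = 2, lr: float = 0.02) -> float:
    """Toy loop: fit scalar w to minimize (w*x - y)^2 over mini-batches."""
    w = 0.0
    for epoch in range(epochs):
        indices = list(range(len(dataset)))
        random.Random(epoch).shuffle(indices)  # reshuffle every epoch
        for start in range(0, len(indices), batch_size):
            batch = [dataset[i] for i in indices[start:start + batch_size]]
            # Mean gradient of the squared error over the batch.
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w


# Data generated by y = 2x, so w should converge to 2.
data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]
w = train(data)
```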

Best practices

  • Define clear schema and data contracts (types, ranges, expected shapes) before coding loaders
  • Implement streaming/batching for large datasets and keep transforms stateless where possible
  • Separate data ingestion, transformation, and augmentation into modular functions
  • Include validation steps for sample counts, label distribution, and missing values
  • Provide configuration-driven parameters (batch_size, shuffle, num_workers, seed) for reproducibility
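The configuration-driven point above can be sketched with a small dataclass that loads loader parameters from JSON and rejects unknown keys. The field names mirror the parameters listed above; the class itself is an illustrative pattern, not a fixed API.

```python
import json
from dataclasses import dataclass


@dataclass
class LoaderConfig:
    """Loader parameters kept in one place for reproducibility."""
    batch_size: int = 32
    shuffle: bool = True
    num_workers: int = 0
    seed: int = 1234

    @classmethod
    def from_json(cls, text: str) -> "LoaderConfig":
        raw = json.loads(text)
        unknown = set(raw) - set(cls.__dataclass_fields__)
        if unknown:  # fail fast on typos rather than silently using defaults
            raise ValueError(f"unknown config keys: {sorted(unknown)}")
        return cls(**raw)


# Usage: unspecified fields fall back to defaults.
cfg = LoaderConfig.from_json('{"batch_size": 16, "seed": 7}')
```

Keeping every tunable in one serializable object means the same config file can drive local runs, CI, and cloud training identically.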

Example use cases

  • Generate a PyTorch DataLoader with custom Dataset class that handles image augmentation and label mapping
  • Produce a TensorFlow tf.data pipeline for CSV tabular data with normalization and caching
  • Create a scikit-learn compatible loader that yields feature matrices and label arrays for cross-validation
  • Build configuration files for multiple environments (local dev, CI, cloud training) with dependency notes
  • Validate dataset schema and generate a small test harness to assert loader correctness
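The validation use case can be illustrated with a small check that covers the three failure modes mentioned above: missing fields, `None` values, and under-represented labels. The function and thresholds are assumptions for the sketch; real harnesses would encode a project-specific schema.

```python
from collections import Counter


def validate_samples(samples, required_keys: set, min_per_label: int = 1) -> list:
    """Return a list of human-readable issues; an empty list means the data passed."""
    issues = []
    for i, sample in enumerate(samples):
        missing = required_keys - sample.keys()
        if missing:
            issues.append(f"sample {i}: missing keys {sorted(missing)}")
        elif any(sample[k] is None for k in required_keys):
            issues.append(f"sample {i}: None value in a required field")
    # Check label distribution against a minimum count per class.
    labels = Counter(s.get("label") for s in samples if s.get("label") is not None)
    for label, count in labels.items():
        if count < min_per_label:
            issues.append(f"label {label!r}: only {count} sample(s)")
    return issues


# Usage: one None value, one missing key, and label "b" is under-represented.
samples = [
    {"x": 1.0, "label": "a"},
    {"x": None, "label": "a"},
    {"label": "b"},
]
issues = validate_samples(samples, required_keys={"x"}, min_per_label=2)
```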

FAQ

What frameworks does this skill support?

It covers common frameworks including PyTorch, TensorFlow, and scikit-learn, plus plain NumPy/pandas pipelines.

Will the generated loaders handle large datasets?

Yes. Generated code includes streaming, batching, and optional disk-backed caching patterns to handle large datasets efficiently.
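As a sketch of the streaming pattern, the generator below reads a CSV lazily and yields fixed-size batches, so only one batch is ever held in memory regardless of file size. It uses only the standard library; the function name is illustrative.

```python
import csv
import io
from itertools import islice


def stream_csv_batches(fileobj, batch_size: int = 2):
    """Lazily yield batches of row dicts from an open CSV file object."""
    reader = csv.DictReader(fileobj)
    while True:
        batch = list(islice(reader, batch_size))  # pulls at most batch_size rows
        if not batch:
            return
        yield batch


# Usage with an in-memory file; a real pipeline would pass open(path).
text = "x,y\n1,2\n3,4\n5,6\n"
batches = list(stream_csv_batches(io.StringIO(text), batch_size=2))
```

The same idea scales up in the frameworks themselves: PyTorch's `IterableDataset` and TensorFlow's `tf.data` pipelines both stream rather than materialize the full dataset.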