
csv-data-summarizer skill

/CSVDataSummarizer/csv-data-summarizer

This skill analyzes CSV files comprehensively, generates statistics and visualizations automatically, and presents a complete data report.

npx playbooks add skill leegonzales/aiskills --skill csv-data-summarizer


Files (10)
SKILL.md
3.0 KB
---
name: csv-data-summarizer
description: Analyzes CSV files and generates comprehensive summary statistics and visualizations using Python and pandas - automatically and immediately without asking what the user wants.
---

# CSV Data Summarizer

This skill analyzes CSV files and provides comprehensive summaries with statistical insights and visualizations.

## When to Use This Skill

Claude should use this skill whenever the user:
- Uploads or references a CSV file
- Asks to summarize, analyze, or visualize tabular data
- Requests insights from CSV data
- Wants to understand data structure and quality

## ⚠️ CRITICAL BEHAVIOR REQUIREMENT ⚠️

**DO NOT ASK THE USER WHAT THEY WANT TO DO WITH THE DATA.**
**DO NOT OFFER OPTIONS OR CHOICES.**
**DO NOT SAY "What would you like me to help you with?"**
**DO NOT LIST POSSIBLE ANALYSES.**

**IMMEDIATELY AND AUTOMATICALLY:**
1. Run the comprehensive analysis
2. Generate ALL relevant visualizations
3. Present complete results
4. NO questions, NO options, NO waiting for user input

**THE USER WANTS A FULL ANALYSIS RIGHT AWAY - JUST DO IT.**

## How It Works

The skill adapts to each dataset by inspecting the data first, then choosing the analyses that fit the column types it finds:

**Automatic Analysis Steps:**

1. **Load and inspect** - Read CSV into pandas DataFrame
2. **Identify structure** - Detect column types, dates, numerics, categories
3. **Determine analyses** - Adapt based on actual data content
4. **Generate visualizations** - Only those that make sense for this dataset
5. **Present complete output** - Everything in one comprehensive response
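
Steps 1 and 2 above can be sketched as follows. This is a minimal illustration, not the contents of `analyze.py`: the helper name `classify_columns`, the three bucket names, and the rule "a text column counts as a date only if every non-null value parses" are all assumptions made for the example.

```python
import io
import warnings

import pandas as pd


def classify_columns(df: pd.DataFrame) -> dict:
    """Group column names into numeric, datetime, and categorical buckets.

    Hypothetical helper for illustration; the skill's real logic may differ.
    """
    kinds = {"numeric": [], "datetime": [], "categorical": []}
    for name in df.columns:
        col = df[name]
        if pd.api.types.is_bool_dtype(col):
            # Booleans behave like two-level categories for plotting purposes
            kinds["categorical"].append(name)
        elif pd.api.types.is_numeric_dtype(col):
            kinds["numeric"].append(name)
        elif pd.api.types.is_datetime64_any_dtype(col):
            kinds["datetime"].append(name)
        else:
            # Text column: try to parse it as dates, silencing format warnings
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                parsed = pd.to_datetime(col, errors="coerce")
            non_null = col.notna().sum()
            if non_null > 0 and parsed.notna().sum() == non_null:
                kinds["datetime"].append(name)
            else:
                kinds["categorical"].append(name)
    return kinds


# Small in-memory CSV standing in for an uploaded file
sample = io.StringIO(
    "date,region,units,price\n"
    "2024-01-01,north,3,9.5\n"
    "2024-01-02,south,5,11.0\n"
)
df = pd.read_csv(sample)
kinds = classify_columns(df)
print(kinds)
```

Requiring every non-null value to parse is a conservative choice: it avoids treating free-text columns as dates just because a few entries happen to look like one.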

**Only creates visualizations that make sense:**
- Time-series plots ONLY if date/timestamp columns exist
- Correlation heatmaps ONLY if multiple numeric columns exist
- Category distributions ONLY if categorical columns exist
- Histograms ONLY if numeric columns exist
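
The rules above amount to a small decision table over column types. The sketch below shows one way to express it; `plan_plots` and the plot labels are hypothetical names chosen for this example, and treating every non-numeric, non-datetime column as categorical is an assumption.

```python
import pandas as pd


def plan_plots(df: pd.DataFrame) -> list:
    """Return the plot kinds that make sense for this DataFrame.

    Hypothetical helper: it only decides what to draw, it does not draw anything.
    """
    numeric = df.select_dtypes(include="number").columns.tolist()
    dates = df.select_dtypes(include=["datetime", "datetimetz"]).columns.tolist()
    # Assumption: anything that is neither numeric nor datetime is categorical
    categorical = [c for c in df.columns if c not in numeric and c not in dates]

    plots = []
    if dates:
        plots.append("time_series")
    if len(numeric) >= 2:
        plots.append("correlation_heatmap")
    if categorical:
        plots.append("category_distribution")
    if numeric:
        plots.append("histogram")
    return plots


mixed = pd.DataFrame(
    {
        "day": pd.to_datetime(["2024-01-01", "2024-01-02"]),
        "units": [3, 5],
        "region": ["north", "south"],
    }
)
numbers_only = pd.DataFrame({"height": [1.6, 1.8], "weight": [55.0, 80.0]})

mixed_plan = plan_plots(mixed)
numbers_plan = plan_plots(numbers_only)
print(mixed_plan)
print(numbers_plan)
```

Separating the decision from the drawing keeps the "only what makes sense" rule easy to check without rendering any figures.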

## Behavior Guidelines

✅ **CORRECT APPROACH - SAY THIS:**
- "I'll analyze this data comprehensively right now."
- "Here's the complete analysis with visualizations:"
- Then IMMEDIATELY show the full analysis

❌ **NEVER SAY THESE PHRASES:**
- "What would you like to do with this data?"
- "Here are some common options:"
- "I can create a comprehensive analysis if you'd like!"
- Any sentence ending with "?" asking for user direction

❌ **FORBIDDEN BEHAVIORS:**
- Asking what the user wants
- Listing options for the user to choose from
- Waiting for user direction before analyzing
- Providing partial analysis that requires follow-up
- Describing what you COULD do instead of DOING it

## Usage

The skill provides a Python function `summarize_csv(file_path)` that returns a comprehensive text summary with statistics and automatically generates multiple visualizations.
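
To show the calling pattern, the sketch below defines a simplified stand-in with the same name and signature. It is an assumption-laden illustration: the real `summarize_csv` lives in `analyze.py`, is not reproduced here, and also produces plots, which this stand-in does not. The file name `sales.csv` and its contents are made up for the example.

```python
import os
import tempfile

import pandas as pd


def summarize_csv(file_path: str) -> str:
    """Hypothetical stand-in: return a plain-text summary of a CSV file.

    The skill's actual function is not shown in this document and may differ.
    """
    df = pd.read_csv(file_path)
    lines = [
        f"Rows: {len(df)}, Columns: {len(df.columns)}",
        f"Column names: {', '.join(df.columns)}",
        f"Missing values: {int(df.isna().sum().sum())}",
    ]
    numeric = df.select_dtypes(include="number")
    if not numeric.empty:
        lines.append("Numeric summary:")
        lines.append(numeric.describe().to_string())
    return "\n".join(lines)


# Write a tiny CSV to a temporary directory and summarize it
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sales.csv")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("region,units\nnorth,3\nsouth,\n")
    report = summarize_csv(path)

print(report)
```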

## Technical Details

**Dependencies:** python>=3.8, pandas>=2.0.0, matplotlib>=3.7.0, seaborn>=0.12.0

**Files:**
- `analyze.py` - Core analysis logic
- `requirements.txt` - Python dependencies
- `examples/` - Sample datasets for testing

Overview

This skill analyzes CSV files and generates a full set of summary statistics and visualizations automatically. It runs the analysis immediately and returns a complete, ready-to-read report without asking the user for direction. The output combines data structure inspection, quality checks, statistical summaries, and context-aware plots.

How this skill works

The skill reads the CSV into a pandas DataFrame and inspects column types, missing values, and date formats. It then selects the relevant analyses (numeric summaries, categorical distributions, time-series plots, correlation matrices, and data quality diagnostics) and generates visualizations with matplotlib and seaborn. All applicable plots are created automatically and included alongside textual summaries and key metrics.
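
The data quality diagnostics mentioned above could look something like the following. This is a sketch under assumptions: the function name `quality_report` and the choice of metrics (dtype, missing count, missing percentage, unique values, duplicate rows) are illustrative and not taken from the skill's code.

```python
import io

import pandas as pd


def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Build a per-column table of basic data quality metrics (hypothetical helper)."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "missing_pct": (df.isna().mean() * 100).round(1),
            "unique": df.nunique(),
        }
    )


# In-memory CSV with one missing score and one fully duplicated row
sample = io.StringIO(
    "id,score,group\n"
    "1,0.5,a\n"
    "2,,b\n"
    "1,0.5,a\n"
)
df = pd.read_csv(sample)

report = quality_report(df)
duplicate_rows = int(df.duplicated().sum())

print(report)
print(f"Duplicate rows: {duplicate_rows}")
```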

When to use it

  • A user uploads or references a CSV and wants an immediate full analysis.
  • You need a quick snapshot of dataset structure, quality, and key statistics.
  • You are exploring unfamiliar tabular data before modeling or visualization work.
  • You want to validate data cleanliness, missingness, or unexpected value ranges.

Best practices

  • Provide the CSV with clear headers and consistent encoding (UTF-8 preferred).
  • Ensure date/time columns are in a parseable format for time-series detection.
  • For large files, consider sampling or using a smaller representative CSV to speed analysis.
  • Keep categorical labels consistent to produce meaningful distribution plots.
  • Review generated diagnostics and use them to guide downstream cleaning or modeling.
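
For the large-file advice above, one possible approach is to read the CSV in chunks and keep a random fraction of each chunk, so the full file never has to sit in memory at once. The helper name `sample_csv` and its default parameters are assumptions made for this sketch; the skill itself may handle large files differently or not at all.

```python
import io

import pandas as pd


def sample_csv(source, frac=0.1, chunksize=100_000, seed=0) -> pd.DataFrame:
    """Return a random sample of a CSV, read chunk by chunk (hypothetical helper)."""
    parts = []
    for i, chunk in enumerate(pd.read_csv(source, chunksize=chunksize)):
        # Vary the seed per chunk so each chunk draws different row positions
        parts.append(chunk.sample(frac=frac, random_state=seed + i))
    return pd.concat(parts, ignore_index=True)


# Build a 1,000-row CSV in memory to stand in for a large file
big_csv = "id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(1000))

sample = sample_csv(io.StringIO(big_csv), frac=0.1, chunksize=250)
print(len(sample))
print(sample.head())
```

Sampling per chunk keeps rows from every part of the file, which matters when the data is sorted by time or by category.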

Example use cases

  • A data analyst uploads sales.csv to get automated summaries, revenue trends, and customer segment distributions.
  • A product manager drops in event logs to receive immediate time-series activity charts and anomaly highlights.
  • A researcher provides experiment_results.csv to obtain descriptive statistics, missing-data reports, and correlation heatmaps.
  • A QA engineer scans exported system metrics CSV to surface unexpected value ranges and data-quality issues.

FAQ

Will the skill ask what analysis I want before running?

No. The skill runs a complete, automatic analysis immediately and presents the full results without prompting.

Which visualizations are produced?

Only visualizations relevant to the data are produced: histograms for numeric columns, correlation heatmaps when there are multiple numeric columns, category distributions when categorical columns exist, and time-series plots when date/time columns are present.