
csv-data-summarizer skill

/business-analyst/csv-data-summarizer

This skill analyzes a CSV file comprehensively, generates summary statistics and visualizations, and delivers actionable insights immediately.

This is most likely a fork of the csv-data-summarizer-claude-skill skill from coffeefuelbump
npx playbooks add skill jst-well-dan/skill-box --skill csv-data-summarizer

Review the files below or copy the command above to add this skill to your agents.

Files (7)
SKILL.md
5.6 KB
---
name: csv-data-summarizer
description: Analyzes CSV files, generates summary stats, and plots quick visualizations using Python and pandas.
metadata:
  version: 2.1.0
  dependencies: python>=3.8, pandas>=2.0.0, matplotlib>=3.7.0, seaborn>=0.12.0
---

# CSV Data Summarizer

This Skill analyzes CSV files and provides comprehensive summaries with statistical insights and visualizations.

## When to Use This Skill

Claude should use this Skill whenever the user:
- Uploads or references a CSV file
- Asks to summarize, analyze, or visualize tabular data
- Requests insights from CSV data
- Wants to understand data structure and quality

## How It Works

### ⚠️ CRITICAL BEHAVIOR REQUIREMENT ⚠️

**DO NOT ASK THE USER WHAT THEY WANT TO DO WITH THE DATA.**
**DO NOT OFFER OPTIONS OR CHOICES.**
**DO NOT SAY "What would you like me to help you with?"**
**DO NOT LIST POSSIBLE ANALYSES.**

**IMMEDIATELY AND AUTOMATICALLY:**
1. Run the comprehensive analysis
2. Generate ALL relevant visualizations
3. Present complete results
4. NO questions, NO options, NO waiting for user input

**THE USER WANTS A FULL ANALYSIS RIGHT AWAY - JUST DO IT.**

### Automatic Analysis Steps:

**The skill intelligently adapts to different data types and industries by inspecting the data first, then determining what analyses are most relevant.**

1. **Load and inspect** the CSV file as a pandas DataFrame
2. **Identify data structure** - column types, date columns, numeric columns, categories
3. **Determine relevant analyses** based on what's actually in the data:
   - **Sales/E-commerce data** (order dates, revenue, products): Time-series trends, revenue analysis, product performance
   - **Customer data** (demographics, segments, regions): Distribution analysis, segmentation, geographic patterns
   - **Financial data** (transactions, amounts, dates): Trend analysis, statistical summaries, correlations
   - **Operational data** (timestamps, metrics, status): Time-series, performance metrics, distributions
   - **Survey data** (categorical responses, ratings): Frequency analysis, cross-tabulations, distributions
   - **Generic tabular data**: Adapts based on column types found

4. **Only create visualizations that make sense** for the specific dataset:
   - Time-series plots ONLY if date/timestamp columns exist
   - Correlation heatmaps ONLY if multiple numeric columns exist
   - Category distributions ONLY if categorical columns exist
   - Histograms for numeric distributions when relevant
   
5. **Generate comprehensive output** automatically including:
   - Data overview (rows, columns, types)
   - Key statistics and metrics relevant to the data type
   - Missing data analysis
   - Multiple relevant visualizations (only those that apply)
   - Actionable insights based on patterns found in THIS specific dataset
   
6. **Present everything** in one complete analysis - no follow-up questions

**Example adaptations:**
- Healthcare data with patient IDs → Focus on demographics, treatment patterns, temporal trends
- Inventory data with stock levels → Focus on quantity distributions, reorder patterns, SKU analysis  
- Web analytics with timestamps → Focus on traffic patterns, conversion metrics, time-of-day analysis
- Survey responses → Focus on response distributions, demographic breakdowns, sentiment patterns
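
The column-driven selection in steps 2 to 4 can be sketched roughly as follows. This is a minimal illustration, not the contents of `analyze.py`: the helper names `classify_columns` and `plan_visualizations` and the chart labels are hypothetical, and it assumes date columns have already been parsed to a datetime dtype.

```python
import pandas as pd


def classify_columns(df: pd.DataFrame) -> dict:
    """Group column names into date, numeric, and categorical (hypothetical helper)."""
    date_cols = [c for c in df.columns if pd.api.types.is_datetime64_any_dtype(df[c])]
    numeric_cols = [
        c for c in df.columns
        if pd.api.types.is_numeric_dtype(df[c])
        and not pd.api.types.is_bool_dtype(df[c])  # treat booleans as categories
    ]
    categorical_cols = [
        c for c in df.columns if c not in date_cols and c not in numeric_cols
    ]
    return {"date": date_cols, "numeric": numeric_cols, "categorical": categorical_cols}


def plan_visualizations(df: pd.DataFrame) -> list:
    """Return labels of the charts that apply to this dataset (labels are made up)."""
    cols = classify_columns(df)
    plan = []
    if cols["date"]:
        plan.append("time_series")
    if len(cols["numeric"]) >= 2:
        plan.append("correlation_heatmap")
    if cols["categorical"]:
        plan.append("category_distribution")
    if cols["numeric"]:
        plan.append("histogram")
    return plan


df = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "revenue": [10.0, 12.5, 9.0],
    "units": [1, 2, 1],
    "region": ["north", "south", "north"],
})
print(plan_visualizations(df))
```

Returning a plan before drawing anything keeps the "only what applies" rule in one place, so the plotting code never has to re-check column types.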

### Behavior Guidelines

✅ **CORRECT APPROACH - SAY THIS:**
- "I'll analyze this data comprehensively right now."
- "Here's the complete analysis with visualizations:"
- "I've identified this as [type] data and generated relevant insights:"
- Then IMMEDIATELY show the full analysis

✅ **DO:**
- Immediately run the analysis script
- Generate ALL relevant charts automatically
- Provide complete insights without being asked
- Be thorough and complete in the first response
- Act decisively without asking permission

❌ **NEVER SAY THESE PHRASES:**
- "What would you like to do with this data?"
- "What would you like me to help you with?"
- "Here are some common options:"
- "Let me know what you'd like help with"
- "I can create a comprehensive analysis if you'd like!"
- Any sentence ending with "?" asking for user direction
- Any list of options or choices
- Any conditional "I can do X if you want"

❌ **FORBIDDEN BEHAVIORS:**
- Asking what the user wants
- Listing options for the user to choose from
- Waiting for user direction before analyzing
- Providing partial analysis that requires follow-up
- Describing what you COULD do instead of DOING it

### Usage

The Skill provides a Python function `summarize_csv(file_path)` that:
- Accepts a path to a CSV file
- Returns a comprehensive text summary with statistics
- Generates multiple visualizations automatically based on data structure
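
To make the call shape concrete, here is a hypothetical stand-in with the same signature. It is not the function shipped in `analyze.py`; it only builds a text summary (no plots), and the exact report layout is an assumption.

```python
import io

import pandas as pd


def summarize_csv(file_path) -> str:
    """Hypothetical stand-in: return a plain-text summary of a CSV file."""
    df = pd.read_csv(file_path)
    numeric = df.select_dtypes(include="number")
    lines = [
        f"Rows: {len(df)}, Columns: {df.shape[1]}",
        f"Numeric columns: {list(numeric.columns)}",
        f"Missing cells: {int(df.isna().sum().sum())}",
    ]
    if not numeric.empty:
        lines.append("Summary statistics:")
        lines.append(numeric.describe().round(2).to_string())
    return "\n".join(lines)


# pandas.read_csv also accepts file-like objects, so an in-memory CSV
# keeps the example self-contained.
sample = io.StringIO("order_id,amount,region\n1,50.0,north\n2,,south\n3,70.0,north\n")
print(summarize_csv(sample))
```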

### Example Prompts

> "Here's `sales_data.csv`. Can you summarize this file?"

> "Analyze this customer data CSV and show me trends."

> "What insights can you find in `orders.csv`?"

### Example Output

**Dataset Overview**
- 5,000 rows × 8 columns  
- 3 numeric columns, 1 date column  

**Summary Statistics**
- Average order value: $58.20
- Standard deviation: $12.40
- Missing values: 100 cells (0.25% of 40,000)

**Insights**
- Sales show upward trend over time
- Peak activity in Q4

*(Attached: trend plot)*

## Files

- `analyze.py` - Core analysis logic
- `requirements.txt` - Python dependencies
- `resources/sample.csv` - Example dataset for testing
- `resources/README.md` - Additional documentation

## Notes

- Automatically detects date columns (columns whose name contains 'date')
- Handles missing data gracefully
- Generates time-series visualizations only when date columns are present
- All numeric columns are included in statistical summary
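
Name-based date detection could look something like the sketch below. The function names are hypothetical, the case-insensitive match is an assumption, and coercing unparseable values to `NaT` is one possible reading of "handles missing data gracefully".

```python
import pandas as pd


def detect_date_columns(df: pd.DataFrame) -> list:
    """Return columns whose name contains 'date' (case-insensitive by assumption)."""
    return [c for c in df.columns if "date" in str(c).lower()]


def parse_date_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with detected date columns parsed; bad values become NaT."""
    out = df.copy()
    for col in detect_date_columns(out):
        out[col] = pd.to_datetime(out[col], errors="coerce")
    return out


df = pd.DataFrame({
    "Order_Date": ["2024-01-05", "not a date", "2024-02-10"],
    "amount": [10, 20, 30],
})
parsed = parse_date_columns(df)
print(detect_date_columns(df))
print(parsed.dtypes)
```

A name-only rule misses columns such as `created_at` or `timestamp`, which is why clear column naming matters for this kind of detection.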

Overview

This skill analyzes CSV files to produce a complete, automated summary with statistics and visualizations. It runs a full analysis immediately on the provided file and returns a consolidated text report plus plots. Use the summarize_csv(file_path) function to get a ready-to-read summary and relevant charts without further prompts.

How this skill works

The skill loads the CSV into a pandas DataFrame, infers column types (numeric, categorical, date), and inspects data quality and missingness. It automatically selects and runs the relevant analyses (descriptive statistics, correlations, time-series trends, and category distributions), then generates only the visualizations that make sense for the detected columns. Finally, it assembles a single comprehensive report containing overview metrics, key statistics, missing-data diagnostics, visual outputs, and concise actionable insights.
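
As an illustration of the missing-data diagnostics mentioned here, a per-column report might be computed as below. This is a sketch under assumptions: `missing_data_report` is a hypothetical name and the report columns are invented for the example.

```python
import pandas as pd


def missing_data_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column missing counts and percentages, worst columns first (hypothetical)."""
    counts = df.isna().sum()
    percent = (counts / len(df) * 100).round(1) if len(df) else 0.0
    report = pd.DataFrame({"missing": counts, "percent": percent})
    # Keep only columns that actually have gaps.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


df = pd.DataFrame({
    "a": [1, None, 3, None],
    "b": ["x", "y", None, "z"],
    "c": [1, 2, 3, 4],
})
report = missing_data_report(df)
print(report)
```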

When to use it

  • When a CSV file is uploaded or referenced and you want a full, immediate analysis.
  • To get quick descriptive statistics and data-quality checks across tabular data.
  • To discover time trends automatically if the file contains date/timestamp columns.
  • When you need correlation matrices, histograms, or category frequency charts without manual setup.
  • To obtain an end-to-end report for sales, customer, financial, operational, or survey data.

Best practices

  • Provide a well-formatted CSV with headers in the first row and consistent delimiters.
  • Name date-related columns using clear terms (e.g., 'date', 'order_date') to improve automatic detection.
  • Ensure sensitive or personal data is anonymized before uploading for analysis.
  • If the dataset is very large, provide a representative sample to speed up iterative exploration.
  • Use consistent category labels and numeric formats to improve accuracy of summaries and charts.

Example use cases

  • Summarize monthly revenue and product performance from an e-commerce orders CSV.
  • Inspect customer demographics and segment distributions from a CRM export.
  • Analyze transaction time-series and detect trends or seasonality in financial logs.
  • Assess data quality, missingness, and basic statistics for a survey response file.
  • Generate histograms and correlation heatmaps for a metrics dataset with multiple numeric columns.

FAQ

What output does the skill produce?

It returns a consolidated text summary with dataset overview, descriptive statistics, missing-data diagnostics, and actionable insights, plus generated plots, which are saved or embedded depending on the environment.

Which visualizations are created?

Only charts that match the detected data: time-series for date columns, histograms for numeric fields, bar charts for categorical distributions, and correlation heatmaps when multiple numeric columns exist.
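
A rough sketch of that chart selection with plain matplotlib follows. It is not the skill's own plotting code: `save_applicable_charts` and the file names are hypothetical, only the first column of each kind is plotted, and the time-series branch additionally assumes a numeric column to put on the y-axis.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def save_applicable_charts(df: pd.DataFrame, out_dir: str) -> list:
    """Save only the charts the columns support; return the paths written."""
    numeric = list(df.select_dtypes(include="number").columns)
    dates = list(df.select_dtypes(include="datetime").columns)
    categorical = [c for c in df.columns if c not in numeric and c not in dates]
    saved = []

    def save(fig, name):
        path = os.path.join(out_dir, name)
        fig.savefig(path)
        plt.close(fig)
        saved.append(path)

    if dates and numeric:  # time series needs a date axis and a value to plot
        fig, ax = plt.subplots()
        df.sort_values(dates[0]).plot(x=dates[0], y=numeric[0], ax=ax)
        save(fig, "time_series.png")
    if numeric:
        fig, ax = plt.subplots()
        ax.hist(df[numeric[0]].dropna())
        ax.set_title(f"Distribution of {numeric[0]}")
        save(fig, "histogram.png")
    if categorical:
        fig, ax = plt.subplots()
        df[categorical[0]].value_counts().plot(kind="bar", ax=ax)
        save(fig, "category_bar.png")
    if len(numeric) >= 2:  # a correlation matrix needs at least two numeric columns
        corr = df[numeric].corr()
        fig, ax = plt.subplots()
        im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
        ax.set_xticks(range(len(numeric)))
        ax.set_xticklabels(numeric, rotation=45)
        ax.set_yticks(range(len(numeric)))
        ax.set_yticklabels(numeric)
        fig.colorbar(im, ax=ax)
        save(fig, "correlation_heatmap.png")
    return saved


df = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02", "2024-01-04"]),
    "revenue": [9.0, 10.0, 12.5, 11.0],
    "units": [1, 1, 2, 3],
    "region": ["north", "north", "south", "east"],
})
with tempfile.TemporaryDirectory() as tmp:
    print([os.path.basename(p) for p in save_applicable_charts(df, tmp)])
```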