This skill analyzes uploaded CSV files with Python and pandas, providing comprehensive statistics, missing data insights, and relevant visualizations.
npx playbooks add skill 224-industries/224-agent-skills --skill csv-data-analyst
---
name: csv-data-analyst
description: Analyze CSV files, generate summary statistics, and create visualizations using Python and pandas. Use when the user uploads, attaches, or references a CSV file, asks to summarize or analyze tabular data, requests insights from CSV data, or wants to understand data structure and quality.
license: MIT
compatibility: "python>=3.8, pandas>=2.0.0, matplotlib>=3.7.0, seaborn>=0.12.0"
metadata:
  author: "[Ben Sabic](https://bensabic.ca)"
  role: "Fractional CTO"
  version: "1.0.0"
---
# CSV Data Analyst
This skill analyzes CSV files and provides comprehensive summaries with statistical insights and visualizations.
## When to Use This Skill
Claude MUST use this Skill whenever the user:
- Uploads, attaches, or references a CSV file
- Asks to summarize, analyze, or visualize tabular data
- Requests insights from CSV data
- Wants to understand data structure and quality
## ⚠️ CRITICAL BEHAVIOR REQUIREMENT ⚠️
**DO NOT ASK THE USER WHAT THEY WANT TO DO WITH THE DATA.**
**DO NOT OFFER OPTIONS OR CHOICES.**
**DO NOT SAY "What would you like me to help you with?"**
**DO NOT LIST POSSIBLE ANALYSES.**
**IMMEDIATELY AND AUTOMATICALLY:**
1. Run the comprehensive analysis
2. Generate ALL relevant visualizations
3. Present complete results
4. NO questions, NO options, NO waiting for user input
**THE USER WANTS A FULL ANALYSIS RIGHT AWAY - JUST DO IT.**
### Automatic Analysis Steps:
**The skill intelligently adapts to different data types and industries by inspecting the data first, then determining what analyses are most relevant.**
1. **Load and inspect** - read the CSV file into a pandas DataFrame and examine its contents
2. **Identify data structure** - column types, date columns, numeric columns, categories
3. **Determine relevant analyses** based on what's actually in the data:
   - **Sales/E-commerce data** (order dates, revenue, products): Time-series trends, revenue analysis, product performance
   - **Customer data** (demographics, segments, regions): Distribution analysis, segmentation, geographic patterns
   - **Financial data** (transactions, amounts, dates): Trend analysis, statistical summaries, correlations
   - **Operational data** (timestamps, metrics, status): Time-series, performance metrics, distributions
   - **Survey data** (categorical responses, ratings): Frequency analysis, cross-tabulations, distributions
   - **Generic tabular data**: Adapts based on column types found
4. **Only create visualizations that make sense** for the specific dataset:
   - Time-series plots ONLY if date/timestamp columns exist
   - Correlation heatmaps ONLY if multiple numeric columns exist
   - Category distributions ONLY if categorical columns exist
   - Histograms for numeric distributions when relevant
5. **Generate comprehensive output** automatically including:
   - Data overview (rows, columns, types)
   - Key statistics and metrics relevant to the data type
   - Missing data analysis
   - Multiple relevant visualizations (only those that apply)
   - Actionable insights based on patterns found in THIS specific dataset
6. **Present everything** in one complete analysis - no follow-up questions
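
Steps 2 to 4 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of `scripts/analyze.py`: the helper names `classify_columns` and `plan_visualizations`, the chart labels, and the inline sample data are all hypothetical, and treating every non-numeric, non-date column as categorical is a simplification.

```python
import io

import pandas as pd


def classify_columns(df: pd.DataFrame) -> dict:
    """Group column names by broad type (hypothetical helper)."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    dates = df.select_dtypes(include="datetime").columns.tolist()
    # Assumption: anything that is neither numeric nor datetime is categorical.
    categorical = [c for c in df.columns if c not in numeric and c not in dates]
    return {"numeric": numeric, "date": dates, "categorical": categorical}


def plan_visualizations(groups: dict) -> list:
    """Pick chart types that apply to the detected column groups."""
    plan = []
    if groups["date"]:
        plan.append("time_series")
    if len(groups["numeric"]) >= 2:
        plan.append("correlation_heatmap")
    if groups["categorical"]:
        plan.append("category_distribution")
    if groups["numeric"]:
        plan.append("histogram")
    return plan


# Made-up sample data, parsed from an in-memory string instead of a file.
csv_text = (
    "order_date,region,revenue,units\n"
    "2024-01-05,East,120.5,3\n"
    "2024-01-06,West,80.0,2\n"
)
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["order_date"])
groups = classify_columns(df)
plan = plan_visualizations(groups)
print(groups)
print(plan)
```

Returning a list of chart labels keeps the decision about which charts apply separate from the plotting code, so the "ONLY if" rules in step 4 can be checked without rendering anything.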
**Example adaptations:**
- Healthcare data with patient IDs → Focus on demographics, treatment patterns, temporal trends
- Inventory data with stock levels → Focus on quantity distributions, reorder patterns, SKU analysis
- Web analytics with timestamps → Focus on traffic patterns, conversion metrics, time-of-day analysis
- Survey responses → Focus on response distributions, demographic breakdowns, sentiment patterns
### Behavior Guidelines
✅ **CORRECT APPROACH - SAY THIS:**
- "I'll analyze this data comprehensively right now."
- "Here's the complete analysis with visualizations:"
- "I've identified this as [type] data and generated relevant insights:"
- Then IMMEDIATELY show the full analysis
✅ **DO:**
- Immediately run the analysis script
- Generate ALL relevant charts automatically
- Provide complete insights without being asked
- Be thorough and complete in the first response
- Act decisively without asking permission
❌ **NEVER SAY THESE PHRASES:**
- "What would you like to do with this data?"
- "What would you like me to help you with?"
- "Here are some common options:"
- "Let me know what you'd like help with"
- "I can create a comprehensive analysis if you'd like!"
- Any sentence ending with "?" asking for user direction
- Any list of options or choices
- Any conditional "I can do X if you want"
❌ **FORBIDDEN BEHAVIORS:**
- Asking what the user wants
- Listing options for the user to choose from
- Waiting for user direction before analyzing
- Providing partial analysis that requires follow-up
- Describing what you COULD do instead of DOING it
### Usage
The Skill provides a Python function `summarize_csv(file_path)` that:
- Accepts a path to a CSV file
- Returns a comprehensive text summary with statistics
- Generates multiple visualizations automatically based on data structure
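
To make the shape of such a function concrete, here is a hedged sketch of the text-summary half only. It is not the real `summarize_csv` from `scripts/analyze.py`, whose source is not shown here; the name `summarize_csv_sketch`, the report layout, and the sample file are assumptions, and chart generation is left out.

```python
import os
import tempfile

import pandas as pd


def summarize_csv_sketch(file_path: str) -> str:
    """Return a plain-text overview of a CSV file (illustrative only)."""
    df = pd.read_csv(file_path)
    lines = [f"Rows: {len(df):,}", f"Columns: {df.shape[1]}"]

    # List each column with the dtype pandas inferred for it.
    lines.append("Column types:")
    for name, dtype in df.dtypes.items():
        lines.append(f"  {name}: {dtype}")

    # Count missing cells across the whole table.
    missing = int(df.isna().sum().sum())
    total = df.size
    pct = (missing / total * 100) if total else 0.0
    lines.append(f"Missing cells: {missing} of {total} ({pct:.1f}%)")

    # Describe numeric columns only when at least one exists.
    numeric = df.select_dtypes(include="number")
    if not numeric.empty:
        lines.append("Numeric summary:")
        lines.append(numeric.describe().round(2).to_string())
    return "\n".join(lines)


# Usage with a small made-up file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "orders.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write("order_id,amount,region\n1,50.0,East\n2,,West\n3,70.0,\n")
    report = summarize_csv_sketch(path)
print(report)
```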
### Example Prompts
> "Here's `sales_data.csv`. Can you summarize this file?"
> "Analyze this customer data CSV and show me trends."
> "What insights can you find in `orders.csv`?"
### Example Output
**Dataset Overview**
- 5,000 rows × 8 columns
- 3 numeric columns, 1 date column
**Summary Statistics**
- Average order value: $58.20
- Standard deviation: $12.40
- Missing values: 100 (2% of rows)
**Insights**
- Sales show an upward trend over time
- Peak activity in Q4
*(Attached: trend plot)*
## Files
- `scripts/analyze.py` - Core analysis logic
- `assets/sample.csv` - Example dataset for testing
## Notes
- Automatically detects date columns (columns whose name contains 'date' or 'time')
- Handles missing data gracefully
- Generates visualizations based on data types present (time-series, distributions, correlations, categorical)
- All numeric columns are included in statistical summary
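
The name-based date detection in the first note could look roughly like the sketch below. The function `detect_date_columns` is hypothetical, and two behaviours are assumptions added for safety rather than documented features of the skill: numeric columns are skipped (so a name like `lifetime_value` is not misread as timestamps), and a column is converted only if at least one value parses.

```python
import pandas as pd


def detect_date_columns(df: pd.DataFrame) -> list:
    """Convert columns named like dates to datetime, in place, and return their names."""
    detected = []
    for col in df.columns:
        name = str(col).lower()
        if "date" not in name and "time" not in name:
            continue
        # Assumption: numeric columns are never treated as dates.
        if pd.api.types.is_numeric_dtype(df[col]):
            continue
        # Unparseable values become NaT instead of raising an error.
        parsed = pd.to_datetime(df[col], errors="coerce")
        if parsed.notna().any():
            df[col] = parsed
            detected.append(col)
    return detected


# Made-up sample data with one bad date and one misleading column name.
df = pd.DataFrame(
    {
        "order_date": ["2024-01-05", "2024-02-10", "not a date"],
        "lifetime_value": [10.0, 20.0, 30.0],
        "status": ["ok", "ok", "late"],
    }
)
detected = detect_date_columns(df)
print(detected)
print(df.dtypes)
```

Using `errors="coerce"` is one way to handle bad values gracefully: they surface as missing data in the later report instead of aborting the analysis.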
This skill analyzes CSV files to produce a complete, automated data summary and visual report. It inspects structure, produces statistical summaries, detects data types and missing values, and generates relevant visualizations. The skill is optimized for immediate, end-to-end analysis whenever a CSV is provided—no follow-up questions required.
The skill loads the CSV into a pandas DataFrame and inspects columns to identify numeric, categorical, and date/time fields. It selects appropriate analyses (time-series, distributions, correlations, category counts) based on detected column types and generates visualizations only where they make sense. Output includes data overview, missing-data diagnostics, key statistics for numeric fields, category breakdowns, and actionable insights tailored to the dataset.
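
As an illustration of the missing-data diagnostics and category breakdowns mentioned above, the following sketch builds a per-column missing-value table and a simple category count. The helper name `missing_report`, the column labels `missing` and `percent`, and the sample data are assumptions, not the skill's actual output format.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing counts and percentages for columns that have gaps."""
    counts = df.isna().sum()
    report = pd.DataFrame(
        {"missing": counts, "percent": (counts / len(df) * 100).round(1)}
    )
    # Keep only columns with at least one missing value, worst first.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Made-up sample data.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "age": [34, None, 29, None],
        "city": ["Oslo", "Lima", None, "Oslo"],
    }
)
report = missing_report(df)
city_counts = df["city"].value_counts()  # category breakdown, NaN excluded
print(report)
print(city_counts)
```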
What does the analysis include by default?
A data overview (rows/columns/types), missing-value summary, numeric statistics, relevant visualizations (time-series, histograms, correlation heatmap, category counts), and actionable insights tailored to the file.
Can I limit which charts are generated?
The skill is designed to run a comprehensive, automatic analysis and will generate only the visualizations that are appropriate for the detected column types.