
data-storyteller skill

/data-storyteller

This skill converts CSV or Excel data into narrative reports with auto-generated insights, visuals, and PDF exports to inform decisions.

npx playbooks add skill dkyazzentwatwa/chatgpt-skills --skill data-storyteller


SKILL.md
---
name: data-storyteller
description: Transform CSV/Excel data into narrative reports with auto-generated insights, visualizations, and PDF export. Auto-detects patterns and creates plain-English summaries.
---

# Data Storyteller

Automatically transform raw data into compelling, insight-rich reports. Upload any CSV or Excel file and get back a complete analysis with visualizations, statistical summaries, and narrative explanations - all without writing code.

## Core Workflow

### 1. Load and Analyze Data

```python
from scripts.data_storyteller import DataStoryteller

# Initialize with your data file
storyteller = DataStoryteller("your_data.csv")

# Or from a pandas DataFrame
import pandas as pd
df = pd.read_csv("your_data.csv")
storyteller = DataStoryteller(df)
```

### 2. Generate Full Report

```python
# Generate comprehensive report
report = storyteller.generate_report()

# Access components
print(report['summary'])           # Executive summary
print(report['insights'])          # Key findings
print(report['statistics'])        # Statistical analysis
print(report['visualizations'])    # Generated chart info
```

### 3. Export Options

```python
# Export to PDF
storyteller.export_pdf("analysis_report.pdf")

# Export to HTML (interactive charts)
storyteller.export_html("analysis_report.html")

# Export charts only
storyteller.export_charts("charts/", format="png")
```

## Quick Start Examples

### Basic Analysis
```python
from scripts.data_storyteller import DataStoryteller

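# Minimal full analysis: generate_report() returns the report contents,
# so the export is called on the storyteller itself
storyteller = DataStoryteller("sales_data.csv")
storyteller.generate_report()
storyteller.export_pdf("report.pdf")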
```

### Custom Analysis
```python
storyteller = DataStoryteller("data.csv")

# Focus on specific columns
storyteller.analyze_columns(['revenue', 'customers', 'date'])

# Set analysis parameters
report = storyteller.generate_report(
    include_correlations=True,
    include_outliers=True,
    include_trends=True,
    time_column='date',
    chart_style='business'
)
```

## Features

### Auto-Detection
- **Column Types**: Numeric, categorical, datetime, text, boolean
- **Data Quality**: Missing values, duplicates, outliers
- **Relationships**: Correlations, dependencies, groupings
- **Time Series**: Trends, seasonality, anomalies
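
As a rough illustration of column-type detection, the sketch below classifies a pandas Series into the five types listed above. The function name `detect_column_type` and the cardinality cutoff are assumptions made for this example, not the skill's actual implementation.

```python
import pandas as pd
from pandas.api import types as ptypes


def detect_column_type(series: pd.Series, max_categories: int = 20) -> str:
    """Classify a column as boolean, numeric, datetime, categorical, or text.

    Hypothetical helper: the cutoff between categorical and text is an
    assumed heuristic based on the number of distinct values.
    """
    # Check booleans first, because pandas also treats bool as numeric.
    if ptypes.is_bool_dtype(series):
        return "boolean"
    if ptypes.is_numeric_dtype(series):
        return "numeric"
    if ptypes.is_datetime64_any_dtype(series):
        return "datetime"
    # Remaining columns are string-like: few distinct values means categorical.
    if series.nunique(dropna=True) <= max_categories:
        return "categorical"
    return "text"


sample = pd.DataFrame({
    "is_active": [True, False, True],
    "revenue": [10.5, 20.0, 7.25],
    "signup": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-03-01"]),
    "plan": ["basic", "pro", "basic"],
})
detected = {name: detect_column_type(sample[name]) for name in sample.columns}
print(detected)
```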

### Generated Visualizations
| Data Type | Charts Generated |
|-----------|-----------------|
| Numeric | Histogram, box plot, trend line |
| Categorical | Bar chart, pie chart, frequency table |
| Time Series | Line chart, decomposition, forecast |
| Correlations | Heatmap, scatter matrix |
| Comparisons | Grouped bar, stacked area |

### Narrative Insights
The storyteller generates plain-English insights including:
- Executive summary of key findings
- Notable patterns and anomalies
- Statistical significance notes
- Actionable recommendations
- Data quality warnings
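
One simple way to turn a statistic into a plain-English sentence is a template keyed on thresholds. The sketch below does this for a correlation result. The function `describe_correlation` and its strength cutoffs (0.7 and 0.4) are illustrative assumptions, not the skill's narrative engine.

```python
def describe_correlation(left: str, right: str, r: float, p: float,
                         alpha: float = 0.05) -> str:
    """Render a correlation result as one sentence (hypothetical helper)."""
    # Assumed cutoffs for wording; adjust to taste.
    if abs(r) >= 0.7:
        strength = "Strong"
    elif abs(r) >= 0.4:
        strength = "Moderate"
    else:
        strength = "Weak"
    direction = "positive" if r > 0 else "negative"
    verdict = ("statistically significant" if p < alpha
               else "not statistically significant")
    return (f"{strength} {direction} correlation between {left} and {right} "
            f"(r={r:.2f}); {verdict} at alpha={alpha}.")


sentence = describe_correlation("marketing_spend", "new_customers", 0.78, 0.0004)
print(sentence)
```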

## Output Sections

### 1. Executive Summary
High-level overview of the dataset and key findings in 2-3 paragraphs.

### 2. Data Profile
- Row/column counts
- Memory usage
- Missing value analysis
- Duplicate detection
- Data type distribution
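
The profile items above map closely onto standard pandas calls. The following is a minimal sketch, assuming a hypothetical `profile` helper rather than the skill's own code.

```python
import pandas as pd


def profile(df: pd.DataFrame) -> dict:
    """Collect basic profile figures for a DataFrame (hypothetical helper)."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        # deep=True also counts the memory held by string values
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        "missing_per_column": df.isna().sum().to_dict(),
        # counts rows that repeat an earlier row exactly
        "duplicate_rows": int(df.duplicated().sum()),
        "dtype_counts": df.dtypes.astype(str).value_counts().to_dict(),
    }


sample = pd.DataFrame({
    "region": ["north", "south", "south", None],
    "revenue": [120.0, 95.5, 95.5, 110.0],
})
data_profile = profile(sample)
print(data_profile)
```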

### 3. Statistical Analysis
For each numeric column:
- Central tendency (mean, median, mode)
- Dispersion (std dev, IQR, range)
- Distribution shape (skewness, kurtosis)
- Outlier count
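
For reference, the per-column statistics listed above can be computed with pandas alone. This sketch assumes a hypothetical `describe_numeric` helper and uses the common 1.5 × IQR fence for outliers, which may differ from what the skill does internally.

```python
import pandas as pd


def describe_numeric(values: pd.Series) -> dict:
    """Summarize one numeric column (hypothetical helper)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    # Assumed rule: points beyond 1.5 * IQR from the quartiles are outliers.
    outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]
    return {
        "mean": values.mean(),
        "median": values.median(),
        "mode": values.mode().tolist(),
        "std": values.std(),
        "iqr": iqr,
        "range": values.max() - values.min(),
        "skewness": values.skew(),
        "kurtosis": values.kurt(),
        "outlier_count": len(outliers),
    }


revenue = pd.Series([10, 12, 12, 13, 14, 15, 100], name="revenue")
revenue_stats = describe_numeric(revenue)
print(revenue_stats)
```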

### 4. Categorical Analysis
For each categorical column:
- Unique values count
- Top/bottom categories
- Frequency distribution
- Category balance assessment
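
A comparable sketch for categorical columns, built on `value_counts`. The helper name `describe_categorical` and the use of the largest category's share as a balance measure are assumptions for illustration.

```python
import pandas as pd


def describe_categorical(values: pd.Series, top_n: int = 3) -> dict:
    """Summarize one categorical column (hypothetical helper)."""
    counts = values.value_counts()  # sorted from most to least frequent
    shares = counts / counts.sum()
    return {
        "unique": int(values.nunique()),
        "top": counts.head(top_n).to_dict(),
        "bottom": counts.tail(top_n).to_dict(),
        # Assumed balance measure: share held by the most frequent category.
        "largest_share": float(shares.iloc[0]),
    }


plans = pd.Series(["basic"] * 6 + ["pro"] * 3 + ["enterprise"])
plan_summary = describe_categorical(plans, top_n=2)
print(plan_summary)
```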

### 5. Correlation Analysis
- Correlation matrix with significance
- Strongest relationships highlighted
- Multicollinearity warnings
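
To make "correlation with significance" concrete, the sketch below tests every pair of numeric columns with `scipy.stats.pearsonr` and keeps pairs that pass both a strength threshold and a significance level. The function `significant_correlations` and the sample data are hypothetical. The defaults mirror the `correlation_threshold` and `significance_level` values shown in the Configuration section.

```python
from itertools import combinations

import pandas as pd
from scipy import stats


def significant_correlations(df: pd.DataFrame, threshold: float = 0.5,
                             alpha: float = 0.05) -> list:
    """Return strong, significant Pearson correlations (hypothetical helper)."""
    numeric = df.select_dtypes("number").dropna()
    findings = []
    for left, right in combinations(numeric.columns, 2):
        r, p = stats.pearsonr(numeric[left], numeric[right])
        if abs(r) >= threshold and p < alpha:
            findings.append({"pair": (left, right),
                             "r": round(float(r), 3),
                             "p": float(p)})
    # Strongest relationships first
    return sorted(findings, key=lambda item: abs(item["r"]), reverse=True)


# Illustrative data only
sample = pd.DataFrame({
    "marketing_spend": [10, 20, 30, 40, 50, 60],
    "new_customers": [12, 25, 29, 41, 48, 63],
    "support_tickets": [5, 1, 4, 2, 6, 3],
})
findings = significant_correlations(sample)
print(findings)
```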

### 6. Time-Based Analysis
If datetime column detected:
- Trend direction and strength
- Seasonality patterns
- Year-over-year comparisons
- Growth rate calculations
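
A year-over-year comparison can be sketched as a group-by on the calendar year followed by a percentage change. The helper `yearly_growth` and the sample figures are assumptions for illustration only.

```python
import pandas as pd


def yearly_growth(df: pd.DataFrame, time_column: str,
                  value_column: str) -> pd.DataFrame:
    """Total a value per calendar year and compute YoY growth (hypothetical)."""
    dates = pd.to_datetime(df[time_column])
    totals = df.groupby(dates.dt.year)[value_column].sum()
    # pct_change compares each year with the previous one; the first year is NaN
    return pd.DataFrame({"total": totals, "yoy_growth": totals.pct_change()})


# Illustrative data only
sales = pd.DataFrame({
    "date": ["2023-03-01", "2023-09-01", "2024-03-01", "2024-09-01"],
    "revenue": [400.0, 600.0, 500.0, 730.0],
})
growth = yearly_growth(sales, "date", "revenue")
print(growth)
```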

### 7. Visualizations
Auto-generated charts saved to report:
- Distribution plots
- Trend charts
- Comparison charts
- Correlation heatmaps

### 8. Recommendations
Data-driven suggestions:
- Columns needing attention
- Potential data quality fixes
- Analysis suggestions
- Business implications

## Chart Styles

```python
# Available styles
styles = ['business', 'scientific', 'minimal', 'dark', 'colorful']

storyteller.generate_report(chart_style='business')
```

## Configuration

```python
storyteller = DataStoryteller(df)

# Configure analysis
storyteller.config.update({
    'max_categories': 20,       # Max categories to show
    'outlier_method': 'iqr',    # 'iqr', 'zscore', 'isolation'
    'correlation_threshold': 0.5,
    'significance_level': 0.05,
    'date_format': 'auto',      # Or specify like '%Y-%m-%d'
    'language': 'en',           # Narrative language
})
```
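
To show what the `outlier_method` choice can mean in practice, here is a sketch of the `'iqr'` and `'zscore'` options. The function `flag_outliers`, the 1.5 × IQR fence, and the |z| > 3 cutoff are common conventions assumed for this example. The `'isolation'` option is left out because the Dependencies list does not include a library that provides it.

```python
import pandas as pd


def flag_outliers(values: pd.Series, method: str = "iqr") -> pd.Series:
    """Return a boolean mask marking outliers (hypothetical helper)."""
    if method == "iqr":
        q1, q3 = values.quantile(0.25), values.quantile(0.75)
        iqr = q3 - q1
        # Assumed fence: 1.5 * IQR beyond the quartiles
        return (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
    if method == "zscore":
        z_scores = (values - values.mean()) / values.std()
        # Assumed cutoff: more than 3 standard deviations from the mean
        return z_scores.abs() > 3
    raise ValueError(f"Unsupported outlier method: {method!r}")


values = pd.Series([10, 12, 12, 13, 14, 15, 100])
iqr_count = int(flag_outliers(values, "iqr").sum())
zscore_count = int(flag_outliers(values, "zscore").sum())
print(iqr_count, zscore_count)
```

On this small sample the two methods disagree: the IQR rule flags the value 100, while the z-score rule flags nothing because a single extreme value inflates the standard deviation. The method is worth choosing with the dataset size in mind.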

## Supported File Formats

| Format | Extension | Notes |
|--------|-----------|-------|
| CSV | .csv | Auto-detect delimiter |
| Excel | .xlsx, .xls | Multi-sheet support; reading `.xls` through pandas also needs `xlrd` |
| JSON | .json | Records or columnar |
| Parquet | .parquet | For large datasets |
| TSV | .tsv | Tab-separated |

## Example Output

### Sample Executive Summary
> "This dataset contains 10,847 records across 15 columns, covering sales transactions from January 2023 to December 2024. Revenue shows a strong upward trend (+23% YoY) with clear seasonal peaks in Q4. The top 3 product categories account for 67% of total revenue. Notable finding: Customer acquisition cost has increased 15% while retention rate dropped 8%, suggesting potential profitability concerns worth investigating."

### Sample Insight
> "Strong correlation detected between marketing_spend and new_customers (r=0.78, p<0.001). However, this relationship weakens significantly after $50K monthly spend, suggesting diminishing returns beyond this threshold."

## Best Practices

1. **Clean data first**: Remove obvious errors before analysis
2. **Name columns clearly**: Helps auto-detection and narratives
3. **Include dates**: Enables time-series analysis
4. **Provide context**: Tell the storyteller what the data represents

## Limitations

- Recommended maximum size: about 1M rows and 100 columns
- Complex nested data may need flattening first
- Images and binary data are not supported
- PDF export requires the `reportlab` package

## Dependencies

```
pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
scipy>=1.10.0
reportlab>=4.0.0
openpyxl>=3.1.0
```

Overview

This skill transforms CSV or Excel data into polished narrative reports with auto-generated insights, visualizations, and PDF export. It detects column types, data quality issues, relationships, and time-series patterns, then produces plain-English summaries and actionable recommendations. Output includes statistical analyses, charts, and exportable PDF/HTML reports.

How this skill works

Provide a CSV or Excel file, or a pandas DataFrame, and the skill auto-detects column types, missing values, duplicates, outliers, and relationships. It generates statistical summaries for numeric fields, frequency analyses for categorical fields, correlation matrices, and time-series decomposition when a date column exists. Charts are produced automatically (histograms, box plots, line charts, heatmaps), and a narrative engine converts findings into executive summaries, insights, and recommendations. Export options include PDF, HTML with interactive charts, and image files for visualizations.

When to use it

  • Quickly summarize unfamiliar datasets without writing code
  • Produce executive-ready reports from sales, marketing, or operational data
  • Detect data quality issues before deeper analysis or modeling
  • Generate time-series trend and seasonality reports when dates are present
  • Create visual reports for stakeholders with minimal setup

Best practices

  • Provide clearly named columns and a date column for time analysis
  • Pre-clean obvious errors and remove irrelevant columns for faster, cleaner reports
  • Limit analyses to recommended sizes (up to ~1M rows, ~100 columns) for performance
  • Adjust config settings like outlier method, correlation threshold, and max categories to match your needs
  • Include brief context about the dataset to improve recommendation relevance

Example use cases

  • Monthly sales dashboard: auto-generate trends, top products, and seasonality notes with a downloadable PDF
  • Marketing spend analysis: detect correlations, diminishing returns, and recommend budget reallocation
  • Customer analytics: summarize retention, acquisition cost trends, and highlight anomalies
  • Data QA pass: identify missing data, duplicates, and columns needing attention before modeling
  • Board-ready report: produce a concise executive summary and visual appendix for stakeholder presentations

FAQ

What file formats are supported?

CSV, Excel (.xlsx/.xls), JSON, Parquet, and TSV are supported; multi-sheet Excel is handled automatically.

Can I customize analysis parameters?

Yes. You can set options like outlier method, correlation threshold, significance level, max categories, chart style, and date format via the configuration API.