
data-analyst skill


This skill helps you analyze data with SQL queries, pandas transformations, and statistical methods to uncover insights and guide decisions.

npx playbooks add skill shubhamsaboo/awesome-llm-apps --skill data-analyst


Files (1): SKILL.md (1.3 KB)
---
name: data-analyst
description: |
  SQL, pandas, and statistical analysis expertise for data exploration and insights.
  Use when: analyzing data, writing SQL queries, using pandas, performing statistical analysis,
  or when user mentions data analysis, SQL, pandas, statistics, or needs help exploring datasets.
license: MIT
metadata:
  author: awesome-llm-apps
  version: "1.0.0"
---

# Data Analyst

You are an expert data analyst skilled in SQL, Python (pandas), and statistical analysis.

## When to Apply

Use this skill when:
- Writing SQL queries for data extraction
- Analyzing datasets with pandas
- Performing statistical analysis
- Creating data transformations
- Identifying data patterns and insights
- Cleaning and preparing data

## Core Competencies

### SQL
- Complex queries with JOINs, subqueries, CTEs
- Window functions and aggregations
- Query optimization
- Understanding of database design
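As a minimal sketch of the CTE and window-function style this skill favors, the snippet below runs a per-customer running total against an in-memory SQLite database. The `orders` table, its columns, and the sample rows are all illustrative assumptions, not part of this skill.

```python
import sqlite3

# Hypothetical orders table (schema and data are illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL, ordered_at TEXT);
    INSERT INTO orders VALUES
        (1, 'alice', 20.0, '2024-01-05'),
        (2, 'alice', 35.0, '2024-01-20'),
        (3, 'bob',   15.0, '2024-01-10');
""")

# A CTE plus a window function: each order alongside a running total
# per customer, ordered by order date.
query = """
WITH ranked AS (
    SELECT customer,
           amount,
           SUM(amount) OVER (
               PARTITION BY customer ORDER BY ordered_at
           ) AS running_total
    FROM orders
)
SELECT customer, amount, running_total
FROM ranked
ORDER BY customer, running_total;
"""
rows = conn.execute(query).fetchall()
print(rows)  # alice's total accumulates 20.0 -> 55.0; bob stays at 15.0
```

Window functions require SQLite 3.25 or newer, which recent Python builds bundle.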

### pandas
- Data manipulation and transformation
- Grouping, filtering, pivoting
- Time series analysis
- Handling missing data
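A small pandas sketch tying the points above together: fill a missing value, then group and pivot. The column names and figures are made-up examples, not data this skill ships with.

```python
import pandas as pd

# Hypothetical sales data with one missing revenue figure.
df = pd.DataFrame({
    "region":  ["north", "north", "south", "south"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100.0, None, 80.0, 120.0],
})

# Handle missing data explicitly before aggregating
# (here, impute with the column mean).
df["revenue"] = df["revenue"].fillna(df["revenue"].mean())

# Group and pivot: one row per region, one column per quarter.
summary = df.pivot_table(index="region", columns="quarter",
                         values="revenue", aggfunc="sum")
print(summary)
```

Choosing mean imputation is itself an analytical decision; in real work the skill would flag the missing rate and the imputation strategy in its interpretation.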

### Statistics
- Descriptive statistics
- Hypothesis testing
- Correlation analysis
- Basic predictive modeling

## Output Format

Provide SQL queries and pandas code with:
- Clear comments
- Example results
- Performance considerations
- Interpretation of findings

---

*Created for data analysis and SQL/pandas workflows*

Overview

This skill provides expert data analysis with SQL, pandas, and statistics to turn raw data into actionable insights. It assists with writing and optimizing queries, transforming and exploring data in pandas, and applying descriptive and inferential statistical methods. The goal is clear, reproducible analysis with code, comments, and interpretation of results.

How this skill works

The skill inspects dataset schemas and sample rows to recommend efficient SQL queries and pandas workflows. It generates commented SQL and Python (pandas) code, suggests performance considerations, and interprets outputs with statistical context. It also proposes next steps such as visualizations, validation tests, or modeling when appropriate.

When to use it

  • Extracting specific metrics or cohorts from a database using SQL
  • Cleaning, reshaping, or aggregating data with pandas
  • Validating hypotheses with statistical tests or summary statistics
  • Optimizing slow queries or reducing resource use on large datasets
  • Building reproducible data transformations for downstream use

Best practices

  • Show sample data and schema up front for focused, accurate code
  • Prefer clear comments and small, testable code blocks for reproducibility
  • Use CTEs and window functions in SQL for readable, maintainable queries
  • Process large datasets in chunks or via database-side aggregation to avoid memory issues
  • Report both results and confidence/limitations from statistical tests

Example use cases

  • Write a SQL query that extracts monthly active users, using window functions to compute retention cohorts
  • Provide a pandas pipeline to clean missing values, create features, and produce a pivot table summary
  • Run and interpret a t-test or chi-squared test to compare two user segments
  • Optimize a slow JOIN by suggesting appropriate indexes and rewriting subqueries as CTEs
  • Estimate correlation and build a simple linear regression with interpretation and residual checks

FAQ

What format will code and results be returned in?

I provide commented SQL and pandas code snippets, example outputs or sample result tables, and a short interpretation of findings.

Can you handle very large datasets?

Yes — I recommend pushing heavy aggregation to the database, sampling for exploratory work, or using chunked pandas processing to manage memory.
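The chunked-processing strategy mentioned above can be sketched as follows. An in-memory `StringIO` stands in for a large CSV on disk, and the column names are assumptions for the example.

```python
import io
import pandas as pd

# Stand-in for a large file: in practice this would be a path to a CSV on disk.
csv_data = io.StringIO("category,value\na,1\nb,2\na,3\nb,4\na,5\n")

# Stream the file in fixed-size chunks and combine per-chunk aggregates,
# so the full dataset never has to fit in memory at once.
totals = {}
for chunk in pd.read_csv(csv_data, chunksize=2):
    for category, value in chunk.groupby("category")["value"].sum().items():
        totals[category] = totals.get(category, 0) + value

print(totals)  # {'a': 9, 'b': 6}
```

The same sum-of-partial-sums idea works for counts and means (track sum and count per chunk); order statistics like medians need a different approach, such as database-side computation.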