home / skills / microck / ordinary-claude-skills / excel-analysis

excel-analysis skill

/skills_all/excel-analysis

This skill analyzes Excel spreadsheets, creates pivot tables and charts, and generates actionable insights to accelerate data-driven decision making.

npx playbooks add skill microck/ordinary-claude-skills --skill excel-analysis

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
5.0 KB
---
name: Excel Analysis
description: Analyze Excel spreadsheets, create pivot tables, generate charts, and perform data analysis. Use when analyzing Excel files, spreadsheets, tabular data, or .xlsx files.
---

# Excel Analysis

## Quick start

Read Excel files with pandas:

```python
import pandas as pd

# Read Excel file
df = pd.read_excel("data.xlsx", sheet_name="Sheet1")

# Display first few rows
print(df.head())

# Basic statistics
print(df.describe())
```

## Reading multiple sheets

Process all sheets in a workbook:

```python
import pandas as pd

# Read all sheets
excel_file = pd.ExcelFile("workbook.xlsx")

for sheet_name in excel_file.sheet_names:
    df = pd.read_excel(excel_file, sheet_name=sheet_name)
    print(f"\n{sheet_name}:")
    print(df.head())
```

## Data analysis

Perform common analysis tasks:

```python
import pandas as pd

df = pd.read_excel("sales.xlsx")

# Group by and aggregate
sales_by_region = df.groupby("region")["sales"].sum()
print(sales_by_region)

# Filter data
high_sales = df[df["sales"] > 10000]

# Calculate metrics
df["profit_margin"] = (df["revenue"] - df["cost"]) / df["revenue"]

# Sort by column
df_sorted = df.sort_values("sales", ascending=False)
```

## Creating Excel files

Write data to Excel with formatting:

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Sales": [100, 200, 150],
    "Profit": [20, 40, 30]
})

# Write to Excel
writer = pd.ExcelWriter("output.xlsx", engine="openpyxl")
df.to_excel(writer, sheet_name="Sales", index=False)

# Get worksheet for formatting
worksheet = writer.sheets["Sales"]

# Auto-adjust column widths
for column in worksheet.columns:
    max_length = 0
    column_letter = column[0].column_letter
    for cell in column:
        if len(str(cell.value)) > max_length:
            max_length = len(str(cell.value))
    worksheet.column_dimensions[column_letter].width = max_length + 2

writer.close()
```

## Pivot tables

Create pivot tables programmatically:

```python
import pandas as pd

df = pd.read_excel("sales_data.xlsx")

# Create pivot table
pivot = pd.pivot_table(
    df,
    values="sales",
    index="region",
    columns="product",
    aggfunc="sum",
    fill_value=0
)

print(pivot)

# Save pivot table
pivot.to_excel("pivot_report.xlsx")
```

## Charts and visualization

Generate charts from Excel data:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel("data.xlsx")

# Create bar chart
df.plot(x="category", y="value", kind="bar")
plt.title("Sales by Category")
plt.xlabel("Category")
plt.ylabel("Sales")
plt.tight_layout()
plt.savefig("chart.png")

# Create pie chart
df.set_index("category")["value"].plot(kind="pie", autopct="%1.1f%%")
plt.title("Market Share")
plt.ylabel("")
plt.savefig("pie_chart.png")
```

## Data cleaning

Clean and prepare Excel data:

```python
import pandas as pd

df = pd.read_excel("messy_data.xlsx")

# Remove duplicates
df = df.drop_duplicates()

# Handle missing values
df = df.fillna(0)  # or df.dropna()

# Remove whitespace
df["name"] = df["name"].str.strip()

# Convert data types
df["date"] = pd.to_datetime(df["date"])
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")

# Save cleaned data
df.to_excel("cleaned_data.xlsx", index=False)
```

## Merging and joining

Combine multiple Excel files:

```python
import pandas as pd

# Read multiple files
df1 = pd.read_excel("sales_q1.xlsx")
df2 = pd.read_excel("sales_q2.xlsx")

# Concatenate vertically
combined = pd.concat([df1, df2], ignore_index=True)

# Merge on common column
customers = pd.read_excel("customers.xlsx")
sales = pd.read_excel("sales.xlsx")

merged = pd.merge(sales, customers, on="customer_id", how="left")

merged.to_excel("merged_data.xlsx", index=False)
```

## Advanced formatting

Apply conditional formatting and styles:

```python
import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import PatternFill, Font

# Create Excel file
df = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Sales": [100, 200, 150]
})

df.to_excel("formatted.xlsx", index=False)

# Load workbook for formatting
wb = load_workbook("formatted.xlsx")
ws = wb.active

# Apply conditional formatting
red_fill = PatternFill(start_color="FF0000", end_color="FF0000", fill_type="solid")
green_fill = PatternFill(start_color="00FF00", end_color="00FF00", fill_type="solid")

for row in range(2, len(df) + 2):
    cell = ws[f"B{row}"]
    if cell.value < 150:
        cell.fill = red_fill
    else:
        cell.fill = green_fill

# Bold headers
for cell in ws[1]:
    cell.font = Font(bold=True)

wb.save("formatted.xlsx")
```

## Performance tips

- Use `read_excel` with `usecols` to read specific columns only
- Use `chunksize` for very large files
- Consider using `engine='openpyxl'` or `engine='xlrd'` based on file type
- Use `dtype` parameter to specify column types for faster reading

## Available packages

- **pandas** - Data analysis and manipulation (primary)
- **openpyxl** - Excel file creation and formatting
- **xlrd** - Reading older .xls files
- **xlsxwriter** - Advanced Excel writing capabilities
- **matplotlib** - Chart generation

Overview

This skill analyzes Excel spreadsheets, creates pivot tables, generates charts, and performs common data-cleaning and transformation tasks. It uses pandas for reading and manipulating .xlsx files and integrates openpyxl, xlsxwriter, and matplotlib for formatting and visualization. The skill is designed to turn raw tabular data into summarized reports, charts, and formatted Excel outputs. It supports multi-sheet workbooks, merging files, and performance options for large datasets.

How this skill works

The skill reads Excel workbooks into pandas DataFrames, optionally loading specific sheets or columns to save memory. It performs grouping, aggregation, filtering, joins, and pivot table creation using pandas functions, then writes results back to Excel with optional formatting via openpyxl or xlsxwriter. Charts are generated with matplotlib and saved as image files or embedded into workbooks. It also provides data-cleaning helpers for duplicates, missing values, type conversion, and whitespace trimming.

When to use it

  • You need a quick summary or pivot report from one or more .xlsx files.
  • You must clean and standardize messy Excel data before analysis or loading into other systems.
  • You want to generate charts (bar, pie, line) from spreadsheet data for presentations or dashboards.
  • You need to merge multiple workbooks or sheets into a single consolidated file.
  • You want formatted Excel outputs with column widths, conditional formatting, or styled headers.

Best practices

  • Read only needed columns (usecols) and set dtype for large files to improve speed and memory usage.
  • Use chunksize for very large workbooks and concat chunks incrementally to avoid memory spikes.
  • Validate and coerce data types early (to_datetime, to_numeric) to prevent aggregation errors later.
  • Save intermediate results to Excel when workflows are long; use descriptive sheet names for traceability.
  • Prefer openpyxl or xlsxwriter explicitly as the engine to avoid compatibility issues with specific Excel features.

Example use cases

  • Create a sales pivot table by region and product and export it as a new workbook.
  • Clean a customer list by removing duplicates, trimming whitespace, and converting dates before importing into a CRM.
  • Merge quarterly sales files into a single year-to-date report and calculate cumulative metrics.
  • Generate a bar chart of revenue by category and save it as PNG for a slide deck.
  • Apply conditional formatting to highlight low-performing products and save a styled Excel report.

FAQ

Which Python packages are required?

pandas is primary; use openpyxl or xlsxwriter for writing/formatting, xlrd for older .xls, and matplotlib for charts.

How do I handle very large Excel files?

Read specific columns with usecols, use chunksize to iterate in pieces, and set dtype to reduce memory; consider pre-filtering in Excel if possible.