home / skills / starlitnightly / omicverse / data-export-excel

data-export-excel skill

/.claude/skills/data-export-excel

This skill exports bioinformatics results and tables to formatted Excel files using openpyxl, running locally for compatibility with all LLM providers.

npx playbooks add skill starlitnightly/omicverse --skill data-export-excel

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
7.1 KB
---
name: data-export-excel
title: Excel Data Export (Universal)
description: Export analysis results, data tables, and formatted spreadsheets to Excel files using openpyxl. Works with ANY LLM provider (GPT, Gemini, Claude, etc.).
---

# Excel Data Export (Universal)

## Overview
This skill enables you to export bioinformatics data, analysis results, and formatted tables to professional Excel spreadsheets. Unlike cloud-hosted solutions, this skill uses the **openpyxl** Python library and executes **locally** in your environment, making it compatible with **ALL LLM providers** including GPT, Gemini, Claude, DeepSeek, and Qwen.

## When to Use This Skill
- Export AnnData observations (.obs) or variables (.var) to Excel
- Save DEG analysis results with formatting
- Create multi-sheet workbooks with different data types
- Generate formatted Excel reports with cell styling
- Export cluster annotations, cell type assignments, or quality control metrics

## How to Use

### Step 1: Import Required Libraries
```python
import openpyxl
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
from openpyxl.utils.dataframe import dataframe_to_rows
import pandas as pd
import numpy as np
```

### Step 2: Prepare Your Data
Convert your data to pandas DataFrame format:
```python
# Example: Export AnnData observations
df = adata.obs.copy()

# Example: Export DEG results
deg_df = pd.DataFrame({
    'gene': gene_names,
    'log2FC': log2_fold_changes,
    'pvalue': pvalues,
    'qvalue': qvalues
})

# Example: Export cluster statistics
cluster_stats = adata.obs.groupby('clusters').size().reset_index(name='count')
```

### Step 3: Create Excel Workbook
```python
# Create new workbook
wb = Workbook()
ws = wb.active
ws.title = "Sheet Name"

# Write DataFrame to worksheet
for r in dataframe_to_rows(df, index=False, header=True):
    ws.append(r)
```

### Step 4: Add Formatting (Optional)
```python
# Style header row
header_fill = PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid")
header_font = Font(bold=True, color="FFFFFF")

for cell in ws[1]:
    cell.fill = header_fill
    cell.font = header_font
    cell.alignment = Alignment(horizontal='center')

# Auto-adjust column widths
for column in ws.columns:
    max_length = 0
    column_letter = column[0].column_letter
    for cell in column:
        try:
            if len(str(cell.value)) > max_length:
                max_length = len(str(cell.value))
        except:
            pass
    adjusted_width = min(max_length + 2, 50)
    ws.column_dimensions[column_letter].width = adjusted_width

# Add borders
thin_border = Border(
    left=Side(style='thin'),
    right=Side(style='thin'),
    top=Side(style='thin'),
    bottom=Side(style='thin')
)
for row in ws.iter_rows(min_row=1, max_row=ws.max_row, min_col=1, max_col=ws.max_column):
    for cell in row:
        cell.border = thin_border
```

### Step 5: Save the Workbook
```python
# Save to file
output_path = "analysis_results.xlsx"
wb.save(output_path)
print(f"✅ Excel file saved to: {output_path}")
```

## Multi-Sheet Workbooks

Create workbooks with multiple sheets for different data types:

```python
wb = Workbook()

# Sheet 1: Cell metadata
ws1 = wb.active
ws1.title = "Cell Metadata"
for r in dataframe_to_rows(adata.obs, index=True, header=True):
    ws1.append(r)

# Sheet 2: Gene metadata
ws2 = wb.create_sheet("Gene Metadata")
for r in dataframe_to_rows(adata.var, index=True, header=True):
    ws2.append(r)

# Sheet 3: DEG results
ws3 = wb.create_sheet("DEG Results")
for r in dataframe_to_rows(deg_df, index=False, header=True):
    ws3.append(r)

wb.save("multi_sheet_analysis.xlsx")
```

## Best Practices

1. **Column Headers**: Always include column headers in the first row
2. **Data Types**: Convert numpy arrays to lists before writing
3. **Large Datasets**: For datasets >100K rows, consider CSV export instead
4. **File Paths**: Use absolute paths or ensure output directory exists
5. **Formatting**: Apply formatting sparingly to reduce file size
6. **Index**: Decide whether to include DataFrame index (set `index=True/False` in `dataframe_to_rows`)

## Common Use Cases

### Export Quality Control Metrics
```python
qc_metrics = adata.obs[['n_genes', 'n_counts', 'percent_mito', 'clusters']].copy()

wb = Workbook()
ws = wb.active
ws.title = "QC Metrics"

for r in dataframe_to_rows(qc_metrics, index=False, header=True):
    ws.append(r)

# Highlight cells with high mitochondrial content
for row in range(2, ws.max_row + 1):
    if ws.cell(row, 3).value > 0.2:  # percent_mito > 20%
        ws.cell(row, 3).fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")

wb.save("qc_metrics.xlsx")
```

### Export Marker Genes by Cluster
```python
# Assuming you have marker genes for each cluster
marker_dict = {
    'Cluster_0': ['CD3D', 'CD3E', 'CD8A'],
    'Cluster_1': ['CD79A', 'MS4A1', 'CD19'],
    'Cluster_2': ['LYZ', 'S100A9', 'CD14']
}

wb = Workbook()

for cluster_name, genes in marker_dict.items():
    ws = wb.create_sheet(cluster_name)
    ws.append(['Marker Gene'])
    for gene in genes:
        ws.append([gene])

# Remove default sheet
if 'Sheet' in wb.sheetnames:
    wb.remove(wb['Sheet'])

wb.save("marker_genes.xlsx")
```

### Export DEG Analysis with Conditional Formatting
```python
wb = Workbook()
ws = wb.active
ws.title = "DEG Analysis"

# Write DEG results
for r in dataframe_to_rows(deg_df, index=False, header=True):
    ws.append(r)

# Color code by fold change
for row in range(2, ws.max_row + 1):
    log2fc = ws.cell(row, 2).value  # Assuming log2FC in column 2
    if log2fc > 1:  # Upregulated
        ws.cell(row, 2).fill = PatternFill(start_color="C6EFCE", end_color="C6EFCE", fill_type="solid")
    elif log2fc < -1:  # Downregulated
        ws.cell(row, 2).fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")

wb.save("deg_results_formatted.xlsx")
```

## Troubleshooting

### Issue: "openpyxl not found"
**Solution**: Install the library:
```python
import subprocess
subprocess.check_call(['pip', 'install', 'openpyxl'])
```

### Issue: "Invalid data type for cell"
**Solution**: Convert numpy/pandas types to native Python types:
```python
# Convert numpy types
df = df.astype(object).where(pd.notnull(df), None)

# Or convert specific columns
df['column_name'] = df['column_name'].astype(str)
```

### Issue: "Memory error with large datasets"
**Solution**: Export in chunks or use CSV format instead:
```python
# Fallback to CSV for large data
df.to_csv('large_dataset.csv', index=False)
print("Dataset too large for Excel, saved as CSV instead")
```

## Technical Notes

- **Library**: Uses `openpyxl` (pure Python, no external dependencies)
- **Execution**: Runs locally in the agent's sandbox
- **Compatibility**: Works with ALL LLM providers (GPT, Gemini, Claude, DeepSeek, Qwen, etc.)
- **File Limits**: Excel has a 1,048,576 row limit (use CSV for larger datasets)
- **Performance**: Writing ~10K rows takes 1-2 seconds; 100K rows takes 10-20 seconds

## References
- openpyxl documentation: https://openpyxl.readthedocs.io/
- pandas DataFrame export: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html

Overview

This skill exports bioinformatics analysis results, tables, and formatted reports to Excel files using openpyxl. It runs locally and is provider-agnostic, so it works with any LLM provider. The tool produces multi-sheet workbooks, applies styling and conditional formatting, and handles AnnData, DEG tables, and other pandas DataFrames.

How this skill works

The skill converts input data (AnnData .obs/.var, pandas DataFrames, numpy arrays) into workbook sheets using openpyxl and dataframe_to_rows. It applies optional styling: header colors, borders, column width autosizing, and conditional fills for values like log2FC or percent mitochondrial. The workbook is saved to a local path; for very large tables the workflow falls back to chunked exports or CSV.

When to use it

  • Export AnnData observations (.obs) or variables (.var) to Excel
  • Save and style DEG results for sharing or publication
  • Build multi-sheet reports combining metadata, QC, and DEG tables
  • Generate Excel-ready marker gene lists per cluster
  • Highlight quality-control flags (high mito, low counts) with conditional colors

Best practices

  • Always include clear column headers in the first row
  • Convert numpy/pandas-specific dtypes to native Python types before writing
  • Limit heavy styling for very large sheets to reduce file size
  • Use absolute output paths and ensure the output directory exists
  • For datasets >100k rows prefer CSV or chunked Excel exports

Example use cases

  • Create a multi-sheet workbook with cell metadata, gene metadata, and DEG results for collaborator review
  • Export QC metrics and highlight cells with percent_mito >20% for quick filtering
  • Produce cluster-specific sheets listing marker genes for downstream validation
  • Format DEG tables with color coding for up/down regulation and save a publication-ready table

FAQ

What if openpyxl is not installed?

Install it with pip (pip install openpyxl) or use the provided subprocess install snippet before running exports.

How do I handle very large datasets?

For >100k rows write to CSV or export in chunks; Excel has a 1,048,576 row limit and large styled workbooks consume memory.