home / skills / starlitnightly / omicverse / data-export-excel

data-export-excel skill

safe

This skill exports bioinformatics results and tables to formatted Excel files using openpyxl, running locally for compatibility with all LLM providers.

npx playbooks add skill starlitnightly/omicverse --skill data-export-excel

Review the files below or copy the command above to add this skill to your agents.

Files (1)

SKILL.md

7.1 KB

---
name: data-export-excel
title: Excel Data Export (Universal)
description: Export analysis results, data tables, and formatted spreadsheets to Excel files using openpyxl. Works with ANY LLM provider (GPT, Gemini, Claude, etc.).
---

# Excel Data Export (Universal)

## Overview
This skill enables you to export bioinformatics data, analysis results, and formatted tables to professional Excel spreadsheets. Unlike cloud-hosted solutions, this skill uses the **openpyxl** Python library and executes **locally** in your environment, making it compatible with **ALL LLM providers** including GPT, Gemini, Claude, DeepSeek, and Qwen.

## When to Use This Skill
- Export AnnData observations (.obs) or variables (.var) to Excel
- Save DEG analysis results with formatting
- Create multi-sheet workbooks with different data types
- Generate formatted Excel reports with cell styling
- Export cluster annotations, cell type assignments, or quality control metrics

## How to Use

### Step 1: Import Required Libraries
```python
import openpyxl
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
from openpyxl.utils.dataframe import dataframe_to_rows
import pandas as pd
import numpy as np
```

### Step 2: Prepare Your Data
Convert your data to pandas DataFrame format:
```python
# Example: Export AnnData observations
df = adata.obs.copy()

# Example: Export DEG results
deg_df = pd.DataFrame({
    'gene': gene_names,
    'log2FC': log2_fold_changes,
    'pvalue': pvalues,
    'qvalue': qvalues
})

# Example: Export cluster statistics
cluster_stats = adata.obs.groupby('clusters').size().reset_index(name='count')
```

### Step 3: Create Excel Workbook
```python
# Create new workbook
wb = Workbook()
ws = wb.active
ws.title = "Sheet Name"

# Write DataFrame to worksheet
for r in dataframe_to_rows(df, index=False, header=True):
    ws.append(r)
```

### Step 4: Add Formatting (Optional)
```python
# Style header row
header_fill = PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid")
header_font = Font(bold=True, color="FFFFFF")

for cell in ws[1]:
    cell.fill = header_fill
    cell.font = header_font
    cell.alignment = Alignment(horizontal='center')

# Auto-adjust column widths
for column in ws.columns:
    max_length = 0
    column_letter = column[0].column_letter
    for cell in column:
        try:
            if len(str(cell.value)) > max_length:
                max_length = len(str(cell.value))
        except:
            pass
    adjusted_width = min(max_length + 2, 50)
    ws.column_dimensions[column_letter].width = adjusted_width

# Add borders
thin_border = Border(
    left=Side(style='thin'),
    right=Side(style='thin'),
    top=Side(style='thin'),
    bottom=Side(style='thin')
)
for row in ws.iter_rows(min_row=1, max_row=ws.max_row, min_col=1, max_col=ws.max_column):
    for cell in row:
        cell.border = thin_border
```

### Step 5: Save the Workbook
```python
# Save to file
output_path = "analysis_results.xlsx"
wb.save(output_path)
print(f"✅ Excel file saved to: {output_path}")
```

## Multi-Sheet Workbooks

Create workbooks with multiple sheets for different data types:

```python
wb = Workbook()

# Sheet 1: Cell metadata
ws1 = wb.active
ws1.title = "Cell Metadata"
for r in dataframe_to_rows(adata.obs, index=True, header=True):
    ws1.append(r)

# Sheet 2: Gene metadata
ws2 = wb.create_sheet("Gene Metadata")
for r in dataframe_to_rows(adata.var, index=True, header=True):
    ws2.append(r)

# Sheet 3: DEG results
ws3 = wb.create_sheet("DEG Results")
for r in dataframe_to_rows(deg_df, index=False, header=True):
    ws3.append(r)

wb.save("multi_sheet_analysis.xlsx")
```

## Best Practices

1. **Column Headers**: Always include column headers in the first row
2. **Data Types**: Convert numpy arrays to lists before writing
3. **Large Datasets**: For datasets >100K rows, consider CSV export instead
4. **File Paths**: Use absolute paths or ensure output directory exists
5. **Formatting**: Apply formatting sparingly to reduce file size
6. **Index**: Decide whether to include DataFrame index (set `index=True/False` in `dataframe_to_rows`)

## Common Use Cases

### Export Quality Control Metrics
```python
qc_metrics = adata.obs[['n_genes', 'n_counts', 'percent_mito', 'clusters']].copy()

wb = Workbook()
ws = wb.active
ws.title = "QC Metrics"

for r in dataframe_to_rows(qc_metrics, index=False, header=True):
    ws.append(r)

# Highlight cells with high mitochondrial content
for row in range(2, ws.max_row + 1):
    if ws.cell(row, 3).value > 0.2:  # percent_mito > 20%
        ws.cell(row, 3).fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")

wb.save("qc_metrics.xlsx")
```

### Export Marker Genes by Cluster
```python
# Assuming you have marker genes for each cluster
marker_dict = {
    'Cluster_0': ['CD3D', 'CD3E', 'CD8A'],
    'Cluster_1': ['CD79A', 'MS4A1', 'CD19'],
    'Cluster_2': ['LYZ', 'S100A9', 'CD14']
}

wb = Workbook()

for cluster_name, genes in marker_dict.items():
    ws = wb.create_sheet(cluster_name)
    ws.append(['Marker Gene'])
    for gene in genes:
        ws.append([gene])

# Remove default sheet
if 'Sheet' in wb.sheetnames:
    wb.remove(wb['Sheet'])

wb.save("marker_genes.xlsx")
```

### Export DEG Analysis with Conditional Formatting
```python
wb = Workbook()
ws = wb.active
ws.title = "DEG Analysis"

# Write DEG results
for r in dataframe_to_rows(deg_df, index=False, header=True):
    ws.append(r)

# Color code by fold change
for row in range(2, ws.max_row + 1):
    log2fc = ws.cell(row, 2).value  # Assuming log2FC in column 2
    if log2fc > 1:  # Upregulated
        ws.cell(row, 2).fill = PatternFill(start_color="C6EFCE", end_color="C6EFCE", fill_type="solid")
    elif log2fc < -1:  # Downregulated
        ws.cell(row, 2).fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")

wb.save("deg_results_formatted.xlsx")
```

## Troubleshooting

### Issue: "openpyxl not found"
**Solution**: Install the library:
```python
import subprocess
subprocess.check_call(['pip', 'install', 'openpyxl'])
```

### Issue: "Invalid data type for cell"
**Solution**: Convert numpy/pandas types to native Python types:
```python
# Convert numpy types
df = df.astype(object).where(pd.notnull(df), None)

# Or convert specific columns
df['column_name'] = df['column_name'].astype(str)
```

### Issue: "Memory error with large datasets"
**Solution**: Export in chunks or use CSV format instead:
```python
# Fallback to CSV for large data
df.to_csv('large_dataset.csv', index=False)
print("Dataset too large for Excel, saved as CSV instead")
```

## Technical Notes

- **Library**: Uses `openpyxl` (pure Python, no external dependencies)
- **Execution**: Runs locally in the agent's sandbox
- **Compatibility**: Works with ALL LLM providers (GPT, Gemini, Claude, DeepSeek, Qwen, etc.)
- **File Limits**: Excel has a 1,048,576 row limit (use CSV for larger datasets)
- **Performance**: Writing ~10K rows takes 1-2 seconds; 100K rows takes 10-20 seconds

## References
- openpyxl documentation: https://openpyxl.readthedocs.io/
- pandas DataFrame export: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html

Overview

This skill exports bioinformatics analysis results, tables, and formatted reports to Excel files using openpyxl. It runs locally and is provider-agnostic, so it works with any LLM provider. The tool produces multi-sheet workbooks, applies styling and conditional formatting, and handles AnnData, DEG tables, and other pandas DataFrames.

How this skill works

The skill converts input data (AnnData .obs/.var, pandas DataFrames, numpy arrays) into workbook sheets using openpyxl and dataframe_to_rows. It applies optional styling: header colors, borders, column width autosizing, and conditional fills for values like log2FC or percent mitochondrial. The workbook is saved to a local path; for very large tables the workflow falls back to chunked exports or CSV.

When to use it

Export AnnData observations (.obs) or variables (.var) to Excel
Save and style DEG results for sharing or publication
Build multi-sheet reports combining metadata, QC, and DEG tables
Generate Excel-ready marker gene lists per cluster
Highlight quality-control flags (high mito, low counts) with conditional colors

Best practices

Always include clear column headers in the first row
Convert numpy/pandas-specific dtypes to native Python types before writing
Limit heavy styling for very large sheets to reduce file size
Use absolute output paths and ensure the output directory exists
For datasets >100k rows prefer CSV or chunked Excel exports

Example use cases

Create a multi-sheet workbook with cell metadata, gene metadata, and DEG results for collaborator review
Export QC metrics and highlight cells with percent_mito >20% for quick filtering
Produce cluster-specific sheets listing marker genes for downstream validation
Format DEG tables with color coding for up/down regulation and save a publication-ready table

FAQ

What if openpyxl is not installed?

Install it with pip (pip install openpyxl) or use the provided subprocess install snippet before running exports.

How do I handle very large datasets?

For >100k rows write to CSV or export in chunks; Excel has a 1,048,576 row limit and large styled workbooks consume memory.