
data-stats-analysis skill

/.claude/skills/data-stats-analysis

This skill performs rigorous statistical analyses locally using scipy and statsmodels, enabling robust t-tests, ANOVA, correlations, and multiple-testing corrections.

npx playbooks add skill starlitnightly/omicverse --skill data-stats-analysis

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
15.1 KB
---
name: data-stats-analysis
title: Statistical Analysis (Universal)
description: Perform statistical tests, hypothesis testing, correlation analysis, and multiple testing corrections using scipy and statsmodels. Works with ANY LLM provider (GPT, Gemini, Claude, etc.).
---

# Statistical Analysis (Universal)

## Overview
This skill enables you to perform rigorous statistical analyses including t-tests, ANOVA, correlation analysis, hypothesis testing, and multiple testing corrections. Unlike cloud-hosted solutions, this skill uses standard Python statistical libraries (**scipy**, **statsmodels**, **numpy**) and executes **locally** in your environment, making it compatible with **ALL LLM providers** including GPT, Gemini, Claude, DeepSeek, and Qwen.

## When to Use This Skill
- Compare means between groups (t-tests, ANOVA)
- Test for correlations between variables
- Perform hypothesis testing with p-value calculation
- Apply multiple testing corrections (FDR, Bonferroni)
- Calculate statistical summaries and confidence intervals
- Test for normality and distribution fitting
- Perform non-parametric tests (Mann-Whitney, Kruskal-Wallis)

## How to Use

### Step 1: Import Required Libraries
```python
import numpy as np
import pandas as pd
from scipy import stats
from scipy.stats import ttest_ind, mannwhitneyu, pearsonr, spearmanr
from scipy.stats import f_oneway, kruskal, chi2_contingency
from statsmodels.stats.multitest import multipletests
from statsmodels.stats.proportion import proportions_ztest
import warnings
warnings.filterwarnings('ignore')  # optional: silence library warnings for cleaner output
```

### Step 2: Two-Sample t-Test
```python
# Compare means between two groups
# group1, group2: arrays of numeric values

# Check the equal-variance assumption first (Levene's test)
_, levene_p = stats.levene(group1, group2)
equal_var = levene_p >= 0.05

# Student's t-test if variances look equal, Welch's t-test otherwise
t_statistic, p_value = ttest_ind(group1, group2, equal_var=equal_var)

test_name = "Student's t-test" if equal_var else "Welch's t-test"
print(f"{test_name}")
print(f"t-statistic: {t_statistic:.4f}")
print(f"p-value: {p_value:.4e}")

if p_value < 0.05:
    print("✅ Significant difference between groups (p < 0.05)")
else:
    print("❌ No significant difference (p >= 0.05)")
```

### Step 3: One-Way ANOVA
```python
# Compare means across multiple groups
# groups: list of arrays, e.g., [group1, group2, group3]

# Perform one-way ANOVA
f_statistic, p_value = f_oneway(*groups)

print(f"F-statistic: {f_statistic:.4f}")
print(f"p-value: {p_value:.4e}")

if p_value < 0.05:
    print("✅ Significant difference between groups (p < 0.05)")
    print("Note: Use post-hoc tests to identify which groups differ")
else:
    print("❌ No significant difference between groups")

# Post-hoc pairwise t-tests with Bonferroni correction
from itertools import combinations

group_names = ['Group A', 'Group B', 'Group C']
pairwise_results = []

for (name1, data1), (name2, data2) in combinations(zip(group_names, groups), 2):
    _, p = ttest_ind(data1, data2)
    pairwise_results.append({
        'comparison': f'{name1} vs {name2}',
        'p_value': p
    })

# Apply Bonferroni correction
pairwise_df = pd.DataFrame(pairwise_results)
n_tests = len(pairwise_df)
pairwise_df['p_adjusted'] = pairwise_df['p_value'] * n_tests
pairwise_df['p_adjusted'] = pairwise_df['p_adjusted'].clip(upper=1.0)

print("\nPairwise Comparisons (Bonferroni-corrected):")
print(pairwise_df)
```
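
As an alternative to manual pairwise t-tests, statsmodels ships Tukey's HSD, which controls the family-wise error rate across all pairs internally. A minimal sketch, reusing the `groups` and `group_names` defined above:

```python
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Flatten the groups into one value array with matching labels
values = np.concatenate(groups)
labels = np.concatenate([
    [name] * len(data) for name, data in zip(group_names, groups)
])

# Tukey's HSD performs all pairwise comparisons with a built-in correction
tukey = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
print(tukey.summary())
```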

### Step 4: Correlation Analysis
```python
# Pearson correlation (linear relationships)
r_pearson, p_pearson = pearsonr(variable1, variable2)

print(f"Pearson correlation: r = {r_pearson:.4f}, p = {p_pearson:.4e}")

# Spearman correlation (monotonic relationships, robust to outliers)
r_spearman, p_spearman = spearmanr(variable1, variable2)

print(f"Spearman correlation: ρ = {r_spearman:.4f}, p = {p_spearman:.4e}")

# Interpretation
if abs(r_pearson) < 0.3:
    strength = "weak"
elif abs(r_pearson) < 0.7:
    strength = "moderate"
else:
    strength = "strong"

direction = "positive" if r_pearson > 0 else "negative"
print(f"Interpretation: {strength} {direction} correlation")

if p_pearson < 0.05:
    print("✅ Statistically significant (p < 0.05)")
else:
    print("❌ Not statistically significant")
```

### Step 5: Multiple Testing Correction
```python
# Scenario: Testing 1000 genes for differential expression
# p_values: array of p-values from individual tests

# Method 1: Benjamini-Hochberg FDR correction (recommended)
reject_fdr, p_adjusted_fdr, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')

# Method 2: Bonferroni correction (more conservative)
reject_bonf, p_adjusted_bonf, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')

# Create results DataFrame
results_df = pd.DataFrame({
    'gene': gene_names,
    'p_value': p_values,
    'q_value_fdr': p_adjusted_fdr,
    'p_adjusted_bonferroni': p_adjusted_bonf,
    'significant_fdr': reject_fdr,
    'significant_bonf': reject_bonf
})

# Summary
print(f"Original significant (p < 0.05): {(p_values < 0.05).sum()}")
print(f"Significant after FDR correction: {reject_fdr.sum()}")
print(f"Significant after Bonferroni correction: {reject_bonf.sum()}")

# Save results
results_df.to_csv('statistical_results.csv', index=False)
print("✅ Results saved to: statistical_results.csv")
```

### Step 6: Non-Parametric Tests
```python
# Use when data is not normally distributed

# Mann-Whitney U test (alternative to t-test)
u_statistic, p_value_mw = mannwhitneyu(group1, group2, alternative='two-sided')

print(f"Mann-Whitney U test:")
print(f"U-statistic: {u_statistic:.4f}")
print(f"p-value: {p_value_mw:.4e}")

# Kruskal-Wallis H test (alternative to ANOVA)
h_statistic, p_value_kw = kruskal(*groups)

print(f"\nKruskal-Wallis H test:")
print(f"H-statistic: {h_statistic:.4f}")
print(f"p-value: {p_value_kw:.4e}")
```
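
The Best Practices section below recommends reporting effect sizes alongside p-values; for the Mann-Whitney U test a common companion is the rank-biserial correlation, which follows directly from the U statistic. A short sketch, reusing `u_statistic` from above:

```python
# Rank-biserial correlation: effect size for the Mann-Whitney U test
# r = 1 - 2U / (n1 * n2); ranges from -1 to +1, 0 means no effect
n1, n2 = len(group1), len(group2)
rank_biserial = 1 - (2 * u_statistic) / (n1 * n2)
print(f"Rank-biserial correlation: {rank_biserial:.4f}")
```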

## Advanced Features

### Normality Testing
```python
from scipy.stats import shapiro, normaltest, kstest

# Test if data follows normal distribution

# Shapiro-Wilk test (best for n < 5000)
stat_sw, p_sw = shapiro(data)
print(f"Shapiro-Wilk test: W={stat_sw:.4f}, p={p_sw:.4e}")

# D'Agostino-Pearson test
stat_dp, p_dp = normaltest(data)
print(f"D'Agostino-Pearson test: stat={stat_dp:.4f}, p={p_dp:.4e}")

# Interpretation
if p_sw < 0.05:
    print("❌ Data does NOT follow normal distribution (p < 0.05)")
    print("→ Recommendation: Use non-parametric tests (Mann-Whitney, Kruskal-Wallis)")
else:
    print("✅ Data appears normally distributed (p >= 0.05)")
    print("→ OK to use parametric tests (t-test, ANOVA)")
```

### Chi-Square Test for Contingency Tables
```python
# Test independence between categorical variables
# contingency_table: 2D array (rows=categories1, columns=categories2)

# Example: Cell type distribution across conditions
contingency_table = np.array([
    [50, 30, 20],  # Condition A: T cells, B cells, NK cells
    [40, 45, 15],  # Condition B
    [35, 25, 40]   # Condition C
])

chi2, p_value, dof, expected = chi2_contingency(contingency_table)

print(f"Chi-square statistic: {chi2:.4f}")
print(f"p-value: {p_value:.4e}")
print(f"Degrees of freedom: {dof}")
print(f"\nExpected frequencies:\n{expected}")

if p_value < 0.05:
    print("✅ Significant association between variables (p < 0.05)")
else:
    print("❌ No significant association")
```
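
When the chi-square test is significant, standardized (Pearson) residuals show which cells drive the association; cells with |residual| above roughly 2 contribute most. A brief follow-up using the `expected` frequencies returned above:

```python
# Pearson residuals: (observed - expected) / sqrt(expected)
residuals = (contingency_table - expected) / np.sqrt(expected)
print(f"Standardized residuals:\n{np.round(residuals, 2)}")
```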

### Confidence Intervals
```python
from scipy.stats import t as t_dist

def calculate_confidence_interval(data, confidence=0.95):
    """Calculate confidence interval for mean"""
    n = len(data)
    mean = np.mean(data)
    std_err = stats.sem(data)  # Standard error of mean

    # t-distribution critical value
    t_crit = t_dist.ppf((1 + confidence) / 2, df=n-1)

    margin_error = t_crit * std_err
    ci_lower = mean - margin_error
    ci_upper = mean + margin_error

    return mean, ci_lower, ci_upper

# Usage
mean, ci_low, ci_high = calculate_confidence_interval(data, confidence=0.95)

print(f"Mean: {mean:.4f}")
print(f"95% CI: [{ci_low:.4f}, {ci_high:.4f}]")
```
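
For skewed data, where the t-based interval can be misleading, a bootstrap interval is a distribution-free alternative. A minimal sketch using `scipy.stats.bootstrap` (available in SciPy ≥ 1.7); `data` is assumed to be a 1-D numeric array as above:

```python
from scipy.stats import bootstrap

# Bootstrap CI for the mean; data is passed as a one-element sequence of samples
res = bootstrap((data,), np.mean, confidence_level=0.95, random_state=0)
ci = res.confidence_interval
print(f"95% bootstrap CI: [{ci.low:.4f}, {ci.high:.4f}]")
```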

### Effect Size Calculation
```python
def cohens_d(group1, group2):
    """Calculate Cohen's d effect size"""
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)

    # Pooled standard deviation
    pooled_std = np.sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))

    # Cohen's d
    d = (np.mean(group1) - np.mean(group2)) / pooled_std

    return d

# Usage
effect_size = cohens_d(group1, group2)
print(f"Cohen's d: {effect_size:.4f}")

# Interpretation
if abs(effect_size) < 0.2:
    print("Effect size: negligible")
elif abs(effect_size) < 0.5:
    print("Effect size: small")
elif abs(effect_size) < 0.8:
    print("Effect size: medium")
else:
    print("Effect size: large")
```

## Common Use Cases

### Differential Gene Expression Statistical Testing
```python
# Compare gene expression between two conditions
# gene_expression_df: rows=genes, columns=samples
# condition_labels: array indicating which condition each sample belongs to

results = []

for gene in gene_expression_df.index:
    # Get expression values for each condition
    cond1_expr = gene_expression_df.loc[gene, condition_labels == 'Condition1']
    cond2_expr = gene_expression_df.loc[gene, condition_labels == 'Condition2']

    # t-test
    t_stat, p_val = ttest_ind(cond1_expr, cond2_expr)

    # Log2 fold change (assumes strictly positive means;
    # add a small pseudocount if zero counts can occur)
    log2fc = np.log2(cond2_expr.mean() / cond1_expr.mean())

    results.append({
        'gene': gene,
        'log2FC': log2fc,
        'p_value': p_val,
        'mean_cond1': cond1_expr.mean(),
        'mean_cond2': cond2_expr.mean()
    })

deg_results = pd.DataFrame(results)

# Apply FDR correction
_, deg_results['q_value'], _, _ = multipletests(
    deg_results['p_value'],
    alpha=0.05,
    method='fdr_bh'
)

# Filter significant genes
significant_genes = deg_results[
    (deg_results['q_value'] < 0.05) &
    (abs(deg_results['log2FC']) > 1)
]

print(f"✅ Identified {len(significant_genes)} differentially expressed genes")
print(f"   - Upregulated: {(significant_genes['log2FC'] > 1).sum()}")
print(f"   - Downregulated: {(significant_genes['log2FC'] < -1).sum()}")

# Save
significant_genes.to_csv('deg_results.csv', index=False)
```

### Cluster Enrichment Analysis
```python
# Test if a cell type is enriched in a specific cluster
# total_cells: total number of cells
# cluster_cells: number of cells in cluster
# celltype_total: total cells of this type
# celltype_in_cluster: cells of this type in cluster

from scipy.stats import fisher_exact

# Create contingency table
contingency = [
    [celltype_in_cluster, cluster_cells - celltype_in_cluster],  # In cluster
    [celltype_total - celltype_in_cluster, total_cells - cluster_cells - (celltype_total - celltype_in_cluster)]  # Not in cluster
]

odds_ratio, p_value = fisher_exact(contingency, alternative='greater')

print(f"Odds ratio: {odds_ratio:.4f}")
print(f"p-value: {p_value:.4e}")

if p_value < 0.05 and odds_ratio > 1:
    print(f"✅ Cell type is significantly ENRICHED in cluster (p < 0.05)")
elif p_value < 0.05 and odds_ratio < 1:
    print(f"⚠️ Cell type is significantly DEPLETED in cluster (p < 0.05)")
else:
    print("❌ No significant enrichment/depletion")
```

### Batch Effect Detection
```python
# Test if there's a batch effect using ANOVA
# gene_expression: DataFrame with genes as rows, samples as columns
# batch_labels: array indicating batch for each sample

batch_effect_results = []

for gene in gene_expression.index:
    # Get expression values for each batch
    batches = [
        gene_expression.loc[gene, batch_labels == batch]
        for batch in np.unique(batch_labels)
    ]

    # ANOVA test
    f_stat, p_val = f_oneway(*batches)

    batch_effect_results.append({
        'gene': gene,
        'f_statistic': f_stat,
        'p_value': p_val
    })

batch_df = pd.DataFrame(batch_effect_results)

# Apply FDR correction
_, batch_df['q_value'], _, _ = multipletests(batch_df['p_value'], alpha=0.05, method='fdr_bh')

# Count genes with batch effects
genes_with_batch_effect = (batch_df['q_value'] < 0.05).sum()

print(f"Genes with significant batch effect: {genes_with_batch_effect} ({genes_with_batch_effect/len(batch_df)*100:.1f}%)")

if genes_with_batch_effect > len(batch_df) * 0.1:
    print("⚠️ WARNING: Strong batch effect detected (>10% genes affected)")
    print("→ Recommendation: Apply batch correction (ComBat, Harmony, etc.)")
else:
    print("✅ Minimal batch effect")
```

## Best Practices

1. **Check Assumptions**: Always test normality before using parametric tests (t-test, ANOVA)
2. **Multiple Testing**: Apply FDR or Bonferroni correction when testing many hypotheses
3. **Effect Size**: Report effect sizes (Cohen's d) alongside p-values
4. **Sample Size**: Ensure adequate sample size for statistical power (see the power-analysis sketch after this list)
5. **Outliers**: Check for and handle outliers appropriately
6. **Non-Parametric Alternatives**: Use when assumptions are violated (Mann-Whitney instead of t-test)
7. **Report Details**: Always report test used, test statistic, p-value, and correction method
8. **Visualization**: Combine statistical tests with visualizations (box plots, violin plots)
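
Point 4 above has a concrete tool: statsmodels' power module can estimate the per-group sample size needed to detect a given effect. A brief sketch for a two-sided independent t-test; the effect size (Cohen's d = 0.5) and target power (0.8) are illustrative choices, not recommendations:

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Required n per group to detect a medium effect (d = 0.5)
# at alpha = 0.05 with 80% power
analysis = TTestIndPower()
n_required = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"Required sample size per group: {np.ceil(n_required):.0f}")
```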

## Troubleshooting

### Issue: "Warning: p-value is very small"
**Solution**: This is normal for highly significant results. Report as p < 0.001 or use scientific notation
```python
if p_value < 0.001:
    print(f"p < 0.001")
else:
    print(f"p = {p_value:.4f}")
```

### Issue: "Division by zero in effect size calculation"
**Solution**: Check for zero variance (all values identical)
```python
if np.std(group1) == 0 or np.std(group2) == 0:
    print("Cannot calculate effect size: zero variance in one or both groups")
else:
    d = cohens_d(group1, group2)
```

### Issue: "Test fails with NaN values"
**Solution**: Remove or impute NaN values before testing
```python
# Remove NaN
group1_clean = group1[~np.isnan(group1)]
group2_clean = group2[~np.isnan(group2)]

# Or filter in DataFrame
df_clean = df.dropna(subset=['column_name'])
```

### Issue: "Insufficient sample size warning"
**Solution**: Minimum sample sizes for reliable tests:
- t-test: n ≥ 30 per group (or ≥ 5 if normally distributed)
- ANOVA: n ≥ 20 per group
- Correlation: n ≥ 30 total

```python
if len(group1) < 30 or len(group2) < 30:
    print("⚠️ Warning: Small sample size. Results may not be reliable.")
    print("Consider using non-parametric tests or collecting more data.")
```

## Technical Notes

- **Libraries**: Uses `scipy.stats` and `statsmodels` (widely supported, stable)
- **Execution**: Runs locally in the agent's sandbox
- **Compatibility**: Works with ALL LLM providers (GPT, Gemini, Claude, DeepSeek, Qwen, etc.)
- **Performance**: Most tests complete in milliseconds; large-scale testing (>10K genes) takes 1-5 seconds
- **Precision**: Uses double-precision floating point (numpy default)
- **Corrections**: FDR (Benjamini-Hochberg) recommended for genomics; Bonferroni for small numbers of tests

## References
- scipy.stats documentation: https://docs.scipy.org/doc/scipy/reference/stats.html
- statsmodels documentation: https://www.statsmodels.org/stable/index.html
- Multiple testing: https://www.statsmodels.org/stable/generated/statsmodels.stats.multitest.multipletests.html
- Statistical testing guide: https://docs.scipy.org/doc/scipy/tutorial/stats.html

Overview

This skill performs rigorous statistical analyses for omics and general datasets using local Python libraries (scipy, statsmodels, numpy). It supports t-tests, ANOVA, correlation analysis, non-parametric tests, multiple testing correction, effect-size calculation, and confidence intervals. Designed to run locally and work with any LLM provider, it fits into Jupyter-based analysis pipelines for bulk, single-cell, and spatial RNA-seq.

How this skill works

The skill runs common statistical routines: parametric tests (t-test, ANOVA), non-parametric alternatives (Mann-Whitney, Kruskal-Wallis), correlation (Pearson, Spearman), contingency tests (chi-square, Fisher), and normality checks (Shapiro, D’Agostino). It computes p-values, test statistics, confidence intervals, Cohen’s d, and applies multiple testing corrections (Benjamini-Hochberg FDR, Bonferroni) via statsmodels. Outputs are returned as dataframes and saved as CSV for downstream steps.
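
To make that pipeline concrete, here is a minimal end-to-end sketch on illustrative synthetic data (the group values and output file name are assumptions, not part of the skill): test normality, pick a test accordingly, compute an effect size, and save a summary table.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
group1 = rng.normal(10, 2, 40)  # illustrative data
group2 = rng.normal(11, 2, 40)

# Pick a test based on normality of both groups
normal = (stats.shapiro(group1).pvalue >= 0.05
          and stats.shapiro(group2).pvalue >= 0.05)
if normal:
    test_name, (stat, p) = "Welch's t-test", stats.ttest_ind(group1, group2, equal_var=False)
else:
    test_name, (stat, p) = "Mann-Whitney U", stats.mannwhitneyu(group1, group2)

# Cohen's d with pooled standard deviation
n1, n2 = len(group1), len(group2)
pooled = np.sqrt(((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2))
d = (group1.mean() - group2.mean()) / pooled

summary = pd.DataFrame([{'test': test_name, 'statistic': stat, 'p_value': p, 'cohens_d': d}])
summary.to_csv('summary_stats.csv', index=False)
print(summary)
```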

When to use it

  • Compare means between two or more experimental groups (t-test, ANOVA)
  • Detect correlations or associations between continuous variables (Pearson, Spearman)
  • Run differential gene expression testing across many genes with multiple testing correction
  • Test categorical associations or cluster enrichment (chi-square, Fisher exact)
  • Use non-parametric tests when normality assumptions fail

Best practices

  • Always test distributional assumptions (Shapiro, normaltest) before selecting parametric tests
  • Apply multiple testing correction (FDR recommended) when running many hypotheses
  • Report effect sizes (Cohen’s d) and confidence intervals alongside p-values
  • Handle NaNs and outliers before testing; impute or filter as appropriate
  • Use post-hoc pairwise tests with corrections after a significant ANOVA

Example use cases

  • Differential expression: per-gene t-tests followed by Benjamini-Hochberg FDR to identify DEGs, with results saved to CSV
  • Cluster enrichment: Fisher exact test to assess whether a cell type is overrepresented in a cluster
  • Batch effect detection: gene-wise ANOVA across batches, FDR correction and percent-affected summary to guide batch correction
  • Correlation analysis: compute Pearson and Spearman correlations between gene modules and phenotypes with interpretation of strength and direction
  • Non-parametric testing: Mann-Whitney U or Kruskal-Wallis when normality tests indicate non-normal data

FAQ

What correction should I use for thousands of tests?

Use Benjamini-Hochberg FDR for large-scale testing to control expected false discovery rate; Bonferroni is overly conservative for many tests.
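
To see the difference concretely, a small simulation (hypothetical data: 900 uniform null p-values plus 100 small p-values from genuine effects) typically shows BH retaining far more discoveries than Bonferroni:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
# 900 null p-values (uniform) + 100 p-values from real effects (near zero)
p_values = np.concatenate([rng.uniform(size=900), rng.beta(1, 50, size=100)])

reject_bh, _, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')
reject_bonf, _, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')

print(f"BH (FDR) discoveries:   {reject_bh.sum()}")
print(f"Bonferroni discoveries: {reject_bonf.sum()}")
```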

My effect size calculation gives division by zero. Why?

That happens when a group has zero variance (all identical values). Check the data for constant columns and remove or handle them before computing Cohen’s d.