
data-stats-analysis skill

/.claude/skills/data-stats-analysis

This skill performs rigorous statistical analyses locally using scipy and statsmodels, enabling robust t-tests, ANOVA, correlations, and multiple-testing corrections.

npx playbooks add skill starlitnightly/omicverse --skill data-stats-analysis

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
15.1 KB
---
name: data-stats-analysis
title: Statistical Analysis (Universal)
description: Perform statistical tests, hypothesis testing, correlation analysis, and multiple testing corrections using scipy and statsmodels. Works with ANY LLM provider (GPT, Gemini, Claude, etc.).
---

# Statistical Analysis (Universal)

## Overview
This skill enables you to perform rigorous statistical analyses including t-tests, ANOVA, correlation analysis, hypothesis testing, and multiple testing corrections. Unlike cloud-hosted solutions, this skill uses standard Python statistical libraries (**scipy**, **statsmodels**, **numpy**) and executes **locally** in your environment, making it compatible with **ALL LLM providers** including GPT, Gemini, Claude, DeepSeek, and Qwen.

## When to Use This Skill
- Compare means between groups (t-tests, ANOVA)
- Test for correlations between variables
- Perform hypothesis testing with p-value calculation
- Apply multiple testing corrections (FDR, Bonferroni)
- Calculate statistical summaries and confidence intervals
- Test for normality and distribution fitting
- Perform non-parametric tests (Mann-Whitney, Kruskal-Wallis)

## How to Use

### Step 1: Import Required Libraries
```python
import numpy as np
import pandas as pd
from scipy import stats
from scipy.stats import ttest_ind, mannwhitneyu, pearsonr, spearmanr
from scipy.stats import f_oneway, kruskal, chi2_contingency
from statsmodels.stats.multitest import multipletests
from statsmodels.stats.proportion import proportions_ztest
import warnings
warnings.filterwarnings('ignore')  # optional: silence library warnings for cleaner output
```

### Step 2: Two-Sample t-Test
```python
# Compare means between two groups
# group1, group2: arrays of numeric values

# Check the equal-variance assumption first (Levene's test)
_, levene_p = stats.levene(group1, group2)
equal_var = levene_p >= 0.05

# Student's t-test if variances look equal, Welch's t-test otherwise
t_statistic, p_value = ttest_ind(group1, group2, equal_var=equal_var)

test_name = "Student's t-test" if equal_var else "Welch's t-test"
print(f"{test_name}")
print(f"t-statistic: {t_statistic:.4f}")
print(f"p-value: {p_value:.4e}")

if p_value < 0.05:
    print("✅ Significant difference between groups (p < 0.05)")
else:
    print("❌ No significant difference (p >= 0.05)")
```

### Step 3: One-Way ANOVA
```python
# Compare means across multiple groups
# groups: list of arrays, e.g., [group1, group2, group3]

# Perform one-way ANOVA
f_statistic, p_value = f_oneway(*groups)

print(f"F-statistic: {f_statistic:.4f}")
print(f"p-value: {p_value:.4e}")

if p_value < 0.05:
    print("✅ Significant difference between groups (p < 0.05)")
    print("Note: Use post-hoc tests to identify which groups differ")
else:
    print("❌ No significant difference between groups")

# Post-hoc pairwise t-tests with Bonferroni correction
from itertools import combinations

group_names = ['Group A', 'Group B', 'Group C']
pairwise_results = []

for (name1, data1), (name2, data2) in combinations(zip(group_names, groups), 2):
    _, p = ttest_ind(data1, data2)
    pairwise_results.append({
        'comparison': f'{name1} vs {name2}',
        'p_value': p
    })

# Apply Bonferroni correction
pairwise_df = pd.DataFrame(pairwise_results)
n_tests = len(pairwise_df)
pairwise_df['p_adjusted'] = pairwise_df['p_value'] * n_tests
pairwise_df['p_adjusted'] = pairwise_df['p_adjusted'].clip(upper=1.0)

print("\nPairwise Comparisons (Bonferroni-corrected):")
print(pairwise_df)
```
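
As an alternative to manual pairwise t-tests, statsmodels ships Tukey's HSD, which controls the family-wise error rate across all pairs internally. A minimal sketch, reusing the `groups` and `group_names` defined above:

```python
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Flatten the groups into one value array with matching labels
values = np.concatenate(groups)
labels = np.concatenate([
    [name] * len(data) for name, data in zip(group_names, groups)
])

# Tukey's HSD performs all pairwise comparisons with a built-in correction
tukey = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
print(tukey.summary())
```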

### Step 4: Correlation Analysis
```python
# Pearson correlation (linear relationships)
r_pearson, p_pearson = pearsonr(variable1, variable2)

print(f"Pearson correlation: r = {r_pearson:.4f}, p = {p_pearson:.4e}")

# Spearman correlation (monotonic relationships, robust to outliers)
r_spearman, p_spearman = spearmanr(variable1, variable2)

print(f"Spearman correlation: ρ = {r_spearman:.4f}, p = {p_spearman:.4e}")

# Interpretation
if abs(r_pearson) < 0.3:
    strength = "weak"
elif abs(r_pearson) < 0.7:
    strength = "moderate"
else:
    strength = "strong"

direction = "positive" if r_pearson > 0 else "negative"
print(f"Interpretation: {strength} {direction} correlation")

if p_pearson < 0.05:
    print("✅ Statistically significant (p < 0.05)")
else:
    print("❌ Not statistically significant")
```

### Step 5: Multiple Testing Correction
```python
# Scenario: Testing 1000 genes for differential expression
# p_values: array of p-values from individual tests

# Method 1: Benjamini-Hochberg FDR correction (recommended)
reject_fdr, p_adjusted_fdr, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')

# Method 2: Bonferroni correction (more conservative)
reject_bonf, p_adjusted_bonf, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')

# Create results DataFrame
results_df = pd.DataFrame({
    'gene': gene_names,
    'p_value': p_values,
    'q_value_fdr': p_adjusted_fdr,
    'p_adjusted_bonferroni': p_adjusted_bonf,
    'significant_fdr': reject_fdr,
    'significant_bonf': reject_bonf
})

# Summary
print(f"Original significant (p < 0.05): {(p_values < 0.05).sum()}")
print(f"Significant after FDR correction: {reject_fdr.sum()}")
print(f"Significant after Bonferroni correction: {reject_bonf.sum()}")

# Save results
results_df.to_csv('statistical_results.csv', index=False)
print("✅ Results saved to: statistical_results.csv")
```

### Step 6: Non-Parametric Tests
```python
# Use when data is not normally distributed

# Mann-Whitney U test (alternative to t-test)
u_statistic, p_value_mw = mannwhitneyu(group1, group2, alternative='two-sided')

print(f"Mann-Whitney U test:")
print(f"U-statistic: {u_statistic:.4f}")
print(f"p-value: {p_value_mw:.4e}")

# Kruskal-Wallis H test (alternative to ANOVA)
h_statistic, p_value_kw = kruskal(*groups)

print(f"\nKruskal-Wallis H test:")
print(f"H-statistic: {h_statistic:.4f}")
print(f"p-value: {p_value_kw:.4e}")
```
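
The Best Practices section below recommends reporting effect sizes alongside p-values; for the Mann-Whitney U test a common companion is the rank-biserial correlation, which follows directly from the U statistic. A short sketch, reusing `u_statistic` from above:

```python
# Rank-biserial correlation: effect size for the Mann-Whitney U test
# r = 1 - 2U / (n1 * n2); ranges from -1 to +1, 0 means no effect
n1, n2 = len(group1), len(group2)
rank_biserial = 1 - (2 * u_statistic) / (n1 * n2)
print(f"Rank-biserial correlation: {rank_biserial:.4f}")
```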

## Advanced Features

### Normality Testing
```python
from scipy.stats import shapiro, normaltest, kstest

# Test if data follows normal distribution

# Shapiro-Wilk test (best for n < 5000)
stat_sw, p_sw = shapiro(data)
print(f"Shapiro-Wilk test: W={stat_sw:.4f}, p={p_sw:.4e}")

# D'Agostino-Pearson test
stat_dp, p_dp = normaltest(data)
print(f"D'Agostino-Pearson test: stat={stat_dp:.4f}, p={p_dp:.4e}")

# Interpretation
if p_sw < 0.05:
    print("❌ Data does NOT follow normal distribution (p < 0.05)")
    print("→ Recommendation: Use non-parametric tests (Mann-Whitney, Kruskal-Wallis)")
else:
    print("✅ Data appears normally distributed (p >= 0.05)")
    print("→ OK to use parametric tests (t-test, ANOVA)")
```

### Chi-Square Test for Contingency Tables
```python
# Test independence between categorical variables
# contingency_table: 2D array (rows=categories1, columns=categories2)

# Example: Cell type distribution across conditions
contingency_table = np.array([
    [50, 30, 20],  # Condition A: T cells, B cells, NK cells
    [40, 45, 15],  # Condition B
    [35, 25, 40]   # Condition C
])

chi2, p_value, dof, expected = chi2_contingency(contingency_table)

print(f"Chi-square statistic: {chi2:.4f}")
print(f"p-value: {p_value:.4e}")
print(f"Degrees of freedom: {dof}")
print(f"\nExpected frequencies:\n{expected}")

if p_value < 0.05:
    print("✅ Significant association between variables (p < 0.05)")
else:
    print("❌ No significant association")
```
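
When the chi-square test is significant, standardized (Pearson) residuals show which cells drive the association; cells with |residual| above roughly 2 contribute most. A brief follow-up using the `expected` frequencies returned above:

```python
# Pearson residuals: (observed - expected) / sqrt(expected)
residuals = (contingency_table - expected) / np.sqrt(expected)
print(f"Standardized residuals:\n{np.round(residuals, 2)}")
```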

### Confidence Intervals
```python
from scipy.stats import t as t_dist

def calculate_confidence_interval(data, confidence=0.95):
    """Calculate confidence interval for mean"""
    n = len(data)
    mean = np.mean(data)
    std_err = stats.sem(data)  # Standard error of mean

    # t-distribution critical value
    t_crit = t_dist.ppf((1 + confidence) / 2, df=n-1)

    margin_error = t_crit * std_err
    ci_lower = mean - margin_error
    ci_upper = mean + margin_error

    return mean, ci_lower, ci_upper

# Usage
mean, ci_low, ci_high = calculate_confidence_interval(data, confidence=0.95)

print(f"Mean: {mean:.4f}")
print(f"95% CI: [{ci_low:.4f}, {ci_high:.4f}]")
```
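
For skewed data, where the t-based interval can be misleading, a bootstrap interval is a distribution-free alternative. A minimal sketch using `scipy.stats.bootstrap` (available in SciPy ≥ 1.7); `data` is assumed to be a 1-D numeric array as above:

```python
from scipy.stats import bootstrap

# Bootstrap CI for the mean; data is passed as a one-element sequence of samples
res = bootstrap((data,), np.mean, confidence_level=0.95, random_state=0)
ci = res.confidence_interval
print(f"95% bootstrap CI: [{ci.low:.4f}, {ci.high:.4f}]")
```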

### Effect Size Calculation
```python
def cohens_d(group1, group2):
    """Calculate Cohen's d effect size"""
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)

    # Pooled standard deviation
    pooled_std = np.sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))

    # Cohen's d
    d = (np.mean(group1) - np.mean(group2)) / pooled_std

    return d

# Usage
effect_size = cohens_d(group1, group2)
print(f"Cohen's d: {effect_size:.4f}")

# Interpretation
if abs(effect_size) < 0.2:
    print("Effect size: negligible")
elif abs(effect_size) < 0.5:
    print("Effect size: small")
elif abs(effect_size) < 0.8:
    print("Effect size: medium")
else:
    print("Effect size: large")
```

## Common Use Cases

### Differential Gene Expression Statistical Testing
```python
# Compare gene expression between two conditions
# gene_expression_df: rows=genes, columns=samples
# condition_labels: array indicating which condition each sample belongs to

results = []

for gene in gene_expression_df.index:
    # Get expression values for each condition
    cond1_expr = gene_expression_df.loc[gene, condition_labels == 'Condition1']
    cond2_expr = gene_expression_df.loc[gene, condition_labels == 'Condition2']

    # t-test
    t_stat, p_val = ttest_ind(cond1_expr, cond2_expr)

    # Log2 fold change (assumes strictly positive means;
    # add a small pseudocount if zero counts can occur)
    log2fc = np.log2(cond2_expr.mean() / cond1_expr.mean())

    results.append({
        'gene': gene,
        'log2FC': log2fc,
        'p_value': p_val,
        'mean_cond1': cond1_expr.mean(),
        'mean_cond2': cond2_expr.mean()
    })

deg_results = pd.DataFrame(results)

# Apply FDR correction
_, deg_results['q_value'], _, _ = multipletests(
    deg_results['p_value'],
    alpha=0.05,
    method='fdr_bh'
)

# Filter significant genes
significant_genes = deg_results[
    (deg_results['q_value'] < 0.05) &
    (abs(deg_results['log2FC']) > 1)
]

print(f"✅ Identified {len(significant_genes)} differentially expressed genes")
print(f"   - Upregulated: {(significant_genes['log2FC'] > 1).sum()}")
print(f"   - Downregulated: {(significant_genes['log2FC'] < -1).sum()}")

# Save
significant_genes.to_csv('deg_results.csv', index=False)
```

### Cluster Enrichment Analysis
```python
# Test if a cell type is enriched in a specific cluster
# total_cells: total number of cells
# cluster_cells: number of cells in cluster
# celltype_total: total cells of this type
# celltype_in_cluster: cells of this type in cluster

from scipy.stats import fisher_exact

# Create contingency table
contingency = [
    [celltype_in_cluster, cluster_cells - celltype_in_cluster],  # In cluster
    [celltype_total - celltype_in_cluster, total_cells - cluster_cells - (celltype_total - celltype_in_cluster)]  # Not in cluster
]

odds_ratio, p_value = fisher_exact(contingency, alternative='greater')

print(f"Odds ratio: {odds_ratio:.4f}")
print(f"p-value: {p_value:.4e}")

if p_value < 0.05 and odds_ratio > 1:
    print(f"✅ Cell type is significantly ENRICHED in cluster (p < 0.05)")
elif p_value < 0.05 and odds_ratio < 1:
    print(f"⚠️ Cell type is significantly DEPLETED in cluster (p < 0.05)")
else:
    print("❌ No significant enrichment/depletion")
```

### Batch Effect Detection
```python
# Test if there's a batch effect using ANOVA
# gene_expression: DataFrame with genes as rows, samples as columns
# batch_labels: array indicating batch for each sample

batch_effect_results = []

for gene in gene_expression.index:
    # Get expression values for each batch
    batches = [
        gene_expression.loc[gene, batch_labels == batch]
        for batch in np.unique(batch_labels)
    ]

    # ANOVA test
    f_stat, p_val = f_oneway(*batches)

    batch_effect_results.append({
        'gene': gene,
        'f_statistic': f_stat,
        'p_value': p_val
    })

batch_df = pd.DataFrame(batch_effect_results)

# Apply FDR correction
_, batch_df['q_value'], _, _ = multipletests(batch_df['p_value'], alpha=0.05, method='fdr_bh')

# Count genes with batch effects
genes_with_batch_effect = (batch_df['q_value'] < 0.05).sum()

print(f"Genes with significant batch effect: {genes_with_batch_effect} ({genes_with_batch_effect/len(batch_df)*100:.1f}%)")

if genes_with_batch_effect > len(batch_df) * 0.1:
    print("⚠️ WARNING: Strong batch effect detected (>10% genes affected)")
    print("→ Recommendation: Apply batch correction (ComBat, Harmony, etc.)")
else:
    print("✅ Minimal batch effect")
```

## Best Practices

1. **Check Assumptions**: Always test normality before using parametric tests (t-test, ANOVA)
2. **Multiple Testing**: Apply FDR or Bonferroni correction when testing many hypotheses
3. **Effect Size**: Report effect sizes (Cohen's d) alongside p-values
4. **Sample Size**: Ensure adequate sample size for statistical power (see the power-analysis sketch after this list)
5. **Outliers**: Check for and handle outliers appropriately
6. **Non-Parametric Alternatives**: Use when assumptions are violated (Mann-Whitney instead of t-test)
7. **Report Details**: Always report test used, test statistic, p-value, and correction method
8. **Visualization**: Combine statistical tests with visualizations (box plots, violin plots)
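
Point 4 above has a concrete tool: statsmodels' power module can estimate the per-group sample size needed to detect a given effect. A brief sketch for a two-sided independent t-test; the effect size (Cohen's d = 0.5) and target power (0.8) are illustrative choices, not recommendations:

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Required n per group to detect a medium effect (d = 0.5)
# at alpha = 0.05 with 80% power
analysis = TTestIndPower()
n_required = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"Required sample size per group: {np.ceil(n_required):.0f}")
```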

## Troubleshooting

### Issue: "Warning: p-value is very small"
**Solution**: This is normal for highly significant results. Report as p < 0.001 or use scientific notation
```python
if p_value < 0.001:
    print(f"p < 0.001")
else:
    print(f"p = {p_value:.4f}")
```

### Issue: "Division by zero in effect size calculation"
**Solution**: Check for zero variance (all values identical)
```python
if np.std(group1) == 0 or np.std(group2) == 0:
    print("Cannot calculate effect size: zero variance in one or both groups")
else:
    d = cohens_d(group1, group2)
```

### Issue: "Test fails with NaN values"
**Solution**: Remove or impute NaN values before testing
```python
# Remove NaN
group1_clean = group1[~np.isnan(group1)]
group2_clean = group2[~np.isnan(group2)]

# Or filter in DataFrame
df_clean = df.dropna(subset=['column_name'])
```

### Issue: "Insufficient sample size warning"
**Solution**: Minimum sample sizes for reliable tests:
- t-test: n ≥ 30 per group (or ≥ 5 if normally distributed)
- ANOVA: n ≥ 20 per group
- Correlation: n ≥ 30 total

```python
if len(group1) < 30 or len(group2) < 30:
    print("⚠️ Warning: Small sample size. Results may not be reliable.")
    print("Consider using non-parametric tests or collecting more data.")
```

## Technical Notes

- **Libraries**: Uses `scipy.stats` and `statsmodels` (widely supported, stable)
- **Execution**: Runs locally in the agent's sandbox
- **Compatibility**: Works with ALL LLM providers (GPT, Gemini, Claude, DeepSeek, Qwen, etc.)
- **Performance**: Most tests complete in milliseconds; large-scale testing (>10K genes) takes 1-5 seconds
- **Precision**: Uses double-precision floating point (numpy default)
- **Corrections**: FDR (Benjamini-Hochberg) recommended for genomics; Bonferroni for small numbers of tests

## References
- scipy.stats documentation: https://docs.scipy.org/doc/scipy/reference/stats.html
- statsmodels documentation: https://www.statsmodels.org/stable/index.html
- Multiple testing: https://www.statsmodels.org/stable/generated/statsmodels.stats.multitest.multipletests.html
- Statistical testing guide: https://docs.scipy.org/doc/scipy/tutorial/stats.html

Overview

This skill performs rigorous statistical analyses for omics and general datasets using local Python libraries (scipy, statsmodels, numpy). It supports t-tests, ANOVA, correlation analysis, non-parametric tests, multiple testing correction, effect-size calculation, and confidence intervals. Designed to run locally and work with any LLM provider, it fits into Jupyter-based analysis pipelines for bulk, single-cell, and spatial RNA-seq.

How this skill works

The skill runs common statistical routines: parametric tests (t-test, ANOVA), non-parametric alternatives (Mann-Whitney, Kruskal-Wallis), correlation (Pearson, Spearman), contingency tests (chi-square, Fisher), and normality checks (Shapiro, D’Agostino). It computes p-values, test statistics, confidence intervals, Cohen’s d, and applies multiple testing corrections (Benjamini-Hochberg FDR, Bonferroni) via statsmodels. Outputs are returned as dataframes and saved as CSV for downstream steps.
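
To make that pipeline concrete, here is a minimal end-to-end sketch on illustrative synthetic data (the group values and output file name are assumptions, not part of the skill): test normality, pick a test accordingly, compute an effect size, and save a summary table.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
group1 = rng.normal(10, 2, 40)  # illustrative data
group2 = rng.normal(11, 2, 40)

# Pick a test based on normality of both groups
normal = (stats.shapiro(group1).pvalue >= 0.05
          and stats.shapiro(group2).pvalue >= 0.05)
if normal:
    test_name, (stat, p) = "Welch's t-test", stats.ttest_ind(group1, group2, equal_var=False)
else:
    test_name, (stat, p) = "Mann-Whitney U", stats.mannwhitneyu(group1, group2)

# Cohen's d with pooled standard deviation
n1, n2 = len(group1), len(group2)
pooled = np.sqrt(((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2))
d = (group1.mean() - group2.mean()) / pooled

summary = pd.DataFrame([{'test': test_name, 'statistic': stat, 'p_value': p, 'cohens_d': d}])
summary.to_csv('summary_stats.csv', index=False)
print(summary)
```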

When to use it

  • Compare means between two or more experimental groups (t-test, ANOVA)
  • Detect correlations or associations between continuous variables (Pearson, Spearman)
  • Run differential gene expression testing across many genes with multiple testing correction
  • Test categorical associations or cluster enrichment (chi-square, Fisher exact)
  • Use non-parametric tests when normality assumptions fail

Best practices

  • Always test distributional assumptions (Shapiro, normaltest) before selecting parametric tests
  • Apply multiple testing correction (FDR recommended) when running many hypotheses
  • Report effect sizes (Cohen’s d) and confidence intervals alongside p-values
  • Handle NaNs and outliers before testing; impute or filter as appropriate
  • Use post-hoc pairwise tests with corrections after a significant ANOVA

Example use cases

  • Differential expression: per-gene t-tests followed by Benjamini-Hochberg FDR to identify DEGs, with results saved to CSV
  • Cluster enrichment: Fisher exact test to assess whether a cell type is overrepresented in a cluster
  • Batch effect detection: gene-wise ANOVA across batches, FDR correction and percent-affected summary to guide batch correction
  • Correlation analysis: compute Pearson and Spearman correlations between gene modules and phenotypes with interpretation of strength and direction
  • Non-parametric testing: Mann-Whitney U or Kruskal-Wallis when normality tests indicate non-normal data

FAQ

What correction should I use for thousands of tests?

Use Benjamini-Hochberg FDR for large-scale testing to control expected false discovery rate; Bonferroni is overly conservative for many tests.
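
To see the difference concretely, a small simulation (hypothetical data: 900 uniform null p-values plus 100 small p-values from genuine effects) typically shows BH retaining far more discoveries than Bonferroni:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
# 900 null p-values (uniform) + 100 p-values from real effects (near zero)
p_values = np.concatenate([rng.uniform(size=900), rng.beta(1, 50, size=100)])

reject_bh, _, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')
reject_bonf, _, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')

print(f"BH (FDR) discoveries:   {reject_bh.sum()}")
print(f"Bonferroni discoveries: {reject_bonf.sum()}")
```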

My effect size calculation gives division by zero. Why?

That happens when a group has zero variance (all identical values). Check the data for constant columns and remove or handle them before computing Cohen’s d.