home / skills / pluginagentmarketplace / custom-plugin-ai-data-scientist / statistical-analysis

statistical-analysis skill

not checked

npx playbooks add skill pluginagentmarketplace/custom-plugin-ai-data-scientist --skill statistical-analysis

Review the files below or copy the command above to add this skill to your agents.

Files (4)

SKILL.md

4.4 KB

---
name: statistical-analysis
description: Probability, distributions, hypothesis testing, and statistical inference. Use for A/B testing, experimental design, or statistical validation.
sasmp_version: "1.3.0"
bonded_agent: 02-mathematics-statistics
bond_type: PRIMARY_BOND
---

# Statistical Analysis

Apply statistical methods to understand data and validate findings.

## Quick Start

```python
from scipy import stats
import numpy as np

# Descriptive statistics
data = np.array([1, 2, 3, 4, 5])
print(f"Mean: {np.mean(data)}")
print(f"Std: {np.std(data)}")

# Hypothesis testing
group1 = [23, 25, 27, 29, 31]
group2 = [20, 22, 24, 26, 28]
t_stat, p_value = stats.ttest_ind(group1, group2)
print(f"P-value: {p_value}")
```

## Core Tests

### T-Test (Compare Means)
```python
# One-sample: Compare to population mean
stats.ttest_1samp(data, 100)

# Two-sample: Compare two groups
stats.ttest_ind(group1, group2)

# Paired: Before/after comparison
stats.ttest_rel(before, after)
```

### Chi-Square (Categorical Data)
```python
from scipy.stats import chi2_contingency

observed = np.array([[10, 20], [15, 25]])
chi2, p_value, dof, expected = chi2_contingency(observed)
```

### ANOVA (Multiple Groups)
```python
f_stat, p_value = stats.f_oneway(group1, group2, group3)
```

## Confidence Intervals

```python
from scipy import stats

confidence_level = 0.95
mean = np.mean(data)
se = stats.sem(data)
ci = stats.t.interval(confidence_level, len(data)-1, mean, se)

print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
```

## Correlation

```python
# Pearson (linear)
r, p_value = stats.pearsonr(x, y)

# Spearman (rank-based)
rho, p_value = stats.spearmanr(x, y)
```

## Distributions

```python
# Normal
x = np.linspace(-3, 3, 100)
pdf = stats.norm.pdf(x, loc=0, scale=1)

# Sampling
samples = np.random.normal(0, 1, 1000)

# Test normality
stat, p_value = stats.shapiro(data)
```

## A/B Testing Framework

```python
def ab_test(control, treatment, alpha=0.05):
    """
    Run A/B test with statistical significance

    Returns: significant (bool), p_value (float)
    """
    t_stat, p_value = stats.ttest_ind(control, treatment)

    significant = p_value < alpha
    improvement = (np.mean(treatment) - np.mean(control)) / np.mean(control) * 100

    return {
        'significant': significant,
        'p_value': p_value,
        'improvement': f"{improvement:.2f}%"
    }
```

## Interpretation

**P-value < 0.05**: Reject null hypothesis (statistically significant)

**P-value >= 0.05**: Fail to reject null (not significant)

## Common Pitfalls

- Multiple testing without correction
- Small sample sizes
- Ignoring assumptions (normality, independence)
- Confusing correlation with causation
- p-hacking (searching for significance)

## Troubleshooting

### Common Issues

**Problem: Non-normal data for t-test**
```python
# Check normality first
stat, p = stats.shapiro(data)
if p < 0.05:
    # Use non-parametric alternative
    stat, p = stats.mannwhitneyu(group1, group2)  # Instead of ttest_ind
```

**Problem: Multiple comparisons inflating false positives**
```python
from statsmodels.stats.multitest import multipletests

# Apply Bonferroni correction
p_values = [0.01, 0.03, 0.04, 0.02, 0.06]
rejected, p_adjusted, _, _ = multipletests(p_values, method='bonferroni')
```

**Problem: Underpowered study (sample too small)**
```python
from statsmodels.stats.power import TTestIndPower

# Calculate required sample size
power_analysis = TTestIndPower()
sample_size = power_analysis.solve_power(
    effect_size=0.5,  # Medium effect (Cohen's d)
    power=0.8,        # 80% power
    alpha=0.05        # 5% significance
)
print(f"Required n per group: {sample_size:.0f}")
```

**Problem: Heterogeneous variances**
```python
# Check with Levene's test
stat, p = stats.levene(group1, group2)
if p < 0.05:
    # Use Welch's t-test (default in scipy)
    t, p = stats.ttest_ind(group1, group2, equal_var=False)
```

**Problem: Outliers affecting results**
```python
from scipy.stats import zscore

# Detect outliers (|z| > 3)
z_scores = np.abs(zscore(data))
clean_data = data[z_scores < 3]

# Or use robust statistics
median = np.median(data)
mad = np.median(np.abs(data - median))  # Median Absolute Deviation
```

### Debug Checklist
- [ ] Check sample size adequacy (power analysis)
- [ ] Test normality assumption (Shapiro-Wilk)
- [ ] Test homogeneity of variance (Levene's)
- [ ] Check for outliers (z-scores, IQR)
- [ ] Apply multiple testing correction if needed
- [ ] Report effect sizes, not just p-values