statistics-2 skill

This skill helps you perform hypothesis testing and diagnostics across A/B testing, time series, and model validation with a unified TestResult interface.

npx playbooks add skill openclaw/skills --skill statistics-2


---
name: pywayne-statistics
description: Comprehensive statistical testing library with 37+ methods for normality tests, location tests, correlation tests, time series tests, and model diagnostics. Use when performing hypothesis testing, A/B testing, data quality checks, time series analysis, or regression model validation. All methods return unified TestResult objects with consistent interface including p-value, statistic, confidence interval, and effect size.
---

# Pywayne Statistics

Comprehensive statistical testing library for hypothesis testing, A/B testing, and data analysis.

## Quick Start

```python
from pywayne.statistics import NormalityTests, LocationTests
import numpy as np

# Test data normality
nt = NormalityTests()
data = np.random.normal(0, 1, 100)
result = nt.shapiro_wilk(data)
print(f"p-value: {result.p_value:.4f}, is_normal: {not result.reject_null}")

# Compare two groups
lt = LocationTests()
group_a = np.random.normal(100, 15, 50)
group_b = np.random.normal(105, 15, 50)
result = lt.two_sample_ttest(group_a, group_b)
print(f"Significant difference: {result.reject_null}")
```

## Test Categories

### NormalityTests (`NormalityTests`)

Test if data follows a normal distribution or other specified distributions.

| Method | Description | Use Case |
|---------|-------------|-----------|
| `shapiro_wilk` | Shapiro-Wilk test | Small-medium samples (n ≤ 5000) |
| `ks_test_normal` | K-S normality test | Medium-large samples |
| `ks_test_two_sample` | Two-sample K-S test | Compare two sample distributions |
| `anderson_darling` | Anderson-Darling test | Tail-sensitive normality test |
| `dagostino_pearson` | D'Agostino-Pearson K² | Based on skewness and kurtosis |
| `jarque_bera` | Jarque-Bera test | Large samples, regression residuals |
| `chi_square_goodness_of_fit` | Chi-square goodness-of-fit | Categorical data |
| `lilliefors_test` | Lilliefors test | K-S normality test with mean and variance estimated from the data |

**Example:**
```python
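from pywayne.statistics import NormalityTests
import numpy as np

nt = NormalityTests()
data = np.random.normal(0, 1, 100)  # sample data to test
result = nt.shapiro_wilk(data)
if result.p_value < 0.05:
    print("Normality rejected at the 5% level")
else:
    # Failing to reject the null does not prove the data is normal
    print("No evidence against normality at the 5% level")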
```
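
The table above describes `jarque_bera` as based on skewness and kurtosis. As a rough illustration of what such a test computes, the sketch below evaluates the textbook statistic `JB = n/6 * (S^2 + (K - 3)^2 / 4)` with the standard library only. The name `jarque_bera_sketch` is hypothetical and this is not pywayne's implementation; the p-value assumes the asymptotic chi-square distribution with 2 degrees of freedom, whose survival function is `exp(-x / 2)`.

```python
import math

def jarque_bera_sketch(data):
    """Textbook Jarque-Bera statistic and asymptotic p-value (illustrative only)."""
    n = len(data)
    mean = sum(data) / n
    # Central moments using the biased (divide-by-n) convention
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Survival function of a chi-square variable with 2 degrees of freedom
    p_value = math.exp(-jb / 2)
    return jb, p_value

jb, p = jarque_bera_sketch([1, 2, 3, 4, 5])
print(f"JB = {jb:.4f}, asymptotic p-value = {p:.4f}")
```

The chi-square approximation is only reliable for large n, which is consistent with the table recommending this test for large samples.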

### LocationTests (`LocationTests`)

Compare means or medians across groups (parametric and non-parametric).

| Method | Description | Use Case |
|---------|-------------|-----------|
| `one_sample_ttest` | One-sample t-test | Compare sample mean to a value |
| `two_sample_ttest` | Two-sample t-test | Compare two independent group means |
| `paired_ttest` | Paired t-test | Compare before/after measurements |
| `one_way_anova` | One-way ANOVA | Compare 3+ group means |
| `mann_whitney_u` | Mann-Whitney U test | Non-parametric two-sample test |
| `wilcoxon_signed_rank` | Wilcoxon signed-rank | Non-parametric paired test |
| `kruskal_wallis` | Kruskal-Wallis H test | Non-parametric multi-group test |

**Example (A/B Testing):**
```python
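from pywayne.statistics import LocationTests, NormalityTests
import numpy as np

lt = LocationTests()
nt = NormalityTests()

# Example data; replace with the observed metric for each variant
control = np.random.normal(100, 15, 50)
treatment = np.random.normal(105, 15, 50)

# Check normality of both groups before choosing a parametric test
both_normal = (
    nt.shapiro_wilk(control).p_value > 0.05
    and nt.shapiro_wilk(treatment).p_value > 0.05
)
if both_normal:
    result = lt.two_sample_ttest(control, treatment)
else:
    result = lt.mann_whitney_u(control, treatment)

print(f"Effect significant: {result.reject_null}")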
```

### CorrelationTests (`CorrelationTests`)

Test correlation between variables and independence of categorical variables.

| Method | Description | Use Case |
|---------|-------------|-----------|
| `pearson_correlation` | Pearson correlation | Linear relationship |
| `spearman_correlation` | Spearman's rank | Monotonic relationship |
| `kendall_tau` | Kendall's tau | Rank correlation, small samples |
| `chi_square_independence` | Chi-square independence | Categorical variables |
| `fisher_exact_test` | Fisher's exact test | 2×2 contingency table |
| `mcnemar_test` | McNemar's test | Paired categorical data |

**Example:**
```python
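from pywayne.statistics import CorrelationTests
import numpy as np

ct = CorrelationTests()
x = np.random.normal(0, 1, 100)
y = 2 * x + np.random.normal(0, 1, 100)  # linear in x, plus noise
result = ct.pearson_correlation(x, y)
print(f"Correlation: {result.statistic:.3f}, p-value: {result.p_value:.4f}")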
```

### TimeSeriesTests (`TimeSeriesTests`)

Test time series properties: stationarity, autocorrelation, cointegration.

| Method | Description | Use Case |
|---------|-------------|-----------|
| `adf_test` | Augmented Dickey-Fuller | Unit root test for stationarity |
| `kpss_test` | KPSS test | Stationarity test (complements ADF) |
| `ljung_box_test` | Ljung-Box Q test | Overall autocorrelation |
| `runs_test` | Runs test | Randomness testing |
| `arch_test` | ARCH effect test | Conditional heteroscedasticity (volatility clustering) |
| `granger_causality` | Granger causality | Whether one series helps predict another (not proof of causation) |
| `engle_granger_cointegration` | Engle-Granger cointegration | Long-term equilibrium |
| `breusch_godfrey_test` | Breusch-Godfrey | Higher-order autocorrelation |

**Example:**
```python
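from pywayne.statistics import TimeSeriesTests

tst = TimeSeriesTests()
# time_series_data: 1-D array or list of observations in time order
adf_result = tst.adf_test(time_series_data)    # H0: series has a unit root
kpss_result = tst.kpss_test(time_series_data)  # H0: series is stationary

# The two tests have opposite null hypotheses, so read them together
if adf_result.reject_null and not kpss_result.reject_null:
    print("Both tests point to a stationary series")
elif not adf_result.reject_null and kpss_result.reject_null:
    print("Both tests point to a unit root (non-stationary)")
else:
    print("ADF and KPSS disagree; inspect trend and differencing")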
```

### ModelDiagnostics (`ModelDiagnostics`)

Regression model diagnostics: heteroscedasticity, autocorrelation, multicollinearity.

| Method | Description | Use Case |
|---------|-------------|-----------|
| `breusch_pagan_test` | Breusch-Pagan | Heteroscedasticity test |
| `white_test` | White's test | General heteroscedasticity |
| `goldfeld_quandt_test` | Goldfeld-Quandt | Heteroscedasticity between two subsamples of ordered data |
| `durbin_watson_test` | Durbin-Watson | First-order autocorrelation |
| `variance_inflation_factor` | VIF | Multicollinearity diagnosis |
| `levene_test` | Levene's test | Homogeneity of variance |
| `bartlett_test` | Bartlett's test | Homogeneity (normal assumption) |
| `residual_normality_test` | Residual normality | Regression assumption check |

**Example:**
```python
from pywayne.statistics import ModelDiagnostics

md = ModelDiagnostics()
# Assumes y, X, and a fitted model exposing .predict() already exist
residuals = y - model.predict(X)

# Check assumptions
bp_result = md.breusch_pagan_test(residuals, X)
dw_result = md.durbin_watson_test(residuals)

if bp_result.reject_null:
    print("Warning: Heteroscedasticity detected")
```

## TestResult Object

All test methods return a unified `TestResult` object:

```python
result = nt.shapiro_wilk(data)

# Access results
result.test_name        # Test method name
result.statistic        # Test statistic value
result.p_value          # P-value
result.reject_null      # True if null hypothesis is rejected
result.critical_value   # Critical value (if applicable)
result.confidence_interval # Tuple (lower, upper) if applicable
result.effect_size      # Effect size if applicable
result.additional_info  # Dict with additional information
```
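
Because every method exposes the same attributes, reporting code can be written once and reused across tests. The sketch below shows the idea with a hypothetical stand-in class, `SketchTestResult`, that mirrors the attribute names listed above; it is not pywayne's actual class, and the assumption that inapplicable fields are `None` is mine. The helper `summarize` is likewise hypothetical and works on any object with these attributes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

@dataclass
class SketchTestResult:
    """Hypothetical stand-in mirroring the attributes listed above."""
    test_name: str
    statistic: float
    p_value: float
    reject_null: bool
    critical_value: Optional[float] = None
    confidence_interval: Optional[Tuple[float, float]] = None
    effect_size: Optional[float] = None
    additional_info: Dict[str, Any] = field(default_factory=dict)

def summarize(result) -> str:
    """Format any object exposing the listed attributes as one report line."""
    parts = [
        f"{result.test_name}: statistic={result.statistic:.3f}, "
        f"p={result.p_value:.4f}"
    ]
    # Optional fields are included only when present (assumed None otherwise)
    if result.effect_size is not None:
        parts.append(f"effect size={result.effect_size:.3f}")
    if result.confidence_interval is not None:
        lower, upper = result.confidence_interval
        parts.append(f"CI=({lower:.3f}, {upper:.3f})")
    parts.append("reject H0" if result.reject_null else "fail to reject H0")
    return ", ".join(parts)

demo = SketchTestResult("demo", 2.5, 0.0123, True, effect_size=0.4)
print(summarize(demo))
```

Check how the real `TestResult` represents missing values before relying on the `None` checks.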

## Utility Functions

### `list_all_tests()`

List all available test methods across all modules.

```python
from pywayne.statistics import list_all_tests
print(list_all_tests())
```

### `show_test_usage(method_name)`

Display usage and documentation for a specific test.

```python
from pywayne.statistics import show_test_usage
show_test_usage('shapiro_wilk')
```

## Method Selection Guide

### Normality Tests

| Sample Size | Recommended Method |
|-------------|-------------------|
| n < 30 | Shapiro-Wilk |
| 30 ≤ n ≤ 300 | Shapiro-Wilk, D'Agostino-Pearson |
| n > 300 | Jarque-Bera, Kolmogorov-Smirnov |

### Location Tests

| Condition | Parametric | Non-parametric |
|-----------|-------------|----------------|
| Normal data | t-test, ANOVA | - |
| Non-normal data | - | Mann-Whitney U, Kruskal-Wallis |
| Paired data | Paired t-test | Wilcoxon signed-rank |
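
The two tables in this guide can be encoded as small lookup helpers, which is convenient when test selection has to happen inside an automated pipeline. This is a minimal sketch: the function names `choose_normality_tests` and `choose_location_test` are hypothetical and not part of pywayne, and the returned strings are the method names from the category tables earlier in this document.

```python
def choose_normality_tests(n):
    """Return candidate NormalityTests method names for a sample of size n."""
    if n < 30:
        return ["shapiro_wilk"]
    if n <= 300:
        return ["shapiro_wilk", "dagostino_pearson"]
    return ["jarque_bera", "ks_test_normal"]

def choose_location_test(is_normal, paired=False, n_groups=2):
    """Return a LocationTests method name following the table above."""
    if paired:
        if n_groups != 2:
            raise ValueError("paired tests compare exactly two measurements")
        return "paired_ttest" if is_normal else "wilcoxon_signed_rank"
    if n_groups >= 3:
        return "one_way_anova" if is_normal else "kruskal_wallis"
    return "two_sample_ttest" if is_normal else "mann_whitney_u"

print(choose_normality_tests(150))
print(choose_location_test(is_normal=False, n_groups=3))
```

The returned name can then be looked up on the test object, for example with `getattr(lt, name)`, assuming the method names match the tables.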

## Multiple Testing Correction

When performing multiple tests, apply p-value correction:

```python
from statsmodels.stats.multitest import multipletests

# results: list of TestResult objects collected from earlier tests
p_values = [r.p_value for r in results]
rejected, p_corrected, _, _ = multipletests(
    p_values, alpha=0.05, method='fdr_bh'
)
```
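
To make clear what `method='fdr_bh'` does, here is a minimal standard-library sketch of the Benjamini-Hochberg step-up procedure. The function name `benjamini_hochberg` is illustrative; in practice the `multipletests` call above is the better choice. Each sorted p-value is scaled by `m / rank`, and a running minimum taken from the largest p-value downward keeps the adjusted values monotone.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (rejected flags, BH-adjusted p-values) in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    rejected = [p <= alpha for p in adjusted]
    return rejected, adjusted

rejected, adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(rejected)
print([round(p, 4) for p in adjusted])
```

In this example three of the raw p-values are below 0.05, but only the smallest one survives the correction.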

## Common Applications

### Data Quality Check

```python
import numpy as np
from pywayne.statistics import NormalityTests

def data_quality_check(data):
    data = np.asarray(data)  # lists do not support the boolean indexing below
    nt = NormalityTests()

    normality = nt.shapiro_wilk(data)

    # Outlier detection (IQR rule: beyond 1.5 * IQR from the quartiles)
    Q1, Q3 = np.percentile(data, [25, 75])
    IQR = Q3 - Q1
    outliers = data[(data < Q1 - 1.5*IQR) | (data > Q3 + 1.5*IQR)]

    return {
        'size': len(data),
        'is_normal': not normality.reject_null,
        'p_value': normality.p_value,
        'outliers': len(outliers)
    }
```

### A/B Testing Workflow

```python
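from pywayne.statistics import LocationTests, NormalityTests

def ab_test_analysis(control, treatment):
    nt = NormalityTests()
    lt = LocationTests()

    # Check normality on the first 100 observations of each group;
    # shuffle beforehand if the row order is not random
    norm_c = nt.shapiro_wilk(control[:100])
    norm_t = nt.shapiro_wilk(treatment[:100])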

    # Choose appropriate test
    if norm_c.p_value > 0.05 and norm_t.p_value > 0.05:
        result = lt.two_sample_ttest(control, treatment)
    else:
        result = lt.mann_whitney_u(control, treatment)

    return {
        'test_used': result.test_name,
        'p_value': result.p_value,
        'significant': result.reject_null,
        'effect_size': result.effect_size
    }
```
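
The workflow above returns `result.effect_size`, but the source does not state which effect size measure each test reports. For a two-group mean comparison, Cohen's d with a pooled standard deviation is a common choice, and the sketch below computes it with the standard library so the magnitude can be cross-checked. The function name `cohens_d` is illustrative and not taken from pywayne.

```python
import math
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled sample variance."""
    n_a, n_b = len(a), len(b)
    # statistics.variance is the unbiased (n - 1) sample variance
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

treatment = [2, 3, 4, 5, 6]
control = [1, 2, 3, 4, 5]
print(f"Cohen's d: {cohens_d(treatment, control):.3f}")
```

A positive value means the first sample has the larger mean; the sign flips if the arguments are swapped.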

### Regression Model Diagnostics

```python
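from pywayne.statistics import ModelDiagnostics

def diagnose_model(y, X, model):
    md = ModelDiagnostics()
    residuals = y - model.predict(X)

    return {
        # True when the Breusch-Pagan test rejects constant variance
        'heteroscedasticity_bp': md.breusch_pagan_test(residuals, X).reject_null,
        # Durbin-Watson statistic itself, not a True/False flag
        'autocorrelation_dw': md.durbin_watson_test(residuals).statistic,
        # p-value of the residual normality test
        'residuals_normal': md.residual_normality_test(residuals).p_value,
        # Assumes one VIF value per column of X is returned
        'vif_max': max(md.variance_inflation_factor(X))
    }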
```
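
The `autocorrelation_dw` entry above is a raw statistic, so it helps to know its scale. The Durbin-Watson statistic is the sum of squared successive residual differences divided by the sum of squared residuals; it ranges from 0 to 4, with values near 2 suggesting no first-order autocorrelation, values toward 0 positive autocorrelation, and values toward 4 negative autocorrelation. The sketch below computes it by hand; the function name `durbin_watson` is illustrative and independent of pywayne.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Alternating signs: strong negative autocorrelation, statistic above 2
print(durbin_watson([1, -1, 1, -1]))
# Constant residuals: strong positive autocorrelation, statistic of 0
print(durbin_watson([1, 1, 1, 1]))
```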

## Notes

- All methods accept `np.ndarray` or list as input
- All methods return `TestResult` with consistent interface
- Always validate test assumptions before applying parametric tests
- Apply multiple testing correction when performing several tests
- Report effect sizes alongside p-values for complete interpretation

## Overview

This skill is a comprehensive statistical testing library implementing 37+ methods for normality, location, correlation, time series, and regression diagnostics. It returns unified TestResult objects with consistent fields (p-value, statistic, confidence interval, effect size, reject_null) to simplify pipelines. Use it for hypothesis testing, A/B testing, data quality checks, time series analysis, and model validation.

## How this skill works

Each test method accepts numpy arrays or Python lists and runs the chosen statistical procedure, returning a TestResult object with standardized attributes (test_name, statistic, p_value, reject_null, critical_value, confidence_interval, effect_size, additional_info). Utility functions list_all_tests() and show_test_usage() help discover and inspect methods. Multiple-testing workflows are supported by returning p-values that integrate easily with standard correction routines.

## When to use it

- Perform normality checks before parametric tests or modeling
- Compare group means or medians in experiments and A/B tests
- Assess correlation and independence between variables
- Evaluate time series properties: stationarity, autocorrelation, cointegration
- Run regression diagnostics for heteroscedasticity, autocorrelation, multicollinearity

## Best practices

- Always validate assumptions (normality, equal variance, independence) before choosing parametric tests
- Check residuals with ModelDiagnostics after fitting regression models
- Report effect sizes in addition to p-values for practical interpretation
- Apply p-value correction when running multiple tests (e.g., FDR, Bonferroni)
- Use appropriate sample-size-specific normality tests (Shapiro-Wilk for small samples, Jarque-Bera or KS for large)

## Example use cases

- A/B testing: check normality on samples then run two-sample t-test or Mann-Whitney U accordingly
- Data quality pipeline: run normality, compute outliers (IQR), and summarize TestResult outputs
- Time series workflow: run ADF and KPSS to assess stationarity, then Ljung-Box for autocorrelation
- Regression validation: use Breusch-Pagan, Durbin-Watson, and VIF to detect assumption violations
- Exploratory analysis: compute Pearson/Spearman/Kendall correlations and chi-square independence tests

## FAQ

### What inputs do test methods accept?

All methods accept `numpy.ndarray` or Python list inputs; they convert data internally and return a `TestResult` object.

### How do I handle multiple test p-values?

Collect `p_value` from each `TestResult` and apply a correction such as `statsmodels.stats.multitest.multipletests` with `method='fdr_bh'` or `method='bonferroni'`.