This skill guides statistical analysis with hypothesis tests, regression, ANOVA, and plain-English interpretations and visuals to inform data-driven decisions.
To add this skill to your agents, run `npx playbooks add skill dkyazzentwatwa/chatgpt-skills --skill statistical-analyzer`.
---
name: statistical-analyzer
description: Perform statistical hypothesis testing, regression analysis, ANOVA, and t-tests with plain-English interpretations and visualizations.
---
# Statistical Analyzer
Guided statistical analysis with hypothesis testing, regression, ANOVA, and plain-English results.
## Features
- **Hypothesis Testing**: t-tests, chi-square, proportion tests
- **Regression Analysis**: Linear, polynomial, multiple regression
- **ANOVA**: One-way, two-way ANOVA with post-hoc tests
- **Distribution Analysis**: Normality tests, Q-Q plots
- **Correlation Analysis**: Pearson, Spearman with significance
- **Plain-English Results**: Interpret statistical outputs
- **Visualizations**: Regression plots, residual analysis, box plots
- **Report Generation**: PDF/HTML reports with interpretations
## Quick Start
```python
import pandas as pd
from statistical_analyzer import StatisticalAnalyzer

df = pd.read_csv('data.csv')  # any DataFrame with the columns used below
analyzer = StatisticalAnalyzer()
# T-test
analyzer.load_data(df, group_col='treatment', value_col='score')
results = analyzer.t_test(group1='control', group2='experimental')
print(results['interpretation'])
# Regression
analyzer.load_data(df)
results = analyzer.linear_regression(x='age', y='income')
print(f"R²: {results['r_squared']}")
analyzer.plot_regression('regression.png')
```
## CLI Usage
```bash
# T-test
python statistical_analyzer.py --data data.csv --test t-test --group treatment --value score --output results.html
# ANOVA
python statistical_analyzer.py --data data.csv --test anova --group category --value score --output results.pdf
# Regression
python statistical_analyzer.py --data data.csv --test regression --x age --y income --output report.pdf
# Correlation matrix
python statistical_analyzer.py --data data.csv --test correlation --output correlation.png
```
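The flags used in the commands above could be parsed along the following lines. This is only a sketch built on the standard-library `argparse` module: the flag names come from the examples, but which flags are required, the `choices` list, and the `validate` helper are assumptions rather than the script's actual behavior.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the CLI examples."""
    parser = argparse.ArgumentParser(prog="statistical_analyzer.py")
    parser.add_argument("--data", required=True, help="Path to the input CSV file")
    parser.add_argument(
        "--test",
        required=True,
        choices=["t-test", "anova", "regression", "correlation"],
        help="Which analysis to run",
    )
    parser.add_argument("--group", help="Grouping column (t-test, anova)")
    parser.add_argument("--value", help="Value column (t-test, anova)")
    parser.add_argument("--x", help="Predictor column (regression)")
    parser.add_argument("--y", help="Response column (regression)")
    parser.add_argument("--output", required=True, help="Output file path")
    return parser


def validate(args: argparse.Namespace) -> None:
    """Hypothetical helper: reject incomplete flag combinations."""
    if args.test in ("t-test", "anova") and not (args.group and args.value):
        raise SystemExit("--group and --value are required for t-test and anova")
    if args.test == "regression" and not (args.x and args.y):
        raise SystemExit("--x and --y are required for regression")


# Parse the first example command from an explicit argument list
args = build_parser().parse_args(
    ["--data", "data.csv", "--test", "t-test",
     "--group", "treatment", "--value", "score", "--output", "results.html"]
)
validate(args)
print(args.test, args.group, args.value, args.output)
```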
## API Reference
### StatisticalAnalyzer Class
```python
from typing import Dict, List

import pandas as pd


class StatisticalAnalyzer:
    def __init__(self): ...

    # Data Loading
    def load_data(self, data, **kwargs) -> 'StatisticalAnalyzer': ...
    def load_csv(self, filepath, **kwargs) -> 'StatisticalAnalyzer': ...

    # Hypothesis Tests
    def t_test(self, group1, group2, paired=False, alternative='two-sided') -> Dict: ...
    def one_sample_t_test(self, column, expected_mean, alternative='two-sided') -> Dict: ...
    def anova(self, groups, value_col) -> Dict: ...
    def chi_square(self, observed, expected=None) -> Dict: ...
    def proportion_test(self, successes, total, expected_prop=0.5) -> Dict: ...

    # Regression
    def linear_regression(self, x, y) -> Dict: ...
    def polynomial_regression(self, x, y, degree=2) -> Dict: ...
    def multiple_regression(self, predictors: List[str], target: str) -> Dict: ...

    # Correlation
    def correlation(self, method='pearson') -> pd.DataFrame: ...  # Correlation matrix
    def correlation_test(self, var1, var2, method='pearson') -> Dict: ...

    # Distribution Tests
    def normality_test(self, column, method='shapiro') -> Dict: ...
    def qq_plot(self, column, output=None) -> str: ...

    # Visualization
    def plot_regression(self, output, x=None, y=None) -> str: ...
    def plot_residuals(self, output) -> str: ...
    def plot_distribution(self, column, output) -> str: ...
    def plot_boxplot(self, groups, value_col, output) -> str: ...

    # Reporting
    def generate_report(self, output, format='pdf') -> str: ...
    def summary(self) -> str: ...
```
## Tests
### T-Test
Compare means between two groups:
```python
analyzer.load_csv('data.csv')
# Independent samples
results = analyzer.t_test(
    group1='control',
    group2='treatment',
    paired=False
)
# Results
print(results)
# {
#     'statistic': -2.45,
#     'p_value': 0.018,
#     'mean_diff': -5.2,
#     'ci': (-9.5, -0.9),
#     'interpretation': 'The difference is statistically significant (p=0.018)...'
# }
# Paired samples (before/after)
results = analyzer.t_test(
    group1='before',
    group2='after',
    paired=True
)
```
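For readers who want to see where the fields in the result dictionary could come from, here is a minimal sketch using SciPy directly. It assumes Welch's unequal-variance test for independent samples and a 95% confidence interval; the skill's own defaults are not documented here, and `t_test_summary` and the sample data are hypothetical.

```python
import numpy as np
from scipy import stats


def t_test_summary(a, b, paired=False, alpha=0.05):
    """Return statistic, p-value, mean difference, and CI for two samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if paired:
        diff = a - b
        res = stats.ttest_rel(a, b)
        mean_diff = diff.mean()
        se = diff.std(ddof=1) / np.sqrt(len(diff))
        df = len(diff) - 1
    else:
        # Welch's test: does not assume equal variances (an assumption of this sketch)
        res = stats.ttest_ind(a, b, equal_var=False)
        mean_diff = a.mean() - b.mean()
        va = a.var(ddof=1) / len(a)
        vb = b.var(ddof=1) / len(b)
        se = np.sqrt(va + vb)
        # Welch-Satterthwaite degrees of freedom
        df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Two-sided confidence interval for the mean difference
    half_width = stats.t.ppf(1 - alpha / 2, df) * se
    return {
        "statistic": float(res.statistic),
        "p_value": float(res.pvalue),
        "mean_diff": float(mean_diff),
        "ci": (float(mean_diff - half_width), float(mean_diff + half_width)),
    }


# Made-up illustration data
control = [72, 75, 68, 80, 77, 74, 69, 73]
treatment = [78, 82, 80, 85, 79, 84, 77, 81]
result = t_test_summary(control, treatment)
print(result)
```

A confidence interval that excludes zero agrees with a p-value below `alpha`, which is a quick consistency check on any reported result.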
### ANOVA
Compare means across multiple groups:
```python
results = analyzer.anova(
    groups=['control', 'treatment_a', 'treatment_b'],
    value_col='score'
)
# Results include post-hoc tests
print(results['interpretation'])
# "There is a statistically significant difference between groups (p<0.001).
# Post-hoc tests show treatment_a differs from control (p=0.003)..."
```
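The omnibus-then-post-hoc flow described above can be sketched with SciPy as follows. Tukey's HSD is used here as the post-hoc procedure, which is an assumption: the skill does not state which post-hoc test it runs. The data is made up for illustration.

```python
from scipy import stats

# Made-up scores for three groups
groups = {
    "control": [70, 72, 68, 75, 71, 69],
    "treatment_a": [78, 82, 80, 79, 85, 81],
    "treatment_b": [73, 70, 76, 74, 72, 75],
}

# Omnibus one-way ANOVA: tests whether all group means are equal
f_stat, p_value = stats.f_oneway(*groups.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")

# Pairwise comparisons are only interpreted after a significant omnibus test
if p_value < 0.05:
    tukey = stats.tukey_hsd(*groups.values())
    names = list(groups)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            print(f"{names[i]} vs {names[j]}: p = {tukey.pvalue[i, j]:.4f}")
```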
### Regression Analysis
```python
# Simple linear regression
results = analyzer.linear_regression(x='hours_studied', y='exam_score')
print(f"R² = {results['r_squared']:.3f}")
print(f"Equation: y = {results['slope']:.2f}x + {results['intercept']:.2f}")
print(f"p-value: {results['p_value']:.4f}")
# Polynomial regression
results = analyzer.polynomial_regression(x='age', y='salary', degree=2)
# Multiple regression
results = analyzer.multiple_regression(
    predictors=['age', 'experience', 'education'],
    target='salary'
)
```
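As a rough illustration of how the `slope`, `intercept`, `r_squared`, and `p_value` fields relate to one another, the sketch below fits a simple linear regression with `scipy.stats.linregress`. Whether the skill uses this function internally is an assumption, and the data is invented.

```python
from scipy import stats

# Made-up study data
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
exam_score = [52, 55, 61, 64, 70, 74, 77, 83]

fit = stats.linregress(hours_studied, exam_score)
results = {
    "slope": fit.slope,
    "intercept": fit.intercept,
    "r_squared": fit.rvalue ** 2,  # share of variance in y explained by x
    "p_value": fit.pvalue,         # two-sided test of the null hypothesis slope == 0
}
print(f"R² = {results['r_squared']:.3f}")
print(f"Equation: y = {results['slope']:.2f}x + {results['intercept']:.2f}")
print(f"p-value: {results['p_value']:.4g}")
```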
### Correlation Analysis
```python
# Full correlation matrix
corr_matrix = analyzer.correlation(method='pearson')
print(corr_matrix)
# Test specific correlation
results = analyzer.correlation_test('height', 'weight', method='pearson')
print(results['interpretation'])
# "There is a strong positive correlation (r=0.82, p<0.001)"
```
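A correlation test of this kind might be put together as below. The strength thresholds follow the Interpretation Guide in this document; the `describe_correlation` helper, its exact wording, and the sample data are hypothetical.

```python
from scipy import stats


def describe_correlation(r: float, p: float) -> str:
    """Turn a correlation coefficient and p-value into a short sentence."""
    size = abs(r)
    if size > 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r >= 0 else "negative"
    p_text = "p<0.001" if p < 0.001 else f"p={p:.3f}"
    return f"There is a {strength} {direction} correlation (r={r:.2f}, {p_text})"


# Made-up measurements
height = [150, 160, 165, 170, 175, 180, 185, 190]
weight = [52, 58, 63, 66, 72, 75, 83, 88]

r, p = stats.pearsonr(height, weight)       # linear association
rho, p_rank = stats.spearmanr(height, weight)  # rank-based (monotonic) association
print(describe_correlation(r, p))
print(f"Spearman rho = {rho:.2f}")
```

Spearman's coefficient depends only on ranks, so it is less sensitive to outliers and to nonlinear but monotonic relationships than Pearson's.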
### Distribution Tests
```python
# Test normality
results = analyzer.normality_test('scores', method='shapiro')
# Returns: {'statistic': 0.98, 'p_value': 0.35,
# 'interpretation': 'Data appears normally distributed (p=0.35)'}
# Q-Q plot
analyzer.qq_plot('scores', output='qq_plot.png')
```
## Interpretation Guide
The analyzer provides plain-English interpretations:
### Significance Levels
- **p < 0.001**: "Highly significant"
- **p < 0.01**: "Very significant"
- **p < 0.05**: "Statistically significant"
- **p ≥ 0.05**: "Not statistically significant"
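The tiers above amount to a simple threshold lookup. A minimal sketch, with `significance_label` as a hypothetical name rather than a function of the skill:

```python
def significance_label(p_value: float) -> str:
    """Map a p-value to the label used in the significance tiers."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p_value < 0.001:
        return "Highly significant"
    if p_value < 0.01:
        return "Very significant"
    if p_value < 0.05:
        return "Statistically significant"
    return "Not statistically significant"


for p in (0.0005, 0.004, 0.018, 0.05, 0.35):
    print(f"p={p}: {significance_label(p)}")
```

Because each comparison is strict, a p-value of exactly 0.05 falls into the "Not statistically significant" tier, matching the list.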
### Effect Sizes
- **Cohen's d**: Small (0.2), Medium (0.5), Large (0.8)
- **R²**: Weak (<0.3), Moderate (0.3-0.7), Strong (>0.7)
- **Correlation (absolute value of r)**: Weak (<0.3), Moderate (0.3-0.7), Strong (>0.7)
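For reference, Cohen's d for two independent samples is the difference in means divided by the pooled standard deviation. The sketch below uses only the standard library; the "negligible" label for values below 0.2 is an assumption, since the list above does not name that range.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b) -> float:
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def effect_label(d: float) -> str:
    """Classify |d| with the conventional 0.2 / 0.5 / 0.8 cut-offs."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label assumed for this sketch


d = cohens_d([2, 4, 6, 8], [1, 3, 5, 7])
print(f"d = {d:.3f} ({effect_label(d)})")
```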
## Visualizations
### Regression Plot
```python
analyzer.linear_regression(x='age', y='income')
analyzer.plot_regression('regression.png')
# Creates scatter plot with regression line and confidence interval
```
### Residual Plot
```python
analyzer.plot_residuals('residuals.png')
# Checks regression assumptions (homoscedasticity)
```
### Box Plot
```python
analyzer.plot_boxplot(
    groups=['control', 'treatment_a', 'treatment_b'],
    value_col='score',
    output='boxplot.png'
)
```
### Distribution Plot
```python
analyzer.plot_distribution('scores', 'distribution.png')
# Histogram with normal curve overlay
```
## Reports
Generate comprehensive reports:
```python
analyzer.load_csv('data.csv')
analyzer.t_test(group1='control', group2='treatment')
analyzer.linear_regression(x='hours', y='score')
# PDF report with all analyses
analyzer.generate_report('analysis_report.pdf', format='pdf')
# HTML report
analyzer.generate_report('analysis_report.html', format='html')
```
Reports include:
- Summary statistics
- Test results with interpretations
- Visualizations
- Assumption checks
- Recommendations
## Assumptions Checking
The analyzer validates test assumptions automatically:
```python
# T-test checks:
# - Normality (Shapiro-Wilk)
# - Equal variances (Levene's test)
# Warnings if assumptions violated
# ANOVA checks:
# - Normality per group
# - Homogeneity of variances
# Suggests non-parametric alternatives
# Regression checks:
# - Linearity
# - Homoscedasticity
# - Normality of residuals
# - Independence (Durbin-Watson)
```
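The t-test checks listed above could look roughly like this when written against SciPy. It is a sketch, not the skill's implementation: `check_t_test_assumptions`, the 0.05 cut-off, and the warning wording are assumptions, and the data is invented.

```python
from scipy import stats


def check_t_test_assumptions(a, b, alpha=0.05):
    """Return a list of warning strings; an empty list means no check failed."""
    issues = []
    for name, sample in (("group 1", a), ("group 2", b)):
        # Shapiro-Wilk: the null hypothesis is that the sample is normally distributed
        _, p_norm = stats.shapiro(sample)
        if p_norm < alpha:
            issues.append(f"{name} may not be normally distributed (p={p_norm:.3g})")
    # Levene: the null hypothesis is that both groups have equal variances
    _, p_var = stats.levene(a, b)
    if p_var < alpha:
        issues.append(f"group variances may differ (p={p_var:.3g})")
    return issues


# Made-up samples: the first has one extreme outlier
skewed = [1, 1, 2, 1, 2, 1, 1, 2, 1, 60]
baseline = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3, 5.1, 4.9]
issues = check_t_test_assumptions(skewed, baseline)
for issue in issues:
    print("Warning:", issue)
```

A failed check does not invalidate a test on its own, particularly with large samples, so the warnings are best read as prompts to inspect the data.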
## Dependencies
- scipy>=1.10.0
- statsmodels>=0.14.0
- pandas>=2.0.0
- numpy>=1.24.0
- matplotlib>=3.7.0
- seaborn>=0.12.0
- reportlab>=4.0.0
## Overview
This skill performs guided statistical analysis including hypothesis testing, regression, ANOVA, correlation, and distribution checks, and returns plain-English interpretations and visualizations. It automates assumptions checks, generates plots, and can produce PDF/HTML reports summarizing results. The focus is on actionable outputs: test statistics, confidence intervals, effect sizes, and clear recommendations.
## How It Works
Load a DataFrame or CSV and call the desired method (t_test, anova, linear_regression, correlation, etc.). Each routine runs appropriate assumption checks (normality, homoscedasticity, independence), computes test statistics and effect sizes, and assembles a plain-English interpretation. Visualization helpers create regression plots, residual diagnostics, boxplots, Q–Q plots, and distribution overlays. A report generator compiles results and figures into PDF or HTML outputs.
## FAQ
### What outputs will I get from a t-test?
You receive the test statistic, p-value, mean difference, confidence interval, assumption checks, effect size, and a plain-English interpretation.
### What if assumptions are violated?
The analyzer issues warnings, suggests nonparametric alternatives, and can run alternate tests (e.g., Mann–Whitney, Kruskal–Wallis) or transform data as appropriate.
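One way such a fallback might work for two groups is sketched below: run Welch's t-test when both samples pass a Shapiro-Wilk check, otherwise switch to the Mann-Whitney U test. The selection rule, the `compare_two_groups` name, and the data are assumptions for illustration; for three or more groups, `scipy.stats.kruskal` would play the same role relative to one-way ANOVA.

```python
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Pick a parametric or rank-based test from a normality pre-check."""
    looks_normal = all(stats.shapiro(sample).pvalue >= alpha for sample in (a, b))
    if looks_normal:
        stat, p = stats.ttest_ind(a, b, equal_var=False)
        test_name = "welch_t"
    else:
        # Rank-based alternative that does not assume normality
        stat, p = stats.mannwhitneyu(a, b, alternative="two-sided")
        test_name = "mann_whitney_u"
    return {"test": test_name, "statistic": float(stat), "p_value": float(p)}


# Made-up samples: the first has one extreme outlier
skewed = [1, 1, 2, 1, 2, 1, 1, 2, 1, 60]
baseline = [3, 4, 5, 4, 3, 5, 4, 6, 5, 4]
outcome = compare_two_groups(skewed, baseline)
print(outcome)
```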