
statistics skill


This skill helps you perform statistical analysis, from descriptive summaries to inferential methods, so that decisions rest on data and conclusions hold up to scrutiny.

npx playbooks add skill pluginagentmarketplace/custom-plugin-data-analyst --skill statistics

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: statistics
description: Statistical analysis methods, hypothesis testing, and probability for data analytics
version: "2.0.0"
sasmp_version: "2.0.0"
bonded_agent: 03-statistical-analysis-expert
bond_type: PRIMARY_BOND

# Skill Configuration
config:
  atomic: true
  retry_enabled: true
  max_retries: 3
  backoff_strategy: exponential
  numerical_precision: high

# Parameter Validation
parameters:
  skill_level:
    type: string
    required: true
    enum: [beginner, intermediate, advanced]
    default: beginner
  focus_area:
    type: string
    required: false
    enum: [descriptive, inferential, probability, regression, experiments, all]
    default: all
  tool_preference:
    type: string
    required: false
    enum: [python, r, excel, all]
    default: python

# Observability
observability:
  logging_level: info
  metrics: [calculation_accuracy, test_validity, model_fit]
---

# Statistics Skill

## Overview
Master statistical concepts and methods essential for data analysis, from descriptive statistics to advanced inferential techniques.

## Core Topics

### Descriptive Statistics
- Measures of central tendency (mean, median, mode)
- Measures of dispersion (variance, standard deviation, IQR)
- Data distributions and skewness
- Percentiles and quartiles
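
As a rough illustration of the measures listed above, the following sketch uses only Python's standard `statistics` module. The `data` values are hypothetical and chosen to include one outlier, so the gap between mean and median is visible.

```python
import statistics

# Hypothetical sample: order values with one large outlier (95).
data = [12, 15, 15, 18, 20, 22, 95]

mean = statistics.mean(data)        # pulled upward by the outlier
median = statistics.median(data)    # robust to the outlier
mode = statistics.mode(data)        # most frequent value
stdev = statistics.stdev(data)      # sample standard deviation (n - 1 denominator)

# Quartile cut points; statistics.quantiles defaults to the "exclusive" method.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"mean={mean:.2f} median={median} mode={mode} stdev={stdev:.2f} IQR={iqr}")
```

Here the mean (about 28.1) sits well above the median (18), a quick sign of right skew. Quartile values depend on the interpolation method, so other tools may report slightly different cut points for the same data.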

### Inferential Statistics
- Sampling methods and sample size determination
- Confidence intervals
- Hypothesis testing (t-tests, chi-square, ANOVA)
- P-values and statistical significance
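
A minimal sketch of a two-sample comparison, assuming two small hypothetical samples `a` and `b`. It computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom with the standard library only. The confidence interval uses a normal quantile as an approximation; converting the statistic to an exact p-value needs a t distribution, which the standard library does not provide (a statistics package would normally supply it).

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical samples: task completion times (seconds) for two page designs.
a = [31.2, 29.8, 33.1, 30.5, 32.0, 28.9, 31.7, 30.1]
b = [27.5, 28.8, 26.9, 29.4, 27.1, 28.0, 26.5, 28.3]

def welch_t(x, y):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = statistics.variance(x) / len(x)   # squared standard error of mean(x)
    vy = statistics.variance(y) / len(y)   # squared standard error of mean(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

t_stat, df = welch_t(a, b)

# Approximate 95% CI for the difference in means using a normal quantile.
# With samples this small, a t quantile would give a somewhat wider interval.
diff = statistics.mean(a) - statistics.mean(b)
se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
z = NormalDist().inv_cdf(0.975)
ci = (diff - z * se, diff + z * se)

print(f"t={t_stat:.2f} df={df:.1f} diff={diff:.2f} CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

Reporting the interval alongside the test statistic shows the size of the difference, not only whether it is distinguishable from zero.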

### Probability
- Basic probability rules
- Probability distributions (normal, binomial, Poisson)
- Bayes' theorem
- Expected value and variance
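
The sketch below works through two of the topics above: Bayes' theorem and the expected value and variance of a binomial distribution. The fraud-detection rates are hypothetical numbers chosen for illustration, and the function names are not part of this skill.

```python
from math import comb

def bayes_posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive result) via Bayes' theorem."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Hypothetical detector: 1% of transactions are fraud, 95% of fraud is flagged,
# and 5% of legitimate transactions are flagged by mistake.
posterior = bayes_posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3
# Expected value and variance computed directly from the definition.
expected = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
variance = sum((k - expected) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))

print(f"P(fraud | flagged) = {posterior:.3f}")
print(f"E[X] = {expected:.2f}, Var[X] = {variance:.2f}")
```

With a 1% base rate, only about 16% of flagged transactions are actually fraud, even though the detector catches 95% of fraud. The computed moments match the closed forms n·p = 3 and n·p·(1 − p) = 2.1.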

### Regression Analysis
- Linear regression
- Multiple regression
- Logistic regression
- Model validation and diagnostics
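
As one possible illustration of simple linear regression with a basic diagnostic, the sketch below fits a line by ordinary least squares and computes R². The data and the `fit_line` helper are hypothetical. Multiple and logistic regression would normally use a dedicated library and are not shown.

```python
import statistics

# Hypothetical data: ad spend (thousands) vs. weekly sign-ups.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [12.0, 15.5, 19.0, 24.5, 27.0, 31.0]

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx

slope, intercept = fit_line(x, y)
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# R^2: share of the variance in y explained by the fitted line.
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((yi - statistics.mean(y)) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.4f}")
```

A high R² alone does not validate a model. Plotting `residuals` against `x` is a simple next step to check for curvature or non-constant variance.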

## Learning Objectives
- Apply descriptive statistics to summarize data
- Conduct hypothesis tests for business decisions
- Build and interpret regression models
- Communicate statistical findings effectively

## Error Handling

| Error Type | Cause | Recovery |
|------------|-------|----------|
| Sample too small | Insufficient data | Collect more data, or use exact or permutation tests; treat bootstrap results from very small samples with caution |
| Assumption violated | Data doesn't fit test | Use non-parametric alternative |
| Multicollinearity | Correlated predictors | Remove or combine variables |
| Outliers | Extreme values | Investigate or use robust methods |
| Multiple comparisons (p-hacking risk) | Many tests on the same data inflate false positives | Apply a multiple-testing correction such as Bonferroni |
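
The Bonferroni correction named in the last row of the table can be sketched as follows. The `bonferroni` helper and the p-values are hypothetical. The correction multiplies each p-value by the number of tests (capped at 1) before comparing it to the significance level.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-value, reject?) pairs under the Bonferroni correction."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj <= alpha) for p_adj in adjusted]

# Hypothetical raw p-values from four tests on the same dataset.
p_values = [0.004, 0.020, 0.049, 0.300]
results = bonferroni(p_values)

for raw, (adj, reject) in zip(p_values, results):
    print(f"raw={raw:.3f} adjusted={adj:.3f} reject={reject}")
```

Three of the four raw p-values fall below 0.05, but only the first survives the correction. Bonferroni is conservative; less strict procedures exist when many tests are run.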

## Related Skills
- programming (for implementing statistical models)
- visualization (for presenting statistical insights)
- advanced (for machine learning)

Overview

This skill provides a practical toolkit for statistical analysis tailored to data analytics workflows. It covers descriptive statistics, probability, inferential tests, and regression methods so you can summarize data, test hypotheses, and build predictive models. The focus is on actionable techniques and common pitfalls to support robust, repeatable analysis.

How this skill works

The skill inspects datasets to compute core descriptive metrics (mean, median, variance, percentiles) and evaluates distribution shape and dispersion. It guides sampling considerations, constructs confidence intervals, and runs hypothesis tests (t-tests, chi-square, ANOVA) while reporting p-values and assumptions. For modeling, it fits linear, multiple, and logistic regressions, performs validation and diagnostics, and flags issues like multicollinearity and outliers.

When to use it

  • Summarize central tendency and spread before modeling or reporting
  • Determine sample size and evaluate sampling strategy for studies
  • Test hypotheses to support business or scientific decisions
  • Build and validate regression models for prediction or inference
  • Diagnose model problems like multicollinearity, heteroscedasticity, or influential points

Best practices

  • Always visualize distributions before choosing tests or models
  • Check and state test assumptions; use non-parametric alternatives when violated
  • Pre-register analysis plans or correct for multiple tests to avoid p-hacking
  • Perform diagnostics: residual analysis, VIF for multicollinearity, and leverage or Cook's distance for influential points
  • Prefer confidence intervals and effect sizes alongside p-values for interpretation
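
As a sketch of the VIF diagnostic mentioned above, the snippet below handles the special case of two predictors, where the variance inflation factor reduces to 1 / (1 − r²) with r the Pearson correlation between them. The general case regresses each predictor on all the others and is not shown. The spend series and helper names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical weekly spend series.
tv_spend = [10, 12, 15, 18, 20, 25]
radio_spend = [5, 6, 8, 9, 10, 13]    # moves almost in lockstep with TV
search_spend = [7, 3, 9, 4, 8, 5]     # largely unrelated to TV

vif_high = vif_two_predictors(tv_spend, radio_spend)
vif_low = vif_two_predictors(tv_spend, search_spend)

print(f"VIF(tv, radio)={vif_high:.1f}  VIF(tv, search)={vif_low:.3f}")
```

A VIF near 1 indicates little shared variance. Values above roughly 5 to 10 are a commonly used rule of thumb for problematic multicollinearity, though the threshold is a convention rather than a strict rule.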

Example use cases

  • Compare conversion rates between variants using a chi-square test or a two-proportion z-test
  • Estimate customer lifetime value distribution and summarize percentiles
  • Build a regression model to predict churn and validate with holdout or cross-validation
  • Calculate required sample size for A/B testing and plan enrollment
  • Detect and mitigate multicollinearity when adding correlated predictors to a marketing mix model
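
For the A/B sample-size use case, one common approximation for comparing two proportions is sketched below. It assumes a two-sided test, equal group sizes, and the normal approximation. The function name and the baseline and target rates are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size to detect p1 vs. p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for the test
    z_beta = NormalDist().inv_cdf(power)            # quantile for the desired power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)

# Hypothetical scenario: baseline conversion 10%, hoping to detect a lift to 12%.
n_per_variant = sample_size_two_proportions(0.10, 0.12)
print(f"Users needed per variant: {n_per_variant}")
```

Detecting a two-point lift from a 10% baseline takes a few thousand users per variant under these assumptions. Because the required size scales with the inverse square of the difference, halving the detectable lift roughly quadruples the sample.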

FAQ

What if my sample is too small for standard tests?

Increase sample size when possible, or use bootstrap methods and non-parametric tests that make fewer assumptions.
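
A minimal percentile-bootstrap sketch for the answer above, using only the standard library. The `bootstrap_ci` helper, its defaults, and the `sample` values are hypothetical. With very small samples the resulting interval can be too narrow, so it is best read as a rough guide.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.median, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    estimates = sorted(
        stat(rng.choices(data, k=len(data)))  # resample with replacement
        for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical small sample with one large value.
sample = [3.1, 4.7, 2.9, 5.6, 3.8, 4.2, 9.5, 3.3, 4.0, 4.4]
low, high = bootstrap_ci(sample)
print(f"95% bootstrap CI for the median: ({low}, {high})")
```

Passing a different `stat` (for example `statistics.mean`) reuses the same resampling loop for another statistic.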

How do I choose between parametric and non-parametric tests?

Inspect distribution shape and variance homogeneity; if assumptions are violated, use a suitable non-parametric alternative such as Mann-Whitney (two groups) or Kruskal-Wallis (three or more groups).
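
To make the non-parametric option concrete, here is a sketch of the Mann-Whitney U test using the normal approximation without tie or continuity corrections. The helper name and both samples are hypothetical. For samples this small, an exact test from a statistics library would usually be preferable.

```python
import math
from statistics import NormalDist

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation (no tie correction)."""
    # U for x: number of (x, y) pairs where x exceeds y; ties count as half.
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, p_value

# Hypothetical right-skewed samples, e.g. response times under two configurations.
x = [1.2, 3.4, 2.2, 8.9, 2.8, 15.0, 3.1, 2.5]
y = [4.1, 6.3, 5.2, 22.0, 7.7, 4.8, 9.4, 5.9]

u_stat, p_value = mann_whitney_u(x, y)
print(f"U={u_stat} p={p_value:.3f}")
```

Because the test depends only on the ordering of values, the extreme observations (15.0 and 22.0) influence the result far less than they would in a t-test.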