This skill quantifies each factor's contribution to outcome variance using R² decomposition, enabling clear prioritization of drivers.

npx playbooks add skill benchflow-ai/skillsbench --skill contribution-analysis

Review the files below or copy the command above to add this skill to your agents.

Files (1): SKILL.md (2.9 KB)
---
name: contribution-analysis
description: Calculate the relative contribution of different factors to a response variable using R² decomposition. Use when you need to quantify how much each factor explains the variance of an outcome.
license: MIT
---

# Contribution Analysis Guide

## Overview

Contribution analysis quantifies how much each factor contributes to explaining the variance of a response variable. This skill focuses on the R² decomposition method.

## Complete Workflow

When you have multiple correlated variables that belong to different categories:
```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from factor_analyzer import FactorAnalyzer

# Step 1: Combine ALL variables into one matrix
# (df is assumed to be a pandas DataFrame holding the predictors and the response)
pca_vars = ['Var1', 'Var2', 'Var3', 'Var4', 'Var5', 'Var6', 'Var7', 'Var8']
X = df[pca_vars].values
y = df['ResponseVariable'].values

# Step 2: Standardize
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 3: Fit ONE global factor analysis (varimax-rotated) on all variables together
fa = FactorAnalyzer(n_factors=4, rotation='varimax')
fa.fit(X_scaled)
scores = fa.transform(X_scaled)

# Step 4: R² decomposition on factor scores
def calc_r2(X, y):
    model = LinearRegression()
    model.fit(X, y)
    y_pred = model.predict(X)
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - (ss_res / ss_tot)

full_r2 = calc_r2(scores, y)

# Step 5: Calculate contribution of each factor
contrib_0 = full_r2 - calc_r2(scores[:, [1, 2, 3]], y)
contrib_1 = full_r2 - calc_r2(scores[:, [0, 2, 3]], y)
contrib_2 = full_r2 - calc_r2(scores[:, [0, 1, 3]], y)
contrib_3 = full_r2 - calc_r2(scores[:, [0, 1, 2]], y)
```

## R² Decomposition Method

The contribution of each factor is calculated by comparing the full model R² with the R² when that factor is removed:
```
Contribution_i = R²_full - R²_without_i
```
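
The hardcoded indices in the workflow above generalize to any number of factors. A minimal sketch of that generalization, reusing the `calc_r2` helper and the `scores` matrix defined earlier:

```python
# Drop-one-factor contributions for every factor column, assuming `scores`,
# `y`, and `calc_r2` from the workflow above.
def factor_contributions(scores, y):
    n_factors = scores.shape[1]
    full_r2 = calc_r2(scores, y)
    contribs = []
    for i in range(n_factors):
        keep = [j for j in range(n_factors) if j != i]
        contribs.append(full_r2 - calc_r2(scores[:, keep], y))
    return contribs
```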

## Output Format
```python
contributions = {
    'Category1': contrib_0 * 100,
    'Category2': contrib_1 * 100,
    'Category3': contrib_2 * 100,
    'Category4': contrib_3 * 100
}

dominant = max(contributions, key=contributions.get)
dominant_pct = round(contributions[dominant])

# Write only the dominant driver and its rounded percentage contribution
with open('output.csv', 'w') as f:
    f.write('variable,contribution\n')
    f.write(f'{dominant},{dominant_pct}\n')
```

## Common Issues

| Issue | Cause | Solution |
|-------|-------|----------|
| Negative contribution | Suppressor effect | Check for multicollinearity (see the sketch below) |
| Contributions don't sum to R² | Normal behavior | R² decomposition is approximate |
| Very small contributions | Factor not important | May be negligible driver |
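
If a contribution comes out negative, one quick diagnostic is to look for strongly correlated predictor pairs before the factor step. A minimal sketch, assuming `X_scaled` and `pca_vars` from the workflow above (the 0.8 threshold is only illustrative):

```python
# Quick multicollinearity check on the standardized predictors.
import numpy as np
import pandas as pd

corr = pd.DataFrame(np.corrcoef(X_scaled, rowvar=False),
                    index=pca_vars, columns=pca_vars)
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
strong_pairs = upper.stack()
print(strong_pairs[strong_pairs.abs() > 0.8])  # variable pairs with |r| > 0.8
```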

## Best Practices

- Run ONE global factor analysis (or PCA) on all variables together, not a separate model per category
- Use factor_analyzer with varimax rotation
- Map factors to category names based on loadings interpretation (see the sketch after this list)
- Report contribution as percentage
- Identify the dominant (largest) factor
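
Mapping factors to categories is a manual interpretation step based on the loading matrix. A minimal sketch, assuming the `fa` model and `pca_vars` list from the workflow above (the category labels are illustrative placeholders):

```python
# Inspect loadings to see which variables drive each factor, then label factors.
import pandas as pd

loadings = pd.DataFrame(
    fa.loadings_,
    index=pca_vars,
    columns=[f'Factor{i}' for i in range(fa.loadings_.shape[1])],
)
print(loadings.round(2))  # which variables load most strongly on each factor

# Name each factor after the category of its highest-loading variables (manual step)
factor_labels = {0: 'Category1', 1: 'Category2', 2: 'Category3', 3: 'Category4'}
```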

Overview

This skill calculates the relative contribution of different factors to a response variable using R² decomposition on factor scores. It combines all candidate variables into a single factor analysis, then quantifies how much each derived factor explains outcome variance. The result is a percentage contribution per factor and identification of the dominant driver.

How this skill works

All variables are standardized and a single global factor/PCA model is estimated to produce orthogonal factor scores. A linear regression of the outcome on all factor scores produces a full-model R². Each factor's contribution is computed as the drop in R² when that factor's score is omitted (R²_full - R²_without_i). Contributions are reported as percentages and the largest value is flagged as dominant.

When to use it

  • You need to quantify how groups of correlated variables explain variance in an outcome.
  • You want a simple, interpretable decomposition of explained variance by latent factors.
  • Variables are numerous and naturally group into categories but are correlated across groups.
  • You need a reproducible method to report the dominant driver of an outcome.
  • You prefer factor-based summaries (PCA/FA) rather than single-variable importance.

Best practices

  • Run one global PCA or factor analysis on all variables together, not separate per category.
  • Standardize predictors before factor analysis so loadings are comparable.
  • Use an orthogonal rotation (e.g., varimax) to aid interpretation of factor loadings.
  • Map factors to category names based on loadings, then report contributions as percentages.
  • Check for multicollinearity and suppressor effects if any contribution is negative.

Example use cases

  • Assess which behavioral, demographic, or environmental latent factor most explains customer churn.
  • Decompose drivers of test scores when many correlated cognitive and socio-economic variables exist.
  • Compare how product features, pricing, and marketing latent factors contribute to revenue variance.
  • Summarize contributions of physiological, lifestyle, and genetic factors to a health outcome.

FAQ

What if contributions don't sum to the full R²?

R² decomposition by subtraction is approximate; contributions need not sum exactly to R² due to overlap and model geometry.

Why could a contribution be negative?

Negative values can arise from suppressor effects or multicollinearity; inspect loadings and correlations and consider re-specifying factors.