home / skills / amnadtaowsoam / cerebraskills / model-explainability

model-explainability skill

safe

This skill helps you understand and explain how machine learning models decide, boosting trust and actionable feedback for stakeholders.

npx playbooks add skill amnadtaowsoam/cerebraskills --skill model-explainability

Review the files below or copy the command above to add this skill to your agents.

Files (1)

SKILL.md

5.4 KB

---
name: Model Explainability and Interpretability
description: Techniques and tools for understanding how machine learning models make decisions and explaining those decisions to stakeholders.
---

# Model Explainability (XAI)

## Overview

Model Explainability (XAI) is the set of processes and methods that allow human users to comprehend and trust the results and output created by machine learning algorithms. It bridges the gap between high-performing "Black Box" models and the human need for transparency.

**Core Principle**: "Accuracy without explainability is a liability in high-stakes decisions."

---

## 1. Explainability vs. Interpretability

*   **Interpretability**: The degree to which a human can observe the cause of a decision (e.g., "The model rejected the loan because the credit score was < 600").
*   **Explainability**: Post-hoc techniques used to explain a model's logic after a decision is made (e.g., "Even though this is a complex neural network, we can see it focused on the customer's high debt-to-income ratio").

---

## 2. Techniques for Explainability

### A. Global Interpretability (The "Big Picture")
Understanding the model's logic across the entire dataset.
*   **Feature Importance**: Ranking features by how much they contribute to the model's accuracy.
*   **Permutation Importance**: Measuring how much the loss increases when a feature's values are randomly shuffled.

### B. Local Interpretability (The "Single Decision")
Explaining why a specific prediction was made for a specific user.
*   **LIME (Local Interpretable Model-agnostic Explanations)**: Training a simple linear surrogate model around a single prediction to approximate the complex model's behavior.
*   **SHAP (SHapley Additive exPlanations)**: Based on game theory; assigns each feature an "additive" value (Shapley value) showing its contribution to the final prediction.

---

## 3. Implementation with SHAP (Python)

SHAP is the current gold standard for local and global explanation.

```python
import shap
import xgboost

# 1. Train model
model = xgboost.XGBRegressor().fit(X, y)

# 2. Create explainer
explainer = shap.Explainer(model)
shap_values = explainer(X)

# 3. Visualize a single prediction (Local)
shap.plots.waterfall(shap_values[0])

# 4. Visualize global importance (Global)
shap.plots.bar(shap_values)
```

---

## 4. Explaining Deep Learning (Vision & NLP)

Traditional feature importance doesn't work for pixels or vectors.

| Technique | Focus | Use Case |
| :--- | :--- | :--- |
| **Attention Maps** | NLP (Transformers) | Highlight which words the model looked at to translate a sentence. |
| **Grad-CAM** | Vision (CNNs) | Generates a "Heatmap" showing which part of an image led to a classification. |
| **Integrated Gradients**| All Deep Learning | Attributes the output to the input features by computing the gradient. |

---

## 5. Counterfactual Explanations ("What-If")

A counterfactual explanation tells a user: *"If you change Feature X to Value Y, the result would have been Z."*

*   **Example**: "You were denied a loan. If your annual income was $5,000 higher, your loan would have been approved."
*   **Importance**: This provides actionable feedback to users and helps identify hidden thresholds or bias in the model.

---

## 6. Explainability for LLMs and Generative AI

LLM explainability is challenging due to the trillions of parameters.
1.  **Chain of Thought (CoT)**: Forcing the model to "show its work" by outputting its reasoning steps before the final answer.
2.  **Attribution Tools**: Identifying which part of the training data or RAG context was used for a specific claim.
3.  **Visualization Tools**: Using tools like **Captum** to identify which tokens in a prompt were most influential.

---

## 7. Strategic Importance of XAI

1.  **Trust**: Stakeholders (doctors, pilots, judges) won't use AI they don't understand.
2.  **Model Debugging**: Identifying "Shortcut Learning" (e.g., a model that identifies a "Dog" by looking at the "Grass" in the background).
3.  **Regulatory Compliance**: The "Right to Explanation" in GDPR requires meaningful information about the logic involved in automated decisions.
4.  **Bias Detection**: If "Zip Code" is the most important feature, the model might be a proxy for racial bias.

---

## 8. Tools for Model Explainability

1.  **SHAP Library**: The most widely used approach for tabular data.
2.  **LIME Library**: Fast and flexible for local explanations.
3.  **Captum (PyTorch)**: Comprehensive tool for deep learning interpretability.
4.  **Google What-If Tool**: Interactive dashboard for exploring model behavior.
5.  **InterpretML**: Microsoft library for glassbox models (EBMs).

---

## 9. Model Explainability Checklist

- [ ] **Methodology**: Have we chosen the right technique (SHAP vs. LIME) for our model type?
- [ ] **Stakes**: For high-stakes decisions, is the explanation human-readable (not just a chart)?
- [ ] **Debugging**: Have we used XAI to check for "spurious correlations" (learning from noise)?
- [ ] **Transparency**: Does the end-user have access to the "Reason Code" for their specific decision?
- [ ] **Attribution**: For LLM outputs, can we cite the source of the information?
- [ ] **Fairness**: Does SHAP show "Protected Attributes" (Gender/Race) as high-importance features?

---

## Related Skills
* `44-ai-governance/model-bias-fairness`
* `44-ai-governance/model-risk-management`
* `41-incident-management/incident-triage` (using XAI to debug production errors)

Overview

This skill covers practical techniques and tools for making machine learning models understandable and trustworthy. It focuses on both global and local explanation methods, counterfactual reasoning, and interpretability approaches for deep learning and large language models. The goal is to help teams explain decisions to stakeholders, debug models, and meet regulatory and fairness requirements.

How this skill works

The skill inspects model behavior using global techniques (feature importance, permutation importance) and local methods that explain individual predictions (SHAP, LIME). For deep learning it uses attribution and visualization tools (Grad-CAM, Integrated Gradients, attention maps). For LLMs it applies chain-of-thought prompting and attribution for context or training data. It produces human-readable reason codes, visualizations, and counterfactuals to make decisions actionable.

When to use it

When you need to justify high-stakes automated decisions to users or regulators
During model debugging to find spurious correlations or shortcut learning
When auditing models for bias or protected-attribute leakage
To generate actionable feedback for users via counterfactual explanations
When explaining deep learning outputs in vision, NLP, or generative systems

Best practices

Choose technique by model type: SHAP/LIME for tabular, Captum/Integrated Gradients for deep nets
Combine global and local explanations to get both overall behavior and single-decision detail
Provide short, human-readable reason codes alongside visual plots for stakeholders
Validate explanations against ground truth or known causal relationships where possible
Treat explanations as part of testing: include XAI checks in CI for model updates

Example use cases

Explaining a loan denial to a customer with a counterfactual: what minimal change would flip the decision
Using SHAP to surface top features driving churn predictions across a customer base
Applying Grad-CAM to show which image regions triggered a misclassification in a medical scan
Running attribution for an LLM response to identify which document or prompt tokens influenced a claim
Detecting proxy bias by discovering that zip code or other surrogates dominate feature importance

FAQ

When should I use SHAP vs LIME?

Use SHAP for consistent, theoretically grounded additive attributions and global summaries; use LIME for fast, local surrogate explanations when runtime or simplicity is the priority.

Can XAI fix model bias automatically?

No. XAI reveals potential bias and proxies but remediation requires intervention (reweighting, feature removal, fairness-aware training).