home / skills / amnadtaowsoam / cerebraskills / model-explainability
This skill helps you understand and explain how machine learning models decide, boosting trust and actionable feedback for stakeholders.
npx playbooks add skill amnadtaowsoam/cerebraskills --skill model-explainabilityReview the files below or copy the command above to add this skill to your agents.
---
name: Model Explainability and Interpretability
description: Techniques and tools for understanding how machine learning models make decisions and explaining those decisions to stakeholders.
---
# Model Explainability (XAI)
## Overview
Model Explainability (XAI) is the set of processes and methods that allow human users to comprehend and trust the results and output created by machine learning algorithms. It bridges the gap between high-performing "Black Box" models and the human need for transparency.
**Core Principle**: "Accuracy without explainability is a liability in high-stakes decisions."
---
## 1. Explainability vs. Interpretability
* **Interpretability**: The degree to which a human can observe the cause of a decision (e.g., "The model rejected the loan because the credit score was < 600").
* **Explainability**: Post-hoc techniques used to explain a model's logic after a decision is made (e.g., "Even though this is a complex neural network, we can see it focused on the customer's high debt-to-income ratio").
---
## 2. Techniques for Explainability
### A. Global Interpretability (The "Big Picture")
Understanding the model's logic across the entire dataset.
* **Feature Importance**: Ranking features by how much they contribute to the model's accuracy.
* **Permutation Importance**: Measuring how much the loss increases when a feature's values are randomly shuffled.
### B. Local Interpretability (The "Single Decision")
Explaining why a specific prediction was made for a specific user.
* **LIME (Local Interpretable Model-agnostic Explanations)**: Training a simple linear surrogate model around a single prediction to approximate the complex model's behavior.
* **SHAP (SHapley Additive exPlanations)**: Based on game theory; assigns each feature an "additive" value (Shapley value) showing its contribution to the final prediction.
---
## 3. Implementation with SHAP (Python)
SHAP is the current gold standard for local and global explanation.
```python
import shap
import xgboost
# 1. Train model
model = xgboost.XGBRegressor().fit(X, y)
# 2. Create explainer
explainer = shap.Explainer(model)
shap_values = explainer(X)
# 3. Visualize a single prediction (Local)
shap.plots.waterfall(shap_values[0])
# 4. Visualize global importance (Global)
shap.plots.bar(shap_values)
```
---
## 4. Explaining Deep Learning (Vision & NLP)
Traditional feature importance doesn't work for pixels or vectors.
| Technique | Focus | Use Case |
| :--- | :--- | :--- |
| **Attention Maps** | NLP (Transformers) | Highlight which words the model looked at to translate a sentence. |
| **Grad-CAM** | Vision (CNNs) | Generates a "Heatmap" showing which part of an image led to a classification. |
| **Integrated Gradients**| All Deep Learning | Attributes the output to the input features by computing the gradient. |
---
## 5. Counterfactual Explanations ("What-If")
A counterfactual explanation tells a user: *"If you change Feature X to Value Y, the result would have been Z."*
* **Example**: "You were denied a loan. If your annual income was $5,000 higher, your loan would have been approved."
* **Importance**: This provides actionable feedback to users and helps identify hidden thresholds or bias in the model.
---
## 6. Explainability for LLMs and Generative AI
LLM explainability is challenging due to the trillions of parameters.
1. **Chain of Thought (CoT)**: Forcing the model to "show its work" by outputting its reasoning steps before the final answer.
2. **Attribution Tools**: Identifying which part of the training data or RAG context was used for a specific claim.
3. **Visualization Tools**: Using tools like **Captum** to identify which tokens in a prompt were most influential.
---
## 7. Strategic Importance of XAI
1. **Trust**: Stakeholders (doctors, pilots, judges) won't use AI they don't understand.
2. **Model Debugging**: Identifying "Shortcut Learning" (e.g., a model that identifies a "Dog" by looking at the "Grass" in the background).
3. **Regulatory Compliance**: The "Right to Explanation" in GDPR requires meaningful information about the logic involved in automated decisions.
4. **Bias Detection**: If "Zip Code" is the most important feature, the model might be a proxy for racial bias.
---
## 8. Tools for Model Explainability
1. **SHAP Library**: The most widely used approach for tabular data.
2. **LIME Library**: Fast and flexible for local explanations.
3. **Captum (PyTorch)**: Comprehensive tool for deep learning interpretability.
4. **Google What-If Tool**: Interactive dashboard for exploring model behavior.
5. **InterpretML**: Microsoft library for glassbox models (EBMs).
---
## 9. Model Explainability Checklist
- [ ] **Methodology**: Have we chosen the right technique (SHAP vs. LIME) for our model type?
- [ ] **Stakes**: For high-stakes decisions, is the explanation human-readable (not just a chart)?
- [ ] **Debugging**: Have we used XAI to check for "spurious correlations" (learning from noise)?
- [ ] **Transparency**: Does the end-user have access to the "Reason Code" for their specific decision?
- [ ] **Attribution**: For LLM outputs, can we cite the source of the information?
- [ ] **Fairness**: Does SHAP show "Protected Attributes" (Gender/Race) as high-importance features?
---
## Related Skills
* `44-ai-governance/model-bias-fairness`
* `44-ai-governance/model-risk-management`
* `41-incident-management/incident-triage` (using XAI to debug production errors)
This skill covers practical techniques and tools for making machine learning models understandable and trustworthy. It focuses on both global and local explanation methods, counterfactual reasoning, and interpretability approaches for deep learning and large language models. The goal is to help teams explain decisions to stakeholders, debug models, and meet regulatory and fairness requirements.
The skill inspects model behavior using global techniques (feature importance, permutation importance) and local methods that explain individual predictions (SHAP, LIME). For deep learning it uses attribution and visualization tools (Grad-CAM, Integrated Gradients, attention maps). For LLMs it applies chain-of-thought prompting and attribution for context or training data. It produces human-readable reason codes, visualizations, and counterfactuals to make decisions actionable.
When should I use SHAP vs LIME?
Use SHAP for consistent, theoretically grounded additive attributions and global summaries; use LIME for fast, local surrogate explanations when runtime or simplicity is the priority.
Can XAI fix model bias automatically?
No. XAI reveals potential bias and proxies but remediation requires intervention (reweighting, feature removal, fairness-aware training).