home / skills / plurigrid / asi / causal-inference

causal-inference skill

/skills/causal-inference

This skill enables robust causal reasoning with interventional and counterfactual analysis to improve generalization and robustness in AI systems.

npx playbooks add skill plurigrid/asi --skill causal-inference

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
8.5 KB
---
name: causal-inference
description: "Bengio's causal inference for AI: Interventional reasoning, counterfactuals, and System 2 deep learning. World models with causal structure."
version: 1.0.0
---


# Causal Inference Skill

> *"Current deep learning is System 1: fast, intuitive, but easily fooled. We need System 2: slow, deliberate, causal."*
> — Yoshua Bengio

## Overview

**Causal inference** enables:
1. **Interventional reasoning**: What happens if I *do* X?
2. **Counterfactual reasoning**: What *would have* happened if...?
3. **Transfer**: Causal structure generalizes across domains
4. **Robustness**: Causal models resist distribution shift

## Pearl's Causal Hierarchy

```
Level 3: Counterfactual (Imagining)
  "What would have happened if I had done X?"
  P(y_x | x', y')
          ▲
Level 2: Intervention (Doing)
  "What happens if I do X?"
  P(y | do(X))
          ▲
Level 1: Association (Seeing)
  "What does X tell me about Y?"
  P(y | x)
```

## Structural Causal Models (SCM)

```python
class StructuralCausalModel:
    """
    SCM: Variables, causal graph, structural equations.
    """
    
    def __init__(self, variables: List[str], graph: DAG, equations: Dict):
        self.variables = variables
        self.graph = graph  # Directed Acyclic Graph
        self.equations = equations  # X_i = f_i(parents(X_i), U_i)
    
    def intervene(self, intervention: Dict[str, float]) -> "SCM":
        """
        do(X = x): Replace equation for X with constant.
        
        This breaks incoming edges to X.
        """
        new_equations = self.equations.copy()
        for var, value in intervention.items():
            new_equations[var] = lambda *_: value
        
        new_graph = self.graph.remove_edges_to(intervention.keys())
        
        return StructuralCausalModel(
            self.variables, new_graph, new_equations
        )
    
    def counterfactual(self, evidence: Dict, intervention: Dict) -> Dict:
        """
        Counterfactual: What would Y be if X had been x, given we observed evidence?
        
        Three steps:
        1. Abduction: Infer noise terms from evidence
        2. Action: Apply intervention
        3. Prediction: Compute counterfactual outcome
        """
        # Step 1: Abduction - infer noise terms U
        noise_terms = self.abduct_noise(evidence)
        
        # Step 2: Action - apply intervention
        intervened_scm = self.intervene(intervention)
        
        # Step 3: Prediction - forward propagate with inferred noise
        counterfactual_world = intervened_scm.forward(noise_terms)
        
        return counterfactual_world


class CausalDiscovery:
    """
    Learn causal structure from data.
    """
    
    def __init__(self, data: pd.DataFrame):
        self.data = data
        
    def pc_algorithm(self) -> DAG:
        """
        PC Algorithm: Constraint-based causal discovery.
        
        1. Start with complete undirected graph
        2. Remove edges based on conditional independence tests
        3. Orient edges using v-structures and rules
        """
        from causallearn.search.ConstraintBased.PC import pc
        
        result = pc(self.data.values)
        return result.G
    
    def gflownet_discovery(self) -> Distribution[DAG]:
        """
        Use GFlowNet to sample DAGs proportional to likelihood.
        
        This gives a DISTRIBUTION over causal graphs,
        properly accounting for uncertainty.
        """
        from gflownet import CausalDAGGFlowNet
        
        gfn = CausalDAGGFlowNet(n_variables=len(self.data.columns))
        gfn.train(reward=lambda g: self.bayesian_score(g))
        
        # Sample multiple DAGs
        dag_samples = [gfn.sample() for _ in range(1000)]
        return dag_samples
```

## System 2 Deep Learning

```python
class System2Network:
    """
    Bengio's vision: Combine System 1 (fast) with System 2 (slow).
    
    System 1: Neural pattern matching (current DL)
    System 2: Deliberate causal reasoning (compositional, symbolic)
    """
    
    def __init__(self):
        self.system1 = NeuralNetwork()  # Fast intuition
        self.system2 = CausalReasoner()  # Slow reasoning
        self.attention = DynamicAttention()  # Which to use when
    
    def forward(self, x: Tensor, requires_reasoning: bool = False) -> Tensor:
        """
        Hybrid forward pass.
        """
        # System 1: quick answer
        fast_answer = self.system1(x)
        
        if not requires_reasoning:
            return fast_answer
        
        # System 2: verify/refine via causal reasoning
        slow_answer = self.system2.reason(x, fast_answer)
        
        # Combine based on confidence
        confidence = self.attention(x, fast_answer, slow_answer)
        return confidence * fast_answer + (1 - confidence) * slow_answer
    
    def causal_attention(self, query: Tensor) -> Tensor:
        """
        Attention guided by causal relevance, not just correlation.
        
        Standard attention: A(Q, K, V) = softmax(QK^T/√d) V
        Causal attention: Weight by causal effect, not correlation
        """
        correlational_weights = self.compute_attention(query)
        causal_effects = self.system2.estimate_effects(query)
        
        # Reweight by causal importance
        causal_weights = correlational_weights * causal_effects
        return self.apply_attention(causal_weights)
```

## GF(3) Triads

```
# Causal-Categorical Triad
sheaf-cohomology (-1) ⊗ causal-inference (0) ⊗ gflownet (+1) = 0 ✓

# System 2 Triad
proofgeneral-narya (-1) ⊗ causal-inference (0) ⊗ forward-forward-learning (+1) = 0 ✓

# World Model Triad
persistent-homology (-1) ⊗ causal-inference (0) ⊗ self-evolving-agent (+1) = 0 ✓
```

## Integration with Interaction Entropy

```ruby
module CausalInference
  def self.intervene(interaction_sequence, intervention)
    # Build causal graph from interaction sequence
    graph = build_causal_graph(interaction_sequence)
    
    # Apply intervention
    intervened = graph.do(intervention)
    
    # Predict outcome
    outcome = intervened.forward_propagate
    
    {
      original_graph: graph,
      intervention: intervention,
      predicted_outcome: outcome,
      trit: 0  # Coordinator (bridges observational and interventional)
    }
  end
  
  def self.counterfactual(interaction, alternative_action)
    # What would have happened if we'd done alternative_action?
    noise = abduct_noise(interaction)
    intervened = apply_intervention(alternative_action)
    counterfactual_outcome = propagate_with_noise(intervened, noise)
    
    {
      actual_outcome: interaction[:outcome],
      counterfactual_outcome: counterfactual_outcome,
      difference: interaction[:outcome] - counterfactual_outcome
    }
  end
end
```

## Key Properties

1. **Invariance**: Causal mechanisms are stable across environments
2. **Modularity**: Can change one mechanism without affecting others
3. **Compositionality**: Complex models from simple causal primitives
4. **Identifiability**: Can (sometimes) learn causal structure from data

## References

1. Bengio, Y. et al. (2019). "A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms."
2. Schölkopf, B. et al. (2021). "Toward Causal Representation Learning."
3. Pearl, J. (2009). *Causality: Models, Reasoning, and Inference*.
4. Bengio, Y. (2017). "The Consciousness Prior."



## Scientific Skill Interleaving

This skill connects to the K-Dense-AI/claude-scientific-skills ecosystem:

### Graph Theory
- **networkx** [○] via bicomodule
  - Universal graph hub

### Bibliography References

- `general`: 734 citations in bib.duckdb



## SDF Interleaving

This skill connects to **Software Design for Flexibility** (Hanson & Sussman, 2021):

### Primary Chapter: 3. Variations on an Arithmetic Theme

**Concepts**: generic arithmetic, coercion, symbolic, numeric

### GF(3) Balanced Triad

```
causal-inference (+) + SDF.Ch3 (○) + [balancer] (−) = 0
```

**Skill Trit**: 1 (PLUS - generation)

### Secondary Chapters

- Ch10: Adventure Game Example
- Ch8: Degeneracy
- Ch4: Pattern Matching
- Ch5: Evaluation
- Ch1: Flexibility through Abstraction
- Ch7: Propagators

### Connection Pattern

Generic arithmetic crosses type boundaries. This skill handles heterogeneous data.
## Cat# Integration

This skill maps to **Cat# = Comod(P)** as a bicomodule in the equipment structure:

```
Trit: 0 (ERGODIC)
Home: Prof
Poly Op: ⊗
Kan Role: Adj
Color: #26D826
```

### GF(3) Naturality

The skill participates in triads satisfying:
```
(-1) + (0) + (+1) ≡ 0 (mod 3)
```

This ensures compositional coherence in the Cat# equipment structure.

Overview

This skill implements practical causal inference tools for building world models with interventional and counterfactual reasoning. It blends structural causal models, causal discovery, and a System 2 style deliberative module to improve robustness and transfer. The focus is on modular causal mechanisms that generalize across environments and resist distribution shift.

How this skill works

The core provides Structural Causal Models (SCMs) that support do-interventions and three-step counterfactual queries (abduction, action, prediction). Causal discovery routines include constraint-based PC and a GFlowNet sampler to produce a distribution over DAGs, capturing structural uncertainty. A System 2 hybrid network combines fast neural pattern recognition with a slow causal reasoner and causal attention that reweights correlational scores by estimated causal effect.

When to use it

  • When you need to predict outcomes under interventions rather than passive observations
  • When counterfactual explanations are required for decisions or debugging
  • When models must generalize across shifting environments or domains
  • When you want uncertainty-aware causal graph estimates rather than a single point graph
  • When integrating causal structure into attention or world-model architectures

Best practices

  • Start by specifying plausible causal variables and priors before blind structure search
  • Use GFlowNet sampling to capture multiple high-probability DAGs and quantify graph uncertainty
  • Validate interventions with held-out interventional data when possible
  • Modularize mechanisms so interventions only replace local equations
  • Combine System 1 predictions with System 2 verification when stakes or ambiguity are high

Example use cases

  • Evaluate policy changes by computing P(y | do(X)) instead of relying on observational correlations
  • Generate counterfactual explanations for model decisions to support audits and debugging
  • Discover causal structure in multivariate time series using PC and GFlowNet ensembles
  • Augment transformer attention with causal reweighting to focus on causally relevant features
  • Build robust world models for simulation and planning that retain invariances across domains

FAQ

Can I learn causal structure from purely observational data?

Sometimes. Constraint-based methods and assumptions (e.g., faithfulness, sufficiency) let you recover parts of the graph, but interventions or domain knowledge improve identifiability.

When should I use GFlowNet discovery instead of a single-score search?

Use GFlowNet when graph uncertainty matters: it samples many plausible DAGs proportional to their scores, enabling ensemble decisions and calibrated downstream inferences.