
dspy-advanced-module-composition skill


This skill helps you compose DSPy programs by orchestrating ensemble, multi-chain comparison, and sequential modules for robust multi-step workflows.

npx playbooks add skill omidzamani/dspy-skills --skill dspy-advanced-module-composition


---
name: dspy-advanced-module-composition
version: "1.0.0"
dspy-compatibility: "3.1.2"
description: This skill should be used when the user asks to "compose DSPy modules", "use Ensemble optimizer", "combine multiple programs", "use dspy.MultiChainComparison", mentions "ensemble voting", "module composition", "sequential pipelines", or needs to build complex multi-module DSPy programs with ensemble patterns or multi-chain comparison.
allowed-tools:
  - Read
  - Write
  - Glob
  - Grep
---

# DSPy Advanced Module Composition

## Goal

Compose complex DSPy programs using the Ensemble optimizer, MultiChainComparison for reasoning synthesis, and sequential module patterns.

## When to Use

- Need consensus from multiple approaches
- Comparing different reasoning strategies
- Building robust pipelines with fallbacks
- Complex multi-step workflows with branching
- Ensemble methods for improved accuracy

## Related Skills

- Design modules: [dspy-custom-module-design](../dspy-custom-module-design/SKILL.md)
- Define signatures: [dspy-signature-designer](../dspy-signature-designer/SKILL.md)
- Evaluate performance: [dspy-evaluation-suite](../dspy-evaluation-suite/SKILL.md)

## Inputs

| Input | Type | Description |
|-------|------|-------------|
| `modules` | `list[dspy.Module]` | Modules to compose |
| `composition_type` | `str` | "ensemble", "sequential", "comparison" |

## Outputs

| Output | Type | Description |
|--------|------|-------------|
| `composed_program` | `dspy.Module` | Composed multi-module program |

## Workflow

### Phase 1: Ensemble Voting

Combine multiple programs using the Ensemble optimizer:

```python
import dspy
from dspy.teleprompt import Ensemble

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Define a signature for the task
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""
    question = dspy.InputField()
    answer = dspy.OutputField()

# Create multiple program instances (should be optimized/compiled programs)
# For simple demonstration, we'll use different predictors
program1 = dspy.Predict(BasicQA)
program2 = dspy.ChainOfThought(BasicQA)
program3 = dspy.Predict(BasicQA)

# Ensemble is an optimizer that compiles programs together
ensemble = Ensemble(reduce_fn=dspy.majority)
ensembled_program = ensemble.compile([program1, program2, program3])

# Use the ensembled program
result = ensembled_program(question="What is 2 + 2?")
print(result.answer)  # Voted answer
```
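`dspy.majority` votes for the most common answer, but `Ensemble` accepts any callable as `reduce_fn`. A minimal sketch of a custom reducer; the `majority_answer` helper and the `SimpleNamespace` stand-ins are illustrative, not part of DSPy:

```python
from collections import Counter
from types import SimpleNamespace

def majority_answer(predictions):
    """Return the prediction whose normalized answer is most common.

    Ties are broken by first appearance (Counter preserves insertion order).
    """
    counts = Counter(p.answer.strip().lower() for p in predictions)
    winner, _ = counts.most_common(1)[0]
    return next(p for p in predictions if p.answer.strip().lower() == winner)

# Stand-in predictions for demonstration; in practice these come from the ensemble
preds = [SimpleNamespace(answer="4"),
         SimpleNamespace(answer="four"),
         SimpleNamespace(answer="4")]
print(majority_answer(preds).answer)  # 4
```

Once a reducer like this matches your output schema, pass it as `Ensemble(reduce_fn=majority_answer)`.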

### Phase 2: MultiChainComparison

Compare multiple reasoning attempts:

```python
import dspy

class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

class ComparisonPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        # Generate multiple reasoning attempts
        self.cot = dspy.ChainOfThought(BasicQA)

        # Compare M attempts and select best
        # Must pass a Signature class, not a string
        self.compare = dspy.MultiChainComparison(
            BasicQA,
            M=3,  # Number of attempts to compare
            temperature=0.7
        )

    def forward(self, question):
        # Generate multiple completions to compare
        # Each completion must have rationale/reasoning field
        completions = [
            self.cot(question=question)
            for _ in range(3)
        ]

        # MultiChainComparison synthesizes them into best answer
        # Pass completions as positional arg, not keyword arg
        return self.compare(completions, question=question)

# Usage
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
pipeline = ComparisonPipeline()
result = pipeline(question="What is the tallest mountain on Earth?")
print(f"Best answer: {result.answer}")
print(f"Rationale: {result.rationale}")
```

### Phase 3: Sequential Composition

Chain modules for multi-step workflows:

```python
import dspy

# Define signatures for each step
class QueryRewrite(dspy.Signature):
    """Rewrite a question for better retrieval."""
    question = dspy.InputField()
    refined_query: str = dspy.OutputField()

class GenerateAnswer(dspy.Signature):
    """Generate answer from context and question."""
    context = dspy.InputField()
    question = dspy.InputField()
    answer = dspy.OutputField()

class ValidateAnswer(dspy.Signature):
    """Validate answer quality."""
    answer = dspy.InputField()
    question = dspy.InputField()
    is_valid: bool = dspy.OutputField()
    confidence: float = dspy.OutputField()

class SequentialRAG(dspy.Module):
    """Multi-step RAG pipeline."""

    def __init__(self):
        super().__init__()
        # Step 1: Query rewriting
        self.rewrite = dspy.Predict(QueryRewrite)

        # Step 2: Retrieval (requires a configured retrieval model,
        # e.g. dspy.configure(rm=...))
        self.retrieve = dspy.Retrieve(k=5)

        # Step 3: Answer generation
        self.generate = dspy.ChainOfThought(GenerateAnswer)

        # Step 4: Validation
        self.validate = dspy.Predict(ValidateAnswer)

    def forward(self, question):
        # Sequential execution
        refined = self.rewrite(question=question)
        passages = self.retrieve(refined.refined_query).passages

        answer_pred = self.generate(
            context=passages,
            question=question
        )

        validation = self.validate(
            answer=answer_pred.answer,
            question=question
        )

        return dspy.Prediction(
            answer=answer_pred.answer,
            is_valid=validation.is_valid,
            confidence=validation.confidence
        )

# Usage (also requires a retrieval model, e.g. dspy.configure(rm=...))
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
rag = SequentialRAG()
result = rag(question="What causes lightning?")
print(f"Answer: {result.answer} (valid: {result.is_valid})")
```

### Phase 4: Fallback Strategies

Handle failures with fallback modules:

```python
import dspy
import logging

logger = logging.getLogger(__name__)

class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""
    question = dspy.InputField()
    answer = dspy.OutputField()

class RobustQA(dspy.Module):
    """Fallback strategy for errors."""

    def __init__(self):
        super().__init__()
        self.primary = dspy.ChainOfThought(BasicQA)
        self.fallback = dspy.Predict(BasicQA)

    def forward(self, question):
        try:
            result = self.primary(question=question)
            # Simple length heuristic; swap in task-specific validation
            if result.answer and len(result.answer) > 10:
                return result
        except Exception as e:
            logger.error(f"Primary failed: {e}")

        return self.fallback(question=question)
```
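The length check above is only a placeholder; a confidence-threshold gate (assuming your validator emits a float `confidence`, as in the Phase 3 `ValidateAnswer` signature) is often a better fallback trigger. A hedged pure-Python sketch; `needs_fallback` and the 0.6 threshold are illustrative, not a DSPy API:

```python
def needs_fallback(answer, confidence, min_confidence=0.6):
    """Decide whether to reroute a query to the fallback module."""
    if not answer or not answer.strip():
        return True  # empty output always falls back
    return confidence < min_confidence

print(needs_fallback("Paris", 0.9))  # False
print(needs_fallback("", 0.9))       # True
print(needs_fallback("Paris", 0.3))  # True
```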

## Production Example

```python
import dspy
from dspy.teleprompt import BootstrapFewShot, Ensemble

class GenerateAnswer(dspy.Signature):
    """Generate answer from context and question."""
    context = dspy.InputField()
    question = dspy.InputField()
    answer = dspy.OutputField()

class MultiStrategyQA(dspy.Module):
    """Production QA with retrieval."""

    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=3)  # requires a configured retrieval model
        self.generate = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question: str):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)

# Usage with optimization
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
qa = MultiStrategyQA()

# First, optimize the base program
# (trainset: a list of dspy.Example objects with question/answer fields)
optimizer = BootstrapFewShot(
    metric=lambda ex, pred, trace=None: ex.answer in pred.answer,
    max_bootstrapped_demos=3
)

compiled_qa = optimizer.compile(qa, trainset=trainset)

# Then create an ensemble from multiple optimized programs.
# Compiling the same program on the same data yields identical copies,
# so vary the training data (or use different optimizers/seeds,
# e.g. BootstrapFewShotWithRandomSearch) to get genuine diversity.
program1 = optimizer.compile(qa, trainset=trainset[:50])
program2 = optimizer.compile(qa, trainset=trainset[25:75])
program3 = optimizer.compile(qa, trainset=trainset[50:])

ensemble = Ensemble(reduce_fn=dspy.majority)
final_program = ensemble.compile([program1, program2, program3])
```

## Best Practices

1. **Test modules independently** - Validate each module before composition
2. **Handle failures gracefully** - Use try/except in parallel composition
3. **Balance cost vs accuracy** - Ensembles are expensive (N × cost)
4. **Optimize composed programs** - Use BootstrapFewShot or MIPROv2 on final composition
5. **Module reusability** - Design modules to work in multiple compositions
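Practice 1 can be as simple as a one-input smoke test per module before wiring anything together. A minimal sketch; the `smoke_test` helper and `StubModule` are illustrative, and any object callable with `question=` that returns an `.answer` attribute (e.g. `dspy.Predict(BasicQA)`) would work in place of the stub:

```python
from types import SimpleNamespace

def smoke_test(module, question="What is 2 + 2?"):
    """Run one fixed input through a module and check the output shape."""
    result = module(question=question)
    assert hasattr(result, "answer"), "module output is missing .answer"
    assert result.answer and result.answer.strip(), "module returned an empty answer"
    return result

# Stand-in module for demonstration; in practice pass a real DSPy module
class StubModule:
    def __call__(self, question):
        return SimpleNamespace(answer="4")

print(smoke_test(StubModule()).answer)  # 4
```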

## Limitations

- Ensemble increases cost linearly with module count
- Voting strategies may not work for all output types
- Sequential composition amplifies latency
- Error propagation in chains can be hard to debug
- Parallel composition requires careful state management
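The linear cost growth is easy to budget for before committing to an ensemble. A back-of-envelope sketch; the per-call price below is a hypothetical placeholder, not a quote:

```python
def ensemble_cost(cost_per_call, n_modules, n_queries):
    """Total LM spend: every query fans out to every ensemble member."""
    return cost_per_call * n_modules * n_queries

# e.g. a hypothetical $0.002 per call, 3-member ensemble, 10,000 queries
print(round(ensemble_cost(0.002, 3, 10_000), 2))  # 60.0
```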

## Official Documentation

- **DSPy Documentation**: https://dspy.ai/
- **DSPy GitHub**: https://github.com/stanfordnlp/dspy
- **Modules Guide**: https://dspy.ai/learn/programming/modules/

## Overview

This skill composes complex DSPy programs using ensemble optimizers, multi-chain comparison, and sequential module patterns. It helps you build robust multi-module pipelines with voting, fallback strategies, and comparison-based reasoning synthesis. Use it to orchestrate parallel and sequential modules for higher accuracy and resilience.

## How this skill works

The skill assembles module lists into a single dspy.Module using Ensemble (voting/reduction), MultiChainComparison (synthesizing multiple reasoning traces), or sequential chaining for multi-step RAG flows. It compiles or wraps prebuilt modules, runs them in parallel or sequence, and reduces outputs via configurable reduce functions or selection logic. You can also attach fallback branches and apply optimization passes like BootstrapFewShot on the composed program.
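Conceptually, the sequential case is just function composition over module outputs. A language-agnostic sketch in plain Python (no DSPy required; `sequential` is an illustrative helper, not a DSPy API):

```python
def sequential(*steps):
    """Compose callables left to right: sequential(f, g)(x) == g(f(x))."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

pipeline = sequential(lambda q: q.strip().lower(),   # normalize the query
                      lambda q: f"answer to: {q}")   # pretend generation step
print(pipeline("  What Causes Lightning?  "))  # answer to: what causes lightning?
```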

## When to use it

- You need consensus across multiple model strategies (ensemble voting).
- You want to compare multiple reasoning attempts and synthesize the best answer (MultiChainComparison).
- You are building multi-step RAG or retrieval-plus-generation pipelines (sequential composition).
- You are implementing robust fallbacks for error handling or low-confidence outputs.
- You are optimizing final composed programs with few-shot bootstrapping or ensemble compilation.

## Best practices

- Test and validate individual modules before composing them to isolate bugs quickly.
- Prefer signature-driven module interfaces so composed programs interoperate cleanly.
- Use ensembles selectively; balance improved accuracy against N× cost and increased latency.
- Include explicit fallback modules and exception handling to avoid silent failures in production.
- Optimize composed programs (BootstrapFewShot, MIPROv2) after composition to get reliable, calibrated outputs.

## Example use cases

- Ensemble QA: compile three optimized QA modules and majority-vote their answers for high-confidence factual responses.
- Multi-chain reasoning: generate multiple chain-of-thought traces and use MultiChainComparison to produce the best rationale and final answer.
- Sequential RAG: rewrite queries, retrieve passages, generate answers with CoT, and validate outputs before returning a final prediction.
- Robust fallback pipeline: a primary ChainOfThought module plus a lightweight Predict fallback when the primary fails or returns low-confidence answers.
- Production optimization: optimize base modules with BootstrapFewShot, compile multiple variants, then create an Ensemble for deployment.

## FAQ

**Do ensembles always improve accuracy?**

Not always. Ensembles can improve robustness for many tasks, but they incur extra cost and latency and may fail for outputs that are not amenable to simple voting strategies.

**How many attempts should I use with MultiChainComparison?**

Common values are 3–5 attempts; more attempts increase diversity but also cost. Tune M empirically for your task and budget.
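Tuning M empirically can be a small sweep over candidate values. A hedged sketch, assuming you supply a `build_pipeline(M)` constructor and an `evaluate(program, devset)` scoring function (both hypothetical names; the stubs below only demonstrate the sweep):

```python
def pick_best_M(build_pipeline, evaluate, devset, candidates=(3, 4, 5)):
    """Score each candidate M on a held-out devset and return the best."""
    scores = {M: evaluate(build_pipeline(M), devset) for M in candidates}
    return max(scores, key=scores.get)

# Stub components for demonstration; replace with real builders and metrics
fake_build = lambda M: M
fake_eval = lambda program, devset: 0.9 if program == 4 else 0.5
print(pick_best_M(fake_build, fake_eval, devset=[]))  # 4
```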

**Should I optimize modules before or after composition?**

Optimize modules first to ensure each component is strong, then optimize the full composed program (BootstrapFewShot, MIPROv2) to calibrate interactions and reduce compounding errors.