This skill automatically generates and selects effective few-shot demonstrations for DSPy programs using a teacher model with limited data.
```
npx playbooks add skill omidzamani/dspy-skills --skill dspy-bootstrap-fewshot
```
---
name: dspy-bootstrap-fewshot
version: "1.0.0"
dspy-compatibility: "3.1.2"
description: This skill should be used when the user asks to "bootstrap few-shot examples", "generate demonstrations", "use BootstrapFewShot", "optimize with limited data", "create training demos automatically", mentions "teacher model for few-shot", "10-50 training examples", or wants automatic demonstration generation for a DSPy program without extensive compute.
allowed-tools:
- Read
- Write
- Glob
- Grep
---
# DSPy Bootstrap Few-Shot Optimizer
## Goal
Automatically generate and select optimal few-shot demonstrations for your DSPy program using a teacher model.
## When to Use
- You have **10-50 labeled examples**
- Manual example selection is tedious or suboptimal
- You want demonstrations with reasoning traces
- Quick optimization without extensive compute
## Related Skills
- For more data (200+ examples): [dspy-miprov2-optimizer](../dspy-miprov2-optimizer/SKILL.md)
- For agentic systems: [dspy-gepa-reflective](../dspy-gepa-reflective/SKILL.md)
- Measure improvements: [dspy-evaluation-suite](../dspy-evaluation-suite/SKILL.md)
## Inputs
| Input | Type | Description |
|-------|------|-------------|
| `program` | `dspy.Module` | Your DSPy program to optimize |
| `trainset` | `list[dspy.Example]` | Training examples |
| `metric` | `callable` | Evaluation function |
| `metric_threshold` | `float` | Numerical threshold for accepting demos (optional) |
| `max_bootstrapped_demos` | `int` | Max teacher-generated demos (default: 4) |
| `max_labeled_demos` | `int` | Max direct labeled demos (default: 16) |
| `max_rounds` | `int` | Max bootstrapping attempts per example (default: 1) |
| `teacher_settings` | `dict` | Configuration for teacher model (optional) |
## Outputs
| Output | Type | Description |
|--------|------|-------------|
| `compiled_program` | `dspy.Module` | Optimized program with demos |
## Workflow
### Phase 1: Setup
```python
import dspy
from dspy.teleprompt import BootstrapFewShot
# Configure LMs
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
```
### Phase 2: Define Program and Metric
```python
class QA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.generate(question=question)


def validate_answer(example, pred, trace=None):
    return example.answer.lower() in pred.answer.lower()
```
### Phase 3: Compile
```python
optimizer = BootstrapFewShot(
    metric=validate_answer,
    max_bootstrapped_demos=4,
    max_labeled_demos=4,
    teacher_settings={"lm": dspy.LM("openai/gpt-4o")},
)
compiled_qa = optimizer.compile(QA(), trainset=trainset)
```
### Phase 4: Use and Save
```python
# Use optimized program
result = compiled_qa(question="What is photosynthesis?")
# Save for production (state-only, recommended)
compiled_qa.save("qa_optimized.json", save_program=False)
```
## Production Example
```python
import dspy
from dspy.teleprompt import BootstrapFewShot
from dspy.evaluate import Evaluate
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class ProductionQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.cot = dspy.ChainOfThought("question -> answer")

    def forward(self, question: str):
        try:
            return self.cot(question=question)
        except Exception as e:
            logger.error(f"Generation failed: {e}")
            return dspy.Prediction(answer="Unable to answer")


def robust_metric(example, pred, trace=None):
    if not pred.answer or pred.answer == "Unable to answer":
        return 0.0
    return float(example.answer.lower() in pred.answer.lower())


def optimize_with_bootstrap(trainset, devset):
    """Full optimization pipeline with validation."""
    # Baseline (in DSPy 3.x, Evaluate returns a result whose .score
    # is a percentage in the 0-100 range)
    baseline = ProductionQA()
    evaluator = Evaluate(devset=devset, metric=robust_metric, num_threads=4)
    baseline_score = evaluator(baseline).score
    logger.info(f"Baseline: {baseline_score:.2f}%")

    # Optimize
    optimizer = BootstrapFewShot(
        metric=robust_metric,
        max_bootstrapped_demos=4,
        max_labeled_demos=4,
    )
    compiled = optimizer.compile(baseline, trainset=trainset)
    optimized_score = evaluator(compiled).score
    logger.info(f"Optimized: {optimized_score:.2f}%")

    if optimized_score > baseline_score:
        compiled.save("production_qa.json", save_program=False)
        return compiled

    logger.warning("Optimization didn't improve; keeping baseline")
    return baseline
```
## Best Practices
1. **Quality over quantity** - 10 excellent examples beat 100 noisy ones
2. **Use a stronger teacher** - e.g., GPT-4o as teacher for a GPT-4o-mini student
3. **Validate with held-out set** - Always test on unseen data
4. **Start with 4 demos** - More isn't always better
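Best practice 3 (validate on a held-out set) can be sketched with a plain train/dev split; the `dev_fraction` and `seed` below are arbitrary choices:

```python
import random


def split_examples(examples, dev_fraction=0.25, seed=13):
    """Shuffle a labeled set and split it into train and held-out dev portions."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = max(1, int(len(shuffled) * (1 - dev_fraction)))
    return shuffled[:cut], shuffled[cut:]


# Illustrative placeholder data; in practice these would be dspy.Example objects.
data = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(20)]
trainset, devset = split_examples(data)
```

The dev portion is passed only to the evaluator, never to `optimizer.compile`, so the improvement check runs on data the optimizer has not seen.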
## Limitations
- Requires labeled training data
- Teacher model costs can add up
- May not generalize to very different inputs
- Limited exploration compared to MIPROv2
## Official Documentation
- **DSPy Documentation**: https://dspy.ai/
- **DSPy GitHub**: https://github.com/stanfordnlp/dspy
- **BootstrapFewShot API**: https://dspy.ai/api/optimizers/BootstrapFewShot/
- **Optimization Guide**: https://dspy.ai/learn/optimization/optimizers/
## How It Works
The optimizer uses a stronger teacher model to generate candidate demonstrations and scores them against the provided metric. It then compiles the best mix of teacher-generated and labeled demos into the DSPy module, capped by `max_bootstrapped_demos` and `max_labeled_demos`, with improvements validated on held-out data.
## FAQ
**How many labeled examples do I need?** This skill is designed for small labeled sets, typically 10-50 examples; quality matters more than quantity.

**Can I control teacher model cost and output count?** Yes. Configure `teacher_settings` and limits such as `max_bootstrapped_demos` and `max_rounds` to balance cost and coverage.