
dspy-output-refinement-constraints skill


This skill refines DSPy outputs through iterative reward-based checks and best-of-N selection to meet format, length, and content constraints.

npx playbooks add skill omidzamani/dspy-skills --skill dspy-output-refinement-constraints


---
name: dspy-output-refinement-constraints
version: "1.0.0"
dspy-compatibility: "3.1.2"
description: This skill should be used when the user asks to "refine DSPy outputs", "enforce constraints", "use dspy.Refine", "select best output", "use dspy.BestOfN", mentions "output validation", "constraint checking", "multi-attempt generation", "reward function", or needs to improve output quality through iterative refinement or best-of-N selection with custom constraints.
allowed-tools:
  - Read
  - Write
  - Glob
  - Grep
---

# DSPy Output Refinement & Constraints

## Goal

Improve output quality using iterative refinement (dspy.Refine) and best-of-N selection (dspy.BestOfN) with custom constraint validation.

## When to Use

- Outputs need format validation (JSON, specific structure)
- Length constraints (max tokens, word count)
- Content requirements (must include X, avoid Y)
- Quality improvement through multiple attempts
- Replacing deprecated Assert/Suggest patterns

## Related Skills

- Design signatures: [dspy-signature-designer](../dspy-signature-designer/SKILL.md)
- Optimize programs: [dspy-miprov2-optimizer](../dspy-miprov2-optimizer/SKILL.md)
- Evaluate quality: [dspy-evaluation-suite](../dspy-evaluation-suite/SKILL.md)

## Inputs

| Input | Type | Description |
|-------|------|-------------|
| `module` | `dspy.Module` | Module to refine |
| `reward_fn` | `callable` | Constraint validation function |
| `N` | `int` | Maximum number of attempts |
| `threshold` | `float` | Reward at which an attempt is accepted and the loop stops early |
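
A reward function is called as `reward_fn(args, pred)`, where `args` holds the inputs passed to the module and `pred` is the resulting prediction. The sketch below exercises one without calling an LLM. `types.SimpleNamespace` stands in for `dspy.Prediction` (an assumption made for offline testing), and `length_reward` with its 20-word limit is a hypothetical example, not part of DSPy.

```python
from types import SimpleNamespace

def length_reward(args, pred, max_words=20):
    """Return 1.0 when the summary fits the word limit, less as it overshoots."""
    words = len(pred.summary.split())
    if words == 0:
        return 0.0
    if words <= max_words:
        return 1.0
    # Partial credit that shrinks as the summary exceeds the limit.
    return max_words / words

# Stand-ins for dspy.Prediction; only attribute access is imitated.
short = SimpleNamespace(summary="A short summary of the document.")
long = SimpleNamespace(summary=" ".join(["word"] * 40))

print(length_reward({"document": "..."}, short))  # 1.0
print(length_reward({"document": "..."}, long))   # 0.5
```

Testing reward functions this way keeps constraint logic verifiable before any LLM calls are spent.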

## Outputs

| Output | Type | Description |
|--------|------|-------------|
| `refined_output` | `dspy.Prediction` | Validated, refined result |

## Workflow

### Phase 1: dspy.Refine for Iterative Improvement

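`dspy.Refine` runs the module up to `N` times. After each attempt that scores below `threshold`, it generates feedback that is passed to the next attempt as a hint. It returns the first prediction that meets the threshold, or the highest-scoring one if none does: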

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Base module
summarizer = dspy.ChainOfThought("document -> summary: str")

# Reward function: checks constraints
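def summary_reward(args, pred):
    summary = pred.summary
    word_count = len(summary.split())

    # Reject summaries longer than 100 words or shorter than 50 characters
    if word_count > 100 or len(summary) < 50:
        return 0.0
    # Partial credit when the required keyword is missing
    if "important" not in summary.lower():
        return 0.5
    return 1.0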

# Refine module
refined_summarizer = dspy.Refine(
    module=summarizer,
    reward_fn=summary_reward,
    N=3,
    threshold=1.0
)

# Use it
result = refined_summarizer(document="Long document text here...")
print(result.summary)
```

### Phase 2: dspy.BestOfN for Selection

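`dspy.BestOfN` runs the module up to `N` times and returns the first prediction that meets `threshold`, or the highest-scoring one if none does: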

```python
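import json

import dspy

def json_reward(args, pred):
    """Validate JSON format and fields."""
    try:
        data = json.loads(pred.output)
    except (json.JSONDecodeError, TypeError):
        return 0.0  # not parseable as JSON
    if not isinstance(data, dict):
        return 0.0  # valid JSON, but not an object
    if not {'name', 'age', 'email'}.issubset(data.keys()):
        return 0.3  # missing required fields
    if '@' not in str(data.get('email', '')):
        return 0.5  # email present but malformed
    return 1.0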

# BestOfN: try 5 times, pick best
extractor = dspy.Predict("text -> output: str")
best_extractor = dspy.BestOfN(module=extractor, reward_fn=json_reward, N=5, threshold=1.0)

result = best_extractor(text="John Doe, 30 years old, john.doe@example.com")
print(result.output)  # Best valid JSON
```
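
The selection behaviour described above can be sketched in plain Python. This is a conceptual toy, not DSPy's implementation: `best_of_n`, `toy_generate`, and `toy_reward` are hypothetical names, and canned strings stand in for LLM outputs.

```python
def best_of_n(generate, reward_fn, n, threshold):
    """Call `generate` up to n times; return (best_output, best_reward)."""
    best_output, best_reward = None, float("-inf")
    for _ in range(n):
        output = generate()
        reward = reward_fn(output)
        if reward > best_reward:
            best_output, best_reward = output, reward
        if best_reward >= threshold:
            break  # good enough: stop spending attempts
    return best_output, best_reward

# Canned outputs stand in for independent LLM samples.
canned = iter(["bad", "better answer", "a much better answer", "unused"])
calls = []

def toy_generate():
    output = next(canned)
    calls.append(output)
    return output

def toy_reward(text):
    # Hypothetical reward: more words is better, capped at 1.0.
    return min(len(text.split()) / 4, 1.0)

best, score = best_of_n(toy_generate, toy_reward, n=3, threshold=1.0)
print(best, score)  # a much better answer 1.0
```

A lower threshold makes the loop stop sooner, which is the main lever for trading quality against cost.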

### Phase 3: Multi-Constraint Reward Functions

Complex validation with scoring:

```python
import dspy
import re

def comprehensive_reward(args, pred):
    """Validate format, length, and content."""
    text = pred.answer
    score = 0.0

    # Length: 50-150 words (33%)
    word_count = len(text.split())
    if 50 <= word_count <= 150:
        score += 0.33

    # Format: capitalized, ends with period (33%)
    if re.match(r'^[A-Z]', text) and text.endswith('.'):
        score += 0.33

    # Content: required terms present (34%)
    if all(term in text.lower() for term in ['data', 'analysis']):
        score += 0.34

    return score

# Use with Refine
qa = dspy.ChainOfThought("question -> answer: str")
refined_qa = dspy.Refine(module=qa, reward_fn=comprehensive_reward, N=4, threshold=0.9)

result = refined_qa(question="What is data science?")
```
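
When the number of constraints grows, the weighting can be factored out of the reward function. The helper below is a minimal sketch under that assumption: `make_weighted_reward` and the `(name, weight, predicate)` layout are hypothetical, and `SimpleNamespace` stands in for a `dspy.Prediction`.

```python
from types import SimpleNamespace

def make_weighted_reward(checks, field="answer"):
    """Build a reward_fn from (name, weight, predicate) triples."""
    total = sum(weight for _, weight, _ in checks)

    def reward_fn(args, pred):
        text = getattr(pred, field)
        earned = sum(weight for _, weight, ok in checks if ok(text))
        return earned / total  # normalised to the 0.0-1.0 range

    return reward_fn

checks = [
    ("length", 1, lambda t: 5 <= len(t.split()) <= 150),
    ("format", 1, lambda t: t[:1].isupper() and t.endswith(".")),
    ("content", 2, lambda t: "data" in t.lower()),
]
reward = make_weighted_reward(checks)

pred = SimpleNamespace(answer="Data science extracts insight from raw data.")
print(reward({}, pred))  # 1.0
```

Integer weights normalised by their sum avoid having to hand-tune fractions such as 0.33 and 0.34 so that they add up to 1.0.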

## Production Example

```python
import dspy
import json
import logging

logger = logging.getLogger(__name__)

class StructuredExtractor(dspy.Module):
    """Extract structured data with validation."""

    def __init__(self):
        super().__init__()
        self.extractor = dspy.Predict(
            "text -> json_output: str"
        )
        self.refined = dspy.Refine(
            module=self.extractor,
            reward_fn=self.validation_reward,
            N=3,
            threshold=0.9
        )

    def validation_reward(self, args, pred):
        """Validate JSON structure and business logic."""
        try:
            data = json.loads(pred.json_output)
            score = 0.0

            # Required fields
            if {'product', 'price', 'quantity'}.issubset(data.keys()):
                score += 0.4

            # Type validation
            if isinstance(data.get('price'), (int, float)) and data['price'] > 0:
                score += 0.3
            if isinstance(data.get('quantity'), int) and data['quantity'] > 0:
                score += 0.3

            return score
        # AttributeError covers valid JSON that is not an object (e.g. a list)
        except (json.JSONDecodeError, TypeError, AttributeError) as e:
            logger.warning(f"Validation failed: {e}")
            return 0.0

    def forward(self, text: str):
        try:
            return self.refined(text=text)
        except Exception as e:
            logger.error(f"Extraction failed: {e}")
            return dspy.Prediction(json_output='{}')

# Usage
extractor = StructuredExtractor()
result = extractor(text="iPhone 15, $999, quantity: 50")
print(result.json_output)
```

## Migration from Assert/Suggest

As of DSPy 2.6, `dspy.Assert` and `dspy.Suggest` are replaced by `dspy.Refine` and `dspy.BestOfN`. Express each assertion as a reward function and wrap the module with Refine:

```python
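# Old: dspy.Assert(len(output) < 100, "Too long")
# New: score the same condition in a reward function.
def reward(args, pred):
    return 1.0 if len(pred.output) < 100 else 0.0

# `module` is the existing dspy.Module whose prediction has an `output` field.
refined = dspy.Refine(module=module, reward_fn=reward, N=3, threshold=1.0)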
```
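
Assert was a hard constraint and Suggest a soft one, and that distinction can be kept inside a single reward function. The sketch below is one possible mapping, not a DSPy API: `make_reward` is a hypothetical helper, the 0.5/0.5 split between hard and soft checks is an arbitrary assumption, and `SimpleNamespace` stands in for a `dspy.Prediction`.

```python
from types import SimpleNamespace

def make_reward(hard, soft):
    """Hard checks gate the score at 0.0; soft checks add partial credit."""
    def reward_fn(args, pred):
        if not all(check(pred) for check in hard):
            return 0.0
        if not soft:
            return 1.0
        passed = sum(1 for check in soft if check(pred))
        # Passing every hard check earns 0.5; soft checks share the rest.
        return 0.5 + 0.5 * passed / len(soft)
    return reward_fn

reward = make_reward(
    hard=[lambda pred: len(pred.output) < 100],     # formerly an Assert
    soft=[lambda pred: pred.output.endswith(".")],  # formerly a Suggest
)

print(reward({}, SimpleNamespace(output="Short answer.")))  # 1.0
print(reward({}, SimpleNamespace(output="Short answer")))   # 0.5
print(reward({}, SimpleNamespace(output="x" * 200)))        # 0.0
```

With a threshold of 1.0 both kinds of check are required; a threshold of 0.5 accepts any output that passes the hard checks.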

## Best Practices

1. **Score gradually** - Award partial credit on a 0.0-1.0 scale instead of binary pass/fail
2. **Multiple constraints** - Weight each constraint (e.g., 25% each for 4 checks)
3. **Handle exceptions** - Reward functions should never raise; return 0.0 on error
4. **Limit attempts** - 3-5 attempts for Refine, 5-10 for BestOfN
5. **Log failures** - Track which constraints fail most often
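
Practices 3 and 5 can be combined in a small decorator. This is a minimal sketch: `safe_reward`, `failure_counts`, and `price_reward` are hypothetical names, and `SimpleNamespace` stands in for a `dspy.Prediction`.

```python
import functools
import logging
from collections import Counter
from types import SimpleNamespace

logger = logging.getLogger(__name__)
failure_counts = Counter()  # exception type name -> number of occurrences

def safe_reward(fn):
    """Wrap a reward function so it never raises and stays within [0, 1]."""
    @functools.wraps(fn)
    def wrapper(args, pred):
        try:
            score = float(fn(args, pred))
        except Exception as exc:
            failure_counts[type(exc).__name__] += 1
            logger.warning("reward function failed: %s", exc)
            return 0.0
        return min(max(score, 0.0), 1.0)
    return wrapper

@safe_reward
def price_reward(args, pred):
    # float() raises ValueError on non-numeric text such as "n/a".
    return 1.0 if float(pred.price) > 0 else 0.0

print(price_reward({}, SimpleNamespace(price="9.99")))  # 1.0
print(price_reward({}, SimpleNamespace(price="n/a")))   # 0.0
print(failure_counts)  # Counter({'ValueError': 1})
```

Inspecting `failure_counts` after a batch shows which failure modes dominate and where the prompt or signature needs work.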

## Limitations

- Each attempt costs at least one additional LLM call; Refine also calls the LM to generate feedback between attempts
- Reward functions don't receive feedback prompts (unlike GEPA)
- BestOfN is expensive: up to N × the cost of a single call when no attempt reaches the threshold
- No automatic constraint learning (manual reward design)
- Refine may not improve if base module is fundamentally wrong
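
A rough upper bound on cost can be estimated before picking `N`. The helper below is a back-of-the-envelope sketch: `worst_case_calls` and its parameters are hypothetical, and the figure of one feedback call between Refine attempts is an assumption to verify against your DSPy version.

```python
def worst_case_calls(n, calls_per_attempt=1, feedback_calls=0):
    """Upper bound on LLM calls when no attempt reaches the threshold."""
    # Feedback is only needed between attempts, hence n - 1.
    return n * calls_per_attempt + (n - 1) * feedback_calls

# BestOfN with N=5 around a single-call module.
print(worst_case_calls(5))  # 5

# Refine with N=3, assuming one feedback call between attempts.
print(worst_case_calls(3, feedback_calls=1))  # 5
```

Modules that make several LLM calls per run (for example, multi-step pipelines) raise `calls_per_attempt` accordingly.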

## Official Documentation

- **DSPy Documentation**: https://dspy.ai/
- **DSPy GitHub**: https://github.com/stanfordnlp/dspy
- **Refine Module**: https://dspy.ai/api/modules/Refine/

Overview

This skill refines DSPy module outputs by applying iterative improvement (dspy.Refine) and best-of-N selection (dspy.BestOfN) with custom reward/constraint functions. It helps enforce format, length, and content rules so predictions meet validation criteria before being accepted. Use it to replace deprecated Assert/Suggest patterns and to raise output quality with controlled multi-attempt generation.

How this skill works

You wrap an existing dspy.Module with Refine or BestOfN and supply a reward_fn that scores each prediction from 0.0 to 1.0. Both run the module up to N times and stop early once a prediction reaches the threshold; otherwise they return the highest-scoring attempt. Refine feeds feedback from each failed attempt into the next one, while BestOfN makes independent attempts. Reward functions validate structure, content, and business rules and must return a numeric score without raising exceptions.

When to use it

- You need strict format validation (JSON, CSV, specific schema).
- Outputs must meet length or token limits (min/max words or chars).
- Content must include required terms or avoid prohibited items.
- You want higher-quality results through multiple attempts.
- You need a drop-in replacement for deprecated Assert/Suggest patterns.

Best practices

- Score gradually using a 0.0–1.0 range and weight multiple constraints.
- Keep reward functions robust: catch exceptions and return 0.0 on failure.
- Limit N to reasonable values (Refine: 3–5, BestOfN: 5–10) to control cost.
- Log failed constraints to identify recurring model weaknesses.
- Start with coarse checks, then add finer-grained constraints iteratively.

Example use cases

- Enforce JSON schema for extracted entities (name, age, email) with BestOfN selection.
- Refine document summaries to required length and include mandatory keywords.
- Validate and score e-commerce extraction (product, price, quantity) before ingest.
- Apply multi-constraint scoring (length, capitalization, required terms) for QA answers.
- Replace Assert/Suggest checks by converting rules into reward functions for Refine.

FAQ

What should a reward function return?

Return a float between 0.0 and 1.0 representing how well the prediction meets constraints; do not raise exceptions.

When should I use Refine vs. BestOfN?

Use Refine to iteratively improve a single output toward a threshold. Use BestOfN to generate several independent outputs and pick the best.
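
The difference can be illustrated with a toy loop in which each failed attempt produces a hint for the next one. This is a conceptual sketch, not DSPy's implementation: `refine`, `toy_generate`, `toy_reward`, and `toy_feedback` are hypothetical, and a fixed string stands in for LLM-generated feedback.

```python
def refine(generate, reward_fn, make_feedback, n, threshold):
    """Retry up to n times, passing feedback from each failure to the next try."""
    best_output, best_reward = None, float("-inf")
    feedback = None
    for _ in range(n):
        output = generate(feedback)
        reward = reward_fn(output)
        if reward > best_reward:
            best_output, best_reward = output, reward
        if reward >= threshold:
            break
        # Unlike best-of-N, the next attempt sees why this one fell short.
        feedback = make_feedback(output, reward)
    return best_output, best_reward

def toy_generate(feedback):
    # Hypothetical generator: follows the hint when one is given.
    return "short" if feedback is None else "a longer, more detailed answer"

def toy_reward(text):
    return 1.0 if len(text.split()) >= 4 else 0.0

def toy_feedback(output, reward):
    return "Use at least four words."

print(refine(toy_generate, toy_reward, toy_feedback, n=3, threshold=1.0))
# ('a longer, more detailed answer', 1.0)
```

When failures are systematic and explainable, feedback tends to help; when they are random, independent sampling is usually enough.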