
goals skill

/goals

This skill helps you optimize prompts with dense process goals, guiding stepwise reasoning to improve reliability and reduce hallucinations.

npx playbooks add skill zpankz/mcp-skillset --skill goals

---
name: goals
description: Optimize prompts via process goals (controllable behavioral instructions) rather than outcome goals (sparse end-result demands). Grounded in sports psychology meta-analysis showing process goals (d=1.36) vastly outperform outcome goals (d=0.09). Use when designing prompts, optimizing LLM steering, implementing CoT/decomposition patterns, or building automatic prompt optimization pipelines. Instantiates surrogate loss paradigm for discrete prompt space.
---

# Process Goals in Prompt Optimization

## Core Principle

**Process goals** (controllable intermediate actions) provide **dense feedback signals**; **outcome goals** (end-result demands) provide **sparse, delayed feedback**. This asymmetry explains why behavioral prompting dominates direct output demands.

```
Mechanism: Dense intermediate supervision → stable gradients → reliable optimization
Failure mode: Sparse outcome signal → high variance → reward hacking / hallucination
```
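The asymmetry can be made concrete with a toy scorer. This is an illustrative sketch only; the 50/50 weighting between step credit and final correctness is an arbitrary assumption, not a value from the meta-analysis:

```python
def sparse_score(answer_correct: bool) -> float:
    """Outcome goal: a single bit of feedback, delivered only at the end."""
    return 1.0 if answer_correct else 0.0

def dense_score(steps_done: list[bool], answer_correct: bool) -> float:
    """Process goal: partial credit for each verified intermediate step."""
    step_credit = sum(steps_done) / len(steps_done)
    return 0.5 * step_credit + 0.5 * sparse_score(answer_correct)
```

A wrong final answer scores 0.0 under the sparse rule regardless of how much valid reasoning preceded it, while the dense rule still rewards (and can rank) partially correct trajectories.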

## Goal Typology

| Type | Effect Size | Prompt Analog | Signal Density | Failure Mode |
|------|-------------|---------------|----------------|--------------|
| **Outcome** | d=0.09 | "Give the correct answer" | Sparse | Hallucination, reward hacking |
| **Performance** | d=0.44 | "Achieve high accuracy" | Proxy | Goodhart's Law misalignment |
| **Process** | d=1.36 | "Think step-by-step" | Dense | Over-specification (rare) |

## λ-Instantiations

### Chain-of-Thought (CoT)

```python
# Outcome (weak): "What is 247 × 38?"
# Process (strong):
prompt = """
Solve 247 × 38.
Think step-by-step:
1. Break into partial products
2. Show each multiplication
3. Sum the results
4. State final answer
"""
```

**Mechanism**: Mandates controllable decomposition → self-supervision at each step → error detection before propagation.

**Variants**: Zero-shot CoT ("Let's think step by step"), Auto-CoT (automated exemplar generation), Faithful CoT (enforced structure).

### Decomposition & Sub-Goals

```python
# Tree-of-Thoughts pattern
decompose = """
Generate 3 possible approaches to this problem.
For each approach:
  - State the sub-goals required
  - Identify potential failure points
  - Estimate confidence
Select the approach with highest expected success.
"""

# ReAct pattern
react = """
Thought: [Analyze current state]
Action: [Select tool/operation]
Observation: [Record result]
... repeat until solved ...
"""
```

**Mechanism**: Explicit sub-goal enumeration → local optimization per sub-problem → composition into global solution.

### Auxiliary Tasks

```python
# Direct (weak): "Write a function to sort this list"
# With auxiliary (strong):
aux_prompt = """
Before writing the function:
1. State the input/output types
2. Identify edge cases (empty, single element, duplicates)
3. Choose algorithm and justify complexity
4. Write the function
5. Trace execution on a small example
"""
```

**Mechanism**: Forces deeper processing via intermediate outputs → surfaces implicit assumptions → catches errors early.

### Structured Output Constraints

```python
# Unstructured (weak): "Analyze this data"
# Structured (strong):
structured = """
Analyze the data. Output as:

## Summary Statistics
[numerical summary]

## Key Findings
1. [finding with evidence]
2. [finding with evidence]

## Confidence Assessment
- High confidence: [claims]
- Uncertain: [claims requiring verification]
"""
```

**Mechanism**: Format constraints → consistent reasoning patterns → verifiable outputs.

## Automatic Optimization Paradigm

### Why Process Goals Emerge

```
Search space: discrete prompt tokens
Objective: maximize downstream performance
Challenge: non-differentiable, combinatorial

Solution: Search for PROCESS INSTRUCTIONS
  → Dense intermediate feedback enables gradient estimation
  → Behavioral prompts transfer across tasks
  → Compositional structure reduces search dimensionality
```
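An APE-style candidate search reduces to "generate, score on held-out data, keep the best." A sketch, assuming `score_fn` returns a per-example score in [0, 1]:

```python
def ape_search(candidates: list[str], score_fn, dev_set, k: int = 3) -> list[str]:
    """Rank candidate process instructions by mean held-out score; keep top-k."""
    scored = sorted(
        ((sum(score_fn(c, x) for x in dev_set) / len(dev_set), c)
         for c in candidates),
        reverse=True,
    )
    return [c for _, c in scored[:k]]
```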

### Optimization Methods

| Method | Mechanism | Process Goal Discovery |
|--------|-----------|----------------------|
| **APE** | LLM generates candidates, scores them on a held-out set | Discovers zero-shot CoT variants |
| **OPRO** | Meta-prompt + performance trajectory | Evolves process instructions iteratively |
| **TextGrad** | Gradient through text feedback | Optimizes behavioral descriptions |
| **DEEVO** | Multi-agent debate | Converges on robust process formulations |

### DSPy Integration

```python
import dspy

class ProcessOptimizedModule(dspy.Module):
    """Process goals as learnable signatures."""

    def __init__(self):
        super().__init__()
        # Process-oriented signatures
        self.decompose = dspy.ChainOfThought("problem -> subgoals, approach")
        # dspy.ReAct also takes a `tools` list of callables; supply task-specific ones
        self.execute = dspy.ReAct("subgoals, context -> intermediate_results",
                                  tools=[])
        self.synthesize = dspy.Predict("intermediate_results -> final_answer")

    def forward(self, problem):
        # Explicit process steps
        plan = self.decompose(problem=problem)
        results = self.execute(subgoals=plan.subgoals, context=plan.approach)
        return self.synthesize(intermediate_results=results)

# Optimizer learns to refine process instructions
# (`task_metric` and `examples` are task-specific and defined elsewhere)
optimizer = dspy.MIPROv2(metric=task_metric, num_threads=4)
optimized = optimizer.compile(ProcessOptimizedModule(), trainset=examples)
```

## Implementation Patterns

### Pattern 1: Process Scaffolding

```python
def scaffold_prompt(task: str, domain: str) -> str:
    """Wrap any task in process scaffolding."""
    return f"""
Task: {task}

Before responding:
1. Identify the key requirements
2. Consider potential approaches
3. Select approach and justify
4. Execute step-by-step
5. Verify output meets requirements

Domain context: {domain}
"""
```

### Pattern 2: Progressive Disclosure

```python
def progressive_process(complexity: int) -> str:
    """Scale process detail to task complexity."""

    if complexity < 2:  # Trivial
        return ""  # No scaffolding needed

    elif complexity < 4:  # Simple
        return "Think through this step by step."

    elif complexity < 8:  # Moderate
        return """
Break this into steps:
1. Understand the problem
2. Plan your approach
3. Execute and verify
"""

    else:  # Complex
        return """
Use systematic analysis:

## Problem Decomposition
- Core requirements:
- Constraints:
- Success criteria:

## Approach Selection
- Option A: [describe] - Pros/Cons
- Option B: [describe] - Pros/Cons
- Selected: [justify]

## Execution Trace
[step-by-step with intermediate validation]

## Verification
- Requirements met: [checklist]
- Confidence: [with justification]
"""
```

### Pattern 3: Self-Critique Integration

```python
critique_process = """
After your initial response:

CRITIQUE:
- What assumptions did I make?
- Where might I be wrong?
- What would a skeptic object to?

REVISION:
- Address each critique
- Strengthen weak points
- Explicitly note remaining uncertainty
"""
```
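One way to wire this into a two-pass loop; `llm` stands in for any prompt-to-text callable and is not a specific API:

```python
CRITIQUE_INSTRUCTIONS = (
    "CRITIQUE the draft: list assumptions, possible errors, and objections.\n"
    "Then REVISE: address each critique and note remaining uncertainty."
)

def critique_and_revise(llm, task: str) -> str:
    """Two passes: draft first, then critique-and-revise in a second call."""
    draft = llm(task)
    return llm(f"{task}\n\nDraft:\n{draft}\n\n{CRITIQUE_INSTRUCTIONS}")
```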

## Empirical Calibration

| Benchmark | Outcome Prompt | Process Prompt | Δ Relative |
|-----------|---------------|----------------|------------|
| GSM8K | 45% | 68% | +51% |
| Big-Bench Hard | 38% | 57% | +50% |
| MMLU (hard) | 52% | 61% | +17% |
| Coding (HumanEval) | 64% | 78% | +22% |
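The Δ Relative column is the gain over the outcome baseline, expressed relative to that baseline:

```python
def relative_gain(outcome_pct: float, process_pct: float) -> int:
    """Relative improvement of process over outcome prompting, as a rounded percent."""
    return round((process_pct - outcome_pct) / outcome_pct * 100)
```

For example, `relative_gain(45, 68)` reproduces the +51% reported for GSM8K.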

**Efficiency**: Process prompting often reduces total tokens via early error detection and structured reasoning.

## Risk Mitigation

| Risk | Mechanism | Mitigation |
|------|-----------|------------|
| Over-specification | Rigid process constrains valid alternatives | Use minimal scaffolding for simple tasks |
| Process drift | Steps followed without achieving goal | Include explicit goal-checking at each step |
| Verbosity | Excessive intermediate output | Compress after verification, emit summary |
| False confidence | Structured output mimics rigor | Require explicit uncertainty quantification |
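The process-drift row can be operationalized as a step runner that re-checks the goal after every step instead of only at the end (an illustrative helper, not part of this skill):

```python
def run_with_goal_checks(steps, goal_met) -> list[str]:
    """Run steps in order, stopping as soon as the goal check passes.

    `steps` is a list of zero-argument callables returning step output;
    `goal_met` inspects the outputs produced so far.
    """
    done = []
    for step in steps:
        done.append(step())
        if goal_met(done):
            break
    return done
```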

## Integration with Holonic Architecture

```python
# Process goals as λ-transforms in skill composition
process_transform = {
    "ρ.parse": "Decompose input into components",
    "ρ.branch": "Generate alternative approaches",
    "ρ.reduce": "Select optimal path with justification",
    "ρ.ground": "Execute with intermediate verification",
    "ρ.emit": "Synthesize with confidence bounds"
}

# Validation: process goal adherence
def validate_process(response: str, expected_steps: list[str]) -> bool:
    """Verify process scaffolding was followed."""
    return all(
        step_marker in response
        for step_marker in expected_steps
    )
```

## Quick Reference

```
ALWAYS: Behavioral instructions > outcome demands
SCALE: Process detail ∝ task complexity
VERIFY: Include self-check at each process step
OPTIMIZE: Use APE/OPRO to discover domain-specific process formulations

CoT: "Think step by step" → d=1.36 equivalent
Decomposition: Sub-goals + local optimization
Auxiliary: Intermediate outputs force deep processing
Structure: Format constraints enable verification
```

## Overview

This skill optimizes prompts by favoring process goals (controllable, intermediate behavioral instructions) over sparse outcome demands. It encodes patterns like chain-of-thought, decomposition, auxiliary tasks, and structured outputs to produce dense feedback and more reliable model behavior. The approach is grounded in meta-analytic evidence showing process goals greatly outperform outcome-only guidance.

## How this skill works

The skill rewrites or generates prompts to mandate intermediate steps, sub-goals, and verification checkpoints so each stage yields actionable signals. It instantiates process scaffolds (CoT, ReAct, Tree-of-Thoughts, auxiliary tasks) and exposes learnable process signatures that an optimizer can search over. During optimization, intermediate outputs are scored to guide discrete prompt search and reduce variance compared to outcome-only metrics.

## When to use it

- Designing prompts for complex reasoning or multi-step tasks
- Implementing chain-of-thought, decomposition, or ReAct patterns
- Building automatic prompt optimization or search pipelines
- Steering LLM behavior where correctness and verifiability matter
- Reducing hallucination and reward hacking in downstream outputs

## Best practices

- Scale process detail to task complexity: minimal scaffolding for trivial tasks, rich scaffolding for complex ones
- Require explicit verification or uncertainty estimates at each step to avoid process drift
- Prefer behavioral instructions (what to do) over outcome-only commands (what to produce)
- Use auxiliary tasks (edge cases, I/O types, traces) to surface hidden assumptions
- Compress or summarize intermediate work after verification to control verbosity

## Example use cases

- Math problems: enforce step-by-step partial products, show intermediate calculations, and require a final check
- Code generation: require an input/output spec, edge-case listing, algorithm choice, and a traced example before code
- Data analysis: structured output with summary statistics, evidence-backed findings, and confidence categories
- Prompt optimization: run APE/OPRO/TextGrad-style searches over process instructions to discover robust scaffolds
- Multi-agent debate: converge on strong process formulations via structured agent interactions

## FAQ

**Why prefer process goals over outcome goals?**

Process goals generate dense, intermediate feedback that stabilizes optimization and reduces high-variance failure modes like hallucination and reward hacking.

**Won't detailed scaffolding make responses verbose?**

It can; mitigate this by compressing verified intermediate results into concise summaries and using progressive disclosure to scale scaffolding to task complexity.

**How do I validate that the model followed the process?**

Include explicit step markers or expected-step lists in the prompt, check for their presence and correctness in the response, and score intermediate outputs separately.