
This skill helps you implement distributed transactions and long-running workflows using saga patterns with clear orchestration or choreography guidance.

npx playbooks add skill eyadsibai/ltk --skill saga-orchestration


---
name: saga-orchestration
description: Use when implementing distributed transactions, coordinating multi-service workflows, handling compensating transactions, or asking about "saga pattern", "distributed transactions", "compensating actions", "workflow orchestration", "choreography vs orchestration"
version: 1.0.0
---

# Saga Orchestration

Patterns for managing distributed transactions and long-running business processes.

## Saga Types

### Choreography

```
┌─────┐  ┌─────┐  ┌─────┐
│Svc A│─►│Svc B│─►│Svc C│
└─────┘  └─────┘  └─────┘
   │        │        │
   ▼        ▼        ▼
 Event    Event    Event
```

- Services react to events
- Decentralized control
- Good for simple flows

### Orchestration

```
     ┌─────────────┐
     │ Orchestrator│
     └──────┬──────┘
            │
      ┌─────┼─────┐
      ▼     ▼     ▼
   ┌────┐┌────┐┌────┐
   │Svc1││Svc2││Svc3│
   └────┘└────┘└────┘
```

- Central coordinator
- Explicit control flow
- Better for complex flows

## Saga States

| State | Description |
|-------|-------------|
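| **Started** | Saga initiated; no step has run yet |
| **Pending** | Waiting for the current step to finish |
| **Compensating** | A step failed; completed steps are being undone in reverse order |
| **Completed** | All steps succeeded |
| **Failed** | A step failed and compensation has finished |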
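
The table names the states but not the legal moves between them. The sketch below is one possible way to enforce them; the `ALLOWED_TRANSITIONS` map and the `transition` helper are assumptions made for illustration, not part of the skill.

```python
from enum import Enum


class SagaState(Enum):
    STARTED = "started"
    PENDING = "pending"
    COMPENSATING = "compensating"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transition map: the states come from the table, the edges are illustrative.
ALLOWED_TRANSITIONS = {
    SagaState.STARTED: {SagaState.PENDING},
    SagaState.PENDING: {SagaState.PENDING, SagaState.COMPLETED, SagaState.COMPENSATING},
    SagaState.COMPENSATING: {SagaState.FAILED},
    SagaState.COMPLETED: set(),  # terminal
    SagaState.FAILED: set(),     # terminal
}


def transition(current: SagaState, target: SagaState) -> SagaState:
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Rejecting illegal moves early (for example, `Completed` back to `Pending`) tends to surface duplicate or out-of-order messages before they corrupt saga state.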

## Orchestrator Implementation

```python
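from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Optional

@dataclass
class StepResult:  # assumed shape of what each action returns
    success: bool
    data: dict = field(default_factory=dict)
    error: Optional[str] = None

@dataclass
class SagaResult:  # assumed shape of the saga outcome
    status: str
    data: Optional[dict] = None
    error: Optional[str] = None

@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], Awaitable[StepResult]]
    compensation: Callable[[dict], Awaitable[Any]]
    status: str = "pending"

class SagaOrchestrator:
    def define_steps(self, data: dict) -> list:
        raise NotImplementedError  # subclasses return the ordered SagaStep list

    async def execute(self, data: dict) -> SagaResult:
        completed_steps = []
        context = {"data": data}

        for step in self.define_steps(data):
            result = await step.action(context)
            if not result.success:
                # Undo everything that already succeeded, then report failure
                await self.compensate(completed_steps, context)
                return SagaResult(status="failed", error=result.error)

            completed_steps.append(step)
            context.update(result.data)

        return SagaResult(status="completed", data=context)

    async def compensate(self, completed_steps, context):
        # Run compensations in reverse order of completion
        for step in reversed(completed_steps):
            await step.compensation(context)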
```

## Order Fulfillment Saga Example

```python
class OrderFulfillmentSaga(SagaOrchestrator):
    def define_steps(self, data):
        return [
            SagaStep("reserve_inventory",
                     action=self.reserve_inventory,
                     compensation=self.release_inventory),
            SagaStep("process_payment",
                     action=self.process_payment,
                     compensation=self.refund_payment),
            SagaStep("create_shipment",
                     action=self.create_shipment,
                     compensation=self.cancel_shipment),
        ]
```
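
To see the compensation order concretely, here is a minimal, self-contained sketch of the same three-step flow in which the last step fails. The `make_step` and `run_saga` helpers are hypothetical stand-ins written for this illustration; they do not call real inventory, payment, or shipping services.

```python
import asyncio

calls = []  # records the order in which actions and compensations run


def make_step(name, succeed=True):
    """Build a hypothetical (name, action, compensation) triple that logs to `calls`."""
    async def action(context):
        calls.append(f"do:{name}")
        return succeed

    async def compensation(context):
        calls.append(f"undo:{name}")

    return name, action, compensation


async def run_saga(steps, context):
    completed = []
    for name, action, compensation in steps:
        if not await action(context):
            # In this sketch a failed action changes nothing, so only earlier steps are undone
            for _, _, undo in reversed(completed):
                await undo(context)
            return "failed"
        completed.append((name, action, compensation))
    return "completed"


steps = [
    make_step("reserve_inventory"),
    make_step("process_payment"),
    make_step("create_shipment", succeed=False),
]
status = asyncio.run(run_saga(steps, {"order_id": "o-1"}))
print(status, calls)
```

The payment is undone before the inventory reservation, mirroring the reverse-order rule: the most recent successful step is compensated first.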

## Choreography Example

```python
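class OrderChoreographySaga:
    def __init__(self, event_bus):
        # event_bus is assumed to expose subscribe(name, handler) and async publish(name, payload)
        self.event_bus = event_bus
        event_bus.subscribe("OrderCreated", self._on_order_created)
        event_bus.subscribe("InventoryReserved", self._on_inventory_reserved)
        event_bus.subscribe("PaymentFailed", self._on_payment_failed)

    async def _on_order_created(self, event):
        await self.event_bus.publish("ReserveInventory", {
            "order_id": event["order_id"],
            "items": event["items"]
        })

    async def _on_inventory_reserved(self, event):
        # Assumed next step (event name and fields are illustrative): request payment
        await self.event_bus.publish("ProcessPayment", {
            "order_id": event["order_id"],
            "reservation_id": event["reservation_id"]
        })

    async def _on_payment_failed(self, event):
        # Compensation: release the inventory reserved earlier
        await self.event_bus.publish("ReleaseInventory", {
            "reservation_id": event["reservation_id"]
        })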
```
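
The choreography example depends on an event bus with `subscribe` and `publish`. For local experiments, a minimal in-memory stand-in could look like the sketch below. `InMemoryEventBus` is a hypothetical test double, not a real library; a production system would use a durable broker with delivery guarantees.

```python
import asyncio
from collections import defaultdict


class InMemoryEventBus:
    """Hypothetical test double: delivers each event to its handlers in-process."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self.published = []  # (event_type, payload) pairs, in publish order

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    async def publish(self, event_type, payload):
        self.published.append((event_type, payload))
        for handler in list(self._handlers[event_type]):
            await handler(payload)


async def demo():
    bus = InMemoryEventBus()

    async def on_payment_failed(event):
        # Compensation: ask the inventory service to release the reservation
        await bus.publish("ReleaseInventory", {"reservation_id": event["reservation_id"]})

    bus.subscribe("PaymentFailed", on_payment_failed)
    await bus.publish("PaymentFailed", {"order_id": "o-1", "reservation_id": "r-9"})
    return [name for name, _ in bus.published]


events = asyncio.run(demo())
print(events)
```

Recording every published event makes it straightforward to assert in tests that a failure event triggers the expected compensating event.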

## Best Practices

1. **Make steps idempotent** - Safe to retry
2. **Design compensations carefully** - They must eventually succeed, so make them idempotent and retryable
3. **Use correlation IDs** - For tracing across services
4. **Implement timeouts** - Don't wait forever
5. **Log everything** - For debugging failures
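
As an illustration of the first practice, a step can be made idempotent by keying each request and returning the stored result on a retry. The `PaymentService` class and the key format below are assumptions made for this sketch; a real service would keep the keys in durable storage.

```python
class PaymentService:
    """Hypothetical in-memory payment service, used only for illustration."""

    def __init__(self):
        self._processed = {}  # idempotency key -> stored result
        self.charges = 0      # counts real charges, to show that retries are safe

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A repeated key returns the original result instead of charging again
        if idempotency_key in self._processed:
            return self._processed[idempotency_key]
        self.charges += 1
        result = {"payment_id": f"pay-{self.charges}", "amount_cents": amount_cents}
        self._processed[idempotency_key] = result
        return result


service = PaymentService()
key = "saga-42:process_payment"  # assumed format: saga ID plus step name
first = service.charge(key, 1999)
retry = service.charge(key, 1999)
print(first == retry, service.charges)
```

Deriving the key from the saga's correlation ID and the step name means a retried step reuses the same key without extra bookkeeping.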

## When to Use

**Orchestration:**

- Complex multi-step workflows
- Need visibility into saga state
- Central error handling

**Choreography:**

- Simple event flows
- Loose coupling required
- Independent service teams

Overview

This skill explains and implements the saga pattern for distributed transactions and long-running workflows. It covers both the orchestration and choreography approaches, step definition, compensation handling, and state transitions. It focuses on practical implementation details in Python for coordinating multi-service flows and handling failures.

How this skill works

The skill walks through saga design choices and provides an orchestrator pattern that executes ordered steps, accumulates context, and runs compensating actions in reverse order on failure. It shows how to define SagaStep objects with action and compensation callables, manage saga state (started, pending, compensating, completed, failed), and integrate with an event bus for choreography. The examples include an OrderFulfillmentSaga and event-driven choreography handlers to demonstrate both approaches.

When to use it

  • Coordinating multi-step business processes that span services
  • Implementing distributed transactions without two-phase commit
  • Handling compensating transactions when a later step fails
  • Choosing between central orchestration and decentralized choreography
  • Adding visibility and centralized error handling for complex workflows

Best practices

  • Make each step idempotent so retries are safe
  • Design compensating actions to always restore invariants
  • Use correlation IDs to trace a saga across services
  • Implement timeouts and retries to avoid indefinite waits
  • Log state transitions and step results for observability
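
The timeout and retry advice can be sketched with `asyncio.wait_for`, which cancels the awaited call and raises `asyncio.TimeoutError` when the deadline passes. The `call_with_timeout` helper, the backoff values, and the `flaky_reserve` stub are assumptions for illustration; retrying like this is only safe when the step is idempotent.

```python
import asyncio


async def call_with_timeout(action, *, timeout_s: float, max_attempts: int):
    """Run `action()` with a per-attempt timeout, retrying on timeout only."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await asyncio.wait_for(action(), timeout=timeout_s)
        except asyncio.TimeoutError:
            if attempt == max_attempts:
                raise  # give up; the caller should start compensation
            await asyncio.sleep(0.01 * attempt)  # small linear backoff


attempts = 0


async def flaky_reserve():
    """Hypothetical step that hangs on its first two calls, then succeeds."""
    global attempts
    attempts += 1
    if attempts < 3:
        await asyncio.sleep(1)  # simulate a hung downstream call
    return {"reservation_id": "r-1"}


result = asyncio.run(call_with_timeout(flaky_reserve, timeout_s=0.05, max_attempts=3))
print(result, attempts)
```

When the final attempt also times out, the exception propagates so the saga can move to compensation instead of waiting indefinitely.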

Example use cases

  • Order fulfillment: reserve inventory, process payment, create shipment with rollbacks on failure
  • Travel booking: book flight, hotel, car with compensations that cancel bookings on error
  • Subscription signup: provision services, charge payment, notify user with cleanup on failure
  • Microservice orchestrations that need centralized visibility and error handling
  • Event-driven flows for simple use cases using choreography and an event bus

FAQ

When should I prefer orchestration over choreography?

Choose orchestration for complex workflows requiring explicit control, central visibility, and easier centralized error handling; choose choreography for simpler, loosely coupled flows managed by events.

How do compensations differ from rollbacks?

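Compensations are new forward actions that semantically undo prior work (for example, issuing a refund rather than deleting the charge), and they must be idempotent and safe to retry. A database rollback, by contrast, discards uncommitted changes inside a single transaction; saga steps have already committed locally, so they cannot be rolled back that way.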
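
A minimal sketch of that difference, assuming a hypothetical append-only `Ledger`: the refund is recorded as a new offsetting entry instead of erasing the charge, and a repeated refund is a no-op.

```python
class Ledger:
    """Hypothetical append-only ledger: entries are never deleted, only offset."""

    def __init__(self):
        self.entries = []  # (kind, payment_id, amount_cents)

    def charge(self, payment_id: str, amount_cents: int) -> None:
        self.entries.append(("charge", payment_id, amount_cents))

    def refund(self, payment_id: str) -> None:
        # Idempotent: a second refund for the same payment does nothing
        if any(kind == "refund" and pid == payment_id for kind, pid, _ in self.entries):
            return
        amount = next(
            a for kind, pid, a in self.entries if kind == "charge" and pid == payment_id
        )
        self.entries.append(("refund", payment_id, -amount))

    def balance(self) -> int:
        return sum(amount for _, _, amount in self.entries)


ledger = Ledger()
ledger.charge("pay-1", 1999)
ledger.refund("pay-1")
ledger.refund("pay-1")  # retried compensation, safely ignored
print(ledger.entries, ledger.balance())
```

The charge stays in the history, which preserves an audit trail that a rollback would not leave behind.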