saga-orchestration skill

/plugins/ltk-engineering/skills/architecture/saga-orchestration

This skill helps you implement distributed transactions and long-running workflows using saga patterns with clear orchestration or choreography guidance.

npx playbooks add skill eyadsibai/ltk --skill saga-orchestration

---
name: saga-orchestration
description: Use when implementing distributed transactions, coordinating multi-service workflows, handling compensating transactions, or asking about "saga pattern", "distributed transactions", "compensating actions", "workflow orchestration", "choreography vs orchestration"
version: 1.0.0
---

# Saga Orchestration

Patterns for managing distributed transactions and long-running business processes.

## Saga Types

### Choreography

```
┌─────┐  ┌─────┐  ┌─────┐
│Svc A│─►│Svc B│─►│Svc C│
└─────┘  └─────┘  └─────┘
   │        │        │
   ▼        ▼        ▼
 Event    Event    Event
```

- Services react to events
- Decentralized control
- Good for simple flows

### Orchestration

```
     ┌─────────────┐
     │ Orchestrator│
     └──────┬──────┘
            │
      ┌─────┼─────┐
      ▼     ▼     ▼
   ┌────┐┌────┐┌────┐
   │Svc1││Svc2││Svc3│
   └────┘└────┘└────┘
```

- Central coordinator
- Explicit control flow
- Better for complex flows

## Saga States

| State | Description |
|-------|-------------|
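| **Started** | Saga created; no step has run yet |
| **Pending** | Waiting for the current step to finish |
| **Compensating** | A step failed; compensations for completed steps are running |
| **Completed** | All steps succeeded |
| **Failed** | Saga ended in failure after compensations ran |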
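
The states in the table can be modelled as a small state machine. The sketch below is one possible encoding: the `SagaState` enum, the `ALLOWED_TRANSITIONS` map, and the `transition` helper are illustrative names, and the transition rules are an assumption, since the table lists the states but not the moves between them.

```python
from enum import Enum


class SagaState(Enum):
    STARTED = "started"
    PENDING = "pending"
    COMPENSATING = "compensating"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transition rules; adjust them to match your own saga lifecycle.
ALLOWED_TRANSITIONS = {
    SagaState.STARTED: {SagaState.PENDING},
    SagaState.PENDING: {
        SagaState.PENDING,       # next step begins
        SagaState.COMPLETED,     # last step succeeded
        SagaState.COMPENSATING,  # a step failed
    },
    SagaState.COMPENSATING: {SagaState.FAILED},
    SagaState.COMPLETED: set(),  # terminal
    SagaState.FAILED: set(),     # terminal
}


def transition(current: SagaState, target: SagaState) -> SagaState:
    """Return the target state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Rejecting illegal moves early makes bugs such as resuming a completed saga fail loudly instead of corrupting state.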

## Orchestrator Implementation

```python
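from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Optional

@dataclass
class StepResult:
    """Outcome of one step action (shape assumed from how execute() uses it)."""
    success: bool
    data: dict = field(default_factory=dict)
    error: Optional[str] = None

@dataclass
class SagaResult:
    status: str
    data: Optional[dict] = None
    error: Optional[str] = None

@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], Awaitable[StepResult]]
    compensation: Callable[[dict], Awaitable[Any]]
    status: str = "pending"

class SagaOrchestrator:
    def define_steps(self, data: dict) -> list:
        """Return the ordered SagaStep list; subclasses override this."""
        raise NotImplementedError

    async def execute(self, data: dict) -> SagaResult:
        completed_steps = []
        context = {"data": data}

        for step in self.define_steps(data):
            result = await step.action(context)
            if not result.success:
                # Undo everything that already succeeded, then report failure.
                await self.compensate(completed_steps, context)
                return SagaResult(status="failed", error=result.error)

            completed_steps.append(step)
            context.update(result.data)

        return SagaResult(status="completed", data=context)

    async def compensate(self, completed_steps, context):
        # Run compensations in reverse order of completion.
        for step in reversed(completed_steps):
            await step.compensation(context)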
```

## Order Fulfillment Saga Example

```python
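class OrderFulfillmentSaga(SagaOrchestrator):
    # reserve_inventory, release_inventory, and the other handlers are async
    # methods of this class (not shown); each receives the saga context dict.
    def define_steps(self, data):
        # Order matters: compensations run in reverse of this list.
        return [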
            SagaStep("reserve_inventory",
                     action=self.reserve_inventory,
                     compensation=self.release_inventory),
            SagaStep("process_payment",
                     action=self.process_payment,
                     compensation=self.refund_payment),
            SagaStep("create_shipment",
                     action=self.create_shipment,
                     compensation=self.cancel_shipment),
        ]
```
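
To see the compensation order in action, here is a minimal runnable sketch of the same three-step order flow. It is self-contained, so it uses a trimmed-down runner (`run_saga`) rather than the classes above. `StepResult`, `Step`, `build_order_saga`, and `demo` are hypothetical names, and the handlers only append their names to a log instead of calling real services.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional


@dataclass
class StepResult:  # hypothetical result shape
    success: bool
    data: dict = field(default_factory=dict)
    error: Optional[str] = None


@dataclass
class Step:
    name: str
    action: Callable[[dict], Awaitable[StepResult]]
    compensation: Callable[[dict], Awaitable[None]]


async def run_saga(steps: list, context: dict) -> str:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    for step in steps:
        result = await step.action(context)
        if not result.success:
            for done in reversed(completed):
                await done.compensation(context)
            return "failed"
        completed.append(step)
        context.update(result.data)
    return "completed"


def build_order_saga(log: list, payment_ok: bool) -> list:
    """Build the three order steps; each handler only records its name in `log`."""

    def ok(name, data=None):
        async def action(ctx):
            log.append(name)
            return StepResult(True, data or {})
        return action

    def undo(name):
        async def compensation(ctx):
            log.append(name)
        return compensation

    async def process_payment(ctx):
        log.append("process_payment")
        if payment_ok:
            return StepResult(True, {"payment_id": "pay-1"})
        return StepResult(False, error="card declined")

    return [
        Step("reserve_inventory",
             ok("reserve_inventory", {"reservation_id": "res-1"}),
             undo("release_inventory")),
        Step("process_payment", process_payment, undo("refund_payment")),
        Step("create_shipment", ok("create_shipment"), undo("cancel_shipment")),
    ]


def demo(payment_ok: bool):
    """Return (final status, ordered list of handler names that ran)."""
    log = []
    steps = build_order_saga(log, payment_ok)
    status = asyncio.run(run_saga(steps, {"order_id": "o-1"}))
    return status, log
```

With `payment_ok=False`, only `release_inventory` runs as a compensation. The failed payment step itself is never compensated, because it never completed.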

## Choreography Example

```python
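class OrderChoreographySaga:
    """Event handlers for the order flow; there is no central coordinator."""

    def __init__(self, event_bus):
        self.event_bus = event_bus
        event_bus.subscribe("OrderCreated", self._on_order_created)
        event_bus.subscribe("InventoryReserved", self._on_inventory_reserved)
        event_bus.subscribe("PaymentFailed", self._on_payment_failed)

    async def _on_order_created(self, event):
        await self.event_bus.publish("ReserveInventory", {
            "order_id": event["order_id"],
            "items": event["items"]
        })

    async def _on_inventory_reserved(self, event):
        # Next forward step; the "ProcessPayment" event name and its
        # payload fields are assumed for illustration.
        await self.event_bus.publish("ProcessPayment", {
            "order_id": event["order_id"],
            "reservation_id": event["reservation_id"]
        })

    async def _on_payment_failed(self, event):
        # Compensation: undo the inventory reservation
        await self.event_bus.publish("ReleaseInventory", {
            "reservation_id": event["reservation_id"]
        })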
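
The example above expects an `event_bus` object with `subscribe` and an awaitable `publish`. The skill does not say which bus is used, so the following is a minimal in-memory stand-in for local experiments. `InMemoryEventBus` and `demo` are hypothetical names, and a real deployment would more likely use a message broker.

```python
import asyncio
from collections import defaultdict


class InMemoryEventBus:
    """Hypothetical bus: subscribe(name, handler) and async publish(name, payload)."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self.published = []  # (event_name, payload) pairs, in publish order

    def subscribe(self, event_name, handler):
        self._handlers[event_name].append(handler)

    async def publish(self, event_name, payload):
        self.published.append((event_name, payload))
        # Deliver to each subscriber in turn; handlers may publish further events.
        for handler in list(self._handlers[event_name]):
            await handler(payload)


async def demo():
    bus = InMemoryEventBus()

    async def on_payment_failed(event):
        # Compensation triggered purely by an event, with no coordinator.
        await bus.publish("ReleaseInventory",
                          {"reservation_id": event["reservation_id"]})

    bus.subscribe("PaymentFailed", on_payment_failed)
    await bus.publish("PaymentFailed",
                      {"order_id": "o-1", "reservation_id": "res-1"})
    return [name for name, _ in bus.published]
```

This bus delivers events synchronously inside `publish`, which keeps the sketch deterministic. A real broker delivers asynchronously and may redeliver, which is one reason handlers should be idempotent.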

## Best Practices

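1. **Make steps idempotent** - A retried step must not repeat its side effect
2. **Design compensations carefully** - They must eventually succeed, so make them retryable and idempotent
3. **Use correlation IDs** - For tracing one saga across services
4. **Implement timeouts** - Don't wait forever for a step to respond
5. **Log everything** - Record state transitions and step results for debugging failures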
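
Two of these practices, idempotency and timeouts, can be sketched in a few lines. The code below is an illustration under assumptions, not part of the skill: `IdempotentStep` and `run_with_timeout` are hypothetical helpers, and the in-memory result cache stands in for whatever durable store a real service would use.

```python
import asyncio


class IdempotentStep:
    """Caches the result per idempotency key so a retry does not repeat the side effect."""

    def __init__(self, action):
        self._action = action
        self._results = {}  # in-memory only; a real service would persist this
        self.calls = 0      # how many times the underlying action actually ran

    async def run(self, key: str, payload: dict):
        if key in self._results:
            return self._results[key]
        self.calls += 1
        result = await self._action(payload)
        self._results[key] = result
        return result


async def run_with_timeout(coro, seconds: float):
    """Return (True, result), or (False, None) if the step exceeds the timeout."""
    try:
        return True, await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        return False, None


async def demo():
    async def charge(payload):
        return {"charged": payload["amount"]}

    step = IdempotentStep(charge)
    first = await step.run("order-1:charge", {"amount": 10})
    second = await step.run("order-1:charge", {"amount": 10})  # retry, no second charge

    async def slow():
        await asyncio.sleep(1)

    finished, _ = await run_with_timeout(slow(), 0.01)
    return first == second, step.calls, finished
```

A timed-out step is ambiguous: the remote service may still have done the work. This is why timeouts and idempotent retries are usually paired.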

## When to Use

**Orchestration:**

- Complex multi-step workflows
- Need visibility into saga state
- Central error handling

**Choreography:**

- Simple event flows
- Loose coupling required
- Independent service teams

Overview

This skill explains and implements the saga orchestration pattern for distributed transactions and long-running workflows. It covers both orchestration and choreography approaches, step definition, compensation handling, and state transitions. It is focused on practical implementation details in Python for coordinating multi-service flows and handling failures.

How this skill works

The skill inspects saga design choices and provides an orchestrator pattern that executes ordered steps, collects context, and runs compensating actions in reverse on failure. It shows how to define SagaStep objects with action and compensation callables, manage saga state (started, pending, compensating, completed, failed), and integrate with event buses for choreography. The examples include an OrderFulfillmentSaga and event-driven choreography handlers to demonstrate both approaches.

When to use it

- Coordinating multi-step business processes that span services
- Implementing distributed transactions without two-phase commit
- Handling compensating transactions when a later step fails
- Choosing between central orchestration and decentralized choreography
- Adding visibility and centralized error handling for complex workflows

Best practices

- Make each step idempotent so retries are safe
- Design compensating actions to always restore invariants
- Use correlation IDs to trace a saga across services
- Implement timeouts and retries to avoid indefinite waits
- Log state transitions and step results for observability
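
As a rough illustration of the retry, correlation ID, and logging points, the sketch below retries a flaky step and tags every log line with the saga's correlation ID. `retry_step` and its parameters are hypothetical, and the exponential backoff is one common choice rather than something the skill prescribes.

```python
import asyncio
import logging
import uuid

logger = logging.getLogger("saga")


async def retry_step(action, correlation_id: str, attempts: int = 3,
                     base_delay: float = 0.0):
    """Call `action` up to `attempts` times, logging each attempt with the correlation ID."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            result = await action()
            logger.info("saga=%s attempt=%d status=ok", correlation_id, attempt)
            return result
        except Exception as exc:  # broad on purpose in this sketch
            last_error = exc
            logger.warning("saga=%s attempt=%d error=%s",
                           correlation_id, attempt, exc)
            if attempt < attempts:
                # Exponential backoff: base_delay, 2x, 4x, ...
                await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise last_error


async def demo():
    calls = {"n": 0}

    async def flaky():
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient")
        return "shipped"

    correlation_id = str(uuid.uuid4())  # one ID per saga instance
    return await retry_step(flaky, correlation_id), calls["n"]
```

Retrying like this is only safe when the step is idempotent, since a failed attempt may have partially succeeded on the remote side.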

Example use cases

- Order fulfillment: reserve inventory, process payment, create shipment with rollbacks on failure
- Travel booking: book flight, hotel, car with compensations that cancel bookings on error
- Subscription signup: provision services, charge payment, notify user with cleanup on failure
- Microservice orchestrations that need centralized visibility and error handling
- Event-driven flows for simple use cases using choreography and an event bus

FAQ

When should I prefer orchestration over choreography?

Choose orchestration for complex workflows requiring explicit control, central visibility, and easier centralized error handling; choose choreography for simpler, loosely coupled flows managed by events.

How do compensations differ from rollbacks?

Compensations are forward-facing actions that semantically undo prior work (for example, issuing a refund) and must be designed to be idempotent and safe; they are not the same as database rollbacks.
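
The difference can be made concrete with a toy ledger. In this sketch (the `PaymentLedger` class and its fields are illustrative assumptions), a refund does not erase the original charge the way a database rollback would. It appends a new offsetting entry, and calling it a second time has no effect.

```python
class PaymentLedger:
    """Toy in-memory ledger; names and fields are illustrative."""

    def __init__(self):
        self.entries = []        # append-only list of (kind, payment_id, amount)
        self._refunded = set()   # payment IDs that were already compensated

    def charge(self, payment_id: str, amount: int) -> None:
        self.entries.append(("charge", payment_id, amount))

    def refund(self, payment_id: str, amount: int) -> bool:
        """Compensate a charge with a new entry. Returns False if already refunded."""
        if payment_id in self._refunded:
            return False  # idempotent: a repeated compensation is a no-op
        self._refunded.add(payment_id)
        self.entries.append(("refund", payment_id, -amount))
        return True

    def balance(self) -> int:
        return sum(amount for _, _, amount in self.entries)
```

After a charge and a refund the balance is back to zero, but both entries remain in the history, which is the audit trail a rollback would not leave.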