
resilience-engineering skill

/.claude/skills/resilience-engineering

This skill helps you implement resilient Shopify integrations by managing rate limits, retry strategies, queues, and circuit breakers.

npx playbooks add skill toilahuongg/shopify-agents-kit --skill resilience-engineering


SKILL.md
---
name: resilience-engineering
description: Strategies for handling Shopify API Rate Limits (429), retry policies, and circuit breakers. Essential for high-traffic apps.
---

# Resilience Engineering for Shopify Apps

Shopify rate-limits its APIs with a "leaky bucket" algorithm: every request adds to the bucket, which drains at a fixed rate. If you pour in requests faster than the bucket drains, it overflows and you get `429 Too Many Requests`. Your app must handle this gracefully.

## 1. Handling Rate Limits (429)

### The "Retry-After" Header
When Shopify returns a 429, the response includes a `Retry-After` header telling you how many seconds to wait before retrying.

**Implementation (a custom delay; see section 2 for `bottleneck`)**:
```typescript
async function fetchWithRetry(
  url: string,
  options: RequestInit = {},
  retries = 3
): Promise<Response> {
  try {
    const res = await fetch(url, options);
    if (res.status === 429 && retries > 0) {
      // Shopify tells you how many seconds to wait before the bucket has drained
      const wait = parseFloat(res.headers.get("Retry-After") ?? "1.0");
      await new Promise((r) => setTimeout(r, wait * 1000));
      return fetchWithRetry(url, options, retries - 1);
    }
    return res;
  } catch (err) {
    // Network-level failure: retry while the budget allows, otherwise surface the error
    if (retries > 0) {
      return fetchWithRetry(url, options, retries - 1);
    }
    throw err;
  }
}
```
*Note: The official `@shopify/shopify-api` client handles retries automatically if configured.*
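The built-in retry is opt-in per request. Below is a minimal sketch, assuming a recent `@shopify/shopify-api` release whose request methods accept a `tries` option and a `session` object from your OAuth flow (both are assumptions; check the docs for your installed version):

```typescript
import "@shopify/shopify-api/adapters/node";
import { shopifyApi, LATEST_API_VERSION, Session } from "@shopify/shopify-api";

declare const session: Session; // assumption: obtained from your OAuth / token-exchange flow

const shopify = shopifyApi({
  apiKey: process.env.SHOPIFY_API_KEY!,
  apiSecretKey: process.env.SHOPIFY_API_SECRET!,
  scopes: ["read_products"],
  hostName: "my-app.example.com", // placeholder host
  apiVersion: LATEST_API_VERSION,
  isEmbeddedApp: true,
});

// `tries: 3` asks the client to attempt the request up to 3 times when Shopify
// throttles it (429), sleeping for the Retry-After interval between attempts.
const client = new shopify.clients.Rest({ session });
const products = await client.get({ path: "products", tries: 3 });
```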

## 2. Queues & Throttling
For bulk operations (e.g., syncing 10,000 products), you cannot simply loop and `await` each request; you will overflow the bucket within seconds. Throttle the calls instead.

### Using `bottleneck`
```bash
npm install bottleneck
```

```typescript
import Bottleneck from "bottleneck";

const limiter = new Bottleneck({
  minTime: 500, // wait 500ms between requests (2 req/sec)
  maxConcurrent: 5,
});

const products = await limiter.schedule(() => shopify.rest.Product.list({ ... }));
```
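To fan a bulk sync out through the limiter, schedule every call; Bottleneck enforces the spacing and concurrency cap. In this sketch the shop domain, API version, and access-token variable are placeholders, and `fetchWithRetry` is the helper from section 1:

```typescript
// Hypothetical list of product IDs pulled from your own database.
const productIds = [111111111, 222222222, 333333333];

const responses = await Promise.all(
  productIds.map((id) =>
    limiter.schedule(() =>
      fetchWithRetry(
        `https://my-shop.myshopify.com/admin/api/2024-10/products/${id}.json`,
        { headers: { "X-Shopify-Access-Token": process.env.SHOPIFY_ACCESS_TOKEN! } }
      )
    )
  )
);
```

Every call still goes through the 429 handling from section 1, but the limiter keeps you from triggering it in the first place.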

### Background Jobs (BullMQ)
Move heavy lifting to a background worker so request handlers stay fast and throughput is controlled in one place. (A dedicated `redis-bullmq` skill may be added later; the concept is sketched below.)
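
A minimal BullMQ sketch (the queue name, local Redis connection, and `syncProduct` helper are assumptions for illustration):

```typescript
import { Queue, Worker } from "bullmq";

declare function syncProduct(productId: number): Promise<void>; // hypothetical: calls the Admin API

const connection = { host: "localhost", port: 6379 }; // assumed local Redis

// Producer: enqueue one job per product instead of calling the API in a tight loop.
const syncQueue = new Queue("product-sync", { connection });
await syncQueue.add("sync", { productId: 123456789 });

// Worker: processes jobs in the background; the limiter caps throughput so the
// leaky bucket never overflows (here: at most 2 jobs per second).
const worker = new Worker(
  "product-sync",
  async (job) => {
    await syncProduct(job.data.productId);
  },
  { connection, limiter: { max: 2, duration: 1000 } }
);
```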

## 3. Circuit Breaker
If an external service (e.g., your own backend API or a shipping carrier) goes down, stop calling it to prevent cascading failures.

### Using `cockatiel`
```bash
npm install cockatiel
```

```typescript
import {
  circuitBreaker,
  ConsecutiveBreaker,
  ExponentialBackoff,
  handleAll,
  retry,
} from 'cockatiel';

// Retry policy: up to 3 attempts with exponential backoff between them
const retryPolicy = retry(handleAll, { maxAttempts: 3, backoff: new ExponentialBackoff() });

// Circuit breaker: opens after 5 consecutive failures, tries a half-open probe after 10s
const breakerPolicy = circuitBreaker(handleAll, {
  halfOpenAfter: 10 * 1000,
  breaker: new ConsecutiveBreaker(5),
});

// Execute: the retry policy wraps the breaker, so calls rejected by an open
// circuit fail fast and are retried only within the small attempt budget
const result = await retryPolicy.execute(() =>
  breakerPolicy.execute(() => fetchMyService())
);
```

## 4. Webhook Idempotency
Shopify guarantees "at least once" delivery. You might receive the same `orders/create` webhook twice.
**Fix**: Store `X-Shopify-Webhook-Id` in Redis or your DB with a short TTL (e.g., 24h). If the ID is already present, the delivery is a duplicate: acknowledge it and skip processing.
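
A minimal sketch with `node-redis` (the local Redis instance and `webhook:` key prefix are assumptions; any store with an atomic set-if-absent plus TTL works):

```typescript
import { createClient } from "redis";

const redis = createClient(); // assumes Redis on localhost:6379
await redis.connect();

// Returns true the first time a webhook ID is seen, false on replays.
// SET with NX only writes if the key is absent; EX gives it a 24h TTL.
async function isFirstDelivery(webhookId: string): Promise<boolean> {
  const result = await redis.set(`webhook:${webhookId}`, "1", {
    NX: true,
    EX: 60 * 60 * 24,
  });
  return result === "OK";
}

// In your webhook handler (framework-agnostic):
//   const id = req.headers["x-shopify-webhook-id"];
//   if (!(await isFirstDelivery(id))) return res.status(200).end(); // duplicate: ack and skip
```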

Overview

This skill teaches practical resilience engineering for Shopify apps to handle API rate limits (429), retries, throttling, and circuit breakers. It focuses on concrete patterns: honoring Retry-After headers, queuing and throttling bulk work, using circuit breakers for unreliable services, and ensuring webhook idempotency. The guidance is geared for high-traffic apps and large sync jobs.

How this skill works

It inspects common failure modes from Shopify and related services and prescribes code-level strategies. Key behaviors include reading the Retry-After header on 429 responses, applying exponential backoff and retry policies, scheduling requests through a limiter or background queue, and wrapping unstable dependencies with a circuit breaker. It also enforces webhook idempotency by tracking unique webhook delivery IDs.

When to use it

  • When your app receives 429 Too Many Requests from Shopify or third-party APIs.
  • During large data syncs or bulk operations (thousands of products, orders, etc.).
  • When integrating with unreliable external services where failures can cascade.
  • When processing Shopify webhooks to avoid duplicate handling.
  • When you need predictable throughput and to avoid service throttling.

Best practices

  • Always respect the Retry-After header and implement a capped backoff before retrying failed requests.
  • Move heavy or batched work to background workers and use a rate limiter (e.g., Bottleneck) instead of tight loops.
  • Combine retry policies with circuit breakers to stop hammering failing services and allow recovery periods.
  • Persist webhook IDs with a TTL (Redis or DB) to achieve idempotency for at-least-once deliveries.
  • Monitor error rates, throttling responses, and circuit breaker state to tune timeouts, thresholds, and concurrency.

Example use cases

  • Syncing 10,000 products: schedule requests through a Bottleneck limiter and run jobs in a worker queue.
  • Handling intermittent carrier API failures: wrap calls with a circuit breaker and retry with exponential backoff.
  • Responding to 429 responses: parse Retry-After and requeue the request after the specified delay.
  • Processing orders/create webhooks: store X-Shopify-Webhook-Id in Redis for 24 hours to ignore duplicates.
  • Scaling billing or inventory updates: limit concurrent requests and stagger bursts to stay within Shopify’s leaky-bucket model.

FAQ

How many retries should I allow after a 429?

Use a small number (2–4) and always wait the Retry-After interval; unlimited retries can worsen congestion.

Should I use fixed delays or exponential backoff?

Prefer exponential backoff with jitter for network errors, but honor exact Retry-After when Shopify provides it.
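
A minimal sketch of the "full jitter" variant; the base and cap values are illustrative:

```typescript
// Wait a random duration between 0 and min(cap, base * 2^attempt).
// Use the exact Retry-After value instead whenever Shopify provides one.
function backoffWithJitter(attempt: number, baseMs = 500, capMs = 30_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling;
}

// attempt 0 → up to 0.5s, attempt 1 → up to 1s, attempt 2 → up to 2s, capped at 30s
```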

When should a circuit breaker open?

Open after a short burst of consecutive failures (e.g., 3–5) and keep it open for a measured cooldown (seconds to minutes) before half-open trials.