---
name: jepsen-testing
description: Jepsen-style correctness testing for distributed systems under faults (partitions, crashes, clock skew) using concurrent operation histories and formal checkers (linearizability/serializability and Elle-style anomalies). Use when designing, implementing, or running Jepsen tests, or interpreting histories/violations.
---
# Jepsen Testing
## Intake
- Identify the system under test and the exact client surface (Redis, S3, Kafka, HTTP, gRPC).
- Define what "acknowledged" means for each operation (what the client treats as committed).
- Write the claimed consistency guarantees as a checkable property (linearizable register, RYW, monotonic reads, serializable txns).
- Specify the failure model to test (crash-stop, partitions, clock skew, disk stalls, restarts).
- Decide whether the test must be multi-surface (write via Redis, read via S3) to validate cross-frontend coherence.
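A claimed guarantee is only useful once it is checkable. As a minimal sketch, the "monotonic reads" claim above can be encoded as a predicate over a recorded history. Jepsen itself is written in Clojure; this uses Python for illustration, and the field names (`client`, `op`, `status`, `version`) are assumptions, not a Jepsen API:

```python
def monotonic_reads_ok(history):
    """Check that no client's successive reads ever observe an older version."""
    last_seen = {}  # client -> highest version that client has observed
    for entry in history:
        if entry["op"] == "read" and entry["status"] == "ok":
            prev = last_seen.get(entry["client"], -1)
            if entry["version"] < prev:
                return False  # a read went backwards in time: violation
            last_seen[entry["client"]] = entry["version"]
    return True

history = [
    {"client": "c1", "op": "read", "status": "ok", "version": 1},
    {"client": "c1", "op": "read", "status": "ok", "version": 2},
    {"client": "c2", "op": "read", "status": "ok", "version": 2},
    {"client": "c2", "op": "read", "status": "ok", "version": 1},  # goes backwards
]
print(monotonic_reads_ok(history))  # False: c2's second read saw an older version
```

The same shape works for RYW or monotonic writes: pick the per-client ordering the model promises and assert it over the history.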
## Workload Design
- Prefer the smallest workload that can falsify the claim.
- Use a mix of reads and writes that creates ambiguous interleavings.
- Add a "witness" invariant that is easy to explain:
- Lost acknowledged write.
- Read sees a value that cannot be explained by any sequential execution respecting real-time order.
- List-append: element lost/duplicated or observed order implies a cycle.
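The "lost acknowledged write" witness above is easy to mechanize for a list-append workload. A minimal sketch, assuming illustrative history fields (`op`, `status`, `value`) rather than Jepsen's actual record format:

```python
def lost_acknowledged_writes(history, final_reads):
    """Return acknowledged appended elements that no final read ever observed."""
    acked = {e["value"] for e in history
             if e["op"] == "append" and e["status"] == "ok"}
    observed = set()
    for read_result in final_reads:  # each final read returns a list of elements
        observed.update(read_result)
    return acked - observed  # non-empty set = lost acknowledged writes

history = [
    {"op": "append", "status": "ok", "value": 1},
    {"op": "append", "status": "ok", "value": 2},
    {"op": "append", "status": "timeout", "value": 3},  # ambiguous, not counted
]
print(lost_acknowledged_writes(history, final_reads=[[1, 3]]))  # {2} was lost
```

Note that the timed-out append of 3 is not a violation even though it appears in a read: an ambiguous operation may legitimately have applied.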
## Checker Selection
- Register or map semantics: use a linearizability checker.
- Transactional / multi-object semantics: use Elle-style anomaly detection (write cycles, dirty reads, lost updates).
- If linearizability is too strong for the product, explicitly select a weaker model and encode it (do not silently downgrade).
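For a single register, linearizability checking can be sketched as a brute-force search for a total order of operations that respects real-time precedence and register semantics. This is exponential in the worst case and only illustrative (real checkers like Knossos or Porcupine use far better algorithms); the history format is assumed:

```python
def linearizable(history):
    """Does some total order of ops respect real-time order and register reads?"""
    def precedes(a, b):
        return a["end"] < b["start"]  # a finished before b started

    def search(remaining, reg):
        if not remaining:
            return True
        for op in remaining:
            # op may be linearized next only if no remaining op must precede it
            if any(o is not op and precedes(o, op) for o in remaining):
                continue
            rest = [o for o in remaining if o is not op]
            if op["op"] == "write" and search(rest, op["value"]):
                return True
            if op["op"] == "read" and op["value"] == reg and search(rest, reg):
                return True
        return False

    return search(list(history), None)

ok_history = [
    {"op": "write", "value": 1, "start": 0, "end": 1},
    {"op": "read", "value": 1, "start": 2, "end": 3},
]
bad_history = [
    {"op": "write", "value": 1, "start": 0, "end": 1},
    {"op": "read", "value": 2, "start": 2, "end": 3},  # 2 was never written
]
print(linearizable(ok_history), linearizable(bad_history))  # True False
```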
## Fault (Nemesis) Selection
- Partitions: majority/minority splits, bridge partitions, flapping partitions.
- Process faults: kill and restart, node reboot, rolling restarts.
- Time faults: clock offsets and jumps if the system relies on time.
- Storage faults: fsync latency, I/O stalls, disk-full behavior (only if safe and reversible).
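A nemesis schedule should be deterministic so a failing run can be replayed. A minimal sketch of a seeded majority/minority partition schedule (the schedule shape and timing parameters are assumptions, not Jepsen's nemesis API):

```python
import random

def partition_schedule(nodes, rounds, seed=0):
    """Deterministic majority/minority partition schedule for a node list."""
    rng = random.Random(seed)  # fixed seed -> reproducible schedule
    schedule = []
    for t in range(rounds):
        shuffled = list(nodes)
        rng.shuffle(shuffled)
        minority = shuffled[: len(nodes) // 2]
        majority = shuffled[len(nodes) // 2:]
        schedule.append({
            "start_s": t * 30,      # begin partition at this offset
            "duration_s": 20,       # heal 20s later, 10s of calm before next
            "minority": minority,
            "majority": majority,
        })
    return schedule

sched = partition_schedule(["n1", "n2", "n3", "n4", "n5"], rounds=2, seed=42)
```

The same seed always yields the same splits, so the schedule can go straight into the report.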
## Run and Minimize
- Start with a short, low-concurrency run until the harness is stable.
- When a failure appears, minimize by reducing:
  - Keys, operation count, and concurrency.
  - Fault intensity and schedule complexity.
- Preserve determinism (fixed seeds, fixed partition schedule) so a failing history can be reproduced.
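The minimization step above can be sketched as a greedy, delta-debugging-style shrink: repeatedly drop chunks of the history as long as the checker still fails, halving the chunk size as progress stalls. `still_fails` stands in for re-running your checker (or the whole test with a fixed seed):

```python
def minimize(history, still_fails):
    """Greedily shrink history while still_fails(candidate) stays True."""
    chunk = max(1, len(history) // 2)
    while chunk >= 1:
        i = 0
        while i < len(history):
            candidate = history[:i] + history[i + chunk:]  # drop one chunk
            if still_fails(candidate):
                history = candidate   # keep the smaller failing history
            else:
                i += chunk            # chunk was needed; move past it
        chunk //= 2
    return history

# Toy example: the "bug" needs ops 7 and 13 to reproduce.
minimal = minimize(list(range(20)), lambda h: 7 in h and 13 in h)
print(minimal)  # [7, 13]
```

Because `still_fails` is re-evaluated on every candidate, this only works when the failure is reproducible, which is exactly why fixed seeds and a fixed partition schedule matter.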
## Reporting
- State the exact claim under test and the precise pass/fail property.
- Include the workload, nemesis schedule, and a minimal failing history excerpt.
- Distinguish availability failures (timeouts) from safety failures (incorrect ok results).
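That last distinction can be made mechanical when the report is assembled. A minimal sketch, assuming each history entry carries a `status` and an `id`, and that `violating_ids` comes from whatever checker you ran:

```python
def classify_failures(history, violating_ids):
    """Split failures: availability (timeouts/errors) vs safety (wrong ok results)."""
    availability = [e for e in history if e["status"] in ("timeout", "error")]
    safety = [e for e in history
              if e["status"] == "ok" and e["id"] in violating_ids]
    return {"availability": availability, "safety": safety}

history = [
    {"id": 1, "status": "ok"},
    {"id": 2, "status": "timeout"},   # availability: client gave up
    {"id": 3, "status": "ok"},        # safety: checker flagged this op
]
report = classify_failures(history, violating_ids={3})
```

Timeouts are unpleasant but often acceptable under partitions; an `ok` result that violates the model is the headline finding.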
## How It Works
The skill exercises the system under test's client surface with controlled, concurrent workloads while injecting nemeses (partitions, process kills, clock faults, storage stalls). It records a fine-grained operation history (timestamps, observed results, acknowledgements) and runs formal checkers to detect violations such as non-linearizable operations, lost writes, or Elle-style cycles. When a failure occurs, it guides you to minimize the failing history and to produce a compact, reproducible report that separates availability failures from safety failures.
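The history entries this depends on can be captured by wrapping every client call. A minimal sketch (the `record_op` helper and its fields are illustrative, not a Jepsen API); note that a timeout is recorded as ambiguous, not as a failure:

```python
import time

def record_op(client_id, op, args, invoke_fn):
    """Invoke a client operation and record a checkable history entry."""
    entry = {"client": client_id, "op": op, "args": args,
             "start": time.monotonic()}          # invocation timestamp
    try:
        entry["value"] = invoke_fn(*args)
        entry["status"] = "ok"
    except TimeoutError:
        entry["status"] = "timeout"  # ambiguous: the op may still have applied
    entry["end"] = time.monotonic()              # completion timestamp
    return entry

ok_entry = record_op("c1", "write", (5,), lambda v: v)

def flaky(*args):
    raise TimeoutError
timeout_entry = record_op("c1", "read", (), flaky)
```

Monotonic clocks are used for start/end so the real-time precedence relation survives wall-clock jumps on the control node.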
## FAQ
- What should I record during a run to make checking possible? Record the operation type, key(s), arguments, client-observed result, start and end timestamps, and the exact nemesis schedule, so the history is replayable and checkable.
- If linearizability is too strong, what do I do? Explicitly define the weaker model you intend to satisfy, encode it as a checker, and document that choice rather than silently downgrading the test.