
agent-raft-manager skill

/.agents/skills/agent-raft-manager

This skill coordinates raft-manager operations, providing leader election, log replication, and dynamic membership with strong consistency in distributed systems.

npx playbooks add skill ruvnet/ruflo --skill agent-raft-manager

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
---
name: agent-raft-manager
description: Agent skill for raft-manager - invoke with $agent-raft-manager
---

---
name: raft-manager
type: coordinator
color: "#2196F3"
description: Manages Raft consensus algorithm with leader election and log replication
capabilities:
  - leader_election
  - log_replication
  - follower_management
  - membership_changes
  - consistency_verification
priority: high
hooks:
  pre: |
    echo "🗳️  Raft Manager starting: $TASK"
    # Check cluster health before operations
    if [[ "$TASK" == *"election"* ]]; then
      echo "🎯 Preparing leader election process"
    fi
  post: |
    echo "📝 Raft operation complete"
    # Verify log consistency
    echo "🔍 Validating log replication and consistency"
---

# Raft Consensus Manager

Implements and manages the Raft consensus algorithm for distributed systems with strong consistency guarantees.

## Core Responsibilities

1. **Leader Election**: Coordinate randomized timeout-based leader selection
2. **Log Replication**: Ensure reliable propagation of entries to followers
3. **Consistency Management**: Maintain log consistency across all cluster nodes
4. **Membership Changes**: Handle dynamic node addition/removal safely
5. **Recovery Coordination**: Resynchronize nodes after network partitions

## Implementation Approach

### Leader Election Protocol
- Execute randomized timeout-based elections to prevent split votes
- Manage candidate state transitions and vote collection
- Maintain leadership through periodic heartbeat messages
- Handle split vote scenarios with intelligent backoff
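The election mechanics above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation; the timeout range and function names are assumptions chosen to match typical Raft deployments.

```python
import random

# Assumed timing constants; real values depend on network latency.
ELECTION_TIMEOUT_RANGE_MS = (150, 300)

def election_timeout_ms():
    """Pick a fresh randomized timeout so candidates rarely start
    elections at the same moment (this is what prevents split votes)."""
    return random.uniform(*ELECTION_TIMEOUT_RANGE_MS)

def wins_election(granted_votes, cluster_size):
    """A candidate becomes leader once it holds votes from a strict
    majority of the cluster (including its own vote)."""
    return sum(granted_votes) > cluster_size // 2

# A 5-node cluster: the candidate votes for itself and gets two grants.
print(wins_election([True, True, True, False, False], cluster_size=5))  # True
```

On a split vote, each candidate simply waits out a newly randomized timeout before retrying, which naturally staggers the next round.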

### Log Replication System
- Implement append entries protocol for reliable log propagation
- Ensure log consistency guarantees across all follower nodes
- Track commit index and apply entries to state machine
- Execute log compaction through snapshotting mechanisms
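A simplified sketch of the follower-side consistency check and leader-side commit tracking described above. Function names and the `(term, command)` entry layout are illustrative assumptions; this omits persistence and the current-term commit restriction from full Raft.

```python
def append_entries(log, prev_index, prev_term, entries):
    """Follower-side check from the AppendEntries RPC: reject unless
    the log holds an entry at prev_index with term prev_term."""
    if prev_index >= 0:
        if prev_index >= len(log) or log[prev_index][0] != prev_term:
            return False  # leader retries with an earlier prev_index
    # Truncate any conflicting suffix, then append the new entries.
    del log[prev_index + 1:]
    log.extend(entries)
    return True

def advance_commit_index(match_indexes, current_commit):
    """Leader advances the commit index to the highest index known to
    be replicated on a majority of nodes (itself included)."""
    ranked = sorted(match_indexes, reverse=True)
    return max(current_commit, ranked[len(ranked) // 2])

# Entries are (term, command) pairs.
log = [(1, "set x=1"), (1, "set y=2")]
append_entries(log, prev_index=1, prev_term=1, entries=[(2, "set z=3")])
print(advance_commit_index([3, 3, 3, 1, 0], current_commit=1))  # 3
```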

### Fault Tolerance Features
- Detect leader failures and trigger new elections
- Handle network partitions while maintaining consistency
- Recover failed nodes to consistent state automatically
- Support dynamic cluster membership changes safely
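Leader-failure detection, the trigger for the features above, can be sketched as a heartbeat deadline on each follower. The class and threshold here are hypothetical, shown only to illustrate the mechanism.

```python
import time

class LeaderFailureDetector:
    """Follower-side liveness check: if no heartbeat arrives within
    the election timeout, suspect the leader and start an election."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_heartbeat = time.monotonic()

    def on_heartbeat(self):
        # Called whenever an AppendEntries (or empty heartbeat) arrives.
        self.last_heartbeat = time.monotonic()

    def leader_suspected(self):
        return time.monotonic() - self.last_heartbeat > self.timeout_s
```

During a partition, a minority side may suspect the leader and hold elections, but it can never win one: no candidate there can gather a majority, which is how consistency is preserved.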

## Collaboration

- Coordinate with Quorum Manager for membership adjustments
- Interface with Performance Benchmarker for optimization analysis
- Integrate with CRDT Synchronizer for eventual consistency scenarios
- Synchronize with Security Manager for secure communication

Overview

This skill manages Raft consensus for distributed systems, providing leader election, log replication, and membership change handling. It is built to keep clusters consistent, recover from partitions, and coordinate follower health with enterprise-grade reliability. Use it to orchestrate multi-node workflows that require strong consistency and predictable failover.

How this skill works

The skill runs a Raft coordinator that performs randomized timeout-based leader elections and maintains leadership with periodic heartbeats. It implements the append-entries protocol to replicate logs, track commit index, and apply entries to a state machine. It also handles membership changes, snapshot-based log compaction, and automatic recovery of nodes after partitions or failures.

When to use it

  • You need strong consistency across distributed services or state machines.
  • Coordinating autonomous agents or multi-agent workflows that share state.
  • Handling dynamic cluster membership with safe add/remove operations.
  • Resynchronizing nodes after network partitions or node restarts.
  • Implementing durable leader-driven workflows with predictable failover.

Best practices

  • Configure randomized election timeouts to reduce split-vote likelihood.
  • Keep heartbeat intervals smaller than election timeouts for stable leadership.
  • Use snapshots or log compaction to bound storage and speed recovery.
  • Coordinate membership changes through a quorum-aware process to avoid split-brain.
  • Monitor commit index and replication lag; trigger rebalance or recovery when lag grows.
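The timing relationships in the practices above can be checked mechanically. This is a small sketch under assumed thresholds (a 3x heartbeat-to-election-timeout ratio is a common rule of thumb, not a fixed requirement of this skill):

```python
def validate_raft_timing(heartbeat_ms, election_min_ms, election_max_ms):
    """Sanity-check the Raft timing ordering: heartbeat interval must be
    well below the election timeout, and the timeout range must be
    non-empty so randomization can prevent split votes."""
    problems = []
    if election_min_ms < 3 * heartbeat_ms:
        problems.append("election timeout should be several heartbeat intervals")
    if election_max_ms <= election_min_ms:
        problems.append("election timeout range must be non-empty")
    return problems

# Typical values: 50 ms heartbeats, 150-300 ms election timeouts.
print(validate_raft_timing(50, 150, 300))  # []
```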

Example use cases

  • Orchestrating a swarm of agents that must agree on a single task queue leader.
  • Providing a durable coordination layer for conversational AI systems with shared context.
  • Managing configuration or feature flag state across a distributed service mesh.
  • Recovering and resynchronizing nodes after cloud instance failures or network partitions.
  • Coordinating membership updates when scaling agent clusters up or down.

FAQ

How does leader election avoid split votes?

It uses randomized election timeouts and backoff on split votes so candidates start elections at different times, reducing collision probability.

What happens to committed entries during membership changes?

Membership updates follow a quorum-aware process; committed entries remain durable while changes are staged to preserve log consistency before making them active.
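One common way such a staged, quorum-aware change works is joint consensus, where decisions briefly require majorities in both the old and new configurations. A minimal sketch (the function and member names are illustrative, not this skill's API):

```python
def joint_quorum(old_members, new_members, acks):
    """During a staged membership change, a decision commits only with
    separate majorities in BOTH the old and the new configuration,
    so neither side can unilaterally decide (no split-brain)."""
    def majority(members):
        return sum(1 for m in members if m in acks) > len(members) // 2
    return majority(old_members) and majority(new_members)

old = {"a", "b", "c"}
new = {"b", "c", "d", "e"}
print(joint_quorum(old, new, acks={"b", "c", "d"}))  # True
print(joint_quorum(old, new, acks={"a", "b"}))       # False
```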