
SYNAPSE: Distributed Consensus Engine for Multi-Node Command Systems

Raft-based consensus engine enabling consistent shared state across multi-node command systems, with safe leader election, quorum-committed log replication, and zero-downtime membership reconfiguration.

[Diagram: SYNAPSE distributed consensus architecture]
// The Challenge

Command systems with multiple processing nodes require consistent shared state even when individual nodes fail or network links between them are severed. A centralised coordinator solves the consistency problem but reintroduces a single point of failure. Without a principled consensus algorithm, distributed nodes either risk inconsistent state (split-brain) or grind to a halt when they cannot communicate — neither is acceptable in operational command contexts.

// Our Approach

Implemented a Raft-based consensus engine in Go. Leader election ensures exactly one node is authoritative at any time. Log replication propagates all state changes through the majority quorum before committing, guaranteeing safety under arbitrary single-node and network partition failures. The implementation includes membership reconfiguration (adding and removing nodes without service interruption) and a log compaction mechanism that prevents unbounded storage growth in long-running deployments.

Module 01

Leader Election and Quorum Management

Exactly-once leadership with safe election under partition

SYNAPSE uses Raft's randomised election timeout to ensure exactly one leader is elected in each term, even when the network is partitioned. A candidate must receive votes from a majority quorum before assuming leadership. The implementation handles the split-vote case by retrying with randomised backoff, and prevents pre-emption by stale leaders through term-number enforcement.

[State diagram: FOLLOWER (awaiting heartbeat) → CANDIDATE (requesting votes) → LEADER (broadcasts heartbeat). Transitions: election timeout with no heartbeat; quorum votes received; higher term seen or split vote; heartbeat.]
Raft node state machine. A follower that times out without a heartbeat becomes a candidate. Quorum votes produce a leader. Higher-term signals from any node force immediate reversion to follower.
  • Randomised election timeout preventing simultaneous candidate split
  • Majority quorum requirement: ⌊N/2⌋ + 1 votes required for election
  • Term number enforcement: stale leaders revert to follower immediately
  • Leader heartbeat mechanism: followers detect leader loss within 2× the election timeout
  • Pre-vote extension: nodes check quorum reachability before triggering election
  • Leadership transfer: orderly handover without election when leader restarts gracefully
Module 02

Log Replication and Safety Guarantees

Consistent state across all nodes, committed only on quorum acknowledgement

All state changes in SYNAPSE pass through the replicated log. The leader appends the entry locally, sends it to all followers, and only commits it once a majority acknowledge receipt. Committed entries are guaranteed never to be overwritten, even if the leader fails immediately after commit. Followers that fall behind are caught up by replaying log entries from the last confirmed match point.

[Sequence diagram: Leader and followers F1–F4, with F4 slow/lagging. Leader sends AppendEntries [idx=42]; F1–F3 ACK, F4 gives no response; COMMIT at quorum 3/5; CommitIndex update; F4 catches up on the next heartbeat.]
Log replication sequence. Leader sends AppendEntries to all followers; commits only after acknowledgement from a majority quorum (3 of 5). F4 is slow but catches up on the next heartbeat without blocking the commit.
  • Append-only log with sequence index and term number per entry
  • Commit only on majority acknowledgement: no partial commits
  • Follower catch-up via log replay from match index
  • Out-of-order delivery buffered until predecessor is confirmed
  • Conflict detection: follower rejects entries conflicting at the same index
  • Log durability: entries written to disk before acknowledgement sent
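The quorum-commit rule above reduces to a one-liner over the followers' acknowledged match indices: the highest index replicated on a majority is safe to commit. This is an illustrative sketch; real Raft additionally requires that entry to belong to the leader's current term before committing, which is omitted here for brevity:

```go
package main

import (
	"fmt"
	"sort"
)

// commitIndex returns the highest log index replicated on a majority of
// nodes, given each node's acknowledged match index (leader included).
func commitIndex(matchIndex []uint64) uint64 {
	sorted := append([]uint64(nil), matchIndex...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	// The (floor(n/2)+1)-th highest value is held by at least a majority.
	return sorted[len(sorted)/2]
}

func main() {
	// 5 nodes: leader and three followers at index 42, F4 lagging at 37.
	fmt.Println(commitIndex([]uint64{42, 42, 42, 42, 37})) // 42
	// Only two nodes have index 42: no quorum yet, commit stays at 37.
	fmt.Println(commitIndex([]uint64{42, 42, 37, 37, 37})) // 37
}
```

Note how the lagging follower never blocks the commit: it simply does not count toward the majority until it catches up.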
Module 03

Log Compaction and Membership Reconfiguration

Long-running deployment support without unbounded storage growth

Log compaction in SYNAPSE uses snapshotting: the current state machine state is serialised and written as a snapshot, replacing all log entries up to that point. New followers joining the cluster are bootstrapped from the latest snapshot rather than replaying the full log from inception. Membership changes are implemented as log entries processed through the consensus mechanism, preventing split-brain during the transition.

  • Snapshot-based log compaction at configurable entry count threshold
  • Snapshot installation on new and lagging followers via streaming transfer
  • Safe membership change via joint consensus: both old and new quorum required
  • Node addition without cluster downtime
  • Node removal with graceful leader handover if removed node is current leader
  • Compaction does not interrupt in-flight log replication
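The threshold-triggered compaction described above can be sketched as follows; the type, field names, and threshold handling are illustrative assumptions, not SYNAPSE's actual implementation:

```go
package main

import "fmt"

// raftLog is a simplified log with snapshot-based compaction.
type raftLog struct {
	entries       []string // commands, simplified to strings here
	snapshotIndex uint64   // last log index covered by the snapshot
	snapshot      []byte   // serialised state machine up to snapshotIndex
}

// maybeCompact replaces all entries up to appliedIndex with a snapshot
// of the state machine once the log exceeds the threshold.
func (l *raftLog) maybeCompact(appliedIndex uint64, state []byte, threshold int) {
	if len(l.entries) < threshold || appliedIndex <= l.snapshotIndex {
		return
	}
	covered := int(appliedIndex - l.snapshotIndex)
	if covered > len(l.entries) {
		return
	}
	l.snapshot = state
	l.entries = append([]string(nil), l.entries[covered:]...) // keep uncovered tail
	l.snapshotIndex = appliedIndex
}

func main() {
	// Entries "a".."d" occupy indices 1..4; state applied through index 3.
	l := &raftLog{entries: []string{"a", "b", "c", "d"}, snapshotIndex: 0}
	l.maybeCompact(3, []byte("state@3"), 4)
	fmt.Println(len(l.entries), l.snapshotIndex) // 1 3
}
```

Only applied entries are compacted away, so in-flight replication of later entries is unaffected, consistent with the last bullet above.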
// Technical Complexity

Raft's safety property holds only when the implementation correctly enforces the election restriction: a candidate cannot win unless it holds all committed entries, and this check requires comparing log term and index, not just index.
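The term-before-index comparison amounts to the following vote-granting check; the function and parameter names are illustrative:

```go
package main

import "fmt"

// candidateUpToDate reports whether a voter may grant its vote: the
// candidate's log must be at least as up-to-date, comparing the last
// entry's term first and falling back to index only on a term tie.
func candidateUpToDate(candLastTerm, candLastIndex, voterLastTerm, voterLastIndex uint64) bool {
	if candLastTerm != voterLastTerm {
		return candLastTerm > voterLastTerm
	}
	return candLastIndex >= voterLastIndex
}

func main() {
	// A longer log from an older term loses: index alone is not enough.
	fmt.Println(candidateUpToDate(2, 10, 3, 5)) // false
	fmt.Println(candidateUpToDate(3, 5, 3, 5))  // true
}
```

The first case in main is exactly the trap the paragraph describes: comparing indices alone would elect a candidate that is missing committed entries from a later term.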

Log compaction interacts with replication in a subtle way: a snapshot being transferred to a lagging follower must not be invalidated by concurrent log entries arriving from the leader, requiring careful coordination of the snapshot and replication paths.

Membership change via joint consensus doubles the complexity of the election and commit machinery for the duration of the transition.
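The doubled machinery reduces to one extra condition on every election and commit decision during the transition; this helper is an illustrative sketch, not SYNAPSE's actual code:

```go
package main

import "fmt"

// jointQuorum reports whether an election or commit succeeds under joint
// consensus: a majority of BOTH the old and the new configuration is
// required for the duration of the membership change.
func jointQuorum(oldAcks, oldSize, newAcks, newSize int) bool {
	return oldAcks >= oldSize/2+1 && newAcks >= newSize/2+1
}

func main() {
	// Transition from 5 nodes to 4: 3/5 old but only 2/4 new is not enough.
	fmt.Println(jointQuorum(3, 5, 2, 4)) // false
	fmt.Println(jointQuorum(3, 5, 3, 4)) // true
}
```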

// Stack and Methods
Distributed SystemsRaftGogRPCProtocol BuffersBadgerDBPrometheus