
SYNAPSE: Distributed Consensus Engine for Multi-Node Command Systems

Raft-based consensus engine enabling consistent shared state across multi-node command systems, with safe leader election, quorum-committed log replication, and zero-downtime membership reconfiguration.

[Diagram: SYNAPSE distributed consensus architecture]
// The Challenge

Command systems with multiple processing nodes require consistent shared state even when individual nodes fail or network links between them are severed. A centralised coordinator solves the consistency problem but reintroduces a single point of failure. Without a principled consensus algorithm, distributed nodes either risk inconsistent state (split-brain) or grind to a halt when they cannot communicate — neither is acceptable in operational command contexts.

// Our Approach

Implemented a Raft-based consensus engine in Go. Leader election ensures exactly one node is authoritative at any time. Log replication propagates all state changes through the majority quorum before committing, guaranteeing safety under arbitrary single-node and network partition failures. The implementation includes membership reconfiguration (adding and removing nodes without service interruption) and a log compaction mechanism that prevents unbounded storage growth in long-running deployments.

Module 01

Leader Election and Quorum Management

Exactly-once leadership with safe election under partition

SYNAPSE uses Raft's randomised election timeout to ensure exactly one leader is elected in each term, even when the network is partitioned. A candidate must receive votes from a majority quorum before assuming leadership. The implementation handles the split-vote case by retrying with randomised backoff, and prevents pre-emption by stale leaders through term-number enforcement.

[State diagram: FOLLOWER (awaiting heartbeat) → CANDIDATE (requesting votes) → LEADER (broadcasts heartbeat). Transitions: election timeout with no heartbeat; quorum votes received; higher term seen or split vote; heartbeat.]
Raft node state machine. A follower that times out without a heartbeat becomes a candidate. Quorum votes produce a leader. Higher-term signals from any node force immediate reversion to follower.
  • Randomised election timeout preventing simultaneous candidate split
  • Majority quorum requirement: ⌊N/2⌋ + 1 votes required for election
  • Term number enforcement: stale leaders revert to follower immediately
  • Leader heartbeat mechanism: followers detect leader loss within 2× the election timeout
  • Pre-vote extension: nodes check quorum reachability before triggering election
  • Leadership transfer: orderly handover without election when leader restarts gracefully
Module 02

Log Replication and Safety Guarantees

Consistent state across all nodes, committed only on quorum acknowledgement

All state changes in SYNAPSE pass through the replicated log. The leader appends the entry locally, sends it to all followers, and only commits it once a majority acknowledge receipt. Committed entries are guaranteed never to be overwritten, even if the leader fails immediately after commit. Followers that fall behind are caught up by replaying log entries from the last confirmed match point.

[Sequence diagram: Leader and followers F1–F4, with F4 slow/lagging. Leader sends AppendEntries [idx=42]; F1–F3 ACK, F4 gives no response; COMMIT at quorum 3/5; CommitIndex update; F4 catches up on the next heartbeat.]
Log replication sequence. Leader sends AppendEntries to all followers; commits only after acknowledgement from a majority quorum (3 of 5). F4 is slow but catches up on the next heartbeat without blocking the commit.
  • Append-only log with sequence index and term number per entry
  • Commit only on majority acknowledgement: no partial commits
  • Follower catch-up via log replay from match index
  • Out-of-order delivery buffered until predecessor is confirmed
  • Conflict detection: follower rejects entries conflicting at the same index
  • Log durability: entries written to disk before acknowledgement sent
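The quorum-commit rule above reduces to a one-liner over the followers' acknowledged match indices: the highest index replicated on a majority is safe to commit. This is an illustrative sketch; real Raft additionally requires that entry to belong to the leader's current term before committing, which is omitted here for brevity:

```go
package main

import (
	"fmt"
	"sort"
)

// commitIndex returns the highest log index replicated on a majority of
// nodes, given each node's acknowledged match index (leader included).
func commitIndex(matchIndex []uint64) uint64 {
	sorted := append([]uint64(nil), matchIndex...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	// The (floor(n/2)+1)-th highest value is held by at least a majority.
	return sorted[len(sorted)/2]
}

func main() {
	// 5 nodes: leader and three followers at index 42, F4 lagging at 37.
	fmt.Println(commitIndex([]uint64{42, 42, 42, 42, 37})) // 42
	// Only two nodes have index 42: no quorum yet, commit stays at 37.
	fmt.Println(commitIndex([]uint64{42, 42, 37, 37, 37})) // 37
}
```

Note how the lagging follower never blocks the commit: it simply does not count toward the majority until it catches up.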
Module 03

Log Compaction and Membership Reconfiguration

Long-running deployment support without unbounded storage growth

Log compaction in SYNAPSE uses snapshotting: the current state machine state is serialised and written as a snapshot, replacing all log entries up to that point. New followers joining the cluster are bootstrapped from the latest snapshot rather than replaying the full log from inception. Membership changes are implemented as log entries processed through the consensus mechanism, preventing split-brain during the transition.

  • Snapshot-based log compaction at configurable entry count threshold
  • Snapshot installation on new and lagging followers via streaming transfer
  • Safe membership change via joint consensus: both old and new quorum required
  • Node addition without cluster downtime
  • Node removal with graceful leader handover if removed node is current leader
  • Compaction does not interrupt in-flight log replication
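The threshold-triggered compaction described above can be sketched as follows; the type, field names, and threshold handling are illustrative assumptions, not SYNAPSE's actual implementation:

```go
package main

import "fmt"

// raftLog is a simplified log with snapshot-based compaction.
type raftLog struct {
	entries       []string // commands, simplified to strings here
	snapshotIndex uint64   // last log index covered by the snapshot
	snapshot      []byte   // serialised state machine up to snapshotIndex
}

// maybeCompact replaces all entries up to appliedIndex with a snapshot
// of the state machine once the log exceeds the threshold.
func (l *raftLog) maybeCompact(appliedIndex uint64, state []byte, threshold int) {
	if len(l.entries) < threshold || appliedIndex <= l.snapshotIndex {
		return
	}
	covered := int(appliedIndex - l.snapshotIndex)
	if covered > len(l.entries) {
		return
	}
	l.snapshot = state
	l.entries = append([]string(nil), l.entries[covered:]...) // keep uncovered tail
	l.snapshotIndex = appliedIndex
}

func main() {
	// Entries "a".."d" occupy indices 1..4; state applied through index 3.
	l := &raftLog{entries: []string{"a", "b", "c", "d"}, snapshotIndex: 0}
	l.maybeCompact(3, []byte("state@3"), 4)
	fmt.Println(len(l.entries), l.snapshotIndex) // 1 3
}
```

Only applied entries are compacted away, so in-flight replication of later entries is unaffected, consistent with the last bullet above.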
// Technical Complexity

Raft's safety property holds only when the implementation correctly enforces the election restriction: a candidate cannot win unless it holds all committed entries, and this check requires comparing log term and index, not just index.
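The term-before-index comparison amounts to the following vote-granting check; the function and parameter names are illustrative:

```go
package main

import "fmt"

// candidateUpToDate reports whether a voter may grant its vote: the
// candidate's log must be at least as up-to-date, comparing the last
// entry's term first and falling back to index only on a term tie.
func candidateUpToDate(candLastTerm, candLastIndex, voterLastTerm, voterLastIndex uint64) bool {
	if candLastTerm != voterLastTerm {
		return candLastTerm > voterLastTerm
	}
	return candLastIndex >= voterLastIndex
}

func main() {
	// A longer log from an older term loses: index alone is not enough.
	fmt.Println(candidateUpToDate(2, 10, 3, 5)) // false
	fmt.Println(candidateUpToDate(3, 5, 3, 5))  // true
}
```

The first case in main is exactly the trap the paragraph describes: comparing indices alone would elect a candidate that is missing committed entries from a later term.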

Log compaction interacts with replication in a subtle way: a snapshot being transferred to a lagging follower must not be invalidated by concurrent log entries arriving from the leader, requiring careful coordination of the snapshot and replication paths.

Membership change via joint consensus doubles the complexity of the election and commit machinery for the duration of the transition.
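The doubled machinery reduces to one extra condition on every election and commit decision during the transition; this helper is an illustrative sketch, not SYNAPSE's actual code:

```go
package main

import "fmt"

// jointQuorum reports whether an election or commit succeeds under joint
// consensus: a majority of BOTH the old and the new configuration is
// required for the duration of the membership change.
func jointQuorum(oldAcks, oldSize, newAcks, newSize int) bool {
	return oldAcks >= oldSize/2+1 && newAcks >= newSize/2+1
}

func main() {
	// Transition from 5 nodes to 4: 3/5 old but only 2/4 new is not enough.
	fmt.Println(jointQuorum(3, 5, 2, 4)) // false
	fmt.Println(jointQuorum(3, 5, 3, 4)) // true
}
```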

// Stack and Methods
Distributed SystemsRaftGogRPCProtocol BuffersBadgerDBPrometheus