Part 3: Raft
In this part, you will implement the Raft consensus algorithm. Raft maintains a replicated log across a cluster of servers, ensuring that all servers agree on the same sequence of commands even when some servers crash or become partitioned.
Read the Raft paper thoroughly before writing any code. Sections 5 and 6, and Figure 2 are most important for this homework.
Your implementation goes in src/kv_raft/raft.c.
Raft interface
The starter code defines three public functions. Please do not change these signatures as they are required for testing.
/* Return (current_term, is_leader). */
void raft_get_state(raft_t *r, uint64_t *term, bool *is_leader);
/* Submit a command to the leader's log. Returns a 1-based index, or 0
if this peer is not the leader. Always sets *term and *is_leader. */
int raft_submit(raft_t *r, const char *cmd,
uint64_t *term, bool *is_leader);
/* Return the command at 1-based `index` if committed, else NULL.
Caller frees. */
char *raft_get_committed(raft_t *r, int index);
Your Raft must also implement raft_dispatch to handle request_vote and append_entries RPCs from peers. The starter already wires the test-inspection endpoints (_test/get_state, _test/submit, _test/get_committed) for you — do not modify them.
Leader election
Implement Raft leader election and heartbeats.
Tasks
Raft state. Add fields to the
raftstruct for the state described in Figure 2 of the paper. You will need to consider synchronisation here — apthread_mutex_tprotecting all state is the simplest choice.request_voteRPC. Define the needed argument and reply fields. Implement the handler per Figure 2. Serialize the wire format withcJSON(one JSON object per struct, numbers for integers).append_entriesRPC. Define the needed argument and reply fields. For now, the handler only needs to process heartbeats (no log entries yet).Election timeout. Spawn a background thread with
pthread_createthat triggers an election if no heartbeat is received within a randomized timeout. You should use randomised election timeouts, in a range of 400–800ms. If you poll for timeout expiration in a loop, keep each sleep in the 10-50ms range. The tests require a new leader to be elected within 5 seconds of the old leader failing.Heartbeats. When a server becomes leader, it should periodically send empty
append_entriesRPCs to all peers to maintain authority. A heartbeat interval around 50ms is reasonable.Implement
raft_get_state. Tests read state through this function (via an RPC), so they will fail immediately until it returns real values.
Hints
- Raft peers should communicate only via RPC (through
r->peers). Userpc_call_timeout(r->peers[i], "request_vote", body, 1)to send RPCs, and recall from previous parts that it returnsNULLon failure. - You must send
request_voteRPCs in parallel when you start an election. Start elections by incrementing the term, voting for yourself, and spawning one thread per peer withpthread_create(detach them, or have them free their argument and exit).
Testing
./test_raft test_initial_election
./test_raft test_re_election
./test_raft test_many_elections
Log replication
Implement log entry appending and commitment.
Tasks
raft_submitandraft_get_committed. If this server is the leader, append the command to the log and return the 1-based index (also set*termand*is_leader = true). If not the leader, return0with*is_leader = false.Update
append_entriesto work with log entries. The leader maintainsnextIndexandmatchIndexfor each peer (Figure 2) and sends entries that the follower is missing.Follower log handling. Followers must check
prev_log_indexandprev_log_termas described in Figure 2. On conflict, reject the AppendEntries so the leader can back up.Commitment. The leader commits an entry when a majority of servers have replicated it.
Election restriction. See Section 5.4.1 of the paper.
Note that logs are 1-based. You may use index 0 as a dummy entry. raft_submit should return a 1-based index.
Hints
- You should never hold a lock across an RPC call.
- When the leader discovers a follower’s log is behind, it needs to back up
nextIndex. You may just decrement by one here, though a faster approach is outlined on pages 7–8 of the paper (with a grey vertical line). Send aconflict_index(and optionallyconflict_term) in the AppendEntries reply to let the leader skip whole conflicting terms at once.
Testing
Beyond the leader-election tests, you should now also pass:
./test_raft test_basic_agree
./test_raft test_follower_failure
./test_raft test_leader_failure
./test_raft test_fail_no_agree
./test_raft test_fail_agree
./test_raft test_concurrent_starts
./test_raft test_rejoin
./test_raft test_backup
./test_raft test_partition
Extra Credit: Persistence
Implement state persistence so that Raft servers can recover after crashes.
Tasks
Identify persistent state. Figure 2 specifies which state must be persisted:
currentTerm,votedFor, andlog.Persist on every change. Whenever any of these values change, serialize them and save via
persister_save(r->persister, data).Restore on startup. In
raft_new, read withpersister_read(p). If the return is non-NULL and non-empty, deserialize and restore the state.
Hints
- Factor persistence into a helper (e.g.
persist_locked) so every mutation site is one line. cJSON_PrintUnformattedandcJSON_Parseare the easiest way to serialize/deserialize.- The persister writes atomically (write to temp file, then rename), so you don’t need to worry about partial writes.
Testing
You should now pass every ./test_raft test, including:
./test_raft test_persist
./test_raft test_figure8
./test_raft test_reliable_churn
./test_raft test_unreliable_churn
To run all Raft tests:
./test_raft
The full Raft test suite should complete in under 120 seconds.