Part 3: Raft

In this part, you will implement the Raft consensus algorithm. Raft maintains a replicated log across a cluster of servers, ensuring that all servers agree on the same sequence of commands even when some servers crash or become partitioned.

Read the Raft paper thoroughly before writing any code. Sections 5 and 6, and Figure 2 are most important for this homework.

Your implementation goes in src/kv_raft/raft.c.

Raft interface

The starter code defines three public functions. Please do not change these signatures as they are required for testing.

/* Return (current_term, is_leader). */
void raft_get_state(raft_t *r, uint64_t *term, bool *is_leader);

/* Submit a command to the leader's log. Returns a 1-based index, or 0
   if this peer is not the leader. Always sets *term and *is_leader. */
int raft_submit(raft_t *r, const char *cmd,
                uint64_t *term, bool *is_leader);

/* Return the command at 1-based `index` if committed, else NULL.
   Caller frees. */
char *raft_get_committed(raft_t *r, int index);

Your Raft must also implement raft_dispatch to handle request_vote and append_entries RPCs from peers. The starter already wires the test-inspection endpoints (_test/get_state, _test/submit, _test/get_committed) for you — do not modify them.

Leader election

Implement Raft leader election and heartbeats.

Tasks

Raft state. Add fields to the raft struct for the state described in Figure 2 of the paper. You will need to consider synchronisation here — a pthread_mutex_t protecting all state is the simplest choice.
request_vote RPC. Define the needed argument and reply fields. Implement the handler per Figure 2. Serialize the wire format with cJSON (one JSON object per struct, numbers for integers).
append_entries RPC. Define the needed argument and reply fields. For now, the handler only needs to process heartbeats (no log entries yet).
Election timeout. Spawn a background thread with pthread_create that triggers an election if no heartbeat is received within a randomized timeout. You should use randomised election timeouts, in a range of 400–800ms. If you poll for timeout expiration in a loop, keep each sleep in the 10-50ms range. The tests require a new leader to be elected within 5 seconds of the old leader failing.
Heartbeats. When a server becomes leader, it should periodically send empty append_entries RPCs to all peers to maintain authority. A heartbeat interval around 50ms is reasonable.
Implement raft_get_state. Tests read state through this function (via an RPC), so they will fail immediately until it returns real values.

Hints

Raft peers should communicate only via RPC (through r->peers). Use rpc_call_timeout(r->peers[i], "request_vote", body, 1) to send RPCs, and recall from previous parts that it returns NULL on failure.
You must send request_vote RPCs in parallel when you start an election. Start elections by incrementing the term, voting for yourself, and spawning one thread per peer with pthread_create (detach them, or have them free their argument and exit).

Testing

./test_raft test_initial_election
./test_raft test_re_election
./test_raft test_many_elections

Log replication

Implement log entry appending and commitment.

Tasks

raft_submit and raft_get_committed. If this server is the leader, append the command to the log and return the 1-based index (also set *term and *is_leader = true). If not the leader, return 0 with *is_leader = false.
Update append_entries to work with log entries. The leader maintains nextIndex and matchIndex for each peer (Figure 2) and sends entries that the follower is missing.
Follower log handling. Followers must check prev_log_index and prev_log_term as described in Figure 2. On conflict, reject the AppendEntries so the leader can back up.
Commitment. The leader commits an entry when a majority of servers have replicated it.
Election restriction. See Section 5.4.1 of the paper.

Note that logs are 1-based. You may use index 0 as a dummy entry. raft_submit should return a 1-based index.

Hints

You should never hold a lock across an RPC call.
When the leader discovers a follower’s log is behind, it needs to back up nextIndex. You may just decrement by one here, though a faster approach is outlined on pages 7–8 of the paper (with a grey vertical line). Send a conflict_index (and optionally conflict_term) in the AppendEntries reply to let the leader skip whole conflicting terms at once.

Testing

Beyond the leader-election tests, you should now also pass:

./test_raft test_basic_agree
./test_raft test_follower_failure
./test_raft test_leader_failure
./test_raft test_fail_no_agree
./test_raft test_fail_agree
./test_raft test_concurrent_starts
./test_raft test_rejoin
./test_raft test_backup
./test_raft test_partition

Extra Credit: Persistence

Implement state persistence so that Raft servers can recover after crashes.

Tasks

Identify persistent state. Figure 2 specifies which state must be persisted: currentTerm, votedFor, and log.
Persist on every change. Whenever any of these values change, serialize them and save via persister_save(r->persister, data).
Restore on startup. In raft_new, read with persister_read(p). If the return is non-NULL and non-empty, deserialize and restore the state.

Hints

Factor persistence into a helper (e.g. persist_locked) so every mutation site is one line.
cJSON_PrintUnformatted and cJSON_Parse are the easiest way to serialize/deserialize.
The persister writes atomically (write to temp file, then rename), so you don’t need to worry about partial writes.

Testing

You should now pass every ./test_raft test, including:

./test_raft test_persist
./test_raft test_figure8
./test_raft test_reliable_churn
./test_raft test_unreliable_churn

To run all Raft tests:

./test_raft

The full Raft test suite should complete in under 120 seconds.