Part 4: Fault-Tolerant KV Service
In this last part, you will build a fault-tolerant key-value service by combining your Raft implementation with a replicated state machine (RSM). The service maintains identical KV databases across multiple servers and continues operating as long as a majority of servers are alive.
Architecture
The system has three layers:
┌───────────┐
│ Client │ Sends get/put RPCs to servers; retries on failure
├───────────┤
│ RSM │ Submits operations to Raft, applies committed ops to KV state
├───────────┤
│ Raft │ Replicates operations across the cluster
└───────────┘
Each server runs an rsm_t that owns a raft_t peer internally. The RSM receives KV RPCs from clients, submits them as commands to Raft, and applies committed commands to its local KV state. Since all servers apply the same commands in the same order, their KV states remain identical.
Replicated State Machine
Implement the RSM in src/kv_raft/rsm.c.
The starter code already routes "get" and "put" RPCs to your rsm_handle_get and rsm_handle_put functions (forwarding unknown RPCs to the inner Raft for peer communication — you should not change this).
Tasks
- Submit operations to Raft. When the RSM receives a get or put, serialize the operation to a command string (JSON is easiest) and call raft_submit(r->raft, cmd, &term, &is_leader). If the returned index is 0, return KV_WRONG_LEADER to the client.
- Wait for commitment. After raft_submit succeeds, you need to wait for the command at your returned index to commit. Poll raft_get_committed(r->raft, index) on a background applier thread, or use a pthread_cond_t that the applier signals when a new entry commits. If you sleep between polls, use a short delay such as usleep(10 * 1000) and keep sleeps in the 10-50ms range.
- Apply committed operations. Maintain your own KV state (a pthread_mutex_t-protected hash table works), and when a committed command matches the one you submitted, apply it and return the result to the RPC handler.
- Handle leadership changes. If the Raft leader changes while you are waiting for a commit (e.g., a different command appears at your index, or too much time passes), return KV_WRONG_LEADER so the client retries on another server.
All operations must go through the Raft log, including gets. A get that bypasses Raft might read stale data from a server that has been partitioned from the majority. Submitting gets to Raft ensures the server is still the leader at the time of the read. Every RSM peer must apply all committed log entries to its KV state in order, not just the ones it submitted, so the replicas stay in sync. When you see a committed entry that another peer submitted, apply it to your KV state but don’t return a result for it (no one on this server is waiting for it).
Hints
- You need to encode your KV operations into the command string passed to raft_submit. Tag each op with this server's me plus a unique id so the handler waiting at index N can tell whether the op that landed there is its op or someone else's.
- Be careful about the gap between raft_submit and commitment. The leader can change, and a different command might be committed at the index you were given.
Client
Implement the cluster-aware client in src/kv_raft/client.c. The client has an rpc_client_t * for each RSM server.
Tasks
- Track the leader. Remember which server responded successfully last time and try it first. An atomic_int on the struct is convenient.
- Retry on KV_WRONG_LEADER. Try the next server.
- Retry on network failure (rpc_call_timeout returned NULL). Try the next server.
- Handle KV_MAYBE correctly. Use the same logic as in Part 1.
Unlike Part 1, the client now cycles through multiple servers. When a server returns KV_WRONG_LEADER or the RPC fails, move on to the next server (wrapping around). Use rpc_call_timeout(..., 1) so a dead server doesn't stall the client for long.
Testing
./test_kvraft test_basic_kvraft
./test_kvraft test_concurrent_kvraft
./test_kvraft test_leader_failure_kvraft
./test_kvraft test_persist_kvraft
./test_kvraft test_partition_kvraft
./test_kvraft test_unreliable_kvraft
./test_kvraft test_many_partitions
./test_kvraft test_persist_concurrent
./test_kvraft test_persist_partition
./test_kvraft test_lock