
Tasks

Table of contents

  1. Receiving tasks
  2. Finishing tasks
  3. Autograder

In this part, you will implement the remainder of the basic MapReduce system. Specifically, you will implement the logic that distributes map and reduce tasks to workers, as well as the logic that executes those tasks.

This task must be completed by the checkpoint 1 deadline to receive full credit (more information can be found on Ed).

Receiving tasks

Implement the GetTask RPC to request a task from the coordinator.

The protocol buffers have already been provided for you in proto/coordinator.proto:

message GetTaskReply {
  uint32 job_id = 1;
  string output_dir = 2;
  string app = 3;
  uint32 task = 4;
  string file = 5;
  uint32 n_reduce = 6;
  uint32 n_map = 7;
  bool reduce = 8;
  bool wait = 9;
  repeated MapTaskAssignment map_task_assignments = 10;
  bytes args = 11;
}

Most of the fields correspond directly to fields provided by the SubmitJob RPC. Additionally, the task field should denote the map or reduce task number that this worker is executing. The reduce field tells the worker whether it should execute a map task (false) or a reduce task (true). The wait field should be true only if the worker should become idle and wait before requesting a new task.

For map tasks, file corresponds to the input file that the worker should operate on. For reduce tasks, map_task_assignments provides a list of which workers have which map tasks so that the reduce worker can contact the necessary workers for data.
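The assignment decision described above can be sketched roughly as follows. This is a language-agnostic illustration written in Python, not the provided starter code: GetTaskReply here is a plain stand-in for a subset of the protobuf message, while JobState and next_task are hypothetical names. It also assumes one map task per input file and that reduce tasks are handed out only after every map task has finished.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class GetTaskReply:
    """Plain-Python stand-in for a subset of the GetTaskReply message."""
    job_id: int = 0
    task: int = 0
    file: str = ""
    n_map: int = 0
    n_reduce: int = 0
    reduce: bool = False
    wait: bool = False


@dataclass
class JobState:
    """Hypothetical per-job bookkeeping kept by the coordinator."""
    job_id: int
    files: List[str]  # assumption: one map task per input file
    n_reduce: int
    assigned_maps: Set[int] = field(default_factory=set)
    finished_maps: Set[int] = field(default_factory=set)
    assigned_reduces: Set[int] = field(default_factory=set)


def next_task(job: JobState) -> GetTaskReply:
    """Pick the next task for a worker, or tell it to wait."""
    n_map = len(job.files)
    common = dict(job_id=job.job_id, n_map=n_map, n_reduce=job.n_reduce)

    # Hand out unassigned map tasks first.
    for task in range(n_map):
        if task not in job.assigned_maps:
            job.assigned_maps.add(task)
            return GetTaskReply(task=task, file=job.files[task], **common)

    # Reduce tasks become available only once every map task has finished.
    if len(job.finished_maps) == n_map:
        for task in range(job.n_reduce):
            if task not in job.assigned_reduces:
                job.assigned_reduces.add(task)
                return GetTaskReply(task=task, reduce=True, **common)

    # Nothing to hand out right now: the worker should idle and ask again.
    return GetTaskReply(wait=True)
```

In this sketch, a worker that asks for work while map tasks are still running receives wait = true rather than a reduce task. A real reply for a reduce task would also need to fill in map_task_assignments, which is omitted here.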

Once you are done, sanity-check that tasks are being assigned correctly by inserting logging statements.

Finishing tasks

If a task completes successfully, the worker will notify the coordinator using the FinishTask RPC. Implement this RPC.

Once the coordinator learns that a task is complete, it should update its data structures. Once all map tasks for a job are complete, the coordinator should begin assigning reduce tasks. Once all map and reduce tasks for a job are complete, the coordinator should mark the job complete. Subsequent calls to the PollJob RPC for that job should return done = true.
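As a rough illustration of the bookkeeping above, the following Python sketch tracks finished tasks per job. The names JobProgress, finish_task, reduce_phase_open, and poll_job are hypothetical, and the (task, reduce) parameters are an assumption about what a FinishTask request carries; check proto/coordinator.proto for the actual fields.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class JobProgress:
    """Hypothetical record of which tasks of a job have finished."""
    n_map: int
    n_reduce: int
    finished_maps: Set[int] = field(default_factory=set)
    finished_reduces: Set[int] = field(default_factory=set)
    done: bool = False


def finish_task(job: JobProgress, task: int, reduce: bool) -> None:
    """Record a completed task and update the job's completion flag."""
    # Sets make a repeated notification for the same task harmless.
    if reduce:
        job.finished_reduces.add(task)
    else:
        job.finished_maps.add(task)

    # The job is complete once every map and every reduce task has finished.
    job.done = (
        len(job.finished_maps) == job.n_map
        and len(job.finished_reduces) == job.n_reduce
    )


def reduce_phase_open(job: JobProgress) -> bool:
    """Reduce tasks may be assigned only after all map tasks finish."""
    return len(job.finished_maps) == job.n_map


def poll_job(job: JobProgress) -> bool:
    """Value a PollJob handler could report in its done field."""
    return job.done
```

For a job with two map tasks and one reduce task, reduce_phase_open becomes true after both map completions are recorded, and poll_job becomes true only after the reduce completion is recorded as well.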

Autograder

Once you complete this portion of the assignment, you should be passing the autograder tests up to and including mr-no-duplicates.