General Information:
This is a closed book and one 2-sided handwritten note examination. You have 80 minutes to answer as many questions as possible. The number in parentheses at the beginning of each question indicates the number of points for that question. You should read all of the questions before starting the exam, as some of the questions are substantially more time consuming.

Write all of your answers directly on this paper. Make your answers as concise as possible. If there is something in a question that you believe is open to interpretation, then please ask us about it!

Good Luck!!

<table>
<thead>
<tr>
<th>QUESTION</th>
<th>POINTS ASSIGNED</th>
<th>POINTS OBTAINED</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>20</td>
<td></td>
</tr>
<tr>
<td>2</td>
<td>20</td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>15</td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>20</td>
<td></td>
</tr>
<tr>
<td>5</td>
<td>15</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>10</td>
<td></td>
</tr>
<tr>
<td>TOTAL</td>
<td>100</td>
<td></td>
</tr>
</tbody>
</table>
1. (20 points total) Short answer questions.
   a. (8 points) True/False and Why? **CIRCLE YOUR ANSWER.**
      i) A lightweight process with one thread is equivalent to a heavyweight process.

         **TRUE**
         Why?
         **TRUE.** A heavyweight process has only one thread. The correct answer was worth 2 points and the justification was worth an additional 2 points.

         **FALSE**
         Why?
         **FALSE.** The OS automatically loads pages from disk when necessary. The correct answer was worth 2 points and the justification was worth an additional 2 points.

      ii) Demand paging requires the programmer to take specific action to force the operating system to load a particular virtual memory page.

         **TRUE**
         Why?
         **FALSE.** The OS automatically loads pages from disk when necessary. The correct answer was worth 2 points and the justification was worth an additional 2 points.

   b. (8 points) Two-level Page Tables:
      i) Give a two to three sentence description of a two-level page table.
         
         A two-level page table uses two levels of page tables where a pagetable pointer points to the top-level table. Entries in the top-level table point to the lower-level page tables. The lower level tables contain PTEs pointing to the physical locations.

      ii) Briefly (2 sentences) state one advantage AND one disadvantage of two-level page tables.
         
         Advantage: Can map sparse address spaces efficiently
         Disadvantage: Requires an additional memory reference to translate virtual to physical addresses.
c. (4 points) List the four requirements for deadlock.

Mutual exclusion, non-preemptable resources, hold and wait, circular chain of waiting.

Each requirement was worth 1 points.
2. (20 points total) Consider the following two functions implementing a producer and consumer by using monitors:

```c
void send(item) {
    lock.acquire()
    enqueue(item);
    printf("before signal()\n");
    dataready.signal(&lock);
    printf("after signal()\n");
    lock.release();
}

item = get() {
    lock.acquire();
    while (queue.isEmpty()) {
        printf("before wait()\n");
        dataready.wait(&lock);
        printf("after wait()\n");
    }
    item = dequeue();
    lock.release();
}
```

a. (4 points) Use no more than three sentences to contrast Hoare and Mesa monitors.

*With Hoare the signaler gives the CPU and the lock to the waiter; With Mesa the signaler schedules the waiter, and then finishes.*

-1 if it was not clear that a thread in a Mesa monitor can hold the lock indefinitely after signaling.
-1 if the signaled/woken thread is put on a wait queue instead of the ready queue.
b. (5 points) Assume two threads T1 and T2, as follows:

\[
\begin{align*}
&T1 \\
&\text{send(item);} \\
&T2 \\
&\text{item = get();}
\end{align*}
\]

What are the possible outputs if the monitor uses the Hoare implementation?

- before signal
- after_signal

- We gave 2 points for the above solution.

- before wait
- before signal
- after wait
- after signal

- We gave 3 points for the above solution.
- -2 if there were more than two answers, but at least 1 point if “something” was right.

c. (5 points) Repeat question (b) for a Mesa implementation of the monitor.

- before signal
- after_signal

- We gave 2 points for the above.

- before wait
- before signal
- after signal
- after wait

- We gave 3 points for the above.
- -2 if there were more than two answers, but at least 1 point if “something” was right.
d. (6 points) Now assume a third thread T3, i.e.,

\[
\begin{array}{c}
T1 & T2 & T3 \\
\text{send(item);} & \text{item = get();} & \text{send(item);} \\
\end{array}
\]

What are the possible outputs if the monitor uses the Hoare implementation? Please specify from which thread does an output come by specifying the thread id in front of the output line, e.g., [T1] before signal or [T2] after wait.

\[
\begin{align*}
[T1] \text{ before signal} & \quad [T3] \text{ before signal} \\
[T1] \text{ after signal} & \quad [T3] \text{ after signal} \\
[T3] \text{ before signal} & \quad [T1] \text{ before signal} \\
[T3] \text{ after signal} & \quad [T1] \text{ after signal} \\
\end{align*}
\]

- 1 point for each of the above ones.

\[
\begin{align*}
[T2] \text{ before wait} & \quad [T2] \text{ before wait} \\
[T1] \text{ before signal} & \quad [T3] \text{ before signal} \\
[T2] \text{ after wait} & \quad [T2] \text{ after wait} \\
[T1] \text{ after signal} & \quad [T3] \text{ after signal} \\
[T3] \text{ before signal} & \quad [T1] \text{ before signal} \\
[T3] \text{ after signal} & \quad [T1] \text{ after signal} \\
\end{align*}
\]

- 2 point for each of the above ones.
- --1 for each answer beyond four.
3. (15 points) Design tradeoffs (15 points total):

You’ve been hired by Orange Computer to help design a new processor and Orange Pro laptop. After choosing the display, case, and other components, you are left with $460 to spend on the following components:

<table>
<thead>
<tr>
<th>Item</th>
<th>Latency</th>
<th>Minimum Size</th>
<th>Cost</th>
</tr>
</thead>
<tbody>
<tr>
<td>TLB</td>
<td>10 ns</td>
<td>256 entries</td>
<td>$0.10/entry</td>
</tr>
<tr>
<td>Main memory</td>
<td>180 ns</td>
<td>2 GB</td>
<td>$10/GB</td>
</tr>
<tr>
<td>Magnetic Disk</td>
<td>8 ms (8M ns)</td>
<td>300 GB</td>
<td>$0.10/GB</td>
</tr>
</tbody>
</table>

The page size is fixed at 64 KB. Assume you want to run up to 20 applications simultaneously. Each application has an overall maximum size of 1 GB and a working set size of 256 MB. TLB entries do not have Process Identifiers. Discuss how you would divide the available funds across the various items to optimize performance.

We start with the disk. Since the disk is the slowest component of the system, we take the minimum size, 300 GB or $30, leaving us with $430. Since the TLB does not contain process identifiers, we only need the minimum number of entries to map the working set for a single process – 256 MB / 64 KB = 4,096 entries or $409.60, leaving us with $21.40. With the remaining money, we could buy 2 GB. However, since the maximum number of applications is 20 each with a working set size of 256 MB, we should provide 5 GB of RAM ($50) to avoid paging, so we should spend $380 on 3,800 entries. While this will cause TLB misses, it does not make sense to increase the TLB any more, since that would require that we decrease the memory size below the requirements for the applications; a situation that will cause the system to start paging.

We awarded 3 points per choice, based upon the reasonableness of the choices. We used the following table:

<table>
<thead>
<tr>
<th>Item / Points</th>
<th>3 points</th>
<th>2 points</th>
<th>1 point</th>
<th>0 points</th>
</tr>
</thead>
<tbody>
<tr>
<td>(a) TLB</td>
<td>3,800 entries</td>
<td>&gt; 3,800</td>
<td>&lt; 3,800</td>
<td>&lt; 256</td>
</tr>
<tr>
<td>(b) Memory</td>
<td>5 GB</td>
<td>&gt; 5 GB</td>
<td>&lt; 5 GB</td>
<td>&lt; 2 GB</td>
</tr>
<tr>
<td>(c) Disk</td>
<td>300 GB</td>
<td>&gt; 300GB, if using extra money for disk instead of memory or TLB</td>
<td>&gt; 300GB</td>
<td>&lt; 300GB</td>
</tr>
</tbody>
</table>

(d) We awarded another 3 points if the TLB answer was based upon an analysis of a single application's working set size (i.e., since only mapping the working set matters and the TLB does not include process identifiers).
(e) We awarded an additional three points based upon the overall reasonableness of the answer. For example, a system with a small amount of paging would lose a point, while a system with a significant amount of paging (e.g., only 2 GB of memory), would lose two points.
4. (20 points) Concurrency control: Consider the following pseudocode that aims to implement a solution for the Dining Philosopher problem. Note that a philosopher can use any chopstick.

```java
// assume chopstick[i].status = FREE, for 1 <= i <= N
get_chopstick(boolean hold_one_chopstick) {
    lock.acquire();
    for (i = 1; i <= N; i++) {
        if (chopstick[i].status == FREE) {
            chopstick[i].status = BUSY;
            return i;
        }
    }
    lock.release();
    return -1;
}

release_chopstick(i) {
    if (i == -1) return;
    chopstick[i].status = FREE;
}

philosopher() {
    plate = FULL;
    while (plate == FULL) {
        chopstick1 = get_chopstick(FALSE);
        if (chopstick1 != -1) {
            chopstick2 = get_chopstick(TRUE);
            plate = EMPTY;
            release_chopstick(chopstick2);
        }
        release_chopstick(chopstick1);
    }
}

main() {
    for (i = 1; i <= N; i++) {
        thread_fork(philosopher());
    }
}
```

a. (2 points) Name an error in how synchronization primitives are used in `get_chopstick()`

*There is a missing `lock.release()` before return i.*
b. (10 points) After fixing the error in part (a), does the program work correctly? If it
does not, give a simple example to show how the program fails, and provide a fix.
If it does, use no more than three sentences to argue why it works.

Is not guaranteed to work. Every philosopher can get a chopstick, fail to get the
second one, release the chopstick they hold, and repeat!

- We gave 5 points for an example.

Add code to get_chopstick() to not give the last chopstick to a philosopher
if that philosopher doesn’t already own a chopstick. For example:

```c
    cnt = 0;
    for (i = 1; i < N; i++) {
        if (chopstick[i].status == FREE) {
            cnt++;
        }
    }
    if (cnt == 1 && hold_one_chopstick == FALSE) {
        return -1;
    }
```

- We subtracted 3 points if you did not give the above solution.
- We subtracted 1 point if you said it in words, but not give the code.
- We subtracted 1 point if your solution was to use mutex around the entire
  body of philosopher(), as we considered this solution to do
  “excessive locking”.

In addition, you need to check for chopstick2 in philosopher(), i.e.,
if (chopstick2 != -1) { plate = EMPTY; }

- We subtracted 2 point if you did not provide the above fix.

(We also need to protect the body of release_chopstick(),
by lock.acquire() and lock.release(). However, we did not
subtract any points for this.)

c. (8 points) Assume main() launches N+1 philosopher threads, instead of N. Will
the program work correctly given the changes you made for parts (a) and (b)? If it
does not, give a simple example to show how the program fails, and provide a fix.
If it does, use no more than three sentences to argue why it works.

No change needed, as the modified code in (b) will guarantee that the last chopstick
will always be picked by a philosopher that already has another chopstick.
5. (15 points total) Scheduling.
   a. (15 points) Consider the following processes, arrival times, and CPU processing requirements:

<table>
<thead>
<tr>
<th>Process Name</th>
<th>Arrival Time</th>
<th>Processing Time</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>3</td>
</tr>
<tr>
<td>2</td>
<td>1</td>
<td>5</td>
</tr>
<tr>
<td>3</td>
<td>3</td>
<td>2</td>
</tr>
<tr>
<td>4</td>
<td>9</td>
<td>2</td>
</tr>
</tbody>
</table>

   For each scheduling algorithm, fill in the table with the process that is running on the CPU (for timeslice-based algorithms, assume a 1 unit timeslice). For RR and SRTF, assume that an arriving thread is run at the beginning of its arrival time, if the scheduling policy allows it. The turnaround time is defined as the time a process takes to complete after it arrives.

<table>
<thead>
<tr>
<th>Time</th>
<th>FIFO</th>
<th>RR</th>
<th>SRTF</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>2</td>
<td>1</td>
</tr>
<tr>
<td>2</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td>2</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>4</td>
<td>2</td>
<td>2</td>
<td>3</td>
</tr>
<tr>
<td>5</td>
<td>2</td>
<td>1</td>
<td>2</td>
</tr>
<tr>
<td>6</td>
<td>2</td>
<td>3</td>
<td>2</td>
</tr>
<tr>
<td>7</td>
<td>2</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>8</td>
<td>3</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>9</td>
<td>3</td>
<td>4</td>
<td>2</td>
</tr>
<tr>
<td>10</td>
<td>4</td>
<td>2</td>
<td>4</td>
</tr>
<tr>
<td>11</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td>Average Turnaround Time</td>
<td>(\frac{3+7+7+3}{4} = \frac{13}{4} = 3.25)</td>
<td>(\frac{6+10+4+3}{4} = \frac{23}{4} = 5.75)</td>
<td>(\frac{3+2+9+3}{4} = \frac{17}{4} = 4.25)</td>
</tr>
</tbody>
</table>

   Each column is worth 5 points: 3 for correctness of the schedule (we deducted 1/2/3 points if you made minor/intermediate/major mistakes), and 2 for the average Turnaround time (1 point was deducted for minor errors).
6. (10 points total) Caching: Assume a computer system employing a cache, where the access time to the main memory is 100 ns, and the access time to the cache is 20 ns.

   a. (2 points) Assume the cache hit rate is 95%. What is the average access time?

   \[
   \text{Average Access Time} = \text{Hit} \cdot \text{cache\_access\_time} + (1-\text{Hit}) \cdot \text{memory\_access\_time} \\
   = 0.95 \cdot 20\text{ ns} + 0.05 \cdot 100\text{ ns} = 24\text{ ns}
   \]

   Alternatively, we accepted solutions that included the cache time in the memory access time: 
   \[
   \text{AAT} = 0.95 \cdot 20\text{ ns} + 0.05 \cdot (20\text{ ns} + 100\text{ ns}) = 25\text{ ns}.
   \]

   We subtracted one point for minor errors.

   b. (2 points) Assume the system implements virtual memory using a two-level page table with no TLB, and assume the CPU loads a word X from main memory. Assume the cache hit rate for the page entries as well as for the data in memory is 95%. What is the average time it takes to load X?

   The Average Memory Access Time for X (AMAT) requires three memory accesses, two for each page entry, and one for reading X: \(3 \times 24 = 72\text{ ns}\). The alternate solution from (a) yields \(3 \times 25 = 75\text{ ns}\). We only accepted the alternate solution for (b) if you derived the same value for (a).

   c. (3 points) Assume the same setting as in point (b), but now assume that page translation is cached in the TLB (the TLB hit rate is 98%), and the access time to the TLB is 16 ns. What is the average access time to X?

   \[
   \text{AAT}_X = \text{TLB\_hit} \cdot (\text{TLB\_access\_time} + \text{AAT}) + (1-\text{TLB\_hit}) \cdot (3 \cdot \text{AAT}) \\
   = 0.98 \cdot (16\text{ ns} + 24\text{ ns}) + 0.02 \cdot (72\text{ ns}) = 0.98 \cdot 40\text{ ns} + 1.44\text{ ns} = 40.64\text{ ns}
   \]

   Alternate AAT from (a): \(0.98 \cdot (16\text{ ns} + 25\text{ ns}) + 0.02 \cdot (75\text{ ns}) = 41.68\text{ ns}\)

   It was acceptable to include the TLB time in the TLB miss calculation:

   \[
   \text{TLB\_hit} \cdot (\text{TLB\_time} + \text{AMAT}) + (1-\text{TLB\_hit}) \cdot (3 \cdot \text{AMAT} + \text{TLB\_time}) \\
   = 0.98 \cdot (16\text{ ns} + 24\text{ ns}) + 0.02 \cdot (72\text{ ns} + 16\text{ ns}) = 0.98 \cdot 40\text{ ns} + 0.02 \cdot 88\text{ ns} = 40.96\text{ ns}
   \]

   Alternate AAT from (a): \(0.98 \cdot (16\text{ ns} + 25\text{ ns}) + 0.02 \cdot (75\text{ ns} + 16\text{ ns}) = 42\text{ ns}\)

   We subtracted one point for each minor error.

   d. (3 points) Assume we increase the cache size. Is it possible that this increase to lead to a decrease in the cache hit rate? Use no more than three sentences to explain your answer.

   Yes, using a FIFO replacement scheme could result in Belady’s anomaly. Also, using the same hash function while increasing the cache size could cause more collisions and reduce the hit rate. The correct answer was worth one point and the justification was worth two points.
This page intentionally left blank

Do not write answers on this page