**Assignment 4**

**1. Performance Evaluation**

1.1) Consider a MIPS system with a 3 GHz clock and the following instruction types:

*Load* (5 clock cycles), *Store* (5 clock cycles), *R-type* (2 clock cycles), *Branch* (4 clock cycles), and *Jump* (3 clock cycles).

Assume that a program has 40% R-type instructions, 15% Load instructions, 10% Store instructions, 25% Branch instructions, and 10% Jump instructions. You can ignore any latency impact. Calculate its **CPI**, **CPU time** and **MIPS rate**.
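The three quantities asked for follow from the standard definitions: CPI is the frequency-weighted average of the per-type cycle counts, CPU time is instruction count × CPI / clock rate, and the MIPS rate is clock rate / (CPI × 10^6). The sketch below is only an illustration of that arithmetic. The question does not state an instruction count, so `ASSUMED_INSTRUCTION_COUNT` is a placeholder assumption, and the function names are hypothetical.

```python
# Instruction mix from the question: (fraction of instructions, cycles per instruction).
MIX = {
    "R-type": (0.40, 2),
    "Load": (0.15, 5),
    "Store": (0.10, 5),
    "Branch": (0.25, 4),
    "Jump": (0.10, 3),
}
CLOCK_HZ = 3e9  # 3 GHz clock, from the question

# ASSUMPTION: the question gives no instruction count; this value is a placeholder.
ASSUMED_INSTRUCTION_COUNT = 1e9


def average_cpi(mix):
    """Frequency-weighted average of cycles per instruction."""
    return sum(fraction * cycles for fraction, cycles in mix.values())


def cpu_time_seconds(instruction_count, cpi, clock_hz):
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_hz


def mips_rate(cpi, clock_hz):
    """Millions of instructions per second = clock rate / (CPI x 10^6)."""
    return clock_hz / (cpi * 1e6)


cpi = average_cpi(MIX)
cpu_time = cpu_time_seconds(ASSUMED_INSTRUCTION_COUNT, cpi, CLOCK_HZ)
rate = mips_rate(cpi, CLOCK_HZ)

print(f"CPI       = {cpi:.2f}")
print(f"CPU time  = {cpu_time:.4f} s (for the assumed instruction count)")
print(f"MIPS rate = {rate:.2f}")
```

With the given mix this yields a CPI of 3.35 and a MIPS rate of roughly 895.5; the CPU time scales linearly with whatever instruction count is actually intended.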

1.2) An executing program is timed, and it is found that the I/O wait consumes **20s** of the time. You believe that by using parallel processors, you can improve the performance by a factor of **10**. What is the speedup of the system?

1.3) Assume that two computers (named A and B) are executing four loops of a scientific program with the following number of clock cycles (shown in **Table 1**). For a particular benchmark program, loop1 is executed 20 times, loop2 is executed 30 times, loop3 is executed 50 times and loop4 is executed 70 times. What is the mean speedup of the four loops (from A to B)?

| **Loop** | **Comp A (clock cycles)** | **Comp B (clock cycles)** |
| --- | --- | --- |
| 1 | 40 | 30 |
| 2 | 50 | 20 |
| 3 | 30 | 25 |
| 4 | 24 | 15 |

**Table 1**
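The question does not say which mean is intended, so the following sketch computes three common candidates from Table 1: the arithmetic mean and the geometric mean of the per-loop speedups, and the overall speedup of the benchmark when each loop is weighted by its execution count. Which of these the assignment expects is an assumption left to the reader; the variable names are illustrative.

```python
import math

# Clock cycles per loop (Table 1) and how often each loop runs in the benchmark.
CYCLES_A = [40, 50, 30, 24]
CYCLES_B = [30, 20, 25, 15]
RUNS = [20, 30, 50, 70]

# Speedup of B over A for each loop, taken in isolation.
per_loop = [a / b for a, b in zip(CYCLES_A, CYCLES_B)]

arithmetic_mean = sum(per_loop) / len(per_loop)
geometric_mean = math.prod(per_loop) ** (1 / len(per_loop))

# Total benchmark cycles, weighting each loop by its execution count.
total_a = sum(cycles * runs for cycles, runs in zip(CYCLES_A, RUNS))
total_b = sum(cycles * runs for cycles, runs in zip(CYCLES_B, RUNS))
overall_speedup = total_a / total_b

print("per-loop speedups :", [round(s, 3) for s in per_loop])
print("arithmetic mean   :", round(arithmetic_mean, 3))
print("geometric mean    :", round(geometric_mean, 3))
print("weighted overall  :", round(overall_speedup, 3))
```

The weighted figure compares total cycles (5480 on A against 3500 on B), so it reflects the benchmark as a whole rather than an average of four ratios.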

**2. Processor Data Path**

2.1) Briefly describe the following terms:

2.1.1) System instruction set

2.1.2) Instruction decoding

2.1.3) Reduced Instruction Set Computer (RISC)

2.1.4) Complex Instruction Set Computer (CISC)

2.2) Show the **four state elements** of a MIPS system and briefly describe each of them.

2.3) Show the MIPS **single-cycle data path** for the following instruction (where **S0** and **S1** are 32-bit general-purpose registers and S0 has been initialized to zero):

**lw S1, 15(S0)**

2.4) Consider the following MIPS assembly code (where S0, S1, and S6 are 32-bit general-purpose registers and S0 has been initialized to zero):

i) **addi S0, S0, 7**

ii) **lw S6, 6(S0)**

iii) **addi S0, S0, 3**

iv) **sw S1, 7(S0)**

2.4.1) Show the **immediate value** (in hexadecimal) in register **S0** after the execution of instruction (i).

2.4.2) Show the **physical memory addresses** (in hexadecimal) of instructions (ii) and (iv).
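For `lw` and `sw`, the address is formed as base register plus sign-extended offset, so tracing S0 through the four instructions is enough to find both addresses. The sketch below is a minimal trace under two assumptions: it models only S0 (the only register used as a base here), and it ignores word alignment and address translation, both of which a real MIPS implementation would enforce. The `trace` helper is hypothetical.

```python
# The four instructions, reduced to (opcode, immediate). Every instruction
# here uses S0 as its base/source register, and only addi writes S0.
PROGRAM = [
    ("addi", 7),  # i)   addi S0, S0, 7
    ("lw", 6),    # ii)  lw   S6, 6(S0)
    ("addi", 3),  # iii) addi S0, S0, 3
    ("sw", 7),    # iv)  sw   S1, 7(S0)
]


def trace(program, s0=0):
    """Return (opcode, S0 after the instruction, effective address or None)."""
    rows = []
    for opcode, immediate in program:
        address = None
        if opcode == "addi":
            s0 = (s0 + immediate) & 0xFFFFFFFF
        else:
            # lw/sw: effective address = base register + offset.
            # ASSUMPTION: alignment checks are ignored in this sketch.
            address = (s0 + immediate) & 0xFFFFFFFF
        rows.append((opcode, s0, address))
    return rows


rows = trace(PROGRAM)
for opcode, s0, address in rows:
    shown = "-" if address is None else f"0x{address:08X}"
    print(f"{opcode:<5} S0 = 0x{s0:08X}  address = {shown}")
```

Under these assumptions S0 holds 0x00000007 after instruction (i), and instructions (ii) and (iv) access 0x0000000D and 0x00000011 respectively. Neither address is a multiple of four, which is worth noting when answering.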

2.5) Assume that a user on a MIPS system creates the following (C/C++/Java) source code (where i, j, f, g, and h are integer variables):

**if(i == j)**

**f = g + h;**

**else**

**f = f - 1;**

Show its equivalent **assembly code**.

**3. Pipelined Processor System**

3.1) Consider an **un-pipelined** machine with five stages (Instruction Fetch, Instruction Decode/Register Fetch, Execute/Address Calculation, Memory Access and Write Back). Assume that it has a **1 ns** clock cycle. The machine uses four cycles for ALU operations and branches, and five cycles for memory operations. Assume that the relative frequencies of ALU operations, branches, and memory operations are 45%, 15% and 40% respectively. **Pipelining the machine** adds **1.5 ns** of overhead to the clock. Find out how much **speedup** we will gain in the instruction execution rate. You can ignore any latency impact.
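One common way to approach this kind of question is to compare the average instruction time on the un-pipelined machine with the clock period of the pipelined machine. The sketch below follows that approach and assumes an ideal pipeline that completes one instruction per cycle with no stalls; that assumption and the variable names are not part of the question.

```python
CLOCK_NS = 1.0      # un-pipelined clock cycle, from the question
OVERHEAD_NS = 1.5   # overhead added to the clock by pipelining, from the question

# (relative frequency, cycles) for ALU operations, branches, memory operations.
MIX = [(0.45, 4), (0.15, 4), (0.40, 5)]

# Average time per instruction without pipelining.
avg_unpipelined_ns = CLOCK_NS * sum(freq * cycles for freq, cycles in MIX)

# ASSUMPTION: the pipelined machine has an ideal CPI of 1, so its average
# time per instruction equals its (slower) clock period.
pipelined_ns = CLOCK_NS + OVERHEAD_NS

speedup = avg_unpipelined_ns / pipelined_ns

print(f"un-pipelined: {avg_unpipelined_ns:.2f} ns per instruction")
print(f"pipelined   : {pipelined_ns:.2f} ns per instruction")
print(f"speedup     : {speedup:.2f}")
```

With the stated figures the average un-pipelined instruction takes 4.4 ns against a 2.5 ns pipelined clock, a speedup of about 1.76.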

3.2) A single-cycle MIPS system is pipelined by subdividing its data path into **five pipeline stages** (5-stage pipeline). Briefly describe the **pipeline stages** of the processor.

3.3) Consider the following assembly code sequence, which is currently running on a 5-stage pipelined MIPS system. Show the pipeline sequence diagram for the given instruction stream. If any **data hazards** are detected among the instructions, indicate them in the pipeline sequence diagram together with the solution(s) taken by the system (where S0, S1, S2, S3, S4, S5, S6, S7, S8, S9 and S10 are 32-bit general-purpose registers).

**addi S1, S0, 5**

**add S4, S1, S5**

**xor S10, S3, S4**

**lw S6, 8(S5)**

**or S3, S6, S7**

**and S2, S6, S9**
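Before drawing the diagram, it can help to list the read-after-write (RAW) dependencies mechanically. The sketch below scans the stream for a source register that was written by one of the two preceding instructions. It assumes the classic 5-stage design with full forwarding and a register file that writes in the first half of a cycle and reads in the second, so only a load followed immediately by a consumer needs a stall. The destination and source registers were transcribed by hand, and `raw_hazards` is a hypothetical helper.

```python
# Each entry: (instruction text, destination register, source registers).
STREAM = [
    ("addi S1, S0, 5", "S1", ("S0",)),
    ("add S4, S1, S5", "S4", ("S1", "S5")),
    ("xor S10, S3, S4", "S10", ("S3", "S4")),
    ("lw S6, 8(S5)", "S6", ("S5",)),
    ("or S3, S6, S7", "S3", ("S6", "S7")),
    ("and S2, S6, S9", "S2", ("S6", "S9")),
]


def raw_hazards(stream, window=2):
    """Return (producer index, consumer index, register, needs_stall) tuples.

    ASSUMPTION: only producers at most `window` instructions earlier matter,
    and with forwarding only a load-use pair at distance 1 needs a stall.
    The scan does not check for an intermediate overwrite of the register,
    which does not occur in this stream.
    """
    found = []
    for j, (_, _, sources) in enumerate(stream):
        for i in range(max(0, j - window), j):
            text, dest, _ = stream[i]
            if dest in sources:
                load_use = text.startswith("lw") and j - i == 1
                found.append((i, j, dest, load_use))
    return found


hazards = raw_hazards(STREAM)
for i, j, reg, stall in hazards:
    fix = "stall one cycle, then forward" if stall else "forward"
    print(f"{STREAM[j][0]:<16} reads {reg} written by {STREAM[i][0]:<16} -> {fix}")
```

Under these assumptions the scan reports four dependencies: S1 (addi to add), S4 (add to xor), and S6 twice (lw to or, lw to and), with only the lw/or pair requiring a stall.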

3.4) What is a **structural hazard**? Which stages of the 5-stage pipelined MIPS system are prone to structural hazards? Describe in detail.

3.5) Assume that the MIPS system incurs an average of 0.84 stall clock cycles per instruction during its pipelined execution. Find the **speedup** of the system.
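A frequently used textbook relation for this situation is speedup = pipeline depth / (1 + stall cycles per instruction). The sketch below applies it under several assumptions that the question does not state explicitly: a 5-stage pipeline, an ideal CPI of 1, perfectly balanced stages, no clock overhead, and an un-pipelined machine as the baseline.

```python
PIPELINE_DEPTH = 5             # ASSUMPTION: the 5-stage MIPS pipeline
STALLS_PER_INSTRUCTION = 0.84  # from the question


def pipeline_speedup(depth, stalls_per_instruction):
    """Speedup over an un-pipelined machine, assuming ideal CPI = 1,
    balanced stages, and no pipelining overhead on the clock."""
    return depth / (1 + stalls_per_instruction)


speedup = pipeline_speedup(PIPELINE_DEPTH, STALLS_PER_INSTRUCTION)
print(f"speedup = {speedup:.3f}")
```

With no stalls the same function returns the pipeline depth, which is the ideal upper bound; the 0.84 stall cycles reduce it to roughly 2.72.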

**4. Memory System**

4.1) “A 32-bit MIPS system needs a 5-bit addressing mechanism to access its **register file**.” Point out the reason(s).

4.2) Discuss the importance of **dynamic data segment** memory in MIPS systems (*your description should show how the memory sections of this segment support dynamic data allocation during run-time*).

4.3) Consider the following MIPS instruction (where S0 and S1 are two 32-bit general purpose integer registers):

**sw S1, 15(S0)**

Assume that S0 is initialized to zero and S1 holds the 32-bit number **0xFF223344**. Show the **byte-addressable memory** of the MIPS system after the execution of the store instruction.
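The layout depends on byte order, which the question does not specify; MIPS hardware can be configured either way. The sketch below therefore prints both layouts for the four bytes starting at the effective address 0 + 15. It models memory as a plain dictionary and ignores alignment, which is an assumption: address 15 is not a multiple of four, and a real MIPS `sw` requires a word-aligned address. The `store_word` helper is hypothetical.

```python
def store_word(memory, address, value, byteorder):
    """Write a 32-bit value into byte-addressable memory, one byte per address."""
    for offset, byte in enumerate(value.to_bytes(4, byteorder)):
        memory[address + offset] = byte


S0 = 0
S1 = 0xFF223344
address = S0 + 15  # effective address of sw S1, 15(S0)

layouts = {}
for byteorder in ("big", "little"):
    memory = {}
    # ASSUMPTION: alignment is not checked in this sketch.
    store_word(memory, address, S1, byteorder)
    layouts[byteorder] = memory
    print(f"{byteorder}-endian:")
    for addr in sorted(memory):
        print(f"  0x{addr:08X}: 0x{memory[addr]:02X}")
```

In the big-endian layout the most significant byte 0xFF lands at address 0x0000000F; in the little-endian layout the least significant byte 0x44 lands there instead.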

4.4) Consider a **4 MB** cache (assume that there is no level-2 cache in the system) and a **4 GB** main memory (organization: 2G x 16). The size of a main memory block is 128 bits (**eight** 16-bit words). Assume that the processor can only access one 16-bit word at a time from the cache. Based on the given cache, memory block, and main memory details, show the address fields used by the processor to access the following cache organizations:

4.4.1) Direct

4.4.2) Fully associative

4.4.3) 4-way set associative
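The field widths for all three organizations can be derived from the same quantities: the number of addressable words, the number of words per block, and the number of sets in the cache. The sketch below assumes word addressing (one address per 16-bit word, matching the 2G x 16 organization), so the full address is 31 bits wide. The function and constant names are illustrative.

```python
WORD_BITS = 16
MEMORY_WORDS = 2 * 2**30   # 2G words of 16 bits = 4 GB
CACHE_BYTES = 4 * 2**20    # 4 MB cache
BLOCK_WORDS = 8            # eight 16-bit words = 128-bit block

CACHE_BLOCKS = CACHE_BYTES * 8 // (WORD_BITS * BLOCK_WORDS)


def log2_exact(n):
    """Base-2 logarithm of a power of two, as an integer."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


def address_fields(ways):
    """Return (tag, index, word offset) widths for a cache with `ways` blocks per set.

    ASSUMPTION: addresses select 16-bit words, not bytes.
    ways == 1 is direct mapped; ways == CACHE_BLOCKS is fully associative.
    """
    address_bits = log2_exact(MEMORY_WORDS)
    offset_bits = log2_exact(BLOCK_WORDS)
    index_bits = log2_exact(CACHE_BLOCKS // ways)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


direct = address_fields(1)
fully_associative = address_fields(CACHE_BLOCKS)
four_way = address_fields(4)

for name, fields in [("direct", direct),
                     ("fully associative", fully_associative),
                     ("4-way set associative", four_way)]:
    tag, index, offset = fields
    print(f"{name:<22} tag={tag:>2}  index/set={index:>2}  word={offset}")
```

Under the word-addressing assumption the widths come out as 10/18/3 for direct mapping, 28/0/3 for fully associative, and 12/16/3 for 4-way set associative. If byte addressing were intended instead, one more offset bit would be needed.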

4.5) Describe the following terms under a **read/write** context:

4.5.1) cache hit

4.5.2) cache miss