German University in Cairo Faculty of Media Engineering and Technology CSEN 601 Computer Architecture Dr. Cherif Salama August 18th, 2014
CSEN 601 Computer Architecture
Final Exam Solution
Instructions: Please Read Carefully Before Proceeding.
1. The allowed time for this exam is 3 hours (180 minutes).
2. This exam includes a 5% bonus.
3. Non-programmable calculators are allowed.
4. No books or other aids are permitted for this test.
5. This exam booklet contains 11 pages, including this one. An appendix sheet and an extra sheet of scratch paper are attached and have to be kept attached. Note that if one or more pages are missing, you will lose their points. Thus, you must check that your exam booklet is complete.
6. Please write your solutions in the space provided. If you need more space, please use the back of the sheet containing the problem or the extra sheet and make an arrow indicating that.
7. When you are told that time is up, please stop working on the test.
All the best.
Please, do not write anything on this page.
Question 1 2 3 4 Total
Maximum Marks 20 30 40 15 100
Earned Marks 20 30 40 15 105
Question 1 (20 marks)
[20 marks] Mark each of the following statements as true (✓) or false (✗). Unless otherwise stated, assume that all statements are in the context of the MIPS architecture.

1. In MIPS it is possible to carry out arithmetic operations on memory operands without loading them into registers first. (Answer: ✗)
2. The latest Intel processors are backward compatible with older ones. (Answer: ✓)
3. In the single cycle implementation, the LW instruction is the longest instruction and as such it is the one responsible for determining the clock period. (Answer: ✓)
4. In the MIPS pipeline, both reading and writing the register file can be done during half a clock cycle period. In particular reading is done in the first half while writing is done in the second half. (Answer: ✗)
5. There are 3 types of pipeline hazards: structure, data, and control hazards. (Answer: ✓)
6. Load-use hazards cannot benefit from forwarding at all. (Answer: ✗)
7. A forwarding unit is responsible for forwarding the appropriate operands when a data dependency that can be solved with forwarding is detected. (Answer: ✓)
8. A double data hazard occurs when two consecutive instructions are modifying the same register that is used by the instruction immediately after them. In this case forwarding of the source operand should be done from the output of the MEM stage. (Answer: ✗)
9. A two-bit predictor performs better than a single bit predictor even if the conditional branch is executed exactly once. (Answer: ✗)
10. The two-bit predictor is an example of a dynamic branch predictor. (Answer: ✓)
11. A restartable exception is an exception that is normally handled by restarting the computer. (Answer: ✗)
12. Precise exceptions is the name given to an exception handling mechanism that guarantees that when multiple exceptions arise in the same clock cycle, the exception from the earliest instruction is handled first. (Answer: ✓)
13. A 2-way superscalar processor has a peak IPC of 0.5. (Answer: ✗)
14. Superscalar processors are processors having deeper pipelines (i.e., pipelines with more stages). (Answer: ✗)
15. Dynamic (out-of-order) superscalar processors do not guarantee that instructions are written back in strict program order. (Answer: ✗)
16. In case of memory-mapped I/O, special instructions are used to access I/O registers. (Answer: ✗)
17. Availability can be defined as MTTF/MTBF. (Answer: ✓)
18. Flash memory is faster than magnetic storage due to better latency. (Answer: ✓)
19. RAID systems are used to improve performance and availability. (Answer: ✓)
20. RAID level 1 does not offer any redundancy at all. (Answer: ✗)
Question 2 (10+10+10=30 marks)
Assume that A is an 8-integer array whose starting address is in register $s0. The values 3 to 10 are stored in A in ascending order. Assume that B is another 8-integer array whose starting address is in register $s1. $s0 and $s1 contain the values 150 and 170 respectively. Consider the following MIPS code:

        add  $t0, $s0, $zero
        addi $t1, $zero, 7
        sll  $t1, $t1, 2
        add  $t1, $t1, $s1
    L1: lw   $t2, 0($t0)
        sw   $t2, 0($t1)
        addi $t0, $t0, 4
        addi $t1, $t1, -4
        slt  $t3, $t1, $s1
        beq  $t3, $zero, L1

2.1 [10 marks] Answer the following questions:
a. What are the contents of $t0, $t1, $t2, $t3 at the end of the program above?
b. What are the contents of arrays A and B at the end of the program?
c. How many instructions were executed in total?
d. Does the code above correspond to accessing arrays using indices or using pointers?
Solution:
a. $t0 = 182, $t1 = 166, $t2 = 8, $t3 = 1
b. Only the last element of array A changes, from 10 to 8: A contains the values 3 to 9 in ascending order followed by the value 8. B contains the value 8 followed by the values 9 to 3 in descending order. (A is modified because the two arrays overlap: B starts at address 170, which lies inside the 32 bytes occupied by A at addresses 150 to 181.)
c. In total, 6 x 8 + 4 = 52 instructions are executed.
d. It corresponds to accessing arrays using pointers.
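The answers to 2.1 can be cross-checked by stepping through the program in a small model. The sketch below is only an illustration: the function name `run_copy_loop` and the dictionary-based memory are assumptions made here, and memory is modelled as one word per byte address that the program touches.

```python
def run_copy_loop():
    """Step through the ten-instruction program of Question 2."""
    s0, s1 = 150, 170
    # A holds the values 3..10 at addresses 150, 154, ..., 178.
    mem = {s0 + 4 * i: 3 + i for i in range(8)}
    executed = 0

    t0 = s0 + 0          # add  $t0, $s0, $zero
    t1 = 0 + 7           # addi $t1, $zero, 7
    t1 = t1 << 2         # sll  $t1, $t1, 2
    t1 = t1 + s1         # add  $t1, $t1, $s1
    executed += 4

    while True:
        t2 = mem[t0]                 # L1: lw $t2, 0($t0)
        mem[t1] = t2                 # sw   $t2, 0($t1)
        t0 += 4                      # addi $t0, $t0, 4
        t1 -= 4                      # addi $t1, $t1, -4
        t3 = 1 if t1 < s1 else 0     # slt  $t3, $t1, $s1
        executed += 6                # the five above plus the beq
        if t3 != 0:                  # beq $t3, $zero, L1 falls through
            break

    a = [mem[s0 + 4 * i] for i in range(8)]
    b = [mem[s1 + 4 * i] for i in range(8)]
    return (t0, t1, t2, t3), a, b, executed


registers, array_a, array_b, count = run_copy_loop()
print(registers, array_a, array_b, count)
```

Because B overlaps the tail of A in this model, the stores to addresses 170, 174, and 178 are visible through both arrays, which is what produces the repeated 8.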
2.2 [10 marks] Based on the assembly code above, how many times is the beq instruction executed? How many times is the branch taken? If a 1-bit branch predictor is used to predict the branch outcome, what is the misprediction percentage? Assume that the beq instruction is initially predicted as taken.
What if a 2-bit branch predictor is used? In this case assume that the beq instruction is initially considered to be in the weakly-taken state (the weakly-taken state is the taken state that is directly connected to a not-taken state). List 3 static prediction techniques and state the misprediction percentage of the beq instruction above for each of them.
Solution:
The beq instruction is executed 8 times. The branch is taken the first 7 times it is executed and not taken only the last time. If a 1-bit branch predictor is used with "taken" as the initial prediction, the branch is predicted correctly 7 times and incorrectly only once. The misprediction percentage is 1/8 = 12.5%. The misprediction percentage does not change if a 2-bit branch predictor is used with "weakly-taken" as the initial prediction. For the static prediction techniques:
1. Always taken: In this case the misprediction percentage is 12.5%.
2. Always not-taken: In this case the misprediction percentage is 87.5%.
3. Based on offset sign (backward branches always taken, forward branches always not taken): In this case the misprediction percentage is 12.5% (the beq instruction is a backward branch).

2.3 [10 marks] Translate the assembly code above into machine code (binary) knowing that the opcodes for add, addi, sll, lw, sw, slt, and beq are 0, 8, 0, 35, 43, 0, and 4 respectively and that the function codes for add, sll, and slt are 32, 0, and 42 respectively. It is enough to translate one instruction for each opcode (e.g., there is no need to translate all 3 addi instructions; one addi instruction is enough, as shown in the solution table below).
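The misprediction percentages derived in 2.2 can be cross-checked with a short simulation of the branch outcome sequence (seven taken, then one not taken). This is a sketch under stated assumptions: the function names are invented here, and the 2-bit predictor is modelled as a saturating counter with states 0 to 3, where state 2 plays the role of the weakly-taken state.

```python
def misses_1bit(outcomes, predict_taken=True):
    """1-bit predictor: predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if predict_taken != taken:
            misses += 1
        predict_taken = taken
    return misses


def misses_2bit(outcomes, state=2):
    """2-bit saturating counter (assumed model): states 2 and 3 predict taken."""
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


def misses_static(outcomes, predict_taken):
    """Static predictor: the same prediction on every execution."""
    return sum(1 for taken in outcomes if taken != predict_taken)


def percent(misses, total):
    return 100.0 * misses / total


# beq is executed 8 times: taken 7 times, then not taken once.
outcomes = [True] * 7 + [False]
n = len(outcomes)
print(percent(misses_1bit(outcomes), n))           # 1-bit, initially taken
print(percent(misses_2bit(outcomes), n))           # 2-bit, initially weakly taken
print(percent(misses_static(outcomes, True), n))   # always taken / backward taken
print(percent(misses_static(outcomes, False), n))  # always not taken
```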
Solution:
Line#  Instruction               Equivalent Machine Code
1      add $t0, $s0, $zero       000000 10000 00000 01000 00000 100000
2      addi $t1, $zero, 7        001000 00000 01001 0000000000000111
3      sll $t1, $t1, 2           000000 00000 01001 01001 00010 000000
5      L1: lw $t2, 0($t0)        100011 01000 01010 0000000000000000
6      sw $t2, 0($t1)            101011 01001 01010 0000000000000000
9      slt $t3, $t1, $s1         000000 01001 10001 01011 00000 101010
10     beq $t3, $zero, L1        000100 01011 00000 1111111111111010
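The encodings in the table can be reproduced by packing the fields of the R and I formats into a 32-bit word. The helper names below (`r_type`, `i_type`, `REG`) are illustrative assumptions, and the register numbers follow the ordering listed in Appendix A. The beq offset is the distance in instructions from the instruction after the branch (line 11) back to L1 (line 5), i.e. -6.

```python
# Register numbers for the registers used in Question 2.
REG = {"$zero": 0, "$t0": 8, "$t1": 9, "$t2": 10, "$t3": 11, "$s0": 16, "$s1": 17}


def r_type(rs, rt, rd, shamt, funct):
    """opcode(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)"""
    word = (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct
    return format(word, "032b")


def i_type(opcode, rs, rt, imm):
    """opcode(6) | rs(5) | rt(5) | immediate(16, two's complement)"""
    word = (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)
    return format(word, "032b")


add_word = r_type(REG["$s0"], REG["$zero"], REG["$t0"], 0, 32)  # add $t0, $s0, $zero
sll_word = r_type(0, REG["$t1"], REG["$t1"], 2, 0)              # sll $t1, $t1, 2
lw_word = i_type(35, REG["$t0"], REG["$t2"], 0)                 # lw  $t2, 0($t0)
beq_word = i_type(4, REG["$t3"], REG["$zero"], 5 - 11)          # beq $t3, $zero, L1
print(add_word, sll_word, lw_word, beq_word, sep="\n")
```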
Question 3 (10+5+5+13+7=40 marks)
3.1 [10 marks] The following sequence is called the Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, … It is a sequence that starts with F0 = 0 and F1 = 1. Every number after that is the sum of the two numbers before it (i.e., FN = FN-1 + FN-2). Write a recursive MIPS procedure to compute FN using MIPS assembly. The procedure takes a single integer parameter representing the value of N from the caller and it should return the result to it via standard registers. Your code must compute FN without using loops.

Solution:
    FIB: addi $t0, $zero, 0     # F0 <- 0 (use $t0 as F0)
         addi $t1, $zero, 1     # F1 <- 1 (use $t1 as F1)
         beq  $a0, $t0, L1      # check if N (in $a0) equals 0
         beq  $a0, $t1, L2      # check if N equals 1
         addi $sp, $sp, -12     # prepare the stack for pushing
         sw   $ra, 8($sp)       # push $ra
         sw   $a0, 4($sp)       # push N
         sw   $s0, 0($sp)       # save $s0
         addi $a0, $a0, -1
         jal  FIB               # call fib of (N-1)
         addi $s0, $v0, 0       # save returned value in $s0
         addi $a0, $a0, -1
         jal  FIB               # call fib of (N-2)
         add  $v0, $v0, $s0     # FN <- FN-1 + FN-2
         lw   $s0, 0($sp)       # restore $s0
         lw   $a0, 4($sp)       # pull N
         lw   $ra, 8($sp)       # pull $ra
         addi $sp, $sp, 12
         jr   $ra
    L1:  addi $v0, $zero, 0
         jr   $ra
    L2:  addi $v0, $zero, 1
         jr   $ra
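As a reference for testing the procedure, the same control flow can be written as a high-level model: two base cases, then a call for N-1 whose result is kept (the role of $s0) while the call for N-2 runs. The Python function below is only such a model, not a translation of the stack handling, and its name is chosen here for illustration.

```python
def fib(n):
    """Recursive model of FIB: no loops, two base cases, two recursive calls."""
    if n == 0:              # beq $a0, $t0, L1
        return 0
    if n == 1:              # beq $a0, $t1, L2
        return 1
    first = fib(n - 1)      # result of the first call, kept like $s0
    second = fib(n - 2)     # result of the second call, returned in $v0
    return second + first   # add $v0, $v0, $s0


print([fib(n) for n in range(13)])
```

The list printed for N = 0 to 12 should match the sequence quoted in the question.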
3.2 [5 marks] Which of the instructions listed in Appendix A are not supported by the datapath displayed in Appendix B?

Solution:
The following instructions are not supported by the datapath: lh, lhu, lb, lbu, sb, lui, j, jal, jr

3.3 [5 marks] What are the values of the control signals (except ALUOp) for each of the following instructions: add, addi, lw, sw, beq?

Solution:
        RegDest  Branch  MemRead  MemtoReg  MemWrite  ALUSrc  RegWrite
add     1        0       0        0         0         0       1
addi    0        0       0        0         0         1       1
lw      0        0       1        1         0         1       1
sw      X        0       0        X         1         1       0
beq     X        1       0        X         0         0       0
3.4 [13 marks] Assume a 5-stage pipelined MIPS implementation and consider the following instruction sequence:

    addi $s0, $s0, 20
    lw   $s1, 0($s0)
    add  $s2, $s0, $s1
    add  $s3, $s3, $s0
    add  $s3, $s1, $s4

a. Find all the data dependencies and their types.
b. Assuming the sequence of instructions is executed correctly, that the initial values of $s0, $s1, $s2, $s3, and $s4 are 1000, 200, 50, 7, and 9 respectively and that memory locations 1000 and 1020 contain the values 300 and 150 respectively, what will be the final values of $s0, $s1, $s2, and $s3?
c. List the hazards assuming there is neither a forwarding unit nor a hazard detection unit. What will be the final values of $s0, $s1, $s2, and $s3 in this case?
d. Add nop instructions to eliminate the hazards in the previous case.
e. Assume there is full forwarding. Indicate hazards and add nop instructions to eliminate them.
f. Assuming a clock period of 100 ps, what is the total time to execute this instruction sequence correctly without forwarding? What is the total time in case of full forwarding? What is the speed-up achieved by adding full forwarding? Assume that the clock period is increased by 10% in case of full forwarding.
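Solution:
a. RAW dependencies: $s0 between I1 and I2, $s0 between I1 and I3, $s0 between I1 and I4, $s1 between I2 and I3, $s1 between I2 and I5
   WAW dependencies: $s3 between I4 and I5
   WAR dependencies: $s3 between I4 and I5
b. $s0 = 1020, $s1 = 150, $s2 = 1170, and $s3 = 159
c. The RAW dependencies between instructions that are one or two positions apart are hazards: $s0 between I1 and I2, $s0 between I1 and I3, and $s1 between I2 and I3. The dependencies between I1 and I4 and between I2 and I5 are three positions apart, so the consumer reads the register file in the second half of the cycle in which the producer writes it in the first half, and it gets the new value (this is why $s3 below is computed from the new $s1). $s0 = 1020, $s1 = 300, $s2 = 1200, and $s3 = 309
d.
    addi $s0, $s0, 20
    nop
    nop
    lw   $s1, 0($s0)
    nop
    nop
    add  $s2, $s0, $s1
    add  $s3, $s3, $s0
    add  $s3, $s1, $s4
e. In case of full forwarding the only hazard remaining is the $s1 RAW hazard between I2 and I3.
    addi $s0, $s0, 20
    lw   $s1, 0($s0)
    nop
    add  $s2, $s0, $s1
    add  $s3, $s3, $s0
    add  $s3, $s1, $s4
f. Without forwarding, 5 instructions + 4 nops take 9 + 4 = 13 cycles, so the total time = 13 * 100 ps = 1300 ps
   With full forwarding, 5 instructions + 1 nop take 6 + 4 = 10 cycles, so the total time = 10 * 110 ps = 1100 ps
   The speedup achieved by adding full forwarding = 1300 / 1100 = 1.18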
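The nop counts in parts d and e and the times in part f can be reproduced with a small scheduling model. This is a sketch under assumptions stated here rather than in the exam: without forwarding a consumer must issue at least three slots after its producer (register file written in the first half of a cycle and read in the second half); with full forwarding the required distance is one slot, or two slots when the producer is a load. The names `PROGRAM`, `count_nops`, and `total_time_ps` are illustrative.

```python
# (opcode, destination register, source registers) for I1..I5 of 3.4
PROGRAM = [
    ("addi", "$s0", ["$s0"]),
    ("lw",   "$s1", ["$s0"]),
    ("add",  "$s2", ["$s0", "$s1"]),
    ("add",  "$s3", ["$s3", "$s0"]),
    ("add",  "$s3", ["$s1", "$s4"]),
]


def count_nops(program, forwarding):
    """Count the nops needed so every RAW dependency is satisfied."""
    last_write = {}  # register -> (issue slot, opcode) of its latest writer
    slot = 0
    nops = 0
    for opcode, dest, sources in program:
        earliest = slot
        for reg in sources:
            if reg in last_write:
                writer_slot, writer_op = last_write[reg]
                if forwarding:
                    gap = 2 if writer_op == "lw" else 1
                else:
                    gap = 3
                earliest = max(earliest, writer_slot + gap)
        nops += earliest - slot
        slot = earliest
        last_write[dest] = (slot, opcode)
        slot += 1
    return nops


def total_time_ps(instructions, nops, clock_ps, stages=5):
    """Cycles = issued slots + (stages - 1) cycles to drain the pipeline."""
    return (instructions + nops + stages - 1) * clock_ps


nops_plain = count_nops(PROGRAM, forwarding=False)
nops_fwd = count_nops(PROGRAM, forwarding=True)
time_plain = total_time_ps(len(PROGRAM), nops_plain, 100)
time_fwd = total_time_ps(len(PROGRAM), nops_fwd, 110)  # clock period +10%
print(nops_plain, nops_fwd, time_plain, time_fwd, round(time_plain / time_fwd, 2))
```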
3.5 [7 marks] For the instruction sequence shown in 3.4:
a. What is the total time to execute the same sequence using a single cycle implementation with a clock period of 500 ps? What is the speedup of pipelining (with full forwarding) with respect to the non-pipelined implementation?
b. Assuming the above sequence is executed a huge number of times, what is the speedup (in terms of IPC) achieved when going from a 1-issue (pipelined with full forwarding) processor to a 3-issue statically scheduled processor without any restrictions on the type of instructions that can be grouped in each instruction packet? Assume that the compiler is free to rearrange instructions in case of the 3-issue processor. Show the compiler schedule you assumed.
Solution:
a. Without pipelining (single cycle), the total time = 5 * 500 ps = 2500 ps
   The speedup achieved by pipelining with full forwarding = 2500 / 1100 = 2.27
b. In case of the 1-issue processor the IPC would be 5/6 = 0.833 (since the above sequence has 5 instructions and requires 6 clock cycles). In case of the 3-issue processor, the compiler could schedule the instructions as follows:

    Instruction 1        Instruction 2        Instruction 3
    addi $s0, $s0, 20    nop                  nop
    lw $s1, 0($s0)       add $s3, $s3, $s0    nop
    nop                  nop                  nop
    add $s2, $s0, $s1    add $s3, $s1, $s4    nop

   In this case the IPC would be 5/4 = 1.25. This corresponds to a speedup of 1.25 / 0.833 = 1.5.
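The IPC comparison can be checked directly from the schedule by counting the useful (non-nop) slots per packet. The sketch below uses exact fractions to avoid rounding; the names `SCHEDULE` and `ipc` are chosen here for illustration.

```python
from fractions import Fraction

NOP = "nop"

# One row per issue packet of the assumed 3-issue schedule.
SCHEDULE = [
    ["addi $s0, $s0, 20", NOP, NOP],
    ["lw $s1, 0($s0)", "add $s3, $s3, $s0", NOP],
    [NOP, NOP, NOP],
    ["add $s2, $s0, $s1", "add $s3, $s1, $s4", NOP],
]


def ipc(useful_instructions, cycles):
    """Instructions per cycle as an exact fraction."""
    return Fraction(useful_instructions, cycles)


useful = sum(1 for packet in SCHEDULE for slot in packet if slot != NOP)
ipc_3_issue = ipc(useful, len(SCHEDULE))  # 5 instructions in 4 cycles
ipc_1_issue = ipc(5, 6)                   # 5 instructions + 1 nop per pass
speedup = ipc_3_issue / ipc_1_issue
print(float(ipc_1_issue), float(ipc_3_issue), float(speedup))
```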
Question 4 (8+7=15 marks)
4.1 [8 marks] Consider a 100 MB/s magnetic hard drive having 4 double-sided platters (disks). The platters rotate at 7200 rpm. Each platter’s surface is divided into 16000 tracks and each track is divided into 200 sectors (blocks). The storage capacity of each sector is 4096 bytes. The actual average seek time is 4 ms. Assume the controller overhead is zero and that the disk is initially idle. Answer the following questions:
a. What is the total storage capacity of this hard drive?
b. How much time does it take to read a 4 MB file whose blocks are scattered randomly on the disk?
c. Defragmentation is the process of re-arranging file blocks to occupy consecutive physical blocks instead of being scattered. What is the speed-up achieved when reading the same 4 MB file after defragmentation?
Solution:
a. Total capacity = 4 x 2 x 16000 x 200 x 4096 bytes = 97.656 GBytes
b. Time to read one 4096-byte block = average actual seek time (4 ms) + average rotational latency (½ / (7200/60) s = 4.166 ms) + transfer time (4096 bytes / 100 MB/s = 0.04096 ms) = 8.2 ms. The 4 MB file has (4 * 1024 * 1024 / 4096) = 1024 blocks. Therefore the time needed to read the entire file is 8.2 x 1024 ms = 8.4 seconds.
c. When reading the defragmented 4 MB file, seek and rotation happen only once. The total time in this case would be seek (4 ms) + rotation (4.166 ms) + transfer (41.943 ms) = 50.1 ms. The speedup is 8400/50.1 = 167.66
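The disk arithmetic can be reproduced as follows. This sketch follows the unit conventions the solution appears to use, which are assumptions made explicit here: capacity is reported in units of 2^30 bytes, the 4 MB file is 4 x 2^20 bytes, and the 100 MB/s transfer rate is 100 x 10^6 bytes per second. Without the intermediate rounding used in the solution, the speedup comes out near 167.7 rather than 167.66.

```python
def disk_numbers():
    """Recompute the figures of Question 4.1 without intermediate rounding."""
    capacity_bytes = 4 * 2 * 16000 * 200 * 4096   # platters x sides x tracks x sectors x bytes
    capacity_gib = capacity_bytes / 2**30

    seek_ms = 4.0
    rotation_ms = 0.5 / (7200 / 60) * 1000        # half a revolution on average
    rate_bytes_per_s = 100e6                      # assumed: 1 MB/s = 10^6 bytes/s
    block_ms = 4096 / rate_bytes_per_s * 1000     # transfer time of one sector

    blocks = (4 * 1024 * 1024) // 4096            # sectors in the 4 MB file
    scattered_ms = blocks * (seek_ms + rotation_ms + block_ms)
    contiguous_ms = seek_ms + rotation_ms + blocks * block_ms
    return capacity_bytes, capacity_gib, blocks, scattered_ms, contiguous_ms


capacity_bytes, capacity_gib, blocks, scattered_ms, contiguous_ms = disk_numbers()
print(capacity_gib, blocks, scattered_ms / 1000, contiguous_ms,
      scattered_ms / contiguous_ms)
```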
4.2 [7 marks] What does the acronym RAID stand for? List the different RAID levels you studied with a brief description of each. If we have 4 equally-sized disks full of data that we would like to back up to a RAID system using disks of the same size, how many disks would be needed for each RAID level? For RAID level 2, assume that we need a 4-bit ECC code for each 10 bits.
Solution:
RAID stands for Redundant Array of Inexpensive/Independent Disks. RAID levels:
• RAID 0: No redundancy, just stripe data at block level over multiple disks. Requires 4 disks.
• RAID 1: Mirroring: Data is written to disk and mirror disk. Requires 8 disks.
• RAID 2: Split data at bit level across disks and generate E-bit ECC for each N bits. Requires 4 + ceil(4 x 4/10) = 4 + ceil(1.6) = 6 disks.
• RAID 3: Data striped across disks at byte level, with parity stored on a redundant disk. Requires 5 disks.
• RAID 4: Data striped across disks at block level, with parity stored on a redundant disk. Requires 5 disks.
• RAID 5: Like RAID 4 except that parity blocks are distributed across all disks. Requires 5 disks.
• RAID 6: Like RAID 5 except that it has double the parity blocks. Requires 6 disks.
• Multiple RAID: More advanced systems combining various RAID levels to improve performance.
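The disk counts in the list above follow one rule per level and can be tabulated with a short function. This is a sketch; the function name and parameters are assumptions made for illustration, with the RAID 2 overhead taken as E ECC bits per N data bits, rounded up to whole disks.

```python
import math


def raid_disks(level, data_disks=4, ecc_bits=4, per_bits=10):
    """Disks needed to hold `data_disks` disks of data at a given RAID level."""
    if level == 0:
        return data_disks                                   # striping only
    if level == 1:
        return 2 * data_disks                               # full mirror
    if level == 2:
        return data_disks + math.ceil(data_disks * ecc_bits / per_bits)
    if level in (3, 4, 5):
        return data_disks + 1                               # one disk of parity
    if level == 6:
        return data_disks + 2                               # double parity
    raise ValueError(f"unsupported RAID level: {level}")


print({level: raid_disks(level) for level in range(7)})
```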
Appendix A: MIPS instruction set architecture
• Instructions
  o Arithmetic: add, addi, sub, mul
  o Load/Store: lw, lh, lhu, lb, lbu, sw, sh, sb, lui
  o Logic: sll, srl, and, andi, or, ori, nor
  o Control flow: beq, bne, j, jal, jr
  o Comparison: slt, slti, sltu, sltiu
• Pseudo-instructions
  o move, blt
• Registers (32 in total) listed in numbering order
  o $zero, $at, $v0-$v1, $a0-$a3, $t0-$t7, $s0-$s7, $t8-$t9, $k0-$k1, $gp, $sp, $fp, and $ra
• Instruction formats
  o R-Format (add, sub, mul, sll, srl, and, or, nor, jr, slt, sltu), I-Format (addi, lw, lh, lhu, lb, lbu, sw, sh, sb, lui, andi, ori, beq, bne, slti, sltiu), and J-Format (j, jal)

Appendix B: MIPS datapath (single cycle implementation)