CS2100 (AY2013/2014 Semester 2) - 1 of 3 - Assignment #4 Answers
CS2100 (AY2013/2014 Semester 2)
Assignment #4 Answers
You are to do this assignment on your own. (Students found copying will be penalised.) Fill in
your name and tutorial group in the box above, and your answers in the space indicated below.
Working is not required.
Submit this assignment before 17 April 2014, Thursday, 5pm into the IVLE workbin. Late
submissions will not be accepted. Please give your file a name that includes your matriculation
number (e.g. A0071234X.doc or A0071234X.pdf).
1. [15 marks] Fill in the timing charts below with the given processor characteristics.
Each instruction is worth 1 mark. No partial marking, i.e. you need to get all stages for
an instruction correct. The timing charts are independent from one another. Parts (b),
(c), and (d) assume full data forwarding paths.
a. All data forwarding paths are implemented for RAW data hazards.

   Cycle               1   2   3   4   5   6   7
   add $1, $2, $3      F   D   E   M   W
   sub $2, $3, $1          F   D   E   M   W

   Cycle               1   2   3   4   5   6   7
   lw $1, 16($2)       F   D   E   M   W
   sub $2, $3, $1          F   D   STL E   M   W
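
The difference between the two charts in part (a) comes down to when the producer's value exists. The sketch below is a rough illustration, assuming a classic 5-stage F/D/E/M/W pipeline in which an ALU result is available at the end of E and a loaded value at the end of M. The function name and its parameters are illustrative and not part of the assignment.

```python
def forwarding_stalls(producer_is_load, distance=1):
    """Stall cycles for a consumer issued `distance` instructions after the producer.

    Assumes a 5-stage F/D/E/M/W pipeline with full forwarding, where the
    consumer needs its operand at the start of its E stage.
    """
    producer_e_cycle = 3  # the producer is fetched in cycle 1, so E is cycle 3
    # ALU results exist after E; loaded values only after M (one cycle later).
    ready_cycle = producer_e_cycle + (2 if producer_is_load else 1)
    needed_cycle = producer_e_cycle + distance  # consumer's E stage if never stalled
    return max(0, ready_cycle - needed_cycle)


print(forwarding_stalls(producer_is_load=False))  # add -> sub: prints 0
print(forwarding_stalls(producer_is_load=True))   # lw  -> sub: prints 1
```

Under these assumptions the load-use pair needs exactly one STL, while the add-sub pair needs none.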
b. Early branching is implemented. Note: BTA refers to the instruction at the branch target address.

   Cycle               1   2   3   4   5   6   7
   beq $1, $2, there   F   D   E   M   W
   BTA                     STL F   D   E   M   W

   Cycle               1   2   3   4   5   6   7   8
   lw $3, 20($1)       F   D   E   M   W
   beq $1, $2, there       F   D   E   M   W
   BTA                         STL F   D   E   M   W

   Cycle               1   2   3   4   5   6   7   8   9
   add $1, $2, $3      F   D   E   M   W
   beq $1, $2, there       F   STL D   E   M   W
   BTA                             STL F   D   E   M   W
c. Branch-Not-Taken prediction is used, but there is no early branching. Suppose the branch is
actually taken; show the remainder of the timing chart. You will need to fill in the right
instruction(s) as well for this part. Use B+1, B+2 etc. to refer to instructions after the
branch in program order. Use BTA to refer to instructions at the branch target. Use FLS to
indicate pipeline flushing.

   Cycle               1   2   3   4   5   6   7   8   9
   beq $1, $2, there   F   D   E   M   W
   B+1                     F   D   E   FLS
   B+2                         F   D   FLS
   B+3                             F   FLS
   BTA                                 F   D   E   M   W
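
The chart for part (c) can be derived mechanically. The sketch below assumes the branch outcome becomes known at the end of the M stage (cycle 4), which is consistent with the three flushed instructions in the answer. The function names are illustrative only.

```python
STAGES = ["F", "D", "E", "M", "W"]


def not_taken_mispredict_chart(resolve_stage="M"):
    """Rows of (label, start_cycle, stages) when a predicted-not-taken branch is taken.

    Assumption: the branch is fetched in cycle 1 and its outcome is known at
    the end of `resolve_stage`.
    """
    resolve_cycle = STAGES.index(resolve_stage) + 1
    rows = [("beq", 1, list(STAGES))]
    # One wrong-path instruction is fetched per cycle until the branch resolves.
    for k in range(1, resolve_cycle):
        start = 1 + k
        done = STAGES[: resolve_cycle - start + 1]  # stages finished by resolve_cycle
        rows.append((f"B+{k}", start, done + ["FLS"]))
    # The branch target is fetched in the cycle after resolution.
    rows.append(("BTA", resolve_cycle + 1, list(STAGES)))
    return rows


def render(rows):
    """Format the rows as an aligned plain-text timing chart."""
    lines = []
    for label, start, stages in rows:
        cells = ["   "] * (start - 1) + [f"{s:<3}" for s in stages]
        lines.append(f"{label:<5}" + " ".join(cells).rstrip())
    return "\n".join(lines)


print(render(not_taken_mispredict_chart()))
```

With the default `resolve_stage="M"`, the rows are beq, B+1 (F D E FLS), B+2 (F D FLS), B+3 (F FLS) and BTA fetched in cycle 5.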
ANSWERS
For tutors only! Do not
reveal to students.
d. Delayed branch is used, but there is no early branching. Suppose the branch is taken; show
the remainder of the timing chart. Similar to part (c), you'll need to fill in the rest of the
instructions using the same notation. You only need to show up to the BTA instruction.

   Cycle               1   2   3   4   5   6   7   8   9
   beq $1, $2, there   F   D   E   M   W
   B+1                     F   D   E   M   W
   B+2                         F   D   E   M   W
   B+3                             F   D   E   M   W
   BTA                                 F   D   E   M   W
2. [5 marks] (Adapted from AY11/12 Exam) In a computer, colour can be represented
using the CMYK model. In this model, a colour is represented by 4 values
giving the saturation of the four principal colours: Cyan, Magenta, Yellow and
Black.
Suppose we store the CMYK values of 16 colours as separate 32-bit integers in an
array of size 64. So, the first four array elements (A[0..3]) represent the CMYK
values for the first colour, the second set of four array elements (A[4..7]) represents
the second colour, etc., as illustrated below:
   A[0]   Cyan value      (Colour 1)
   A[1]   Magenta value   (Colour 1)
   A[2]   Yellow value    (Colour 1)
   A[3]   Black value     (Colour 1)
   A[4]   Cyan value
   A[5]   Magenta value
   ...
Consider the following 2 code fragments X and Y in some C-like high level
programming language:
Code X:
//Each "int" is 32-bit
int A[64] = { ......... };
//Cyan values, A[0], A[4], ...
for (i = 0; i < 64; i = i + 4)
    Change A[i]

//Magenta values, A[1], A[5], ...
for (i = 1; i < 64; i = i + 4)
    Change A[i]

//Yellow values, A[2], A[6], ...
for (i = 2; i < 64; i = i + 4)
    Change A[i]

//Black values, A[3], A[7], ...
for (i = 3; i < 64; i = i + 4)
    Change A[i]
Code Y:
//Each "int" is 32-bit
int A[64] = { ......... };

//Go through 16 colours
for (i = 0; i < 16; i = i + 1) {
    //Go through the 4 values CMYK
    for (j = 0; j < 4; j = j + 1) {
        Change A[i*4+j]
    }
}
For simplicity, the base of array A is assumed to be in memory location 0x0. You
may also ignore the impact of variable i on cache access in the following
questions.
a. (Code X) Given a tiny direct-mapped cache with 2 blocks of 8 bytes each, give a
tally of the following information: the number of cold/compulsory cache misses,
and the number of conflict misses.
Cold Misses = 32        Conflict Misses = 32
b. (Code Y) Given a tiny direct-mapped cache with 2 blocks of 8 bytes each, give a
tally of the following information: the number of cold/compulsory cache misses,
and the number of conflict misses.
Cold Misses = 32        Conflict Misses = 0
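
The tallies in (a) and (b) can be checked with a small simulation. The sketch below assumes array A starts at address 0x0 and each element is 4 bytes, as stated in the question, and it counts every non-cold miss as a conflict miss, following the question's two categories. The function names are illustrative only.

```python
def code_x_addresses():
    """Byte addresses touched by Code X: four strided passes over A[0..63]."""
    return [4 * i for start in range(4) for i in range(start, 64, 4)]


def code_y_addresses():
    """Byte addresses touched by Code Y: one sequential pass over A[0..63]."""
    return [4 * (i * 4 + j) for i in range(16) for j in range(4)]


def direct_mapped_misses(addresses, num_blocks=2, block_size=8):
    """Return (cold, conflict) miss counts for a direct-mapped cache."""
    lines = [None] * num_blocks  # memory block number held by each cache line
    seen = set()                 # memory blocks that have been fetched before
    cold = conflict = 0
    for addr in addresses:
        block = addr // block_size
        index = block % num_blocks
        if lines[index] == block:
            continue             # hit
        if block in seen:
            conflict += 1        # block was cached earlier and has been evicted
        else:
            cold += 1            # first ever reference to this block
            seen.add(block)
        lines[index] = block
    return cold, conflict


print(direct_mapped_misses(code_x_addresses()))  # (32, 32)
print(direct_mapped_misses(code_y_addresses()))  # (32, 0)
```

In Code X each pass has a 16-byte stride, so consecutive accesses always land in different blocks that share one cache line. In Code Y the two integers of each 8-byte block are accessed back to back, so the second access hits.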
c. Does a 2-way set associative cache with 4 blocks of 8 bytes each improve the
performance of (a) or (b)? Why?
No.
Increasing associativity reduces conflict misses only if a block is reused
soon enough, while it is still in the cache.
For code X, a block is reused only after a long cycle (e.g. block 0 is reused
only after all even blocks from 2 to 30 are used). By then, block 0 would have
been evicted long ago.
For code Y, no block is accessed again once the loop has moved past it, so
every miss is a cold miss and the additional associativity is not useful.
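
The answer to (c) can be checked by counting total misses under both cache organisations. The sketch below assumes LRU replacement, which the question does not specify, and the same layout as before (A at address 0x0, 4-byte elements). The names are illustrative only.

```python
from collections import OrderedDict

# Byte-address traces for Code X (four strided passes) and Code Y (sequential).
X_TRACE = [4 * i for start in range(4) for i in range(start, 64, 4)]
Y_TRACE = [4 * k for k in range(64)]


def total_misses(addresses, num_blocks, ways, block_size=8):
    """Total misses for a set-associative cache with LRU replacement (assumed)."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # per set: blocks in LRU order
    misses = 0
    for addr in addresses:
        block = addr // block_size
        cache_set = sets[block % num_sets]
        if block in cache_set:
            cache_set.move_to_end(block)      # hit: mark as most recently used
            continue
        misses += 1
        if len(cache_set) == ways:
            cache_set.popitem(last=False)     # evict the least recently used block
        cache_set[block] = True
    return misses


for name, trace in (("X", X_TRACE), ("Y", Y_TRACE)):
    direct = total_misses(trace, num_blocks=2, ways=1)
    two_way = total_misses(trace, num_blocks=4, ways=2)
    print(name, direct, two_way)  # X 64 64, then Y 32 32
```

Under these assumptions the miss counts are identical for both organisations, which matches the answer "No".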