EE457 Final Exam - Fall 2010
ee457_Final_Fall2010_r1.fm, December 10, 2010 3:28 pm. Copyright 2010 Gandhi Puvvada
Fall 2010 EE457, Instructor: Gandhi Puvvada
Final Exam (30%)
Date: 12/10/2010, Friday; Time: 8:00 - 10:45AM; SGM123
Closed Book, Closed Notes; Calculator and Cadence Verilog Guide allowed
Total points: 235; Perfect score: 220 / 235
Name:
1 ( 42 points) 25 min.
Pipelining (Lab 7 Part 3 modified):
On the next page you will find the original Lab 7 Part 3 Block Diagram, provided for your information. On the page after, you will find a modified diagram for you to complete.
Mr. Trojan says that what you intend to do at 12:01AM (at the beginning of a clock) can easily be done at 11:59PM of the previous day (at the end of the previous clock, logic wise, assuming timing is not an issue). The original forwarding in EX1 (controlled by FU1, FORW1) is now moved to ID (controlled by FU_ID, FORW_ID). And the original forwarding in EX2 (controlled by FU2, FORW2) is now moved to EX1 (controlled by FU_EX1, FORW_EX1).
These changes in forwarding do not cause any change in
(a) HDU or generation of STALL T / F
(b) generation of SKIP1 or SKIP2 T / F
(c) the internal forwarding logic/mechanism in the register file T / F
Draw the logic for the two new FUs (Forwarding Units).
If you were to code this new design in RTL coding style, among ID, EX1, and EX2, you would code _____________ first, and then ______________, and finally _____________. Assume that the register file is negative-edge triggered and the rest of the system is positive-edge triggered.
In the RTL coding of Lab 7 Part 3, in the main clocked procedural block, we used _______________ ( if(STALL) / if (~STALL)) ________________ (with / without) an else clause.
[Answer area for the two new FUs: FU_ID producing FORW_ID, FU_EX1 producing FORW_EX1, and PRIORITY.]
FOR REFERENCE ONLY
[Fig. 1: LAB 7 Part 3 Block Diagram (revised 7/18/2010). Pipeline stages IF, ID, EX1, EX2, WB with PC, I-MEM, Reg. File, two Comp Stations in the ID stage (ID_XMEX1 = ID_XA matched with EX1_RA; the second station compares ID_XA with EX2_RA and produces ID_XMEX2), HDU, FU1, FU2, the X1_Mux, R1_Mux, X2_Mux, and R2_Mux, the A-3 and A+4 units, qualifying signals (ADD4, SUB3, ADD1, MOV, RA per stage), and the controls STALL, PRIORITY, FORW1, FORW2, SKIP1, SKIP2, EN, RESET_B.]
Tasks listed on the original lab diagram:
1. Complete all missing connections to the Reg. File. Also complete the RA (Result Address) connection in ID stage (ID_RA).
2. Complete all five enable (EN) controls on the pipeline registers (including PC).
3. Complete the forwarding path from EX2 to EX1. Should it start from upstream or downstream of the X2_mux?
4. Complete the skip controls (SKIP1, SKIP2).
5. Draw the logic for the HDU, FU1, and FU2, producing STALL, PRIORITY, FORW1, FORW2.
COMPLETE THIS
[Subpart 1 Fig.: LAB 7 Part 3 Block Diagram, modified 12/7/2010 for Fall2010 Final Exam. Pipeline stages IF, ID, EX1, EX2, WB with PC, I-MEM, an Internally Forwarding Reg. File, two Comp Stations in the ID stage (ID_XMEX1 = ID_XA matched with EX1_RA; the second station compares ID_XA with EX2_RA and produces ID_XMEX2), HDU, FU_ID, FU_EX1, the XID_Mux, R1_Mux, XEX1_Mux, and R2_Mux, the A-3 and A+4 units, data signals ID_XD_OUT, EX1_XD, EX1_XD_OUT, EX2_XD, EX2_XD_OUT, qualifying signals (ADD4, SUB3, ADD1, MOV, RA per stage), and the controls STALL, PRIORITY, FORW_ID, FORW_EX1, EX1_XMEX, SKIP1, SKIP2, EN, RESET_B. One box on the diagram is annotated: Write a "1" or "2".]
Tasks listed on the modified diagram:
1. Connect/label all missing connections to the Reg. File. Also complete the RA (Result Address) connection in ID stage (ID_RA).
2. Complete all five enable (EN) controls on the pipeline registers (including PC).
3. Complete the forwarding paths into ID. If a path is not needed, write "no connection".
4. Complete the skip controls (SKIP1, SKIP2).
5. Draw on a separate paper the logic for the FU_ID and FU_EX1, producing PRIORITY, FORW_ID, FORW_EX1.
2 ( 12 points) 5 min.
RTL coding: Suppose you are asked to write Verilog RTL code using one clocked always procedural block for the control unit (CU) and another clocked procedural block for the datapath unit (DPU).
A. Would you divide the two parts as per the left diagram or the right diagram? Left / Right
B. Is it essential to have the RESET control for the CU or the DPU? CU / DPU
C. The outputs of OFL will be treated as intermediate variables or final outputs? Intermediate / Final
D. You will be using blocking or non-blocking assignments to produce these OFL outputs? Blocking / Non-blocking
E. Is it possible to combine the two clocked always blocks into one single always block? Yes / No
F. If combining is possible, the combined always block __________ (will / will not) have a RESET control signal in the event list (sensitivity list).
3 ( 27 points) 20 min.
Arithmetic (Fast Adders)
3.1 You are taught the following cascadable incrementer which performs R2R1R0 = A2A1A0 + C0.
[Figure for Question 2: left and right diagrams, each showing the NSL, the SM (Current_State), the OFL, two muxes (inputs I0, I1, select S, output Y), X_Reg, and Y_Reg, with the CU and DPU regions marked.]
[Figure: 3-bit cascadable incrementer. Three incrementing cells (inputs Xi and Ci, outputs Si and pi) for X2, X1, X0, and a "New CLL INC" block with inputs p2, p1, p0, C0 and outputs C3, C2, C1. Inputs A2 A1 A0, outputs R2 R1 R0.]
Incrementing cell:
Si = Xi (+) 0 (+) Ci = Xi (+) Ci
pi = Xi + 0 = Xi
gi = Xi . 0 = 0
New CLL INC: Since all gi are zeros,
C1 = p0 . C0
C2 = p1 . p0 . C0
C3 = p2 . p1 . p0 . C0
Least significant module's C0 is tied to a 1.
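The incrementer equations above can be checked with a short behavioral sketch. This is only an illustrative Python model, not the exam's intended hardware description; the function names `cll_inc` and `increment3` are hypothetical and introduced here for illustration.

```python
def cll_inc(p, c0):
    """Carry lookahead for the incrementer.

    Every g_i is 0, so each carry reduces to an AND chain of
    propagates with C0 (C1 = p0.C0, C2 = p1.p0.C0, C3 = p2.p1.p0.C0).
    """
    p0, p1, p2 = p
    c1 = p0 & c0
    c2 = p1 & p0 & c0
    c3 = p2 & p1 & p0 & c0
    return c1, c2, c3


def increment3(a, c0=1):
    """Model R2R1R0 = A2A1A0 + C0 for a 3-bit input a (0..7).

    Returns (result as an int, carry-out C3).
    """
    x = [(a >> i) & 1 for i in range(3)]  # x[0] is A0
    p = x                                 # p_i = X_i + 0 = X_i
    c1, c2, c3 = cll_inc(p, c0)
    carries = [c0, c1, c2]
    s = [x[i] ^ carries[i] for i in range(3)]  # S_i = X_i (+) C_i
    result = s[0] | (s[1] << 1) | (s[2] << 2)
    return result, c3


if __name__ == "__main__":
    for a in range(8):
        print(a, increment3(a))
```

With C0 tied to 1 the model increments its input, and with C0 = 0 it passes the input through, which is what makes the module cascadable.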
Complete the following cascadable decrementer which performs R2R1R0 = A2A1A0 -1 by adding 111 to subtract a 1 (R2R1R0 = A2A1A0 + 111). Complete the 7 rectangles.
3.2 You have gone through the following solution to a question in an earlier exam.
A variation of the above question is to add 000_111_000_111 to A11A10A9A8A7A6A5A4A3A2A1A0 and produce the result R11R10R9R8R7R6R5R4R3R2R1R0. The following design is complete and correct. Let us analyze and try to improve it!
7pts
[Figure: 3-bit cascadable decrementer. Three decrementing cells (inputs Xi and Ci, outputs Si and gi) for X2, X1, X0, and a "New CLL DEC" block with inputs g2, g1, g0, C0 and outputs C3, C2, C1. Inputs A2 A1 A0, outputs R2 R1 R0.]
Decrementing cell:
Si = Xi (+) 1 (+) Ci = Xi XNOR Ci
pi = Xi + 1 = ______
gi = Xi . 1 = ______
New CLL DEC: Since all pi are ______,
C1 = ______ C0
C2 = ______ C0
C3 = ______ C0
Least significant module's C0 is tied to a ______.
9-bit constant addition: You need to add 000_111_000 to A8A7A6A5A4A3A2A1A0 and produce the result R8R7R6R5R4R3R2R1R0. Mr. Trojan says that you should be able to do it by cascading just one incrementer and one decrementer designed before. Complete the design below.
[Figure: one New CLL INC module (cells X2, X1, X0 with outputs S2, S1, S0) and one New CLL DEC module (cells Y2, Y1, Y0 with outputs D2, D1, D0), with inputs A8-A0, outputs R8-R0, and the carry-out marked "C9?".]
SOLUTION
9pts
[Figure: 12-bit design built as a linear cascade of four 3-bit modules: New CLL DEC on A2-A0 (carry-in C0, carry-out C3), New CLL INC on A5-A3 (carry-out C6), New CLL DEC on A8-A6 (carry-out C9), and New CLL INC on A11-A9 (carry-out C12), producing R11-R0.]
State cumulative delays in gate-delays:
C3: _______; C6: _______; C9: _______; C11: _______; R11: _______; C12: _______;
Note: In EE457, we count an XOR or an XNOR as a 2-gate-delay device.
Miss Trojan proposed to have Group Propagates (upper case P’s) and Group Generates (upper case G’s) so that she can add a 2nd level CLL and avoid the linear cascade shown above.
Here is Miss Trojan’s proposed design. She wants you to simplify a regular CLL to form the special 2nd level CLL which takes advantage of the specific values of P’s and G’s and overall C0. Note that this design is meant for this specific 12-bit constant addition and it need not be cascadable or extendible.
5pts
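[Figure: New CLL INC (inputs p2, p1, p0, C0; outputs C3, C2, C1) and New CLL DEC (inputs g2, g1, g0, C0; outputs C3, C2, C1), each extended with group outputs P and G.]
New CLL INC: Since the individual gi's are all _______ (zero/one), the Group G = ____________ and the Group P = ____________
New CLL DEC: Since the individual pi's are all _______ (zero/one), the Group P = ____________ and the Group G = ____________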
20pts
[Figure: Miss Trojan's Design. Four 3-bit modules: New CLL DEC on A2-A0 (R2-R0), New CLL INC on A5-A3 (R5-R3), New CLL DEC on A8-A6 (R8-R6), and New CLL INC on A11-A9 (R11-R9). Each module produces a group P and G. A "2nd level CLL specific for this problem" takes C0, P0, G0, P1, G1, P2, G2, P3, G3 and produces C1, C2, C3, C4; the carries into the modules are labeled C0, C3, C6, C9 and the final carry is C12.]
Write equations for C4, C3, C2, C1 for a generic CLL in terms of the 9 inputs (C0, P0, G0, P1, G1, P2, G2, P3, G3), and simplify substituting constants of 0 or 1 wherever possible. For example C0 = 0.
C0: ___0___ (0 / 1 / variable);
P0: _______ (0 / 1 / variable); G0: _______ (0 / 1 / variable);
P1: _______ (0 / 1 / variable); G1: _______ (0 / 1 / variable);
P2: _______ (0 / 1 / variable); G2: _______ (0 / 1 / variable);
P3: _______ (0 / 1 / variable); G3: _______ (0 / 1 / variable);
C1 = G0 + P0.C0 =
C2 =
C3 =
C4 =
State delays in gate-delays:
C3: _______; C6: _______; C9: _______; C11: _______; R11: _______; C12: _______;
Cancelled
3.3 In a 256x256 multiplier, to reduce 256 PPs (partial products) in a CSA tree, the number of CSA levels needed is approximately
(a) log_10(256) (b) log_2(256) (c) log_1.5(256) (d) log_1.5(64) (e) log_2(64) (f) other ___________
The above is ________________ (lower / upper) bound.
The partial CSA tree on the right needs _______ iterations to reduce the 256 PPs to _____ (2 / 1) vector(s) for further processing by the CPA.
4 ( 71 points) 40 min.
4.1 Out of Order (OoO) Execution
4.1.1 "Branch Prediction and speculative execution beyond branches" is only possible in the design on the _______ (left / right) because in the other design, if we dispatch instructions based on prediction, these speculative instructions ____________________________________________ ____________________________________________________________________________ ____________________________________________________________________________ ____________________________________________________________________________
4.1.2 We know that a memory delay of 10ns would mean 10 clocks for a processor running at ___________ (1 GHz / 2 GHz) and the same 10ns would mean 20 clocks if the processor is running at ___________ (1 GHz / 2 GHz). Hence, when increasing the processor frequency from 1 GHz to 2 GHz (without any change in memory speed), you would recommend that the depth of the instruction queues is __________________ (increased / decreased) in the case of ____________________________________________________________________________(state the queue name/names from among Integer queue, Load-Store queue, Divider queue, Multiplier queue).
[Figure: partial CSA tree with stages S1 to S5 feeding a CPA.]
6pts
[Figures for 4.1: two OoO designs (left and right), each showing an Issue Unit, an Int. Divider, an Int. Multiplier, and a TAG FIFO.]
3pts
4pts
4.1.3 RAW dependency is solved by simply making the reader wait until the writer can forward the information to the reader in the design on _____________ (the left / the right / both sides).
4.1.4 In the short code of 4 lines on the side, you notice that OoO can potentially cause ___________________________________ (RAW/WAR/WAW/multiple of these(state them)) hazards for $8.
In-order writing alone in the design on the _____________ (left / right) eliminates _____________________________ (RAW/WAR/WAW) hazards among registers.
Design on the left: Let us say that the dispatch unit assigns a symbolic Tag of LION to the destination register $8 of instr. #1 and a little later assigns TIGER to the destination register $8 of instr. #3. LION is written against $8 in RST first and later it is replaced by TIGER. State how the hazards listed by you for $8 are addressed in the design on the left.
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
Design on the right: Let us say that the dispatch unit assigns the ROB Tag of 21 to the destination register $8 of instr. #1 and a little later assigns the ROB Tag of 23 to the destination register $8 of instr. #3. Again state how the hazards listed by you for $8 are addressed in the design on the right.
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
4.1.5 Conditional branches cause more stalls in dispatch in the design on the ___________ (left / right) whereas in the design on the ___________ (left / right) more flushes due to branch mispredictions occur.
4.1.6 Flush of ___________________________ (IFQ / Backend / ROB) is the same in both designs due to (circle all applicable items below)
(a) mispredicted conditional branches (beq/bne)
(b) unconditional jump (j)
(c) unconditional jump and link (jal)
(d) unconditional program return (jr $31)
4.1.7 It is enough to predict the direction of a conditional branch using BPB, standing for Branch _____________ Buffer, if prediction is done from the ________________ (IF stage / Dispatch stage), but in the other case, we need BTB, standing for Branch ___________ Buffer to provide the target address.
2pts
Code for 4.1.4 (shown on the side):
add $8, $1, $2; instr. #1
lw $10, 100($8); instr. #2
add $8, $3, $4; instr. #3
lw $11, 100($8); instr. #4
4pts
10pts
3pts
3pts
3pts
Cancelled
4.1.8 On the side we have shown three instructions at the PC values 1040H, 2040H, and 3040H. They are 1000H bytes = 400H words apart. If BPB and BTB are each 1K deep (2^10 = 1K = 400H), does it cause aliasing? Where is aliasing unacceptable? ______________________ (trying to predict from IF stage / trying to predict from dispatch stage / both / none). Explain briefly.
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
4.2 Exceptions:
4.2.1 Page fault _______________ (needs /does not need) to be associated with an instruction and hence ________ (is / isn’t) a precise exception.
4.2.2 As part of handling a precise exception, we need to
(a) tag the offending instruction with its Cause and EPC info. T/F
(b) convert the offending instruction and all the following instructions into bubbles. T/F
(c) allow all preceding (senior) instructions in process order to complete. T/F
(d) be silent and carry the Cause and EPC until the offending instruction reaches WB. T/F
4.2.3 Place a check mark in the stage (or stages) an exception can occur.
4.3 RAS (Return Address stack)
4.3.1 Consider the following 4 types of program control instructions:
(a) unconditional branches (example: j)
(b) function calls (example: jal)
(c) conditional branches (example: beq, bne)
(d) function returns (example: jr $31)
RAS provides target address for the __________________ (j/jal/beq/bne/jr $31) instruction(s).
Table for 4.2.3:
Exception                   | IF | ID | EX | MEM | WB
Page Fault                  |    |    |    |     |
Integer Overflow            |    |    |    |     |
Undefined Opcode            |    |    |    |     |
Memory Protection Violation |    |    |    |     |
1040: beq $1, $2, 20;. . . . .
2040: add $4, $5, $6;. . . . .
3040: beq $10, $20, 5;
6pts
2pts
4pts
4pts
4pts
A PUSH operation on RAS takes place when _________________ (j/jal/beq/bne/jr $31) is/are executed.
A POP operation on RAS takes place when ___________________ (j/jal/beq/bne/jr $31) is/are executed.
4.3.2 RAS, being usually _______________ (small / large), can only predict the return address. The prediction can go wrong if the degree of nesting/ the degree of recursion in recursive call _______________________ (exceeds / does not exceed) the depth of the RAS.
4.4 CMP (Chip Multiprocessors) with CMT (Chip multithreading)
4.4.1 ILP (standing for ___________________________ ) has been more or less fully exploited and processor architects have turned to exploit TLP (standing for ___________________________ )
4.4.2 When one thread is switched with another in a multi-threaded core, the register file contents are saved in the main memory. T / F
Since the number of alternative register files ______________ (is / isn't quite) finite and since the number of process control block copies in memory ______________ (is / isn't quite) finite, the number of threads per multi-threaded core is _____________________________________, whereas the number of processes that can be run on a core using software context switching is generally ___________________________________________________________________.
4.4.3 Functional units such as ALU are ____________ (common / separate) for the 4 threads running on a core.
4.4.4 The stall penalty due to dependency on a load word instruction can usually be avoided in ______ _______________________ (Fine-Grained / Coarse-Grained / both types of / neither type of ) multithreading.
4.4.5 _________ (Fine / Coarse)-grain multithreading switches threads on each instruction whereas _________ (fine / coarse)-grain switches threads on costly stalls such as cache misses.
4.4.6 Dynamic power considerations favor ________________________ (Uniprocessors/Multiprocessors).
4.4.7 A _____________________ (non-blocking / blocking) cache handles multiple cache requests, usually as long as they are hits under a pending miss. A CMP (such as Sun Niagara T1) needs to use a ____________________ (non-blocking / blocking) cache to be able to execute multiple threads. A non-blocking cache ________________ (is / isn’t) useful in an OoO executing processor as it is ______________ (possible / not possible) to handle several load/store instructions in the cache.
4.5 Multiprocessors and Cache Coherence:
4.5.1 Snoopy controller, in a ____________________ (write-through /write-back/both/neither) cache-coherence system, does not care to watch read transactions from the other processors [ R(j) ].
3pts
2pts
6pts
1pts
2pts
2pts
1pts
3pts
3pts
3pts
Label the following two state diagrams as write-through or write-back.
4.5.2 The "dual" directory in snoopy protocol refers to the duplicated _________ (TAG / DATA/TLB) RAM.
4.5.3 If there is a "Dirty bit" besides a "Valid bit" associated with a cache block, then the designer must be using a ____________________ (write-through /write-back/any of the two/neither) cache.
5 ( 18 points) 10 min. Non-linear pipelines:
Complete the datapath to support the function, Z, using dedicated OUTPUT stage registers for each stage (and the needed muxes). Show where the output Z is taken from. Complete the reservation table and arrive at ICV. Draw state diagram, record greedy simple cycle(s) and arrive at MAL.
[Figure for 4.5.1: two state diagrams, each to be labeled write-through / write-back.]
1.5pts
1.5pts
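[Figure: datapath with input X and four stages, SQR (Square), -3 (Subtract 3), /9 (Divide by 9), and +5 (Add 5), using dedicated OUTPUT stage registers. Show the tap off for the Z output.]
Z = ((X^2 / 9)^2 + 5 - 3) / 9
Reservation table for Z:
Stage       | 1 | 2 | 3 | 4 | 5 | 6
Square      |   |   |   |   |   |
Subtract 3  |   |   |   |   |   |
Divide by 9 |   |   |   |   |   |
Add 5       |   |   |   |   |   |
ICV: _________________
STATE DIAGRAM:
MAL analysis: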
6 ( 44 points) 25 min.
Virtual Memory and Cache
Specs of the Trojan computer (a 32-bit address, 32-bit data, byte-addressable machine with physically addressed cache, more specifically PIPT cache):
Virtual address space = 4GB, Virtual address = 32 bits (VA31-VA0) (2^32 = 4G), Physical address space = 4GB, Physical address = 32 bits (PA31-PA0) (2^32 = 4G)
Page size = 2KB (2^11 = 2K), TLB size = 64 entries (fully-associative) (2^6 = 64)
Page table organization: 2-level table with 256-entry (2^8 = 256) page directory (top level table)
Cache size = 192KB (3*2^16 = 192K), Cache Block (cache line size) = four 32-bit words (16 bytes total) (2^4 = 16), Cache mapping: Set-associative with three blocks per set. (note 3 blocks per set)
Main memory organization: Lower-order Interleaved. Degree of interleaving to suit the most efficient access of the main-memory block for transferring to cache.
6.1 Divide the virtual address into VPN (Virtual Page Number) and Page offset fields. Since TLB is a fully associative TLB, we ____________ (further divide / do not divide) the VPN into TAG and SET fields. How many comparators of what size are needed in the TLB? _____________ _______________________________________
Is any portion of the virtual address used for "indexing" TLB? ______________ (Yes / No ).
6.2 Divide the virtual address into VPN and Page offset fields again and further divide the VPN (based on the page table organization information) into page directory index and 2nd-level page table index.
6.3 Divide the physical address into PPFN (Physical Page Frame Number) and Page offset fields.
4pts
Virtual address VA31-VA0:
VA31 VA30 VA29 VA28 VA27 VA26 VA25 VA24 VA23 VA22 VA21 VA20 VA19 VA18 VA17 VA16 VA15 VA14 VA13 VA12 VA11 VA10 VA9 VA8 VA7 VA6 VA5 VA4 VA3 VA2 VA1 VA0
(The low-order fields are labeled Word and Byte. BE3-BE0: Bank Enables (Byte enables).)
3pts
Virtual address VA31-VA0:
VA31 VA30 VA29 VA28 VA27 VA26 VA25 VA24 VA23 VA22 VA21 VA20 VA19 VA18 VA17 VA16 VA15 VA14 VA13 VA12 VA11 VA10 VA9 VA8 VA7 VA6 VA5 VA4 VA3 VA2 VA1 VA0
(The low-order fields are labeled Word and Byte. BE3-BE0: Bank Enables (Byte enables).)
3pts
Physical address PA31-PA0:
PA31 PA30 PA29 PA28 PA27 PA26 PA25 PA24 PA23 PA22 PA21 PA20 PA19 PA18 PA17 PA16 PA15 PA14 PA13 PA12 PA11 PA10 PA9 PA8 PA7 PA6 PA5 PA4 PA3 PA2 PA1 PA0
(The low-order fields are labeled Word and Byte. BE3-BE0: Bank Enables (Byte enables).)
6.4 Divide the physical address (based on cache specifications) into TAG, SET, WORD and BYTE fields
6.5 If the 32-bit physical byte address (produced by address translation through TLB or Page Table) is 90586124H (1001_0000_0101_1000_0110_0001_0010_0100B), which set in the cache will you be accessing? Does this set number form an index (an address) into _____________________________ (the multiple TAG RAMs/the single TAG RAM/neither of these)?
Complete the TAG RAM details in the side panel.
6.6 Complete the Cache DATA RAM details below.
6.7 Complete the Interleaved Main Memory details below.
6.8 TLB miss does not cause a TRAP. T / F During TLB look up, a Read/Write/Execute violation (a memory protection violation) causes a TRAP. T / F
6.9 If there is only one TAG RAM, it is a ________________________ (direct mapped/set-associative/fully-associative) cache.
3pts
Physical address PA31-PA0:
PA31 PA30 PA29 PA28 PA27 PA26 PA25 PA24 PA23 PA22 PA21 PA20 PA19 PA18 PA17 PA16 PA15 PA14 PA13 PA12 PA11 PA10 PA9 PA8 PA7 PA6 PA5 PA4 PA3 PA2 PA1 PA0
(The low-order fields are labeled Word and Byte. BE3-BE0: Bank Enables (Byte enables).)
[Figure: TAG RAM with Address, Data_in, and Data_out ports and a Comparator producing HIT.]
TAG RAM Size = ____________ + valid
_____ more (besides the above) are needed in this cache.
8pts
5pts
[Figure: Trojan Processor (D31-D0) connected to a DATA RAM with an Address input and four byte-wide banks on D31-D24, D23-D16, D15-D8, and D7-D0.]
Size: Each of the 4 byte-wide banks is a ______ x 8
______ more such DATA RAM units (besides the one on the side)
5pts
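[Figure: interleaved main memory unit with four 256MB byte-wide banks on D31-D24, D23-D16, D15-D8, and D7-D0, addressed by PA____ - PA____, connected through a 32 bit bidirectional buffer (XCVR) to D31-D0.]
______ more such units (besides the one on the left) exist in Main Memory.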
2.5pts
1.5pts
6.10 In a set associative cache of 2 blocks per set and 4 words per block, the degree of lower-order interleaving recommended for the main memory is __________ (1-way/2-way/4-way/other namely ...) and the number of TAG RAMs is __________ (8/16/32/other namely ...).
The depth of a TAG RAM is determined by ________________________________________.
6.11 The fully associative TLB can have a non-power of 2 number of entries, say 53 entries. T / F
The number of sets in a set associative mapping can be a non-power of 2 number, say 53 sets. T / F
The number of TAG RAMs in a set associative mapping can be a non-power of 2 number, say 3. T / F
7 ( 21 points) 15 min.
Page Table: Number of A,B,C Tables built by the OS:
PQRST on the side represents a 20-bit (5-digit hex) VPN in a 3-level page table with upper 8 bits (PQ) indexing the A-level table, next 8 bits (RS) indexing the B-level tables, and the last 4 bits (T) indexing the C-level tables.
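The field split described above can be sketched in a few lines of Python. This is only an illustration of the stated 8/8/4 split; the function name `split_vpn` is hypothetical, and the sketch does not model how the OS allocates tables.

```python
def split_vpn(vpn):
    """Split a 20-bit VPN (hex digits PQRST) into its three table indices.

    Returns (A-level index PQ, B-level index RS, C-level index T).
    """
    if not 0 <= vpn < (1 << 20):
        raise ValueError("VPN must fit in 20 bits")
    a_index = (vpn >> 12) & 0xFF  # upper 8 bits: PQ
    b_index = (vpn >> 4) & 0xFF   # next 8 bits: RS
    c_index = vpn & 0xF           # last 4 bits: T
    return a_index, b_index, c_index


if __name__ == "__main__":
    a, b, c = split_vpn(0x12345)
    print(f"A index = {a:02X}H, B index = {b:02X}H, C index = {c:X}H")
```

For the VPN 12345H this gives an A-level index of 12H, a B-level index of 34H, and a C-level index of 5H.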
7.1 Suppose the first 8 distinct virtual pages accessed by the application program had the VPNs as stated in TABLE-I (in sorted order). How many tables of what size are built by OS by this time?
A-level: _____________________________________________
B-level: _____________________________________________
C-level: _____________________________________________
7.2 Complete 8 distinct VPNs of your choice in TABLE-II such that the least number of A,B,C tables are built by OS. This least set consists of ____ of A-Table(s), ____ of B-Table(s), ____ of C-Table(s).
7.3 Similarly, complete 8 distinct VPNs of your choice in TABLE-III such that the most number of A,B,C tables are built by OS. This most set consists of ____ of A-Table(s), ____ of B-Table(s), ____ of C-Table(s).
5pts
4pts
TABLE-I
P Q R S T
1 2 3 4 5
1 2 3 4 7
1 2 3 6 5
1 3 3 6 5
1 4 3 6 5
1 5 3 6 5
1 6 3 6 5
1 6 5 6 5
TABLE-II (8 blank rows to complete)
P Q R S T
TABLE-III (8 blank rows to complete)
P Q R S T
9pts
6pts
6pts
Blank space for rough work
We enjoyed teaching this course. Hope you liked the course. Hope to see some of you in EE454L or EE560. Grades will be out in a week. Enjoy your Xmas break! - Gandhi, Jonathan, Prasanjeet, Sabya, Mehrtash, Ben, Ankit, Girish, Jingming, Sumit