
ee457_Final_Fall2010_r1.fm, December 10, 2010 3:28 pm. EE457 Final Exam - Fall 2010, page 1 / 14. Copyright 2010 Gandhi Puvvada

Fall 2010 EE457 Instructor: Gandhi Puvvada
Final Exam (30%) Date: 12/10/2010, Friday
Closed Book, Closed Notes; Time: 8:00 - 10:45AM SGM123
Calculator and Cadence Verilog Guide allowed
Total points: 235 Name: Perfect score: 220 / 235

1 ( 42 points) 25 min.

Pipelining (Lab 7 Part 3 modified):

On the next page you find the original lab 7 Part 3 Block Diagram, provided for your information. On the page after, you find a modified diagram for you to complete.

Mr. Trojan says that what you intend to do at 12:01AM (at the beginning of a clock) can easily be done at 11:59PM of the previous day (at the end of the previous clock, logic wise, assuming timing is not an issue). The original forwarding in EX1 (controlled by FU1, FORW1) is now moved to ID (controlled by FU_ID, FORW_ID). And the original forwarding in EX2 (controlled by FU2, FORW2) is now moved to EX1 (controlled by FU_EX1, FORW_EX1).

These changes in forwarding do not cause any change in
(a) HDU or generation of STALL T / F
(b) generation of SKIP1 or SKIP2 T / F
(c) the internal forwarding logic/mechanism in the register file T / F

Draw the logic for the two new FUs (Forwarding Units).

If you were to code this new design in RTL coding style, among ID, EX1, and EX2, you would code _____________ first, and then ______________, and finally _____________. Assume that the register file is negative-edge triggered and the rest of the system is positive-edge triggered.

In the RTL coding of Lab 7 Part 3, in the main clocked procedural block, we used _______________ ( if(STALL) / if (~STALL)) ________________ (with / without) an else clause.

[Drawing area: FU_ID and FU_EX1, with outputs PRIORITY, FORW_ID, and FORW_EX1.]


FOR REFERENCE ONLY

[Fig. 1: LAB 7 Part 3 Block Diagram (revised 7/18/2010). Pipeline stages IF, ID, EX1, EX2, WB with PC, I-MEM, Reg. File, a comparison station in the ID stage (comparators on ID_XA vs EX1_RA and ID_XA vs EX2_RA producing ID_XMEX1 and ID_XMEX2; ID_XMEX1 = ID_XA matched with EX1_RA), HDU, FU1, FU2, X1_Mux, R1_Mux, X2_Mux, R2_Mux, the A-3 and A+4 units, qualifying signals (ADD4, SUB3, ADD1, MOV per stage), and control signals STALL, PRIORITY, FORW1, FORW2, SKIP1, SKIP2, RESET_B.]

Tasks listed on the original lab diagram:
1. Complete all missing connections to the Reg. File. Also complete the RA (Result Address) connection in ID stage (ID_RA).
2. Complete all five enable (EN) controls on the pipeline registers (including PC).
3. Complete the forwarding path from EX2 to EX1. Should it start from upstream or downstream of the X2_mux?
4. Complete the skip controls (SKIP1, SKIP2).
5. Draw the logic for the HDU, FU1, and FU2, producing STALL, PRIORITY, FORW1, FORW2.


COMPLETE THIS

[Subpart 1 Fig.: LAB 7 Part 3 block diagram modified for the Fall 2010 Final Exam (12/7/2010). Same pipeline (IF, ID, EX1, EX2, WB) with an Internally Forwarding Reg. File, the comparison station in the ID stage (ID_XMEX1, ID_XMEX2), HDU, FU_ID and FU_EX1 in place of FU1 and FU2, XID_Mux, R1_Mux, XEX1_Mux, R2_Mux, data labels ID_XD_OUT, EX1_XD, EX1_XD_OUT, EX2_XD, EX2_XD_OUT, and control signals STALL, PRIORITY, FORW_ID, FORW_EX1, SKIP1, SKIP2, RESET_B.]

1. Connect/label all missing connections to the Reg. File. Also complete the RA (Result Address) connection in ID stage (ID_RA).
2. Complete all five enable (EN) controls on the pipeline registers (including PC).
3. Complete the forwarding paths into ID. If a path is not needed, write "no connection".
4. Complete the skip controls (SKIP1, SKIP2).
5. Draw on a separate paper the logic for the FU_ID and FU_EX1, producing PRIORITY, FORW_ID, FORW_EX1.

Write a “1” or “2”



RTL coding: Suppose you are asked to write Verilog RTL code using one clocked always procedural block for the control unit (CU) and another clocked procedural block for the datapath unit (DPU).

A. Would you divide the two parts as per the left diagram or the right diagram? Left / Right
B. Is it essential to have the RESET control for the CU or the DPU? CU / DPU
C. The outputs of OFL will be treated as intermediate variables or final outputs? Intermediate / Final
D. You will be using blocking or non-blocking assignments to produce these OFL outputs? Blocking / Non-blocking
E. Is it possible to combine the two clocked always blocks into one single always block? Yes / No
F. If combining is possible, the combined always block __________ (will / will not) have a RESET control signal in the event list (sensitivity list).


Arithmetic (Fast Adders)

3.1 You are taught the following cascadable incrementer which performs R2R1R0 = A2A1A0 + C0.

[Figures for the RTL coding question, left and right diagrams: each shows X_Reg and Y_Reg fed by 2-to-1 muxes (inputs I0, I1; select S; output Y), together with the NSL, SM (Current_State), and OFL blocks; the two diagrams draw the CU and DPU boundaries differently.]

[Figure for 3.1: 3-bit cascadable incrementer. Three incrementing cells (inputs Xi, Ci; outputs Si, pi) take A2 A1 A0 and produce R2 R1 R0; a "New CLL INC" block takes p0, p1, p2, C0 and produces C1, C2, C3.]

Incrementing cell:
Si = Xi (+) 0 (+) Ci = Xi (+) Ci
pi = Xi + 0 = Xi
gi = Xi . 0 = 0

New CLL INC: Since all gi are zeros,
C1 = p0 . C0
C2 = p1 . p0 . C0
C3 = p2 . p1 . p0 . C0

Least significant module’s C0 is tied to a 1.
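The incrementer equations above can be checked with a short sketch. This is only an illustration, assuming a bit-level Python model; the function name `increment3` is hypothetical and not part of the exam.

```python
def increment3(a, c0):
    """Model of the 3-bit cascadable incrementer: returns (R2R1R0, C3)."""
    x = [(a >> i) & 1 for i in range(3)]       # X0, X1, X2
    p = x                                      # pi = Xi (gi = 0 for every cell)
    c = [c0, p[0] & c0, p[1] & p[0] & c0]      # C0, C1, C2 from the CLL
    c3 = p[2] & p[1] & p[0] & c0               # carry out of the module
    s = [x[i] ^ c[i] for i in range(3)]        # Si = Xi (+) Ci
    r = s[0] | (s[1] << 1) | (s[2] << 2)
    return r, c3

# Exhaustive check: the module computes A + C0 as a 4-bit quantity {C3, R}.
for a in range(8):
    for c0 in (0, 1):
        r, c3 = increment3(a, c0)
        assert (c3 << 3) | r == a + c0
```

With C0 tied to 1 the module increments; with C0 = 0 it passes A through, which is what makes it cascadable.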


Complete the following cascadable decrementer which performs R2R1R0 = A2A1A0 -1 by adding 111 to subtract a 1 (R2R1R0 = A2A1A0 + 111). Complete the 7 rectangles.

3.2 You have gone through the following solution to a question in an earlier exam.

A variation of the above question is to add 000_111_000_111 to A11A10A9A8A7A6A5A4A3A2A1A0 and produce the result R11R10R9R8R7R6R5R4R3R2R1R0. The following design is complete and correct. Let us analyze and try to improve it!

7pts

[Figure for 3.1: 3-bit cascadable decrementer. Three decrementing cells (inputs Xi, Ci; outputs Si, gi) take A2 A1 A0 and produce R2 R1 R0; a "New CLL DEC" block takes g0, g1, g2, C0 and produces C1, C2, C3.]

Decrementing cell:
Si = Xi (+) 1 (+) Ci = Xi XNOR Ci
pi = Xi + 1 = ______
gi = Xi . 1 = ______

New CLL DEC: Since all pi are ______,
C1 = ______ C0
C2 = ______ C0
C3 = ______ C0

Least significant module’s C0 is tied to a ______.
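The claim that adding 111 subtracts 1 (modulo 8) can be checked with a small model. This is a sketch, assuming a generic per-bit generate/propagate adder with a constant second operand; the helper name `add_const3` is hypothetical.

```python
def add_const3(a, k, c0):
    """Add the 3-bit constant k and carry-in c0 to a; returns (result, carry_out)."""
    carry = c0
    r = 0
    for i in range(3):
        x = (a >> i) & 1
        b = (k >> i) & 1
        g, p = x & b, x | b              # generate and propagate for this bit
        r |= (x ^ b ^ carry) << i        # sum bit
        carry = g | (p & carry)          # carry into the next bit
    return r, carry

# Adding 111 with carry-in 0 behaves as a decrement modulo 8.
for a in range(8):
    r, c3 = add_const3(a, 0b111, 0)
    assert r == (a - 1) % 8
    assert c3 == (1 if a != 0 else 0)
```

The same helper with k = 000 and carry-in 1 reproduces the incrementer, so the two modules differ only in the constant they absorb.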

9-bit constant addition: You need to add 000_111_000 to A8A7A6A5A4A3A2A1A0 and produce the result R8R7R6R5R4R3R2R1R0. Mr. Trojan says that you should be able to do it by cascading just one incrementer and one decrementer designed before. Complete the design below.

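[Figure: 9-bit constant adder with inputs A8 to A0 and outputs R8 to R0, showing three incrementing cells (X2, X1, X0 producing S2, S1, S0), three decrementing cells (Y2, Y1, Y0 producing D2, D1, D0) with a "New CLL DEC" block (g0, g1, g2, C0; C1, C2, C3), and a carry labeled "C9?".]

SOLUTION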

9pts

[Figure for 3.2: 12-bit constant adder drawn as two 6-bit halves in a linear cascade. In each half the upper three bits use incrementing cells (X2, X1, X0 producing S2, S1, S0) and the lower three bits use decrementing cells (Y2, Y1, Y0 producing D2, D1, D0). The first half takes A5 to A0 and produces R5 to R0 with carries C0, C3, C6; the second half takes A11 to A6 and produces R11 to R6 with carries C6, C9, C12.]

State cumulative delays in gate-delays:
C3: _______; C6: _______; C9: _______; C11: _______; R11: _______; C12: _______;

Note: In EE457, we count an XOR or an XNOR as a 2-gate-delay device.


Miss Trojan proposed to have Group Propagates (upper case P’s) and Group Generates (upper case G’s) so that she can add a 2nd level CLL and avoid the linear cascade shown above.

Here is Miss Trojan’s proposed design. She wants you to simplify a regular CLL to form the special 2nd level CLL which takes advantage of the specific values of P’s and G’s and overall C0. Note that this design is meant for this specific 12-bit constant addition and it need not be cascadable or extendible.

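5pts

[Figure: a "New CLL INC" block (inputs p0, p1, p2, C0; outputs C1, C2, C3) and a "New CLL DEC" block (inputs g0, g1, g2, C0; outputs C1, C2, C3), each extended with group outputs P and G.]

New CLL INC: Since the individual gi’s are all _______ (zero/one), the Group G = ______
And the Group P = ______

New CLL DEC: Since the individual pi’s are all _______ (zero/one), the Group P = ______
And the Group G = ______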

20pts

Miss Trojan’s Design

[Figure: four 3-bit modules, each with group outputs P and G: New CLL DEC on A2 to A0 (R2 to R0), New CLL INC on A5 to A3 (R5 to R3), New CLL DEC on A8 to A6 (R8 to R6), and New CLL INC on A11 to A9 (R11 to R9). The group signals P0 G0, P1 G1, P2 G2, P3 G3 and C0 feed a "2nd level CLL specific for this problem" with outputs C1, C2, C3, C4. Module carries in the figure are labeled C0, C3, C6, C9, C12.]

Write equations for C4, C3, C2, C1 for a generic CLL in terms of the 9 inputs (C0, P0, G0, P1, G1, P2, G2, P3, G3), and simplify substituting constants of 0 or 1 wherever possible. For example C0 = 0.

C0: ___0___ (0 / 1 / variable);
P0: _______ (0 / 1 / variable); G0: _______ (0 / 1 / variable);
P1: _______ (0 / 1 / variable); G1: _______ (0 / 1 / variable);
P2: _______ (0 / 1 / variable); G2: _______ (0 / 1 / variable);
P3: _______ (0 / 1 / variable); G3: _______ (0 / 1 / variable);

C1 = G0 + P0.C0 =
C2 =
C3 =
C4 =

State delays in gate-delays
C3: _______; C6: _______; C9: _______; C11: _______; R11: _______; C12: _______;

Cancelled
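For reference, the generic (unsimplified) second-level CLL equations can be written out and checked against the carry recurrence C(i+1) = Gi + Pi.Ci. The sketch below is an illustration only; `cll4` and `ripple` are hypothetical helper names, and no problem-specific constants are substituted.

```python
from itertools import product

def cll4(c0, P, G):
    """Generic 4-group carry-lookahead equations in terms of C0, P0..P3, G0..G3."""
    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    c3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    c4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))
    return c1, c2, c3, c4

def ripple(c0, P, G):
    """Reference model: apply C(i+1) = Gi + Pi.Ci one group at a time."""
    carries, c = [], c0
    for p, g in zip(P, G):
        c = g | (p & c)
        carries.append(c)
    return tuple(carries)

# The flattened equations agree with the recurrence for all 2^9 input combinations.
for bits in product((0, 1), repeat=9):
    c0, P, G = bits[0], bits[1:5], bits[5:9]
    assert cll4(c0, P, G) == ripple(c0, P, G)
```

Substituting constant group P and G values into these expressions is the simplification step the question asks for.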


3.3 In a 256x256 multiplier, to reduce 256 PPs (partial products) in a CSA tree, the number of levels of the CSA needed is approximately
(a) log_10(256) (b) log_2(256) (c) log_1.5(256) (d) log_1.5(64) (e) log_2(64) (f) other ___________

The above is ________________ (lower / upper) bound.

The partial CSA tree on the right needs _______ iterations to reduce the 256 PPs to _____ (2 / 1) vector(s) for further processing by the CPA.
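The level count for a full CSA tree can be explored with a small simulation. This is a sketch under one stated assumption: at each level every group of three vectors goes through a 3-to-2 carry-save adder and any leftover one or two vectors pass through unchanged. The function name `csa_levels` is hypothetical.

```python
def csa_levels(vectors):
    """Count CSA levels needed to reduce `vectors` operands down to 2."""
    levels = 0
    while vectors > 2:
        # Each group of 3 becomes 2; leftovers (0, 1, or 2) pass to the next level.
        vectors = 2 * (vectors // 3) + vectors % 3
        levels += 1
    return levels

levels_for_256 = csa_levels(256)
```

Comparing `levels_for_256` with the logarithmic expressions in the answer choices shows whether a given expression acts as a lower or an upper bound for this reduction scheme.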


4.1 Out of Order (OoO) Execution

4.1.1 "Branch Prediction and speculative execution beyond branches" is only possible in the design on the _______ (left / right) because in the other design, if we dispatch instructions based on prediction, these speculative instructions ____________________________________________ ____________________________________________________________________________ ____________________________________________________________________________ ____________________________________________________________________________

4.1.2 We know that a memory delay of 10ns would mean 10 clocks for a processor running at ___________ (1 GHz / 2 GHz) and the same 10ns would mean 20 clocks if the processor is running at ___________ (1 GHz / 2 GHz). Hence, when increasing the processor frequency from 1 GHz to 2 GHz (without any change in memory speed), you would recommend that the depth of the instruction queues is __________________ (increased / decreased) in the case of ____________________________________________________________________________(state the queue name/names from among Integer queue, Load-Store queue, Divider queue, Multiplier queue).
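The arithmetic behind 4.1.2 is a product of latency and clock frequency. A minimal sketch, assuming the memory latency stays fixed in nanoseconds; `stall_clocks` is a hypothetical helper name.

```python
def stall_clocks(latency_ns, freq_ghz):
    """Number of processor clocks spanned by a fixed memory latency."""
    # 1 GHz means 1 clock per nanosecond, so clocks = ns * GHz.
    return round(latency_ns * freq_ghz)

clocks_at_1ghz = stall_clocks(10, 1.0)
clocks_at_2ghz = stall_clocks(10, 2.0)
```

The same latency costs twice as many clocks at twice the frequency, which is the observation the queue-depth question builds on.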

[Figure for 3.3 (6pts): partial CSA tree with stages S1 to S5 feeding a CPA.]

[Figures for 4.1, left and right designs: each shows an Issue Unit, an Int. Divider, an Int. Multiplier, and a TAG FIFO, with labels 63 and 2.]

3pts

4pts


4.1.3 RAW dependency is solved by simply making the reader wait until the writer can forward the information to the reader in the design on _____________ (the left / the right / both sides).

4.1.4 In the short code of 4 lines on the side, you notice that OoO can potentially cause ___________________________________ (RAW / WAR / WAW / multiple of these (state them)) hazards for $8.

In-order writing alone in the design on the _____________ (left / right) eliminates _____________________________ (RAW/WAR/WAW) hazards among registers.

Design on the left: Let us say that the dispatch unit assigns a symbolic Tag of LION to the destination register $8 of instr. #1 and a little later assigns TIGER to the destination register $8 of instr. #3. LION is written against $8 in RST first and later it is replaced by TIGER. State how the hazards listed by you for $8 are addressed in the design on the left.
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________

Design on the right: Let us say that the dispatch unit assigns the ROB Tag of 21 to the destination register $8 of instr. #1 and a little later assigns the ROB Tag of 23 to the destination register $8 of instr. #3. Again state how the hazards listed by you for $8 are addressed in the design on the right.
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________

4.1.5 Conditional branches cause more stalls in dispatch in the design on the ___________ (left / right) whereas in the design on the ___________ (left / right) more flushes due to branch mispredictions occur.

4.1.6 Flush of ___________________________ (IFQ / Backend / ROB) is the same in both designs due to (circle all applicable items below)
(a) mispredicted conditional branches (beq/bne)
(b) unconditional jump (j)
(c) unconditional jump and link (jal)
(d) unconditional program return (jr $31)

4.1.7 It is enough to predict the direction of a conditional branch using BPB, standing for Branch _____________ Buffer, if prediction is done from the ________________ (IF stage / Dispatch stage), but in the other case, we need BTB, standing for Branch ___________ Buffer to provide the target address.

Code for 4.1.4:
add $8, $1, $2      ; instr. #1
lw  $10, 100($8)    ; instr. #2
add $8, $3, $4      ; instr. #3
lw  $11, 100($8)    ; instr. #4

Margin point marks: 2pts, 4pts, 10pts, 3pts, 3pts, 3pts, Cancelled

4.1.8 On the side we have shown three instructions at the PC values 1040H, 2040H, and 3040H. They are distanced by 1000H bytes = 400H words. If BPB and BTB are each 1K deep (2^10 = 1K = 400H), does it cause aliasing? Where is aliasing unacceptable? ______________________ (trying to predict from IF stage / trying to predict from dispatch stage / both / none). Explain briefly.
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
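Whether the three PCs collide in a 1K-entry table can be checked numerically. The sketch below assumes the table is indexed by the low-order 10 bits of the word address (PC divided by 4); that indexing scheme and the names `table_index` and `ENTRIES` are assumptions for illustration, not stated in the exam.

```python
ENTRIES = 1024  # 1K-deep BPB / BTB

def table_index(pc):
    """Index formed from the low-order bits of the word address (assumed scheme)."""
    return (pc >> 2) % ENTRIES

pcs = [0x1040, 0x2040, 0x3040]
indices = [table_index(pc) for pc in pcs]
upper_bits = [pc >> 12 for pc in pcs]   # bits left over above the index
```

Under this assumption all three PCs map to the same entry and differ only in the upper bits, so they can be told apart only if those upper bits are stored and compared.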

4.2 Exceptions:

4.2.1 Page fault _______________ (needs /does not need) to be associated with an instruction and hence ________ (is / isn’t) a precise exception.

4.2.2 As part of handling a precise exception, we need to
(a) tag the offending instruction with its Cause and EPC info. T/F
(b) convert the offending instruction and all the following instructions into bubbles. T/F
(c) allow all preceding (senior) instructions in process order to complete. T/F
(d) be silent and carry the Cause and EPC until the offending instruction reaches WB. T/F

4.2.3 Place a check mark in the stage (or stages) an exception can occur.

4.3 RAS (Return Address stack)

4.3.1 Consider the following 4 types of program control instructions:
(a) unconditional branches (example: j)
(b) function calls (example: jal)
(c) conditional branches (example: beq, bne)
(d) function returns (example: jr $31)

RAS provides target address for the __________________ (j/jal/beq/bne/jr $31) instruction(s).

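Table for 4.2.3:
Exception                   | IF | ID | EX | MEM | WB
Page Fault                  |    |    |    |     |
Integer Overflow            |    |    |    |     |
Undefined Opcode            |    |    |    |     |
Memory Protection Violation |    |    |    |     |

Code for 4.1.8:
1040: beq $1, $2, 20;
. . . . .
2040: add $4, $5, $6;
. . . . .
3040: beq $10, $20, 5;

Margin point marks: 6pts, 2pts, 4pts, 4pts, 4pts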


A PUSH operation on RAS takes place when _________________ (j/jal/beq/bne/jr $31) is/are executed.
A POP operation on RAS takes place when ___________________ (j/jal/beq/bne/jr $31) is/are executed.

4.3.2 RAS, being usually _______________ (small / large), can only predict the return address. The prediction can go wrong if the degree of nesting / the degree of recursion in recursive calls _______________________ (exceeds / does not exceed) the depth of the RAS.
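The effect of RAS depth on prediction can be illustrated with a toy model. This is a sketch under stated assumptions: a call pushes its return address, a return pops the prediction, and a full stack drops its oldest entry. The class `ReturnAddressStack` and the helper `count_correct` are hypothetical names.

```python
from collections import deque

class ReturnAddressStack:
    """Fixed-depth stack; when full, the oldest entry is discarded (assumption)."""
    def __init__(self, depth):
        self._stack = deque(maxlen=depth)

    def push(self, return_addr):
        self._stack.append(return_addr)

    def pop(self):
        # Returns None when no prediction is available.
        return self._stack.pop() if self._stack else None

def count_correct(depth, nesting):
    """Nest `nesting` calls, then return from all of them; count correct predictions."""
    ras = ReturnAddressStack(depth)
    actual = [0x1000 + 4 * i for i in range(nesting)]   # illustrative return addresses
    for addr in actual:
        ras.push(addr)
    return sum(1 for addr in reversed(actual) if ras.pop() == addr)

deep_enough = count_correct(depth=4, nesting=3)
too_shallow = count_correct(depth=2, nesting=3)
```

In this model every return is predicted correctly while nesting stays within the depth; once nesting exceeds it, the outermost returns lose their entries.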

4.4 CMP (Chip Multiprocessors) with CMT (Chip multithreading)

4.4.1 ILP (standing for ___________________________ ) has been more or less fully exploited and processor architects have turned to exploit TLP (standing for ___________________________ )

4.4.2 When one thread is switched with another in a multi-threaded core, the register file contents are saved in the main memory. T / F Since the number of alternative register files ______________ (is / isn’t quite) finite and since the number of process control block copies in memory ______________ (is / isn’t quite) finite, the number of threads per multi-threaded core is _____________________________________, whereas the number of processes that can be run on a core using software context switching is generally ___________________________________________________________________.

4.4.3 Functional units such as ALU are ____________ (common / separate) for the 4 threads running on a core.

4.4.4 The stall penalty due to dependency on a load word instruction can usually be avoided in ______ _______________________ (Fine-Grained / Coarse-Grained / both types of / neither type of ) multithreading.

4.4.5 _________ (Fine / Coarse)-grain multithreading switches threads on each instruction whereas _________ (fine / coarse)-grain switches threads on costly stalls such as cache misses.

4.4.6 Dynamic power considerations favor ________________________ (Uniprocessors/Multiprocessors).

4.4.7 A _____________________ (non-blocking / blocking) cache handles multiple cache requests, usually as long as they are hits under a pending miss. A CMP (such as Sun Niagara T1) needs to use a ____________________ (non-blocking / blocking) cache to be able to execute multiple threads. A non-blocking cache ________________ (is / isn’t) useful in an OoO executing processor as it is ______________ (possible / not possible) to handle several load/store instructions in the cache.

4.5 Multiprocessors and Cache Coherence:

4.5.1 Snoopy controller, in a ____________________ (write-through /write-back/both/neither) cache-coherence system, does not care to watch read transactions from the other processors [ R(j) ].

Margin point marks: 3pts, 2pts, 6pts, 1pt, 2pts, 2pts, 1pt, 3pts, 3pts, 3pts


Label the following two state diagrams as write-through or write-back.

4.5.2 The "dual" directory in snoopy protocol refers to the duplicated _________ (TAG / DATA/TLB) RAM.

4.5.3 If there is a "Dirty bit" besides a "Valid bit" associated with a cache block, then the designer must be using a ____________________ (write-through /write-back/any of the two/neither) cache.

5 ( 18 points) 10 min. Non-linear pipelines:

Complete the datapath to support the function, Z, using dedicated OUTPUT stage registers for each stage (and the needed muxes). Show where the output Z is taken from. Complete the reservation table and arrive at ICV. Draw state diagram, record greedy simple cycle(s) and arrive at MAL.

[Figure for 4.5.1: two cache-coherence state diagrams, each to be labeled write-through / write-back (1.5pts each).]

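[Figure: datapath to complete, with input X and four stages, SQR (Square), -3 (Subtract 3), /9 (Divide by 9), and +5 (Add 5), each with a dedicated OUTPUT stage register.]

Z = ((X^2 / 9)^2 + 5 - 3) / 9

Show the tap off for the Z output.

Reservation table for Z:
Stage       | 1 | 2 | 3 | 4 | 5 | 6
Square      |   |   |   |   |   |
Subtract 3  |   |   |   |   |   |
Divide by 9 |   |   |   |   |   |
Add 5       |   |   |   |   |   |

ICV: _________________

STATE DIAGRAM

MAL analysis: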
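The mechanics of going from a reservation table to forbidden latencies, an initial collision vector, and a greedy cycle can be sketched as follows. The table used here is a made-up 3-stage example, not the table for Z, and the bit-string convention (leftmost bit is the largest latency) and all function names are assumptions for illustration.

```python
def forbidden_latencies(table):
    """table maps stage name -> list of 0/1 marks, one per clock column."""
    forbidden = set()
    for marks in table.values():
        cols = [t for t, used in enumerate(marks) if used]
        forbidden.update(b - a for i, a in enumerate(cols) for b in cols[i + 1:])
    return forbidden

def collision_vector(forbidden):
    """Return (bit string C_n..C_1, integer with bit k-1 set when latency k is forbidden)."""
    n = max(forbidden, default=0)
    value = sum(1 << (k - 1) for k in forbidden)
    return (format(value, f"0{n}b") if n else ""), value

def greedy_cycle(initial, n):
    """Always take the smallest permitted latency until a state repeats."""
    state, seen, latencies = initial, {}, []
    while state not in seen:
        seen[state] = len(latencies)
        k = next((k for k in range(1, n + 1) if not (state >> (k - 1)) & 1), n + 1)
        latencies.append(k)
        state = (state >> k) | initial
    return latencies[seen[state]:]

# Hypothetical reservation table (NOT the one for Z).
example = {"S1": [1, 0, 0, 0, 1], "S2": [0, 1, 0, 1, 0], "S3": [0, 0, 1, 0, 0]}
forb = forbidden_latencies(example)
icv, icv_value = collision_vector(forb)
cycle = greedy_cycle(icv_value, max(forb))
average_latency = sum(cycle) / len(cycle)
```

For the exam's own table, the same three steps apply once the marks for Square, Subtract 3, Divide by 9, and Add 5 are filled in.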



Virtual Memory and Cache

Specs of the Trojan computer (a 32-bit address, 32-bit data, byte-addressable machine with a physically addressed cache, more specifically a PIPT cache):

Virtual address space = 4GB, Virtual address = 32 bits (VA31-VA0) (2^32 = 4G), Physical address space = 4GB, Physical address = 32 bits (PA31-PA0) (2^32 = 4G)

Page size = 2KB (2^11 = 2K), TLB size = 64 entry (fully-associative) (2^6 = 64)
Page table organization: 2-level table with 256-entry (2^8 = 256) page directory (top level table)

Cache size = 192KB (3*2^16 = 192K), Cache Block (cache line size) = four 32-bit words (16 bytes total) (2^4 = 16), Cache mapping: Set-associative with three blocks per set. (note 3 blocks per set)

Main memory organization: Lower-order Interleaved. Degree of interleaving to suit the most efficient access of the main-memory block for transferring to cache.

6.1 Divide the virtual address into VPN (Virtual Page Number) and Page offset fields. Since TLB is a fully associative TLB, we ____________ (further divide / do not divide) the VPN into TAG and SET fields. How many comparators of what size are needed in the TLB? _____________ _______________________________________

Is any portion of the virtual address used for "indexing" TLB? ______________ (Yes / No ).

6.2 Divide the virtual address into VPN and Page offset fields again and further divide the VPN (based on the page table organization information) into page directory index and 2nd-level page table index.

6.3 Divide the physical address into PPFN (Physical Page Frame Number) and Page offset fields.

4pts

Virtual address VA31-VA0 (for 6.1):
VA31 VA30 VA29 VA28 VA27 VA26 VA25 VA24 VA23 VA22 VA21 VA20 VA19 VA18 VA17 VA16 VA15 VA14 VA13 VA12 VA11 VA10 VA9 VA8 VA7 VA6 VA5 VA4 VA3 VA2 VA1 VA0
(low-order fields labeled Word and Byte; BE3-BE0: Bank Enables (Byte enables))

3pts

Virtual address VA31-VA0 (for 6.2):
VA31 VA30 VA29 VA28 VA27 VA26 VA25 VA24 VA23 VA22 VA21 VA20 VA19 VA18 VA17 VA16 VA15 VA14 VA13 VA12 VA11 VA10 VA9 VA8 VA7 VA6 VA5 VA4 VA3 VA2 VA1 VA0
(low-order fields labeled Word and Byte)

3pts

Physical address PA31-PA0 (for 6.3):
PA31 PA30 PA29 PA28 PA27 PA26 PA25 PA24 PA23 PA22 PA21 PA20 PA19 PA18 PA17 PA16 PA15 PA14 PA13 PA12 PA11 PA10 PA9 PA8 PA7 PA6 PA5 PA4 PA3 PA2 PA1 PA0
(low-order fields labeled Word and Byte)

6.4 Divide the physical address (based on cache specifications) into TAG, SET, WORD and BYTE fields

6.5 If the 32-bit physical byte address (produced by address translation through TLB or Page Table) is 90586124H (1001_0000_0101_1000_0110_0001_0010_0100B), which set in the cache you will be approaching? Does this set number form an index (an address) into _____________________________ (the multiple TAG RAMs/the single TAG RAM/neither of these).
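The field widths implied by the cache specs, and the split of the address in 6.5, can be worked out mechanically. The sketch below derives the widths from the stated sizes (192KB cache, 16-byte blocks, 3 blocks per set, 32-bit physical address); the names `split_physical` and the constants are illustrative only.

```python
CACHE_BYTES = 192 * 1024
BLOCK_BYTES = 16          # four 32-bit words
BLOCKS_PER_SET = 3
ADDRESS_BITS = 32

sets = CACHE_BYTES // (BLOCK_BYTES * BLOCKS_PER_SET)
set_bits = sets.bit_length() - 1          # log2 of the (power-of-2) set count
offset_bits = BLOCK_BYTES.bit_length() - 1
tag_bits = ADDRESS_BITS - set_bits - offset_bits

def split_physical(pa):
    """Split a physical byte address into (tag, set, word, byte) fields."""
    byte = pa & 0x3
    word = (pa >> 2) & 0x3
    set_index = (pa >> offset_bits) & (sets - 1)
    tag = pa >> (offset_bits + set_bits)
    return tag, set_index, word, byte

tag, set_index, word, byte = split_physical(0x90586124)
```

Because there are three blocks per set, the set count (not the block count) is the power of 2 that fixes the SET field width.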

Complete the TAG RAM details in the side panel.

6.6 Complete the Cache DATA RAM details below.

6.7 Complete the Interleaved Main Memory details below.

6.8 TLB miss does not cause a TRAP. T / F During TLB look up, a Read/Write/Execute violation (a memory protection violation) causes a TRAP. T / F

6.9 If there is only one TAG RAM, it is a ________________________ (direct mapped/set-associative/fully-associative) cache.

3pts

Physical address PA31-PA0 (for 6.4):
PA31 PA30 PA29 PA28 PA27 PA26 PA25 PA24 PA23 PA22 PA21 PA20 PA19 PA18 PA17 PA16 PA15 PA14 PA13 PA12 PA11 PA10 PA9 PA8 PA7 PA6 PA5 PA4 PA3 PA2 PA1 PA0
(low-order fields labeled Word and Byte)

[Figure for 6.5: one TAG RAM with Address, Data_in, and Data_out ports, and a Comparator producing HIT.]
TAG RAM Size = ______ + valid
_____ more (besides the above) are needed in this cache.

8pts

5pts

[Figure for 6.6: Trojan Processor data bus D31-D0 connected to a DATA RAM (with Address input) made of four byte-wide banks: D31-D24, D23-D16, D15-D8, D7-D0.]
Size: Each of the 4 byte-wide banks is a ______ x 8
______ more such DATA RAM units (besides the one on the side)

5pts

[Figure for 6.7: one main memory unit with four 256MB banks (D31-D24, D23-D16, D15-D8, D7-D0), addressed by PA__ - PA__, connected to D31-D0 through a 32 bit bidirectional buffer (XCVR).]
______ more such units (besides the one on the left) exist in Main Memory.

2.5pts

1.5pts


6.10 In a set associative cache of 2-blocks per set and 4 words per block, the degree of lower-order interleaving recommended for the main memory is __________ (1-way/2-way/4-way/other namely ...) and the number of TAG RAMs is __________ (8/16/32/other namely ...). The depth of a TAG RAM is determined by ________________________________________.

6.11 The fully associative TLB can have a non-power of 2 number of entries, say 53 entries. T / F
The number of sets in a set associative mapping can be a non-power of 2 number, say 53 sets. T / F
The number of TAG RAMs in a set associative mapping can be a non-power of 2 number, say 3. T / F


Page Table: Number of A,B,C Tables built by the OS:

PQRST on the side represents a 20-bit (5-digit hex) VPN in a 3-level page table with upper 8 bits (PQ) indexing the A-level table, next 8 bits (RS) indexing the B-level tables, and the last 4 bits (T) indexing the C-level tables.

7.1 Suppose the first 8 distinct virtual pages accessed by the application program had the VPNs as stated in TABLE-I (in sorted order). How many tables of what size are built by OS by this time?
A-level: _____________________________________________
B-level: _____________________________________________
C-level: _____________________________________________

7.2 Complete 8 distinct VPNs of your choice in TABLE-II such that the least number of A,B,C tables are built by OS. This least set consists of ____ of A-Table(s), ____ of B-Table(s), ____ of C-Table(s).

7.3 Similarly, complete 8 distinct VPNs of your choice in TABLE-III such that the most number of A,B,C tables are built by OS. This most set consists of ____ of A-Table(s), ____ of B-Table(s), ____ of C-Table(s).

5pts

4pts

TABLE-I      TABLE-II     TABLE-III
P Q R S T    P Q R S T    P Q R S T
1 2 3 4 5
1 2 3 4 7
1 2 3 6 5
1 3 3 6 5
1 4 3 6 5
1 5 3 6 5
1 6 3 6 5
1 6 5 6 5
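Counting the tables for a given set of VPNs comes down to counting distinct index prefixes. The sketch below assumes the OS builds tables on demand: one A-level table, one B-level table per distinct PQ value in use, and one C-level table per distinct PQRS value in use. The variable names are illustrative only.

```python
# TABLE-I VPNs as 5-digit hex strings PQRST.
vpns = ["12345", "12347", "12365", "13365", "14365", "15365", "16365", "16565"]

a_tables = 1                                   # single top-level table
b_tables = len({v[:2] for v in vpns})          # one per distinct PQ
c_tables = len({v[:4] for v in vpns})          # one per distinct PQRS

# Entries per table follow from the index widths: 8, 8, and 4 bits.
a_entries, b_entries, c_entries = 2 ** 8, 2 ** 8, 2 ** 4
```

The same prefix counting can be used to test candidate VPN sets for TABLE-II and TABLE-III.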

9pts

6pts

6pts

Blank space for rough work

We enjoyed teaching this course. Hope you liked the course. Hope to see some of you in EE454L or EE560. Grades will be out in a week. Enjoy your Xmas break! - Gandhi, Jonathan, Prasanjeet, Sabya, Mehrtash, Ben, Ankit, Girish, Jingming, Sumit
