47
Review Problem 0 As you wait for class to start, answer the following question: What is important in a computer? What features do you look for when buying one?

Review Problem 0

  • Upload
    myron

  • View
    29

  • Download
    1

Embed Size (px)

DESCRIPTION

Review Problem 0. As you wait for class to start, answer the following question: What is important in a computer? What features do you look for when buying one?. Review Problem 1. What aspects of a microprocessor can affect performance?. Review Problem 2. - PowerPoint PPT Presentation

Citation preview

Page 1: Review Problem 0

1

Review Problem 0

As you wait for class to start, answer the following question: What is important in a computer? What features do

you look for when buying one?

Page 2: Review Problem 0

2

Review Problem 1

What aspects of a microprocessor can affect performance?

Page 3: Review Problem 0

3

Review Problem 2

If a 200 MHz machine runs ½ billion instructions in 10 seconds, what is the CPI of the machine?

If a second machine with the same CPI runs the program in 5 seconds, what is it’s clock rate?

Page 4: Review Problem 0

4

Review Problem 3

A program is 20% multiplication, 50% memory access, 30% other. You can quadruple multiplication speed, or double memory speed How much faster with 4x mult:

How much faster with 2x memory:

How much faster with both 4x mult & 2x memory:

Page 5: Review Problem 0

5

Review Problem 4

In assembly, compute the average of $a0, $a1, $a2, $a3, and put into $v0

Page 6: Review Problem 0

6

Review Problem 5

In assembly, replace the value in $a0 with its absolute value.

Page 7: Review Problem 0

7

Review Problem 6

Register $a0 has the address of a 3 integer array. Set $v0 to 1 if the array is sorted (smallest to largest), 0 otherwise.

Page 8: Review Problem 0

8

Review Problem 7

Sometimes it can be useful to have a program loop infinitely. We can do that, regardless of location, by the instruction:

LOOP: BEQ $7, $7, LOOP Convert this instruction to machine code

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

Page 9: Review Problem 0

9

Review Problem 8

What does the number 110012 represent?

Page 10: Review Problem 0

10

Review Problem 9

Perform the following binary computations.

1 0 1 1 0+ 0 0 1 1 1

1 0 0 1- 0 0 1 1

Page 11: Review Problem 0

11

Review Problem 10

How would the ALU be used to help with each of the following branches? The first is filled in for you: beq ($rs == $rt) subtract $rt from $rs, use zero flag bne ($rs != $rt) bgez ($rs ≥ 0) bgtz ($rs > 0) blez ($rs ≤ 0) bltz ($rs < 0)

Page 12: Review Problem 0

12

Review Problem 11

Design a 4-bit sra (shift arithmetic right) unit. Note that sra $t0, 1 = $t0/2, $t0, 2 = $t0/4, …

Page 13: Review Problem 0

13

Review Problem 12

Write MIPS assembly to compute $t1 = $t0*5 without using a multiply or divide instruction.

Page 14: Review Problem 0

14

Review Problem 13

What is the value of the following floating-point number?

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 1 1 0

Page 15: Review Problem 0

15

Review Problem 14 What is done for these ops during each of the CPU’s execute steps at right?

add $t0, $t1, $t2 sw $t3, 16[$t4] lw $t5, 8[$t6]Instruction

Fetch

Instruction

Decode

Operand

Fetch

Execute

Result

Store

Next

Instruction

Page 16: Review Problem 0

16

Review Problem 15

Immediate vals for ADDI are sign-extended, while those for ORI are extended with zeros. Build a sign-extend unit that can handle both.

Page 17: Review Problem 0

17

Review Problem 16

Develop a single-cycle CPU that can do LW and SW (only). Make it as simple as possible

Page 18: Review Problem 0

18

Review Problem 17

What mods are needed to support jump register: PC = Reg[RS]

SignExtn

d

WrEn AddrDin Dout

DataMemory

InstructionFetchUnit

Rs Rt Rs Rt Rd Imm16

imm16

Instructions[31:0]

[25:21]

[20:16]

[15:11]

[15:0]

BranchJump

ALUSrc

RegDst

Rd Rt

ALUcntrl

Aw Aa Ab DaDw Db

RegisterWrEn FileRegWr

MemWr MemToReg

Zero

SignExtn

dSignE

xtnd

WrEn AddrDin Dout

DataMemory

WrEn AddrDin Dout

DataMemory

InstructionFetchUnit

InstructionFetchUnit

Rs Rt Rs Rt Rd Imm16

imm16

Instructions[31:0]

[25:21]

[20:16]

[15:11]

[15:0]

BranchJump

ALUSrc

RegDst

Rd Rt

ALUcntrl

Aw Aa Ab DaDw Db

RegisterWrEn File

Aw Aa Ab DaDw Db

RegisterWrEn FileRegWr

MemWr MemToReg

Zero

Sign

Extn

d

PC

Addr[31:2] Addr[1:0] Instruction

Memory

Con

catenate

Adder

Instr[31:0]Jump

“00”PC[31:28]

Target Instr[25:0]

imm16

“1”

Zero

Branch

Cin

“0” Sign

Extn

dS

ignE

xtnd

PC

PC

Addr[31:2] Addr[1:0] Instruction

Memory

Addr[31:2] Addr[1:0] Instruction

Memory

Con

catenate

Con

catenate

Adder

Adder

Instr[31:0]Jump

“00”PC[31:28]

Target Instr[25:0]

imm16

“1”

Zero

Branch

Cin

“0”

Page 19: Review Problem 0

19

Review Problem 18

Show the datapath changes and control settings needed to implement “addi $rs, $rt, imm16”

SignE

xtnd

WrEn AddrDin Dout

DataMemory

InstructionFetchUnit

Rs Rt Rs Rt Rd Imm16

imm16

Instructions[31:0]

[25:21]

[20:16]

[15:11]

[15:0]

BranchJump

ALUSrc

RegDst

Rd Rt

ALUcntrl

Aw Aa Ab DaDw Db

RegisterWrEn FileRegWr

MemWr MemToReg

Zero

SignE

xtndSign

Extnd

WrEn AddrDin Dout

DataMemory

WrEn AddrDin Dout

DataMemory

InstructionFetchUnit

InstructionFetchUnit

Rs Rt Rs Rt Rd Imm16

imm16

Instructions[31:0]

[25:21]

[20:16]

[15:11]

[15:0]

BranchJump

ALUSrc

RegDst

Rd Rt

ALUcntrl

Aw Aa Ab DaDw Db

RegisterWrEn File

Aw Aa Ab DaDw Db

RegisterWrEn FileRegWr

MemWr MemToReg

Zero

Sign

Extn

d

PC

Addr[31:2] Addr[1:0] Instruction

Memory

Con

catenate

Adder

Instr[31:0]Jump

“00”PC[31:28]

Target Instr[25:0]

imm16

“1”

Zero

Branch

Cin

“0” Sign

Extn

dS

ignE

xtnd

PC

PC

Addr[31:2] Addr[1:0] Instruction

Memory

Addr[31:2] Addr[1:0] Instruction

Memory

Con

catenate

Con

catenate

Adder

Adder

Instr[31:0]Jump

“00”PC[31:28]

Target Instr[25:0]

imm16

“1”

Zero

Branch

Cin

“0”RegDst

ALUSrc

MemToReg

RegWr

MemWr

Branch

Jump

ALUCntrl

Page 20: Review Problem 0

20

Review Problem 19

To allow a CPU to spend a cycle waiting, we use a NOP (No operation) function. What are the control settings for the NOP instruction?

SignE

xtnd

WrEn AddrDin Dout

DataMemory

InstructionFetchUnit

Rs Rt Rs Rt Rd Imm16

imm16

Instructions[31:0]

[25:21]

[20:16]

[15:11]

[15:0]

BranchJump

ALUSrc

RegDst

Rd Rt

ALUcntrl

Aw Aa Ab DaDw Db

RegisterWrEn FileRegWr

MemWr MemToReg

Zero

SignE

xtndSign

Extnd

WrEn AddrDin Dout

DataMemory

WrEn AddrDin Dout

DataMemory

InstructionFetchUnit

InstructionFetchUnit

Rs Rt Rs Rt Rd Imm16

imm16

Instructions[31:0]

[25:21]

[20:16]

[15:11]

[15:0]

BranchJump

ALUSrc

RegDst

Rd Rt

ALUcntrl

Aw Aa Ab DaDw Db

RegisterWrEn File

Aw Aa Ab DaDw Db

RegisterWrEn FileRegWr

MemWr MemToReg

Zero

Sign

Extn

d

PC

Addr[31:2] Addr[1:0] Instruction

Memory

Con

catenate

Adder

Instr[31:0]Jump

“00”PC[31:28]

Target Instr[25:0]

imm16

“1”

Zero

Branch

Cin

“0” Sign

Extn

dS

ignE

xtnd

PC

PC

Addr[31:2] Addr[1:0] Instruction

Memory

Addr[31:2] Addr[1:0] Instruction

Memory

Con

catenate

Con

catenate

Adder

Adder

Instr[31:0]Jump

“00”PC[31:28]

Target Instr[25:0]

imm16

“1”

Zero

Branch

Cin

“0”

RegDst

ALUSrc

MemToReg

RegWr

MemWr

Branch

Jump

ALUCntrl

Page 21: Review Problem 0

21

Review Problem 20

Design the PC for a multi-cycle CPU from basic components (AND, OR, INV, MUX, DFF, etc.)

Page 22: Review Problem 0

22

Review Problem 21

Show the RTL and datapath to support jr $rs (jump to address in register $rs)

Sign

Extn

d

PC

<<

2

MD

R

AL

UO

utB

AWrEn

Addr DoutMemory

Din

IR

[25-21]

[20-16]

[15-11]

[15-0]

Aw Ab Aa Da

Registers Dw WrEn Db

4

Page 23: Review Problem 0

23

Review Problem 22

What mods to the multi-cycle datapath are needed to support load indirect (note – 2 mem accesses): Reg[IR[20:16]] = MEM[MEM[Reg[A] + sign-extend(IR[15:0])]]

Sign

Extn

d

PC

<<

2

MD

R

AL

UO

utB

AWrEn

Addr DoutMemory

Din

IR

[25-21]

[20-16]

[15-11]

[15-0]

Aw Ab Aa Da

Registers Dw WrEn Db

4

Page 24: Review Problem 0

24

Review Problem 23 Draw the complete state diagram (Mealy Machine) for the

Mem, Reg, and IR write enables for a machine that does just R-Type and Branch instructions

WE

State Mem Reg IR

1 0 0 1

2 0 0 0

3Br 0 0 0

3Rt 0 0 0

4Rt 0 1 0

Page 25: Review Problem 0

25

Review Problem 24

WE ALU

State PC Mem Reg IR SrcA SrcB Op Dest MemIn RegIn PCSrc

1 1 0 0 1 PC 4 + X PC X ALU

2 0 0 0 0 PC <<2 + X X X X

3Br (zero) 0 0 0 A B - X X X ALUout

3Rt 0 0 0 0 A B (IR) X X X X

4Rt 0 0 1 0 X X X Rd X ALUout X

3St 0 0 0 0 A SE + X X X X

4St 0 1 0 0 X X X X ALUout X X

3Lo 0 0 0 0 A SE + X X X X

4Lo 0 0 0 0 X X X X ALUout X X

5Lo 0 0 1 0 X X X Rt X MDR X

We can decide whether SrcA=PC means SrcA is 0 or 1. How should we decide how to best convert the symbolic control values below to specific boolean values?

Page 26: Review Problem 0

26

Review Problem 25

A program executes 30% ALU, 30% Load, 20% Store, and 20% Branch. What is the CPI of our multicycle processor for this

machine?

What is the CPI of our single-cycle processor for this program?

Page 27: Review Problem 0

27

Review Problem 26

The pipelined CPU has the stage delays shown Is it better to speed up the ALU by 10ns, or the Data

Memory by 2ns?

Does you answer change for a single-cycle or multi-cycle CPU?

Register

Register

Register

Register

PC

DataMemory

Instr.Memory

RegisterFile

RegisterFile

25ns 20ns 30ns20ns 20ns

Page 28: Review Problem 0

28

Review Problem 27

If we built our register file to have two write ports (i.e. can write two registers at once) would this help our pipelined CPU?

Page 29: Review Problem 0

29

Review Problem 28

What registers are being read and written in the 5th cycle of a pipelined CPU running this code?

add $1, $2, $3nor $4, $5, $6sub $7, $8, $9slt $10, $11, $12nand $13, $14, $15

Ifetch Reg/Dec Mem WrExec

Ifetch Reg/Dec Mem WrExec

Ifetch Reg/Dec Mem WrExec

Ifetch Reg/Dec Mem WrExec

Ifetch Reg/Dec Mem WrExec

Page 30: Review Problem 0

30

Review Problem 29

Do the jump instructions (j, jr) have problems with hazards?

Page 31: Review Problem 0

31

Review Problem 30

What forwarding happens on the following code?

lw $t0, 0($t1)add $t2, $t3, $t3nor $0, $t0, $t4bne $t2, $0, ENDsub $t5, $t2, $t4

Ifetch Reg/Dec Mem WrExec

Ifetch Reg/Dec Mem WrExec

Ifetch Reg/Dec Mem WrExec

Ifetch Reg/Dec Mem WrExec

Ifetch Reg/Dec Mem WrExec

Page 32: Review Problem 0

32

Review Problem 31

What should we do to this code to run it on a CPU with delay slots?

and $t0, $t1, $t2ori $t0, $t0, 7add $t3, $t4, $t5lw $t6, 0($t3)bgez $t6, FOOj BAR

Page 33: Review Problem 0

33

Review Problem 32

Why might a compiler do this transformation?

/* Before */for (j=0; j<2000; j++)

for (i=0; i<2000; i++)x[i][j]+=1;

/* After */for (i=0; i<2000; i++)

for (j=0; j<2000; j++)x[i][j]+=1;

Page 34: Review Problem 0

34

Review Problem 33

Level Hit Time Hit Rate

L1 1 cycle 95%

L2 10 cycles 90%

Main Memory 50 cycles 99%

Disk 50,000 cycles 100%

If you can speed up any level’s hit time by a factor of two, which is the best to speed up?

Page 35: Review Problem 0

35

Review Problem 34

The length (number of blocks) in a direct mapped cache is always a power of 2. Why?

Page 36: Review Problem 0

36

Review Problem 35

For the following access pattern, what is the smallest direct mapped cache that will not use the same cache location twice?

01391741024

Page 37: Review Problem 0

37

Review Problem 36

How many total bits are requires for a direct-mapped cache with 64 KB of data and 8-byte blocks, assuming a 32-bit address?

Index bits:

Bits/block:Data:Valid:Tag:

Total size:

Page 38: Review Problem 0

38

Review Problem 37

Assume we have three caches, with four one-word blocks: Direct mapped, 2-way set assoc. (w/LRU), and fully associative

How many misses will each have on this address pattern: Byte addresses: 0, 32, 0, 24, 32

Page 39: Review Problem 0

39

Review Problem 38

Level Hit Time Hit Rate

L1

L2 10 cycles 90%

Main Memory 40 cycles 99%

Disk 4,000 cycles 100%

Which is the best L1 cache for this system? Direct Mapped: 1 cycle, 80% hit rate 2-way Set Associative: 2 cycle, 90% hit rate Fully Associative: 3 cycle, 95% hit rate

Page 40: Review Problem 0

40

Review Problem 39

Can a direct-mapped cache ever have less cache misses than a fully associative cache of the same capacity? Why/why not?

Page 41: Review Problem 0

41

Review Problem 40

Assume we have separate instruction and data L1 caches. For each feature, state which cache is most likely to have the given feature

Large blocksize

Write-back

2-cycle hit time

Page 42: Review Problem 0

42

Review Problem 41

In a system with 8-bit addresses, virtual memory, and a direct mapped cache, where can these memory accesses be found?

1 011011 01010101 1 111110 00001111 0 111001 11110000 1 111001 11111111

Valid Tag Data

01101001

01100011

11011100

10001011

1 0101 1010 1 0110 1111 0 1000 0001 1 1100 0111

Valid Tag Physical page #

Page 43: Review Problem 0

43

Review Problem 42

For a dynamic branch predictor, why is the Branch History Table a direct-mapped cache? Why not fully associative or set associative?

Page 44: Review Problem 0

44

Review Problem 43

How would various branch predictors do on the bold branch in the following code?

A 1-bit predictor will be correct ___%

A 2-bit predictor will be correct ___%

A 2-bit correlating predictor with 2-bit global branch history will be correct ___%

while (1) { if (i<2) counter++; /* Branch when I = 2 or 3 */ i=(i+1)&3; /* I counts 0,1,2,3,0,1,2,3,… */}

Page 45: Review Problem 0

45

Review Problem 44

Show the constraint graph for this code, indicating the type of hazard for each edge.

1: lw $t1, 4($a0)2: add $t2, $t1, $a03: lw $t3, 8($a1)4: sub $t4, $t3, $a25: sw $t5, 0($a3)6: beq $s0, $s1, FOO

Page 46: Review Problem 0

46

Review Problem 45

Would loop unrolling & register renaming be useful for the following code? If so, what would the resulting code look like?

while (i<400) { if (x[i]==CONST) counter++; /* Count number of CONSTs in array */ i++;}

Page 47: Review Problem 0

47

Review Problem 46

Schedule the code for a 4-way VLIW. Assume no delay slots, and all instructions in parallel with a branch/jump still execute.

ALU1 ALU2 Load/Store Branch/Jump

1: lw $t1, 4($a0)2: add $t2, $t1, $a03: lw $t3, 8($a1)4: sub $t2, $t3, $a25: sw $t5, 0($a3)6: beq $s0, $s1, FOO7: and $t7, $s0, $s1