
8/8/2019 Part4 Branch

http://slidepdf.com/reader/full/part4-branch 1/11

Page 1

Copyright UCB & Morgan Kaufmann ECE568/Koren Part.4.1 Adapted from UCB and other sources

Israel Koren

Fall 2010

UNIVERSITY OF MASSACHUSETTS, Dept. of Electrical & Computer Engineering

Advanced Computer Architecture ECE 568

Part 4

Control Hazards


Control Hazard on Branches

12: beq r1,r3,24

16: and r2,r3,r5

20: or r6,r1,r7

24: add r8,r1,r9

36: xor r10,r1,r11

[Figure: pipeline diagram in which each instruction above passes through Ifetch, Reg, ALU, DMem, Reg; an arrow marks the Branch Target]


MIPS pipeline (Fig.A.19)


Control Hazard on Branches: Two-Stage Stall

12: beq r1,r3,24

16: and r2,r3,r5

20: or r6,r1,r7

36: xor r10,r1,r11

[Figure: pipeline diagram (Ifetch, Reg, ALU, DMem, Reg) for the instructions above, with the branch Target fetched only after a two-stage stall]

Freeze or Flush (flushing gives higher performance but must undo side effects)


Branch Stall Impact - MIPS R4000

♦ Deeper pipeline → worse penalty

[Figure: pipeline diagram over clock cycles cc1 to cc10 for BEQZ, Instruction 1, Instruction 2, Instruction 3, and Target; each passes through Instr. Memory, Reg, ALU, Data Memory, Reg]


Branch Stall Impact - MIPS R4000

♦ Assume CPI = 1.0, ignoring branches and data hazards

♦ Assume the solution is to stall for 3 cycles on every branch

♦ If 20% of instructions are branches, each stalling 3 cycles:

Op        Freq    Cycles    CPI(i)
Other     80%     1         0.8
Branch    20%     4         0.8

New CPI = 1.6, i.e., 60% slower
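The CPI above is a frequency-weighted sum over operation classes. A minimal Python sketch of that arithmetic, using only the slide's numbers (80% other instructions at 1 cycle, 20% branches at 1 + 3 cycles); the helper name weighted_cpi is illustrative, not from the slides:

```python
def weighted_cpi(mix):
    """mix: list of (frequency, cycles) pairs whose frequencies sum to 1."""
    return sum(freq * cycles for freq, cycles in mix)


base_cpi = 1.0      # ideal CPI, ignoring branches and data hazards
stall_cycles = 3    # every branch stalls 3 cycles

mix = [
    (0.80, base_cpi),                 # other instructions
    (0.20, base_cpi + stall_cycles),  # branches: 1 cycle + 3 stall cycles
]
cpi = weighted_cpi(mix)
slowdown = cpi / base_cpi - 1
print(f"new CPI = {cpi:.2f}, {slowdown:.0%} slower")
```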


Conditional and Unconditional Branches

♦ Unconditional branches incur a lower penalty

• Assume 2 stall cycles for an unconditional branch vs. 3 for a conditional branch

♦ Avg. stall cycles = $\sum_{\text{branch type}} \text{Branch\_frequency} \times \text{Branch\_penalty}$

Branch type      Freq.    Penalty    Freq × Penalty
Unconditional    0.04     2          0.08
Conditional      0.16     3          0.48
Total                                0.56

$\text{Pipeline\_speedup} = \dfrac{\text{Pipeline\_depth}}{1 + \sum \text{Branch\_frequency} \times \text{Branch\_penalty}}$

Pipeline_speedup = 8 / 1.56 = 5.13 (pipeline depth of 8, ignoring all other stalls)
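The two formulas above can be checked numerically. This sketch plugs in the slide's branch mix and a pipeline depth of 8; the dictionary layout is just one convenient way to hold the table:

```python
# branch type -> (frequency per instruction, stall penalty in cycles)
branch_mix = {
    "unconditional": (0.04, 2),
    "conditional": (0.16, 3),
}

# Avg. stall cycles = sum over branch types of frequency * penalty
avg_stall = sum(freq * penalty for freq, penalty in branch_mix.values())

pipeline_depth = 8
# Speedup = depth / (1 + avg stall cycles per instruction)
speedup = pipeline_depth / (1 + avg_stall)
print(f"avg stall = {avg_stall:.2f} cycles/instr, speedup = {speedup:.2f}")
```

Because the speedup is the depth divided by the effective CPI, every extra branch stall cycle per instruction directly erodes the benefit of deepening the pipeline.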


Dealing with Branch in MIPS

♦ Branch penalty is significant

♦ Two-part solution:
• Determine whether the branch is taken sooner, AND
• Compute the taken-branch target address earlier

♦ MIPS branch tests if a register = 0 or ≠ 0

♦ MIPS solution:
• Move the zero test to the ID/RF stage
• Add an adder to calculate the new PC in the ID/RF stage
• 1 clock cycle penalty for a branch versus 2
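One way to see why moving the test earlier saves a cycle: under a simplified stall-only model (an assumption, not taken from the slides), the instruction after a branch cannot be fetched until the branch resolves at the end of stage k, where IF is stage 1, so the pipeline inserts k - 1 bubbles. Assuming the unmodified pipeline resolves branches in EX, this reproduces the slide's "1 cycle versus 2":

```python
# Simplified model (assumption): a stalled pipeline fetches the instruction
# after a branch only once outcome AND target are known at the end of the
# branch's resolve stage, so it inserts (1-based stage number - 1) bubbles.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stall_penalty(resolve_stage: str) -> int:
    """Bubbles per branch when the branch resolves at the end of resolve_stage."""
    return STAGES.index(resolve_stage)  # 0-based index == 1-based stage - 1


for stage in ("EX", "ID"):
    print(f"branch resolved in {stage}: {stall_penalty(stage)} stall cycle(s)")
```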


Original MIPS 5-stage pipeline

[Figure: datapath with stages Instruction Fetch, Instr. Decode/Reg. Fetch, Execute/Addr. Calc, Memory Access, Write Back; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB; components: instruction Memory, Reg File, Sign Extend, ALU, Zero? test, Adder (PC + 4), MUXes, Data Memory, Next PC / Next SEQ PC]


Modified MIPS Pipeline

[Figure: the same five stages and pipeline registers, with the Zero? test and a second Adder (for the branch target) moved into the Instr. Decode/Reg. Fetch stage, so Next PC is selected in ID]


Four Branch Hazard Alternatives – 1 & 2

#1: Stall until the branch direction is clear

#2: Static prediction:

(a) Predict branch not taken
• Execute successor instructions in sequence
• “Squash” instructions in the pipeline if the branch is actually taken
• 47% of MIPS branches are not taken on average

(b) Predict branch taken
• 53% of MIPS branches are taken on average
• But the branch target address has not been calculated yet
» MIPS still incurs a 1-cycle branch penalty
» Other machines: branch target known before outcome


Static Prediction - Not Taken

♦ Predict: guess Not Taken, then back up if wrong

♦ Impact: 0 lost cycles per branch instruction if right, 1 if wrong (right about 50% of the time)
• Need to “Squash” the following instruction (LOAD) if wrong

[Figure: pipeline diagram over cycles 1 to 7 (Ifetch, Reg, ALU, DMem, Reg) for ADD, BEQ, LOAD in instruction order]


Predict Not Taken

♦ CPI for branch: (1 × 0.47 + 2 × 0.53) = 1.53

♦ Total CPI might be: 1.53 × 0.2 + 1 × 0.8 = 1.106 (20% branches)

♦ Compare to Always Stall (Freeze)
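The predict-not-taken arithmetic above, as a short sketch. The 47%/53% split and the 20% branch frequency are the slide's; the always-stall comparison assumes the 1-cycle branch stall of the modified MIPS pipeline (the same assumption behind the "Stall pipeline" row of the evaluation table later in this deck):

```python
p_taken = 0.53       # fraction of MIPS branches taken
branch_freq = 0.20   # fraction of instructions that are branches

# Predict not taken: an untaken branch costs 1 cycle; a taken branch costs
# 2 (1 + 1 squashed instruction).
branch_cpi = 1 * (1 - p_taken) + 2 * p_taken
total_cpi = branch_cpi * branch_freq + 1 * (1 - branch_freq)

# Always stall (freeze), assuming a 1-cycle stall on every branch.
freeze_cpi = 2 * branch_freq + 1 * (1 - branch_freq)

print(f"branch CPI = {branch_cpi:.2f}")
print(f"predict not taken: CPI = {total_cpi:.3f}")
print(f"always stall:      CPI = {freeze_cpi:.3f}")
```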


MIPS R4000 Predict Taken - 2 (vs. 3) Stall Cycles

[Figure: pipeline diagram over clock cycles cc1 to cc10 for BEQZ, Instruction 1, Instruction 2, Instruction 3, and Target; each passes through Instr. Memory, Reg, ALU, Data Memory, Reg]


CPI Penalty - MIPS R4000

CPI_Stall = 1.56, CPI_P/T = 1.46 (predict taken), CPI_P/UT = 1.38 (predict untaken)

Speedup_(P/UT vs. Stall) = 1.56/1.38 = 1.13
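The three R4000 CPIs can be reproduced from a per-class branch mix and per-class penalties. The mix (4% unconditional, 6% untaken conditional, 10% taken conditional) is the one stated in the evaluation table later in this deck. The per-class penalties below are ASSUMED values, since the slide's penalty table did not survive extraction; they are chosen because they reproduce 1.56, 1.46, and 1.38 exactly:

```python
# Branch mix (fraction of all instructions), as stated in this deck.
mix = {"uncond": 0.04, "untaken_cond": 0.06, "taken_cond": 0.10}

# ASSUMED per-class penalties (cycles) for the 8-stage R4000 pipeline.
penalties = {
    "stall":           {"uncond": 2, "untaken_cond": 3, "taken_cond": 3},
    "predict_taken":   {"uncond": 2, "untaken_cond": 3, "taken_cond": 2},
    "predict_untaken": {"uncond": 2, "untaken_cond": 0, "taken_cond": 3},
}


def cpi(scheme: str) -> float:
    """Base CPI of 1 plus frequency-weighted branch penalties."""
    return 1 + sum(mix[k] * penalties[scheme][k] for k in mix)


cpis = {scheme: cpi(scheme) for scheme in penalties}
speedup = cpis["stall"] / cpis["predict_untaken"]
for scheme, value in cpis.items():
    print(f"{scheme:16s} CPI = {value:.2f}")
print(f"speedup (P/UT vs. stall) = {speedup:.2f}")
```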


Four Branch Hazard Alternatives – 3 & 4

#3: Delayed Branch
• Define the branch to take effect AFTER one or more following instructions

    branch instruction
    sequential successor_1
    sequential successor_2
    ........
    sequential successor_n
    branch target if taken

  (the n sequential successors form a branch delay of length n)

• A 1-slot delay allows the proper decision and branch target address to be computed in a 5-stage pipeline
• MIPS uses this

#4: Dynamic Branch Prediction
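The delayed-branch semantics above (the instruction after the branch always runs, and only then does control move to the target) can be sketched as a toy interpreter. The mini-ISA here is hypothetical, with just addi and beqz, uses instruction indices instead of byte addresses, and models a branch delay of length 1:

```python
# Toy model of ONE branch delay slot. Hypothetical mini-ISA (not MIPS encoding):
#   ("addi", rd, rs, imm)   rd <- rs + imm
#   ("beqz", rs, target)    branch to instruction index `target` if rs == 0


def run(program, regs):
    """Execute `program`, mutating `regs`; return the executed indices in order."""
    trace = []
    pc = 0
    redirect = None  # target of a taken branch, applied after its delay slot
    while 0 <= pc < len(program):
        op, *args = program[pc]
        trace.append(pc)
        # Normally the successor is pc + 1. If the previous instruction was a
        # taken branch, this instruction is its delay slot and the target follows.
        next_pc = redirect if redirect is not None else pc + 1
        redirect = None
        if op == "addi":
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "beqz":
            rs, target = args
            if regs[rs] == 0:
                redirect = target  # the delay slot (pc + 1) still executes first
        else:
            raise ValueError(f"unknown op {op!r}")
        pc = next_pc
    return trace


program = [
    ("beqz", "r1", 3),          # 0: branch
    ("addi", "r2", "r2", 1),    # 1: delay slot, always executes
    ("addi", "r3", "r3", 10),   # 2: fall-through, skipped if taken
    ("addi", "r4", "r4", 100),  # 3: branch target
]

taken = {"r1": 0, "r2": 0, "r3": 0, "r4": 0}
not_taken = {"r1": 5, "r2": 0, "r3": 0, "r4": 0}
print(run(program, taken), taken)          # [0, 1, 3] {'r1': 0, 'r2': 1, 'r3': 0, 'r4': 100}
print(run(program, not_taken), not_taken)  # [0, 1, 2, 3] {'r1': 5, 'r2': 1, 'r3': 10, 'r4': 100}
```

In both runs the delay-slot instruction (index 1) executes, which is why the compiler must fill the slot with an instruction that is useful, or at least harmless, on both paths.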


MIPS - One Delayed Branch Slot

♦ Impact: 0 clock cycles per branch instruction if a useful instruction can be found to put in the “slot” (≈ 50% of the time)

[Figure: pipeline diagram over cycles 1 to 7 (Ifetch, Reg, ALU, DMem, Reg) for ADD, BEQ, Misc (the instruction in the delay slot), LOAD in instruction order]


Scheduling the branch delay slot

(a) From before
    DADD R1,R2,R3                  If R2=0 then
    If R2=0 then          ⇒          DADD R1,R2,R3
      Delay Slot

(b) From target
    DSUB R4,R5,R6                  DSUB R4,R5,R6
    :                              :
    DADD R1,R2,R3         ⇒        DADD R1,R2,R3
    If R1=0 then                   If R1=0 then
      Delay Slot                     DSUB R4,R5,R6

(c) From fall-through
    DADD R1,R2,R3                  DADD R1,R2,R3
    If R1=0 then          ⇒        If R1=0 then
      Delay Slot                     OR R7,R8,R9
    OR R7,R8,R9                    DSUB R4,R5,R6
    DSUB R4,R5,R6
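Strategy (a) works only because the branch tests R2, which DADD does not write. A minimal sketch of that legality check, assuming the candidate is the instruction immediately before the branch (so the only dependence to check is the branch reading its result); the Instr type and can_fill_from_before are illustrative names, not from the slides:

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]    # register written, or None (e.g. a branch)
    srcs: Tuple[str, ...]  # registers read


def can_fill_from_before(prev: Instr, branch: Instr) -> bool:
    """Strategy (a): may `prev`, the instruction right before `branch`, move
    into the delay slot? Only if the branch does not read prev's result."""
    return prev.dest is None or prev.dest not in branch.srcs


dadd = Instr("DADD R1,R2,R3", "R1", ("R2", "R3"))
branch_r2 = Instr("If R2=0 then", None, ("R2",))
branch_r1 = Instr("If R1=0 then", None, ("R1",))

print(can_fill_from_before(dadd, branch_r2))  # True: case (a)
print(can_fill_from_before(dadd, branch_r1))  # False: cases (b) and (c)
```

When the check fails, as with "If R1=0 then", the compiler has to look at the target or the fall-through path instead, which is what (b) and (c) show.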


Delayed Branch

♦ Where to get instructions to fill the branch delay slot?
• From before the branch instruction
• From the target address: only valuable when the branch is taken
• From the fall-through: only valuable when the branch is not taken
• Canceling branches allow more slots to be filled

♦ Compiler effectiveness for a single branch delay slot:
• Fills about 60% of branch delay slots
• About 80% of instructions executed in branch delay slots are useful in computation
• About 48% (60% × 80%) of slots are usefully filled

♦ CPI = 1 + Prob{Branch} × Prob{Unuseful_fill} = 1 + 0.2 × 0.52 = 1.104

♦ Delayed branch downside: with 8 to 10 stage pipelines and multiple instructions issued per clock (superscalar), a single delay slot no longer hides the branch penalty
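The compiler-effectiveness numbers above chain together as follows; this sketch uses only the slide's figures (60% of slots filled, 80% of those useful, 20% branches):

```python
fill_rate = 0.60     # fraction of delay slots the compiler manages to fill
useful_rate = 0.80   # fraction of filled-slot instructions doing useful work
branch_freq = 0.20   # fraction of instructions that are branches

useful_slots = fill_rate * useful_rate   # slots that hide the 1-cycle penalty
wasted_slots = 1 - useful_slots          # slots that still cost a cycle
cpi = 1 + branch_freq * wasted_slots
print(f"usefully filled = {useful_slots:.0%}, CPI = {cpi:.3f}")
```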


Evaluating Branch Alternatives for MIPS

Scheduling scheme     Branch penalty    CPI      Speedup vs. stall
Stall pipeline        1                 1.2      1.0
Predict taken         1                 1.2      1.0
Predict not taken     0/1               1.14     1.05
Delayed branch        0.52              1.104    1.087

Branch mix: UnCond 4%, NotTaken_Cond 6%, Taken_Cond 10%

Slots filled usefully: 48%

CPI{Delayed_Branch} = 1 + 0.2 × 0.52 = 1.104
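The whole table can be regenerated from the branch mix. The per-class penalties are inferred from the "Branch penalty" column (an assumption): 1 cycle for every branch when stalling or predicting taken, since MIPS computes the target in ID (the same stage that resolves the outcome), and 0 cycles only for an untaken conditional branch under predict-not-taken:

```python
mix = {"uncond": 0.04, "untaken_cond": 0.06, "taken_cond": 0.10}
branch_freq = sum(mix.values())  # 0.20


def cpi_from_penalties(penalty):
    """Base CPI of 1 plus frequency-weighted per-class penalties."""
    return 1 + sum(mix[k] * penalty[k] for k in mix)


schemes = {
    "Stall pipeline": cpi_from_penalties(
        {"uncond": 1, "untaken_cond": 1, "taken_cond": 1}),
    "Predict taken": cpi_from_penalties(
        {"uncond": 1, "untaken_cond": 1, "taken_cond": 1}),
    "Predict not taken": cpi_from_penalties(
        {"uncond": 1, "untaken_cond": 0, "taken_cond": 1}),
    # 52% of delay slots are not usefully filled, each costing 1 cycle.
    "Delayed branch": 1 + branch_freq * (1 - 0.48),
}

stall_cpi = schemes["Stall pipeline"]
speedups = {name: stall_cpi / value for name, value in schemes.items()}
for name, value in schemes.items():
    print(f"{name:18s} CPI = {value:.3f}  speedup = {speedups[name]:.3f}")
```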


MIPS R4000 - Delayed Branch and Static Prediction (Not-Taken)

[Figure: pipeline timing for (1) an untaken branch and (2) a taken branch]