40
Randal E. Bryant Carnegie Mellon University CS:APP CS:APP Chapter 4 Computer Architecture Pipelined Implementation Part II CS:APP Chapter 4 CS:APP Chapter 4 Computer Architecture Computer Architecture Pipelined Pipelined Implementation Implementation Part II Part II http://csapp.cs.cmu.edu

CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

Randal E. Bryant

Carnegie Mellon University

CS:APP

CS:APP Chapter 4Computer Architecture

PipelinedImplementation

Part II

CS:APP Chapter 4CS:APP Chapter 4Computer ArchitectureComputer Architecture

PipelinedPipelinedImplementationImplementation

Part IIPart II

http://csapp.cs.cmu.edu

Page 2: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 2 – CS:APP

OverviewOverview

Make the pipelined processor work!

Data HazardsData Hazardsn Instruction having register R as source follows shortly after

instruction having register R as destination

n Common condition, don’t want to slow down pipeline

Control HazardsControl Hazardsn Mispredict conditional branch

l Our design predicts all branches as being ta kenl Naïve pipeline ex ecutes two extra instructio ns

n Getting return address for ret instructionl Naïve pipeline ex ecutes three extra ins tructions

Making Sure It Really WorksMaking Sure It Really Worksn What if multiple special cases happen simultaneously ?

Page 3: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 3 – CS:APP

Pipeline StagesPipeline Stages

FetchFetchn Select current PC

n Read instruction

n Compute incremented PC

DecodeDecoden Read program registers

ExecuteExecuten Operate ALU

MemoryMemoryn Read or write data memory

Write BackWrite Backn Update register file

PCincrement

PCincrement

CCCCALUALU

Datamemory

Datamemory

Fetch

Decode

Execute

Memory

Wri te back

Registerfile

Registerfile

A BM

E

Registerfile

Registerfile

A BM

E

valP

d_srcA, d_srcB

valA, valB

aluA, aluB

Bch valE

Addr, Data

valM

PC

W_valE, W_valM, W_dstE, W_dstMW_icode, W_valM

icode, ifun,rA, rB, valC

E

M

W

F

D

valP

f_PC

predPC

Instructionmemory

Instructionmemory

M_icode, M_Bch, M_valA

Page 4: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 4 – CS:APP

PIPE- HardwarePIPE- Hardware

n Pipeline registers holdintermediate valuesfrom instructionexecution

Forward (Upward) PathsForward (Upward) Pathsn Values passed from one

stage to next

n Cannot jump paststagesl e.g., valC passes

through decode

E

M

W

F

D

Instructionmemory

Instructionmemory

PCincrement

PCincrement

Registerfile

Registerfile

ALUALU

Datamemory

Datamemory

SelectPC

rB

dstE dstMSelectA

ALUA

ALUB

Mem.control

Addr

srcA srcB

read

write

ALUfun.

Fetch

Decod e

Execut e

Memory

Wri te back

icode

data out

data in

A BM

E

M_valA

W_valM

W_valE

M_valA

W_valM

d_rvalA

f_PC

PredictPC

valE valM dstE dstM

Bchicode valE valA dstE dstM

icode ifun valC valA valB dstE dstM srcA srcB

valC valPicode ifun rA

predPC

CCCC

d_srcBd_srcA

e_Bch

M_Bch

Page 5: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 5 – CS:APP

Data Dependencies: 2 Nop’sData Dependencies: 2 Nop’s

0x000: irmovl $10,%edx

1 2 3 4 5 6 7 8 9

F D E M WF D E M W0x006: irmovl $3,%eax F D E M WF D E M W0x00c: nop F D E M WF D E M W0x00d: nop F D E M WF D E M W0x00e: addl %edx,%eax F D E M WF D E M W0x010: halt F D E M WF D E M W

10# demo-h2.ys

W

R[%eax] 3

D

valA R[%edx] = 10valB R[%eax] = 0

•••

W

R[%eax] 3

W

R[%eax] 3

D

valA R[%edx] = 10valB R[%eax] = 0

D

valA R[%edx] = 10valB R[%eax] = 0

•••

Cycle 6

Error

Page 6: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 6 – CS:APP

Data Dependencies: No NopData Dependencies: No Nop

0x000: irmovl $10,%edx

1 2 3 4 5 6 7 8

F D E M

W0x006: irmovl $3,%eax F D E M

W

F D E M W0x00c: addl %edx,%eax

F D E M W0x00e: halt

# demo-h0.ys

E

D

valA R[%edx] = 0valB R[%eax] = 0

D

valA R[%edx] = 0valB R[%eax] = 0

Cycle 4

Error

MM_valE = 10M_dstE = %edx

e_valE 0 + 3 = 3 E_dstE = %eax

Page 7: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 7 – CS:APP

Stalling for Data DependenciesStalling for Data Dependencies

n If instruction follows too closely after one that writes register,slow it down

n Hold instruction in decode

n Dynamically inject nop into execute stage

0x000: irmovl $10,%edx

1 2 3 4 5 6 7 8 9

F D E M W0x006: irmovl $3,%eax F D E M W0x00c: nop F D E M W

bubble

F

E M W0x00e: addl %edx,%eax D D E M W0x010: halt F D E M W

10# demo-h2.ys

F

F D E M W0x00d: nop

11

Page 8: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 8 – CS:APP

Stall ConditionStall Condition

Source RegistersSource Registersn srcA and srcB of current

instruction in decodestage

Destination RegistersDestination Registersn dstE and dstM fields

n Instructions in execute,memory, and write-backstages

Special CaseSpecial Casen Don’t stall for register ID

8l Indicates absenc e of

register operand

E

M

W

F

D

Instructionmemory

Instructionmemory

PCincrement

PCincrement

Registerfile

Registerfile

ALUALU

DatamemoryData

memory

SelectPC

rB

dstE dstMSelectA

ALUA

ALUB

Mem.control

Addr

srcA srcB

read

write

ALUfun.

Fetch

Decode

Execute

Memory

Write back

icode

data out

data in

A BM

E

M_valA

W_valM

W_valE

M_valA

W_valM

d_rvalA

f_PC

PredictPC

valE valM dstE dstM

Bchicode valE valA dstE dstM

icode ifun valC valA valB dstE dstM srcA srcB

valC valPicode ifun rA

predPC

CCCC

d_srcBd_srcA

e_Bch

M_Bch

Page 9: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 9 – CS:APP

Detecting Stall ConditionDetecting Stall Condition

0x000: irmovl $10,%edx

1 2 3 4 5 6 7 8 9

F D E M W0x006: irmovl $3,%eax F D E M W0x00c: nop F D E M W

bubble

F

E M W0x00e: addl %edx,%eax D D E M W0x010: halt F D E M W

10# demo-h2.ys

F

F D E M W0x00d: nop

11

Cycle 6

W

D

•••

W_dstE = %eaxW_valE = 3

srcA = %edxsrcB = %eax

Page 10: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 10 – CS:APP

Stalling X3Stalling X3

0x000: irmovl $10,%edx

1 2 3 4 5 6 7 8 9

F D E M W0x006: irmovl $3,%eax F D E M W bubble

F

E M W bubble

D

E M W

0x00c: addl %edx,%eax D D E M W0x00e: halt F D E M W

10# demo-h0.ys

F F

D

F

E M W bubble

11

Cycle 4 •••

WW_dstE = %eax

DsrcA = %edxsrcB = %eax

•••

MM_dstE = %eax

DsrcA = %edxsrcB = %eax

EE_dstE = %eax

DsrcA = %edxsrcB = %eax

Cycle 5

Cycle 6

Page 11: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 11 – CS:APP

What Happens When Stalling?What Happens When Stalling?

n Stalling instruction held back in decode stage

n Following instruction stays in fetch stage

n Bubbles injected into execute stagel Like dynamicall y generated nop’sl Move through later stages

0x000: irmovl $10,%edx

0x006: irmovl $3,%eax

0x00c: addl %edx,%eax

Cycle 4

0x00e: halt

0x000: irmovl $10,%edx

0x006: irmovl $3,%eax

0x00c: addl %edx,%eax

# demo-h0.ys

0x00e: halt

0x000: irmovl $10,%edx

0x006: irmovl $3,%eax

bubble

0x00c: addl %edx,%eax

Cycle 5

0x00e: halt

0x006: irmovl $3,%eax

bubble

0x00c: addl %edx,%eax

bubble

Cycle 6

0x00e: halt

bubble

bubble

0x00c: addl %edx,%eax

bubble

Cycle 7

0x00e: halt

bubble

bubble

Cycle 8

0x00c: addl %edx,%eax

0x00e: halt

Write BackMemoryExecuteDecode

Fetch

Page 12: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 12 – CS:APP

Implementing StallingImplementing Stalling

Pipeline ControlPipeline Controln Combinational logic detects stall condition

n Sets mode signals for how pipeline registers should update

E

M

W

F

D rB

srcA

srcB

icode valE valM dstE dstM

Bchicode valE valA dstE dstM

icode ifun valC valA valB dstE dstM srcA srcB

valC valPicode ifun rA

predPC

d_srcB

d_srcA

D_icode

E_dstE

E_dstM

Pipecontrollogic

D_stall

E_bubble

M_dstE

M_dstM

W_dstE

W_dstM

F_stall

Page 13: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 13 – CS:APP

Pipeline Register ModesPipeline Register ModesRisingclockRisingclock Output = y

yy

RisingclockRisingclock Output = x

xx

xxnop

RisingclockRisingclock Output = nop

Output = xInput = y

stall = 0

bubble= 0

xxNormal

Output = xInput = y

stall = 1

bubble= 0

xxStall

Output = xInput = y

stall = 0

bubble= 1

Bubble

Page 14: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 14 – CS:APP

Data ForwardingData Forwarding

Naïve PipelineNaïve Pipelinen Register isn’t written until completion of write-back stage

n Source operands read from re gister file in decode stagel Needs to be in registe r file at start of stage

ObservationObservationn Value generated in execute or memor y stage

TrickTrickn Pass value directly from genera ting instruction to decode

stage

n Needs to be available at end of decode stage

Page 15: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 15 – CS:APP

Data Forwarding ExampleData Forwarding Example

n irmovl in write-back stage

n Destination value inW pipeline register

n Forward as valB fordecode stage

0x000: irmovl $10,%edx

1 2 3 4 5 6 7 8 9

F D E M WF D E M W0x006: irmovl $3,%eax F D E M WF D E M W0x00c: nop F D E M WF D E M W0x00d: nop F D E M WF D E M W0x00e: addl %edx,%eax F D E M WF D E M W0x010: halt F D E M WF D E M W

10# demo-h2.ys

Cycle 6

W

R[%eax] �

3

D

valA�

R[%edx] = 10valB

�W_valE = 3

•••

W_dstE = %eaxW_valE = 3

srcA = %edxsrcB = %eax

Page 16: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 16 – CS:APP

Bypass PathsBypass PathsDecode StageDecode Stage

n Forwarding logic selectsvalA and valB

n Normally from registerfile

n Forwarding: get valA orvalB from later pipelinestage

Forwarding SourcesForwarding Sourcesn Execute: valE

n Memory: valE , valM

n Write back: valE , valM

PCincrement

PCincrement

CCCCALUALU

Datamemory

Datamemory

Fetch

Decode

Execute

Memory

Write back

Registerfile

Registerfile

A BM

E

Registerfile

Registerfile

A BM

E

valP

d_srcA, d_srcB

valA, valB

Bche_valE

Addr, Data

m_valM

PC

W_valE, W_valM, W_dstE, W_dstMW_icode, W_valM

icode, ifun,rA, rB, valC

E

M

W

F

D

valP

f_PC

predPC

Instructionmemory

Instructionmemory

M_icode, M_Bch, M_valA

M_valE

W_valEW_valM

E_valA, E_valB, E_srcA, E_srcB

Forward

Page 17: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 17 – CS:APP

Data Forwarding Example #2Data Forwarding Example #2

Register Register %%edxedxn Generated by ALU

during previous cycl e

n Forward from memoryas valA

Register Register %%eaxeaxn Value just generated

by ALU

n Forward from executeas valB

0x000: irmovl $10,%edx

1 2 3 4 5 6 7 8

F D E M

W0x006: irmovl $3,%eax F D E M

W

F D E M W0x00c: addl %edx,%eax

F D E M W0x00e: halt

# demo-h0.ys

Cycle 4

M

D

valA f M_valE = 10valB f e_valE = 3

M_dstE = %edxM_valE = 10

srcA = %edxsrcB = %eax

E

E_dstE = %eaxe_valE f 0 + 3 = 3

Page 18: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 18 – CS:APP

ImplementingForwarding

ImplementingForwarding

n Add additional feedbackpaths from E, M, and Wpipeline registers intodecode stage

n Create logic blocks toselect from multiplesources for valA and valBin decode stage

M

F

D

Instructionmemory

Instructionmemory

PCincrement

PCincrement

Registerfile

Registerfile

CCCC ALUALU

Datamemory

Datamemory

SelectPC

rB

dstE dstM

ALUA

ALUB

Mem.control

Addr

srcA srcB

read

write

ALUfun.

Fetch

Decode

Execute

Memory

Write back

data out

data in

A BM

E

M_valA

W_valE

W_valM

W_valE

M_valA

W_valM

f_PC

PredictPC

Bchicode valE valA dstE dstM

E icode ifun valC valA valB dstE dstM srcA srcB

valC valPicode ifun rA

predPC

d_srcBd_srcA

e_Bch

M_Bch

Sel+FwdA

FwdB

W icode valE valM dstE dstM

m_valM

W_valM

M_valE

e_valE

Page 19: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 19 – CS:APP

Implementing ForwardingImplementing Forwarding

M

F

D

Instructionmemory

Instructionmemory

PCincrement

PCincrement

Registerfile

Registerfile

CCCC ALUALU

Datamemory

Datamemory

SelectPC

rB

dstE dstM

ALUA

ALUB

Mem.control

Addr

srcA srcB

read

write

ALUfun.

Fetch

Decode

Execute

Memory

Write back

data out

data in

A BM

E

M_valA

W_valE

W_valM

W_valE

M_valA

W_valM

f_PC

PredictPC

Bchicode valE valA dstE dstM

E icode ifun valC valA valB dstE dstM srcA srcB

valC valPicode ifun rA

predPC

d_srcBd_srcA

e_Bch

M_Bch

Sel+FwdA

FwdB

W icode valE valM dstE dstM

m_valM

W_valM

M_valE

e_valE

## What should be the A value?int new_E_valA = [ # Use incremented PC

D_icode in { ICALL, IJXX } : D_valP; # Forward valE from execute

d_srcA == E_dstE : e_valE; # Forward valM from memory

d_srcA == M_dstM : m_valM; # Forward valE from memory

d_srcA == M_dstE : M_valE; # Forward valM from write back d_srcA == W_dstM : W_valM; # Forward valE from write back

d_srcA == W_dstE : W_valE; # Use value read from register file 1 : d_rvalA;];

Page 20: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 20 – CS:APP

Limitation of ForwardingLimitation of Forwarding

Load-use dependencyLoad-use dependencyn Value needed by end of

decode stage in cycle 7

n Value read from memory inmemory stage of cycle 8

0x000: irmovl $128,%edx

1 2 3 4 5 6 7 8 9

F D E M

W0x006: irmovl $3,%ecx F D E M

W

0x00c: rmmovl %ecx, 0(%edx) F D E M W0x012: irmovl $10,%ebx F D E M W0x018: mrmovl 0(%edx),%eax # Load %eax F D E M W

# demo-luh.ys

0x01e: addl %ebx,%eax # Use %eax0x020: halt

F D E M W

F D E M W

10

F D E M W

11

Error

MM_dstM = %eaxm_valM f M[128] = 3

Cycle 7 Cycle 8

D

valA f M_valE = 10valB f R[%eax] = 0

D

valA f M_valE = 10valB f R[%eax] = 0

MM_dstE = %ebxM_valE = 10

•••

Page 21: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 21 – CS:APP

Avoiding Load/Use HazardAvoiding Load/Use Hazard

n Stall using instruction forone cycle

n Can then pick up loadedvalue by forwarding frommemory stage

0x000: irmovl $128,%edx

1 2 3 4 5 6 7 8 9

F D E M

W

F D E M

W0x006: irmovl $3,%ecx F D E M

W

F D E M

W

0x00c: rmmovl %ecx, 0(%edx) F D E M WF D E M W0x012: irmovl $10,%ebx F D E M WF D E M W0x018: mrmovl 0(%edx),%eax # Load %eax F D E M WF D E M W

# demo-luh.ys

0x01e: addl %ebx,%eax # Use %eax0x020: halt

F D E M W

E M W

10

D D E M W

11

bubble

F D E M W

F

F

12

MM_dstM = %eaxm_valM f M[128] = 3

MM_dstM = %eaxm_valM f M[128] = 3

Cycle 8

D

valA f W_valE = 10valB f m_valM = 3

D

valA f W_valE = 10valB f m_valM = 3

WW_dstE = %ebxW_valE = 10

WW_dstE = %ebxW_valE = 10

•••

Page 22: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 22 – CS:APP

Detecting Load/Use HazardDetecting Load/Use HazardM

F

D

Instructionmemory

Instructionmemory

PCincrement

PCincrement

Registerfile

Registerfile

CCCC ALUALU

Datamemory

Datamemory

SelectPC

rB

dstE dstM

ALUA

ALUB

Mem.control

Addr

srcA srcB

read

write

ALUfun.

Fetch

Decode

Execute

Memory

Write back

data out

data in

A BM

E

M_valA

W_valE

W_valM

W_valE

M_valA

W_valM

f_PC

PredictPC

Bchicode valE valA dstE dstM

E icode ifun valC valA valB dstE dstM srcA srcB

valC valPicode ifun rA

predPC

d_srcBd_srcA

e_Bch

M_Bch

Sel+FwdA

FwdB

W icode valE valM dstE dstM

m_valM

W_valM

M_valE

e_valE

M

F

D

Instructionmemory

Instructionmemory

PCincrement

PCincrement

Registerfile

Registerfile

CCCC ALUALU

Datamemory

Datamemory

SelectPC

rB

dstE dstM

ALUA

ALUB

Mem.control

Addr

srcA srcB

read

write

ALUfun.

Fetch

Decode

Execute

Memory

Write back

data out

data in

A BM

E

M_valA

W_valE

W_valM

W_valE

M_valA

W_valM

f_PC

PredictPC

Bchicode valE valA dstE dstM

E icode ifun valC valA valB dstE dstM srcA srcB

valC valPicode ifun rA

predPC

d_srcBd_srcA

e_Bch

M_Bch

Sel+FwdA

FwdB

W icode valE valM dstE dstM

m_valM

W_valM

M_valE

e_valE

m_valM

W_valM

M_valE

e_valE

E_E_icode icode in { IMRMOVL, IPOPL } &&in { IMRMOVL, IPOPL } &&E_E_dstM dstM in { d_in { d_ srcAsrcA , d_, d_srcB srcB }}

Load/Use HazardLoad/Use Hazard

TriggerTriggerConditionCondition

Page 23: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 23 – CS:APP

Control for Load/Use HazardControl for Load/Use Hazard

n Stall instructions in fetchand decode stages

n Inject bubble into executestage

0x000: irmovl $128,%edx

1 2 3 4 5 6 7 8 9

F D E M

W

F D E M

W0x006: irmovl $3,%ecx F D E M

W

F D E M

W

0x00c: rmmovl %ecx, 0(%edx) F D E M WF D E M W0x012: irmovl $10,%ebx F D E M WF D E M W

0x018: mrmovl 0(%edx),%eax # Load %eax F D E M WF D E M W

# demo-luh.ys

0x01e: addl %ebx,%eax # Use %eax0x020: halt

F D E M W

E M W

10

D D E M W

11

bubble

F D E M W

F

F

12

stallstall

FF

stallstall

DD

bubblebubble

EE

normalnormal

MM

normalnormal

WW

Load/Use HazardLoad/Use Hazard

ConditionCondition

Page 24: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 24 – CS:APP

Branch Misprediction ExampleBranch Misprediction Example

n Should only execute first 8 instructions

0x000: xorl %eax,%eax 0x002: jne t # Not taken 0x007: irmovl $1, %eax # Fall through 0x00d: nop 0x00e: nop 0x00f: nop 0x010: halt 0x011: t: irmovl $3, %edx # Target (Should not execute) 0x017: irmovl $4, %ecx # Should not execute 0x01d: irmovl $5, %edx # Should not execute

demo-j.ys

Page 25: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 25 – CS:APP

Handling MispredictionHandling Misprediction

Predict branch as takenPredict branch as takenn Fetch 2 instructions at target

Cancel when Cancel when mispredictedmispredictedn Detect branch not-taken in execute stage

n On following cycle, replace instructions in execute anddecode by bubbles

n No side effects have occurred ye t

0x000: xorl %eax,%eax

1 2 3 4 5 6 7 8 9

F D E M WF D E M W0x002: jne target # Not taken F D E M WF D E M W

E M W

10# demo-j.ys

0x011: t: irmovl $2,%edx # Target

bubble

0x017: irmovl $3,%ebx # Target+1

F D

E M W

D

Fbubble

0x007: irmovl $1,%eax # Fall through

0x00d: nop

F D E M WF D E M W

F D E M WF D E M W

Page 26: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 26 – CS:APP

Detecting Mispredicted BranchDetecting Mispredicted Branch

E_E_icode icode = IJXX & !e_Bch= IJXX & !e_BchMispredicted Mispredicted BranchBranch

TriggerTriggerConditionCondition

M

F

D

Instructionmemory

Instructionmemory

PCincrement

PCincrement

Registerfile

Registerfile

CCCC ALUALU

Datamemory

Datamemory

SelectPC

rB

dstE dstM

ALUA

ALUB

Mem.control

Addr

srcA srcB

read

write

ALUfun.

Fetch

Decode

Execute

Memory

Write back

data out

data in

A BM

E

M_valA

W_valE

W_valM

W_valE

M_valA

W_valM

f_PC

PredictPC

Bchicode valE valA dstE dstM

E icode ifun valC valA valB dstE dstM srcA srcB

valC valPicode ifun rA

predPC

d_srcBd_srcA

e_Bch

M_Bch

Sel+FwdA

FwdB

W icode valE valM dstE dstM

m_valM

W_valM

M_valE

e_valE

M

F

D

Instructionmemory

Instructionmemory

PCincrement

PCincrement

Registerfile

Registerfile

CCCC ALUALU

Datamemory

Datamemory

SelectPC

rB

dstE dstM

ALUA

ALUB

Mem.control

Addr

srcA srcB

read

write

ALUfun.

Fetch

Decode

Execute

Memory

Write back

data out

data in

A BM

E

M_valA

W_valE

W_valM

W_valE

M_valA

W_valM

f_PC

PredictPC

Bchicode valE valA dstE dstM

E icode ifun valC valA valB dstE dstM srcA srcB

valC valPicode ifun rA

predPC

d_srcBd_srcA

e_Bch

M_Bch

Sel+FwdA

FwdB

W icode valE valM dstE dstM

m_valM

W_valM

M_valE

e_valE

m_valM

W_valM

M_valE

e_valE

M

F

D

Instructionmemory

Instructionmemory

PCincrement

PCincrement

Registerfile

Registerfile

CCCC ALUALU

Datamemory

Datamemory

SelectPC

rB

dstE dstM

ALUA

ALUB

Mem.control

Addr

srcA srcB

read

write

ALUfun.

Fetch

Decode

Execute

Memory

Write back

data out

data in

A BM

E

M_valA

W_valE

W_valM

W_valE

M_valA

W_valM

f_PC

PredictPC

Bchicode valE valA dstE dstM

E icode ifun valC valA valB dstE dstM srcA srcB

valC valPicode ifun rA

predPC

d_srcBd_srcA

e_Bch

M_Bch

Sel+FwdA

FwdB

W icode valE valM dstE dstM

m_valM

W_valM

M_valE

e_valE

m_valM

W_valM

M_valE

e_valE

M

F

D

Instructionmemory

Instructionmemory

PCincrement

PCincrement

Registerfile

Registerfile

CCCC ALUALU

Datamemory

Datamemory

SelectPC

rB

dstE dstM

ALUA

ALUB

Mem.control

Addr

srcA srcB

read

write

ALUfun.

Fetch

Decode

Execute

Memory

Write back

data out

data in

A BM

E

M_valA

W_valE

W_valM

W_valE

M_valA

W_valM

f_PC

PredictPC

Bchicode valE valA dstE dstM

E icode ifun valC valA valB dstE dstM srcA srcB

valC valPicode ifun rA

predPC

d_srcBd_srcA

e_Bch

M_Bch

Sel+FwdA

FwdB

W icode valE valM dstE dstM

m_valM

W_valM

M_valE

e_valE

m_valM

W_valM

M_valE

e_valE

m_valM

W_valM

M_valE

e_valE

m_valM

W_valM

M_valE

e_valE

Page 27: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 27 – CS:APP

Control for MispredictionControl for Misprediction

0x000: xorl %eax,%eax

1 2 3 4 5 6 7 8 9

F D E M WF D E M W0x002: jne target # Not taken F D E M WF D E M W

E M W

10# demo-j.ys

0x011: t: irmovl $2,%edx # Target

bubble

0x017: irmovl $3,%ebx # Target+1

F D

E M W

D

Fbubble

0x007: irmovl $1,%eax # Fall through

0x00d: nop

F D E M WF D E M W

F D E M WF D E M W

normalnormal

FF

bubblebubble

DD

bubblebubble

EE

normalnormal

MM

normalnormal

WW

Mispredicted Mispredicted BranchBranch

ConditionCondition

Page 28: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 28 – CS:APP

0x000: irmovl Stack,%esp # Initialize stack pointer 0x006: call p # Procedure call 0x00b: irmovl $5,%esi # Return point 0x011: halt 0x020: .pos 0x20 0x020: p: irmovl $-1,%edi # procedure 0x026: ret 0x027: irmovl $1,%eax # Should not be executed 0x02d: irmovl $2,%ecx # Should not be executed 0x033: irmovl $3,%edx # Should not be executed 0x039: irmovl $4,%ebx # Should not be executed 0x100: .pos 0x100 0x100: Stack: # Stack: Stack pointer

Return ExampleReturn Example

n Previously executed three a dditional instructions

demo-retb.ys

Page 29: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 29 – CS:APP

0x026: ret F D E M

Wbubble F D E M

W

bubble F D E M Wbubble F D E M W

0x00b: irmovl $5,%esi # Return F D E M W

# demo-retb

F D E M W

FvalC 5rB %esi

FvalC 5rB %esi

W

valM = 0x0b

W

valM = 0x0b

•••

Correct Return ExampleCorrect Return Example

n As ret passes throughpipeline, stall at fetch stagel While in decode, e xecute, and

memory stage

n Inject bubble into decodestage

n Release stall when reachwrite-back stage

Page 30: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 30 – CS:APP

Detecting ReturnDetecting Return

IRET in { D_IRET in { D_ icodeicode , E_, E_icodeicode , M_, M_icode icode }}Processing retProcessing ret

TriggerTriggerConditionCondition

M

D

Registerfile

Registerfile

CCCC ALUALU

rB

dstE dstM

ALUA

ALUB

srcA srcB

ALUfun.

Decode

Execute

A BM

E

W_valM

W_valE

Bchicode valE valA dstE dstM

E icode ifun valC valA valB dstE dstM srcA srcB

valC valPicode ifun rA

d_srcBd_srcA

e_Bch

M_Bch

Sel+FwdA

FwdB

M_valE

e_valE

Page 31: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 31 – CS:APP

0x026: ret F D E M

Wbubble F D E M

W

bubble F D E M Wbubble F D E M W

0x00b: irmovl $5,%esi # Return F D E M W

# demo-retb

F D E M W

Control for ReturnControl for Return

stallstall

FF

bubblebubble

DD

normalnormal

EE

normalnormal

MM

normalnormal

WW

Processing retProcessing ret

ConditionCondition

Page 32: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 32 – CS:APP

Special Control CasesSpecial Control CasesDetectionDetection

Action (on next cycle)Action (on next cycle)

E_E_icode icode = IJXX & !e_Bch= IJXX & !e_BchMispredicted Mispredicted BranchBranch

E_E_icode icode in { IMRMOVL, IPOPL } &&in { IMRMOVL, IPOPL } &&E_E_dstM dstM in { d_in { d_ srcAsrcA , d_, d_srcB srcB }}

Load/Use HazardLoad/Use Hazard

IRET in { D_IRET in { D_ icodeicode , E_, E_icodeicode , M_, M_icode icode }}Processing retProcessing ret

TriggerTriggerConditionCondition

normalnormal

stallstall

stallstall

FF

bubblebubble

stallstall

bubblebubble

DD

bubblebubble

bubblebubble

normalnormal

EE

normalnormal

normalnormal

normalnormal

MM

normalnormal

normalnormal

normalnormal

WW

Mispredicted Mispredicted BranchBranch

Load/Use HazardLoad/Use Hazard

Processing retProcessing ret

ConditionCondition

Page 33: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 33 – CS:APP

Implementing Pipeline ControlImplementing Pipeline Control

n Combinational logic generates pipeline control signals

n Action occurs at start of following cycle

E

M

W

F

D

CCCC

rB

srcA

srcB

icode valE valM dstE dstM

Bchicode valE valA dstE dstM

icode ifun valC valA valB dstE dstM srcA srcB

valC valPicode ifun rA

predPC

d_srcB

d_srcA

e_Bch

D_icode

E_icode

M_icode

E_dstM

Pipecontrollogic

D_bubble

D_stall

E_bubble

F_stall

Page 34: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 34 – CS:APP

Initial Version of Pipeline ControlInitial Version of Pipeline Controlbool F_stall =

# Conditions for a load/use hazardE_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } ||# Stalling at fetch while ret passes through pipelineIRET in { D_icode, E_icode, M_icode };

bool D_stall =# Conditions for a load/use hazardE_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

bool D_bubble =# Mispredicted branch(E_icode == IJXX && !e_Bch) ||# Stalling at fetch while ret passes through pipeline IRET in { D_icode, E_icode, M_icode };

bool E_bubble =# Mispredicted branch(E_icode == IJXX && !e_Bch) ||# Load/use hazardE_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB};

Page 35: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 35 – CS:APP

Control CombinationsControl Combinations

n Special cases that can arise on same clock cycle

Combination ACombination An Not-taken branchn ret instruction at branch target

Combination BCombination Bn Instruction that reads from memory to %espn Followed by ret instruction

LoadEUseD

M

Load/use

JXXE

D

M

Mispredict

JXXE

D

M

Mispredict

EretD

M

ret 1

retEbubbleD

M

ret 2

bubbleEbubbleD

retM

ret 3

EretD

M

ret 1

EretD

M

ret 1

retEbubbleD

M

ret 2

retEbubbleD

M

ret 2

bubbleEbubbleD

retM

ret 3

bubbleEbubbleD

retM

ret 3

Combination B

Combination A

Page 36: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 36 – CS:APP

Control Combination AControl Combination A

n Should handle as mispredicted branch

n Stalls F pipeline register

n But PC selection logic will be using M_ valM anyhow

JXXE

D

M

Mispredict

JXXE

D

M

Mispredict

EretD

M

ret 1

EretD

M

ret 1

EretD

M

ret 1

Combination A

normalnormalnormalnormalbubblebubblebubblebubblestallstallCombinationCombination

normalnormal

stallstall

FF

bubblebubble

bubblebubble

DD

bubblebubble

normalnormal

EE

normalnormal

normalnormal

MM

normalnormal

normalnormal

WW

Mispredicted Mispredicted BranchBranch

Processing retProcessing ret

ConditionCondition

E

M

W

F

D

Instructionmemory

Instructionmemory

PCincrement

PCincrement

Registerfile

Registerfile

ALUALU

Datamemory

Datamemory

SelectPC

rB

dstE dstMSelectA

ALUA

ALUB

Mem.control

Addr

srcA srcB

read

write

ALUfun.

Fetch

Decode

Execute

Memory

Write back

icode

data out

data in

A BM

E

M_valA

W_valM

W_valE

M_valA

W_valM

d_rvalA

f_PC

PredictPC

valE valM dstE dstM

Bchicode valE valA dstE dstM

icode ifun valC valA valB dstE dstM srcA srcB

valC valPicode ifun rA

predPC

CCCC

d_srcBd_srcA

e_Bch

M_Bch

CCCC

d_srcBd_srcA

e_Bch

M_Bch

Page 37: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 37 – CS:APP

Control Combination BControl Combination B

n Would attempt to bubble and stall pipeline register D

n Signaled by processor as pipe line error

LoadEUseD

M

Load/use

EretD

M

ret 1

EretD

M

ret 1

EretD

M

ret 1

Combination B

stallstall

stallstall

stallstall

FF

bubble +bubble +stallstall

stallstall

bubblebubble

DD

bubblebubble

bubblebubble

normalnormal

EE

normalnormal

normalnormal

normalnormal

MM

normalnormal

normalnormal

normalnormal

WW

CombinationCombination

Load/Use HazardLoad/Use Hazard

Processing retProcessing ret

ConditionCondition

Page 38: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 38 – CS:APP

Handling Control Combination BHandling Control Combination B

n Load/use hazard should get priorityn ret instruction should be held in decode stage for additional

cycle

LoadEUseD

M

Load/use

EretD

M

ret 1

EretD

M

ret 1

EretD

M

ret 1

Combination B

stallstall

stallstall

stallstall

FF

stallstall

stallstall

bubblebubble

DD

bubblebubble

bubblebubble

normalnormal

EE

normalnormal

normalnormal

normalnormal

MM

normalnormal

normalnormal

normalnormal

WW

CombinationCombination

Load/Use HazardLoad/Use Hazard

Processing retProcessing ret

ConditionCondition

Page 39: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 39 – CS:APP

Corrected Pipeline Control LogicCorrected Pipeline Control Logic

n Load/use hazard should get priorityn ret instruction should be held in decode stage for additional

cycle

stallstall

stallstall

stallstall

FF

stallstall

stallstall

bubblebubble

DD

bubblebubble

bubblebubble

normalnormal

EE

normalnormal

normalnormal

normalnormal

MM

normalnormal

normalnormal

normalnormal

WW

CombinationCombination

Load/Use HazardLoad/Use Hazard

Processing retProcessing ret

ConditionCondition

bool D_bubble =# Mispredicted branch(E_icode == IJXX && !e_Bch) ||# Stalling at fetch while ret passes through pipeline IRET in { D_icode, E_icode, M_icode } # but not condition for a load/use hazard && !(E_icode in { IMRMOVL, IPOPL }

&& E_dstM in { d_srcA, d_srcB });

Page 40: CS:APP Chapter 4 Computer Architecture Pipelined ...raymond.namyst.emi.u-bordeaux.fr/ens/archi-enseirb... · – 2 – CS:APP Overview Make the pipelined processor work! Data Hazards

– 40 – CS:APP

Pipeline SummaryPipeline Summary

Data HazardsData Hazardsn Most handled by forwarding

l No performance penalty

n Load/use hazard requires one cyc le stall

Control HazardsControl Hazardsn Cancel instructions when detect mispredicted branch

l Two clock cycles wasted

n Stall fetch stage while ret passes through pipelinel Three clock cycl es wasted

Control CombinationsControl Combinationsn Must analyze carefully

n First version had subtle bugl Only arises with unusual instruction combination