
Pipelining - cseweb.ucsd.edu/classes/wi11/cse141/Slides/10_Pipelining.pdf


Pipelining


Today

• Go over Quizzes 2 and 4.
• Introduction to pipelining
• Maybe hazards


Pipelining

What’s the latency for one unit of work?

What’s the throughput?

[Figure: 10 ns logic blocks between latches, drawn for cycles 1, 2, and 3, with time marks at 10 ns, 20 ns, and 30 ns]


Pipelining

1. Break up the logic with latches into “pipeline stages”
2. Each stage can act on different data
3. Latches hold the inputs to their stage
4. Every clock cycle data transfers from one pipe stage to the next

[Figure: a single 10 ns logic block between two latches, and the same logic split into five 2 ns stages separated by latches]


[Figure: the five-stage pipeline of 2 ns logic blocks drawn for cycles 1 through 6, with time marks at 2 ns, 4 ns, 6 ns, 8 ns, 10 ns, and 12 ns]

What’s the latency for one unit of work?

What’s the throughput?
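The two questions can be worked through numerically. The sketch below is an illustration, not part of the original slides: it assumes ideal latches (no setup time or register delay) and a clock period equal to the slowest stage, and the function name `latency_and_throughput` is made up for this example. It compares the single 10 ns block with the five 2 ns stages.

```python
def latency_and_throughput(stage_delays_ns):
    """Return (latency_ns, results_per_ns) for a chain of pipeline stages.

    Assumption: ideal latches with zero overhead, so the clock period is
    simply the delay of the slowest stage.
    """
    cycle_ns = max(stage_delays_ns)
    # One unit of work spends one full cycle in every stage.
    latency_ns = cycle_ns * len(stage_delays_ns)
    # Once the pipe is full, one result leaves it every cycle.
    results_per_ns = 1 / cycle_ns
    return latency_ns, results_per_ns


unpipelined = latency_and_throughput([10])
pipelined = latency_and_throughput([2, 2, 2, 2, 2])

print("unpipelined:", unpipelined)  # (10, 0.1)
print("pipelined:  ", pipelined)    # (10, 0.5)
```

Under these assumptions the latency stays at 10 ns while the throughput rises from one result per 10 ns to one result per 2 ns.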


Critical path review

• Critical path is the longest possible delay between two registers in a design.
• The critical path sets the cycle time, since the cycle time must be long enough for a signal to traverse the critical path.
• Lengthening or shortening non-critical paths does not change performance
• Ideally, all paths are about the same length
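As a rough illustration of the critical-path definition, the sketch below computes the longest delay through a small block of combinational logic sitting between two registers. The gate names, delays, and the `critical_path_ns` helper are all hypothetical; the logic is modeled as a directed acyclic graph.

```python
from functools import lru_cache


def critical_path_ns(delay_ns, fanout, sources):
    """Longest total delay along any path that starts at one of `sources`.

    delay_ns: dict mapping gate name -> gate delay in ns
    fanout:   dict mapping gate name -> list of gates it drives
    The graph must be acyclic (pure combinational logic).
    """
    @lru_cache(maxsize=None)
    def longest_from(gate):
        downstream = fanout.get(gate, [])
        return delay_ns[gate] + max((longest_from(g) for g in downstream), default=0)

    return max(longest_from(s) for s in sources)


# Hypothetical logic: a drives b and c, and both drive d.
delay_ns = {"a": 2, "b": 3, "c": 1, "d": 4}
fanout = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}

print(critical_path_ns(delay_ns, fanout, ["a"]))  # 9, via a -> b -> d

# Changing the non-critical gate c (while it stays non-critical) changes nothing.
print(critical_path_ns({**delay_ns, "c": 2}, fanout, ["a"]))  # still 9
```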


Pipelining and Logic

[Figure: a block of logic divided into pipeline stages]

• Hopefully, critical path reduced by 1/3


Limitations of Pipelining

• You cannot pipeline forever
  • Some logic cannot be pipelined arbitrarily -- Memories
  • Some logic is inconvenient to pipeline.
  • How do you insert a register in the middle of a multiplier?
• Registers have a cost
  • They cost area -- choose “narrow points” in the logic
  • They cost energy -- latches don’t do any useful work
  • They cost time
    • Extra logic delay
    • Set-up and hold times.
• Pipelining may not affect the critical path as you expect


Pipelining Overhead

• Logic Delay (LD) -- How long does the logic take (i.e., the useful part)

• Set up time (ST) -- How long before the clock edge do the inputs to a register need to be ready?

• Register delay (RD) -- Delay through the internals of the register.


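[Figure: timing diagram with Clock, Register in, and Register out. The register input must hold the new data for the setup time before the clock edge; the register output changes from old data to new data one register delay after the edge.]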


• BaseCT -- cycle time before pipelining
  • BaseCT = LD + ST + RD.
• PipeCT -- cycle time after pipelining N times
  • PipeCT = ST + RD + LD/N
  • Total time = N*ST + N*RD + LD
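The formulas above can be evaluated directly. In the sketch below the values of LD, ST, and RD are hypothetical, chosen only to show the trend; the function names are made up for this example.

```python
def base_cycle_time(ld, st, rd):
    """BaseCT = LD + ST + RD (cycle time before pipelining)."""
    return ld + st + rd


def pipe_cycle_time(ld, st, rd, n):
    """PipeCT = ST + RD + LD/N (cycle time with N stages)."""
    return st + rd + ld / n


def total_time(ld, st, rd, n):
    """Total time = N*ST + N*RD + LD (one unit of work through N stages)."""
    return n * st + n * rd + ld


# Hypothetical delays in ns, not taken from the slides.
LD, ST, RD = 10.0, 0.5, 0.5

print("BaseCT:", base_cycle_time(LD, ST, RD))
for n in (1, 2, 5, 10):
    print(n, pipe_cycle_time(LD, ST, RD, n), total_time(LD, ST, RD, n))
```

With these numbers, going from 5 to 10 stages only shortens the cycle from 3 ns to 2 ns, while the total time for one unit of work grows from 15 ns to 20 ns: the register overhead is paid once per stage.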


Pipelining Difficulties

• You can’t always pipeline how you would like
• The critical path only went down “fast logic”

[Figure: a chain of fast and slow logic blocks (fast, slow, slow, fast), drawn with and without a pipeline register]
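One way to see this difficulty is to try every place a single register could go in a chain of unequal blocks. The block delays below are hypothetical (the slides give none), and `stage_delays` is an illustrative helper, not something from the course.

```python
def stage_delays(block_delays, cut):
    """Delays of the two stages formed by a register after the first `cut` blocks."""
    return sum(block_delays[:cut]), sum(block_delays[cut:])


# Hypothetical delays in ns: fast, slow, slow, fast.
blocks_ns = [1, 4, 4, 1]
print("unpipelined critical path:", sum(blocks_ns))

for cut in range(1, len(blocks_ns)):
    first, second = stage_delays(blocks_ns, cut)
    # The slower stage sets the cycle time.
    print("register after block", cut, "-> critical path", max(first, second))
```

If the only convenient place for the register is right after the first fast block, the critical path drops from 10 ns to 9 ns, that is, only by the delay of the fast logic. Cutting between the two slow blocks would give 5 ns, but that cut may not be available.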


How to pipeline a processor

• Break each instruction into pieces -- remember the basic algorithm for execution
  • Fetch
  • Decode
  • Collect arguments
  • Execute
  • Write back results
  • Compute next PC
• The “classic 5-stage MIPS pipeline”
  • Fetch -- read the instruction
  • Decode -- decode and read from the register file
  • Execute -- Perform arithmetic ops and address calculations
  • Memory -- access data memory.
  • Write back -- Store results in the register file.


Pipelining a processor

[Figure: the processor divided into Fetch, Decode, EX, Mem, and Write back stages]


Impact of Pipelining

• Break the processor into P pipe stages
• What happens to latency?
  • L = Inst * CPI * CycleTime
• The cycle time = ?
• CPI = ?


Impact of Pipelining

• Break the processor into P pipe stages
• What happens to latency?
  • L = Inst * CPI * CycleTime
• The cycle time = CT/P
• CPI = 1
  • CPI is an average: Cycles/instructions
  • When # of instructions is large, CPI = 1
  • If just one instruction, CPI = P
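The two CPI cases follow from counting cycles. The sketch below assumes an ideal pipeline with no stalls, where the first instruction takes P cycles and each later one finishes one cycle after its predecessor; the function names are illustrative.

```python
def ideal_cycles(num_instructions, stages):
    """Cycles needed to run a stall-free instruction stream through the pipeline."""
    return stages + num_instructions - 1


def ideal_cpi(num_instructions, stages):
    """Average cycles per instruction for that stream."""
    return ideal_cycles(num_instructions, stages) / num_instructions


P = 5
for n in (1, 10, 1000, 1_000_000):
    print(n, ideal_cpi(n, P))
```

With a single instruction the result is P; as the instruction count grows, the P - 1 cycles spent filling the pipeline are amortized and the CPI approaches 1.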


Pipelined Datapath

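[Figure: the datapath before pipeline registers are added: PC, instruction memory (read address), an adder for PC and 4, register file (read addr 1, read addr 2, write addr, write data, read data 1, read data 2), sign extend (16 to 32 bits), shift left 2, a second adder, ALU, and data memory (address, write data, read data)]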

[Figure: the same datapath with pipeline registers IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB inserted between the stages]

[Figure, repeated over five slides: the instructions add …, lw …, Sub …, Sub …, Add …, Add … shown in the pipelined datapath]
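The overlap of instructions in the pipelined datapath can also be drawn as a text diagram. The sketch below is an illustration under the assumption of no stalls; the stage abbreviations IF, ID, EX, MEM, and WB are shorthand chosen here for Fetch, Decode, Execute, Memory, and Write back, and the instruction names are only the labels from the figure.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Return one text row per instruction; column c is clock cycle c + 1.

    Assumes no stalls: instruction i enters fetch in cycle i + 1 and moves
    one stage per cycle.
    """
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instructions):
        cells = ["."] * total_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage
        rows.append(f"{name:<4}" + " ".join(f"{c:<3}" for c in cells).rstrip())
    return rows


for row in pipeline_diagram(["add", "lw", "sub", "sub", "add", "add"]):
    print(row)
```

Reading down any column shows which instruction occupies each stage in that cycle.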


Simple Pipelining Control

• Compute all the control bits in decode, then pass them from stage to stage. It won’t stay this simple...

[Figure: Control computed in Decode and carried through the pipeline registers alongside the EX, Mem, and Write back stages]
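A small behavioral sketch of this idea follows. It is not the course's implementation: the control bit names (`alu_op`, `mem_read`, `reg_write`) and the table contents are hypothetical, and real MIPS control has more signals. Decode looks up the bits once, and they then shift through one pipeline register per cycle.

```python
from collections import deque

# Hypothetical control bits per opcode, for illustration only.
CONTROL_TABLE = {
    "add": {"alu_op": "add", "mem_read": False, "reg_write": True},
    "sub": {"alu_op": "sub", "mem_read": False, "reg_write": True},
    "lw": {"alu_op": "add", "mem_read": True, "reg_write": True},
}


def run(opcodes):
    """Return, per cycle, the control bits held for (EX, Mem, Write back).

    Decode computes the bits; each cycle they move one pipeline register
    to the right, and None marks an empty stage.
    """
    regs = deque([None, None, None], maxlen=3)
    trace = []
    # Three extra empty slots let the last instruction drain out.
    for op in list(opcodes) + [None] * 3:
        decoded = CONTROL_TABLE[op] if op is not None else None
        regs.appendleft(decoded)  # maxlen drops the entry leaving Write back
        trace.append(tuple(regs))
    return trace


for cycle, (ex, mem, wb) in enumerate(run(["lw", "add"]), start=1):
    print(cycle, ex, mem, wb)
```

Each stage only ever looks at the control bits sitting in its own pipeline register, so no stage needs to re-decode the instruction.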


Pipelining is Tricky

• If all the data flows in one direction, pipelining is relatively easy.

• Not so, for processors.
  • Decode and write back both access the register file.
  • Branch instructions affect the next PC
  • Instructions need values computed by previous instructions


Not just tricky, Hazardous!

• Hazards are situations where pipelining does not work as elegantly as we would like
  • Caused by backward flowing signals
  • Or by lack of available hardware
• Three kinds
  • Data hazards -- an input is not available on the cycle it is needed
  • Control hazards -- the next instruction is not known
  • Structural hazards -- we have run out of a hardware resource
• Detecting, avoiding, and recovering from these hazards is what makes processor design hard.
  • That, and the Xilinx tools ;-)
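To make the data hazard case concrete, the sketch below scans a short program for instructions that read a register written by a recent earlier instruction. Everything here is an assumption made for illustration: the `Instr` record, the register names, the example program, and the `window` parameter (how many instructions back a result may still be unavailable). It also ignores the case where an intervening instruction overwrites the same register.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str     # register written
    srcs: tuple   # registers read


def raw_dependences(program, window):
    """List (producer_index, consumer_index, register) candidates for data hazards.

    A pair is reported when the consumer reads a register that an instruction
    at most `window` slots earlier writes.
    """
    found = []
    for j, consumer in enumerate(program):
        for i in range(max(0, j - window), j):
            if program[i].dest in consumer.srcs:
                found.append((i, j, program[i].dest))
    return found


# Hypothetical program, not taken from the slides.
program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("sub", "r4", ("r1", "r5")),   # needs r1 from the add just before it
    Instr("lw", "r6", ("r7",)),
]

print(raw_dependences(program, window=2))
```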


A Structural Hazard

• Both the decode and write back stages have to access the register file.
• There is only one register file. A structural hazard!!
• Solution: Write early, read late
  • Writes occur at the clock edge and complete long before the end of the cycle
  • This leaves enough time for the outputs to settle for the reads.
• Hazard avoided!

[Figure: the Fetch, Decode, EX, Mem, and Write back stages]
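A behavioral sketch of "write early, read late" is shown below. It models only the ordering within one cycle, not the actual circuit timing, and the `RegisterFile` class and its `cycle` method are hypothetical names made up for this example.

```python
class RegisterFile:
    """Toy register file in which the write of a cycle lands before its reads."""

    def __init__(self, size=32):
        self.regs = [0] * size

    def cycle(self, write=None, reads=()):
        """Model one clock cycle.

        write: optional (address, value) pair from the write back stage
        reads: register addresses requested by the decode stage
        """
        # Write early: the write completes first in the cycle.
        if write is not None:
            addr, value = write
            self.regs[addr] = value
        # Read late: the reads see the value that was just written.
        return tuple(self.regs[a] for a in reads)


rf = RegisterFile()
# Write back stores 42 into register 3 while decode reads registers 3 and 4.
print(rf.cycle(write=(3, 42), reads=(3, 4)))  # (42, 0)
```

Because the read in the same cycle already returns the new value, decode and write back can share the one register file without conflict.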