
Recap (Pipelining)


Page 2: Recap (Pipelining)

What is Pipelining?

• A way of speeding up execution of tasks

• Key idea: overlap execution of multiple tasks

Page 3: Recap (Pipelining)

Automobile Manufacturing

1. Build frame. 60 min.

2. Add engine. 50 min.

3. Build body. 80 min.

4. Paint. 40 min.

5. Finish. 45 min.

Total: 275 min.

Latency: Time from start to finish for one car (smaller is better). 275 minutes per car.

Throughput: Number of finished cars per time unit (larger is better). 1 car/275 min = 0.218 cars/hour

Issues: How can we make the process better by adding?
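The two figures for the unpipelined car shop can be checked with a few lines of Python. This is a minimal sketch: the stage times come from the slide, while the names `STAGE_MINUTES`, `latency_min`, and `throughput_per_hour` are illustrative only.

```python
# Stage times in minutes, as listed on the slide.
STAGE_MINUTES = {
    "Build frame": 60,
    "Add engine": 50,
    "Build body": 80,
    "Paint": 40,
    "Finish": 45,
}


def latency_min(stages):
    """Minutes for one car to pass through every stage, one after another."""
    return sum(stages.values())


def throughput_per_hour(stages):
    """Cars finished per hour when only one car is in the shop at a time."""
    return 60 / latency_min(stages)


if __name__ == "__main__":
    print(latency_min(STAGE_MINUTES))                    # 275
    print(round(throughput_per_hour(STAGE_MINUTES), 3))  # 0.218
```

Without overlap, throughput is simply the inverse of latency, which is why both numbers come from the same 275-minute total.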

Page 4: Recap (Pipelining)

An Assembly line

[Figure: timing diagram of cars 1 to 4 moving through the five stages (60, 50, 80, 40, 45 min), with successive cars spaced 80 min apart]

First two stages can't produce faster than one car/80 min or a backlog will occur at third stage.

Last two stages only receive one car/80 min to work on.

Latency: 400 min/car

Throughput: 4 cars/640 min (1 car/160 min)

Will approach 1 car/80 min as time goes on
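The assembly-line numbers follow from letting every station advance in lockstep at the pace of the slowest stage. The sketch below assumes exactly that model; the function names are hypothetical and only the stage times come from the slide.

```python
STAGE_MINUTES = [60, 50, 80, 40, 45]  # stage times from the slide


def finish_time(car_index, stages):
    """Minute at which car number `car_index` (1-based) leaves the line.

    Assumes all stations advance together, once per slowest-stage time.
    """
    beat = max(stages)  # 80 min: the slowest stage sets the pace
    return (len(stages) + car_index - 1) * beat


def average_minutes_per_car(n_cars, stages):
    """Total elapsed time divided by the number of finished cars."""
    return finish_time(n_cars, stages) / n_cars


if __name__ == "__main__":
    print(finish_time(1, STAGE_MINUTES))                # 400
    print(finish_time(4, STAGE_MINUTES))                # 640
    print(average_minutes_per_car(4, STAGE_MINUTES))    # 160.0
    print(average_minutes_per_car(1000, STAGE_MINUTES)) # 80.32
```

The one-time cost of filling the line is spread over more and more cars, so the average time per car drifts toward the 80-minute beat.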

Page 5: Recap (Pipelining)

Pipelining a Digital System

• Key idea: break big computation up into pieces

• Separate each piece with a pipeline register

[Figure: a 1 ns computation split into five 200 ps pieces, each followed by a pipeline register]

Page 6: Recap (Pipelining)

Pipelining a Digital System

• Why do this? Because it's faster for repeated computations

[Figure: one 1 ns block compared with five 200 ps stages]

Non-pipelined: 1 operation finishes every 1ns

Pipelined: 1 operation finishes every 200ps

Page 7: Recap (Pipelining)

Comments about pipelining

• Pipelining improves throughput, but it does not improve latency

– Answer available every 200ps, BUT

– A single computation still takes 1ns

• Limitations:

– Computations must be divisible into stages of equal sizes

– Pipeline registers add overhead
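The difference between latency and throughput shows up clearly in a small clock-by-clock simulation. The sketch below is illustrative only: the three stage functions and the helper `run_pipeline` are made up for this example, and it assumes that stage functions never return `None`, since `None` marks an empty pipeline register.

```python
def add_one(x):
    return x + 1


def double(x):
    return x * 2


def minus_three(x):
    return x - 3


def run_pipeline(stages, inputs):
    """Clock `inputs` through `stages`, with one register after each stage.

    Returns (clock_cycle, result) pairs, cycles counted from 1.
    """
    regs = [None] * len(stages)  # regs[i] holds the output of stages[i]
    pending = list(inputs)
    results = []
    total_cycles = len(pending) + len(stages) - 1 if pending else 0
    for cycle in range(1, total_cycles + 1):
        # Update back to front so each stage reads the value its upstream
        # register held before this clock edge.
        for i in range(len(stages) - 1, 0, -1):
            upstream = regs[i - 1]
            regs[i] = stages[i](upstream) if upstream is not None else None
        regs[0] = stages[0](pending.pop(0)) if pending else None
        if regs[-1] is not None:
            results.append((cycle, regs[-1]))
    return results


if __name__ == "__main__":
    # Computes (x + 1) * 2 - 3 for x = 1, 2, 3, 4.
    print(run_pipeline([add_one, double, minus_three], [1, 2, 3, 4]))
    # [(3, 1), (4, 3), (5, 5), (6, 7)]
```

Each result still needs three clock cycles to travel through the pipeline, but once the pipeline is full a new result appears on every cycle.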

Page 8: Recap (Pipelining)

Another Example

Unpipelined System

[Figure: combinational logic (30 ns) followed by a clocked register (3 ns); timeline showing Op1, Op2, Op3 executed one after another]

Delay = 33 ns

Throughput = 30 MHz

– One operation must complete before next can begin

– Operations spaced 33ns apart

Page 9: Recap (Pipelining)

3 Stage Pipelining

[Figure: three clocked stages, each with 10 ns of combinational logic and a 3 ns register; timeline showing Op1 to Op4 overlapping]

Delay = 39 ns

Throughput = 77 MHz

– Space operations 13ns apart

– 3 operations occur simultaneously
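The delay and throughput figures for the unpipelined and the 3-stage system come from the same simple model: the clock period is the slowest stage's logic delay plus the register delay. The following sketch assumes that model; `pipeline_metrics` is a hypothetical helper name.

```python
def pipeline_metrics(logic_ns, reg_ns):
    """Return (clock period in ns, total delay in ns, throughput in MHz).

    logic_ns: combinational delay of each stage, in order.
    reg_ns:   delay of the register that follows every stage.
    """
    period = max(logic_ns) + reg_ns  # the slowest stage sets the clock
    delay = period * len(logic_ns)   # one clock period per stage
    throughput_mhz = 1000 / period   # 1 / ns = 1000 MHz
    return period, delay, throughput_mhz


if __name__ == "__main__":
    print(pipeline_metrics([30], 3))          # unpipelined: 33 ns period and delay
    print(pipeline_metrics([10, 10, 10], 3))  # 3 stages: 13 ns period, 39 ns delay
```

Splitting the logic into three stages raises the delay from 33 ns to 39 ns because the register delay is now paid three times, while throughput rises from about 30 MHz to about 77 MHz.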

Page 10: Recap (Pipelining)

Limitation: Nonuniform Pipelining

[Figure: three clocked stages with 5 ns, 15 ns, and 10 ns of combinational logic, each followed by a 3 ns register]

Delay = 18 * 3 = 54 ns

Throughput = 55 MHz

• Throughput limited by slowest stage

• Delay determined by clock period * number of stages

• Must attempt to balance stages
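To see why balancing matters, it helps to look at how much of each clock period a stage spends idle. The sketch below uses the stage delays from the slide; the helper names and the idea of reporting idle time per stage are additions for illustration.

```python
REG_NS = 3  # register delay from the slide


def clock_period_ns(logic_ns, reg_ns=REG_NS):
    """The clock must be slow enough for the slowest stage plus its register."""
    return max(logic_ns) + reg_ns


def idle_ns_per_stage(logic_ns, reg_ns=REG_NS):
    """Time each stage sits finished, waiting for the next clock edge."""
    period = clock_period_ns(logic_ns, reg_ns)
    return [period - (d + reg_ns) for d in logic_ns]


if __name__ == "__main__":
    unbalanced = [5, 15, 10]
    balanced = [10, 10, 10]  # same 30 ns of logic, split evenly
    print(clock_period_ns(unbalanced), idle_ns_per_stage(unbalanced))  # 18 [10, 0, 5]
    print(clock_period_ns(balanced), idle_ns_per_stage(balanced))      # 13 [0, 0, 0]
```

Both splits contain the same 30 ns of logic, but the unbalanced one wastes 15 ns of every pass waiting on the 15 ns stage, which is the gap between the 54 ns and 39 ns delays.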

Page 11: Recap (Pipelining)

Limitation: Deep Pipelines

• Diminishing returns as more pipeline stages are added

• Register delays become limiting factor

– Increased latency

– Small throughput gains

– More hazards

[Figure: six clocked stages, each with 5 ns of combinational logic and a 3 ns register]

Delay = 48 ns, Throughput = 125 MHz
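The diminishing returns can be made concrete by sweeping the number of stages. This sketch assumes 30 ns of total logic that divides evenly into any number of stages and a 3 ns register per stage, as in the running example; `deep_pipeline` is a hypothetical helper. With six stages the period is 5 ns + 3 ns = 8 ns, so the throughput is 1/8 ns = 125 MHz.

```python
TOTAL_LOGIC_NS = 30  # total combinational delay in the running example
REG_NS = 3           # register delay per stage


def deep_pipeline(n_stages, total_logic_ns=TOTAL_LOGIC_NS, reg_ns=REG_NS):
    """Return (period ns, delay ns, throughput MHz) for an even n-stage split."""
    period = total_logic_ns / n_stages + reg_ns
    return period, period * n_stages, 1000 / period


if __name__ == "__main__":
    for n in (1, 3, 6, 30):
        period, delay, mhz = deep_pipeline(n)
        print(f"{n:>2} stages: period {period:5.1f} ns, "
              f"delay {delay:6.1f} ns, throughput {mhz:6.1f} MHz")
```

Going from 1 to 6 stages raises throughput only about fourfold, and even 30 stages cannot reach the 333 MHz ceiling set by the 3 ns register delay alone, while the delay keeps growing.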

Page 12: Recap (Pipelining)

MIPS Pipelining

Page 13: Recap (Pipelining)

MIPS 5-stage pipeline

• The MIPS processor needs 5 stages to execute instructions

• Pipelining stages:

– IF - Instruction Fetch

– ID - Instruction Decode

– EX - Execute / Address Calculation

– MEM - Memory Access (read / write)

– WB - Write Back (results into register file)

• Not all instructions need all the stages (e.g., add instruction does not need the MEM stage)

Page 14: Recap (Pipelining)

Basic MIPS Pipelined Processor

[Figure: pipelined MIPS datapath with PC, instruction memory, register file, sign extension, ALU, and data memory, separated by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB]

Page 15: Recap (Pipelining)

Pipelined Example - Executing Multiple Instructions

• Consider the following instruction sequence:

lw $r0, 10($r1)

sw $r3, 20($r4)

add $r5, $r6, $r7

sub $r8, $r9, $r10

Page 16: Recap (Pipelining)

Executing Multiple Instructions: Clock Cycle 1

[Figure: pipelined MIPS datapath. In flight: lw (IF)]

Page 17: Recap (Pipelining)

Executing Multiple Instructions: Clock Cycle 2

[Figure: pipelined MIPS datapath. In flight: lw (ID), sw (IF)]

Page 18: Recap (Pipelining)

Executing Multiple Instructions: Clock Cycle 3

[Figure: pipelined MIPS datapath. In flight: lw (EX), sw (ID), add (IF)]

Page 19: Recap (Pipelining)

Executing Multiple Instructions: Clock Cycle 4

[Figure: pipelined MIPS datapath. In flight: lw (MEM), sw (EX), add (ID), sub (IF)]

Page 20: Recap (Pipelining)

Executing Multiple Instructions: Clock Cycle 5

[Figure: pipelined MIPS datapath. In flight: lw (WB), sw (MEM), add (EX), sub (ID)]

Page 21: Recap (Pipelining)

Executing Multiple Instructions: Clock Cycle 6

[Figure: pipelined MIPS datapath. In flight: sw (WB), add (MEM), sub (EX)]

Page 22: Recap (Pipelining)

Executing Multiple Instructions: Clock Cycle 7

[Figure: pipelined MIPS datapath. In flight: add (WB), sub (MEM)]

Page 23: Recap (Pipelining)

Executing Multiple Instructions: Clock Cycle 8

[Figure: pipelined MIPS datapath. In flight: sub (WB)]

Page 24: Recap (Pipelining)

Alternative View - Multicycle Diagram

                     CC 1  CC 2  CC 3  CC 4  CC 5  CC 6  CC 7  CC 8
lw  $r0, 10($r1)     IM    REG   ALU   DM    REG
sw  $r3, 20($r4)           IM    REG   ALU   DM    REG
add $r5, $r6, $r7                IM    REG   ALU   DM    REG
sub $r8, $r9, $r10                     IM    REG   ALU   DM    REG
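The multicycle diagram can be generated mechanically: in an ideal pipeline with no stalls, each instruction starts one clock cycle after the previous one and then uses the five resources in order. The sketch below assumes that ideal case; `schedule` and `total_cycles` are hypothetical helper names.

```python
STAGES = ["IM", "REG", "ALU", "DM", "REG"]  # resource used in IF, ID, EX, MEM, WB
PROGRAM = [
    "lw $r0, 10($r1)",
    "sw $r3, 20($r4)",
    "add $r5, $r6, $r7",
    "sub $r8, $r9, $r10",
]


def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles to run n instructions when one new instruction starts per cycle."""
    return n_stages + n_instructions - 1


def schedule(program):
    """One row per instruction; row[c] is the resource used in clock cycle c + 1."""
    width = total_cycles(len(program))
    rows = []
    for start, _ in enumerate(program):
        row = [""] * width
        row[start:start + len(STAGES)] = STAGES  # shift each instruction by one cycle
        rows.append(row)
    return rows


if __name__ == "__main__":
    cycles = range(1, total_cycles(len(PROGRAM)) + 1)
    print(" " * 20 + "".join(f"CC {c:<3}" for c in cycles))
    for instr, row in zip(PROGRAM, schedule(PROGRAM)):
        print(f"{instr:<20}" + "".join(f"{stage:<6}" for stage in row))
```

Four instructions finish in 5 + 4 - 1 = 8 cycles, which matches the CC 1 to CC 8 span of the diagram.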

Page 25: Recap (Pipelining)

Processor Pipelining

• There are two ways that pipelining can help:

1. Reduce the clock cycle time, and keep the same CPI

2. Reduce the CPI, and keep the same clock cycle time

CPU time = Instruction count * CPI * Clock cycle time

Page 26: Recap (Pipelining)

Reduce the clock cycle time, and keep the same CPI

[Figure: MIPS datapath without pipeline registers]

CPI = 1

Clock = X Hz

Page 27: Recap (Pipelining)

Reduce the clock cycle time, and keep the same CPI

[Figure: MIPS datapath with pipeline registers]

CPI = 1

Clock = X*5 Hz

Page 28: Recap (Pipelining)

Reduce the CPI, and keep the same cycle time

[Figure: MIPS datapath without pipeline registers]

CPI = 5

Clock = X*5 Hz

Page 29: Recap (Pipelining)

Reduce the CPI, and keep the same cycle time

[Figure: MIPS datapath with pipeline registers]

CPI = 1

Clock = X*5 Hz
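Both views lead to the same result once the numbers are put into CPU time = Instruction count * CPI * Clock cycle time. The sketch below uses the CPI and clock combinations from the slides; the base clock `X_HZ` and the instruction count are made-up values chosen only to give concrete output.

```python
import math

X_HZ = 100e6           # hypothetical base clock "X" (assumption, not from the slides)
INSTRUCTIONS = 1_000_000  # hypothetical program length


def cpu_time_s(instruction_count, cpi, clock_hz):
    """CPU time = instruction count * CPI * clock cycle time (1 / clock rate)."""
    return instruction_count * cpi / clock_hz


# (CPI, clock) pairs as given on the slides.
DESIGNS = {
    "no pipeline registers, CPI = 1": (1, X_HZ),
    "no pipeline registers, CPI = 5": (5, 5 * X_HZ),
    "pipelined": (1, 5 * X_HZ),
}

if __name__ == "__main__":
    for name, (cpi, clock_hz) in DESIGNS.items():
        print(f"{name:<32} {cpu_time_s(INSTRUCTIONS, cpi, clock_hz) * 1e3:.1f} ms")
```

The two unpipelined designs take the same time, and the pipelined design is five times faster than either, whether that is described as a shorter cycle at CPI = 1 or as a lower CPI at the fast clock.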

Page 30: Recap (Pipelining)

Pipeline performance

• Ideally we get a speedup (by reducing clock cycle or reducing the CPI) equal to the number of stages.

• In practice, we do not achieve that, but we get close:

– Pipelining has additional overhead (e.g., pipeline registers)

– Pipeline hazards

Page 31: Recap (Pipelining)

Pipeline Hazards

• Hazards are situations in pipelining which prevent the next instruction in the instruction stream from executing during the designated clock cycle.

• Hazards reduce the ideal speedup gained from pipelining (e.g., the CPI rises above the ideal value of 1) and are classified into three classes:

– Structural hazards

– Data hazards

– Control hazards
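A rough way to quantify the effect of hazards is to treat each one as extra stall cycles added to the ideal CPI of 1. The sketch below assumes perfectly balanced stages and no register overhead, so the ideal speedup equals the number of stages; the stall rates used are made-up values, not measurements.

```python
def effective_cpi(stall_cycles_per_instruction):
    """Ideal pipelined CPI of 1 plus the average stall cycles per instruction."""
    return 1 + stall_cycles_per_instruction


def speedup_over_unpipelined(n_stages, stall_cycles_per_instruction):
    """Speedup relative to the unpipelined design.

    Assumes balanced stages and no register overhead, so the ideal
    speedup is n_stages and stalls are the only loss.
    """
    return n_stages / effective_cpi(stall_cycles_per_instruction)


if __name__ == "__main__":
    for stalls in (0.0, 0.25, 1.0):  # hypothetical average stall rates
        print(stalls, speedup_over_unpipelined(5, stalls))
```

With no stalls the 5-stage pipeline reaches the ideal speedup of 5; an average of one stall cycle per instruction halves it.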