Recap (Pipelining)

What is Pipelining?

• A way of speeding up execution of tasks

• Key idea: overlap execution of multiple tasks

Automobile Manufacturing

1. Build frame. 60 min.

2. Add engine. 50 min.

3. Build body. 80 min.

4. Paint. 40 min.

5. Finish. 45 min.

Total: 275 min.

Latency: Time from start to finish for one car. 275 minutes per car (smaller is better).

Throughput: Number of finished cars per time unit. 1 car/275 min = 0.218 cars/hour (larger is better).

Issues: How can we make the process better by adding?
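The latency and throughput figures for the one-car-at-a-time process follow directly from the stage times. A minimal sketch (the function names are illustrative, not from the slides):

```python
# Stage times in minutes, as listed on the slide.
STAGE_MINUTES = [60, 50, 80, 40, 45]

def unpipelined_latency(stages):
    """Minutes to build one car when it passes through every stage in turn."""
    return sum(stages)

def unpipelined_throughput_per_hour(stages):
    """Cars per hour when the next car starts only after the previous one is finished."""
    return 60 / unpipelined_latency(stages)

print(unpipelined_latency(STAGE_MINUTES))                        # 275
print(round(unpipelined_throughput_per_hour(STAGE_MINUTES), 3))  # 0.218
```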

An Assembly line

Stage times (min): 60 50 80 40 45

The first two stages can't produce faster than one car/80 min, or a backlog will occur at the third stage.

The last two stages only receive one car/80 min to work on.

Latency: 400 min/car (5 stages * 80 min, since every stage runs at the pace of the slowest)

Throughput: 4 cars/640 min (1 car/160 min). Will approach 1 car/80 min as time goes on.
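The assembly-line numbers can be reproduced with a small model. It assumes all stations advance in lockstep at the pace of the slowest stage, which is what the 400 min latency implies; the function names are illustrative.

```python
STAGE_MINUTES = [60, 50, 80, 40, 45]

def line_period(stages):
    """Minutes between consecutive cars leaving the line (set by the slowest stage)."""
    return max(stages)

def line_latency(stages):
    """Minutes one car spends on the line when every stage takes one full period."""
    return len(stages) * line_period(stages)

def finish_time(stages, n_cars):
    """Minute at which the n-th car (counting from 1) leaves the line."""
    return line_latency(stages) + (n_cars - 1) * line_period(stages)

print(line_latency(STAGE_MINUTES))               # 400
print(finish_time(STAGE_MINUTES, 4))             # 640
print(finish_time(STAGE_MINUTES, 1000) / 1000)   # 80.32 minutes per car on average
```

The average time per car falls toward 80 min as more cars are built, because the 400 min needed to fill the line is paid only once.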

Pipelining a Digital System

• Key idea: break big computation up into pieces

Separate each piece with a pipeline register

[Figure: five 200ps stages, each followed by a pipeline register]

Pipelining a Digital System

• Why do this? Because it's faster for repeated computations

Non-pipelined: 1 operation finishes every 1ns

Pipelined (five 200ps stages): 1 operation finishes every 200ps

Comments about pipelining

• Pipelining improves throughput, but does not reduce latency

– Answer available every 200ps, BUT

– A single computation still takes 1ns

• Limitations:

– Computations must be divisible into stages of equal sizes

– Pipeline registers add overhead
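A rough sketch of why pipelining pays off only for repeated computations. It uses the five 200ps stages from the slide and, as a simplifying assumption, ignores pipeline register overhead; the function names are illustrative.

```python
STAGE_PS = 200   # per-stage delay from the slide
N_STAGES = 5

def unpipelined_time_ps(n_ops):
    """Each operation runs through all stages before the next one starts."""
    return n_ops * N_STAGES * STAGE_PS

def pipelined_time_ps(n_ops):
    """First result appears after N_STAGES cycles, then one result per cycle."""
    return (N_STAGES + n_ops - 1) * STAGE_PS

for n in (1, 10, 1000):
    print(n, unpipelined_time_ps(n), pipelined_time_ps(n))
```

For a single operation both versions take 1000ps. The gap opens up only as the number of operations grows.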

Another Example

Unpipelined System

Comb. Logic 30ns, register 3ns

Delay = 33ns

Throughput = 30MHz

– One operation must complete before next can begin

– Operations spaced 33ns apart

3 Stage Pipelining

– Space operations 13ns apart

– 3 operations occur simultaneously

Three stages, each Comb. Logic 10ns + register 3ns

Delay = 39ns

Throughput = 77MHz
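The delay and throughput figures for the unpipelined and 3-stage versions can be checked with a short calculation. This is a sketch that assumes the 3ns term is the register delay and that the clock period is the slowest stage's logic delay plus that register delay; `pipeline_metrics` is an illustrative name.

```python
REG_NS = 3  # assumed register delay per stage

def pipeline_metrics(stage_logic_ns, reg_ns=REG_NS):
    """Return (clock period in ns, total delay in ns, throughput in MHz)."""
    clock_ns = max(stage_logic_ns) + reg_ns
    delay_ns = clock_ns * len(stage_logic_ns)
    throughput_mhz = 1000 / clock_ns
    return clock_ns, delay_ns, throughput_mhz

for name, stages in (("unpipelined", [30]), ("3-stage", [10, 10, 10])):
    clock, delay, mhz = pipeline_metrics(stages)
    print(f"{name}: clock {clock}ns, delay {delay}ns, throughput {mhz:.1f}MHz")
```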

Limitation: Nonuniform Pipelining

Three stages: Comb. Logic 5ns + register 3ns, Comb. Logic 15ns + register 3ns, Comb. Logic 10ns + register 3ns

Delay = 18 * 3 = 54ns

Throughput = 55MHz

• Throughput limited by slowest stage

• Delay determined by clock period * number of stages

• Must attempt to balance stages
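To see what unbalanced stages cost, the sketch below compares the 5/15/10ns split with an even 10/10/10ns split of the same 30ns of logic. It assumes a 3ns register delay per stage; the helper names are illustrative.

```python
def clock_period_ns(stage_logic_ns, reg_ns=3):
    """The clock must accommodate the slowest stage plus the register delay."""
    return max(stage_logic_ns) + reg_ns

def idle_time_ns(stage_logic_ns):
    """Time each stage spends waiting for the slowest stage, per clock cycle."""
    slowest = max(stage_logic_ns)
    return [slowest - t for t in stage_logic_ns]

for name, stages in (("unbalanced", [5, 15, 10]), ("balanced", [10, 10, 10])):
    period = clock_period_ns(stages)
    delay = period * len(stages)
    print(name, period, delay, round(1000 / period, 1), idle_time_ns(stages))
```

The unbalanced split gives an 18ns clock and about 55.6MHz, while the balanced split gives a 13ns clock and about 76.9MHz from the same total logic.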

Limitation: Deep Pipelines

• Diminishing returns as more pipeline stages are added

• Register delays become limiting factor

• Increased latency

• Small throughput gains

• More hazards

Six stages, each Comb. Logic 5ns + register 3ns

Delay = 48ns, Throughput = 125MHz
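The diminishing returns can be made concrete by sweeping the pipeline depth. The sketch assumes 30ns of total logic split evenly across the stages and a fixed 3ns register delay per stage; the 30-stage case is a hypothetical extreme added for illustration.

```python
TOTAL_LOGIC_NS = 30
REG_NS = 3  # assumed fixed register delay per stage

def deep_pipeline(n_stages):
    """Return (total delay in ns, throughput in MHz) for an evenly split pipeline."""
    clock_ns = TOTAL_LOGIC_NS / n_stages + REG_NS
    return clock_ns * n_stages, 1000 / clock_ns

for n in (1, 3, 6, 30):
    delay, mhz = deep_pipeline(n)
    print(f"{n:2d} stages: delay {delay:.0f}ns, throughput {mhz:.1f}MHz")
```

With six stages the clock period is 8ns, giving the 48ns delay. However deep the pipeline gets, throughput can never exceed 1/3ns (about 333MHz), because the register delay is paid in every stage.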

MIPS Pipelining

MIPS 5-stage pipeline

• The MIPS processor needs 5 stages to execute instructions

• Pipelining stages:

– IF - Instruction Fetch

– ID - Instruction Decode

– EX - Execute / Address Calculation

– MEM - Memory Access (read / write)

– WB - Write Back (results into register file)

• Not all instructions need all the stages (e.g., add instruction does not need the MEM stage)
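A small sketch of which stages do useful work for the four instructions used later in this recap. The table is an assumption based on the standard textbook 5-stage MIPS datapath and is not taken from the slides. In a pipeline, an instruction still passes through every stage and is simply idle in the ones it does not need.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Assumed stage usage for the textbook 5-stage MIPS datapath.
USEFUL = {
    "lw":  {"IF", "ID", "EX", "MEM", "WB"},  # reads memory, writes a register
    "sw":  {"IF", "ID", "EX", "MEM"},        # writes memory, no register result
    "add": {"IF", "ID", "EX", "WB"},         # no data memory access
    "sub": {"IF", "ID", "EX", "WB"},
}

def idle_stages(op):
    """Stages the instruction passes through without doing useful work."""
    return [s for s in STAGES if s not in USEFUL[op]]

for op in USEFUL:
    print(op, idle_stages(op))
```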

Basic MIPS Pipelined Processor

[Figure: pipelined datapath with Instruction Memory, Register File (RN1, RN2, WN), ALU, and Data Memory, separated by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB]

Pipelined Example - Executing Multiple Instructions

• Consider the following instruction sequence:

lw $r0, 10($r1)

sw $r3, 20($r4)

add $r5, $r6, $r7

sub $r8, $r9, $r10

Executing Multiple Instructions

[Figure sequence, one frame per clock cycle starting at Clock Cycle 1: LW, SW, ADD, and SUB enter the datapath (Instruction Memory, Register File, Data Memory; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) one cycle apart and leave it in the same order]

Alternative View - Multicycle Diagram

                      CC 1  CC 2  CC 3  CC 4  CC 5  CC 6  CC 7  CC 8
lw  $r0, 10($r1)      IM    REG   ALU   DM    REG
sw  $r3, 20($r4)            IM    REG   ALU   DM    REG
add $r5, $r6, $r7                 IM    REG   ALU   DM    REG
sub $r8, $r9, $r10                      IM    REG   ALU   DM    REG
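The same schedule can be computed rather than drawn. The sketch below assumes one instruction enters the pipeline per cycle with no stalls. It uses the stage names IF/ID/EX/MEM/WB rather than the hardware labels in the diagram, and `occupancy` is an illustrative name.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def occupancy(program):
    """For each clock cycle, map stage name -> instruction occupying that stage."""
    n_cycles = len(program) + len(STAGES) - 1
    table = []
    for cycle in range(n_cycles):
        in_flight = {}
        for s, stage in enumerate(STAGES):
            i = cycle - s  # index of the instruction that entered s cycles ago
            if 0 <= i < len(program):
                in_flight[stage] = program[i]
        table.append(in_flight)
    return table

for cc, in_flight in enumerate(occupancy(["lw", "sw", "add", "sub"]), start=1):
    print(f"CC {cc}:", ", ".join(f"{st}={ins}" for st, ins in in_flight.items()))
```

Four instructions on a five-stage pipeline take 4 + 5 - 1 = 8 cycles in total, and in cycle 4 all four are in flight at once.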

Processor Pipelining

• There are two ways that pipelining can help:

1. Reduce the clock cycle time, and keep the same CPI

2. Reduce the CPI, and keep the same clock cycle time

CPU time = Instruction count * CPI * Clock cycle time

Reduce the clock cycle time, and keep the same CPI

Without pipeline registers: CPI = 1, Clock = X Hz

With pipeline registers: CPI = 1, Clock = X*5 Hz

Reduce the CPI, and keep the same cycle time

Without pipeline registers: CPI = 5, Clock = X*5 Hz

With pipeline registers: CPI = 1, Clock = X*5 Hz
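Plugging the CPI and clock figures into the CPU time formula shows that both views lead to the same result. In this sketch the base clock `X` and the instruction count `N` are hypothetical values chosen for illustration, and pipeline fill time and hazards are ignored.

```python
from fractions import Fraction

def cpu_time(instruction_count, cpi, clock_hz):
    """CPU time = instruction count * CPI * clock cycle time (1 / clock rate)."""
    return Fraction(instruction_count * cpi, clock_hz)

X = 100_000_000  # hypothetical base clock rate in Hz
N = 1_000_000    # hypothetical instruction count

single_cycle = cpu_time(N, cpi=1, clock_hz=X)      # CPI = 1, Clock = X Hz
multi_cycle = cpu_time(N, cpi=5, clock_hz=5 * X)   # CPI = 5, Clock = X*5 Hz
pipelined = cpu_time(N, cpi=1, clock_hz=5 * X)     # CPI = 1, Clock = X*5 Hz

print(float(single_cycle), float(multi_cycle), float(pipelined))
print(single_cycle / pipelined)  # 5
```

Both unpipelined designs take the same time, and the pipelined design is five times faster than either under these idealized assumptions.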

Pipeline performance

• Ideally we get a speedup (by reducing clock cycle or reducing the CPI) equal to the number of stages.

• In practice, we do not achieve that – but we get close:

– Pipelining has additional overhead (e.g., pipeline registers)

– Pipeline hazards
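A rough sketch of how register overhead and pipeline fill time keep the speedup below the number of stages. It reuses the 30ns logic and 3ns register figures from the earlier example, assumes evenly balanced stages and no hazards, and `speedup` is an illustrative name.

```python
def speedup(total_logic_ns, reg_ns, n_stages, n_ops):
    """Unpipelined time divided by pipelined time for n_ops operations."""
    unpipelined = n_ops * (total_logic_ns + reg_ns)
    clock = total_logic_ns / n_stages + reg_ns
    pipelined = (n_stages + n_ops - 1) * clock  # fill the pipeline, then one result per cycle
    return unpipelined / pipelined

print(round(speedup(30, 3, 3, 1), 2))       # one operation: slower than unpipelined
print(round(speedup(30, 3, 3, 10**6), 2))   # many operations: below the ideal 3
print(round(speedup(30, 0, 3, 10**6), 2))   # no register overhead: close to 3
```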

Pipeline Hazards

• Hazards are situations in pipelining which prevent the next instruction in the instruction stream from executing during the designated clock cycle.

• Hazards reduce the speedup below the ideal gained from pipelining (i.e., they push the CPI above 1) and fall into three classes:

– Structural hazards

– Data hazards

– Control hazards
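One common way to quantify the effect of hazards is to add the average number of stall cycles per instruction to the ideal CPI of 1. The sketch below uses that model. It assumes perfectly balanced stages and no register overhead, and the stall rate of 0.25 is a hypothetical value, not a figure from the slides.

```python
def effective_cpi(stall_cycles_per_instruction, ideal_cpi=1.0):
    """Every stall cycle caused by a hazard adds directly to the CPI."""
    return ideal_cpi + stall_cycles_per_instruction

def pipeline_speedup(n_stages, stall_cycles_per_instruction):
    """Speedup over an unpipelined design whose cycle is n_stages times longer."""
    return n_stages / effective_cpi(stall_cycles_per_instruction)

print(pipeline_speedup(5, 0.0))    # 5.0, the ideal case
print(pipeline_speedup(5, 0.25))   # 4.0 with one stall every four instructions
```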
