1.Pipelining & ILP



CS252/CullerLec 2.11/24/02

BY N R REJIN PAUL

LECTURER, CSE DEPT

CS2354 - Advanced Computer Architecture

Pipelining




Review: Visualizing Pipelining

[Figure: pipeline timing diagram. Vertical axis: instruction order; horizontal axis: time in clock cycles (Cycle 1 to Cycle 7). Each instruction passes through Ifetch, Reg, ALU, DMem, Reg, offset by one cycle from the previous instruction.]




Limits to pipelining

Hazards: circumstances that would cause incorrect execution if the next instruction were launched
  - Structural hazards: attempting to use the same hardware to do two different things at the same time
  - Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
  - Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)




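Example: One Memory Port / Structural Hazard

[Figure: pipeline timing diagram. Vertical axis: instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4); horizontal axis: time in clock cycles. Each instruction passes through Ifetch, Reg, ALU, DMem, Reg. The structural hazard arises in the cycle where the Load's DMem access and a later instruction's Ifetch both need the single memory port.]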




Resolving structural hazards

Defn: attempt to use the same hardware for two different things at the same time

Solution 1: Wait
  - must detect the hazard
  - must have a mechanism to stall

Solution 2: Throw more hardware at the problem




Detecting and Resolving Structural Hazard

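[Figure: pipeline timing diagram. Vertical axis: instruction order (Load, Instr 1, Instr 2, Stall, Instr 3); horizontal axis: time in clock cycles. A stall row of five bubbles is inserted after Instr 2, so Instr 3 is fetched one cycle later and no longer conflicts with the Load's DMem access.]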




Data Hazards

[Figure: pipeline timing diagram with stages IF, ID/RF, EX, MEM, WB. Vertical axis: instruction order; horizontal axis: time in clock cycles. The four instructions after the add all read r1, which the add writes.]

add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11




Three Generic Data Hazards

Read After Write (RAW): InstrJ tries to read an operand before InstrI writes it

I: add r1,r2,r3
J: sub r4,r1,r3

Caused by a data dependence (in compiler nomenclature). This hazard results from an actual need for communication.




Write After Read (WAR): InstrJ writes an operand before InstrI reads it

Called an anti-dependence by compiler writers. This results from reuse of the name r1.

Can't happen in the MIPS 5-stage pipeline because:
  - All instructions take 5 stages, and
  - Reads are always in stage 2, and
  - Writes are always in stage 5

I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7






Write After Write (WAW): InstrJ writes an operand before InstrI writes it

Called an output dependence by compiler writers. This also results from the reuse of the name r1.

Can't happen in the MIPS 5-stage pipeline because:
  - All instructions take 5 stages, and
  - Writes are always in stage 5

Will see WAR and WAW in later, more complicated pipes

I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
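
The three hazard classes (RAW, WAR, WAW) can be told apart by comparing the registers two instructions read and write. The following is a minimal sketch, not part of the original slides; the `Instr` tuple and the `dependences` function are hypothetical names, and it only reports which classes are possible from the register names, not whether a given pipeline would actually stall.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    """A register-to-register instruction: one destination, some sources."""
    op: str
    dest: str
    srcs: Tuple[str, ...]


def dependences(first: Instr, second: Instr) -> Set[str]:
    """Hazard classes `second` could have with respect to `first`.

    `first` comes earlier in program order than `second`.
    """
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")  # second reads what first writes (data dependence)
    if second.dest in first.srcs:
        found.add("WAR")  # second overwrites what first reads (anti-dependence)
    if second.dest == first.dest:
        found.add("WAW")  # both write the same name (output dependence)
    return found


# The RAW pair from the slides: add r1,r2,r3 followed by sub r4,r1,r3
add = Instr("add", "r1", ("r2", "r3"))
sub = Instr("sub", "r4", ("r1", "r3"))
print(dependences(add, sub))
```

Applied to the I/J pairs on the WAR and WAW slides, the same function reports an anti-dependence and an output dependence respectively, both on the name r1.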




Forwarding to Avoid Data Hazard

[Figure: pipeline timing diagram (instruction order vs. time in clock cycles) for the sequence below; each instruction passes through Ifetch, Reg, ALU, DMem, Reg.]

add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11




Data Hazard Even with Forwarding

[Figure: pipeline timing diagram (instruction order vs. time in clock cycles) for the sequence below; each instruction passes through Ifetch, Reg, ALU, DMem, Reg.]

lw r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9




Resolving this load hazard

Adding hardware? ... not

Detection?

Compilation techniques?

What is the cost of load delays?




Resolving the Load Data Hazard

[Figure: pipeline timing diagram (instruction order vs. time in clock cycles) for the sequence below; a bubble delays sub, and, and or by one cycle so that the loaded value of r1 is available when sub reaches the ALU.]

lw r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9
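
The bubble in this example comes from a load followed immediately by an instruction that reads the loaded register. A minimal sketch of that detection rule follows. It is not from the slides: `load_use_stalls` and the tuple encoding are hypothetical, and it assumes the classic 5-stage pipeline with full forwarding, where only the instruction directly after a load can need a stall.

```python
def load_use_stalls(program):
    """Count bubbles caused by load-use hazards.

    program: list of (opcode, dest, sources) tuples in program order.
    Assumption: with full forwarding, loaded data is ready after MEM,
    so only the instruction immediately following a load can stall.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        cur_sources = cur[2]
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1  # one bubble between the load and its first use
    return stalls


# The sequence from the slide: the sub uses r1 right after the lw.
program = [
    ("lw", "r1", ("r2",)),
    ("sub", "r4", ("r1", "r6")),
    ("and", "r6", ("r1", "r7")),
    ("or", "r8", ("r1", "r9")),
]
print(load_use_stalls(program))
```

A compiler can often remove such a bubble by moving an independent instruction into the slot after the load, which is the idea behind the "Compilation techniques?" question raised earlier.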




Control Hazard on Branches => Three Stage Stall

10: beq r1,r3,36

14: and r2,r3,r5

18: or r6,r1,r7

22: add r8,r1,r9

36: xor r10,r1,r11

[Figure: pipeline timing diagram (instruction order vs. time in clock cycles) for the five instructions above; each passes through Ifetch, Reg, ALU, DMem, Reg.]




Example: Branch Stall Impact

Two part solution:
  - Determine branch taken or not sooner, AND
  - Compute taken branch address earlier

MIPS branch tests if register = 0 or ≠ 0

MIPS Solution:
  - Move Zero test to ID/RF stage
  - Adder to calculate new PC in ID/RF stage
  - 1 clock cycle penalty for branch versus 3
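
To see why cutting the branch penalty from 3 cycles to 1 matters, the effect on CPI can be estimated as base CPI plus branch frequency times branch penalty. The sketch below is illustrative only: the 30% branch frequency and the base CPI of 1 are hypothetical values chosen for the example, not figures taken from these slides.

```python
def effective_cpi(base_cpi, branch_fraction, branch_penalty):
    """CPI when every branch pays a fixed stall penalty (in cycles)."""
    return base_cpi + branch_fraction * branch_penalty


BRANCH_FRACTION = 0.30  # hypothetical workload value, not from the slides

for penalty in (3, 1):
    cpi = effective_cpi(1.0, BRANCH_FRACTION, penalty)
    print(f"penalty {penalty}: CPI = {cpi:.2f}")
```

Under these assumed numbers, the 3-cycle stall gives a CPI of 1.90 and the 1-cycle penalty gives 1.30.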




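Pipelined MIPS Datapath

[Figure: pipelined MIPS datapath with five stages (Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back) separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. Labeled components: instruction memory, register file (RS1, RS2, RD), sign extend (Imm), Zero? test, ALU, data memory, adders for Next SEQ PC (PC + 4) and Next PC, MUXes, and WB Data.]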




Alternatives to Overcome Branch Hazards

#1: Stall until branch direction is clear

#2: Predict Branch Not Taken

#3: Predict Branch Taken




Extending Pipeline To Handle Multicycle Operations

Floating point numbers have two parts: the exponent and the significand.

We should deal with the exponent and the significand separately.

Example: 3.25 x 10^3 + 2.63 x 10^-1

  3.25     x 10^3
  0.000263 x 10^3   (shift the smaller operand right until the exponents match)
  3.250263 x 10^3   (sum)
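
The alignment step in this example can be sketched in a few lines. This is only an illustration in decimal using Python's `decimal` module; real floating-point hardware works in binary and also normalizes and rounds the result, which this sketch omits. The function name `add_aligned` is hypothetical.

```python
from decimal import Decimal


def add_aligned(sig_a, exp_a, sig_b, exp_b):
    """Add sig_a x 10^exp_a and sig_b x 10^exp_b by aligning exponents.

    Returns (significand, exponent) using the larger exponent.
    No normalization or rounding is performed.
    """
    # Make (sig_a, exp_a) the operand with the larger exponent.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    shift = exp_a - exp_b
    # Shift the smaller operand right by `shift` decimal digits.
    sig_b_shifted = sig_b.scaleb(-shift)
    return sig_a + sig_b_shifted, exp_a


# The example from the slide: 3.25 x 10^3 + 2.63 x 10^-1
significand, exponent = add_aligned(Decimal("3.25"), 3, Decimal("2.63"), -1)
print(f"{significand} x 10^{exponent}")
```

The exponent comparison, the shift, and the significand addition are separate steps, which is one reason floating-point operations take more than one cycle in the execute stage.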




Cont.

So some algorithm needs to be implemented in order to perform the operation.

The functional unit should be redesigned to perform all operations, and this type of functional unit requires a longer pipeline cycle.

Latency in the functional unit:
- Latency is defined as the number of intervening cycles between an instruction that produces a result and an instruction that uses the result.

Initiation interval:
- The number of cycles that must elapse between issuing two operations of a given type.




Cont.




Latencies and initiation intervals for FU




Pipeline support for FP operations




FP example




Cont.




Cont.

Assuming that the pipeline does all hazard detection in ID, there are three checks that must be performed before an instruction can issue:

- Check for structural hazards: Wait until the required functional unit is available.
- Check for a RAW data hazard: Wait until the source registers are not listed as pending destinations in a pipeline register that will not be available when this instruction needs the result.
- Check for a WAW data hazard: Determine if any instruction in A1, ..., A4, D, M1, ..., M7 has the same register destination as this instruction.
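
The three issue checks can be sketched as a single function evaluated in ID. This is a simplified illustration, not the actual hardware logic: `can_issue`, the dictionary encoding, and the two sets are hypothetical, and the RAW check is deliberately conservative in that it treats every pending destination as unavailable instead of modelling when each result can be forwarded.

```python
def can_issue(instr, busy_units, pending_dests):
    """Decide whether an instruction in ID may issue this cycle.

    instr: dict with 'unit' (functional unit name), 'dest', and 'srcs'.
    busy_units: units that cannot accept a new operation this cycle.
    pending_dests: destination registers of instructions still executing.
    Returns (ok, reason), where reason names the hazard that blocks issue.
    """
    if instr["unit"] in busy_units:
        return False, "structural"  # required functional unit not available
    if any(src in pending_dests for src in instr["srcs"]):
        return False, "RAW"         # a source is still being produced
    if instr["dest"] in pending_dests:
        return False, "WAW"         # an earlier write to dest is in flight
    return True, None


addd = {"unit": "fp_add", "dest": "F10", "srcs": ("F0", "F8")}
print(can_issue(addd, busy_units=set(), pending_dests={"F0"}))
```

Here the add cannot issue because F0 is still a pending destination of an earlier instruction.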




Instruction Level Parallelism

Pipelining can overlap the execution of instructions when they are independent of one another. This potential overlap among instructions is called instruction-level parallelism (ILP), since the instructions can be evaluated in parallel.

Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can be performed simultaneously. Consider the following program:

1. e = a + b

2. f = c + d

3. g = e * f





Operation 3 depends on the results of operations 1 and 2

3 cannot be calculated until both of them are completed

Operations 1 and 2 do not depend on any other operation, so they can be calculated simultaneously.

A goal of compiler and processor designers is to identify and take advantage of as much ILP as possible. Ordinary programs are typically written under a sequential execution model, where instructions execute one after the other and in the order specified by the programmer.
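
The dependence reasoning for the three-operation program (e = a + b, f = c + d, g = e * f) can be sketched as follows. The function `parallel_steps` is a hypothetical helper that assigns each operation the earliest step at which its inputs are ready; it assumes unlimited hardware, unit latency, and tracks only true (data) dependences.

```python
def parallel_steps(ops):
    """Earliest step for each operation, given only data dependences.

    ops: list of (dest, sources) pairs in program order.
    Assumes every operation takes one step and any number of
    independent operations can run in the same step.
    """
    ready_at = {}  # variable -> step in which its value is produced
    steps = []
    for dest, srcs in ops:
        # Inputs never written by the program are ready at step 0.
        step = 1 + max((ready_at.get(s, 0) for s in srcs), default=0)
        ready_at[dest] = step
        steps.append(step)
    return steps


program = [("e", ("a", "b")), ("f", ("c", "d")), ("g", ("e", "f"))]
steps = parallel_steps(program)
print(steps)                       # step assigned to each operation
print(len(program) / max(steps))   # operations per step under this model
```

Operations 1 and 2 land in the first step and operation 3 in the second, so three operations finish in two steps under this simple model.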





The simplest and most common way to increase the amount of parallelism available among instructions is to exploit parallelism among iterations of a loop. This type of parallelism is often called loop-level parallelism.

Example 1
for (i=1; i





for (i=1; i




Ideas To Reduce Stalls

Technique                                   Reduces
Dynamic scheduling                          Data hazard stalls
Dynamic branch prediction                   Control stalls
Issuing multiple instructions per cycle     Ideal CPI
Speculation                                 Data and control stalls
Dynamic memory disambiguation               Data hazard stalls involving memory
Loop unrolling                              Control hazard stalls
Basic compiler pipeline scheduling          Data hazard stalls
Compiler dependence analysis                Ideal CPI and data hazard stalls
Software pipelining and trace scheduling    Ideal CPI and data hazard stalls
Compiler speculation                        Ideal CPI, data and control stalls




Instruction Level Parallelism

Data Dependence and Hazards

InstrJ is data dependent on InstrI: InstrJ tries to read an operand before InstrI writes it

I: add r1,r2,r3
J: sub r4,r1,r3

or InstrJ is data dependent on InstrK, which is dependent on InstrI

Caused by a true dependence (compiler term)

If a true dependence causes a hazard in the pipeline, it is called a Read After Write (RAW) hazard




Data Dependence and Hazards

Dependences are a property of programs

Presence of a dependence indicates the potential for a hazard, but the actual hazard and the length of any stall are properties of the pipeline

Importance of the data dependences:
1) indicates the possibility of a hazard
2) determines the order in which results must be calculated

Today looking at HW schemes to avoid hazards





Name Dependence #1: Anti-dependence

Name dependence: when 2 instructions use the same register or memory location, called a name, but there is no flow of data between the instructions associated with that name; there are 2 versions of name dependence

InstrJ writes an operand before InstrI reads it

I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7

Called an anti-dependence by compiler writers. This results from reuse of the name r1

If an anti-dependence causes a hazard in the pipeline, it is called a Write After Read (WAR) hazard





Name Dependence #2: Output dependence

InstrJ writes an operand before InstrI writes it

I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7

Called an output dependence by compiler writers. This also results from the reuse of the name r1

If an output dependence causes a hazard in the pipeline, it is called a Write After Write (WAW) hazard




Control Dependencies

Every instruction is control dependent on some set of branches, and, in general, these control dependences must be preserved to preserve program order

if p1 {
    S1;
};
if p2 {
    S2;
}

S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1.





Out-Of-Order Execution

DIVD F0,F2,F4

ADDD F10,F0,F8

SUBD F12,F8,F14

Enables out-of-order execution => out-of-order completion
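
In the sequence above, ADDD must wait for the F0 produced by DIVD, while SUBD reads only F8 and F14 and has no dependence on either earlier instruction. The sketch below shows the selection step that lets SUBD go ahead; `ready_instructions` and the tuple encoding are hypothetical, and the sketch checks only RAW dependences on pending results, not the structural, WAR, or WAW checks a real out-of-order design also needs.

```python
def ready_instructions(waiting, pending_dests):
    """Names of waiting instructions whose sources are all available.

    waiting: list of (name, dest, sources) in program order.
    pending_dests: registers still being produced by executing instructions.
    """
    return [
        name
        for name, _dest, srcs in waiting
        if not any(src in pending_dests for src in srcs)
    ]


# DIVD F0,F2,F4 is executing, so F0 is pending.
waiting = [
    ("ADDD", "F10", ("F0", "F8")),
    ("SUBD", "F12", ("F8", "F14")),
]
print(ready_instructions(waiting, pending_dests={"F0"}))
```

An in-order pipeline would hold SUBD behind the stalled ADDD; letting it start first is what makes execution, and therefore completion, out of order.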



THANK YOU
