7/28/2019 1.Pipelining & ILP
CS252/Culler Lec 2.1, 1/24/02
BY N R REJIN PAUL
LECTURER, CSE DEPT
CS2354 - Advanced Computer Architecture
Pipelining
Review: Visualizing Pipelining
[Figure: instruction order (vertical) versus time in clock cycles 1 to 7 (horizontal); each successive instruction starts one cycle later and passes through Ifetch, Reg, ALU, DMem and Reg.]
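A minimal Python sketch of the timing rule behind this diagram, assuming an ideal pipeline with no hazards: instruction i (counting from 0) occupies stage s in cycle i + s + 1. The stage labels are the slide's; `pipeline_diagram` is a hypothetical helper name.

```python
# Stage labels as drawn on the slide: the first "Reg" is the register read
# in decode, the last "Reg" is the register write in write-back.
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]


def pipeline_diagram(n_instructions):
    """Return one row per instruction; row[c] is the stage used in cycle c + 1."""
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # ideal pipeline: instruction i is fetched in cycle i + 1
        rows.append(row)
    return rows


for row in pipeline_diagram(4):
    print(" ".join(f"{cell:>6}" for cell in row))
```

Four instructions complete in 4 + 5 - 1 = 8 cycles, rather than the 20 cycles needed if each instruction had to finish all five stages before the next one started.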
Limits to pipelining
Hazards: circumstances that would cause incorrect execution if the next instruction were launched
Structural hazards: attempting to use the same hardware to do two different things at the same time
Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)
Example: One Memory Port/Structural Hazard
[Figure: Load, Instr 1, Instr 2, Instr 3 and Instr 4 in order versus time (clock cycles 1 to 7), each passing through Ifetch, Reg, ALU, DMem and Reg. With only one memory port, the Load's DMem access in cycle 4 collides with Instr 3's Ifetch: a structural hazard.]
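The conflict in this figure can be found mechanically by counting memory-port requests per cycle. This is a sketch under assumed conditions: one instruction is fetched per cycle, only `lw` and `sw` access data memory, and DMem is the fourth stage; the helper name is hypothetical.

```python
from collections import Counter

MEMORY_OPS = {"lw", "sw"}  # assumed: only loads and stores access data memory


def memory_port_conflicts(opcodes):
    """Cycles in which a single shared memory port is requested twice.

    Instruction k (0-based) is fetched in cycle k + 1 and, if it is a load
    or store, accesses data memory in its fourth stage, cycle k + 4.
    """
    requests = Counter()
    for k, op in enumerate(opcodes):
        requests[k + 1] += 1          # Ifetch uses the port
        if op in MEMORY_OPS:
            requests[k + 4] += 1      # DMem uses the same port
    return sorted(cycle for cycle, n in requests.items() if n > 1)


print(memory_port_conflicts(["lw", "add", "add", "add", "add"]))  # [4]
```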
Resolving structural hazards
Definition: an attempt to use the same hardware for two different things at the same time
Solution 1: Wait
must detect the hazard
must have a mechanism to stall
Solution 2: Throw more hardware at the problem
Detecting and Resolving Structural Hazard
[Figure: Load, Instr 1, Instr 2, a Stall, then Instr 3 versus time (clock cycles 1 to 7). The stall inserts a bubble that passes through every stage, so Instr 3 is fetched one cycle later and no longer competes with the Load's DMem access.]
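Solution 1 (detect and stall) can be sketched as a fetch loop that waits whenever an earlier load or store has reserved the memory port. Assumptions: only `lw` and `sw` use data memory, DMem happens three cycles after Ifetch, and no other stalls occur; the function name is hypothetical.

```python
MEMORY_OPS = {"lw", "sw"}  # assumed: only loads and stores access data memory


def fetch_cycles_with_stalls(opcodes):
    """Fetch cycle of each instruction when Ifetch must wait for a free memory port."""
    reserved = set()       # cycles in which a DMem access holds the port
    fetch_cycles = []
    cycle = 1
    for op in opcodes:
        while cycle in reserved:   # hazard detected: stall (bubble) one cycle
            cycle += 1
        fetch_cycles.append(cycle)
        if op in MEMORY_OPS:
            reserved.add(cycle + 3)  # its DMem stage, three cycles after Ifetch
        cycle += 1
    return fetch_cycles


print(fetch_cycles_with_stalls(["lw", "add", "add", "add"]))  # [1, 2, 3, 5]
```

The fourth instruction slips from cycle 4 to cycle 5, matching the single bubble in the figure.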
Data Hazards
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
[Figure: the five instructions in order versus time (clock cycles), each passing through IF, ID/RF, EX, MEM and WB; sub, and, or and xor all read r1, which add writes only in its WB stage.]
Three Generic Data Hazards
Read After Write (RAW): InstrJ tries to read an operand before InstrI writes it
I: add r1,r2,r3
J: sub r4,r1,r3
Caused by a data dependence (in compiler nomenclature). This hazard results from an actual need for communication.
Three Generic Data Hazards
Write After Read (WAR): InstrJ writes an operand before InstrI reads it
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7
Called an anti-dependence by compiler writers. This results from reuse of the name r1.
Can't happen in the MIPS 5-stage pipeline because:
All instructions take 5 stages,
Reads are always in stage 2, and
Writes are always in stage 5
Three Generic Data Hazards
Write After Write (WAW): InstrJ writes an operand before InstrI writes it
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
Called an output dependence by compiler writers. This also results from the reuse of the name r1.
Can't happen in the MIPS 5-stage pipeline because:
All instructions take 5 stages, and
Writes are always in stage 5
We will see WAR and WAW hazards in later, more complicated pipelines
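The three cases can be told apart from register names alone. A sketch, assuming each instruction is modeled as a hypothetical `(dest, sources)` pair with I earlier than J in program order. It reports dependences, i.e. potential hazards; whether a stall actually occurs depends on the pipeline.

```python
def classify(i, j):
    """Dependence kinds between instruction i and a later instruction j.

    Each instruction is a (dest, sources) pair; dest may be None.
    """
    i_dest, i_srcs = i
    j_dest, j_srcs = j
    kinds = []
    if i_dest is not None and i_dest in j_srcs:
        kinds.append("RAW")   # j reads what i writes (true dependence)
    if j_dest is not None and j_dest in i_srcs:
        kinds.append("WAR")   # j writes what i reads (anti-dependence)
    if i_dest is not None and i_dest == j_dest:
        kinds.append("WAW")   # both write the same name (output dependence)
    return kinds


print(classify(("r1", ("r2", "r3")), ("r4", ("r1", "r3"))))  # add r1 / sub r4,r1: ['RAW']
print(classify(("r4", ("r1", "r3")), ("r1", ("r2", "r3"))))  # sub r4,r1 / add r1: ['WAR']
print(classify(("r1", ("r4", "r3")), ("r1", ("r2", "r3"))))  # sub r1 / add r1: ['WAW']
```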
Forwarding to Avoid Data Hazard
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
[Figure: the same five instructions versus time (clock cycles); the add result is forwarded from the pipeline registers directly to the ALU inputs of the following instructions, so none of them has to stall.]
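The effect of forwarding can be quantified with a small timing model. This is a sketch under stated assumptions: the register file is written in the first half of a cycle and read in the second half, an ALU result can be forwarded to the very next EX stage, and a loaded value (`lw`) exists only after DMem. The helper names are hypothetical.

```python
def ex_cycles(program, forwarding):
    """EX-stage cycle of each (opcode, dest, sources) instruction in a 5-stage pipeline.

    The first instruction is in IF in cycle 0, ID in cycle 1 and EX in cycle 2.
    """
    producer = {}  # register -> (EX cycle of its latest writer, writer is a load)
    result = []
    prev_ex = 1
    for op, dest, srcs in program:
        ex = prev_ex + 1  # in order: at least one cycle after the previous EX
        for reg in srcs:
            if reg not in producer:
                continue
            p_ex, is_load = producer[reg]
            if not forwarding:
                ex = max(ex, p_ex + 3)  # read in ID (ex - 1) no earlier than WB (p_ex + 2)
            elif is_load:
                ex = max(ex, p_ex + 2)  # loaded value exists only after DMem (p_ex + 1)
            else:
                ex = max(ex, p_ex + 1)  # ALU result forwarded to the very next EX
        result.append(ex)
        prev_ex = ex
        if dest is not None:
            producer[dest] = (ex, op == "lw")
    return result


def stall_cycles(program, forwarding):
    """Bubbles inserted compared with an ideal, hazard-free pipeline."""
    return ex_cycles(program, forwarding)[-1] - (len(program) + 1)


program = [("add", "r1", ("r2", "r3")), ("sub", "r4", ("r1", "r3")),
           ("and", "r6", ("r1", "r7")), ("or", "r8", ("r1", "r9")),
           ("xor", "r10", ("r1", "r11"))]
print(stall_cycles(program, forwarding=False), stall_cycles(program, forwarding=True))
```

Without forwarding this sequence needs two bubbles before sub; with forwarding it needs none.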
Data Hazard Even with Forwarding
lw r1,0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9
[Figure: the four instructions versus time (clock cycles); the loaded value is available only at the end of lw's DMem stage, too late to forward to sub's ALU stage in the same cycle.]
Resolving this load hazard
Adding hardware? ... not sufficient by itself (forwarding cannot send a value back in time)
Detection?
Compilation techniques?
What is the cost of load delays?
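One compilation technique is to schedule independent instructions between a load and its first use. A sketch assuming full forwarding, so the only stall is a load immediately followed by a user of its result. The two programs are hypothetical codings of a = b + c; d = e - f with variables in memory and registers named after them.

```python
def load_use_stalls(program):
    """Count one-cycle load-use stalls in a 5-stage pipeline with forwarding."""
    stalls = 0
    for (op, dest, _), (_, _, sources) in zip(program, program[1:]):
        if op == "lw" and dest in sources:  # next instruction needs the loaded value
            stalls += 1
    return stalls


# Hypothetical code for a = b + c; d = e - f, as (opcode, dest, sources).
slow = [("lw", "rb", ()), ("lw", "rc", ()), ("add", "ra", ("rb", "rc")),
        ("sw", None, ("ra",)), ("lw", "re", ()), ("lw", "rf", ()),
        ("sub", "rd", ("re", "rf")), ("sw", None, ("rd",))]
# Same instructions, reordered so no load is directly followed by its user.
fast = [("lw", "rb", ()), ("lw", "rc", ()), ("lw", "re", ()),
        ("add", "ra", ("rb", "rc")), ("lw", "rf", ()), ("sw", None, ("ra",)),
        ("sub", "rd", ("re", "rf")), ("sw", None, ("rd",))]
print(load_use_stalls(slow), load_use_stalls(fast))
```

Reordering removes both stalls without changing the result, since every instruction still follows the instructions it depends on.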
Resolving the Load Data Hazard
lw r1,0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9
[Figure: the four instructions versus time (clock cycles); a bubble is inserted after lw, delaying sub, and and or by one cycle so the loaded value can be forwarded to sub's ALU stage.]
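The detection side of this interlock can be sketched as a check between two pipeline registers, modeled on the commonly taught load-use condition: stall if the instruction in EX is a load whose destination matches either source of the instruction in ID. The `IdEx`/`IfId` classes and their fields are hypothetical stand-ins for the ID/EX and IF/ID registers.

```python
from typing import NamedTuple, Optional


class IdEx(NamedTuple):
    """Hypothetical view of the ID/EX register: the instruction entering EX."""
    mem_read: bool          # True for a load
    rt: Optional[str]       # the load's destination register


class IfId(NamedTuple):
    """Hypothetical view of the IF/ID register: the instruction in decode."""
    rs: Optional[str]
    rt: Optional[str]


def load_use_stall(id_ex, if_id):
    """True if the instruction in ID must wait one cycle (insert a bubble)."""
    return id_ex.mem_read and id_ex.rt is not None and id_ex.rt in (if_id.rs, if_id.rt)


# lw r1,0(r2) is in EX while sub r4,r1,r6 is in ID: one bubble is needed.
print(load_use_stall(IdEx(True, "r1"), IfId("r1", "r6")))
# One cycle later a bubble occupies ID/EX, so sub proceeds and r1 is forwarded.
print(load_use_stall(IdEx(False, None), IfId("r1", "r6")))
```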
Control Hazard on Branches => Three Stage Stall
10: beq r1,r3,36
14: and r2,r3,r5
18: or r6,r1,r7
22: add r8,r1,r9
36: xor r10,r1,r11
[Figure: pipeline diagram of these instructions versus time; the and, or and add that follow beq enter the pipeline before the branch outcome is known.]
Example: Branch Stall Impact
Two-part solution:
Determine whether the branch is taken or not sooner, AND
Compute the taken branch address earlier
MIPS branch tests if register = 0 or ≠ 0
MIPS Solution:
Move Zero test to ID/RF stage
Adder to calculate new PC in ID/RF stage
1 clock cycle penalty for branch versus 3
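The impact of the 3-cycle versus 1-cycle penalty follows from CPI = base CPI + branch frequency × branch penalty. A worked sketch, assuming a base CPI of 1 and a hypothetical branch frequency of 20%; exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def cpi_with_branches(base_cpi, branch_fraction, branch_penalty):
    """Pipeline CPI when every branch costs `branch_penalty` stall cycles."""
    return base_cpi + branch_fraction * branch_penalty


BRANCH_FRACTION = Fraction(1, 5)  # hypothetical: 20% of instructions are branches

slow = cpi_with_branches(1, BRANCH_FRACTION, 3)  # three-stage stall
fast = cpi_with_branches(1, BRANCH_FRACTION, 1)  # zero test and adder moved to ID/RF
print(slow, fast, slow / fast)
```

Under these assumptions CPI drops from 8/5 to 6/5, a speedup of 4/3.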
Pipelined MIPS Datapath
[Figure: five stages (Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back) separated by the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers. Components: PC adder (+4), instruction Memory, RegFile, Sign Extend, Zero? test, ALU, Data Memory, and MUXes selecting the Next PC and the write-back data.]
Overcoming Branch Hazards: Alternatives
#1: Stall until branch direction is clear
#2: Predict Branch Not Taken
#3: Predict Branch Taken
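The three alternatives can be compared by their average stall cycles per branch. A sketch under a simplified, assumed model: a wrong guess costs `penalty` cycles and a right guess costs nothing. Predict-taken only saves cycles if the target address is known before the outcome; in the 5-stage MIPS with both the Zero test and the target adder in ID/RF, the two arrive together, so predict-taken gives no advantage there. The 60% taken rate is hypothetical.

```python
from fractions import Fraction


def expected_branch_penalty(scheme, taken_fraction, penalty):
    """Average stall cycles per branch under the simple model described above."""
    if scheme == "stall":
        return penalty                          # always wait for the outcome
    if scheme == "predict_not_taken":
        return taken_fraction * penalty         # wrong whenever the branch is taken
    if scheme == "predict_taken":
        return (1 - taken_fraction) * penalty   # wrong whenever it falls through
    raise ValueError(f"unknown scheme: {scheme}")


for s in ("stall", "predict_not_taken", "predict_taken"):
    print(s, expected_branch_penalty(s, Fraction(3, 5), 1))
```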
Extending the Pipeline to Handle Multicycle Operations
Floating-point numbers have two parts: exponent and significand
We should deal with the exponent and the significand separately
Example: 3.25 × 10^3 + 2.63 × 10^-1
  3.25     × 10^3
+ 0.000263 × 10^3   (shift the smaller operand right until the exponents match)
= 3.250263 × 10^3
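The alignment step can be sketched in base 10 with Python's `decimal` module: swap so the first operand has the larger exponent, shift the other significand right by the exponent difference, add, then renormalize. Real FP hardware works in base 2 with a fixed significand width and would drop shifted-out bits; this sketch keeps them all. `fp_add` is a hypothetical name.

```python
from decimal import Decimal


def fp_add(sig_a, exp_a, sig_b, exp_b):
    """Add sig_a × 10**exp_a and sig_b × 10**exp_b by aligning exponents.

    Returns (aligned significand of the smaller operand, sum significand, exponent).
    """
    if exp_a < exp_b:  # make operand a the one with the larger exponent
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    # Shift the smaller operand's significand right until the exponents match.
    aligned_b = sig_b.scaleb(exp_b - exp_a)
    total, exp = sig_a + aligned_b, exp_a
    # Renormalize so that 1 <= |significand| < 10.
    while abs(total) >= 10:
        total, exp = total.scaleb(-1), exp + 1
    while total != 0 and abs(total) < 1:
        total, exp = total.scaleb(1), exp - 1
    return aligned_b, total, exp


print(fp_add(Decimal("3.25"), 3, Decimal("2.63"), -1))
```

For the slide's example, 2.63 is shifted to 0.000263 and the sum is 3.250263 with exponent 3.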
Cont.
So an algorithm must be implemented to perform the operation.
The functional unit must be redesigned to perform all these operations, and this type of functional unit requires a longer pipeline (more cycles to complete).
Latency of a functional unit: the number of intervening cycles between an instruction that produces a result and an instruction that uses the result.
Initiation interval: the number of cycles that must elapse between issuing two operations of a given type.
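Both definitions translate directly into issue constraints: a consumer may issue `latency + 1` cycles after its producer (leaving `latency` intervening cycles), and two operations on the same unit must issue at least `interval` cycles apart. A sketch assuming in-order, single issue; the `UNITS` latency and interval values are illustrative assumptions, and WAW and write-port conflicts are ignored.

```python
# Assumed (latency, initiation interval) per functional unit; illustrative only.
UNITS = {"int": (0, 1), "fpadd": (3, 1), "fpmul": (6, 1), "fpdiv": (24, 25)}


def issue_cycles(program):
    """Issue cycle of each (unit, dest, sources) instruction, in order, one per cycle."""
    ready = {}       # register -> first cycle a consumer may issue
    last_issue = {}  # unit -> cycle of its most recent issue
    cycles = []
    earliest = 0
    for unit, dest, sources in program:
        latency, interval = UNITS[unit]
        t = earliest
        for reg in sources:      # RAW: leave `latency` intervening cycles
            t = max(t, ready.get(reg, 0))
        if unit in last_issue:   # same unit: respect its initiation interval
            t = max(t, last_issue[unit] + interval)
        cycles.append(t)
        last_issue[unit] = t
        ready[dest] = t + latency + 1
        earliest = t + 1
    return cycles


program = [("fpmul", "f0", ("f2", "f4")), ("fpadd", "f6", ("f0", "f8")),
           ("fpdiv", "f10", ("f2", "f4")), ("fpdiv", "f12", ("f2", "f4"))]
print(issue_cycles(program))
```

The add waits for the multiply's latency, and the second divide waits for the divider's initiation interval even though its operands are ready.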
Cont.
Latencies and initiation intervals for FU
Pipeline support for FP operations
FP example
Cont.
Cont.
Assuming that the pipeline does all hazard detection in ID, there are three checks that must be performed before an instruction can issue:
Check for structural hazards: wait until the required functional unit is available.
Check for a RAW data hazard: wait until the source registers are not listed as pending destinations in a pipeline register that will not be available when this instruction needs the result.
Check for a WAW data hazard: determine if any instruction in A1, ..., A4, D, M1, ..., M7 has the same register destination as this instruction.
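The three checks can be sketched as one function called in ID. Simplifying assumption: every destination still in A1..A4, D or M1..M7 counts as unavailable for the RAW check, ignoring exactly when each result will be ready. `Instr` and `issue_hazard` are hypothetical names.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    unit: str    # functional unit it needs, e.g. "fpmul"
    dest: str
    srcs: tuple


def issue_hazard(instr, busy_units, in_flight):
    """Return the hazard blocking issue from ID, or None if instr may issue.

    busy_units: units that cannot accept a new operation this cycle.
    in_flight:  instructions still in A1..A4, D or M1..M7 (results not yet written).
    """
    if instr.unit in busy_units:
        return "structural"
    pending = {i.dest for i in in_flight}
    if any(reg in pending for reg in instr.srcs):
        return "RAW"
    if instr.dest in pending:
        return "WAW"
    return None


mul = Instr("fpmul", "F0", ("F2", "F4"))
print(issue_hazard(Instr("fpadd", "F6", ("F0", "F8")), set(), [mul]))   # RAW
print(issue_hazard(Instr("fpadd", "F0", ("F8", "F10")), set(), [mul]))  # WAW
```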
Instruction Level Parallelism
Pipelining can overlap the execution of instructions when they are independent of one another. This potential overlap among instructions is called instruction-level parallelism (ILP), since the instructions can be evaluated in parallel.
Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can be performed simultaneously. Consider the following program:
1. e = a + b
2. f = c + d
3. g = e * f
Instruction Level Parallelism
Operation 3 depends on the results of operations 1 and 2
3 cannot be calculated until both of them are completed
Operations 1 and 2 do not depend on any other operation, so they can be calculated simultaneously.
A goal of compiler and processor designers is to identify and take advantage of as much ILP as possible. Ordinary programs are typically written under a sequential execution model, where instructions execute one after the other and in the order specified by the programmer.
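The reasoning about operations 1 to 3 can be automated by giving each operation the earliest step at which all of its inputs exist. A sketch assuming every operation takes one step and any number of independent operations may run together; ILP is then measured here as operations divided by steps. `dataflow_levels` is a hypothetical helper.

```python
def dataflow_levels(program):
    """Earliest step of each (dest, sources) operation under unlimited resources."""
    level = {}
    for dest, sources in program:
        # Start one step after the latest producer of any input (step 0 if none).
        level[dest] = max((level[s] + 1 for s in sources if s in level), default=0)
    return level


program = [("e", ("a", "b")), ("f", ("c", "d")), ("g", ("e", "f"))]
levels = dataflow_levels(program)
steps = max(levels.values()) + 1
print(levels, len(program) / steps)
```

Operations 1 and 2 share step 0 and operation 3 runs in step 1: three operations in two steps, an ILP of 1.5 under these assumptions.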
Instruction Level Parallelism
The simplest and most common way to increase the amount of parallelism available among instructions is to exploit parallelism among iterations of a loop. This type of parallelism is often called loop-level parallelism.
Example 1: for (i=1; i
Instruction Level Parallelism
for (i=1; i
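One way to see loop-level parallelism is that iterations with no loop-carried dependence can run in any order, or at the same time, and still give the same result. A sketch with two hypothetical loop bodies (not the slide's truncated examples): one touches only element i, the other reads the element written by the previous iteration.

```python
def independent_body(x, y, i):
    x[i] = x[i] + y[i]        # touches only element i: no loop-carried dependence


def carried_body(x, y, i):
    x[i] = x[i - 1] + y[i]    # reads the value written by iteration i - 1


def run(body, order, n):
    x, y = list(range(n)), [10] * n
    for i in order:
        body(x, y, i)
    return x


n = 8
forward, backward = range(1, n), range(n - 1, 0, -1)
print(run(independent_body, forward, n) == run(independent_body, backward, n))
print(run(carried_body, forward, n) == run(carried_body, backward, n))
```

Reversing the iteration order leaves the first loop's result unchanged but alters the second, which is why only the first can be overlapped freely.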
Ideas To Reduce Stalls
Technique                                   Reduces
Dynamic scheduling                          Data hazard stalls
Dynamic branch prediction                   Control stalls
Issuing multiple instructions per cycle     Ideal CPI
Speculation                                 Data and control stalls
Dynamic memory disambiguation               Data hazard stalls involving memory
Loop unrolling                              Control hazard stalls
Basic compiler pipeline scheduling          Data hazard stalls
Compiler dependence analysis                Ideal CPI and data hazard stalls
Software pipelining and trace scheduling    Ideal CPI and data hazard stalls
Compiler speculation                        Ideal CPI, data and control stalls
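Loop unrolling reduces control hazard stalls by executing fewer loop-back branches for the same work. A sketch that counts one branch per loop iteration (a simplifying assumption) for a rolled loop and a loop unrolled four times with a cleanup loop; both function names are hypothetical.

```python
def sum_rolled(a):
    """Sum with one loop-back branch per element."""
    total, branches, i = 0, 0, 0
    while i < len(a):
        total += a[i]
        i += 1
        branches += 1
    return total, branches


def sum_unrolled4(a):
    """Same sum, loop body unrolled four times plus a cleanup loop."""
    total, branches, i, n = 0, 0, 0, len(a)
    while i + 4 <= n:   # main loop: one branch per four elements
        total += a[i] + a[i + 1] + a[i + 2] + a[i + 3]
        i += 4
        branches += 1
    while i < n:        # cleanup loop for the last n % 4 elements
        total += a[i]
        i += 1
        branches += 1
    return total, branches


data = list(range(10))
print(sum_rolled(data), sum_unrolled4(data))
```

For ten elements both versions compute 45, but the unrolled one executes 4 branches instead of 10.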
Instruction Level Parallelism
Data Dependence and Hazards
InstrJ is data dependent on InstrI: InstrJ tries to read an operand before InstrI writes it
I: add r1,r2,r3
J: sub r4,r1,r3
or InstrJ is data dependent on InstrK, which is dependent on InstrI
Caused by a true dependence (compiler term)
If a true dependence causes a hazard in the pipeline, it is called a Read After Write (RAW) hazard
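Because data dependence is transitive (J depends on K, which depends on I), a dependence set can be built by following the latest writer of each source register. A sketch over hypothetical `(dest, sources)` tuples; the third instruction in the example, `or r5,r4,r6`, is a hypothetical addition that forms the chain.

```python
def data_dependences(program):
    """Pairs (i, j) where instruction j is data dependent on instruction i,
    directly or through a chain of other instructions.

    program: list of (dest, sources) in program order.
    """
    deps = set()
    last_writer = {}
    for j, (dest, sources) in enumerate(program):
        for reg in sources:
            if reg in last_writer:
                i = last_writer[reg]
                deps.add((i, j))
                # Chain rule: j also depends on everything i depends on.
                deps |= {(k, j) for (k, m) in deps if m == i}
        if dest is not None:
            last_writer[dest] = j
    return deps


# add r1,r2,r3 ; sub r4,r1,r3 ; or r5,r4,r6 (the last one is hypothetical)
program = [("r1", ("r2", "r3")), ("r4", ("r1", "r3")), ("r5", ("r4", "r6"))]
print(sorted(data_dependences(program)))
```

The or depends on sub directly and on add through sub.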
Instruction Level Parallelism
Data Dependence and Hazards
Dependences are a property of programs
The presence of a dependence indicates the potential for a hazard, but the actual hazard and the length of any stall are properties of the pipeline
Importance of data dependences:
1) indicates the possibility of a hazard
2) determines the order in which results must be calculated
Today: looking at HW schemes to avoid hazards
Instruction Level Parallelism
Name Dependence #1: Anti-dependence
Name dependence: when 2 instructions use the same register or memory location, called a name, but there is no flow of data between the instructions associated with that name; there are 2 versions of name dependence
InstrJ writes an operand before InstrI reads it
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7
Called an anti-dependence by compiler writers. This results from reuse of the name r1.
If an anti-dependence causes a hazard in the pipeline, it is called a Write After Read (WAR) hazard
Instruction Level Parallelism
Name Dependence #2: Output dependence
InstrJ writes an operand before InstrI writes it.
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
Called an output dependence by compiler writers. This also results from the reuse of the name r1.
If an output dependence causes a hazard in the pipeline, it is called a Write After Write (WAW) hazard
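Since a name dependence carries no data, giving each result a fresh name removes it while keeping true dependences. The sketch below uses register renaming, a standard technique that these slides do not cover, purely to illustrate that point; the fresh names `p0`, `p1`, ... and the `rename` helper are hypothetical.

```python
def rename(program):
    """Give every destination a fresh name (p0, p1, ...), reading sources
    through the current mapping so that true dependences are kept."""
    mapping = {}   # architectural register -> latest fresh name
    renamed = []
    for n, (op, dest, sources) in enumerate(program):
        new_sources = tuple(mapping.get(r, r) for r in sources)  # read before remapping
        mapping[dest] = f"p{n}"
        renamed.append((op, f"p{n}", new_sources))
    return renamed


# Output-dependence example from the slide: I and J both write r1.
program = [("sub", "r1", ("r4", "r3")),
           ("add", "r1", ("r2", "r3")),
           ("mul", "r6", ("r1", "r7"))]
for instr in rename(program):
    print(instr)
```

After renaming, I and J write different names (no WAW), while K still reads J's result.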
Instruction Level Parallelism
Control Dependencies
Every instruction is control dependent on some set of branches, and, in general, these control dependencies must be preserved to preserve program order
if p1 {
  S1;
};
if p2 {
  S2;
}
S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1.
Out-Of-Order Execution
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F12,F8,F14
Enables out-of-order execution => out-of-order completion
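A timing sketch of why SUBD need not wait: in order, SUBD cannot start before the stalled ADDD; out of order, it starts as soon as its operands F8 and F14 are ready. Assumptions: one instruction decoded per cycle, enough functional units that only operands and ordering matter, and the `LATENCY` values, which are illustrative only.

```python
# Assumed execution latencies in cycles; illustrative values only.
LATENCY = {"DIVD": 24, "ADDD": 4, "SUBD": 4}


def finish_cycles(program, out_of_order):
    """Cycle in which each (opcode, dest, sources) instruction finishes."""
    ready = {}       # register -> cycle its new value becomes available
    finished = []
    prev_start = -1
    for i, (op, dest, sources) in enumerate(program):
        operands_ready = max((ready.get(r, 0) for r in sources), default=0)
        if out_of_order:
            start = max(i, operands_ready)               # decoded in cycle i, start when ready
        else:
            start = max(prev_start + 1, operands_ready)  # never start before an older instruction
        prev_start = start
        ready[dest] = start + LATENCY[op]
        finished.append((op, ready[dest]))
    return finished


program = [("DIVD", "F0", ("F2", "F4")),
           ("ADDD", "F10", ("F0", "F8")),
           ("SUBD", "F12", ("F8", "F14"))]
print(finish_cycles(program, out_of_order=False))
print(finish_cycles(program, out_of_order=True))
```

Out of order, SUBD finishes long before DIVD and ADDD: out-of-order execution leads to out-of-order completion.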
THANK YOU