1.Pipelining & ILP

  • 7/28/2019 1.Pipelining & ILP

    CS252/Culler Lec 2.1 1/24/02

    BY N R REJIN PAUL

    LECTURER, CSE DEPT

    CS2354 - Advanced Computer Architecture

    Pipelining

  • Review: Visualizing Pipelining

    [Figure: four instructions in program order (vertical axis, "Instr. Order") against time in clock cycles 1 to 7 (horizontal axis). Each instruction passes through Ifetch, Reg, ALU, DMem, Reg and starts one cycle after the previous one.]
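    A small sketch can reproduce the diagram as text. It assumes an ideal pipeline with no hazards (one instruction enters per cycle) and uses the textbook stage names IF/ID/EX/MEM/WB for the slide's Ifetch/Reg/ALU/DMem/Reg; the function name pipeline_chart is hypothetical.

```python
# Assumption: ideal 5-stage pipeline with no hazards. IF, ID, EX, MEM, WB
# correspond to the slide's Ifetch, Reg, ALU, DMem, Reg.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(n_instructions):
    """Return one row per instruction giving the stage it occupies in each cycle.

    Instruction i (counting from 0) is in stage s during cycle i + s + 1,
    because one new instruction enters IF every cycle.
    """
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = ["."] * total_cycles  # "." means the instruction is not in the pipe
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # list index 0 corresponds to cycle 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for i, row in enumerate(pipeline_chart(4), start=1):
        print(f"Instr {i}: " + " ".join(f"{c:>3}" for c in row))
```

    With n instructions the last one finishes in cycle n + 4: filling and draining the pipeline adds four cycles in total, after which one instruction completes per cycle.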

  • Limits to pipelining

    Hazards: circumstances that would cause incorrect execution if the next instruction were launched.

    Structural hazards: attempting to use the same hardware to do two different things at the same time.

    Data hazards: an instruction depends on the result of a prior instruction still in the pipeline.

    Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps).

  • Example: One Memory Port/Structural Hazard

    [Figure: Load, Instr 1, Instr 2, Instr 3 and Instr 4 in program order against time in clock cycles 1 to 7. With a single memory port, the Load's DMem access in cycle 4 coincides with Instr 3's Ifetch, which needs the same memory: a structural hazard.]

  • Resolving structural hazards

    Definition: an attempt to use the same hardware for two different things at the same time.

    Solution 1: Wait. The hardware must detect the hazard and must have a mechanism to stall.

    Solution 2: Throw more hardware at the problem.

  • Detecting and Resolving Structural Hazard

    [Figure: Load, Instr 1, Instr 2, Stall, Instr 3 in program order against time in clock cycles 1 to 7. The stall inserts a bubble that travels down the pipeline, so Instr 3's Ifetch moves one cycle later and no longer collides with the Load's DMem access.]
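    The stall on this slide can be modelled with a short simulation. It is a sketch under simplifying assumptions: one memory port shared by instruction fetch and data access, MEM three cycles after IF, and no data or control hazards. The helper fetch_cycles is hypothetical, not part of the lecture.

```python
def fetch_cycles(uses_data_memory):
    """Return the cycle (numbered from 1) in which each instruction is fetched.

    uses_data_memory[k] is True when instruction k is a load or store.
    Assumption: one memory port serves both IF (stage 1) and MEM (stage 4),
    so a fetch cannot happen in a cycle when an earlier instruction is in MEM.
    """
    port_busy = set()  # cycles in which the port is taken by a data access
    fetched = []
    cycle = 1
    for needs_memory in uses_data_memory:
        while cycle in port_busy:  # structural hazard: stall the fetch
            cycle += 1             # each stalled cycle becomes a bubble
        fetched.append(cycle)
        if needs_memory:
            port_busy.add(cycle + 3)  # MEM happens 3 cycles after IF
        cycle += 1
    return fetched


# Load, Instr 1, Instr 2, Instr 3 from the slide: Instr 3 slips from cycle 4 to 5.
print(fetch_cycles([True, False, False, False]))  # [1, 2, 3, 5]
```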

  • Data Hazards

    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or r8,r1,r9
    xor r10,r1,r11

    [Figure: the five instructions above in program order against time in clock cycles, with pipeline stages IF, ID/RF, EX, MEM, WB. Every instruction after the add reads r1, which the add does not write back until its WB stage.]

  • Three Generic Data Hazards

    Read After Write (RAW): InstrJ tries to read an operand before InstrI writes it.

    I: add r1,r2,r3
    J: sub r4,r1,r3

    Caused by a data dependence (in compiler nomenclature). This hazard results from an actual need for communication.

  • Three Generic Data Hazards

    Write After Read (WAR): InstrJ writes an operand before InstrI reads it.

    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7

    Called an anti-dependence by compiler writers. This results from reuse of the name r1.

    Can't happen in the MIPS 5-stage pipeline because all instructions take 5 stages, reads are always in stage 2, and writes are always in stage 5.

  • Three Generic Data Hazards

    Write After Write (WAW): InstrJ writes an operand before InstrI writes it.

    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7

    Called an output dependence by compiler writers. This also results from the reuse of the name r1.

    Can't happen in the MIPS 5-stage pipeline because all instructions take 5 stages and writes are always in stage 5.

    We will see WAR and WAW hazards in later, more complicated pipelines.
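    The three definitions can be checked mechanically for a pair of instructions. This sketch represents an instruction only by its destination and source registers; the Instr type and the classify function are hypothetical. It reports dependences; whether each one actually becomes a hazard still depends on the pipeline, as the slides note.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str              # register the instruction writes
    srcs: Tuple[str, ...]  # registers the instruction reads


def classify(i: Instr, j: Instr) -> set:
    """Name the possible hazards between an earlier instruction i and a later j."""
    found = set()
    if i.dest in j.srcs:
        found.add("RAW")  # j reads what i writes: true data dependence
    if j.dest in i.srcs:
        found.add("WAR")  # j writes what i reads: anti-dependence
    if j.dest == i.dest:
        found.add("WAW")  # both write the same name: output dependence
    return found


# The I/J pairs used on the three slides.
print(classify(Instr("r1", ("r2", "r3")), Instr("r4", ("r1", "r3"))))  # {'RAW'}
print(classify(Instr("r4", ("r1", "r3")), Instr("r1", ("r2", "r3"))))  # {'WAR'}
print(classify(Instr("r1", ("r4", "r3")), Instr("r1", ("r2", "r3"))))  # {'WAW'}
```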

  • Forwarding to Avoid Data Hazard

    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or r8,r1,r9
    xor r10,r1,r11

    [Figure: the same five instructions against time in clock cycles. The add's result is forwarded from later pipeline stages directly to the ALU inputs of the following instructions, so none of them has to wait for the add's register write-back.]
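    The forwarding paths can be described as a selection rule for each ALU operand. The sketch assumes, as in the textbook MIPS design, two forwarding sources (the EX/MEM and MEM/WB pipeline registers), a register file written in the first half of a cycle and read in the second, and r0 hardwired to zero; forward_source and its parameters are hypothetical names.

```python
def forward_source(src, ex_mem_dest, ex_mem_writes, mem_wb_dest, mem_wb_writes):
    """Choose where the ALU gets source register `src` (an int, 0..31) from.

    ex_mem_* describe the instruction one ahead (it has just left EX);
    mem_wb_* describe the instruction two ahead (it has just left MEM).
    """
    if src == 0:
        return "REGFILE"  # r0 is always zero; never forward it
    if ex_mem_writes and ex_mem_dest == src:
        return "EX/MEM"   # the most recent value wins
    if mem_wb_writes and mem_wb_dest == src:
        return "MEM/WB"
    return "REGFILE"      # older results are already in the register file


# While sub r4,r1,r3 is in EX, add r1,r2,r3 sits in EX/MEM.
print(forward_source(1, ex_mem_dest=1, ex_mem_writes=True,
                     mem_wb_dest=0, mem_wb_writes=False))  # EX/MEM
# One cycle later, and r6,r1,r7 is in EX, the add is in MEM/WB
# and sub (which writes r4) is in EX/MEM.
print(forward_source(1, ex_mem_dest=4, ex_mem_writes=True,
                     mem_wb_dest=1, mem_wb_writes=True))   # MEM/WB
```

    Checking EX/MEM before MEM/WB matters when two instructions in flight both write the same register: the operand must come from the younger one.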

  • Data Hazard Even with Forwarding

    lw r1, 0(r2)
    sub r4,r1,r6
    and r6,r1,r7
    or r8,r1,r9

    [Figure: the four instructions against time in clock cycles. The loaded value is available only at the end of the lw's DMem stage, which is too late to forward to the sub's ALU stage in the same cycle.]

  • Resolving this load hazard

    Adding hardware? ... not

    Detection?

    Compilation techniques?

    What is the cost of load delays?

  • Resolving the Load Data Hazard

    lw r1, 0(r2)
    sub r4,r1,r6
    and r6,r1,r7
    or r8,r1,r9

    [Figure: the same instructions against time in clock cycles. A one-cycle bubble delays sub, and every instruction behind it, by one cycle so the loaded value can be forwarded from DMem to the sub's ALU stage.]
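    The cost of these hazards can be estimated from stage timing alone. This sketch assumes the classic 5-stage pipeline, a register file written in the first half of WB and read in the second half of ID, and full forwarding to the ALU inputs when forwarding is enabled; stall_cycles is a hypothetical helper. It reproduces the slides' results: no stall for back-to-back ALU instructions with forwarding, but one bubble when an instruction uses a loaded value immediately.

```python
def stall_cycles(distance, producer, forwarding):
    """Stall cycles a dependent instruction needs in the classic 5-stage pipeline.

    distance:   1 if the consumer immediately follows the producer, 2 if one
                instruction sits between them, and so on.
    producer:   "alu" (result computed in EX) or "load" (result known after MEM).
    forwarding: whether results can bypass the register file to the ALU.
    Cycles are counted from the producer's IF (cycle 1): ID=2, EX=3, MEM=4, WB=5.
    """
    consumer_id = 2 + distance  # cycle in which the consumer reads registers
    consumer_ex = 3 + distance  # cycle in which the consumer's ALU needs them
    if not forwarding:
        # Assumed split-cycle register file: the value is written in the first
        # half of WB (cycle 5), so the consumer's ID may coincide with it.
        return max(0, 5 - consumer_id)
    ready_end_of = 3 if producer == "alu" else 4  # end of EX or end of MEM
    return max(0, ready_end_of + 1 - consumer_ex)


print(stall_cycles(1, "alu", forwarding=False))  # 2
print(stall_cycles(1, "alu", forwarding=True))   # 0
print(stall_cycles(1, "load", forwarding=True))  # 1: the load-use bubble
```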

  • Control Hazard on Branches => Three Stage Stall

    10: beq r1,r3,36
    14: and r2,r3,r5
    18: or r6,r1,r7
    22: add r8,r1,r9
    36: xor r10,r1,r11

    [Figure: the instructions against time in clock cycles. The branch outcome and target are not known until late in the beq's pipeline, so the three instructions fetched after it (and, or, add) may have to be discarded before xor at 36 is fetched: a three-cycle stall.]

  • Example: Branch Stall Impact

    Two-part solution: determine whether the branch is taken or not sooner, AND compute the taken-branch address earlier.

    MIPS branches test whether a register is = 0 or ≠ 0.

    MIPS solution:
    Move the zero test to the ID/RF stage.
    Place an adder that calculates the new PC in the ID/RF stage.
    Result: a 1 clock cycle penalty for a branch versus 3.
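    The effect of cutting the penalty from 3 cycles to 1 can be estimated with the usual model CPI = ideal CPI + branch frequency × branch penalty, assuming an ideal CPI of 1 and no other stalls. The 20% branch frequency below is a hypothetical value chosen for illustration, not a figure from the lecture.

```python
def cpi_with_branches(branch_fraction, penalty_cycles, ideal_cpi=1.0):
    """Average CPI when every branch stalls the pipeline for penalty_cycles."""
    return ideal_cpi + branch_fraction * penalty_cycles


branch_fraction = 0.20  # hypothetical: 1 instruction in 5 is a branch
old = cpi_with_branches(branch_fraction, 3)  # branch resolved late: 3-cycle stall
new = cpi_with_branches(branch_fraction, 1)  # zero test and adder moved to ID
print(f"CPI {old:.2f} -> {new:.2f}, speedup {old / new:.2f}x")
# CPI 1.60 -> 1.20, speedup 1.33x
```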

  • Pipelined MIPS Datapath

    [Figure: pipelined MIPS datapath with the stages Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back, separated by the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB. Components shown: instruction memory, the +4 adder producing Next SEQ PC, register file (RS1, RS2, RD), sign extend (Imm), Zero? test, branch-target adder, ALU, data memory, and MUXes selecting the Next PC, the ALU inputs and the write-back data.]

  • Alternatives to Overcome Branch Hazards

    #1: Stall until branch direction is clear

    #2: Predict Branch Not Taken

    #3: Predict Branch Taken
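    The first two alternatives can be compared by counting lost cycles over a sequence of branch outcomes. This sketch assumes the 1-cycle penalty of the improved MIPS pipeline, that stalling charges every branch, and that predict-not-taken charges only branches that turn out taken (the fall-through instruction already fetched is squashed). Predict-taken is left out because its cost depends on when the target address is known; the outcome list and the function name are hypothetical.

```python
def lost_cycles(outcomes, policy, penalty=1):
    """Total branch-penalty cycles for a list of branch outcomes (True = taken)."""
    if policy == "stall":
        # Stall until the branch direction is clear: every branch pays.
        return penalty * len(outcomes)
    if policy == "predict_not_taken":
        # Keep fetching the fall-through path; only a taken branch wastes
        # the instruction already fetched, which is squashed.
        return penalty * sum(1 for taken in outcomes if taken)
    raise ValueError(f"unknown policy: {policy}")


outcomes = [True, False, False, True, False]  # hypothetical branch history
print(lost_cycles(outcomes, "stall"))              # 5
print(lost_cycles(outcomes, "predict_not_taken"))  # 2
```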

  • Extending the Pipeline to Handle Multicycle Operations

    Floating-point numbers have two parts: an exponent and a significand.

    The exponent and the significand must be handled separately.

    Example: 3.25 × 10^3 + 2.63 × 10^-1

    3.25 × 10^3
    0.000263 × 10^3   (shift the smaller number's significand right until the exponents match)
    3.250263 × 10^3
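    The alignment step in this example can be reproduced with exact decimal arithmetic. The sketch uses Python's decimal module so no binary rounding creeps in; it only aligns and adds significands, leaving out normalization and rounding, and fp_add_aligned is a hypothetical name.

```python
from decimal import Decimal


def fp_add_aligned(sig_a, exp_a, sig_b, exp_b):
    """Add sig_a x 10**exp_a and sig_b x 10**exp_b by aligning exponents first.

    Returns (significand, exponent) of the sum, expressed with the larger of
    the two exponents. The result is not normalized or rounded.
    """
    a, b = Decimal(sig_a), Decimal(sig_b)
    if exp_a < exp_b:  # make operand a the one with the larger exponent
        a, exp_a, b, exp_b = b, exp_b, a, exp_a
    shift = exp_a - exp_b
    b_aligned = b.scaleb(-shift)  # shift the smaller significand right
    return a + b_aligned, exp_a


print(fp_add_aligned("3.25", 3, "2.63", -1))  # (Decimal('3.250263'), 3)
```

    Here the exponents differ by 4, so 2.63 is shifted right four decimal places to 0.000263 before the significands are added.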

  • Cont.

    So an algorithm has to be implemented in order to perform the operation.

    The functional unit must be redesigned to perform all these operations, and this type of functional unit requires a longer, multicycle pipeline.

    Latency in the functional unit: the number of intervening cycles between an instruction that produces a result and an instruction that uses the result.

    Initiation interval: the number of cycles that must elapse between issuing two operations of a given type.
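    The two definitions translate into issue-timing rules for an in-order pipeline: with initiation interval II, back-to-back operations of one type can start every II cycles, and with latency L, a dependent instruction can start L + 1 cycles after the producer, because L cycles must intervene. The example values below (II = 1 for a fully pipelined unit, II = 25 for an unpipelined one, latencies 0 and 3) are illustrative assumptions, not the lecture's table, and both function names are hypothetical.

```python
def start_cycles(n_ops, initiation_interval, first_cycle=1):
    """Cycles in which n independent operations of one type can begin."""
    return [first_cycle + k * initiation_interval for k in range(n_ops)]


def earliest_use(producer_start, latency):
    """First cycle a dependent instruction can begin.

    Latency counts the cycles that must intervene between the producer and
    the consumer, so the consumer starts latency + 1 cycles after the producer.
    """
    return producer_start + latency + 1


print(start_cycles(3, initiation_interval=1))   # [1, 2, 3]   fully pipelined unit
print(start_cycles(3, initiation_interval=25))  # [1, 26, 51] unpipelined unit
print(earliest_use(1, latency=0))               # 2: back-to-back use
print(earliest_use(1, latency=3))               # 5: three cycles intervene
```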

  • Cont.

  • Latencies and initiation intervals for FU

  • Pipeline support for FP operations

  • FP example

  • Cont.

  • Cont.

    Assuming that the pipeline does all hazard detection in ID, three checks must be performed before an instruction can issue:

    Check for structural hazards: wait until the required functional unit is available.

    Check for a RAW data hazard: wait until the source registers are not listed as pending destinations in a pipeline register that will not be available when this instruction needs the result.

    Check for a WAW data hazard: determine if any instruction in A1, ..., A4, D, M1, ..., M7 has the same register destination as this instruction.
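    The three checks can be sketched as a single function run in ID. The bookkeeping is an assumption made for illustration: each in-flight instruction records the stage it occupies (A1 to A4, M1 to M7, D), its destination register and the cycle its result can be forwarded, and the caller passes the set of units that cannot accept a new operation this cycle. The WAW test is the conservative form on the slide (stall on any matching destination); InFlight and issue_check are hypothetical names.

```python
from typing import NamedTuple, Optional, Set, Tuple


class InFlight(NamedTuple):
    stage: str        # stage it currently occupies, e.g. "A2", "M5" or "D"
    dest: str         # destination register it will write
    ready_cycle: int  # first cycle its result can be forwarded (assumed known)


def issue_check(unit: str, srcs: Tuple[str, ...], dest: str, operand_cycle: int,
                busy_units: Set[str], in_flight: Tuple[InFlight, ...]) -> Optional[str]:
    """Return None if the instruction in ID may issue, else the hazard that stalls it.

    operand_cycle is the cycle in which this instruction will need its sources.
    """
    if unit in busy_units:
        return "structural"  # required functional unit is not available
    for op in in_flight:
        if op.dest in srcs and op.ready_cycle > operand_cycle:
            return "RAW"     # a source is still pending and will arrive too late
    for op in in_flight:
        if op.dest == dest:
            return "WAW"     # an earlier instruction will write the same register
    return None


# Hypothetical state: a multiply writing f0 is in M3; its result is ready in cycle 10.
pipe = (InFlight("M3", "f0", 10),)
print(issue_check("FP add", ("f0", "f2"), "f4", 6, set(), pipe))       # RAW
print(issue_check("FP add", ("f2", "f6"), "f0", 6, set(), pipe))       # WAW
print(issue_check("divide", ("f2", "f6"), "f8", 6, {"divide"}, pipe))  # structural
print(issue_check("FP add", ("f2", "f6"), "f8", 6, set(), pipe))       # None
```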

  • Instruction Level Parallelism

    Pipelining can overlap the execution of instructions when they are independent of one another. This potential overlap among instructions is called instruction-level parallelism (ILP), since the instructions can be evaluated in parallel.

    Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can be performed simultaneously. Consider the following program:

    1. e = a + b
    2. f = c + d
    3. g = e * f

  • Instruction Level Parallelism

    Operation 3 depends on the results of operations 1 and 2, so it cannot be calculated until both of them are completed. Operations 1 and 2 do not depend on any other operation, so they can be calculated simultaneously.

    A goal of compiler and processor designers is to identify and take advantage of as much ILP as possible. Ordinary programs are typically written under a sequential execution model, where instructions execute one after the other and in the order specified by the programmer.
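    The reasoning above can be made quantitative by giving each operation a dependence level and dividing the operation count by the length of the longest chain. This sketch assumes every operation takes one time step, hardware is unlimited and each name is assigned only once; the ilp helper is hypothetical. For the three-operation program it gives 3 operations in 2 steps, an ILP of 1.5.

```python
def ilp(ops):
    """ops: (destination, sources) pairs in program order, one time step each.

    Assumes each destination name is assigned only once.
    Returns (operation count, critical-path length, ILP = count / length).
    """
    step = {}  # name -> time step in which its value is produced
    for dest, srcs in ops:
        # An operation runs one step after the latest operation it depends on;
        # names never produced in the program (inputs) count as step 0.
        step[dest] = 1 + max((step.get(s, 0) for s in srcs), default=0)
    depth = max(step.values())
    return len(ops), depth, len(ops) / depth


program = [("e", ("a", "b")),  # 1. e = a + b
           ("f", ("c", "d")),  # 2. f = c + d
           ("g", ("e", "f"))]  # 3. g = e * f
print(ilp(program))  # (3, 2, 1.5)
```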

  • Instruction Level Parallelism

    The simplest and most common way to increase the amount of parallelism available among instructions is to exploit parallelism among iterations of a loop. This type of parallelism is often called loop-level parallelism.

    Example 1: for (i=1; i

  • Instruction Level Parallelism

    for (i=1; i
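    The loop on these slides is cut off in this copy, so the sketch below uses a hypothetical element-wise loop, out[i] = x[i] + y[i], whose iterations are independent of one another. It shows loop unrolling (listed among the stall-reducing techniques): the body is replicated four times so that each trip exposes four independent additions, with a clean-up loop for leftover iterations. Both function names are hypothetical, and the Python version illustrates the transformation's structure rather than any speedup.

```python
def add_rolled(x, y):
    """Hypothetical loop: each iteration touches only element i."""
    out = [0] * len(x)
    for i in range(len(x)):
        out[i] = x[i] + y[i]
    return out


def add_unrolled4(x, y):
    """Same loop unrolled by 4; the four statements per trip are independent."""
    n = len(x)
    out = [0] * n
    i = 0
    while i + 4 <= n:
        out[i] = x[i] + y[i]
        out[i + 1] = x[i + 1] + y[i + 1]
        out[i + 2] = x[i + 2] + y[i + 2]
        out[i + 3] = x[i + 3] + y[i + 3]
        i += 4
    while i < n:  # clean-up loop for the n % 4 leftover iterations
        out[i] = x[i] + y[i]
        i += 1
    return out


x = list(range(10))
y = [10 * v for v in x]
print(add_unrolled4(x, y) == add_rolled(x, y))  # True
```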

  • Ideas To Reduce Stalls

    Technique                                   Reduces
    Dynamic scheduling                          Data hazard stalls
    Dynamic branch prediction                   Control stalls
    Issuing multiple instructions per cycle     Ideal CPI
    Speculation                                 Data and control stalls
    Dynamic memory disambiguation               Data hazard stalls involving memory
    Loop unrolling                              Control hazard stalls
    Basic compiler pipeline scheduling          Data hazard stalls
    Compiler dependence analysis                Ideal CPI and data hazard stalls
    Software pipelining and trace scheduling    Ideal CPI and data hazard stalls
    Compiler speculation                        Ideal CPI, data and control stalls

  • Data Dependence and Hazards

    InstrJ is data dependent on InstrI when InstrJ reads an operand that InstrI writes (so InstrJ must not read it before InstrI writes it), or when InstrJ is data dependent on InstrK, which is dependent on InstrI.

    I: add r1,r2,r3
    J: sub r4,r1,r3

    This is called a true dependence (compiler term).

    If a true dependence causes a hazard in the pipeline, it is called a Read After Write (RAW) hazard.

  • Data Dependence and Hazards

    Dependences are a property of programs.

    The presence of a dependence indicates the potential for a hazard, but the actual hazard and the length of any stall are properties of the pipeline.

    Importance of data dependences:
    1) they indicate the possibility of a hazard
    2) they determine the order in which results must be calculated

    This lecture looks at hardware schemes to avoid hazards.

  • Name Dependence #1: Anti-dependence

    Name dependence: when 2 instructions use the same register or memory location, called a name, but there is no flow of data between the instructions associated with that name. There are 2 versions of name dependence.

    InstrJ writes an operand before InstrI reads it.

    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7

    Called an anti-dependence by compiler writers. This results from reuse of the name r1.

    If an anti-dependence causes a hazard in the pipeline, it is called a Write After Read (WAR) hazard.

  • Name Dependence #2: Output dependence

    InstrJ writes an operand before InstrI writes it.

    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7

    Called an output dependence by compiler writers. This also results from the reuse of the name r1.

    If an output dependence causes a hazard in the pipeline, it is called a Write After Write (WAW) hazard.

  • Control Dependencies

    Every instruction is control dependent on some set of branches, and, in general, these control dependencies must be preserved to preserve program order.

    if p1 {
        S1;
    };
    if p2 {
        S2;
    }

    S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1.

  • Out-of-Order Execution

    DIVD F0,F2,F4
    ADDD F10,F0,F8
    SUBD F12,F8,F14

    ADDD must wait for the F0 result of the long-latency DIVD, but SUBD depends on neither instruction, so it need not wait behind ADDD.

    Enables out-of-order execution => out-of-order completion.
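    The timing benefit can be sketched with a toy scheduler. Assumptions, all hypothetical: instructions become available one per cycle in program order, an instruction may start once its source registers are ready, the execution times are DIVD 10, ADDD 2 and SUBD 2 cycles, and WAR/WAW checks and functional-unit limits are ignored (this sequence has no such conflicts). In in-order mode an instruction also may not start before the one ahead of it has started.

```python
def schedule(instrs, exec_cycles, in_order):
    """Return {dest: (start, finish)} for a list of (op, dest, srcs) tuples.

    Instruction k (from 0) becomes available in cycle k + 1; it starts once
    every source register is ready, and its result is ready at start + exec time.
    In in-order mode it also may not start before its predecessor started.
    """
    ready = {}  # register -> cycle its new value is available
    times = {}
    prev_start = 0
    for k, (op, dest, srcs) in enumerate(instrs):
        start = max([k + 1] + [ready.get(s, 0) for s in srcs])
        if in_order:
            start = max(start, prev_start + 1)
        finish = start + exec_cycles[op]
        ready[dest] = finish
        times[dest] = (start, finish)
        prev_start = start
    return times


code = [("DIVD", "F0", ("F2", "F4")),
        ("ADDD", "F10", ("F0", "F8")),
        ("SUBD", "F12", ("F8", "F14"))]
cycles = {"DIVD": 10, "ADDD": 2, "SUBD": 2}  # hypothetical execution times
print(schedule(code, cycles, in_order=True))
# {'F0': (1, 11), 'F10': (11, 13), 'F12': (12, 14)}
print(schedule(code, cycles, in_order=False))
# {'F0': (1, 11), 'F10': (11, 13), 'F12': (3, 5)}: SUBD completes first
```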

  • THANK YOU