32
Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology March 13, 2013 http://csg.csail.mit.edu/ 6.375 L11-1

Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

  • Upload
    vera

  • View
    38

  • Download
    0

Embed Size (px)

DESCRIPTION

Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology. A different 2-Stage pipeline: 2-Stage-DH pipeline. eEpoch. fEpoch. d2e. Fetch, Decode, RegisterFetch. Execute, Memory, WriteBack. Register File. - PowerPoint PPT Presentation

Citation preview

Page 1: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Data Hazardsin Pipelined Processors

ArvindComputer Science & Artificial Intelligence Lab.Massachusetts Institute of Technology

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-1

Page 2: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

A different 2-Stage pipeline:2-Stage-DH pipeline

PC

InstMemory

Decode

Register File

Execute

DataMemory

d2e

redi

rect

fEp

och

eEpoch

pred

FifosUse the same epoch solution for control hazards as before

Fetch, Decode, RegisterFetch Execute, Memory, WriteBack

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-2

Page 3: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Data Hazardsfetch & decode execute

d2e

time t0 t1 t2 t3 t4 t5 t6 t7 . . . .FDstage FD1 FD2 FD3 FD4 FD5EXstage EX1 EX2 EX3 EX4 EX5

I1 Add(R1,R2,R3)I2 Add(R4,R1,R2)

I2 must be stalled until I1 updates the register file

pc rf dMem

time t0 t1 t2 t3 t4 t5 t6 t7 . . . .FDstage FD1 FD2 FD2 FD3 FD4 FD5EXstage EX1 EX2 EX3 EX4 EX5

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-3

Page 4: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Dealing with data hazardsKeep track of instructions in the pipeline and determine if the register values to be fetched are stale, i.e., will be modified by some older instruction still in the pipeline. This condition is referred to as a read-after-write (RAW) hazardStall the Fetch from dispatching the instruction as long as RAW hazard prevailsRAW hazard will disappear as the pipeline drains

Scoreboard: A data structure to keep track of the instructions in the pipeline beyond the Fetch stage

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-4

Page 5: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Data HazardData hazard depends upon the match between the source registers of the fetched instruction and the destination register of an instruction already in the pipelineThe matching of source and destination must take into account whether the source and destination registers are validfunction Bool isFound (Maybe#(RIndx) dst, Maybe#(RIndx) src); return isValid(dst) && isValid(src) && (validValue(dst)==validValue(src));endfunction

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-5

Page 6: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Scoreboard: Keeping track of instructions in execution

Scoreboard: a data structure to keep track of the destination registers of the instructions beyond the fetch stage

method insert: inserts the destination (if any) of an instruction in the scoreboard when the instruction is decoded

method search1(src): searches the scoreboard for a data hazard

method search2(src): same as search1 method remove: deletes the oldest entry when an

instruction commits

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-6

Page 7: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

2-Stage-DH pipeline:Scoreboard and Stall logic

PC

InstMemory

Decode

Register File

Execute

DataMemory

d2e

redi

rect

fEp

och

eEpoch

pred

scoreboard

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-7

Page 8: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

2-Stage-DH pipeline correctedmodule mkProc(Proc); Reg#(Addr) pc <- mkRegU; RFile rf <- mkRFile; IMemory iMem <- mkIMemory; DMemory dMem <- mkDMemory; Fifo#(Decode2Execute) d2e <- mkFifo; Reg#(Bool) fEpoch <- mkReg(False); Reg#(Bool) eEpoch <- mkReg(False); Fifo#(Addr) execRedirect <- mkFifo;

Scoreboard#(1) sb <- mkScoreboard;// contains only one slot because Execute

// can contain at most one instruction

rule doFetch … rule doExecute …

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-8

Page 9: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

2-Stage-DH pipelinedoFetch rule second attemptrule doFetch; let inst = iMem.req(pc); if(execRedirect.notEmpty) begin fEpoch <= !fEpoch; pc <= execRedirect.first; execRedirect.deq; end else begin let ppc = nextAddrPredictor(pc); pc <= ppc; let dInst = decode(inst); let stall = sb.search1(dInst.src1)|| sb.search2(dInst.src2); if(!stall) begin

let rVal1 = rf.rd1(validRegValue(dInst.src1)); let rVal2 = rf.rd2(validRegValue(dInst.src2)); d2e.enq(Decode2Execute{pc: pc, ppc: ppc,

dIinst: dInst, epoch: fEpoch, rVal1: rVal1, rVal2: rVal2}); sb.insert(dInst.rDst); end endendrule

What should happen to pc when Fetch stalls?

pc should change only when the instruction is enqueued in d2e

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-9

Page 10: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

2-Stage-DH pipelinedoFetch rule correctedrule doFetch; let inst = iMem.req(pc); if(execRedirect.notEmpty) begin fEpoch <= !fEpoch; pc <= execRedirect.first; execRedirect.deq; end else begin let ppc = nextAddrPredictor(pc); pc <= ppc; let dInst = decode(inst); let stall = sb.search1(dInst.src1)|| sb.search2(dInst.src2); if(!stall) begin

let rVal1 = rf.rd1(validRegValue(dInst.src1)); let rVal2 = rf.rd2(validRegValue(dInst.src2)); d2e.enq(Decode2Execute{pc: pc, ppc: ppc,

dIinst: dInst, epoch: fEpoch, rVal1: rVal1, rVal2: rVal2}); sb.insert(dInst.rDst); end endendrule

pc <= ppc; end

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-10

Page 11: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

2-Stage-DH pipelinedoExecute rule correctedrule doExecute; let x = d2e.first; let dInst = x.dInst; let pc = x.pc; let ppc = x.ppc; let epoch = x.epoch; let rVal1 = x.rVal1; let rVal2 = x.rVal2; if(epoch == eEpoch) begin let eInst = exec(dInst, rVal1, rVal2, pc, ppc); if(eInst.iType == Ld) eInst.data <- dMem.req(MemReq{op:Ld, addr:eInst.addr, data:?}); else if (eInst.iType == St) let d <- dMem.req(MemReq{op:St, addr:eInst.addr, data:eInst.data}); if (isValid(eInst.dst)) rf.wr(validRegValue(eInst.dst), eInst.data); if(eInst.mispredict) begin execRedirect.enq(eInst.addr); eEpoch <= !eEpoch; end end d2e.deq; sb.remove;endrule

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-11

Page 12: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Scoreboard method calls:concurrency and correctness issues

scoreboard must permit concurrent execution of search1, search2 and insert for this rule to execute. Should the result of search be affected by

concurrent insert? possibly concurrent remove?

rule doFetch; ... let dInst = decode(inst); let stall = sb.search1(dInst.src1)|| sb.search2(dInst.src2); if(!stall) begin ... sb.insert(dInst.rDst); pc <= ppc; end endendrule

No search {<, CF} insert

If search is affected by remove then the effect of removed instruction must have taken place. Otherwise make (search CF remove) March 13, 2013 http://csg.csail.mit.edu/6.375 L11-12

Page 13: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Assumptions about method calls

Within a rule d2e.first ? d2e.deq redirect.first ? redirect.deq sb.search ? sb.insert

High-level correctness What must be true if sb.search > sb.remove ?

doFetch doExecuted2e

redirectRegister File

Scoreboardremovesearch insert

wrrd1 rd2

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-13

The effect of the removed instruction must have taken place, i.e., any updates to the register file must be visible to the subsequent register reads

<<

<

Page 14: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Concurrency analysisnormal Register File (rf.rd < rf.wr)

Assume both redirect and d2e are CF FIFOs, for concurrent execution doFetch ? doExecute {sb.search, sb.insert} ? sb.remove

doFetch doExecuted2e

redirectRegister File

Scoreboardremovesearch insert

wrrd1 rd2

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-14

{<, CF}<

Page 15: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Concurrency analysisBypass Register File (rf.rw < rf.wd)

Assume both redirect and d2e are CF FIFOs, for concurrent execution doExecute ? doFetch sb.remove ? {sb.search, sb.insert}

doFetch doExecuted2e

redirectRegister File

Scoreboardremovesearch insert

wrrd1 rd2

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-15

{<, CF}<

Page 16: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Performance analysis

{d2e.first, d2e.deq} {<, CF} d2e.enq Will pipelined d2e Fifo improve performance?

redirect.enq {<, CF} {redirect.first, redirect.deq} Will bypassed redirect Fifo improve performance ?

sb.remove {<, CF} {sb.search, sb.insert} Why CF Scoreboard is bad for performance?

doFetch doExecuted2e

redirectRegister File

Scoreboardremovesearch insert

wrrd1 rd2doExecute < doFetch

rf.wr < rf.rd

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-16

No

Yes, by redirecting the PC a cycle earlier

It negates the advantage of bypass register file by unnecessarily stalling on an instruction whose register updates are visible to fetch

Page 17: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

2-Stage-DH pipelinewith proper specification of Fifos, rf, scoreboardmodule mkProc(Proc); Reg#(Addr) pc <- mkRegU; RFile rf <- mkBypassRFile; IMemory iMem <- mkIMemory; DMemory dMem <- mkDMemory; Fifo#(Decode2Execute) d2e <- mkPipelineFifo; Reg#(Bool) fEpoch <- mkReg(False); Reg#(Bool) eEpoch <- mkReg(False); Fifo#(Addr) execRedirect <- mkBypassFifo;

Scoreboard#(1) sb <- mkPipelineScoreboard;// contains only one slot because Execute

// can contain at most one instruction

rule doFetch … rule doExecute …

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-17

Can a destination register name appear more than once in the scoreboard ?

Page 18: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Need to check for WAW hazards

If multiple instructions in the scoreboard can update the register which the current instruction wants to read, then the current instruction has to read the update for the youngest of those instructionsThis issue can be avoided if only one instruction in the pipeline is allowed to update a particular register

This means we have to stall if an instruction writes to the same destination as a previous instruction (WAW hazard)

This may negate some advantage of bypassing A more advanced solution to avoid WAW hazards is

register renamingMarch 13, 2013 http://csg.csail.mit.edu/6.375 L11-18

Page 19: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Fetch rule with bypassingrule doFetch; let inst = iMem.req(pc); if(execRedirect.notEmpty) begin fEpoch <= !fEpoch; pc <= execRedirect.first; execRedirect.deq; end else begin let ppc = nextAddrPredictor(pc); let dInst = decode(inst); let stall = sb.search1(dInst.src1)|| sb.search2(dInst.src2); || sb.search3(dInst.dst); if(!stall) begin

let rVal1 = rf.rd1(validRegValue(dInst.src1)); let rVal2 = rf.rd2(validRegValue(dInst.src2)); d2e.enq(Decode2Execute{pc: pc, ppc: ppc,

dIinst: dInst, epoch: fEpoch, rVal1: rVal1, rVal2: rVal2}); sb.insert(dInst.rDst); pc <= ppc; end endendrule

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-19

Page 20: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Multi-stage pipeline with Data Hazards

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-20

Page 21: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

3-Stage-DH pipeline

PC

InstMemory

Decode

Register File

Execute

DataMemory

d2e n

extP

C

fEp

och

eEpoch

pred e2c

scoreboard

Exec2Commit{Maybe#(RIndx)dst, Data data};March 13, 2013 http://csg.csail.mit.edu/6.375 L11-21

Page 22: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

3-Stage-DH pipelinemodule mkProc(Proc); Reg#(Addr) pc <- mkRegU; RFile rf <- mkBypassRFile; IMemory iMem <- mkIMemory; DMemory dMem <- mkDMemory; Fifo#(1, Decode2Execute) d2e <- mkPipelineFifo; Fifo#(1, Exec2Commit) e2c <- mkPipelineFifo;

Scoreboard#(2) sb <- mkPipelineScoreboard;// contains two instructions

Reg#(Bool) fEpoch <- mkReg(False); Reg#(Bool) eEpoch <- mkReg(False); Fifo#(Addr) execRedirect <- mkBypassFifo;

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-22

Page 23: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

3-Stage-DH pipelinedoFetch rulerule doFetch; let inst = iMem.req(pc); if(execRedirect.notEmpty) begin fEpoch <= !fEpoch; pc <= execRedirect.first; execRedirect.deq; end else begin let ppc = nextAddrPredictor(pc); let dInst = decode(inst); let stall = sb.search1(dInst.src1)|| sb.search2(dInst.src2) || sb.search3(dInst.dst);; if(!stall) begin

let rVal1 = rf.rd1(validRegValue(dInst.src1)); let rVal2 = rf.rd2(validRegValue(dInst.src2)); d2e.enq(Decode2Execute{pc: pc, ppc: ppc,

dIinst: dInst, epoch: fEpoch, rVal1: rVal1, rVal2: rVal2}); sb.insert(dInst.rDst); pc <= ppc; end endendrule Unchanged from 2-stage DH

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-23

Page 24: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

3-Stage-DH pipelinedoExecute rulerule doExecute; let x = d2e.first; let dInst = x.dInst; let pc = x.pc; let ppc = x.ppc; let epoch = x.epoch; let rVal1 = x.rVal1; let rVal2 = x.rVal2; if(epoch == eEpoch) begin let eInst = exec(dInst, rVal1, rVal2, pc, ppc); if(eInst.iType == Ld) eInst.data <- dMem.req(MemReq{op:Ld, addr:eInst.addr, data:?}); else if (eInst.iType == St) let d <- dMem.req(MemReq{op:St, addr:eInst.addr, data:eInst.data}); if (isValid(eInst.dst)) rf.wr(validRegValue(eInst.dst), eInst.data); if(eInst.mispredict) begin execRedirect.enq(eInst.addr); eEpoch <= !eEpoch; end end d2e.deq; sb.remove;endrule

e2c.enq(Exec2Commit{dst:eInst.dst, data:eInst.data});

else e2c.enq(Exec2Commit{dst:Invalid, data:?});

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-24

Page 25: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

3-Stage-DH pipelinedoCommit rule rule doCommit; let dst = eInst.first.dst; let data = eInst.first.data; if(isValid(dst)) rf.wr(tuple2(validValue(dst), data); e2c.deq; sb.remove; endrule

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-25

Page 26: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

6-Stage-DH pipeline

PC

InstMemory

Decode

Register File

Execute

DataMemory

d2e n

extP

C

fEp

och

eEpoch

pred e2c

scoreboard

Fetch Decode RegRead Execute Mem WBMarch 13, 2013 http://csg.csail.mit.edu/6.375 L11-26

Page 27: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Take awaySystematic application of pipelining requires a deep understanding of hazards especially in the presence of concurrent operationsPerformance issues are subtle

For instance, the value of having a bypass network depends on how frequently it is exercised by programs

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-27

Page 28: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Module implementationsscoreboard can be implemented using searchable Fifos in a straightforward manner

March 13, 2013 L11-28http://csg.csail.mit.edu/6.375

Page 29: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Normal Register Filemodule mkRFile(RFile); Vector#(32,Reg#(Data)) rfile <- replicateM(mkReg(0));

method Action wr(RIndx rindx, Data data); if(rindx!=0) rfile[rindx] <= data; endmethod method Data rd1(RIndx rindx) = rfile[rindx]; method Data rd2(RIndx rindx) = rfile[rindx];endmodule

{rd1, rd2} < wr

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-29

Page 30: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Bypass Register File using EHRmodule mkBypassRFile(RFile); Vector#(32,Ehr#(2, Data)) rfile <- replicateM(mkEhr(0));

method Action wr(RIndx rindx, Data data); if(rindex!==0) (rfile[rindex])[0] <= data; endmethod method Data rd1(RIndx rindx) = (rfile[rindx])[1]; method Data rd2(RIndx rindx) = (rfile[rindx])[1];endmodule

wr < {rd1, rd2}

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-30

Page 31: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Bypass Register Filewith external bypassingmodule mkBypassRFile(BypassRFile); RFile rf <- mkRFile; Fifo#(1, Tuple2#(RIndx, Data)) bypass <- mkBypassSFifo; rule move; begin rf.wr(bypass.first); bypass.deq end; endrule method Action wr(RIndx rindx, Data data); if(rindex!==0) bypass.enq(tuple2(rindx, data)); endmethod method Data rd1(RIndx rindx) = return (!bypass.search1(rindx)) ? rf.rd1(rindx) : bypass.read1(rindx); method Data rd2(RIndx rindx) = return (!bypass.search2(rindx)) ? rf.rd2(rindx) : bypass.read2(rindx);endmodule

{rf.rd1, rf.rd2} < rf.wr

wr < {rd1, rd2}March 13, 2013 http://csg.csail.mit.edu/6.375 L11-31

Page 32: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab

Scoreboard implementationusing searchable Fifosfunction Bool isFound (Maybe#(RIndx) dst, Maybe#(RIndx) src); return isValid(dst) && isValid(src) && (validValue(dst)==validValue(src));endfunction

module mkCFScoreboard(Scoreboard#(size)); SFifo#(size, Maybe#(RIndx), Maybe#(RIndx)) f <- mkCFSFifo(isFound);  method insert = f.enq; method remove = f.deq; method search1 = f.search1; method search2 = f.search2;endmodule 

March 13, 2013 http://csg.csail.mit.edu/6.375 L11-32