32
February 23, 2007 http:// csg.csail.mit.edu/6.375/ L08-1 Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology

Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind

  • Upload
    sirius

  • View
    27

  • Download
    0

Embed Size (px)

DESCRIPTION

Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology. Topics. Simultaneous enq & deq in a FIFO The RWire solution Dead cycle elimination in the IP circular pipeline code - PowerPoint PPT Presentation

Citation preview

Page 1: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 http://csg.csail.mit.edu/6.375/ L08-1

Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines

Arvind Computer Science & Artificial Intelligence LabMassachusetts Institute of Technology

Page 2: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-2http://csg.csail.mit.edu/6.375/

Topics

Simultaneous enq & deq in a FIFOThe RWire solutionDead cycle elimination in the IP circular pipeline codeTwo-stage processor pipelineValue forwarding to reduce bubbles

Page 3: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-3http://csg.csail.mit.edu/6.375/

Implicit guards (conditions)

Rulerule <name> (<guard>); <action>; endrule

where

<action> ::= r <= <exp>

| m.g(<exp>)

| if (<exp>) <actions> endif

make implicit guards explicit

m.gB(<exp>) when m.gG

Page 4: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-4http://csg.csail.mit.edu/6.375/

Guards vs If’sA guard on one action of a parallel group of actions affects every action within the group(a1 when p1); (a2 when p2)

==> (a1; a2) when (p1 && p2)

A condition of a Conditional action only affects the actions within the scope of the conditional action(if (p1) a1); a2

p1 has no effect on a2 ... Mixing ifs and whens(if (p) (a1 when q)) ; a2

((if (p) a1); a2) when (p&q | !p)

Page 5: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-5http://csg.csail.mit.edu/6.375/

Example: making guards explicit

rule recirculate (True); if (p) fifo.enq(8); r <= 7;endrule

rule recirculate ((p && fifo.engG) || !p);

if (p) fifo.enqB(8);

r <= 7;endrule

Page 6: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-6http://csg.csail.mit.edu/6.375/

A problem ... (from the last lecture)

rule recirculate (True); TableEntry p <- ram.resp(); match {.rip, .tok} = fifo.first(); if (isLeaf(p)) cbuf.put(tok, p); else begin fifo.enq(tuple2(rip << 8, tok)); ram.req(p+signExtend(rip[15:8])); end fifo.deq();endrule

The fifo needs to be able to do enq and deq simultaneously for this rule to make sense

Page 7: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-7http://csg.csail.mit.edu/6.375/

module mkFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); method Action enq(t x) if (!full); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; endmethod method t first() if (full); return (data); endmethod method Action clear(); full <= False; endmethodendmodule

One Element FIFOenq and deq cannot even be enabled together much less fire concurrently!

n

not empty

not full rdyenab

rdyenab

enq

deq

FIFO

module

The functionality we want is as if deq “happens” before enq; if deq does not happen then enq behaves normally

Page 8: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-8http://csg.csail.mit.edu/6.375/

RWire to rescue

interface RWire#(type t);method Action wset(t x);method Maybe#(t) wget();

endinterface

Like a register in that you can read and write it but unlike a register

- read happens after write- data disappears in the next cycle

Page 9: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-9http://csg.csail.mit.edu/6.375/

module mkLFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); RWire#(void) deqEN <- mkRWire(); method Action enq(t x) if

(!full || isValid (deqEN.wget())); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; deqEN.wset(?); endmethod method t first() if (full); return (data); endmethod method Action clear(); full <= False; endmethodendmodule

One Element “Loopy” FIFO

not empty

not full rdyenab

rdyenab

enq

deq

FIFO

module

or

!full

Page 10: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-10http://csg.csail.mit.edu/6.375/

Problem solved!

rule recirculate (True); TableEntry p <- ram.resp(); match {.rip, .tok} = fifo.first(); if (isLeaf(p)) cbuf.put(tok, p); else begin fifo.enq(tuple2(rip << 8, tok)); ram.req(p+signExtend(rip[15:8])); end fifo.deq();endrule

What if fifo is empty?

LFIFO fifo <- mkLFIFO; // use a loopy fifo

Page 11: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-11http://csg.csail.mit.edu/6.375/

The Dead Cycle Problem

rule enter (True); Token tok <- cbuf.getToken(); IP ip = inQ.first(); ram.req(ext(ip[31:16])); fifo.enq(tuple2(ip[15:0], tok)); inQ.deq();endrule

enter?enter?done?done?RAM

cbufinQ

fifo

rule recirculate (True); TableEntry p <- ram.resp(); match {.rip, .tok} = fifo.first(); if (isLeaf(p)) cbuf.put(tok, p); else begin fifo.enq(tuple2(rip << 8, tok)); ram.req(p+signExtend(rip[15:8])); end fifo.deq();endrule

Can a new request enter the system simultaneously with an old one leaving?

Page 12: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-12http://csg.csail.mit.edu/6.375/

Scheduling conflicting rules

When two rules conflict on a shared resource, they cannot both execute in the same clockThe compiler produces logic that ensures that, when both rules are applicable, only one will fire Which one? source annotations

(* descending_urgency = “recirculateh, enter” *)

Page 13: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-13http://csg.csail.mit.edu/6.375/

A slightly simpler example

rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq();endrule

enter?enter?done?done?RAM

inQ

fifo

rule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq();endrule

In general these two rules conflict but when isLeaf(p) is true there is no apparent conflict!

Page 14: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-14http://csg.csail.mit.edu/6.375/

Rule Splitingrule foo (True); if (p) r1 <= 5; else r2 <= 7;endrule

rule fooT (p); r1 <= 5;endrule

rule fooF (!p); r2 <= 7;endrule

rule fooT and fooF can be scheduled independently with some other rule

Page 15: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-15http://csg.csail.mit.edu/6.375/

Spliting the recirculate rulerule recirculate (!isLeaf(ram.peek())); IP rip = fifo.first(); fifo.enq(rip << 8); ram.req(ram.peek() + rip[15:8]); fifo.deq(); ram.deq();endrule

rule exit (isLeaf(ram.peek())); outQ.enq(ram.peek()); fifo.deq(); ram.deq();endrule

rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq();endrule

Now rules enter and exit can be scheduled simultaneously

Page 16: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-16http://csg.csail.mit.edu/6.375/

Sometimes rule splitting is not possiblerule recirculate (True); TableEntry p <- ram.resp(); match {.rip, .tok} = fifo.first(); if (isLeaf(p)) cbuf.put(tok, p); else begin fifo.enq(tuple2(rip << 8, tok)); ram.req(p+signExtend(rip[15:8])); end fifo.deq();endrule

You will have to resort to interface changes and/or the use of RWires

makes life difficult

Page 17: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-17http://csg.csail.mit.edu/6.375/

Packaging a module:Turning a rule into a method

inQ

enter?enter?done?done?RAM

cbuf

fifo

rule enter (True); Token t <- cbuf.getToken(); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(tuple2(ip[15:0], t)); inQ.deq();endrule method Action enter (IP ip);

Token t <- cbuf.getToken(); ram.req(ip[31:16]); fifo.enq(tuple2(ip[15:0], t));endmethod

Page 18: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 http://csg.csail.mit.edu/6.375/ L08-18

Processor with a two-stage pipeline

5-minute break to stretch you legs

Page 19: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-19http://csg.csail.mit.edu/6.375/

Processor Pipelines and FIFOs

fetch execute

iMem

rf

CPU

decode memory

pc

write-back

dMem

Page 20: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-20http://csg.csail.mit.edu/6.375/

interface SFIFO#(type t, type tr); method Action enq(t); // enqueue an item method Action deq(); // remove oldest entry method t first(); // inspect oldest item method Action clear(); // make FIFO empty method Bool find(tr); // search FIFOendinterface

SFIFO (glue between stages)

n = # of bits needed to represent the values of type “t“

m = # of bits needed to represent the values of type “tr"

not full

not empty

not empty

rdyenab

n

n

rdyenab

rdy

enq

deq

first S

FIFO

module

clea

renab

find

mbool

more on searchable FIFOs later

Page 21: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-21http://csg.csail.mit.edu/6.375/

Two-Stage Pipeline

fetch & decode

execute

pc rfCPU

bumodule mkCPU#(Mem iMem, Mem dMem)(Empty); Reg#(Iaddress) pc <- mkReg(0);

RegFile#(RName, Bit#(32)) rf <- mkRegFileFull();SFIFO#(InstTemplate, RName) bu

<- mkSFifo(findf); Instr instr = iMem.read(pc); Iaddress predIa = pc + 1; InstTemplate it = bu.first();

rule fetch_decode ...endmodule

Page 22: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-22http://csg.csail.mit.edu/6.375/

Instructions & Templates

typedef union tagged { struct {RName dst; Value op1; Value op2} EAdd; struct {Value cond; Iaddress tAddr} EBz; struct {RName dst; Daddress addr} ELoad; struct {Value data; Daddress addr} EStore;} InstTemplate deriving(Eq, Bits);

typedef union tagged { struct {RName dst; RName src1; RName src2} Add; struct {RName cond; RName addr} Bz; struct {RName dst; RName addr} Load; struct {RName value; RName addr} Store;} Instr deriving(Bits, Eq);

typedef Bit#(32) Iaddress;typedef Bit#(32) Daddress;typedef Bit#(32) Value;

Page 23: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-23http://csg.csail.mit.edu/6.375/

Rules for Add

rule decodeAdd(instr matches Add{dst:.rd,src1:.ra,src2:.rb}) bu.enq (EAdd{dst:rd,op1:rf[ra],op2:rf[rb]}); pc <= predIa;endrule

rule executeAdd(it matches EAdd{dst:.rd,op1:.va,op2:.vb}) rf.upd(rd, va + vb); bu.deq();endrule

implicit check:

implicit check:

fetch & decode

execute

pc rfCPU

bu

bu notfull

bu notempty

Page 24: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-24http://csg.csail.mit.edu/6.375/

Fetch & Decode Rule: Reexamined

Wrong! Because instructions in bu may be modifying ra or rb

stall !

fetch & decode

execute

pc rfCPU

bu

rule decodeAdd (instr matches Add{dst:.rd,src1:.ra,src2:.rb}) bu.enq (EAdd{dst:rd, op1:rf[ra], op2:rf[rb]});

pc <= predIa;endrule

Page 25: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-25http://csg.csail.mit.edu/6.375/

Fetch & Decode Rule: corrected

fetch & decode

execute

pc rfCPU

bu

rule decodeAdd (instr matches Add{dst:.rd,src1:.ra,src2:.rb} bu.enq (EAdd{dst:rd, op1:rf[ra], op2:rf[rb]}); pc <= predIa;endrule

&&& !bu.find(ra) &&& !bu.find(rb))

Page 26: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-26http://csg.csail.mit.edu/6.375/

Rules for Branch

rule decodeBz(instr matches Bz{cond:.rc,addr:.addr}) &&& !bu.find(rc) &&& !bu.find(addr)); bu.enq (EBz{cond:rf[rc],addr:rf[addr]}); pc <= predIa; endrule

rule bzTaken(it matches EBz{cond:.vc,addr:.va}) &&& (vc==0)); pc <= va; bu.clear(); endrulerule bzNotTaken (it matches EBz{cond:.vc,addr:.va}) &&& (vc != 0)); bu.deq; endrule

fetch & decode

execute

pc rfCPU

bu

rule-atomicity ensures thatpc update, anddiscard of pre-fetched instrs in bu, are doneconsistently

Page 27: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-27http://csg.csail.mit.edu/6.375/

The Stall SignalBool stall = case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}:

return (bu.find(ra) || bu.find(rb)); tagged Bz {cond:.rc,addr:.addr}:

return (bu.find(rc) || bu.find(addr)); tagged Load {dst:.rd,addr:.addr}:

return (bu.find(addr)); tagged Store {value:.v,addr:.addr}:

return (bu.find(v)) || bu.find(addr)); endcase;

Need to extend the fifo interface with the “find” method where “find” searches the fifo using the findf function

Page 28: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-28http://csg.csail.mit.edu/6.375/

Parameterization: The Stall Functionfunction Bool stallfunc (Instr instr,

SFIFO#(InstTemplate, RName) bu); case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}:

return (bu.find(ra) || bu.find(rb)); tagged Bz {cond:.rc,addr:.addr}:

return (bu.find(rc) || bu.find(addr)); tagged Load {dst:.rd,addr:.addr}:

return (bu.find(addr)); tagged Store {value:.v,addr:.addr}:

return (bu.find(v)) || bu.find(addr)); endcaseendfunction

We need to include the following call in the mkCPU module

Bool stall = stallfunc(instr, bu); no extra gates!

Page 29: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-29http://csg.csail.mit.edu/6.375/

The findf functionfunction Bool findf (RName r, InstrTemplate it); case (it) matches tagged EAdd{dst:.rd,op1:.ra,op2:.rb}:

return (r == rd); tagged EBz {cond:.c,addr:.a}:

return (False); tagged ELoad{dst:.rd,addr:.a}:

return (r == rd); tagged EStore{value:.v,addr:.a}:

return (False); endcaseendfunction

SFIFO#(InstrTemplate, RName) bu <- mkSFifo(findf);

mkSFifo can be parameterized by the search function!

no extra gates!

Page 30: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-30http://csg.csail.mit.edu/6.375/

Fetch & Decode Rulerule fetch_and_decode(!stall); case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: bu.enq(EAdd{dst:rd,op1:rf[ra],op2:rf[rb]}); tagged Bz {cond:.rc,addr:.addr}: bu.enq(EBz{cond:rf[rc],addr:rf[addr]}); tagged Load {dst:.rd,addr:.addr}: bu.enq(ELoad{dst:rd,addr:rf[addr]}); tagged Store{value:.v,addr:.addr}: bu.enq(EStore{value:rf[v],addr:rf[addr]}); endcase pc<= predIa;endrule

Page 31: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-31http://csg.csail.mit.edu/6.375/

Fetch & Decode Rule another style

InstrTemplate newIt = case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return EAdd{dst:rd,op1:rf[ra],op2:rf[rb]}; tagged Bz {cond:.rc,addr:.addr}: return EBz{cond:rf[rc],addr:rf[addr]}; tagged Load {dst:.rd,addr:.addr}: return ELoad{dst:rd,addr:rf[addr]}; tagged Store{value:.v,addr:.addr}: return EStore{value:rf[v],addr:rf[addr]}; endcase;

rule fetch_and_decode (!stall); bu.enq(newIt); pc <= predIa;endrule

Conceptually cleaner; hides unnecessary details

Page 32: Blusepc-5:  Dead cycles, bubbles and Forwarding in Pipelines Arvind

February 23, 2007 L08-32http://csg.csail.mit.edu/6.375/

Execute Rulerule execute (True); case (it) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin rf.upd(rd, va+vb); bu.deq(); end tagged EBz {cond:.cv,addr:.av}: if (cv == 0) then begin

pc <= av; bu.clear(); end else bu.deq(); tagged ELoad{dst:.rd,addr:.av}: begin rf.upd(rd, dMem.read(av)); bu.deq(); end tagged EStore{value:.vv,addr:.av}: begin dMem.write(av, vv); bu.deq(); end endcaseendrule Next time simultaneous execution of

fetch and execute rules