Upload
sirius
View
27
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology. Topics. Simultaneous enq & deq in a FIFO The RWire solution Dead cycle elimination in the IP circular pipeline code - PowerPoint PPT Presentation
Citation preview
February 23, 2007 http://csg.csail.mit.edu/6.375/ L08-1
Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines
Arvind Computer Science & Artificial Intelligence LabMassachusetts Institute of Technology
February 23, 2007 L08-2http://csg.csail.mit.edu/6.375/
Topics
Simultaneous enq & deq in a FIFOThe RWire solutionDead cycle elimination in the IP circular pipeline codeTwo-stage processor pipelineValue forwarding to reduce bubbles
February 23, 2007 L08-3http://csg.csail.mit.edu/6.375/
Implicit guards (conditions)
Rulerule <name> (<guard>); <action>; endrule
where
<action> ::= r <= <exp>
| m.g(<exp>)
| if (<exp>) <actions> endif
make implicit guards explicit
m.gB(<exp>) when m.gG
February 23, 2007 L08-4http://csg.csail.mit.edu/6.375/
Guards vs If’sA guard on one action of a parallel group of actions affects every action within the group(a1 when p1); (a2 when p2)
==> (a1; a2) when (p1 && p2)
A condition of a Conditional action only affects the actions within the scope of the conditional action(if (p1) a1); a2
p1 has no effect on a2 ... Mixing ifs and whens(if (p) (a1 when q)) ; a2
((if (p) a1); a2) when (p&q | !p)
February 23, 2007 L08-5http://csg.csail.mit.edu/6.375/
Example: making guards explicit
rule recirculate (True); if (p) fifo.enq(8); r <= 7;endrule
rule recirculate ((p && fifo.engG) || !p);
if (p) fifo.enqB(8);
r <= 7;endrule
February 23, 2007 L08-6http://csg.csail.mit.edu/6.375/
A problem ... (from the last lecture)
rule recirculate (True); TableEntry p <- ram.resp(); match {.rip, .tok} = fifo.first(); if (isLeaf(p)) cbuf.put(tok, p); else begin fifo.enq(tuple2(rip << 8, tok)); ram.req(p+signExtend(rip[15:8])); end fifo.deq();endrule
The fifo needs to be able to do enq and deq simultaneously for this rule to make sense
February 23, 2007 L08-7http://csg.csail.mit.edu/6.375/
module mkFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); method Action enq(t x) if (!full); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; endmethod method t first() if (full); return (data); endmethod method Action clear(); full <= False; endmethodendmodule
One Element FIFOenq and deq cannot even be enabled together much less fire concurrently!
n
not empty
not full rdyenab
rdyenab
enq
deq
FIFO
module
The functionality we want is as if deq “happens” before enq; if deq does not happen then enq behaves normally
February 23, 2007 L08-8http://csg.csail.mit.edu/6.375/
RWire to rescue
interface RWire#(type t);method Action wset(t x);method Maybe#(t) wget();
endinterface
Like a register in that you can read and write it but unlike a register
- read happens after write- data disappears in the next cycle
February 23, 2007 L08-9http://csg.csail.mit.edu/6.375/
module mkLFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); RWire#(void) deqEN <- mkRWire(); method Action enq(t x) if
(!full || isValid (deqEN.wget())); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; deqEN.wset(?); endmethod method t first() if (full); return (data); endmethod method Action clear(); full <= False; endmethodendmodule
One Element “Loopy” FIFO
not empty
not full rdyenab
rdyenab
enq
deq
FIFO
module
or
!full
February 23, 2007 L08-10http://csg.csail.mit.edu/6.375/
Problem solved!
rule recirculate (True); TableEntry p <- ram.resp(); match {.rip, .tok} = fifo.first(); if (isLeaf(p)) cbuf.put(tok, p); else begin fifo.enq(tuple2(rip << 8, tok)); ram.req(p+signExtend(rip[15:8])); end fifo.deq();endrule
What if fifo is empty?
LFIFO fifo <- mkLFIFO; // use a loopy fifo
February 23, 2007 L08-11http://csg.csail.mit.edu/6.375/
The Dead Cycle Problem
rule enter (True); Token tok <- cbuf.getToken(); IP ip = inQ.first(); ram.req(ext(ip[31:16])); fifo.enq(tuple2(ip[15:0], tok)); inQ.deq();endrule
enter?enter?done?done?RAM
cbufinQ
fifo
rule recirculate (True); TableEntry p <- ram.resp(); match {.rip, .tok} = fifo.first(); if (isLeaf(p)) cbuf.put(tok, p); else begin fifo.enq(tuple2(rip << 8, tok)); ram.req(p+signExtend(rip[15:8])); end fifo.deq();endrule
Can a new request enter the system simultaneously with an old one leaving?
February 23, 2007 L08-12http://csg.csail.mit.edu/6.375/
Scheduling conflicting rules
When two rules conflict on a shared resource, they cannot both execute in the same clockThe compiler produces logic that ensures that, when both rules are applicable, only one will fire Which one? source annotations
(* descending_urgency = “recirculateh, enter” *)
February 23, 2007 L08-13http://csg.csail.mit.edu/6.375/
A slightly simpler example
rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq();endrule
enter?enter?done?done?RAM
inQ
fifo
rule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq();endrule
In general these two rules conflict but when isLeaf(p) is true there is no apparent conflict!
February 23, 2007 L08-14http://csg.csail.mit.edu/6.375/
Rule Splitingrule foo (True); if (p) r1 <= 5; else r2 <= 7;endrule
rule fooT (p); r1 <= 5;endrule
rule fooF (!p); r2 <= 7;endrule
rule fooT and fooF can be scheduled independently with some other rule
February 23, 2007 L08-15http://csg.csail.mit.edu/6.375/
Spliting the recirculate rulerule recirculate (!isLeaf(ram.peek())); IP rip = fifo.first(); fifo.enq(rip << 8); ram.req(ram.peek() + rip[15:8]); fifo.deq(); ram.deq();endrule
rule exit (isLeaf(ram.peek())); outQ.enq(ram.peek()); fifo.deq(); ram.deq();endrule
rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq();endrule
Now rules enter and exit can be scheduled simultaneously
February 23, 2007 L08-16http://csg.csail.mit.edu/6.375/
Sometimes rule splitting is not possiblerule recirculate (True); TableEntry p <- ram.resp(); match {.rip, .tok} = fifo.first(); if (isLeaf(p)) cbuf.put(tok, p); else begin fifo.enq(tuple2(rip << 8, tok)); ram.req(p+signExtend(rip[15:8])); end fifo.deq();endrule
You will have to resort to interface changes and/or the use of RWires
makes life difficult
February 23, 2007 L08-17http://csg.csail.mit.edu/6.375/
Packaging a module:Turning a rule into a method
inQ
enter?enter?done?done?RAM
cbuf
fifo
rule enter (True); Token t <- cbuf.getToken(); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(tuple2(ip[15:0], t)); inQ.deq();endrule method Action enter (IP ip);
Token t <- cbuf.getToken(); ram.req(ip[31:16]); fifo.enq(tuple2(ip[15:0], t));endmethod
February 23, 2007 http://csg.csail.mit.edu/6.375/ L08-18
Processor with a two-stage pipeline
5-minute break to stretch you legs
February 23, 2007 L08-19http://csg.csail.mit.edu/6.375/
Processor Pipelines and FIFOs
fetch execute
iMem
rf
CPU
decode memory
pc
write-back
dMem
February 23, 2007 L08-20http://csg.csail.mit.edu/6.375/
interface SFIFO#(type t, type tr); method Action enq(t); // enqueue an item method Action deq(); // remove oldest entry method t first(); // inspect oldest item method Action clear(); // make FIFO empty method Bool find(tr); // search FIFOendinterface
SFIFO (glue between stages)
n = # of bits needed to represent the values of type “t“
m = # of bits needed to represent the values of type “tr"
not full
not empty
not empty
rdyenab
n
n
rdyenab
rdy
enq
deq
first S
FIFO
module
clea
renab
find
mbool
more on searchable FIFOs later
February 23, 2007 L08-21http://csg.csail.mit.edu/6.375/
Two-Stage Pipeline
fetch & decode
execute
pc rfCPU
bumodule mkCPU#(Mem iMem, Mem dMem)(Empty); Reg#(Iaddress) pc <- mkReg(0);
RegFile#(RName, Bit#(32)) rf <- mkRegFileFull();SFIFO#(InstTemplate, RName) bu
<- mkSFifo(findf); Instr instr = iMem.read(pc); Iaddress predIa = pc + 1; InstTemplate it = bu.first();
rule fetch_decode ...endmodule
February 23, 2007 L08-22http://csg.csail.mit.edu/6.375/
Instructions & Templates
typedef union tagged { struct {RName dst; Value op1; Value op2} EAdd; struct {Value cond; Iaddress tAddr} EBz; struct {RName dst; Daddress addr} ELoad; struct {Value data; Daddress addr} EStore;} InstTemplate deriving(Eq, Bits);
typedef union tagged { struct {RName dst; RName src1; RName src2} Add; struct {RName cond; RName addr} Bz; struct {RName dst; RName addr} Load; struct {RName value; RName addr} Store;} Instr deriving(Bits, Eq);
typedef Bit#(32) Iaddress;typedef Bit#(32) Daddress;typedef Bit#(32) Value;
February 23, 2007 L08-23http://csg.csail.mit.edu/6.375/
Rules for Add
rule decodeAdd(instr matches Add{dst:.rd,src1:.ra,src2:.rb}) bu.enq (EAdd{dst:rd,op1:rf[ra],op2:rf[rb]}); pc <= predIa;endrule
rule executeAdd(it matches EAdd{dst:.rd,op1:.va,op2:.vb}) rf.upd(rd, va + vb); bu.deq();endrule
implicit check:
implicit check:
fetch & decode
execute
pc rfCPU
bu
bu notfull
bu notempty
February 23, 2007 L08-24http://csg.csail.mit.edu/6.375/
Fetch & Decode Rule: Reexamined
Wrong! Because instructions in bu may be modifying ra or rb
stall !
fetch & decode
execute
pc rfCPU
bu
rule decodeAdd (instr matches Add{dst:.rd,src1:.ra,src2:.rb}) bu.enq (EAdd{dst:rd, op1:rf[ra], op2:rf[rb]});
pc <= predIa;endrule
February 23, 2007 L08-25http://csg.csail.mit.edu/6.375/
Fetch & Decode Rule: corrected
fetch & decode
execute
pc rfCPU
bu
rule decodeAdd (instr matches Add{dst:.rd,src1:.ra,src2:.rb} bu.enq (EAdd{dst:rd, op1:rf[ra], op2:rf[rb]}); pc <= predIa;endrule
&&& !bu.find(ra) &&& !bu.find(rb))
February 23, 2007 L08-26http://csg.csail.mit.edu/6.375/
Rules for Branch
rule decodeBz(instr matches Bz{cond:.rc,addr:.addr}) &&& !bu.find(rc) &&& !bu.find(addr)); bu.enq (EBz{cond:rf[rc],addr:rf[addr]}); pc <= predIa; endrule
rule bzTaken(it matches EBz{cond:.vc,addr:.va}) &&& (vc==0)); pc <= va; bu.clear(); endrulerule bzNotTaken (it matches EBz{cond:.vc,addr:.va}) &&& (vc != 0)); bu.deq; endrule
fetch & decode
execute
pc rfCPU
bu
rule-atomicity ensures thatpc update, anddiscard of pre-fetched instrs in bu, are doneconsistently
February 23, 2007 L08-27http://csg.csail.mit.edu/6.375/
The Stall SignalBool stall = case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}:
return (bu.find(ra) || bu.find(rb)); tagged Bz {cond:.rc,addr:.addr}:
return (bu.find(rc) || bu.find(addr)); tagged Load {dst:.rd,addr:.addr}:
return (bu.find(addr)); tagged Store {value:.v,addr:.addr}:
return (bu.find(v)) || bu.find(addr)); endcase;
Need to extend the fifo interface with the “find” method where “find” searches the fifo using the findf function
February 23, 2007 L08-28http://csg.csail.mit.edu/6.375/
Parameterization: The Stall Functionfunction Bool stallfunc (Instr instr,
SFIFO#(InstTemplate, RName) bu); case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}:
return (bu.find(ra) || bu.find(rb)); tagged Bz {cond:.rc,addr:.addr}:
return (bu.find(rc) || bu.find(addr)); tagged Load {dst:.rd,addr:.addr}:
return (bu.find(addr)); tagged Store {value:.v,addr:.addr}:
return (bu.find(v)) || bu.find(addr)); endcaseendfunction
We need to include the following call in the mkCPU module
Bool stall = stallfunc(instr, bu); no extra gates!
February 23, 2007 L08-29http://csg.csail.mit.edu/6.375/
The findf functionfunction Bool findf (RName r, InstrTemplate it); case (it) matches tagged EAdd{dst:.rd,op1:.ra,op2:.rb}:
return (r == rd); tagged EBz {cond:.c,addr:.a}:
return (False); tagged ELoad{dst:.rd,addr:.a}:
return (r == rd); tagged EStore{value:.v,addr:.a}:
return (False); endcaseendfunction
SFIFO#(InstrTemplate, RName) bu <- mkSFifo(findf);
mkSFifo can be parameterized by the search function!
no extra gates!
February 23, 2007 L08-30http://csg.csail.mit.edu/6.375/
Fetch & Decode Rulerule fetch_and_decode(!stall); case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: bu.enq(EAdd{dst:rd,op1:rf[ra],op2:rf[rb]}); tagged Bz {cond:.rc,addr:.addr}: bu.enq(EBz{cond:rf[rc],addr:rf[addr]}); tagged Load {dst:.rd,addr:.addr}: bu.enq(ELoad{dst:rd,addr:rf[addr]}); tagged Store{value:.v,addr:.addr}: bu.enq(EStore{value:rf[v],addr:rf[addr]}); endcase pc<= predIa;endrule
February 23, 2007 L08-31http://csg.csail.mit.edu/6.375/
Fetch & Decode Rule another style
InstrTemplate newIt = case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return EAdd{dst:rd,op1:rf[ra],op2:rf[rb]}; tagged Bz {cond:.rc,addr:.addr}: return EBz{cond:rf[rc],addr:rf[addr]}; tagged Load {dst:.rd,addr:.addr}: return ELoad{dst:rd,addr:rf[addr]}; tagged Store{value:.v,addr:.addr}: return EStore{value:rf[v],addr:rf[addr]}; endcase;
rule fetch_and_decode (!stall); bu.enq(newIt); pc <= predIa;endrule
Conceptually cleaner; hides unnecessary details
February 23, 2007 L08-32http://csg.csail.mit.edu/6.375/
Execute Rulerule execute (True); case (it) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin rf.upd(rd, va+vb); bu.deq(); end tagged EBz {cond:.cv,addr:.av}: if (cv == 0) then begin
pc <= av; bu.clear(); end else bu.deq(); tagged ELoad{dst:.rd,addr:.av}: begin rf.upd(rd, dMem.read(av)); bu.deq(); end tagged EStore{value:.vv,addr:.av}: begin dMem.write(av, vv); bu.deq(); end endcaseendrule Next time simultaneous execution of
fetch and execute rules