© Wen-mei Hwu and S. J. Patel, 2005. ECE 412, University of Illinois
Lecture 10-11 Instruction Execution:
Dynamic Scheduling
Outline
• General concepts
– dataflow
– dynamic scheduling with Tomasulo’s Algorithm
• The P6 Execution Microarchitecture
• Dynamic Scheduling Issues
The Execution Problem
Instruction Supply
Execution Mechanism
Data Supply
We are able to deliver instructions at high bandwidth, and we have techniques for high-bandwidth, low-latency data supply. But nothing matters if we cannot consume everything at high bandwidth in the execution mechanism. We need to execute instructions in parallel.
Fundamental problem: taking things in the order prescribed by the programmer will cause instruction dependencies to limit parallel execution of instructions.
Dynamic Scheduling
• Reservation Station
• Renaming
• Retirement/Recovery
• Memory Disambiguation
Tomasulo’s Algorithm
Dataflow Concepts
Source Code
x = (a * b) - (c + d);
y = (r + s) / (t + v);
Machine Code
1. MUL Ra, Rb -> Rm
2. ADD Rc, Rd -> Rn
3. SUB Rm, Rn -> Rx
4. ADD Rr, Rs -> Rm
5. ADD Rt, Rv -> Rn
6. DIV Rm, Rn -> Ry
Dataflow Graph: 1, 2 -> 3; 4, 5 -> 6
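The dataflow graph follows mechanically from the machine code: each source register is linked to the most recent earlier instruction that wrote it. A minimal Python sketch of that idea follows; the tuple encoding and the function names `dataflow_edges` and `dataflow_levels` are illustrative assumptions, not part of the lecture.

```python
# Each instruction: (number, opcode, source registers, destination register).
PROGRAM = [
    (1, "MUL", ("Ra", "Rb"), "Rm"),
    (2, "ADD", ("Rc", "Rd"), "Rn"),
    (3, "SUB", ("Rm", "Rn"), "Rx"),
    (4, "ADD", ("Rr", "Rs"), "Rm"),
    (5, "ADD", ("Rt", "Rv"), "Rn"),
    (6, "DIV", ("Rm", "Rn"), "Ry"),
]

def dataflow_edges(program):
    """Return (producer, consumer) pairs for true (flow) dependences only."""
    last_writer = {}  # register -> number of the latest instruction writing it
    edges = []
    for num, _op, srcs, dest in program:
        for src in srcs:
            if src in last_writer:
                edges.append((last_writer[src], num))
        last_writer[dest] = num
    return edges

def dataflow_levels(program):
    """Earliest step each instruction could start, assuming unlimited hardware."""
    level = {num: 0 for num, *_ in program}
    # Edges are produced in consumer program order, so each producer's level
    # is already final when it is read here.
    for producer, consumer in dataflow_edges(program):
        level[consumer] = max(level[consumer], level[producer] + 1)
    return level
```

Under these assumptions, instructions 1, 2, 4, and 5 land on level 0 and instructions 3 and 6 on level 1, which is the parallelism the graph exposes once the reuse of Rm and Rn is ignored.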
Data Dependences
• Data flow dependence
– consumer-producer relationship
– register bypass and interlocks
• Data output and antidependences
– reuse of registers at compile time
– register renaming
Interlocking
• Allow an instruction to execute only when its data and resources are ready
– simple interlocking based on bypass logic for short pipelines
– scoreboarding for deep pipelines
– Tomasulo’s Algorithm for out-of-order instruction dispatch
Tomasulo’s Algorithm
• Invented for the IBM 360-91 FPU
• First published in 1967 (IBM Journal)
• Not used for general CPU design until the 1990s
– branch prediction and exception recovery problems had to be solved first
Tomasulo’s Algorithm
• Register renaming
– tags for values
• Out-of-order execution
– reservation stations
• Data forwarding
– common data bus
Tomasulo’s Algorithm
• Instruction decode
– fetch value and tag from the register file
– the tag is a handle for data currently being generated
– determine the RS to hold the decoded operations
Reservation Station
• Hardware mechanism that enables instructions to execute out-of-order and as early as their source operands are ready.
• An instruction waits in the RS until the tags for its source operands have been broadcast by their producers.
Tomasulo’s Algorithm
• Instruction Issue
– insert operation and operands into the assigned reservation station entry
– mark destination register as not ready
Tomasulo’s Algorithm
• Operation dispatch
– identify operations ready for execution
– determine highest priority operation for each port/function unit
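The dispatch step can be sketched as a select over the reservation station entries. The slides do not say how priority is assigned, so this sketch assumes oldest-first; `RSEntry`, its fields, and `dispatch` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

@dataclass
class RSEntry:
    age: int          # allocation order; smaller means older (assumed priority)
    unit: str         # port/function unit this operation needs
    src_ready: tuple  # one boolean per source operand

def dispatch(entries):
    """Pick at most one ready operation per function unit, oldest first."""
    chosen = {}
    for entry in sorted(entries, key=lambda e: e.age):
        # Ready means every source operand has been captured.
        if all(entry.src_ready) and entry.unit not in chosen:
            chosen[entry.unit] = entry
    return chosen
```

An older entry that is still waiting on an operand does not block a younger ready entry for the same unit, which is what allows execution to proceed out of order.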
Tomasulo’s Algorithm
• Data forwarding
– result value and tag distributed to RS entries for associative search
– result value and tag delivered to destination register for potential update
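A minimal sketch of the forwarding step, assuming tags are small integers and that a tag of `None` marks a value as available. The classes `Operand` and `Register` and the function `broadcast` are hypothetical names. The point of the "potential update" is that a register takes the result only if it is still waiting on that exact tag.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Operand:
    tag: Optional[int] = None    # producer's tag while the value is pending
    value: Optional[int] = None  # captured value once available

    @property
    def ready(self):
        return self.tag is None

@dataclass
class Register:
    value: int = 0
    tag: Optional[int] = None    # None means the stored value is current

def broadcast(tag, value, rs_operands, regfile):
    """Deliver (tag, value) on the common data bus."""
    # Associative search: every waiting operand compares its tag.
    for op in rs_operands:
        if op.tag == tag:
            op.value, op.tag = value, None
    # A register is updated only if it still waits on this tag; a later
    # instruction may have re-tagged it with a newer producer.
    for reg in regfile.values():
        if reg.tag == tag:
            reg.value, reg.tag = value, None
```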
Renaming
• Objective: want to eliminate WAR and WAW (false dependencies)
• Renaming happens in program order
• Renaming requires a table to map between architectural registers and physical registers
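A minimal sketch of table-based renaming, assuming an unbounded supply of physical registers (a real design recycles them) and hypothetical names `rename` and `P0`, `P1`, and so on. Sources are looked up before the destination is remapped, and instructions are processed strictly in program order.

```python
from itertools import count

def rename(program, arch_regs):
    """Rename in program order; return instructions over physical registers.

    program: list of (opcode, source registers, destination register).
    """
    fresh = count()
    # Mapping table: architectural register -> current physical register.
    table = {r: f"P{next(fresh)}" for r in arch_regs}
    renamed = []
    for op, srcs, dest in program:
        phys_srcs = tuple(table[s] for s in srcs)  # read mappings first
        table[dest] = f"P{next(fresh)}"            # then allocate a new dest
        renamed.append((op, phys_srcs, table[dest]))
    return renamed
```

Because every write receives a fresh physical register, two writes to the same architectural register no longer collide (no WAW), and a later write cannot overwrite a value an earlier reader still needs (no WAR). Only the true producer-consumer links remain.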
Retirement
• What happens if we inadvertently execute an instruction that should not have been executed (e.g., after a branch misprediction) or execute an instruction incorrectly (e.g., it raises an exception)?
• Need to flush all bad instructions and make it look as if they never executed.
• And then start executing from the correct point.
Retirement using Reorder Buffer
[Figure: Reorder Buffer holding instructions in program order between a head pointer and a tail pointer, receiving values from the data bus]
An instruction that reaches the head and executes without exception can be safely retired.
• Flushing in-flight instructions is easy – clear out RS and ROB
• Recovering RAT state is hard. That’s where the ROB comes in.
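A minimal sketch of in-order retirement from a reorder buffer. It uses a `deque` in place of a fixed-size circular buffer with explicit head and tail pointers, and the class name, entry fields, and flush-on-exception behavior are simplifying assumptions for illustration.

```python
from collections import deque

class ReorderBuffer:
    def __init__(self):
        self.entries = deque()  # left end is the head, right end is the tail

    def allocate(self, name):
        entry = {"name": name, "done": False, "exception": False}
        self.entries.append(entry)  # insert at the tail, in program order
        return entry

    def retire(self):
        """Retire completed instructions from the head.

        Stops at the first unfinished instruction. If the head finished
        with an exception, it and all younger instructions are flushed.
        """
        retired = []
        while self.entries and self.entries[0]["done"]:
            if self.entries[0]["exception"]:
                self.entries.clear()
                break
            retired.append(self.entries.popleft()["name"])
        return retired
```

An instruction that finishes early simply waits in the buffer: it cannot retire until everything older has retired, so the architectural state only ever reflects a clean prefix of the program.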
Putting it all together
Register Alias Table
Reservation Stations
FU FU
Reorder Buffer
Memory Disambiguation
1. MUL Ra, Rb -> Rm
2. ADD Rc, Rd -> Rn
3. ST Rm -> 0(Rn)
4. LD 0(Rs) -> Rm
5. ADD Rt, Rv -> Rn
6. DIV Rm, Rn -> Ry
[Dataflow graph: 1, 2 -> 3; 4, 5 -> 6; is there an edge 3 -> 4 ???]
Depends if Rn == Rs
Conceptual Memory Order Buffer
[Figure: buffer of loads/stores in program order; each entry has L/S, Addr, and Value fields, with valid (V) bits]
• Stores write into the buffer and pass to memory only after they reach the head and are retired.
• What about loads?
– Could go in order (highly conservative)
– Could wait until all previous unknown store addresses are known (not so conservative)
– Could go as soon as address is known (optimistic)
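The three load policies can be sketched as a single predicate. The policy names, the function `load_may_issue`, and the representation of older stores as a list of addresses (with `None` for an address not yet computed) are assumptions made for illustration.

```python
def load_may_issue(policy, load_addr, older_stores, older_loads_done):
    """Decide whether a load may access memory now.

    older_stores: addresses of older, not-yet-retired stores in program
    order, with None for an address that has not been computed yet.
    older_loads_done: True when every older load has completed.
    """
    if load_addr is None:
        return False  # every policy needs the load's own address
    if policy == "in_order":
        # Highly conservative: all older memory operations must be finished.
        return older_loads_done and not older_stores
    if policy == "wait_for_store_addresses":
        # Not so conservative: no older store address may still be unknown.
        return all(addr is not None for addr in older_stores)
    if policy == "optimistic":
        # Go as soon as the address is known; conflicts are caught later.
        return True
    raise ValueError(f"unknown policy: {policy}")
```

The optimistic policy trades a later check (and possible recovery) for earlier loads, while the two more conservative policies never need to undo a load.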
The P6 Execution Microarchitecture
[making dynamic scheduling work at wide issue]
[Figure: Fetch/Decode and Renaming (in-order), Scheduling/Execution and Memory (out-of-order), Retirement (in-order)]
The P6 Register Alias Table
[Figure: each RAT entry holds an ROB Entry Number and an RRF Valid bit. Inputs: Srcs for μop0, μop1, and μop2; Dests for μops, paired with entries from the ROB Allocator; (Dest, ROB entry #s) from retire. Output: physical sources.]
• If the producer has already retired, the value is in the Retirement Register File (RRF Valid is 1)
• If the producer has not retired, then the value will have to be provided by the Reorder Buffer at the ROB Entry Number indicated in the RAT (RRF Valid is 0)
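A minimal sketch of the source lookup described by the two bullets above. `RATEntry`, `read_source`, the dictionary-based register files, and the register names in the usage note are hypothetical; only the RRF Valid / ROB Entry Number split comes from the slide.

```python
from dataclasses import dataclass

@dataclass
class RATEntry:
    rrf_valid: bool = True  # True: value lives in the Retirement Register File
    rob_entry: int = 0      # meaningful only when rrf_valid is False

def read_source(rat, rrf, rob_values, arch_reg):
    """Return (location, ROB entry number or None, value or None)."""
    entry = rat[arch_reg]
    if entry.rrf_valid:
        # Producer has retired: read the committed value.
        return ("RRF", None, rrf[arch_reg])
    # Producer still in flight: the ROB entry supplies the value, or None
    # if the producer has not executed yet (the entry number acts as the tag).
    return ("ROB", entry.rob_entry, rob_values.get(entry.rob_entry))
```

For example, with `rat = {"EAX": RATEntry(), "EBX": RATEntry(False, 5)}`, a read of `EAX` goes to the RRF, while a read of `EBX` is directed to ROB entry 5.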
ReOrder Buffer (ROB) Psrc Read and Pdest Write
[Figure: each ROB entry holds V, Value, Dest, and Status fields. Inputs: PSrcs for μop0, μop1, and μop2; PDests for μops from the allocator; execution results from function units. Output: values for Psrcs.]
Retirement Register File Psrc Read
[Figure: each RRF entry holds a Value field. Inputs: PSrcs for μop0, μop1, and μop2; writes from ReOrder Buffer retirement. Output: values for Psrcs.]
Issue
[Figure: RAT, RRF, ROB, and Reservation Station across three pipeline stages]
1. Rename (RAT access)
2. Register Read (also ROB allocate)
3. Issue (RS allocate)
P6 Reservation Station
[Figure: each reservation station entry holds Entry Valid, Opcode, and ROB Entry # fields, plus a tag, data, and valid (V) bit for each of Psrc0 and Psrc1]
Up to three μops per cycle are added to the reservation station.
Execution
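[Figure: the Reservation Station dispatches through Port 0, Port 1, Port 2, Port 3, and Port 4 to Integer Unit 0, Integer Unit 1, the floating-point unit, load address generation, and store address generation. Also shown: the Memory Order Buffer, the Data Cache, and the result path to the Reorder Buffer.]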
Memory Order Buffer
[Figure: each MOB entry holds L/S, Store ID, and Address fields. Inputs: ST Addr, LD Addr, and ST Data.]
• Allocation happens in order, at issue.
• Store data is buffered in the MOB until retirement of that store.
• STIDs correspond to the entry of the previous store.
• P6 rule: STs must go in order wrt other STs. LDs can go out of order wrt other LDs and STs.
• LDs go as soon as the address is ready. Clean up at retirement.
Retirement
[Figure: Reorder Buffer entries with V, Value, Dest, and Status fields; the head pointer marks the next entry to retire]
• If Status indicates all is OK, then the value is written, or committed, to the RRF. Also, the (Dest, ROB entry number) pair is sent to the RAT to potentially set the RRF Valid bit.
• If Status indicates something went wrong, then a recovery action is started.
• Up to 3 μops can be retired per cycle.
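A minimal sketch of one retirement cycle under the rules above. The list-of-dicts ROB, the convention that a RAT value of `None` means "RRF Valid", and the name `retire_cycle` are assumptions for illustration. The RRF Valid bit is set only "potentially" because a younger μop may have renamed the same register since.

```python
def retire_cycle(rob, rrf, rat, max_retire=3):
    """Retire up to max_retire μops from the ROB head in one cycle.

    rob: list of dicts in program order (index 0 is the head), each with
         keys "id", "valid", "value", "dest", "ok".
    rat: maps architectural register -> ROB entry id, or None when the
         RRF copy is valid.
    Returns "recover" if a μop's status signals a problem, else None.
    """
    for _ in range(max_retire):
        if not rob or not rob[0]["valid"]:
            return None                  # head has not finished executing
        if not rob[0]["ok"]:
            return "recover"             # start the recovery action
        uop = rob.pop(0)
        rrf[uop["dest"]] = uop["value"]  # commit the value to the RRF
        # Set RRF Valid only if the RAT still points at this ROB entry.
        if rat.get(uop["dest"]) == uop["id"]:
            rat[uop["dest"]] = None
    return None
```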
Recovery
• ROB – flush all insts.
• RS – flush all insts.
• RRF – do nothing.
• RAT – Make all entries indicate RRF valid.
• Send new PC to Fetch Mechanism
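The recovery steps map directly onto a few operations. In this sketch the ROB and RS are plain lists, a RAT value of `None` stands for "RRF valid", and `recover` is a hypothetical name; the RRF is passed in only to make explicit that it is left untouched.

```python
def recover(rob, rs, rat, rrf, correct_pc):
    """Apply the recovery steps and return the PC to fetch from."""
    rob.clear()          # ROB: flush all instructions
    rs.clear()           # RS: flush all instructions
    # RRF: do nothing; it already holds the committed architectural state.
    for reg in rat:
        rat[reg] = None  # RAT: every entry now indicates RRF valid
    return correct_pc    # new PC for the fetch mechanism
```

Resetting the RAT this way is only correct once every in-flight instruction has been discarded, since from that point all live values are the committed ones in the RRF.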
Reservation Station Alternative Designs
• Value-capture reservation stations vs. tag-only reservation stations
– Pentium 4 adjusts tags rather than moving values when retiring an instruction
– Entries must stay in the ROB longer, until they no longer need to keep the retired value visible to subsequent instructions
Other thoughts
• How many cycles for branch misprediction?
• Read Sohi and Smith for more general concepts
• Read about the MIPS 10K for details on an alternative implementation
Data Dependencies
• Read After Write (RAW) – Flow
1. MUL Ra, Rb -> Rm
3. SUB Rm, Rn -> Rx
• Write After Write (WAW) – Output
1. MUL Ra, Rb -> Rm
4. ADD Rr, Rs -> Rm
• Write After Read (WAR) – Anti
3. SUB Rm, Rn -> Rx
4. ADD Rr, Rs -> Rm
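A minimal sketch that classifies the dependence between two instructions, using the standard naming in which Write After Write is the output dependence and Write After Read is the antidependence. The `(sources, destination)` encoding and the function `classify` are illustrative assumptions, and the check ignores any instructions that sit between the pair.

```python
def classify(earlier, later):
    """Return the dependence types from `earlier` to `later`.

    Each instruction is (source registers, destination register).
    """
    e_srcs, e_dest = earlier
    l_srcs, l_dest = later
    kinds = []
    if e_dest in l_srcs:
        kinds.append("RAW (flow)")    # later reads what earlier wrote
    if e_dest == l_dest:
        kinds.append("WAW (output)")  # both write the same register
    if l_dest in e_srcs:
        kinds.append("WAR (anti)")    # later overwrites what earlier reads
    return kinds
```

Applied to the pairs above, instructions 1 and 3 give a flow dependence, 1 and 4 an output dependence, and 3 and 4 an antidependence. Only the first is a true dependence; renaming removes the other two.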