EENG449b/SavvidesLec 10.1
2/20/04
February 12, 2004
Prof. Andreas Savvides
Spring 2004
http://www.eng.yale.edu/courses/eeng449bG
EENG 449bG/CPSC 439bG Computer Systems
Lecture 10
Instruction Level Parallelism I
Announcements
• Homeworks returned today, solutions available from the TA
• Midterm next Thursday
– Chapters 1, 2, Appendix A and 2 papers
» Paper on choosing a DSP processor
» Paper & Lecture on Dynamic Voltage Scaling
• Lab office hours tomorrow 12:00 – 1:30
– Stop by AKW000 if you have problems starting your projects on the motes or OKI boards
• Homework solutions available from the TA
Instruction Level Parallelism
• Reading for this lecture: Chapter 3, pages 172 – 196
• Chapter 3: ILP in hardware
• Recall:
Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls
ILP tries to minimize these terms through the overlapped execution of instructions
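As a worked illustration of the CPI breakdown, the sketch below adds per-instruction stall contributions to an ideal CPI. The function name `pipeline_cpi` and all of the stall figures are made-up values for illustration, not measurements from the lecture or the text.

```python
def pipeline_cpi(ideal_cpi, structural, data_hazard, control):
    """Pipeline CPI = ideal CPI + stall cycles per instruction of each kind."""
    return ideal_cpi + structural + data_hazard + control


# Hypothetical stall cycles per instruction (illustrative values only).
cpi = pipeline_cpi(ideal_cpi=1.0, structural=0.05, data_hazard=0.25, control=0.2)
print(f"Pipeline CPI = {cpi:.2f}")
```

Every stall term that ILP techniques remove moves the result closer to the ideal CPI.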
Where is the maximal gain in ILP?
• Basic block – a straight-line code sequence with no branches in except at the entry and no branches out except at the exit
• Limited amount of parallelism within a basic block
– Instructions depend on each other so they cannot be reordered
– In typical MIPS programs the dynamic branch frequency is between 15% and 25%, so only about 4 – 7 instructions execute between a pair of branches
• Need to exploit parallelism across multiple blocks
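The 4 – 7 figure follows from simple arithmetic: if a fraction f of the executed instructions are branches, a branch shows up roughly once every 1/f instructions. A minimal sketch, with `instructions_per_branch` as a hypothetical helper name:

```python
def instructions_per_branch(branch_frequency):
    """Average number of dynamic instructions per branch (the branch included)."""
    return 1.0 / branch_frequency


for freq in (0.15, 0.25):
    size = instructions_per_branch(freq)
    print(f"{freq:.0%} branches: about {size:.1f} instructions per branch")
```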
Loops : an example for parallelism
for (i=1; i <= 1000; i=i+1)
x[i] = x[i] + y[i];
• Loop iterations can overlap – loop level parallelism
• Main technique – loop unrolling
– Can be done either in hardware or software
• So what kind of dependencies do we need to worry about?
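To make loop unrolling concrete, here is a minimal software sketch of the loop above unrolled four times. It is written in Python with 0-based indexing rather than the 1-based C-style loop on the slide, and the function names and the unroll factor of 4 are illustrative choices.

```python
def add_rolled(x, y):
    """Original loop: one element per iteration (x[i] = x[i] + y[i])."""
    for i in range(len(x)):
        x[i] = x[i] + y[i]


def add_unrolled_by_4(x, y):
    """Same loop unrolled four times; a cleanup loop handles leftover elements."""
    n = len(x)
    i = 0
    while i + 4 <= n:
        # The four statements touch different elements, so they are independent.
        x[i] = x[i] + y[i]
        x[i + 1] = x[i + 1] + y[i + 1]
        x[i + 2] = x[i + 2] + y[i + 2]
        x[i + 3] = x[i + 3] + y[i + 3]
        i += 4
    while i < n:  # remainder when n is not a multiple of 4
        x[i] = x[i] + y[i]
        i += 1


a = list(range(10))
b = list(range(10))
y = [2] * 10
add_rolled(a, y)
add_unrolled_by_4(b, y)
print(a == b)
```

The unrolled body executes one loop branch per four element updates, which leaves more independent work between branches.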
Data Dependences
• An instruction j is data dependent on instruction i if either:
– Instruction i produces a result that may be used by instruction j, or
– Instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i
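The two-part definition can be sketched in code: first find direct producer/consumer pairs, then chain them. The `(dest, sources)` tuple encoding, the function names, and the three-instruction program are assumptions made for this illustration; only register operands are considered.

```python
def direct_dependences(program):
    """Return {j: set of i} where instruction i produces a value that j reads.

    program is a list of (dest, sources) tuples in program order.
    """
    last_writer = {}  # register name -> index of its most recent writer
    deps = {}
    for j, (dest, sources) in enumerate(program):
        deps[j] = {last_writer[r] for r in sources if r in last_writer}
        if dest is not None:
            last_writer[dest] = j
    return deps


def all_dependences(program):
    """Add chained dependences: if j depends on k and k on i, j depends on i."""
    deps = direct_dependences(program)
    for j in sorted(deps):  # entries for earlier instructions are already complete
        for k in list(deps[j]):
            deps[j] |= deps[k]
    return deps


# Hypothetical chain: each result feeds the next instruction.
program = [
    ("F0", ["F2", "F4"]),    # 0: F0 <- F2 op F4
    ("F6", ["F0", "F8"]),    # 1: reads F0 from instruction 0
    ("F10", ["F6", "F12"]),  # 2: reads F6 from instruction 1
]
print(all_dependences(program))
```

Instruction 2 never reads F0 directly, yet it depends on instruction 0 through instruction 1.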
Data Dependences
Data dependences are properties of programs
Whether a dependence results in a detected hazard, and whether that hazard causes a stall, are properties of the pipeline organization
A dependence can be overcome by:
• Maintaining the dependence but avoiding the hazard
• Transforming the code to eliminate the dependence
Detecting Data Dependences
• Data values can flow through registers or memory
• Data dependences that flow through registers are easy to detect
– Register names are the same so it is easy to check
– More complicated when branches intervene
• Data dependences are harder to detect in memory
– 100(R4) and 20(R6) may refer to the same memory location!
– Crucial aspect to consider in compiler techniques
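A small sketch of why memory dependences are hard to see: two references that look different can resolve to the same address once the base registers are known. The register contents below are hypothetical, and the helper names are illustrative.

```python
def effective_address(offset, base_value):
    """MIPS-style displacement addressing: address = offset + base register."""
    return offset + base_value


def same_location(offset_a, base_a, offset_b, base_b):
    """True when two references resolve to the same address at run time."""
    return effective_address(offset_a, base_a) == effective_address(offset_b, base_b)


# Hypothetical register contents; a compiler usually cannot know these values.
R4, R6 = 1000, 1080
print(same_location(100, R4, 20, R6))  # 100(R4) versus 20(R6)
```

With these values both references name address 1100, so a store to one followed by a load from the other carries a true data dependence that the register names do not reveal.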
Name Dependences
• Name dependence: two instructions use the same register or memory location (a name), without any flow of data between the instructions associated with that name
Types of name dependences (instruction i precedes instruction j in program order):
• Antidependence – instruction j writes a register or memory location that instruction i reads
• Output dependence – instruction i and instruction j write the same memory location or register
Name dependences are not true data dependences
• Just change the names – register renaming – can be done by the hardware or the compiler
Data Hazards (Revisited)
• A hazard exists when the pipeline could change the order of access to an operand involved in a dependence. Instruction i precedes instruction j in program order:
Read After Write (RAW) – j tries to read a source before i writes it – true data dependence, program order must be preserved
Write After Write (WAW) – j tries to write an operand before it is written by i – output dependence. Can only happen in pipelines that write in more than one stage or allow an instruction to proceed while an earlier instruction is stalled
Write After Read (WAR) – j tries to write a destination before it is read by i – antidependence – mostly occurs when instructions write results early in the pipeline, or when instructions are reordered
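The three hazard types can be told apart by comparing what each instruction reads and writes. Below is a minimal sketch for register operands only; the `(dest, sources)` encoding, the function name, and the example instructions are assumptions for illustration.

```python
def classify_hazards(i, j):
    """Return the hazard types between instruction i and a later instruction j.

    Each instruction is a (dest, sources) tuple; dest may be None (e.g. a store).
    """
    dest_i, srcs_i = i
    dest_j, srcs_j = j
    hazards = set()
    if dest_i is not None and dest_i in srcs_j:
        hazards.add("RAW")  # j reads what i writes: true data dependence
    if dest_j is not None and dest_j in srcs_i:
        hazards.add("WAR")  # j writes what i reads: antidependence
    if dest_i is not None and dest_i == dest_j:
        hazards.add("WAW")  # both write the same place: output dependence
    return hazards


# Hypothetical instruction pairs, earlier instruction first.
add = ("R1", ["R2", "R3"])
print(classify_hazards(add, ("R4", ["R1", "R5"])))  # later instruction reads R1
print(classify_hazards(add, ("R2", ["R4", "R5"])))  # later instruction writes R2
print(classify_hazards(add, ("R1", ["R6", "R7"])))  # later instruction writes R1
```

The check is pairwise and ignores intervening writes, so it reports potential hazards; whether one actually occurs depends on the pipeline.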
Control Dependences
• Control dependences determine the ordering of instructions with respect to branch instructions
– Instructions should execute in correct program order
– Ex. Should not execute instructions from the then clause of an if statement if not needed
• Control dependence constraints
– An instruction that is control dependent on a branch cannot be moved before the branch
» E.g. an instruction from the then part of an if statement cannot be moved before the if
– An instruction that is not control dependent on a branch cannot be moved after the branch so that its execution is controlled by the branch
Control Dependence
• Control dependence is not the critical property to preserve
– May be willing to execute extra instructions if that does not compromise program correctness
• Need to preserve
– Exception behavior – the way exceptions are raised in a program should not be altered
– Data flow – flow of data among instructions that produce results and those that consume them
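A software analogy of the exception-behavior constraint: moving a load above the branch that guards it can raise an exception the original program never raised. The dictionary standing in for memory and both function names are hypothetical.

```python
def guarded_load(memory, address):
    """Original order: the load runs only if the branch condition allows it."""
    if address in memory:       # the branch
        return memory[address]  # control dependent on the branch
    return None


def hoisted_load(memory, address):
    """Load moved above the branch: it now runs even when it should not."""
    value = memory[address]     # may raise KeyError, an exception the original never raises
    if address in memory:
        return value
    return None


print(guarded_load({}, 8))
try:
    hoisted_load({}, 8)
except KeyError:
    print("hoisted load raised an exception")
```

Reordering of this kind is only safe if the hardware or compiler can suppress or defer the exception until the branch outcome is known.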
Dynamic Scheduling
• Statically scheduled pipelines
– When a data dependence cannot be hidden with bypassing or forwarding, the processor stalls until the hazard is cleared
• Dynamic scheduling
– Hardware reorders instruction execution to reduce the stalls while maintaining data flow and exception behavior
• Advantages
– Handles dependences not known at compile time
» Simplifies compiler design
– Allows code compiled for one pipeline to run efficiently on another
• Disadvantage – hardware complexity
Dynamically Scheduled Pipelines (Lecture 5)
• Simple pipelines result in hazards that require stalling.
• Static scheduling – compilers rearrange instructions to avoid stalls.
• Dynamic scheduling – processor executes instructions out-of-order to minimize stalls
• Dynamic scheduling requires splitting the ID stage into two stages:
– Issue – Decode instructions, check for structural hazards
– Read operands – Wait until there are no data hazards, then read operands
– Also need to know when each instruction begins and ends execution
• Requires a lot more bookkeeping! More when we discuss Tomasulo’s algorithm in chapter 3…
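The benefit of executing out of order can be seen in a toy timing model. The sketch below computes the cycle in which each instruction starts executing, with and without the in-order restriction. It is a deliberately simplified model, not a description of a real pipeline: it assumes one instruction fetched per cycle, unlimited functional units, no WAR/WAW hazards, and made-up latencies.

```python
def start_cycles(program, in_order):
    """Cycle at which each instruction begins execution.

    program: list of (dest, sources, latency) tuples in program order.
    """
    ready = {}   # register -> cycle in which its value becomes available
    starts = []
    for index, (dest, sources, latency) in enumerate(program):
        earliest = index + 1  # one instruction is fetched per cycle
        operands = max((ready.get(r, 0) for r in sources), default=0)
        start = max(earliest, operands)
        if in_order and starts:
            start = max(start, starts[-1] + 1)  # cannot pass a stalled instruction
        starts.append(start)
        ready[dest] = start + latency
    return starts


# Hypothetical latencies: 10 cycles for the divide, 2 for add and subtract.
program = [
    ("F0", ["F2", "F4"], 10),    # DIV.D F0, F2, F4
    ("F10", ["F0", "F8"], 2),    # ADD.D F10, F0, F8  (needs F0 from the divide)
    ("F12", ["F8", "F14"], 2),   # SUB.D F12, F8, F14 (independent of both)
]
print("in order:    ", start_cycles(program, in_order=True))
print("out of order:", start_cycles(program, in_order=False))
```

In the in-order case the independent SUB.D waits behind the stalled ADD.D; with out-of-order execution it starts as soon as it is fetched.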
Scoreboarding
Scoreboarding – a technique that allows out-of-order execution when resources are available and there are no data dependences – originated in the CDC 6600 in the mid-1960s.
• Scoreboard fully responsible for instruction execution and hazard detection
– Requires changes in # of functional units and latency of operations
– Needs to keep track of status of all instructions in execution
Tomasulo’s Algorithm
• Hardware-based technique for ILP
– Tracks when operands are available to avoid RAW hazards
– Introduces register renaming to avoid WAW and WAR hazards
» What does this mean?
• More sophisticated approach than the scoreboard from Appendix A
• Initially designed for the IBM 360/91
– Designed in the late 60s
– Scoreboarding + register renaming
– 4 FP registers, long memory access delays, long FP times – compiler level optimizations were limited
Register Renaming
DIV.D F0, F2, F4
ADD.D F6, F0, F8
S.D F6, 0(R1)
SUB.D F8, F10, F14
MUL.D F6, F10, F8
Where is the antidependence (WAR)?
– This is a name dependence
Where is the output dependence (WAW)?
– This is a name dependence
Where are the true data dependences (RAW)?
Getting Rid of Name Dependencies
• Assume we have 2 temporary registers S and T. The code sequence can be rewritten as:
Original                  Renamed
DIV.D F0, F2, F4          DIV.D F0, F2, F4
ADD.D F6, F0, F8          ADD.D S, F0, F8
S.D   F6, 0(R1)           S.D   S, 0(R1)
SUB.D F8, F10, F14        SUB.D T, F10, F14
MUL.D F6, F10, F8         MUL.D F6, F10, T
• Any subsequent uses of F8 should be replaced with register T
– Requires sophisticated compiler analysis since intervening branches may change the meaning of F8
– Tomasulo’s algorithm can handle renaming across branches
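Renaming can be sketched as a single pass over the code. Note that this sketch is more aggressive than the hand rewrite with S and T: it gives every destination a fresh name (T0, T1, ...), so the final value of F6 ends up in a temporary rather than in F6. The tuple encoding and both function names are assumptions for illustration.

```python
from itertools import count


def rename(program):
    """Give every destination a fresh name and redirect later reads to it."""
    fresh = (f"T{n}" for n in count())
    current = {}  # architectural register -> latest name holding its value
    renamed = []
    for op, dest, sources in program:
        new_sources = [current.get(r, r) for r in sources]
        new_dest = None
        if dest is not None:
            new_dest = next(fresh)
            current[dest] = new_dest
        renamed.append((op, new_dest, new_sources))
    return renamed


def name_dependences(program):
    """List (kind, i, j) for every WAR or WAW pair with i before j."""
    found = []
    for j, (_, dest_j, _) in enumerate(program):
        if dest_j is None:
            continue
        for i in range(j):
            _, dest_i, srcs_i = program[i]
            if dest_j in srcs_i:
                found.append(("WAR", i, j))
            if dest_j == dest_i:
                found.append(("WAW", i, j))
    return found


code = [
    ("DIV.D", "F0", ["F2", "F4"]),
    ("ADD.D", "F6", ["F0", "F8"]),
    ("S.D", None, ["F6", "R1"]),
    ("SUB.D", "F8", ["F10", "F14"]),
    ("MUL.D", "F6", ["F10", "F8"]),
]
print(name_dependences(code))          # name dependences before renaming
print(name_dependences(rename(code)))  # none should remain after renaming
```

Before renaming the checker reports the antidependences on F8 and F6 and the output dependence on F6; the true (RAW) dependences survive renaming because each read is redirected to the new name of its producer.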
Tomasulo’s Scheme for Avoiding Name Dependences
• Use Reservation Stations
– Buffer the operands of instructions waiting to issue
– Buffer each operand as soon as it is available, eliminating the need to get the operand from a register
– Operands are renamed to the names of the reservation stations, avoiding register name conflicts
– There are more reservation stations than registers
» Eliminates more hazards than the compiler
MIPS FPU with Tomasulo
Issue: Instructions are issued in order to preserve correct data flow
• If there is an empty reservation station, issue the instruction with its operands
• Else stall – structural hazard
If operands are not available, keep track of the functional units (FUs) that will produce them – register renaming
An Instruction goes through 3 basic steps
1. Issue – described in the previous slide
2. Execute
– Operands are placed in the reservation stations as they become available
– When all operands are available the instruction is executed; delaying execution this way eliminates RAW hazards
– Loads and stores have 2 execution steps
» 1. Compute the effective address and place it in the load or store buffer
» 2. Execute as soon as the memory unit is available
– No instruction begins execution until all preceding branches have been resolved, to preserve exception behavior
Step 3
• Write result
– Results are written on the common data bus (CDB)
» They end up in the corresponding registers and reservation stations
– Stores write their data to memory at this step
Things to note about Tomasulo’s Scheme
• Data structures to detect and eliminate hazards are attached to:
– Reservation stations
– Register file
– Load and store buffers
• Reservation stations act as a set of virtual registers
– More than FP registers so register renaming is possible
Reservation Station Fields
To track the state of the algorithm:
• Op – operation to perform on the source operands
• Qj, Qk – the reservation stations that will produce the source operands
• Vj, Vk – the values of the source operands
• A – holds information for the memory address calculation (initially the immediate, then the effective address)
• Busy – the reservation station and its accompanying functional unit are busy
The register file also contains a field
• Qi – the number of the reservation station whose result should be stored in this register
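The fields above map naturally onto a small data structure. The sketch below models only Op, Vj/Vk, Qj/Qk, Busy and the register-file Qi tags, and shows one issue and one common-data-bus broadcast. It omits the A field, load/store buffers and all timing, and every name in it (`ReservationStation`, `issue`, `broadcast`, the station names and register values) is an assumption for illustration rather than the text's register-level description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None  # value of first operand, once known
    vk: Optional[float] = None  # value of second operand, once known
    qj: Optional[str] = None    # station that will produce the first operand
    qk: Optional[str] = None    # station that will produce the second operand

    def ready(self):
        """An instruction may execute once neither operand is still pending."""
        return self.busy and self.qj is None and self.qk is None


def issue(station, op, dest, src_j, src_k, regs, qi):
    """Fill a free station: copy each operand value or the tag of its producer."""
    station.busy, station.op = True, op
    station.qj, station.vj = (qi[src_j], None) if src_j in qi else (None, regs[src_j])
    station.qk, station.vk = (qi[src_k], None) if src_k in qi else (None, regs[src_k])
    qi[dest] = station.name  # the destination register now waits on this station


def broadcast(tag, value, stations, regs, qi):
    """Write result: everything waiting on `tag` captures the value from the bus."""
    for s in stations:
        if s.qj == tag:
            s.vj, s.qj = value, None
        if s.qk == tag:
            s.vk, s.qk = value, None
        if s.name == tag:
            s.busy = False  # the producing station is free again
    for reg in [r for r, t in qi.items() if t == tag]:
        regs[reg] = value
        del qi[reg]


# Hypothetical register contents and station names.
regs = {"F0": 0.0, "F2": 6.0, "F4": 3.0, "F6": 0.0, "F8": 1.0}
qi = {}
div, add = ReservationStation("Mult1"), ReservationStation("Add1")
issue(div, "DIV.D", "F0", "F2", "F4", regs, qi)  # both operands available
issue(add, "ADD.D", "F6", "F0", "F8", regs, qi)  # F0 pending: waits on Mult1
waiting_before = not add.ready()
broadcast("Mult1", div.vj / div.vk, [div, add], regs, qi)
print(add, qi)
```

The ADD.D never reads F0 from the register file; it captures the value directly from the broadcast, which is how the stations act as renamed registers.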
Scoreboarding vs. Tomasulo
• No checking needed for WAR or WAW as registers are renamed
• Hazard detection logic is distributed
• Loads and stores are treated as basic functional units
• Has a larger effective register set – the reservation stations
• Exploits ILP well but requires more complex hardware
Tomasulo’s Algorithm Details
• Refer to figure 3.5 in the text for a detailed register level description of Tomasulo’s algorithm