CS359: Computer Architecture
Chapter 3 & Appendix C, Part B: ILP and Its Exploitation
Yanyan Shen
Department of Computer Science and Engineering
Shanghai Jiao Tong University
Chapter 3: ILP and Its Exploitation
Outline
3.1 Concepts and Challenges
3.2 Basic Compiler Techniques
3.3 Reducing Branch Costs with Advanced Branch Prediction
3.4 Overcoming Data Hazards with Dynamic Scheduling
3.5 Hardware-based Speculation
Background
"Instruction-level parallelism (ILP)" refers to overlapping the execution of instructions; pipelining became a universal technique by 1985
There are two main approaches to exploiting ILP:
Hardware-based dynamic approaches: used in server and desktop processors
Compiler-based static approaches: not as successful outside of scientific applications
The goal of this chapter: 1) to look at the limitations imposed by data and control hazards; 2) to investigate ways of increasing the ability of the compiler (software) and the processor (hardware) to exploit ILP
Pipeline CPI
When exploiting instruction-level parallelism, the goal is to minimize CPI
Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data hazard stalls + Control Stalls
Ideal pipeline CPI is a measure of the maximum performance attainable by the implementation
By reducing each term on the right-hand side, we decrease the overall pipeline CPI or, equivalently, improve IPC (instructions per clock)
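As a numeric sketch of the decomposition above (the stall counts below are illustrative, not taken from any measured machine):

```c
#include <assert.h>

/* Pipeline CPI = ideal CPI + structural stalls + data-hazard stalls
   + control stalls, each expressed as average cycles per instruction. */
double pipeline_cpi(double ideal, double structural,
                    double data_hazard, double control) {
    return ideal + structural + data_hazard + control;
}

/* IPC (instructions per clock) is simply the reciprocal of CPI. */
double ipc(double cpi) { return 1.0 / cpi; }
```

For example, an ideal CPI of 1.0 with 0.25 structural, 0.5 data-hazard, and 0.25 control stall cycles per instruction yields a pipeline CPI of 2.0, i.e. an IPC of 0.5.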
What is Instruction-Level Parallelism
Basic block: a straight-line code sequence with no branches in except to the entry and no branches out except at the exit
Parallelism within a basic block is limited: the average dynamic branch frequency is often between 15% and 25%, so a typical basic block contains only 3-6 instructions
Conclusion: Must optimize across branches
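A hypothetical C fragment makes the definition concrete: every branch point ends a basic block, so straight-line regions stay short.

```c
/* Hypothetical example: the loop body splits into small basic blocks
   at every branch, which is what limits within-block parallelism. */
int count_positive(const int *a, int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {   /* loop-back branch: block boundary */
        if (a[i] > 0)               /* conditional branch: block boundary */
            count++;                /* a one-statement basic block */
    }
    return count;
}
```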
Dependences and Hazards
Dependences:
Determining how one instruction depends on another is critical to determining 1) how much parallelism exists in a program and 2) how that parallelism can be exploited!
Hazards:
When hazards exist, the pipeline may need to stall in order to resolve them
Three Types of Dependences
Data dependences (also called true data dependences): data flows from one instruction to another
Name dependences (antidependence and output dependence)
Control dependence
Data Dependences
Dependencies are a property of programs
Pipeline organization determines whether a dependence results in an actual hazard being detected and whether that hazard causes a stall
A data dependence conveys:
the possibility of a hazard
the order in which results must be calculated
an upper bound on the exploitable instruction-level parallelism
Name Dependences
Two instructions use the same register or memory location, but with no flow of information between them
Not a true data dependence, but a problem when reordering instructions
Antidependence: instruction j writes a register or memory location that instruction i reads; the initial ordering (i before j) must be preserved
Output dependence: instructions i and j write the same register or memory location; the ordering must be preserved
To resolve name dependences, use register renaming techniques
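Renaming can be sketched at the source level (a hypothetical scalar example; hardware applies the same idea to physical registers):

```c
/* Before renaming, reusing the name r creates an antidependence
   (the second write to r must wait for the first read of r) and an
   output dependence (two writes to r). Renaming the second use to r2
   leaves only true (RAW) dependences, so the two computations can be
   reordered or overlapped freely. */
int renamed_sum(int a, int b, int c, int d) {
    int r  = a + b;   /* write r                 */
    int s  = r * 2;   /* read r  (true dep on r) */
    int r2 = c + d;   /* renamed: was "r" again  */
    int t  = r2 * 3;  /* read r2 (true dep on r2)*/
    return s + t;
}
```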
Control Dependence
The ordering of instruction i with respect to a branch instruction
An instruction that is control dependent on a branch cannot be moved before the branch, so that its execution would no longer be controlled by the branch
An instruction that is not control dependent on a branch cannot be moved after the branch, so that its execution would be controlled by the branch
Data Hazards
Read after write (RAW): j tries to read a source before i writes it. Program order must be preserved!
Write after write (WAW): j tries to write an operand before it is written by i
Write after read (WAR): j tries to write a destination before it is read by i, so i incorrectly gets the new value
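The three cases can be captured in a hypothetical helper that classifies the hazard between an earlier instruction i and a later instruction j from their destination and source registers (a simplified two-operand sketch, not from the text; register 0 means "none"):

```c
#include <string.h>

/* Classify the hazard between earlier instruction i and later
   instruction j, given each one's destination register and a single
   source register. Returns "RAW", "WAW", "WAR", or "none".
   RAW is checked first: it is the only true data dependence. */
const char *hazard(int i_dest, int i_src, int j_dest, int j_src) {
    if (i_dest != 0 && j_src == i_dest)  return "RAW"; /* j reads what i writes  */
    if (i_dest != 0 && j_dest == i_dest) return "WAW"; /* both write the same reg */
    if (i_src != 0 && j_dest == i_src)   return "WAR"; /* j writes what i reads   */
    return "none";
}
```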
Outline
3.1 Concepts and Challenges
3.2 Basic Compiler Techniques
3.3 Reducing Branch Costs with Advanced Branch Prediction
3.4 Overcoming Data Hazards with Dynamic Scheduling
3.5 Hardware-based Speculation
Latencies of FP operations
A compiler’s ability to perform pipeline scheduling is dependent both on the amount of ILP available in the program and on the latencies of the functional units in the pipeline
To avoid a pipeline stall, the execution of a dependent instruction must be separated from the source instruction by a distance in clock cycles equal to the pipeline latency of that source instruction
Pipeline Stalls
Example:
for (i=999; i>=0; i=i-1)
    x[i] = x[i] + s;

Loop: L.D     F0,0(R1)
      stall
      ADD.D   F4,F0,F2
      stall
      stall
      S.D     F4,0(R1)
      DADDUI  R1,R1,#-8
      stall              ;1-cycle delay: BNE depends on DADDUI
      BNE     R1,R2,Loop
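Counting issue slots and stall lines in the listing above gives the per-iteration cost (assuming one clock per issued instruction and per stall line, as the listing implies):

```c
/* Cycle count per iteration of the unscheduled loop:
   5 instructions (L.D, ADD.D, S.D, DADDUI, BNE) plus
   4 stalls (1 after L.D, 2 after ADD.D, 1 after DADDUI). */
int unscheduled_cycles(void) {
    int instructions = 5;
    int stalls = 4;
    return instructions + stalls;
}
```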
Pipeline Scheduling
Scheduled code:
Loop: L.D     F0,0(R1)
      DADDUI  R1,R1,#-8
      ADD.D   F4,F0,F2
      stall
      stall
      S.D     F4,8(R1)
      BNE     R1,R2,Loop

Only the two stalls after ADD.D remain, so each iteration now takes 7 clock cycles instead of 9
Loop Unrolling
Unroll by a factor of 4 (assume the number of elements is divisible by 4) and eliminate the now-unnecessary intermediate instructions

Loop: L.D     F0,0(R1)
      ADD.D   F4,F0,F2
      S.D     F4,0(R1)    ;drop DADDUI & BNE
      L.D     F6,-8(R1)
      ADD.D   F8,F6,F2
      S.D     F8,-8(R1)   ;drop DADDUI & BNE
      L.D     F10,-16(R1)
      ADD.D   F12,F10,F2
      S.D     F12,-16(R1) ;drop DADDUI & BNE
      L.D     F14,-24(R1)
      ADD.D   F16,F14,F2
      S.D     F16,-24(R1)
      DADDUI  R1,R1,#-32
      BNE     R1,R2,Loop
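At the source level, the unrolled loop corresponds to this C sketch (a hypothetical rendering; `add_scalar_unrolled` and `demo_total` are illustrative names, and the real payoff comes once the unrolled body is also scheduled):

```c
/* Unroll by 4: one copy of the loop overhead (decrement + branch)
   per four elements, and four independent body copies that a
   scheduler can interleave. Assumes n is divisible by 4. */
void add_scalar_unrolled(double *x, int n, double s) {
    for (int i = n - 1; i >= 0; i -= 4) {
        x[i]     = x[i]     + s;
        x[i - 1] = x[i - 1] + s;
        x[i - 2] = x[i - 2] + s;
        x[i - 3] = x[i - 3] + s;
    }
}

/* Small check helper: apply to {1,2,3,4} and return the total. */
double demo_total(double s) {
    double x[4] = {1.0, 2.0, 3.0, 4.0};
    add_scalar_unrolled(x, 4, s);
    return x[0] + x[1] + x[2] + x[3];
}
```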