EENG449b/SavvidesLec 4.1
1/22/04
January 22, 2004
Prof. Andreas Savvides
Spring 2004
http://www.eng.yale.edu/courses/eeng449bG
EENG 449bG/CPSC 439bG Computer Systems
Lecture 3
Pipelining Part II
Announcements
• Project groups and group meetings
• Project topics
– A 1-page project proposal is due next Friday, Jan 30 (email it to me)
• Project proposal should include:
– 1 paragraph project overview. This describes what your project will do.
– 1 paragraph describing the specific tasks that you need to do
» E.g., read papers, install tools, learn some special programming language or hardware
– 1 paragraph on what resources you need for your project
» E.g., are you using any special hardware?
» Do you have access to lab/hardware/software?
Instruction Formats Review
Implementing a MIPS Pipeline
We are developing a subset of the MIPS pipeline supporting:
– Load/store word
– Branch equal zero
– Integer ALU operations
• Remember that MIPS has register-register ALU instructions (e.g., Add R1, R2, R3)
• Attention: in the homework you will have to redesign the pipeline for register-memory ALU instructions (e.g., Add R1, R2, (R3))!
MIPS Datapath Review
Instruction fetch (IF):
IR ← Mem[PC];
NPC ← PC + 4;
MIPS Datapath Review
Instruction decode/register fetch (ID):
A ← Regs[rs];
B ← Regs[rt];
Imm ← sign-extended immediate field of IR;
MIPS Datapath Review
Execution/effective address (EX), one of:
Memory ref: ALUOutput ← A + Imm;
Reg-Reg ALU: ALUOutput ← A func B;
Reg-Imm ALU: ALUOutput ← A op Imm;
Branch: ALUOutput ← NPC + (Imm << 2); Cond ← (A == 0)
MIPS Datapath Review
Memory access/branch completion (MEM):
Mem ref: LMD ← Mem[ALUOutput]; or Mem[ALUOutput] ← B;
Branch: if (cond) PC ← ALUOutput
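The register transfers on the datapath review slides can be traced in software. Below is a minimal sketch, assuming a hypothetical decoded-instruction format (a Python dict with `kind`, `rs`, `rt`, `rd`, `imm`, `func` keys) and an unpipelined machine that completes one instruction per call. The `Machine` class and `step` function are illustrative names, and the write-back step is the usual one for this datapath (LMD for loads, ALUOutput for ALU operations).

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Hypothetical machine state; names are illustrative, not from the slides."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)
    imem: dict = field(default_factory=dict)  # byte address -> decoded instruction
    dmem: dict = field(default_factory=dict)  # byte address -> data word


def step(m):
    """Run one instruction through IF, ID, EX, MEM, WB with no overlap."""
    # IF: IR <- Mem[PC]; NPC <- PC + 4
    ir = m.imem[m.pc]
    npc = m.pc + 4

    # ID: A <- Regs[rs]; B <- Regs[rt]; Imm <- immediate (assumed already sign-extended)
    a = m.regs[ir.get("rs", 0)]
    b = m.regs[ir.get("rt", 0)]
    imm = ir.get("imm", 0)

    # EX: one of four cases, selected by instruction type
    kind = ir["kind"]
    cond = False
    if kind in ("load", "store"):
        alu_output = a + imm              # effective address
    elif kind == "alu":
        alu_output = ir["func"](a, b)     # register-register
    elif kind == "alui":
        alu_output = ir["func"](a, imm)   # register-immediate
    elif kind == "beqz":
        alu_output = npc + (imm << 2)     # branch target
        cond = (a == 0)
    else:
        raise ValueError(f"unsupported instruction kind: {kind}")

    # MEM: memory access or branch completion
    lmd = None
    m.pc = npc
    if kind == "load":
        lmd = m.dmem.get(alu_output, 0)
    elif kind == "store":
        m.dmem[alu_output] = b
    elif kind == "beqz" and cond:
        m.pc = alu_output

    # WB: write LMD (load) or ALUOutput (ALU operation) to the register file
    if kind == "load":
        m.regs[ir["rt"]] = lmd
    elif kind == "alui":
        m.regs[ir["rt"]] = alu_output
    elif kind == "alu":
        m.regs[ir["rd"]] = alu_output
    m.regs[0] = 0  # R0 is hard-wired to zero in MIPS
```

Each local variable (`ir`, `npc`, `a`, `b`, `imm`, `alu_output`, `lmd`) plays the role of one temporary register on the slides. In the pipelined version these become fields of the pipeline registers between stages.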
MIPS Basic Pipeline
Data needs to be written into the registers at the end of each cycle
The value written back depends on the instruction type (load or ALU operation): LMD or ALUOutput
Events at every pipe stage
Hazards Review
From the previous lecture we know the situations that would cause incorrect execution:
• Structural hazards
• Data hazards
• Control hazards
Three Generic Data Hazards
• Read After Write (RAW): InstrJ tries to read an operand before InstrI writes it
I: add r1,r2,r3
J: sub r4,r1,r3
• Caused by a “data dependence” (in compiler nomenclature). This hazard results from an actual need for communication.
Three Generic Data Hazards
• Write After Read (WAR): InstrJ writes an operand before InstrI reads it
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7
• Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.
• Can’t happen in the MIPS 5-stage pipeline because:
– All instructions take 5 stages, and
– Reads are always in stage 2, and
– Writes are always in stage 5
Three Generic Data Hazards
• Write After Write (WAW): InstrJ writes an operand before InstrI writes it
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
• Called an “output dependence” by compiler writers. This also results from the reuse of the name “r1”.
• Can’t happen in the MIPS 5-stage pipeline because:
– All instructions take 5 stages, and
– Writes are always in stage 5
• We will see WAR and WAW in later, more complicated pipes
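The three hazard classes come down to comparing destination and source register names between an earlier instruction I and a later instruction J. Below is a minimal sketch, assuming three-operand instructions written destination first, as in the slide examples. The helper names `parse` and `classify` are hypothetical. It reports dependences only; whether a dependence turns into a real hazard depends on the pipeline, as noted for WAR and WAW.

```python
def parse(text):
    """Split 'add r1,r2,r3' into (opcode, destination, sources)."""
    opcode, operands = text.split()
    dest, *sources = operands.split(",")
    return opcode, dest, tuple(sources)


def classify(first, second):
    """Return the dependences from `first` to the later instruction `second`."""
    _, d1, s1 = parse(first)
    _, d2, s2 = parse(second)
    found = set()
    if d1 in s2:
        found.add("RAW")  # J reads what I writes (data dependence)
    if d2 in s1:
        found.add("WAR")  # J writes what I reads (anti-dependence)
    if d1 == d2:
        found.add("WAW")  # J writes what I writes (output dependence)
    return found


# The I/J pairs from the three slides
print(classify("add r1,r2,r3", "sub r4,r1,r3"))  # {'RAW'}
print(classify("sub r4,r1,r3", "add r1,r2,r3"))  # {'WAR'}
print(classify("sub r1,r4,r3", "add r1,r2,r3"))  # {'WAW'}
```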
MIPS Basic Pipeline
Instruction issued
IF ID EX MEM WB
Data Hazards can be detected here
Hardware Hazard Detection
• Figure A.20
Logic to Detect Load Interlocks
• Figure A.21
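The figure is not reproduced here, so the following is only a sketch of the standard load-use check, not a copy of it. It assumes the instruction in ID/EX and the one in IF/ID are given as hypothetical dicts with `kind`, `rs`, and `rt` fields, and that a load writes register `rt`. The function name and the grouping of instruction kinds are illustrative assumptions.

```python
# Instruction kinds whose only compared source register is rs (assumed grouping).
# A store also reads rt, but as data for MEM, so it is not compared here.
READS_RS_ONLY = {"load", "store", "alui", "branch"}


def needs_load_interlock(id_ex, if_id):
    """True if the instruction in ID must stall behind a load now in EX."""
    if id_ex is None or if_id is None:
        return False
    if id_ex["kind"] != "load":
        return False
    loaded = id_ex["rt"]  # register the load will write
    if if_id["kind"] == "alu":  # register-register: reads rs and rt
        return loaded in (if_id["rs"], if_id["rt"])
    if if_id["kind"] in READS_RS_ONLY:
        return loaded == if_id["rs"]
    return False


load_r1 = {"kind": "load", "rs": 2, "rt": 1}     # load r1 from an address based on r2
use_r1 = {"kind": "alu", "rs": 1, "rt": 3}       # next instruction reads r1
print(needs_load_interlock(load_r1, use_r1))     # True
```

The check runs in ID because the loaded value is not available until the end of MEM, one cycle too late for forwarding to the next instruction's EX stage.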
Forwarding of Results to the ALU
Mem output
ALU output
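The choice between the ALU output and the memory output as a forwarding source can be written as a small selection function. This is a sketch under stated assumptions: the pipeline registers EX/MEM and MEM/WB are modeled as hypothetical dicts with a `dest` register number and an `is_load` flag, register 0 is never forwarded, and the newest matching result wins. The function and constant names are illustrative.

```python
STALL = "stall (load-use)"


def select_alu_input(src, ex_mem, mem_wb):
    """Choose where the ALU operand for register `src` should come from."""
    if src != 0:
        # Newest result first: the instruction one ahead, now in MEM
        if ex_mem is not None and ex_mem["dest"] == src:
            if ex_mem["is_load"]:
                return STALL  # loaded data not ready yet; interlock must handle it
            return "EX/MEM.ALUOutput"
        # Then the instruction two ahead, now in WB
        if mem_wb is not None and mem_wb["dest"] == src:
            return "MEM/WB.LMD" if mem_wb["is_load"] else "MEM/WB.ALUOutput"
    return "register file"


# add r1,...  followed immediately by an instruction reading r1
print(select_alu_input(1, {"dest": 1, "is_load": False}, None))  # EX/MEM.ALUOutput
# load r1,... followed two instructions later by a reader of r1
print(select_alu_input(1, {"dest": 5, "is_load": False},
                       {"dest": 1, "is_load": True}))            # MEM/WB.LMD
```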
Control Hazards Revisited
A branch causes a 3-cycle stall in the 5-stage pipeline
Branch Instruction    IF    ID    EX    MEM   WB
Branch Successor+1          IF    stall stall IF    ID    EX    MEM   WB
Branch Successor+2                                  IF    ID    EX    MEM   WB
Branch Successor+3                                        IF    ID    EX    MEM   WB
Higher overhead than data hazards…
Can HW changes improve that? YES!
• Try to make an early decision whether a branch is taken or not.
Improved Pipeline – Dealing with Branches
Additional adder in ID stage
Write the PC faster
Can detect branch hazard 2 cycles earlier
Improved Pipeline – Dealing with Branches
Additional adder in ID stage
Write the PC faster
Note the change of order in the text! Figure A.11 says a branch hazard would stall for 1 cycle. This is after the optimization in Figure A.24.
Reducing Branch Penalties
1. Freeze the pipeline until the outcome of a branch instruction is known
2. Treat every branch as not taken
• You have to be careful about how to restore the state of the pipeline to the correct place
3. Treat every branch as taken
• May make sense for some machines where the branch target address is known before the outcome
4. Delayed branch
• Execute some instructions until the outcome is known (branch-delay slots)
Branch-Delay Slots
On a machine that needs n cycles before a branch outcome is known:
branch instruction
sequential successor 1
sequential successor 2
……
sequential successor n
The compiler needs to decide on valid and useful successors.
Typically most processors have 1 delay slot
Limitations of branch delay:
• Restrictions on branch delay instructions
• Ability to predict branch outcome at compile time
– Most hardware supports nullifying branches, which gives the compiler more flexibility. It can schedule the instruction and later cancel its effects without violating program correctness
Delayed Branch
• Where to get instructions to fill the branch delay slot?
– Before branch instruction
– From the target address: only valuable when branch taken
– From fall through: only valuable when branch not taken
– Canceling branches allow more slots to be filled
• Compiler effectiveness for single branch delay slot:
– Fills about 60% of branch delay slots
– About 80% of instructions executed in branch delay slots are useful in computation
– About 50% (60% x 80%) of slots usefully filled
• Delayed branch downside: less effective with 7-8 stage pipelines and multiple instructions issued per clock (superscalar), where a single delay slot no longer covers the branch delay
Scheduling Branch Delay
Independent instruction
Cannot be used
Preferred when branch taken w/ high prob
Performance of Branch Schemes
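Assuming an ideal CPI of 1:
$$\text{Pipeline speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles from branches}}$$
$$\text{Pipeline stall cycles from branches} = \text{Branch frequency} \times \text{Branch penalty}$$
$$\text{Pipeline speedup} = \frac{\text{Pipeline depth}}{1 + \text{Branch frequency} \times \text{Branch penalty}}$$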
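The speedup relation on this slide, pipeline depth divided by (1 + branch frequency × branch penalty), can be checked with numbers. In the sketch below the 5-stage depth and the 3-cycle and 1-cycle penalties come from the earlier slides; the 20% branch frequency is an assumed value for illustration only, and the function name is hypothetical.

```python
def pipeline_speedup(depth, branch_frequency, branch_penalty):
    """Speedup over an unpipelined machine, assuming an ideal CPI of 1."""
    stall_cycles_from_branches = branch_frequency * branch_penalty
    return depth / (1 + stall_cycles_from_branches)


DEPTH = 5                # 5-stage MIPS pipeline
BRANCH_FREQUENCY = 0.20  # assumed: 20% of instructions are branches

# 3-cycle penalty: branch resolved late, as in the basic pipeline
basic = pipeline_speedup(DEPTH, BRANCH_FREQUENCY, 3)
# 1-cycle penalty: branch resolved in ID with the additional adder
improved = pipeline_speedup(DEPTH, BRANCH_FREQUENCY, 1)

print(f"basic pipeline:    {basic:.2f}x")     # about 3.12x
print(f"improved pipeline: {improved:.2f}x")  # about 4.17x
```

Under these assumed numbers, moving the branch decision into ID recovers roughly one full unit of speedup, which is why the extra adder is worth its hardware cost.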
Challenges in Pipeline Implementation
Exceptions: situations that can disrupt the in-order execution of instructions (interrupt, fault, exception)
• I/O device request
• Invoking an OS service from a user program
• Breakpoint
• Integer arithmetic overflow or FP arithmetic anomaly
• Page fault (not in main memory)
• Misaligned memory access, etc.
Exceptions Requirements
• Synchronous vs. asynchronous
• User requested vs. coerced
• User maskable vs. user non-maskable
• Within vs. between instructions
• Resume vs. terminate
Major challenges:
• Exceptions happening within instructions
• Exceptions after which execution must be restarted, as in the case of a page fault
MIPS Exceptions
Pipeline stage    Problem exceptions
IF                Page fault on instruction fetch; misaligned memory access; memory protection violation
ID                Undefined or illegal opcode
EX                Arithmetic exception
MEM               Page fault on data fetch; misaligned memory access; memory protection violation
WB                None
What’s next?
Next lecture:
– MIPS FP pipeline & dynamically scheduled pipelines
– An embedded processor architecture: ARM
Lecture 6:
– Sensor networks and applications
– The connection between architecture and networks