COMPSYS 304
Computer Architecture
Speculation & Branching
Morning visitors - Paradise Bay, Bay of Islands
Speculation
• High Tech Gambling?
• Data Prefetch
• Cache instruction dcbt: data cache block touch
• Attempts to bring data into cache so that it will be “close” when needed
• Allows SIU to use idle bus bandwidth
• If there’s no spare bandwidth, this read can be given low priority
• Speculative because
• a branch may occur before it’s used
• we speculate that this data may be needed
dcbt is a PowerPC mnemonic; similar opcodes are found in other architectures: SPARC v9, MIPS, …
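From C, GCC and Clang expose a comparable hint as the builtin `__builtin_prefetch(addr, rw, locality)`. For a read prefetch on PowerPC targets GCC typically emits `dcbt`, and on targets without a prefetch instruction it generates no code at all. The sketch below sums an array while touching the element `PREFETCH_DIST` positions ahead; the distance of 16 and the function name are illustrative, untuned assumptions. Because the hint is only advice, the result is identical with or without it, which is what makes this kind of speculation safe.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative prefetch distance in elements (an assumption, not tuned). */
#define PREFETCH_DIST 16

/* Sum an array, hinting the element PREFETCH_DIST ahead into the cache.
   rw = 0 means "prefetch for reading"; locality = 3 means "keep it in
   all cache levels". The hint never changes the computed result. */
long sum_with_prefetch(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only form addresses that lie inside the array. */
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 3);
        sum += a[i];
    }
    return sum;
}
```

Whether the prefetch actually helps depends on the memory system; if the bus is busy, the hardware is free to treat it as low priority, as the slide describes for `dcbt`.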
Speculation - General
• Some functional units are almost always idle
• Make them do some (possibly useful) work rather than idle
• If the speculation was incorrect, results are simply abandoned
• No loss in efficiency; chance of a gain
• Researchers are actively looking at software prefetch schemes
• Fetch data well before it’s needed
• Reduce latency when it’s actually needed
• Speculative operations have low priority and use idle resources
Branching
• Expensive
• 2-3 cycles lost in pipeline
• All instructions following the branch are ‘flushed’
• Bandwidth wasted fetching unused instructions
• Stall while branch target is fetched
• We can speculate about the target of a branch
• Terminology
• Branch Target: address to which the branch jumps
• Branch Taken: control transfers to a non-sequential address (the target)
• Branch Not Taken: the next sequential instruction is executed
Branching - Prediction
• Branches can be
• unconditional: branch is always taken, eg call subroutine, return from subroutine
• conditional: branch depends on state of computation, eg has the loop terminated yet?
• Unconditional branches are simple
• New instructions are fetched as soon as the branch is recognized
• As early in the pipeline as possible
• Branch units often placed with fetch & decode stages
Branching - Branch Unit
• PowerPC 603 logical layout
Branching - Speculation
• We have the following code: if ( cond ) s1; else s2;
• Superscalar machine
• Multiple functional units
• Start executing both branches (s1 and s2)
• Keep idle functional units busy!
• The results of one branch are wrong and will be abandoned
• Processor will eventually calculate the branch condition and select which result should be retained (written back)
• MIPS R10000 - up to 4 speculative branches at once
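In C terms, executing both arms and keeping one looks like the sketch below: both results are computed before the condition is consulted, and the condition only decides which one survives. `path_s1`, `path_s2` and `eager_if` are hypothetical names for illustration. The trick is only safe because the arms have no side effects; in hardware, speculative results must likewise be held back from registers and memory until the branch resolves.

```c
#include <assert.h>

/* Hypothetical, side-effect-free bodies for the two arms of
   if ( cond ) s1; else s2;  */
static int path_s1(int x) { return 3 * x + 1; }
static int path_s2(int x) { return x / 2; }

/* Evaluate both arms first (as idle functional units would), then let the
   resolved condition select which result is retained ("written back").
   The other result is simply discarded. */
int eager_if(int cond, int x)
{
    int results[2];
    results[1] = path_s1(x);          /* speculative result of s1 */
    results[0] = path_s2(x);          /* speculative result of s2 */
    assert(cond == 0 || cond == 1);   /* resolved condition, normalised to 0/1 */
    return results[cond];
}
```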
Branching - Speculation
• MIPS R10000
• Up to 4 speculative branches at once
• Instructions are “tagged” with a 4-bit mask
• Indicates which unresolved branch instruction(s) each instruction depends on
• As soon as a branch condition is determined, mis-predicted instructions are aborted
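A minimal sketch of branch-mask tagging, loosely modelled on the 4-bit mask described above: each in-flight instruction carries one bit per unresolved branch it depends on. When branch `b` turns out to be correctly predicted, its bit is cleared everywhere; when it is mis-predicted, every instruction with that bit set is squashed. The names `inflight_t` and `resolve_branch` and the 8-entry window are illustrative assumptions, not the R10000's actual internal design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BRANCHES 4   /* one mask bit per speculated branch */
#define WINDOW 8         /* hypothetical number of in-flight instructions */

typedef struct {
    bool valid;          /* false once squashed (or never issued) */
    uint8_t mask;        /* bit b set => depends on unresolved branch b */
} inflight_t;

/* Resolve branch b. Correct prediction: clear bit b on dependent
   instructions. Mis-prediction: squash every dependent instruction.
   Returns the number of instructions squashed. */
int resolve_branch(inflight_t win[WINDOW], int b, bool mispredicted)
{
    assert(b >= 0 && b < MAX_BRANCHES);
    uint8_t bit = (uint8_t)(1u << b);
    int squashed = 0;
    for (int i = 0; i < WINDOW; i++) {
        if (!win[i].valid || !(win[i].mask & bit))
            continue;                     /* not affected by branch b */
        if (mispredicted) {
            win[i].valid = false;         /* abort: never written back */
            squashed++;
        } else {
            win[i].mask &= (uint8_t)~bit; /* no longer speculative on b */
        }
    }
    return squashed;
}
```

An instruction issued after two unresolved branches carries both bits, so a mis-prediction of the older branch also squashes everything fetched beyond the younger one.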
Branching - Prediction
• We have a sequence of instructions:
L1:  add        ; some mixture of arithmetic, load, store, etc, instructions
     lw
     sub
     brne L1    ; branch on some condition
L2:  or         ; some more arithmetic, load, store, etc, instructions
     st
• If you were asked to guess which branch should be preferred, which would you choose:
• Next sequential instruction (L2)?
• Branch target (L1)?
Branching - Prediction
• Studies show that backward branches are taken most of the time!
• Because of loops:
L1:  add        ; any mix of arith,
     lw         ; load, store, etc,
     sub        ; instructions
     brne L1    ; branch back to loop start
L2:  or         ; some more arith,
     st         ; memory, etc instructions
Branching - Prediction Rule
• A simple prediction rule: take backward branches
• Works amazingly well!
• For a loop with n iterations, this is wrong in only 1 of n cases (the final loop exit)
• A system working on this rule alone would
• detect the backward branch and
• start fetching from the branch target rather than the next instruction
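The rule reduces to a single address comparison. The sketch below (function names and addresses are hypothetical) also counts how often the rule is wrong for a loop-closing branch executed n times: it is taken n-1 times and falls through once, so the rule mispredicts exactly once, the 1-in-n figure above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "Take backward branches": a branch whose target is at a lower address
   than the branch itself (a loop-closing branch) is predicted taken;
   forward branches are predicted not taken. */
bool predict_taken(uint32_t branch_pc, uint32_t target_pc)
{
    return target_pc < branch_pc;
}

/* Mispredictions of the rule for a loop-closing branch executed n times:
   taken on every iteration except the last, when the loop exits. */
int loop_mispredictions(uint32_t branch_pc, uint32_t loop_start, int n)
{
    assert(n >= 1);
    int wrong = 0;
    for (int i = 1; i <= n; i++) {
        bool actually_taken = (i < n);   /* last iteration falls through */
        if (predict_taken(branch_pc, loop_start) != actually_taken)
            wrong++;
    }
    return wrong;
}
```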
Branching - Improving the prediction
• Static prediction systems
• Compiler can mark branches as likely to be taken or not
• Instruction fetch unit will use the marking as advice on which instruction to fetch
• Compiler is often able to give the right advice
• Loops are easily detected
• Other patterns in conditions can be recognized, eg
• Checking for EOF when reading a file
• Error checking
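In C, GCC and Clang let the programmer pass this kind of advice to the compiler with `__builtin_expect(expr, expected)`, which returns `expr` unchanged. The compiler may use it to lay out code so the likely path falls through and, on some targets, to set static prediction hints in the branch instruction; exactly what it does is target- and compiler-dependent. The `likely`/`unlikely` macros are a common convention (the Linux kernel uses them), not standard C, and `count_chars` is a hypothetical example of the EOF-check pattern above.

```c
#include <assert.h>
#include <stdio.h>

/* Conventional wrappers around the GCC/Clang builtin. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Count the characters in a stream. The EOF test is false on every
   iteration except the last, so it is marked unlikely. */
long count_chars(FILE *fp)
{
    assert(fp != NULL);
    long n = 0;
    for (;;) {
        int c = getc(fp);
        if (unlikely(c == EOF))
            break;          /* rare: end of file (or a read error) */
        n++;
    }
    return n;
}
```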
Branching - Improving the prediction
• Dynamic prediction systems
• Program history determines the most likely branch
• Branch Target Buffers - another cache!
Branching - Branch Target Buffer
• Instruction address bits [11:3] select the BTB entry
• Tag determines “hit”
• Status bits select taken/not taken
• Pentium 4: >91% prediction accuracy, 4K-entry BHT (Branch History Table)
• G4e: 2K entries
Branching - Branch Target Buffer
• BTB – just another cache
• Works on the temporal locality principle
• If this branch is taken (not taken) now, it’s likely to be taken (not taken) next time
• Replace on conflicts (newest is best)
• Any cache organization could be used
• Direct mapped, associative, set-associative
• No write-back needed
• Flushed entries are simply re-created the next time the branch executes
• Major difference from other caches
• Status bits …
Branching - Branch Target Buffer
• Status bits
• Provide hysteresis in behaviour
• Without hysteresis, a change in behaviour would cause the prediction to update immediately
• Example:
• if ( cond ) s1 else s2
• If the program takes branch s1 a few times, the BTB will predict that s1 is more likely than s2
• If s2 is then taken, usual cache behaviour suggests that the prediction should be updated to s2
but
• Program branching behaviour is a little different …
Branching - Branch Target Buffer
• Status bits
• Common branch behaviour is like this
• List of taken branches: s1 s1 s1 s1 s1 s2 s1 s1 s1 s2 s1 …
• Usually s1 is executed, occasionally s2
eg
• s2 handles errors
• s2 is the code that follows a loop (reached only on loop exit)
• ‘Standard’ cache update policies (assume the most recent will be used next) would update the prediction from s1 to s2 immediately
• This would cause many mis-predictions
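To put a number on "many mis-predictions", the sketch below applies the ‘standard’ policy (predict whatever happened last time) to the 11-branch list above. Encoding s1 as 1 and s2 as 0, starting with a prediction of s1, and the function name are assumptions. It mispredicts 4 times: each isolated s2 costs two misses, one for the s2 itself and one for the s1 that follows it.

```c
#include <assert.h>
#include <stddef.h>

/* Outcomes: 1 = s1, 0 = s2. "Most recent will be used next" policy:
   the prediction is simply the last outcome seen (no hysteresis). */
int last_outcome_misses(const int *outcomes, size_t n, int initial)
{
    assert(outcomes != NULL || n == 0);
    int prediction = initial;
    int misses = 0;
    for (size_t i = 0; i < n; i++) {
        if (outcomes[i] != prediction)
            misses++;
        prediction = outcomes[i];   /* update immediately */
    }
    return misses;
}
```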
Branching - Branch Target Buffer
• Status bits
• However, if the BTB waits until it has seen s2 a number of times before changing the prediction, the previous stream is predicted well
• So the status bits (say 2 bits) are a count of the number of correct predictions
• A correct prediction increments the count (maybe saturating at 2, ie counting to a maximum of 2)
• A mis-prediction decrements the count
• A mis-prediction when count = 0 updates the prediction
• This accommodates an occasional break from a pattern (eg s1 is usually taken) without disturbing the best prediction (take s1)
• It also handles situations where behaviour changes sometimes
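The sketch below replays the same branch list through a counter with hysteresis. It uses the common 2-bit saturating-counter formulation (states 0 to 3; states 2 and 3 predict s1), which is an assumption about the exact counting rule. Starting in the "strongly s1" state, it mispredicts only twice, once per isolated s2, compared with 4 for the last-outcome policy.

```c
#include <assert.h>
#include <stddef.h>

/* 2-bit saturating counter, states 0..3. States 2-3 predict s1 (1),
   states 0-1 predict s2 (0). Each outcome moves the state one step
   towards itself, so one contrary outcome cannot flip a strong state. */
static int predict(int state) { return state >= 2 ? 1 : 0; }

static int update(int state, int outcome)
{
    if (outcome == 1 && state < 3) return state + 1;
    if (outcome == 0 && state > 0) return state - 1;
    return state;   /* saturate at 0 or 3 */
}

int counter_misses(const int *outcomes, size_t n, int initial_state)
{
    assert(initial_state >= 0 && initial_state <= 3);
    int state = initial_state;
    int misses = 0;
    for (size_t i = 0; i < n; i++) {
        if (predict(state) != outcomes[i])
            misses++;
        state = update(state, outcomes[i]);
    }
    return misses;
}
```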
Branching - Branch Target Buffer
• Status bits - Count correct predictions
• Handles situations where behaviour changes sometimes
• Programs which move from one ‘region’ to another …
eg
• Image processing code - looking for an orange object
• Process background (non-orange) pixels,
• finds the orange thing,
• counts orange pixels for a while, then
• reverts back to background

// search for orange object in row of pixels
for (j = 0; j < width; j++) {
    if ( pixel[j].colour != orange ) {   // s1
        bg_cnt++;
    } else {                             // s2
        o_cnt++;
        if ( o_cnt > obj_width ) …       // found it!
    }
}
Branching - Branch Target Buffer
• Status bits
• Count correct predictions
• Handles situations where behaviour changes sometimes
• Programs which move from one ‘region’ to another …
• Example:
• Image processing program looking for an orange object
• Process background (non-orange) pixels,
• finds the orange thing,
• counts orange pixels for a while, then
• reverts back to background
• List of taken branches (BG = background, OR = orange):
Taken branches:  s1 s1 s1 s2 s2 s2 … s2 s1 s1 s1 s1
Region:          BG BG BG OR OR OR … OR BG BG BG BG
Prediction:      s1 s1 s1 s1 s1 s2 … s2 s2 s2 s1 s1
Correct:         …
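The prediction row above can be reproduced by replaying the taken-branch list through a 2-bit saturating counter; treating that as the exact counting rule is an assumption. The slide elides the length of the orange region, so the sketch assumes four s2 branches, and `counter_trace` is a hypothetical name. Starting in the "strongly s1" state, the counter mispredicts twice on entering the orange region and twice on leaving it, then settles on the new behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Replay outcomes (1 = s1, 0 = s2) through a 2-bit saturating counter
   (states 0..3) and record the prediction made before each branch. */
void counter_trace(const int *outcomes, int *predictions, size_t n, int state)
{
    assert(state >= 0 && state <= 3);
    for (size_t i = 0; i < n; i++) {
        predictions[i] = (state >= 2);    /* 1 = predict s1, 0 = predict s2 */
        if (outcomes[i] == 1 && state < 3)
            state++;                      /* move towards s1 */
        else if (outcomes[i] == 0 && state > 0)
            state--;                      /* move towards s2 */
    }
}
```

With the assumed region length, the recorded predictions are s1 s1 s1 s1 s1 s2 s2 s2 s2 s1 s1, matching the pattern in the Prediction row.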
Branching - Branch Target Buffer
• Status bits
• Count correct predictions
• Reasonable compromise behaviour for most situations
• Tolerates an occasional ‘error’ branch well
• Changes to a new behaviour with a small delay
• Typically about 90% correct predictions
• BTB with 2k – 4k entries
Speculation & Branching - Summary
• Data speculation
• Try to bring data ‘closer’ to CPU (ie into cache) before it’s needed
• Reduce memory access latency
• Techniques
• Special ‘touch’ instructions
• Advice to processor – fetch if resources available
• Software
• eg Dummy reference
• Instruction (Branch) speculation …
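A ‘dummy reference’ is the software fallback when no touch instruction is available: the program loads a value it does not otherwise need, purely so that the cache line holding it is fetched early. A minimal sketch, assuming 64-byte cache lines and a look-ahead of one line (both hypothetical); the `volatile` sink only stops the compiler from deleting the otherwise unused load. Unlike `dcbt`, this is an ordinary load, so it may stall the processor on a miss and must stay within the array bounds.

```c
#include <assert.h>
#include <stddef.h>

/* ints per cache line, assuming 64-byte lines */
#define LINE_INTS (64 / sizeof(int))

/* Volatile sink so the compiler cannot remove the dummy loads. */
static volatile int dummy_sink;

long sum_with_dummy_refs(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* At the start of each line, reference the next line early. */
        if (i % LINE_INTS == 0 && i + LINE_INTS < n)
            dummy_sink = a[i + LINE_INTS];
        sum += a[i];
    }
    return sum;
}
```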
Speculation & Branching - Summary
• Branches are expensive!!
• Instruction (Branch) speculation
• Execute both branches of a conditional branch
• ‘Squash’ (abandon) results from the wrong branch when the branch condition is eventually evaluated
• Compiler can also mark the most probable branch
• Branch prediction
• Simplest rule: take backward branches
• Branch Target Buffer
• Cache containing the most recent branch targets
• ‘Standard’ cache, except for
• Status bits
• Introduce hysteresis into behaviour
• Only change the prediction after repeated mis-predictions show the other branch is the better choice
Superscalar - summary
• Superscalar machines have multiple functional units (FUs), eg 2 x integer ALU, 1 x FPU, 1 x branch, 1 x load/store
• Requires complex IFU (instruction fetch unit)
• Able to issue multiple instructions/cycle (typ 4)
• Able to detect hazards (unavailability of operands)
• Able to re-order instruction issue
• Aim to keep all the FUs busy
• Typically, 6-way superscalars can achieve instruction level parallelism of 2-3