Effects of Wrong-Path Memory References in Cache-Coherent Multiprocessor Systems
Gökay Burak AKKUŞ
Cmpe511 – Computer Architecture
About the papers
R. Sendag, A. Yilmazer, J.J. Yi, and A.K. Uht. Quantifying and Reducing the Effects of Wrong-Path Memory References in Cache-Coherent Multiprocessor Systems. IPDPS, 2006.
O. Mutlu, H. Kim, D. Armstrong, and Y. Patt. Cache filtering techniques to reduce the negative impact of useless speculative memory references on processor performance. Symposium on Computer Architecture and High Performance Computing, 2004.
O. Mutlu, H. Kim, D. Armstrong, and Y. Patt. Understanding the effects of wrong-path memory references on processor performance. Workshop on Memory Performance Issues, 2004.
What is it all about?
The paper studies how wrong-path memory accesses affect cache coherence traffic, cache block state transitions, and resource utilization.
It proposes a filtering mechanism and a replacement policy to reduce these effects.
Subjects
SMPs: shared-memory multiprocessor systems
Cache coherence
Branch prediction and prefetching
Wrong paths
Cache Coherence Solutions
Snooping solution (snoopy bus):
  Send all requests for data to all processors
  Processors snoop to see if they have a copy and respond accordingly
  Requires broadcast, since caching information is at the processors
  Works well with a bus (natural broadcast medium)
  Dominates for small-scale machines (most of the market)
Directory-based schemes:
  Keep track of what is being shared in one centralized place (logically)
  Distributed memory => distributed directory for scalability (avoids bottlenecks)
  Send point-to-point requests to processors via the network
  Scale better than snooping
  Actually existed before snooping-based schemes
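The scalability difference between the two approaches can be illustrated with a rough message count. This is only a sketch under simplifying assumptions: the function names are hypothetical, replies and data transfers are ignored, and the directory is modeled as one lookup plus one forwarded request per sharer.

```python
def snoop_messages(num_procs):
    """Snooping: the request is broadcast, so every other cache must check it."""
    return num_procs - 1


def directory_messages(sharers):
    """Directory: one request to the directory, then one point-to-point
    request per processor that actually holds a copy (assumed model)."""
    return 1 + len(sharers)


# 16 processors, block cached by a single other processor (processor 3)
broadcast_cost = snoop_messages(16)
directory_cost = directory_messages({3})
print(broadcast_cost, directory_cost)
```

In this toy model the snooping cost grows with the number of processors, while the directory cost grows only with the number of sharers.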
Cache Coherence Protocols
MSI (Modified, Shared, Invalid)
MESI (Modified, Exclusive, Shared, Invalid)
MOESI (Modified, Owned, Exclusive, Shared, Invalid)
Wrong-path effects
Replacements
Writebacks
Invalidations
Cache Block State Transitions
Data/Bus Traffic and Coherence Transactions
Power Consumption
Resource Contention
Replacements
Cause: a speculatively executed load instruction on a mispredicted path
  A cache block is brought into the data cache
  One of the resident cache blocks is replaced by the new one
Writebacks
When a replacement is caused by a wrong-path reference:
  The evicted cache block may be in state M (exclusive, dirty) or O (shared, dirty)
  Before this block is removed from the cache, a writeback occurs
For MSI and MESI:
  If a requested cache block is in state M, it is written back to memory before it is sent to the requestor
  Its state is then set to S in the original owner’s cache
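The replacement and writeback effects can be sketched with a toy model of a single cache set. Everything here is an assumption made for illustration: the `CacheSet` class, plain LRU replacement, and the addresses are hypothetical and are not taken from the papers.

```python
from collections import OrderedDict

DIRTY_STATES = {"M", "O"}  # states whose data must be written back on eviction


class CacheSet:
    """One set of a set-associative cache with LRU replacement (toy model)."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # address -> coherence state, oldest first
        self.writebacks = 0

    def access(self, addr, state="S"):
        """Touch addr; on a miss in a full set, evict the LRU block.
        Returns the evicted address, or None if nothing was evicted."""
        if addr in self.blocks:
            self.blocks.move_to_end(addr)  # hit: mark as most recently used
            return None
        victim = None
        if len(self.blocks) == self.ways:
            victim, victim_state = self.blocks.popitem(last=False)
            if victim_state in DIRTY_STATES:
                self.writebacks += 1  # dirty victim costs a writeback
        self.blocks[addr] = state
        return victim


s = CacheSet(ways=2)
s.access(0x100, state="M")  # correct-path store leaves a dirty block
s.access(0x200, state="S")
evicted = s.access(0x300)   # pretend this load is on a mispredicted path
print(hex(evicted), s.writebacks)
```

The wrong-path load evicts the dirty block at 0x100, so a load that never commits still costs one replacement and one writeback.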
Invalidations
Assume the MOESI protocol
A wrong-path load instruction accesses a cache block that has been modified by another processor
  The owner sets the block's state to O
  The requestor gets the block in state S
If the owner then needs to write to that block:
  It changes the state from O to M
  It then invalidates all other copies
Cache Block State Transitions
Two extra state transitions occur in the owner’s cache
When a modified block is requested:
  The state changes from M to O
When the owner modifies that block again:
  The state returns to M
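The MOESI sequence described above (M to O on a remote read, then O back to M with an invalidation on the owner's next write) can be traced with a small sketch. The function names and the dictionary-based state table are hypothetical; only the two-processor case from the slides is modeled.

```python
def remote_read(states, owner, requestor):
    """A load from `requestor` (possibly on a wrong path) hits a block that
    `owner` holds in state M. Returns the list of state changes."""
    log = []
    if states[owner] == "M":
        states[owner] = "O"
        log.append((owner, "M->O"))
    previous = states[requestor]
    states[requestor] = "S"
    log.append((requestor, previous + "->S"))
    return log


def owner_write(states, owner):
    """The owner writes the block again: O -> M, then all other copies
    are invalidated."""
    log = []
    if states[owner] == "O":
        states[owner] = "M"
        log.append((owner, "O->M"))
    for proc, state in states.items():
        if proc != owner and state != "I":
            states[proc] = "I"
            log.append((proc, state + "->I"))
    return log


states = {"P0": "M", "P1": "I"}
read_log = remote_read(states, "P0", "P1")   # wrong-path load on P1
write_log = owner_write(states, "P0")        # P0 writes the block again
print(read_log, write_log, states)
```

The block ends up exactly where it started (M in P0, I in P1), but the wrong-path load has caused two extra transitions in the owner's cache and one invalidation.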
Data/Bus Traffic and Coherence Transactions
Additional L1 and L2 cache accesses
  Cause extra replacements, writebacks, invalidations and state transitions
  Traffic increases as a result
Snoop or directory requests also increase traffic
Power Consumption
Power consumption increases because of:
  Unnecessary snoops
  Traffic overhead
  State transition overhead
Example: filtering unnecessary snoops may reduce L2 cache power by 30% (see Moshovos et al.)
Resource Contention
Wrong-path memory accesses compete with correct-path memory accesses for the multiprocessor’s resources
Additional cache coherence transactions may cause service buffers to fill up more often
Result: an increased chance of deadlock
Simulation
SPLASH-2 benchmark suite
em3d simulation benchmark
MOSI and MOESI protocols used
16-processor SPARC v9
Statement based on experiments
Mispredicted branches are resolved before 94% of wrong-path L2 misses complete.
Therefore, whether an L2 cache miss is speculative is usually known before the block is placed into the L2 cache. [REF2]
Reducing Cache Pollution: Filtering
Filtering is applied to the L2 cache
Observation:
  If a speculatively fetched cache block is not used while it resides in the L1 cache, it is likely that the block will not be used at all, or will not be used before it is evicted from the L2 cache
Mechanism:
  All memory references made by wrong-path instructions or by the prefetcher are fetched only into the first-level cache
  The processor monitors whether they are referenced by non-speculative (correct-path) instructions
  Based on the observation above, the processor may choose not to write the block into the L2 cache, or may adopt a policy that gives lower priority to the unused speculatively fetched block
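A minimal sketch of the filtering idea, assuming the simplest variant in which an unused speculative block is dropped rather than given lower priority. The `FilteredHierarchy` class and its method names are hypothetical, and cache capacity is not modeled.

```python
class FilteredHierarchy:
    """Toy two-level hierarchy: speculative fills go to L1 only and reach
    L2 only if a correct-path instruction used them while in L1."""

    def __init__(self):
        self.l1 = {}     # address -> {"speculative": bool, "used": bool}
        self.l2 = set()  # addresses present in L2

    def fill(self, addr, speculative):
        """Bring a block in; non-speculative fills also go into L2."""
        self.l1[addr] = {"speculative": speculative, "used": not speculative}
        if not speculative:
            self.l2.add(addr)

    def correct_path_access(self, addr):
        """A non-speculative instruction references the block."""
        if addr in self.l1:
            self.l1[addr]["used"] = True

    def evict_from_l1(self, addr):
        """On L1 eviction, promote a speculative block to L2 only if used."""
        line = self.l1.pop(addr)
        if line["speculative"] and line["used"]:
            self.l2.add(addr)
        # unused speculative blocks are simply dropped (the filter)


h = FilteredHierarchy()
h.fill(0xA, speculative=True)   # wrong-path or prefetched block
h.fill(0xB, speculative=True)
h.correct_path_access(0xA)      # only 0xA turns out to be useful
h.evict_from_l1(0xA)
h.evict_from_l1(0xB)
print(sorted(h.l2))
```

Only the block that a correct-path instruction touched reaches L2, so the unused speculative block never pollutes it.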
Wrong Path Aware Replacement Policy
When a block is brought into the cache, it is marked as being on either the correct path or the wrong path
When a block needs to be evicted:
  Wrong-path blocks are evicted first, on an LRU basis if there are multiple wrong-path blocks
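The victim selection rule can be written down in a few lines. This is a sketch only: `choose_victim` is a hypothetical helper, and it assumes the set's blocks are available in LRU order together with the set of addresses marked as wrong-path. Falling back to plain LRU when no wrong-path block exists is an assumption, since the slides do not state that case.

```python
def choose_victim(lru_order, wrong_path):
    """Pick the block to evict from one cache set.

    lru_order:  addresses ordered from least to most recently used
    wrong_path: set of addresses that were marked wrong-path when filled
    """
    for addr in lru_order:
        if addr in wrong_path:
            return addr      # least recently used wrong-path block
    return lru_order[0]      # assumed fallback: plain LRU


victim_a = choose_victim([1, 2, 3, 4], wrong_path={3, 4})
victim_b = choose_victim([1, 2, 3, 4], wrong_path=set())
print(victim_a, victim_b)
```

In the first call block 3 is evicted even though blocks 1 and 2 are older, because wrong-path blocks go first.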
Performance Evaluation
Conclusions & Criticism
IPC (instructions per cycle) can be used as the metric
In some cases wrong-path execution positively affects overall performance
  mcf, parser, and perlbmk
In some cases it has a significantly negative effect
  vpr and gcc
To model or not to model?
  Matters especially for future systems with longer memory interconnect latencies and processors with larger instruction windows
The real effect: cache pollution
  Especially in the SMP case
For a workload with many cache-to-cache transfers, wrong-path memory references can significantly affect the coherence actions
The proposed solutions have not yet been studied in depth
References
R. Sendag, A. Yilmazer, J.J. Yi, and A.K. Uht. Quantifying and Reducing the Effects of Wrong-Path Memory References in Cache-Coherent Multiprocessor Systems. IPDPS, 2006.
O. Mutlu, H. Kim, D. Armstrong, and Y. Patt. Cache filtering techniques to reduce the negative impact of useless speculative memory references on processor performance. Symposium on Computer Architecture and High Performance Computing, 2004.
O. Mutlu, H. Kim, D. Armstrong, and Y. Patt. Understanding the effects of wrong-path memory references on processor performance. Workshop on Memory Performance Issues, 2004.