An Accurate and Detailed Prefetching
Simulation Framework for gem5
Martí Torrents, Raúl Martínez, and Carlos Molina
[email protected] Architecture Department, UPC – BarcelonaTech
Outline
• Introduction
• Current Ruby prefetching solution
• Our solution
• Implementation details
• Case study
• Conclusions
Prefetching
• Reduces memory latency
• Brings data required by the CPU into a nearer cache
• Increases the hit ratio
• Implemented in many commercial processors
• Erroneous prefetching may cause slowdown
• Simulation tools should include this capability
Current Ruby prefetching solution
[Diagram: L1 Private Cache Memory; L1 Cache controller running the MESI protocol (states M, E, S, I); Current Prefetcher; Prefetch queue]
Our solution
[Diagram: L1 Private Cache Memory; L1 Cache controller (MESI protocol in the current solution, MOESI protocol with states M, O, E, S, I in ours); Prefetch wrapper containing the Prefetch queue, the Prefetch profiler, and the Abstract Prefetcher; Specific Prefetch Engines: Tagged Prefetcher, Global History Buffer, Reference Prediction Table, and the Current Prefetch Engine]
Our solution
[Diagram: the same structure at both cache levels. The L2 Shared Cache Bank and the L1 Private Cache Memory each have a cache controller running the MOESI protocol (states M, O, E, S, I) and a Prefetch wrapper with a Prefetch queue, a Prefetch profiler, and an Abstract Prefetcher implemented by a Specific Prefetch Engine]
Implementation details: Prefetch wrapper
• Receives command line options
• Creates and initializes the prefetch engine
• Manages communication from the controller to the prefetcher
• And from the prefetcher through the prefetch queue back to the controller
• Collects statistics
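The wrapper's role can be illustrated with a minimal Python sketch. It is not the gem5 implementation: the class names, the `options` dictionary, and the `NextLineEngine` are all hypothetical and only mirror the bullets above.

```python
from collections import deque


class NextLineEngine:
    """Hypothetical engine: on a miss, propose the next cache line."""

    def __init__(self, line_size):
        self.line_size = line_size

    def observe(self, addr, hit):
        return [] if hit else [addr + self.line_size]


class PrefetchWrapper:
    """Sits between the cache controller and the prefetch engine."""

    ENGINES = {"next_line": NextLineEngine}  # option value -> engine class

    def __init__(self, options):
        # Create and initialize the engine selected by the options.
        engine_cls = self.ENGINES[options["engine"]]
        self.engine = engine_cls(options["line_size"])
        self.queue = deque()
        self.stats = {"observed_misses": 0, "total": 0}

    def notify_access(self, addr, hit):
        # Controller -> prefetcher direction.
        if not hit:
            self.stats["observed_misses"] += 1
        for candidate in self.engine.observe(addr, hit):
            self.queue.append(candidate)
            self.stats["total"] += 1

    def next_prefetch(self):
        # Prefetcher -> prefetch queue -> controller direction.
        return self.queue.popleft() if self.queue else None
```

Keeping the engine behind one lookup table means a new engine only needs a new entry in `ENGINES`; the controller-facing methods stay the same.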
Implementation details: Prefetch profiler
• Accumulates the statistics
• observed_misses – numMissesObserved
• cancelled – numDroppedPrefetches
• completed – numPrefetchAccepted
• hit
• in_cache
• late – numPartialHits/numHits
• overflowed
• page_faults – numPagesCrossed
• total – numPrefetchRequested
• unuseful
• useful
• queue_merged_requests
• generated_prefetches_per_train – streams
• numMissedPrefetchBlocks
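A profiler of this kind can be sketched as a set of named counters. The Python below is illustrative only: the `PrefetchProfiler` class and its methods are hypothetical, and the accuracy formula is an assumed definition, since the slides list the counters without defining derived metrics.

```python
from collections import Counter

# Counter names taken from the slide.
STAT_NAMES = (
    "observed_misses", "cancelled", "completed", "hit", "in_cache", "late",
    "overflowed", "page_faults", "total", "unuseful", "useful",
    "queue_merged_requests",
)


class PrefetchProfiler:
    """Accumulates prefetch statistics by name."""

    def __init__(self):
        self.counts = Counter({name: 0 for name in STAT_NAMES})

    def record(self, name, amount=1):
        if name not in STAT_NAMES:
            raise KeyError(f"unknown statistic: {name}")
        self.counts[name] += amount

    def accuracy(self):
        # Assumed definition: useful / (useful + unuseful).
        judged = self.counts["useful"] + self.counts["unuseful"]
        return self.counts["useful"] / judged if judged else 0.0
```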
Implementation details: Prefetch queue
• Collects the requests generated by the engine
• Checks each request for page faults
• Merges repeated requests
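The three queue duties can be sketched in a few lines of Python. This is a sketch under assumptions, not the gem5 code: the page and line sizes, the queue capacity, and the reading of a "page fault" as a prefetch that lands on a different page than the triggering access are all assumptions.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes
LINE_SIZE = 64    # assumed cache line size in bytes


class PrefetchQueue:
    """FIFO of pending prefetch line addresses."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.pending = OrderedDict()  # line address -> None, insertion order
        self.stats = {"page_faults": 0, "queue_merged_requests": 0,
                      "overflowed": 0}

    def push(self, trigger_addr, prefetch_addr):
        """Return True if the request was queued."""
        # Drop requests that leave the page of the triggering access.
        if trigger_addr // PAGE_SIZE != prefetch_addr // PAGE_SIZE:
            self.stats["page_faults"] += 1
            return False
        line = prefetch_addr - prefetch_addr % LINE_SIZE
        # Merge a request for a line that is already pending.
        if line in self.pending:
            self.stats["queue_merged_requests"] += 1
            return False
        if len(self.pending) >= self.capacity:
            self.stats["overflowed"] += 1
            return False
        self.pending[line] = None
        return True

    def pop(self):
        """Return the oldest pending line address, or None if empty."""
        if not self.pending:
            return None
        line, _ = self.pending.popitem(last=False)
        return line
```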
Implementation details: Abstract Prefetcher
• Works as an abstract class
• Each specific prefetcher must inherit from it
• Its virtual functions must be overridden:
– Init: initialization function; sets prefetcher size, distance, aggressiveness, etc.
– Observe request: called on each cache access; receives hit/miss, prefetch/no_prefetch, accessed address, etc.
– Allocate: called when data is allocated in the cache; receives the same information as Observe request
– Deallocate: called when a block is evicted from the cache; receives only the evicted address
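The interface above can be sketched in Python with an abstract base class. The method signatures are hypothetical (the slides name the callbacks but not their exact parameters), and the `TaggedPrefetcher` shown is a simplified textbook version, not the authors' engine: it prefetches the following lines on a miss or on the first demand hit to a prefetched line.

```python
from abc import ABC, abstractmethod


class AbstractPrefetcher(ABC):
    """Interface every specific prefetch engine must implement."""

    @abstractmethod
    def init(self, line_size, distance): ...

    @abstractmethod
    def observe_request(self, addr, hit, is_prefetch): ...

    @abstractmethod
    def allocate(self, addr, hit, is_prefetch): ...

    @abstractmethod
    def deallocate(self, addr): ...


class TaggedPrefetcher(AbstractPrefetcher):
    def init(self, line_size, distance):
        self.line_size = line_size
        self.distance = distance
        self.tagged = set()  # prefetched lines not yet used by a demand access

    def _line(self, addr):
        return addr - addr % self.line_size

    def observe_request(self, addr, hit, is_prefetch):
        line = self._line(addr)
        first_use = line in self.tagged
        self.tagged.discard(line)
        if not hit or first_use:
            return [line + i * self.line_size
                    for i in range(1, self.distance + 1)]
        return []

    def allocate(self, addr, hit, is_prefetch):
        if is_prefetch:
            self.tagged.add(self._line(addr))

    def deallocate(self, addr):
        self.tagged.discard(self._line(addr))
```

Because the base class is abstract, instantiating it directly raises `TypeError`, which catches an engine that forgets to override one of the four callbacks.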
Implementation details: L1 Cache controller
• Notifies the wrapper of:
– Cache accesses
– Cache allocations
– Cache evictions
• Reads from the Prefetch Queue
• Issues prefetches only when there are no pending loads/stores
• The protocol modification is similar to a load operation
• Very similar to the current solution
[Diagram: MOESI protocol states M, O, E, S, I]
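The rule that prefetches are issued only when no loads or stores are waiting amounts to a strict priority between two queues. A minimal Python sketch follows; the function name and the queue representation are hypothetical, and the real controller is a Ruby protocol state machine rather than a function like this.

```python
from collections import deque


def next_request(demand_queue, prefetch_queue):
    """Pick the next request for the controller to process.

    Demand loads/stores always win; a prefetch is issued only when
    the demand queue is empty.
    """
    if demand_queue:
        return ("demand", demand_queue.popleft())
    if prefetch_queue:
        return ("prefetch", prefetch_queue.popleft())
    return None
```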
Implementation details: L2 Cache controller
• Same as in the L1, but:
• L2 local hits
– Some data is invalid in L2 but allocated locally
• L1_GETS does not store in L2
– Protocol modified to store prefetch requests in L2
• The prefetch queue may generate a request for another tile
– The request is forwarded to the corresponding tile
[Diagram: MOESI protocol states M, O, E, S, I]
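Forwarding a prefetch to another tile requires knowing which tile is the home of the address. The Python sketch below assumes line-interleaved L2 banks (home tile chosen by the line address modulo the tile count) with 16 tiles and 64-byte lines; the mapping, the function names, and the defaults are illustrative assumptions, not taken from the slides.

```python
def home_tile(addr, num_tiles=16, line_size=64):
    """Assumed mapping: consecutive lines are interleaved across tiles."""
    return (addr // line_size) % num_tiles


def route_prefetch(addr, local_tile, num_tiles=16, line_size=64):
    """Decide whether a prefetch stays in this L2 bank or is forwarded."""
    dest = home_tile(addr, num_tiles, line_size)
    if dest == local_tile:
        return ("local", dest)
    return ("forward", dest)
```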
Case study: NoC-aware prefetch performance evaluation
• We tested 3 classical prefetch engines:
– Tagged Prefetcher
– Reference Prediction Table (RPT)
– Global History Buffer (GHB)
• With the gem5 simulator using:
– 16 tiled x86 CPUs
– L1 prefetchers
– Ruby memory system
– MOESI coherence protocol
– Garnet network simulator
• PARSEC 2.1 benchmarks
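As an illustration of one of the tested engines, here is a heavily simplified stride predictor in the spirit of a Reference Prediction Table. It is a sketch only: the class and its `observe` signature are hypothetical, the table is unbounded, and the classical multi-state confidence scheme is reduced to "the same non-zero stride seen twice in a row".

```python
class ReferencePredictionTable:
    """Per-instruction stride detection (simplified)."""

    def __init__(self):
        self.entries = {}  # pc -> (last_addr, stride)

    def observe(self, pc, addr):
        """Return a predicted prefetch address, or None."""
        if pc not in self.entries:
            self.entries[pc] = (addr, 0)
            return None
        last_addr, stride = self.entries[pc]
        new_stride = addr - last_addr
        self.entries[pc] = (addr, new_stride)
        # Predict only when the stride repeats and is non-zero.
        if new_stride == stride and stride != 0:
            return addr + new_stride
        return None
```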
Case study: Results
M. Torrents, R. Martínez, C. Molina. “Network Aware Performance Evaluation of Prefetching Techniques in CMPs”. Simulation Modelling Practice and Theory (SIMPAT), 2014.
Conclusions
• Prefetching is important and must be simulated
• The current solution is OK
• Our solution goes one step further:
– Easy to change or add prefetch engines
– Detailed statistics about prefetching
– Garnet can identify prefetching traffic
– Useful for statistics or traffic manipulation
• The current prefetcher can easily be included in the new solution
• The current solution is OK for non-prefetch researchers
• Our tool is better for prefetch-related research