DISSERTATION RESEARCH PLAN
Mitesh Meswani
Outline
- Dissertation Research Update
  - Previous approach and results
  - Modified research plan
  - Identifying resources
  - Identifying signatures
  - Performance counters for profiling
  - Representative tracing and validation
Previous Methodology
- Trace selection: trace the steady-state execution of the benchmark suite, using CPI to measure representativeness. One trace per benchmark.
- Simulate the traces for different SMT knob settings, recording the best setting for each pair.
- Use regression modeling techniques to generate an analytical model that predicts the best settings for a pair.
- Prove the model's effectiveness at predicting settings for traces from other benchmarks.
Recap of Previous Results
- Models using decision trees for SPEC CPU2000 and Stream
  - Prediction of SMT mode: 97.5%
  - Prediction of SMT thread priority: 83%
Modified Plan Summary
- Represent the use of relevant shared resources by a benchmark.
- Identify signatures of shared resource usage within benchmarks using performance counters.
- Use traces that represent signatures of shared resource usage and that can cover 80% of the benchmark's execution.
- Finally, identify the best SMT knob settings for the representative traces.
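One way to read the 80% coverage goal is to rank signatures by how many sampled intervals they account for and keep the most frequent ones until 80% of the intervals are covered. The sketch below assumes that reading. The function name, the `signature_counts` input, and the example counts are hypothetical and are not taken from the slides.

```python
def covering_signatures(signature_counts, target=0.80):
    """Pick the most frequent signatures until `target` of all samples is covered.

    signature_counts: dict mapping a signature label to the number of sampled
    intervals that showed it (hypothetical input format).
    Returns (chosen signatures, fraction of samples covered).
    """
    total = sum(signature_counts.values())
    if total == 0:
        return [], 0.0
    chosen, covered = [], 0
    ranked = sorted(signature_counts.items(), key=lambda kv: kv[1], reverse=True)
    for sig, count in ranked:
        if covered / total >= target:
            break
        chosen.append(sig)
        covered += count
    return chosen, covered / total


# Made-up interval counts, for illustration only.
counts = {"A": 50, "B": 30, "C": 15, "D": 5}
chosen, fraction = covering_signatures(counts)
```

With these made-up counts, the two most frequent signatures already reach the 80% target, so only those two would need representative traces.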
Shared Resources
- Shared resources (seven): TLB, cache memory (L2, L3), branch unit, FP unit, FXU, compare-register unit, branch prediction hardware (history table)
- How many resources to consider?
  - Analyze current traces to eliminate resources that contribute less than a threshold value to cycles spent in shared resources.
  - The compare-register unit is not significant.
  - The branch unit is also not significant.
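The elimination step can be sketched as a simple filter over per-resource cycle counts. This is a minimal illustration: the slides do not state the threshold, so the 5% value, the resource keys, and the cycle counts below are all assumptions.

```python
def significant_resources(cycles_by_resource, threshold_pct):
    """Keep resources whose share of shared-resource cycles is at least threshold_pct."""
    total = sum(cycles_by_resource.values())
    if total == 0:
        return {}
    return {
        name: cycles
        for name, cycles in cycles_by_resource.items()
        if 100.0 * cycles / total >= threshold_pct
    }


# Made-up cycle counts, for illustration only (they sum to 1000).
example_cycles = {
    "tlb": 120,
    "cache_l2_l3": 380,
    "fp_unit": 230,
    "fxu": 180,
    "branch_history": 70,
    "branch_unit": 12,
    "compare_register_unit": 8,
}
# The 5% threshold is an assumed value; the slides leave it unspecified.
kept = significant_resources(example_cycles, threshold_pct=5.0)
```

With these invented numbers the branch unit and the compare-register unit fall below the threshold, which leaves five resources.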
Signatures
- How many?
  - A resource may have a mild, moderate, or high contribution to cycles spent in shared resources.
  - Idea: with five resources, equal contribution would mean 100/5 = approximately 20% of cycles per resource. Using this as the basis:
    - Mild: 1% to 15%
    - Moderate: 16% to 24%
    - High: greater than 24%
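The three bands can be written as a small classifier, and a signature as one level per resource. This is a sketch under stated assumptions: the slide does not say how to treat shares below 1% or between 15% and 16%, so the "negligible" label and the boundary handling below are choices made here, and the example shares are invented.

```python
def contribution_level(share_pct):
    """Map a resource's share of shared-resource cycles to a band."""
    if share_pct < 1.0:
        return "negligible"  # assumption: the slide does not name this case
    if share_pct < 16.0:
        return "mild"        # assumption: shares between 15% and 16% count as mild
    if share_pct <= 24.0:
        return "moderate"
    return "high"


def signature(shares_pct, order):
    """A signature is the tuple of levels, one per resource, in a fixed order."""
    return tuple(contribution_level(shares_pct[name]) for name in order)


# With three levels and five resources there are at most 3**5 combinations.
# This is an upper bound: shares sum to 100%, so some combinations
# (for example, all five resources mild) cannot occur.
MAX_SIGNATURES = 3 ** 5

# Invented shares, for illustration only.
order = ("tlb", "cache", "fp_unit", "fxu", "branch_history")
shares = {"tlb": 10.0, "cache": 40.0, "fp_unit": 25.0, "fxu": 20.0, "branch_history": 5.0}
example_signature = signature(shares, order)
```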
Finding Signatures
- Profile the benchmark execution to find the cycles spent in the monitored shared resources.
- Use performance counters, sampling them periodically.
- Categorize the benchmark execution (SPEC CPU2000) into one of the possible permutations.
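The profiling loop described above can be sketched as follows: take periodic cumulative counter readings, turn consecutive readings into per-interval deltas, and tally the signature seen in each interval. The input format, the function names, and the sample values are hypothetical. The band boundaries follow the mild/moderate/high split, with the handling of shares below 1% being an assumption.

```python
from collections import Counter


def level(share_pct):
    """Band for one resource's share of cycles (boundary handling is assumed)."""
    if share_pct < 1.0:
        return "negligible"  # assumption: not named in the slides
    if share_pct < 16.0:
        return "mild"
    if share_pct <= 24.0:
        return "moderate"
    return "high"


def interval_signatures(samples, resources):
    """Tally signatures over consecutive periodic samples.

    samples: list of dicts holding cumulative stall-cycle readings per resource,
    one dict per sampling point (hypothetical format).
    """
    tally = Counter()
    for prev, cur in zip(samples, samples[1:]):
        deltas = {r: cur[r] - prev[r] for r in resources}
        total = sum(deltas.values())
        if total == 0:
            continue  # no shared-resource cycles in this interval
        sig = tuple(level(100.0 * deltas[r] / total) for r in resources)
        tally[sig] += 1
    return tally


# Invented readings for two resources, for illustration only.
resources = ("fp_unit", "fxu")
samples = [
    {"fp_unit": 0, "fxu": 0},
    {"fp_unit": 90, "fxu": 10},
    {"fp_unit": 180, "fxu": 20},
    {"fp_unit": 190, "fxu": 120},
]
tally = interval_signatures(samples, resources)
```

Here the first two intervals are dominated by the FP unit and the third by the FXU, so the tally records two distinct signatures.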
Finding Signatures (continued)
- Profiling benchmark execution:
  - Only six counters are allowed per execution.
  - What are the counts for a sample period?
    - Merge them from different executions?
    - Use the highest sampling rate?
Perf Counters to Collect Data
- Identified counters:
  - FP: completion stalls due to FPU (CMPLU_STALLS_FPU)
  - FXU: completion stalls due to FXU (CMPLU_STALLS_FXU)
- Derived counters:
  - LSU stalls = total stalls in LSU - stalls due to d-cache miss - stalls due to d-TLB miss
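The derived LSU counter is a plain subtraction of three measured values. The sketch below uses hypothetical parameter names rather than real PMU event names. Clamping at zero is an assumption added here, on the grounds that the three inputs may come from different executions and need not be exactly consistent.

```python
def lsu_stalls(total_lsu_stalls, dcache_miss_stalls, dtlb_miss_stalls):
    """LSU stalls not explained by d-cache or d-TLB misses.

    All three arguments are stall-cycle counts for the same sample period.
    The result is clamped at zero (an assumption, not from the slides).
    """
    derived = total_lsu_stalls - dcache_miss_stalls - dtlb_miss_stalls
    return max(derived, 0)


# Invented counts, for illustration only.
remaining = lsu_stalls(total_lsu_stalls=1000, dcache_miss_stalls=600, dtlb_miss_stalls=150)
```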
Perf Counters to Collect Data (continued)
- Unsolved:
  - TLB:
    - Total d-TLB misses and total i-TLB misses are known; the miss resolution sites are not known.
    - Total cycles spent accessing the d-TLB are known, but they include the cost of both hits and misses.
  - Caches:
    - L2 and L3 hits for data and instructions are known.
    - A penalty derived from them may be greater than the actual penalty, because execution overlaps misses and some misses occur down mispredicted branches.
    - Maybe use the d-cache miss penalty and i-cache miss penalty on POWER5, which are counted only if completion is stalled.
  - Branch history:
    - Affects prediction.
    - A counter is available to count cycles in which a misprediction stalls completion.
Representative Traces
- Collect traces, if required, that represent the signatures found in benchmark profiling.
- Use the performance data from simulation of single traces to verify the signatures.
- Collect data for evaluating SMT knobs on the representative traces.
Validation
- Use scientific applications to verify whether they are covered by the signatures for 80% of their execution.
- TO DO: identify test applications.
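The validation check can be sketched as the fraction of an application's sampled intervals whose signature is already in the known set, compared against the 80% goal. The function name, the per-interval signature list, and the example labels are hypothetical. Measuring "execution" in sampled intervals is an assumption.

```python
def signature_coverage(app_signatures, known_signatures):
    """Fraction of an application's sampled intervals whose signature is known."""
    if not app_signatures:
        return 0.0
    hits = sum(1 for sig in app_signatures if sig in known_signatures)
    return hits / len(app_signatures)


# Invented per-interval signature labels, for illustration only.
known = {"A", "B"}
app_intervals = ["A", "A", "B", "C", "A", "B", "D", "A", "B", "A"]
fraction = signature_coverage(app_intervals, known)
is_covered = fraction >= 0.80
```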