
DISSERTATION RESEARCH PLAN
Mitesh Meswani

Outline
- Dissertation Research Update
- Previous Approach and Results
- Modified Research Plan
- Identifying Resources
- Identifying Signatures
- Performance Counters for Profiling
- Representative Tracing and Validation

Previous Methodology
- Trace selection: trace the steady-state execution of the benchmark suite, using CPI to measure representativeness. One trace per benchmark.
- Simulate the traces for different SMT knob settings, recording the best setting for each pair.
- Use regression modeling techniques to generate an analytical model that predicts the best settings for a pair.
- Prove the model's effectiveness at predicting settings for traces from other benchmarks.

Recap of Previous Results
- Models using decision trees for SPEC CPU2000 and Stream
- Prediction of SMT mode: 97.5%
- Prediction of SMT thread priority: 83%

Modified Plan Summary
- Represent the use of relevant shared resources by a benchmark.
- Identify signatures of shared resource usage within benchmarks using performance counters.
- Use traces that represent signatures of shared resource usage and that can cover 80% of the benchmark's execution.
- Finally, identify the best SMT knob settings for the representative traces.
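One way to read the 80% coverage goal is as a selection problem: pick the most frequent signatures until they account for 80% of the profiled samples. The sketch below assumes each sample has already been labeled with a signature. The function name, the labels, and the greedy most-frequent-first strategy are illustrative assumptions, not part of the plan.

```python
from collections import Counter

def signatures_for_coverage(sample_signatures, target=0.80):
    """Greedily pick the most frequent signatures until `target` of samples is covered.

    sample_signatures: one (hashable) signature label per sample period.
    Returns (chosen signatures, fraction of samples they cover).
    """
    total = len(sample_signatures)
    if total == 0:
        return [], 0.0
    chosen, covered = [], 0
    for sig, count in Counter(sample_signatures).most_common():
        if covered / total >= target:
            break
        chosen.append(sig)
        covered += count
    return chosen, covered / total

# Hypothetical labels: 10 sample periods with three distinct signatures.
samples = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
chosen, fraction = signatures_for_coverage(samples)
```

With these made-up labels, signatures A and B together reach the 80% target, so C would not need a representative trace.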

Shared Resources
- Shared resources (seven): TLB, cache memory (L2, L3), branch unit, FP unit, FXU unit, compare-register unit, branch prediction hardware (history table).
- How many resources to consider? Analyze current traces to eliminate resources that contribute less than a threshold value to the cycles spent in shared resources.
  - The compare-register unit is not significant.
  - The branch unit is also not significant.
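The elimination step can be sketched as a simple filter on each resource's share of shared-resource cycles. The cycle counts and the 5% threshold below are made-up values for illustration only. The plan does not state the actual threshold, and the function name is hypothetical.

```python
def significant_resources(stall_cycles, threshold=0.05):
    """Return {resource: share} for resources whose share of cycles meets the threshold.

    stall_cycles: cycles attributed to each shared resource.
    threshold: minimum fraction of total shared-resource cycles (assumed value).
    """
    total = sum(stall_cycles.values())
    if total == 0:
        return {}
    return {
        name: cycles / total
        for name, cycles in stall_cycles.items()
        if cycles / total >= threshold
    }

# Hypothetical cycle counts for the seven shared resources.
example_cycles = {
    "TLB": 120,
    "Cache (L2, L3)": 400,
    "FP unit": 250,
    "FXU unit": 200,
    "Branch history": 100,
    "Compare-register unit": 10,
    "Branch unit": 20,
}
kept = significant_resources(example_cycles)
```

With these numbers, the compare-register unit and the branch unit fall below the threshold and five resources remain.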

Signatures
- How many? A resource may have a mild, moderate, or high contribution to the cycles spent in shared resources.
- Idea: with five resources, equal contribution would mean 100/5 = 20% of cycles per resource. Using this as the basis:
  - Mild: 1% to 15%
  - Moderate: 16% to 24%
  - High: greater than 24%
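The three bands map directly onto a small classification function. The slide gives whole-number ranges, so two choices below are assumptions: fractional values between 15% and 16% are treated as mild, and anything under 1% is labeled "negligible".

```python
def classify_contribution(percent):
    """Map a resource's share of shared-resource cycles (in percent) to a band."""
    if percent > 24:
        return "high"
    if percent >= 16:
        return "moderate"
    if percent >= 1:
        return "mild"          # assumption: values in (15, 16) also count as mild
    return "negligible"        # assumption: the slide does not name this case

bands = [classify_contribution(p) for p in (10, 20, 30)]
```

An equal split across five resources (20% each) lands in the moderate band, which matches the reasoning behind the thresholds.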

Finding Signatures
- Profile the benchmark execution to find the cycles spent in the monitored shared resources.
- Use performance counters, sampling the counters periodically.
- Categorize the benchmark execution (SPEC CPU2000) into one of the possible permutations.
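A signature for one sample period can be represented as a tuple with one band per resource. The sketch below assumes five monitored resources and the three bands from the earlier slide, which gives at most 3^5 = 243 combinations (not all are reachable, since the shares must sum to 100%). The resource list, the cycle counts, and the decision to fold everything below 16% into "mild" are illustrative assumptions.

```python
from itertools import product

RESOURCES = ("TLB", "Cache", "FPU", "FXU", "Branch history")  # assumed set of five
LEVELS = ("mild", "moderate", "high")

def level(percent):
    """Band for one resource; shares below 16% are all treated as mild here."""
    if percent > 24:
        return "high"
    if percent >= 16:
        return "moderate"
    return "mild"

def signature(sample_cycles):
    """Tuple of bands, one per resource, for a single sample period."""
    total = sum(sample_cycles[r] for r in RESOURCES)
    if total == 0:
        return tuple("mild" for _ in RESOURCES)
    return tuple(level(100 * sample_cycles[r] / total) for r in RESOURCES)

# Hypothetical cycle counts for one sample period.
sample = {"TLB": 50, "Cache": 400, "FPU": 300, "FXU": 200, "Branch history": 50}
sig = signature(sample)
upper_bound = len(list(product(LEVELS, repeat=len(RESOURCES))))
```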

Finding Signatures (continued)
- Profiling benchmark execution: only six counters are allowed per execution.
- What are the counts for a sample period?
  - Merge them from different executions?
  - Use the highest sampling rate?
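If the "merge from different executions" option were chosen, one possible approach is to align samples by index across runs and combine their counter groups. This sketch assumes the runs are repeatable and sampled at the same period, which the plan leaves as an open question. The data layout and the counts are hypothetical.

```python
def merge_runs(runs):
    """Combine per-sample counter readings from several executions.

    runs: list of runs; each run is a list of samples; each sample is a dict
          mapping counter name to count. Each run holds a different counter group.
    Samples are aligned by index and truncated to the shortest run.
    """
    if not runs:
        return []
    n_samples = min(len(run) for run in runs)
    merged = []
    for i in range(n_samples):
        sample = {}
        for run in runs:
            sample.update(run[i])
        merged.append(sample)
    return merged

# Hypothetical counts from two executions with different counter groups.
run_a = [{"CMPLU_STALLS_FPU": 10}, {"CMPLU_STALLS_FPU": 12}, {"CMPLU_STALLS_FPU": 9}]
run_b = [{"CMPLU_STALLS_FXU": 7}, {"CMPLU_STALLS_FXU": 5}]
merged = merge_runs([run_a, run_b])
```

Index alignment only makes sense if sample i covers the same part of the execution in every run. Run-to-run variation would weaken that assumption.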

Perf Counters to Collect Data
- Identified counters:
  - FP: completion stalls due to FPU (CMPLU_STALLS_FPU)
  - FXU: completion stalls due to FXU (CMPLU_STALLS_FXU)
- Derived counters:
  - LSU stalls = total stalls in LSU - stalls due to d-cache miss - stalls due to d-TLB miss
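The derived LSU counter is a plain subtraction of three measured values. The parameter names below are placeholders rather than actual POWER5 event names, and the counts are made up.

```python
def lsu_stalls(total_lsu_stalls, dcache_miss_stalls, dtlb_miss_stalls):
    """Stall cycles in the LSU not explained by d-cache or d-TLB misses."""
    return total_lsu_stalls - dcache_miss_stalls - dtlb_miss_stalls

# Hypothetical counts for one sample period.
remaining = lsu_stalls(total_lsu_stalls=1000, dcache_miss_stalls=600, dtlb_miss_stalls=150)
```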

Perf Counters to Collect Data (continued)
Unsolved:
- TLB:
  - Total d-TLB misses and total i-TLB misses are known, but the miss resolution sites are not.
  - Total cycles spent accessing the d-TLB are known; this includes the cost of both hits and misses.
- Caches:
  - L2 and L3 hits for data and instructions are known. A penalty derived from them may be greater than the actual penalty, because execution overlaps misses or because misses occur down mispredicted branches.
  - Maybe use the d-cache miss penalty and i-cache miss penalty on POWER5, which are counted only if completion is stalled.
- Branch history:
  - Affects prediction. A counter is available to count the cycles in which a misprediction stalls completion.

Representative Traces
- Collect traces, if required, that represent the signatures found in benchmark profiling.
- Use the performance data from simulation of single traces to verify the signatures.
- Collect data for evaluating SMT knobs on the representative traces.

Validation
- Use scientific applications to verify whether they are covered by the signatures for 80% of their execution.
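The validation check amounts to measuring what fraction of an application's sample periods carry a signature already seen in the benchmark profiles. The function name and the labels below are hypothetical, and the sketch assumes one signature label per sample period.

```python
def signature_coverage(app_signatures, known_signatures):
    """Fraction of the application's samples whose signature is in the known set."""
    if not app_signatures:
        return 0.0
    known = set(known_signatures)
    hits = sum(1 for sig in app_signatures if sig in known)
    return hits / len(app_signatures)

# Hypothetical labels: four of the five sample periods match a known signature.
app_coverage = signature_coverage(["A", "A", "B", "D", "A"], {"A", "B"})
is_covered = app_coverage >= 0.80
```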

To Do
- Identify test applications.
