DyPO: Dynamic Pareto-Optimal Configuration Selection for Heterogeneous MpSoCs
Ujjwal Gupta, Chetan A. Patil, Ganapati Bhat, Umit Y. Ogras (Arizona State University)
Prabhat Mishra (University of Florida)
1. Motivation
Personalized Social Computing
§ Smartphones have highly heterogeneous processors
§ Power and performance change as a function of:
1. Configurations
2. Workloads
§ A good power-performance tradeoff does not imply a good energy-performance tradeoff
§ Dynamically selecting the optimal configuration is challenging
§ Number of configuration knobs: 4004 for the Samsung Exynos 5422
2. DyPO Overview and Illustration
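[Figure: DyPO overview. Steps 1 to 3 run offline on multiple apps (A1, A2, A3, …); Step 4 runs online on a new app.]
Step 1: Instrumentation
Apps are instrumented with LLVM/Clang. An app runs from Start (BB1, BB2, …) to End (BB100M). In the instrumented app, BB_PAPI_read() appears before BB1, after BB1M (before BB(1M + 1), BB(1M + 2), …), and after BB2M.
Step 2: Characterization
Each phase is described by its features $x$: $x_1$ = #instructions retired, $x_2$ = #LLC misses. For every phase $p_k$, find the optimal config ($c_k^*$), e.g. {2L, 1GHz, 3B, 1GHz}.
Step 3: Classification
Find $f(p_k) = c_k^*$ (the boundary in the plane of #LLC misses and #instructions retired), so that $c_k^* \approx f(\cdot)$.
Step 4: Online Selection
For every new phase ($\forall k$), the classifier gives the next configuration: $C_{next} = f\{p_k\}$.
Example:
Phase-1: $x_1$ = 10M, $x_2$ = 10k
Phase-2: $x_1$ = 10M, $x_2$ = 1k
Phase-3 (input $x$): $x_1$ = 10M, $x_2$ = 9k
Phase-3 is similar to Phase-1, so it uses the same optimal configuration as Phase-1.
Configurations shown as output ($c_k^*$): {4L, 2GHz, 4B, 2GHz} and {2L, 1GHz, 3B, 1GHz}.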
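The Phase-3 example can be sketched in a few lines of Python. This is only an illustration of the idea of reusing the configuration of the most similar known phase: it swaps in a plain nearest-neighbour lookup, not the classifier the poster trains. The feature values are the ones from the example; the mapping of configurations to Phase-1 and Phase-2 and the names `KNOWN_PHASES`, `most_similar_phase`, and `select_config` are assumptions made for this sketch.

```python
import math

# Offline table: phase features (x1 = #instructions retired, x2 = #LLC misses)
# and the optimal configuration found for that phase.
# ASSUMPTION: which configuration belongs to which phase is illustrative only.
KNOWN_PHASES = {
    "Phase-1": {"x": (10_000_000, 10_000), "config": "{2L, 1GHz, 3B, 1GHz}"},
    "Phase-2": {"x": (10_000_000, 1_000), "config": "{4L, 2GHz, 4B, 2GHz}"},
}


def most_similar_phase(x_new, known=KNOWN_PHASES):
    """Return the name of the known phase whose features are closest to x_new."""
    return min(known, key=lambda name: math.dist(known[name]["x"], x_new))


def select_config(x_new, known=KNOWN_PHASES):
    """Reuse the optimal configuration of the most similar known phase."""
    return known[most_similar_phase(x_new, known)]["config"]


phase_3 = (10_000_000, 9_000)
print(most_similar_phase(phase_3))  # Phase-1
print(select_config(phase_3))       # {2L, 1GHz, 3B, 1GHz}
```

A real deployment would normalize the features first, since raw counter values differ by orders of magnitude.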
3. Characterization and Classification
4. Experiment Setup
5. Experimental Results
6. Conclusions
References
[Figure: Blackscholes application, 2T]
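[Figure: characterization and classification flow]
Characterization: solve for $J(p_k)$ to obtain the optimal config ($c_k^*$) for the feature data ($x_k$); k-means clustering.
Training: solve $l(\beta)$ offline to get the model coefficients $\beta$; store $\beta$.
At runtime: read new feature data ($x_k$) from H/W counters and CPU stats, together with the stored $\beta$; find $\Pr(C = c_k^* \mid X = x_k)$ to solve for $c_k^*$.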
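[Figure: characterization sweep]
Configuration data per benchmark: sweep big cores ($n_b$), sweep little cores ($n_l$), sweep frequencies ($n_f$), and repeat the benchmark ($n_r$), for $n_b \times n_l \times n_f \times n_r$ runs.
$n_b$: number of big cores
$n_l$: number of little cores
$n_f$: number of frequencies
$n_r$: number of iterations
$f_l$: little core frequency
$f_b$: big core frequency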
§ 3 iterations of each benchmark at each configuration
§ 18 benchmarks profiled, leading to 4467 distinct workload snippets
§ Characterizing one benchmark takes about 1-2 hours
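The size of the characterization sweep follows directly from the product $n_b \times n_l \times n_f \times n_r$. The sketch below enumerates such a sweep; the knob values are small hypothetical lists chosen for readability, not the values used on the board, and `characterization_runs` is a name introduced here.

```python
from itertools import product


def characterization_runs(big_cores, little_cores, frequencies, n_repeats):
    """Yield one (n_big, n_little, frequency, iteration) tuple per benchmark run."""
    yield from product(big_cores, little_cores, frequencies,
                       range(1, n_repeats + 1))


# ASSUMPTION: hypothetical knob values, kept small for illustration.
big = [0, 1, 2, 3, 4]          # number of active big cores
little = [1, 2, 3, 4]          # number of active little cores
freqs_mhz = [600, 1000, 1400]  # swept frequencies
runs = list(characterization_runs(big, little, freqs_mhz, n_repeats=3))

print(len(runs))  # 5 * 4 * 3 * 3 = 180
print(runs[0])    # (0, 1, 600, 1)
```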
Characterization data
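Objective: $J(p_k) = \min_{c_k} \left[ E(c_k, p_k) + \mu \, t_{exe}(c_k, p_k) \right]$
Likelihood: $l(\beta) = \prod_{k} \Pr(C = c_k^* \mid X = x_k)$, with the product taken over the characterized snippets $k$
Conditional probability: $\Pr(C = c_k^* \mid X = x_k) = \dfrac{e^{\beta x_k}}{1 + e^{\beta x_k}}$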
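The objective $J(p_k)$ picks, for one phase, the configuration that minimizes energy plus $\mu$ times execution time. A minimal sketch, assuming the characterization data for a phase is available as a table of (energy, execution time) per configuration; the configuration names and all numbers are made up for illustration, and `optimal_config` is a hypothetical helper.

```python
def optimal_config(measurements, mu):
    """Return (best_config, cost) minimizing E + mu * t_exe for one phase.

    measurements maps a configuration name to (energy, exec_time).
    """
    def cost(config):
        energy, t_exe = measurements[config]
        return energy + mu * t_exe

    best = min(measurements, key=cost)
    return best, cost(best)


# ASSUMPTION: invented measurements for a single phase p_k.
phase_measurements = {
    "cfg_a": (2.0, 4.0),  # low energy, slow
    "cfg_b": (3.0, 2.0),  # balanced
    "cfg_c": (6.0, 1.0),  # high energy, fast
}

for mu in (0.0, 1.0, 10.0):
    print(mu, optimal_config(phase_measurements, mu))
# 0.0  -> ('cfg_a', 2.0)
# 1.0  -> ('cfg_b', 5.0)
# 10.0 -> ('cfg_c', 16.0)
```

Sweeping $\mu$ in this way moves the selected configuration from the most energy-efficient point toward the fastest one.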
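[Figure: online selection flow]
Read feature data ($x_k$).
Compute the conditional probability for each class $i \in [1, N]$: $\Pr(C = c_i^* \mid X = x_k) = \dfrac{e^{\beta_i x_k}}{1 + e^{\beta_i x_k}}$
Find the most likely class $i$: $\arg\max_i \Pr(C = c_i^* \mid X = x_k)$
Overhead of 20 μs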
§ Use the model coefficient matrix $\beta$ stored in the platform (~282 bytes)
§ The conditional probability is computed at runtime for each of the $N$ classes
§ The class with the highest probability is assigned to the system
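The runtime step described above amounts to one dot product and one logistic function per class, followed by an argmax. The sketch below follows the per-class formula on the poster; the coefficient matrix `beta`, the feature vector `x`, and the use of a leading 1.0 as an intercept term are assumptions for illustration, and the returned class index is 0-based.

```python
import math


def class_probability(beta_i, x):
    """Pr(C = c_i* | X = x) = e^(beta_i . x) / (1 + e^(beta_i . x))."""
    z = sum(b * v for b, v in zip(beta_i, x))
    if z >= 0:  # evaluate the logistic function without overflow
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def most_likely_class(beta, x):
    """Return (index of the most likely class, list of class probabilities)."""
    probs = [class_probability(beta_i, x) for beta_i in beta]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# ASSUMPTION: hypothetical coefficients for N = 3 classes and normalized features.
beta = [
    [0.5, -1.0, 2.0],
    [-0.5, 1.0, -2.0],
    [0.0, 0.2, 0.1],
]
x = [1.0, 0.3, 0.8]  # [intercept term, feature 1, feature 2]

best, probs = most_likely_class(beta, x)
print(best)  # 0
```

Because the logistic function is monotonic, the argmax over probabilities equals the argmax over the raw scores $\beta_i x_k$, which is one way to keep the runtime overhead small.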
[Figure: experimental framework spanning kernel space and user space. Labels: benchmark, DyPO, PAPI calls, sysfs interface, power, frequency, core config, utilization, perf, driver, CPU governor, PMU, output data for analysis.]
§ Odroid-XU3 board
§ Exynos 5422
§ Octa-core CPU
§ Ubuntu OS
§ Linux kernel v3.10
§ 18 applications
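On a Linux system of this kind, a configuration such as {2L, 1GHz, 3B, 1GHz} is typically applied through standard sysfs files (`cpuN/online` for core hotplug and `cpufreq/scaling_setspeed` for the frequency, the latter only with the `userspace` cpufreq governor). The sketch below only builds the list of writes and does not touch the system. It is not the poster's driver or governor: the CPU numbering (cpu0 to cpu3 little, cpu4 to cpu7 big), the reading of the configuration as {little cores, little frequency, big cores, big frequency}, the choice to keep cpu0 always online, and the function name `sysfs_writes` are all assumptions.

```python
CPU_ROOT = "/sys/devices/system/cpu"
LITTLE_CPUS = [0, 1, 2, 3]  # ASSUMPTION: little cluster numbering
BIG_CPUS = [4, 5, 6, 7]     # ASSUMPTION: big cluster numbering


def sysfs_writes(n_little, f_little_khz, n_big, f_big_khz):
    """Return (path, value) pairs that would apply one configuration."""
    if not 1 <= n_little <= len(LITTLE_CPUS):
        raise ValueError("at least one little core must stay online")
    if not 0 <= n_big <= len(BIG_CPUS):
        raise ValueError("invalid number of big cores")

    writes = []
    for cluster, n_on in ((LITTLE_CPUS, n_little), (BIG_CPUS, n_big)):
        for pos, cpu in enumerate(cluster):
            if cpu == 0:
                continue  # cpu0 is kept online in this sketch
            writes.append((f"{CPU_ROOT}/cpu{cpu}/online",
                           "1" if pos < n_on else "0"))

    # Frequencies are written in kHz, once per cluster.
    writes.append((f"{CPU_ROOT}/cpu{LITTLE_CPUS[0]}/cpufreq/scaling_setspeed",
                   str(f_little_khz)))
    if n_big > 0:
        writes.append((f"{CPU_ROOT}/cpu{BIG_CPUS[0]}/cpufreq/scaling_setspeed",
                       str(f_big_khz)))
    return writes


# {2L, 1GHz, 3B, 1GHz}
writes = sysfs_writes(2, 1_000_000, 3, 1_000_000)
for path, value in writes:
    print(path, value)
```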
[Figure: two-level classifier. Level-1 and Level-2 select among the optimal configuration classes $c_i^*$.]
§ Our technique successfully finds the Pareto-optimal configurations at runtime as a function of the workload [1]
§ DyPO-Energy achieves 25% and 55% higher performance per watt (PPW) than Aalsaud*-Offline and Aalsaud*-ADA [2], respectively
§ Experiments show 93%, 81%, and 6% higher PPW than the interactive, ondemand, and powersave governors, respectively
§ The details can be found in [1].
[Figure: Samsung Exynos 5422 with a big cluster and a little cluster, each with Core 0 to Core 3, shown for a single-threaded application, a multi-threaded application, and multiple applications.]
Classifier Accuracy | Level 1 | Level 2
60% Training Validation | 99.9% | 92.7%
5-Fold Cross Validation | 99.9% | 80.5%
Acknowledgement
This work was supported in part by National Science Foundation (NSF) grants CNS-1526562 and CNS-1526687, and Semiconductor Research Corporation (SRC) task 2721.001.
[1] U. Gupta, C. A. Patil, G. Bhat, P. Mishra, and U. Y. Ogras, “DyPO: Dynamic Pareto Optimal Configuration Selection for Heterogeneous MpSoCs,” in ACM Trans. on Embedded Computing Systems (ESWEEK Special Issue), October 2017.
[2] A. Aalsaud et al., “Power-Aware Performance Adaptation of Concurrent Applications in Heterogeneous Many-Core Systems,” in Proc. of the Intl. Symp. on Low Power Electronics and Design, 2016, pp. 368–373.