Accurate Power and Energy Measurement on Kepler-based Tesla GPUs
Martin Burtscher, Department of Computer Science
Introduction
GPU-based accelerators
  Quickly spreading in PCs and even handheld devices
  Widely used in high-performance computing
Power and energy efficiency
  Heat dissipation is a problem
  Electric bill and battery life are of growing concern
  Exascale requires a 50x boost in performance per watt
Important research area
  Need to develop techniques to reduce power and energy
  Have to be able to measure the power/energy of programs
GPU Power Sensors
Hardware
  High-end compute GPUs include power sensors
  For example, K20/K40 Tesla cards have a built-in sensor
  These cards are the target of this talk
Software
  Can query the sensor with the NVIDIA Management Library (NVML)
  http://developer.nvidia.com/nvidia-management-library-nvml
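As a minimal sketch of such a query loop (not the authors' tool), the Python below busy-polls a power reader and tags every reading with a high-resolution time stamp. The reader `read_power_mw` is a hypothetical callable; with the pynvml bindings it could be `lambda: pynvml.nvmlDeviceGetPowerUsage(handle)` after `pynvml.nvmlInit()` and `handle = pynvml.nvmlDeviceGetHandleByIndex(0)`, since NVML reports power in milliwatts. The optional `min_interval_s` throttle is also an assumption of this sketch.

```python
import time
from typing import Callable, List, Optional, Tuple


def poll_power(read_power_mw: Callable[[], float],
               duration_s: float,
               min_interval_s: float = 0.0) -> List[Tuple[float, float]]:
    """Busy-poll a power reader; return (elapsed_seconds, watts) pairs.

    read_power_mw is a hypothetical callable returning the current sensor
    reading in milliwatts. Each sample is tagged with a high-resolution
    time stamp from time.perf_counter().
    """
    samples: List[Tuple[float, float]] = []
    start = time.perf_counter()
    last: Optional[float] = None
    while True:
        now = time.perf_counter()
        if now - start >= duration_s:
            break
        # Optional throttle: at most one sample per min_interval_s.
        if last is not None and now - last < min_interval_s:
            continue
        samples.append((now - start, read_power_mw() / 1000.0))  # mW -> W
        last = now
    return samples
```

Passing `min_interval_s=0.015` would match the once-every-15-ms sampling that the talk recommends later.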
Problems
Power sensor data behaves strangely
  Running the same kernel twice yields different energy
    First launch: 114 J, second launch: 147 J (29% more energy)
  Running a kernel 2x as long more than doubles the energy
    1x input: 732 J, 2x input: 1579 J (8% above doubling)
Power sensor sampling rate varies greatly
  Intervals range from 0.266 ms to 130 ms (3760 Hz down to 7.7 Hz)
Methodology
Hardware
  Two K20c, two K20m, two K20X, and two K40m GPUs
Measurement
  Query power and time in a loop on an “idle” CPU core
Test code
  Compute-intensive regular n-body kernel
  Constant computation rate of over 2 TFLOPS on a K20c
  No data dependences; vary n to adjust kernel runtime
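The slides do not show the test kernel. Assuming an all-pairs formulation, a small CPU reference in Python illustrates why it suits these experiments: each body's result depends only on the input positions, and the work grows as n², so doubling n roughly quadruples the runtime. The function name, the G = 1 units, and the softening term `eps2` are choices of this sketch, not details from the talk.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def nbody_accelerations(pos: List[Vec3], mass: List[float],
                        eps2: float = 1e-6) -> List[Vec3]:
    """All-pairs gravitational accelerations (G = 1) with softening eps2.

    Each body reads every position but writes only its own result, so the
    outer iterations are independent; total work is n * n interactions.
    """
    acc: List[Vec3] = []
    for xi, yi, zi in pos:
        ax = ay = az = 0.0
        for (xj, yj, zj), mj in zip(pos, mass):
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            # Softening keeps r2 > 0, so the j == i term is simply zero.
            r2 = dx * dx + dy * dy + dz * dz + eps2
            inv_r3 = mj / (r2 * math.sqrt(r2))
            ax += dx * inv_r3
            ay += dy * inv_r3
            az += dz * inv_r3
        acc.append((ax, ay, az))
    return acc
```

On a GPU, the outer loop would map naturally to one thread per body, which is what makes the computation rate steady and the runtime tunable through n.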
Expected Power Profile
Kernel starts executing
Kernel stops executing
GPU idle power
Measurement loop runtime
Measured Power Profile
Power ramps up slowly
Power ramps down slowly
Switch to step shape
Idle power reached
Macroscopic phenomena
(Durations marked in the figure: 5 s, 3 s, 4 s)
Energy = Area Under Power Curve
Integrate to where?
Unclear how big the energy is
Missing energy? Delayed energy?
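A sketch of the area computation, assuming samples arrive as parallel lists of time stamps (s) and power readings (W): the trapezoidal rule below uses the actual spacing between samples. It deliberately leaves the slide's open question, where to stop integrating, to the caller, through the choice of samples passed in.

```python
from typing import Sequence


def energy_joules(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Trapezoidal area under a sampled power curve (W x s = J).

    Uses the actual time stamps, so uneven sampling intervals are handled.
    The integration limits are the first and last samples supplied.
    """
    if len(times_s) != len(power_w):
        raise ValueError("times_s and power_w must have the same length")
    energy = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy
```

For example, `energy_joules([0.0, 0.5, 2.0], [100.0, 100.0, 100.0])` returns `200.0` (100 W for 2 s).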
Ramp-up Behavior of 2 Short Runs
Short run same as longer run
2nd run starts higher but also follows curve
Ramp down doesn’t follow
Ramp-down Behavior of Several Runs
(Figure: Measured Power [W] vs. Shifted Runtime [s] for several runs; markers t2, t3, t4)
Shape depends on power at t2
Power increases after kernel done
Shape always the same
Steps down every second
Driver lowers power level
Sampling Interval Lengths
(Figure: Sampling Interval [ms] and Measured Power [W] vs. Runtime [s]; markers t1, t2, t3, t4)
Short intervals
Wide range of intervals
Very long interval
Driver activity can prevent sampling
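To reproduce such interval statistics from a recorded trace, a sketch like the following computes the shortest and longest sampling intervals, the rates they imply, and the positions of unusually long gaps. The `IntervalStats` type and the 100 ms default gap threshold are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class IntervalStats:
    min_ms: float    # shortest interval between consecutive samples
    max_ms: float    # longest interval between consecutive samples
    max_hz: float    # rate implied by the shortest interval
    min_hz: float    # rate implied by the longest interval
    gaps: List[int]  # index i of each sample that ends an interval > gap_ms


def interval_stats(times_s: Sequence[float], gap_ms: float = 100.0) -> IntervalStats:
    if len(times_s) < 2:
        raise ValueError("need at least two time stamps")
    intervals_ms = [(b - a) * 1000.0 for a, b in zip(times_s, times_s[1:])]
    shortest, longest = min(intervals_ms), max(intervals_ms)
    return IntervalStats(
        min_ms=shortest,
        max_ms=longest,
        max_hz=1000.0 / shortest if shortest > 0 else math.inf,
        min_hz=1000.0 / longest if longest > 0 else math.inf,
        gaps=[i + 1 for i, d in enumerate(intervals_ms) if d > gap_ms],
    )
```

With time stamps spaced 0.266 ms and then 130 ms apart, it reports about 3759 Hz and 7.7 Hz (the Problems slide rounds the former to 3760 Hz) and flags the 130 ms interval as a gap.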
Sampling Interval Lengths (zoomed-in)
(Figure: Sampling Interval [ms] and Measured Power [W] vs. Runtime [s], zoomed in to 12.030 to 12.060 s)
Identical values
Many short intervals
Very long interval
Sampled power only ever changes after long interval
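Because repeated identical readings carry no new information, one hedged way to thin a densely polled trace is to keep only the first sample of each run of equal values. This is an alternative to, not a restatement of, the talk's remedy of sampling every 15 ms, and it has a caveat: a genuine sensor update that happens to equal the previous value is dropped as well. The function name and the `keep_last` option are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (time stamp in s, power in W)


def drop_repeats(samples: Sequence[Sample], keep_last: bool = True) -> List[Sample]:
    """Keep only samples whose power differs from the previously kept one.

    The first sample of each run of identical readings is kept; optionally
    the final sample is kept too so the overall time span is preserved.
    """
    kept: List[Sample] = []
    for t, p in samples:
        if not kept or p != kept[-1][1]:
            kept.append((t, p))
    if keep_last and samples:
        t_last, p_last = samples[-1]
        if kept[-1] != (t_last, p_last):
            kept.append((t_last, p_last))
    return kept
```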
Sampling Frequency
Eliminate redundant samples
  Only sample once every 15 ms (66.7 Hz)
  Cannot accurately measure kernels under ~150 ms
Account for the variation in interval length
  Use high-resolution time stamps
Example: energy from t1 to t4
  Dotted (fixed intervals): 1205 J
  Solid (variable intervals): 1066 J
  13% discrepancy
(Figure: Measured Power [W] vs. Runtime [s] between markers t1 and t4)
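The discrepancy on this slide comes from assuming a fixed sampling interval. The sketch below computes both trapezoidal estimates for the same readings; the function names are hypothetical, and the discrepancy is measured relative to the time-stamped value, which reproduces the quoted 13% for 1205 J versus 1066 J.

```python
from typing import Sequence


def energy_fixed_interval(power_w: Sequence[float], nominal_dt_s: float) -> float:
    """Trapezoidal energy assuming every sample is exactly nominal_dt_s apart."""
    return sum(0.5 * (a + b) * nominal_dt_s for a, b in zip(power_w, power_w[1:]))


def energy_timestamped(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Trapezoidal energy using the measured time stamp of every sample."""
    return sum(0.5 * (power_w[i - 1] + power_w[i]) * (times_s[i] - times_s[i - 1])
               for i in range(1, len(power_w)))


def relative_discrepancy(fixed_j: float, stamped_j: float) -> float:
    """(fixed - stamped) / stamped, e.g. (1205 - 1066) / 1066 is about 0.13."""
    return (fixed_j - stamped_j) / stamped_j
```

For readings of 100, 100, 200, and 200 W taken at 0, 15, 45, and 60 ms, assuming a nominal 15 ms spacing gives 6.75 J, while the time stamps give 9.0 J: the one late sample silently shortens the high-power segment.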
True Power
Sensor hardware
  Seems to asymptotically approach the true power
  Reminiscent of capacitor charging
The true instantaneous power Ptrue depends on the power measured by the sensor, Psensor, and on the slope of that profile, dPsensor/dt
  Ptrue = Psensor + C × dPsensor/dt
  “Capacitance” of the sensor: C ≈ 0.84 s on all tested K20 GPUs
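A direct sketch of the correction formula, assuming time stamps in seconds and sensor readings in watts. Following the summary's advice to approximate the slope from neighboring samples, it uses a central difference for interior points and one-sided differences at the two ends; the exact neighbor choice in the authors' tool may differ. Because the formula differentiates the readings, any sampling error in them is amplified by C.

```python
from typing import List, Sequence


def true_power(times_s: Sequence[float], sensor_w: Sequence[float],
               c_s: float = 0.84) -> List[float]:
    """Ptrue = Psensor + C * dPsensor/dt, with the slope from neighbors.

    Interior points use a central difference over samples i-1 and i+1;
    the first and last points fall back to one-sided differences.
    """
    n = len(times_s)
    if n < 2 or n != len(sensor_w):
        raise ValueError("need at least two samples with matching lengths")
    corrected = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        slope = (sensor_w[hi] - sensor_w[lo]) / (times_s[hi] - times_s[lo])
        corrected.append(sensor_w[i] + c_s * slope)
    return corrected
```

If the sensor follows a capacitor-style charging curve 150 × (1 − e^(−t/0.84)) W sampled every 15 ms, the corrected interior values come out at about 150 W, the level the sensor is only approaching.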
Back-calculated from Expected Profile
‘Capacitor’ function matches measured values perfectly
Minimized absolute errors to determine C
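The slide states that C was determined by minimizing absolute errors but not how. One hedged way to do this, assuming the expected profile is a step from `p_before` to `p_after` at kernel start `t0`, is a grid search that scores the capacitor-style step response for each candidate C against the measured samples. The candidate grid (0.10 s to 3.00 s in 0.01 s steps) and the function name are arbitrary choices of this sketch.

```python
import math
from typing import Optional, Sequence


def fit_capacitance(times_s: Sequence[float], sensor_w: Sequence[float],
                    t0: float, p_before: float, p_after: float,
                    candidates: Optional[Sequence[float]] = None) -> float:
    """Grid-search the C whose step response best matches the sensor data.

    Assumed model: for t >= t0 the sensor relaxes from p_before toward
    p_after as p_after + (p_before - p_after) * exp(-(t - t0) / C).
    Score: sum of absolute errors over the samples with t >= t0.
    """
    if candidates is None:
        candidates = [k / 100 for k in range(10, 301)]  # 0.10 s .. 3.00 s
    best_c, best_err = candidates[0], math.inf
    for c in candidates:
        err = sum(abs(p - (p_after + (p_before - p_after) * math.exp(-(t - t0) / c)))
                  for t, p in zip(times_s, sensor_w) if t >= t0)
        if err < best_err:
            best_c, best_err = c, err
    return best_c
```

A grid search keeps the sketch dependency-free; a continuous optimizer would refine C beyond the grid spacing.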
Corrected Power Profile
(Figure: corrected Power [W] vs. Time [s]; markers t1, t2, t3)
Wobbles due to sampling errors
Corrected profile matches expected rectangular profile
‘Active idle’ power level
Correction of 2 Short Runs
(Figure: corrected Power [W] vs. Time [s] for two short runs a and b; markers t1a, t2a, t1b, t2b, t3b)
Corrected power profile matches expected profile
Second K20c GPU
(Figure: Power [W] vs. Time [s] on the second K20c; markers t1, t2, t3)
Identical to original K20c
K20m GPU
(Figure: Power [W] vs. Time [s] on the K20m; markers t1, t2, t3)
Similar profile but higher power level
K20X GPU
(Figure: Power [W] vs. Time [s] on the K20X; markers t1, t2, t4)
Profile is good, no correction needed!
Huge 600 ms gap
K40m GPU
K40m again requires correction
Application to Full CUDA Program
Implementation of the Barnes Hut n-body algorithm
  Taken from the LonestarGPU benchmark suite
  Contains multiple regular and irregular kernels
  Highly optimized, but still suffers from load imbalance, divergence, and uncoalesced accesses
  Main kernel is ‘regularized’ (warp-based)
NASA/JPL-Caltech/SSC
Barnes Hut Power Profile (1 Step)
Slow then fast drop-off
“Wave” in profile
Original profile is hard to interpret
Barnes Hut Power Profile (Kernels)
Slow then fast drop-off
“Wave” in profile
Original profile is hard to interpret
Corrected Barnes Hut Power Profile
(Figure: corrected Power [W] vs. Time [s] for one Barnes Hut step; kernel phases labeled a through f)
Decrease due to load imbalance
Two similar irregular kernels
One more irregular kernel
Very short regular kernel
Corrected profile reveals important info
Regularized main kernel
K20Power Tool
Output
  Corrected profile and corresponding ‘active’ energy
Features
  Computes instant power using the ‘capacitor’ formula
  Employs high-resolution time stamps
  Samples at the true frequency of 66.7 Hz
Dissemination
  Open source, research license
  http://cs.txstate.edu/~burtscher/research/K20power/
Marcher System
Tool will be part of the Marcher system at Texas State
  NSF-funded green computing infrastructure
Marcher is a power-measurable cluster system
  832 general-purpose cores
  12,000 GPU and MIC cores
  1.2 TB of DDR3 with power throttling and scaling
  50 TB of hybrid storage with hard drives and SSDs
  Component-level power measurement tools (e.g., CPU, DRAM, Disk, GPU, Xeon Phi)
Summary
Correctly measuring K20/K40 power and energy
  Sample at 66.7 Hz and include time stamps
  Compute true power with the presented formula
    Use neighboring power samples to approximate the slope
  Compute true energy by integrating true power
    Over intervals where power is above ‘active idle’
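These steps can be combined into one hedged sketch: correct the readings with the capacitor formula, then apply the trapezoidal rule only to segments whose corrected endpoints both lie above a caller-supplied active-idle level. How the active-idle level is obtained, and the both-endpoints rule for deciding which segments count, are assumptions of this sketch rather than details from the talk. The correction function is repeated so the block runs on its own.

```python
from typing import List, Sequence


def true_power(times_s: Sequence[float], sensor_w: Sequence[float],
               c_s: float) -> List[float]:
    """Ptrue = Psensor + C * dPsensor/dt, slope from neighboring samples."""
    n = len(times_s)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        slope = (sensor_w[hi] - sensor_w[lo]) / (times_s[hi] - times_s[lo])
        out.append(sensor_w[i] + c_s * slope)
    return out


def active_energy(times_s: Sequence[float], sensor_w: Sequence[float],
                  active_idle_w: float, c_s: float = 0.84) -> float:
    """Integrate true power over segments whose endpoints exceed active idle."""
    if len(times_s) < 2 or len(times_s) != len(sensor_w):
        raise ValueError("need at least two samples with matching lengths")
    p = true_power(times_s, sensor_w, c_s)
    energy = 0.0
    for i in range(1, len(p)):
        if p[i - 1] > active_idle_w and p[i] > active_idle_w:
            energy += 0.5 * (p[i - 1] + p[i]) * (times_s[i] - times_s[i - 1])
    return energy
```

Samples are assumed to be pre-thinned to roughly 15 ms spacing, as the first summary step suggests; with much denser polling the neighbor-based slope would mostly see repeated values.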
K20Power tool
  Software tool that implements this methodology
Paper at http://cs.txstate.edu/~burtscher/papers/gpgpu14.pdf
Acknowledgments
Collaborators
  Ivan Zecena and Ziliang Zong
U.S. National Science Foundation
  DUE-1141022, CNS-1217231, and CNS-1305359
NVIDIA Corporation
  Grants and equipment donations
Texas State University
  Research Enhancement Program