July 18, 2012
An Analysis of GPU Utilization Trends on the Keeneland Initial Delivery System
Tabitha K. Samuel, Stephen McNally, John Wynkoop
National Institute for Computational Sciences
The Keeneland Project
• 5-year Track 2D cooperative agreement awarded by the NSF
• Partners: Georgia Tech, Oak Ridge National Lab, the National Institute for Computational Sciences, and the University of Tennessee
• The Keeneland Initial Delivery System (KIDS) is being used to develop programming tools and libraries for a GPGPU platform
Keeneland Partners
KIDS Specifications
Node architecture: HP ProLiant SL390 G7
CPU: Intel Xeon X5660 (Westmere), 12 cores per node
Host memory per node: 24 GB
GPU architecture: NVIDIA Tesla M2090 (Fermi)
GPUs per node: 3
GPU memory per node: 18 GB (6 GB per GPU)
CPU:GPU ratio: 2:3
Interconnect: InfiniBand QDR (single rail)
Total number of nodes: 120
Total CPU cores: 1,440
Total GPU cores: 161,280
Need for a monitoring tool
• Most applications did not have appropriate administrative tools and vendor support.
• GPU administration has largely been an afterthought, as vendors in this space are focused on gaming and video applications.
• There is a compelling need to monitor GPU utilization on Keeneland, both for proper system administration and for planning the Keeneland Final System.
Design of the monitoring tool
• In CUDA 4.1, NVIDIA provided enhanced functionality for the NVIDIA System Management Interface (nvidia-smi)
• NVML – the NVIDIA Management Library – is a C-based API for monitoring and managing various states of NVIDIA GPU devices.
– It provides direct access to the queries and commands exposed via nvidia-smi.
– Data is presented in plain text or XML format; a short pynvml sketch follows below
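The same counters can also be read programmatically. Below is a minimal sketch using pynvml, the Python binding for NVML; it illustrates what the library exposes and is not the collection script used on KIDS:

```python
# Minimal pynvml sketch (illustrative, not the KIDS collection script):
# read the instantaneous GPU and memory utilization counters that NVML
# exposes, i.e. the same numbers nvidia-smi reports.
from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetCount,
                    nvmlDeviceGetHandleByIndex, nvmlDeviceGetUtilizationRates)

nvmlInit()
try:
    for i in range(nvmlDeviceGetCount()):
        handle = nvmlDeviceGetHandleByIndex(i)
        util = nvmlDeviceGetUtilizationRates(handle)
        # util.gpu / util.memory are percentages over the last sample period
        print("GPU %d: gpu=%d%% mem=%d%%" % (i, util.gpu, util.memory))
finally:
    nvmlShutdown()
```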
Sample output of nvidia-smi -q -d UTILIZATION
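Output of this query takes roughly the following per-GPU form; the values below are placeholders, and the exact fields vary by driver version:

```
==============NVSMI LOG==============

Timestamp                       : Wed Jul 18 12:00:00 2012
Driver Version                  : 285.05.33

Attached GPUs                   : 3

GPU 0000:02:00.0
    Utilization
        Gpu                     : 62 %
        Memory                  : 38 %
```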
Design of the monitoring tool
[Diagram: on each of compute nodes 1 through 60, a bash script dumps nvidia-smi output to a tmp file; a Python script parses the tmp file and feeds the results into a central database.]
Design of the monitoring tool
• If the script throws an exception, an email is sent to the system administrators
• Script run by cron on 60 service nodes on Keeneland (a sketch of the parsing step follows below)
– Run every 30 minutes
– Every run produces 8 KB of data
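A minimal sketch of the per-node parsing step is below; the file paths, the schema, and the use of SQLite are illustrative stand-ins, since the slides do not reproduce the actual scripts:

```python
# Hypothetical sketch of the Python half of the pipeline: parse the tmp file
# written by the bash script (output of `nvidia-smi -q -d UTILIZATION`) and
# record one row per GPU. Paths and schema are illustrative, not the KIDS
# originals.
import re
import sqlite3
import time

TMP_FILE = "/tmp/gpu_util.txt"        # written by the cron-driven bash script
DB_FILE = "/var/lib/gpu_util.db"      # stand-in for the central database

rows, gpu_index = [], -1
with open(TMP_FILE) as f:
    for line in f:
        if line.startswith("GPU "):   # each per-GPU section starts this way
            gpu_index += 1
        m = re.match(r"\s*Gpu\s*:\s*(\d+)\s*%", line)
        if m:
            rows.append((time.time(), gpu_index, int(m.group(1))))

conn = sqlite3.connect(DB_FILE)
conn.execute("CREATE TABLE IF NOT EXISTS util (ts REAL, gpu INTEGER, pct INTEGER)")
conn.executemany("INSERT INTO util VALUES (?, ?, ?)", rows)
conn.commit()
conn.close()
```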
Analysis of Data
• CPU Utilization and Overall GPU Utilization
Total GPU utilization, when compared to CPU utilization, is relatively low
CPU Utilization and Overall GPU Utilization
• Possible reasons for low utilization of GPUs:
– An application's ability to fully utilize all GPUs in a multi-GPU environment
– Limited bandwidth per FLOP available out of a single compute node
– An application's ability to fully utilize the performance of a single GPU
CPU Utilization and Overall GPU Utilization – Caveats
• KIDS is a developmental system, hence it is difficult to assert whether low utilization is due to deficiencies in an application or to the user artificially limiting GPU usage during testing or debugging
• Further development of the toolset is intended to provide more granular data, allowing more accurate conclusions
Overall GPU Utilization by Application
[Chart: GPU and Memory Utilization by Software Package — average GPU utilization and average memory utilization (%) per software package]
• Several applications have average GPU utilizations above 50%
• Memory utilization is significantly lower than GPU utilization
• It is unclear whether this is due to bandwidth constraints, application design, or other factors
CPU Utilization and Requested GPU Utilization
[Chart: CPU Utilization vs Requested GPU Utilization — percentage utilization over the sampled timeline for CPU utilization and utilization of requested GPUs]
• Applications that do request GPUs make reasonable utilization of them
CPU Utilization and Requested GPU Utilization
[Chart: CPU Utilization vs Overall GPU Utilization — percentage utilization over the sampled timeline]
[Chart: CPU Utilization vs Requested GPU Utilization — percentage utilization over the sampled timeline]
• Possible reasons for this significant difference:
– The user could be limiting the scope of the application for testing and debugging
– Applications cannot adequately scale past one GPU per process due to limitations in the code or the limited inter-node bandwidth
Number of jobs and number of GPUs requested per job
[Chart: Number of Jobs vs Number of GPUs/Job (> 3 GPUs) — number of jobs by number of GPUs requested]
[Chart: Number of Jobs vs Number of GPUs/Job (Overall) — number of jobs by number of GPUs requested]
• The majority of jobs on KIDS request fewer than 3 GPUs
– Due to the large number of very small, short jobs being used for application development
– Once the system is in production, this number should drastically increase
Issues encountered during development of toolkit
• Large volume of data is generated by the output of the nvidia-smi utility
– Future versions of NVML should allow administrators to select only the relevant data
• The failure mode of the nvidia-smi tool is unpredictable when there is a potentially faulty GPU in the system (see the sketch after this list)
– The tool sometimes generates erratic output, no output, or seemingly normal output
– This makes diagnosing problem GPUs at large scale difficult
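One defensive pattern for this failure mode is to bound the tool's runtime and sanity-check its output before trusting it. The sketch below is an assumption about how a collector could guard itself, not the KIDS implementation:

```python
# Illustrative guard around nvidia-smi on a node with a possibly faulty GPU:
# bound the runtime and reject output that does not look structurally sane.
# The timeout and the checks are assumptions, not the KIDS implementation.
import subprocess

def query_utilization(timeout_s=30):
    try:
        result = subprocess.run(
            ["nvidia-smi", "-q", "-d", "UTILIZATION"],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None   # hung tool: flag the node as suspect
    if result.returncode != 0 or "Utilization" not in result.stdout:
        return None   # no output or erratic output: flag for follow-up
    return result.stdout
```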
Other monitoring tools
• Provides CLI & GUI interface
• Tool that can be used for management, provisioning and monitoring hybrid HP systems.
HP Insight Cluster Management U.lity
• Uses python binding for NVML
• Allows simplified access to GPU metrics like temperature, memory usage and u.liza.on
Ganglia’s Gmond Python module
Comparison with other tools
• Gmond presents data in RRD format, which is an abbreviated, averaged version of the data
• It is extremely high level, which is not useful if you are trying to understand utilization at a particular moment in time
• Our tool collects data over time and does not average it
• This allows us to maintain granularity much farther into the future
• Useful in scenarios where you want to correlate GPU usage with ECC errors
• Easy to get statistics on utilization with respect to job sizes, wall clock requests, GPU requests, etc. (a sketch of such a query follows below)
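As an illustration of that last point, a query over the collected samples might look like the following. The job table, its columns, and the join are hypothetical, since the slides do not show the real schema; it also assumes each utilization sample was tagged with the job that owned the GPU:

```python
# Hypothetical query: because raw samples are kept rather than RRD averages,
# plain SQL can aggregate utilization by job size. Schema names are invented.
import sqlite3

conn = sqlite3.connect("/var/lib/gpu_util.db")   # stand-in path
query = """
    SELECT j.gpus_requested, AVG(u.pct), COUNT(DISTINCT j.job_id)
    FROM util u JOIN jobs j ON u.job_id = j.job_id
    GROUP BY j.gpus_requested
    ORDER BY j.gpus_requested
"""
for gpus, avg_pct, jobs in conn.execute(query):
    print("%d GPUs/job: avg utilization %.1f%% over %d jobs" % (gpus, avg_pct, jobs))
conn.close()
```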
Other considerations:
• Commercial tools were expensive
• Open source alternatives were early in production and development
• Had a pressing need to provide very specific data to our review panel
Conclusions
• This tool provides an important first step in creating an open-source tool for collecting utilization statistics on GPU-based systems
• Not many monitoring tools are available for GPU systems, and the few that are, are either expensive or in early development
• A high-level study of the data reveals that, barring a few applications, software is still CPU-cycle heavy and does not take full advantage of the processing power of GPUs
Future Work
• Collection of other statistics, such as ECC error, power, and temperature statistics
• Collection of statistics on a more frequent basis
• Collaborate with software developers to mine the data generated by this tool
– Data can be used to aid software development for GPU systems
– Data can also be used to determine appropriate CPU:GPU ratios for jobs and assist in creating scheduling policies
Questions