Motivation (General)
Optimal usage of grid resources through “smarter” meta-scheduling
  Many users overestimate job requirements
  Reduced idle time for compute resources
  Save utility and energy costs
Optimal resource selection for most expedient job return time
WRF Prediction Possibilities
Over 50% of classified disasters
  Hurricanes
  Flash floods
  Droughts
Source: adrc.asia
Without Performance Prediction
Users need additional knowledge
  How long will the job take?
  Where to send?
  etc.
Unfair preemption of resources
Example Scenario
3 resources
  Marenostrum (10K+ core supercomputer)
  Mind (32 core hyperthreading cluster)
  GCB (8 core hyperthreading cluster)
2 jobs
  1 continental US WRF simulation (urgent)
  1 simulation of a 75 x 75 portion of Florida (for benchmarking)
Example Scenario
User has no knowledge of how long either simulation will last
Intuitively, Marenostrum will be faster
However, the user has exclusive access to Mind (i.e. no queue time)
How should the jobs be allocated?
Example Scenario
Job             Marenostrum (32 nodes)   Mind (all nodes)   GCB (all nodes)
CONUS Job       45 minutes               180 minutes        500 minutes
Benchmark Job   3 minutes                20 minutes         50 minutes
Example Scenario
Execution Prediction (aprof) can estimate execution time on each system
Other tools can be used for queue time prediction
With these two predictions, plus information from the metascheduler, automatic allocation is feasible
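As a rough illustration of how execution-time and queue-time predictions could be combined, the sketch below picks, for each job in priority order, the resource with the smallest predicted return time (queue wait plus execution time). The execution times are the ones from the example scenario table. The queue waits, the function names, and the rule that a resource runs one job at a time are all assumptions made for illustration; this is not how aprof or any particular metascheduler is implemented.

```python
# Predicted execution times in minutes, copied from the example scenario table.
EXEC_MINUTES = {
    "CONUS": {"Marenostrum": 45, "Mind": 180, "GCB": 500},
    "Benchmark": {"Marenostrum": 3, "Mind": 20, "GCB": 50},
}


def return_time(job, resource, queue_minutes):
    """Predicted return time = predicted queue wait + predicted execution time."""
    return queue_minutes[resource] + EXEC_MINUTES[job][resource]


def best_resource(job, queue_minutes):
    """Resource with the smallest predicted return time for this job."""
    return min(queue_minutes, key=lambda r: return_time(job, r, queue_minutes))


def allocate(jobs_by_priority, queue_minutes):
    """Greedy allocation: the most urgent job chooses first.

    Assumption: each resource runs one job at a time, so a job placed on a
    resource pushes back the start of any later job sent to the same resource.
    """
    queue = dict(queue_minutes)  # work on a copy, leave the caller's data alone
    plan = {}
    for job in jobs_by_priority:
        resource = best_resource(job, queue)
        finish = return_time(job, resource, queue)
        plan[job] = (resource, finish)
        queue[resource] = finish
    return plan


# Hypothetical queue waits in minutes. Only Mind's zero wait comes from the
# scenario (exclusive access); the other two values are invented for the demo.
HYPOTHETICAL_QUEUE = {"Marenostrum": 240, "Mind": 0, "GCB": 0}

plan = allocate(["CONUS", "Benchmark"], HYPOTHETICAL_QUEUE)
for job, (resource, minutes) in plan.items():
    print(f"{job}: run on {resource}, predicted return in {minutes} minutes")
```

With these invented queue waits, the urgent CONUS job goes to Mind even though Marenostrum executes it four times faster, because the assumed Marenostrum queue outweighs the speedup. With no queue anywhere, the same function would pick Marenostrum.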
Motivation (Storm Mitigation)
Humane
  Thousands of lives can be saved
Economical
  Millions of dollars are needed to repair damage
  Given more lead time, this cost can be minimized
Figure panels: 10-km WRF and 4-km WRF
Dashed magenta indicates the approximate area of rainfall produced by convective parameterization
Parameterized convection (on the 10 km grid) cannot differentiate between different modes of convection
Why So Many Processors?
Source: NCAR (www.ncep.noaa.gov/nwp50/Presentations/Thu_06_17_04/Session_9/Kuo_50th_NWP/Kuo_50th_NWP.ppt)
To Do
Testing with different domains
Testing on new platforms
Cross-cluster testing
Model refinement, as necessary
(GPU programming)
Typical Tasks
Code inspection
C++ programming (for the model)
Python and BASH scripting for testing
Analysis of model and/or results using statistical techniques