Application Performance Prediction
Javier Delgado, Feb. 9, 2009

Motivation (General)

- Optimal usage of grid resources through “smarter” meta-scheduling
  - Many users overestimate job requirements
  - Reduced idle time for compute resources
  - Savings in utility and energy costs
  - Optimal resource selection for the fastest job return time

WRF (Weather Research and Forecasting model) Prediction Possibilities

- Over 50% of classified disasters, e.g.:
  - Hurricanes
  - Flash floods
  - Droughts

Source: adrc.asia

Without Performance Prediction

- Users need additional knowledge:
  - How long will the job take?
  - Where should it be sent?
  - etc.
- Unfair preemption of resources

Example Scenario

- 3 resources:
  - Marenostrum (10,000+ core supercomputer)
  - Mind (32-core hyperthreading cluster)
  - GCB (8-core hyperthreading cluster)
- 2 jobs:
  - 1 continental US (CONUS) WRF simulation (urgent)
  - 1 simulation of a 75 x 75 portion of Florida (for benchmarking)

Example Scenario

- The user has no knowledge of how long either simulation will take
- Intuitively, Marenostrum will be faster
  - However, the user has exclusive access to Mind (i.e., no queue time)
- How should the jobs be allocated?

Example Scenario

Resource                  CONUS Job      Benchmark Job
Marenostrum (32 nodes)    45 minutes     3 minutes
Mind (all nodes)          180 minutes    20 minutes
GCB (all nodes)           500 minutes    50 minutes

Example Scenario

- Execution prediction (aprof) can estimate the execution time on each system
- Other tools can be used to predict queue time
- With these two estimates, plus information from the metascheduler, automatic job allocation becomes feasible
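A minimal Python sketch of the idea, assuming the allocator simply minimizes predicted turnaround (queue wait plus execution time) and that each resource takes at most one job. The execution times come from the example table; apart from Mind's zero wait (exclusive access), the queue-wait figures are made-up placeholders, and `allocate` is a hypothetical helper, not part of aprof or any metascheduler.

```python
# Hypothetical queue-wait estimates (minutes). Only Mind's 0 comes from the
# scenario (exclusive access); the other values are placeholders.
QUEUE_WAIT_MIN = {"Marenostrum": 240.0, "Mind": 0.0, "GCB": 30.0}

# Execution-time estimates (minutes) taken from the example scenario table.
RUNTIME_MIN = {
    "CONUS": {"Marenostrum": 45.0, "Mind": 180.0, "GCB": 500.0},
    "Benchmark": {"Marenostrum": 3.0, "Mind": 20.0, "GCB": 50.0},
}


def allocate(jobs_by_priority, runtime, queue_wait):
    """Greedy allocation: each job, in priority order, takes the still-free
    resource with the smallest predicted turnaround (queue wait + run time).
    Simplifying assumption: at most one job per resource."""
    free = set(queue_wait)
    plan = {}
    for job in jobs_by_priority:
        # sorted() makes tie-breaking deterministic (alphabetical by name)
        best = min(sorted(free), key=lambda r: queue_wait[r] + runtime[job][r])
        plan[job] = (best, queue_wait[best] + runtime[job][best])
        free.remove(best)
    return plan


if __name__ == "__main__":
    # The urgent CONUS job is allocated first.
    plan = allocate(["CONUS", "Benchmark"], RUNTIME_MIN, QUEUE_WAIT_MIN)
    for job, (resource, turnaround) in plan.items():
        print(f"{job}: {resource} (~{turnaround:.0f} min turnaround)")
```

With these placeholder waits, the urgent CONUS job goes to Mind (180 minutes) and the benchmark to GCB (80 minutes); if Marenostrum's queue were empty, the CONUS job would move there instead. Handling jobs one at a time in priority order is a simple greedy choice, not an optimal assignment.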

Motivation (Storm Mitigation)

- Humanitarian: thousands of lives can be saved
- Economic: millions of dollars are needed to repair damage; with more warning time, this can be minimized

[Figure: 10-km WRF vs. 4-km WRF]

Dashed magenta indicates approximate area of rainfall

This rainfall is produced by convective parameterization; parameterized convection (on the 10-km grid) cannot distinguish between different modes of convection.

Why So Many Processors?

Source: NCAR (www.ncep.noaa.gov/nwp50/Presentations/Thu_06_17_04/Session_9/Kuo_50th_NWP/Kuo_50th_NWP.ppt)
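A rough estimate of why finer grids need so many more processors, assuming the same domain area and number of vertical levels, and an explicit time step that must shrink in proportion to the grid spacing (the CFL stability condition). Under these assumptions, going from 10-km to 4-km spacing multiplies the number of horizontal grid points by (10/4)^2 and the number of time steps by 10/4, so the total cost C of a forecast of fixed length grows as:

$$
\frac{C_{4\,\mathrm{km}}}{C_{10\,\mathrm{km}}} \approx \left(\frac{10}{4}\right)^{2} \cdot \frac{10}{4} = \left(\frac{10}{4}\right)^{3} \approx 15.6
$$

This simple scaling ignores I/O and communication costs.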

Process

Completed Work

- Prediction experiments on 3 different platforms, using 1 domain

To Do

- Testing with different domains
- Testing on new platforms
- Cross-cluster testing
- Model refinement, as necessary (GPU programming)

Typical Tasks

- Code inspection
- C++ programming (for the model)
- Python and BASH scripting for testing
- Analysis of the model and/or results using statistical techniques
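As one example of the kind of statistical analysis listed above, prediction accuracy could be summarized with the mean absolute percentage error (MAPE) and root-mean-square error (RMSE) between predicted and measured runtimes. This is a minimal Python sketch; the choice of metrics is an assumption, and the runtimes in the usage example are hypothetical, not results from these experiments.

```python
import math


def _check(predicted, measured):
    """Both inputs must be non-empty sequences of equal length."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("expected two non-empty sequences of equal length")


def mape(predicted, measured):
    """Mean absolute percentage error (in percent) of runtime predictions."""
    _check(predicted, measured)
    return 100.0 * sum(abs(p - m) / m for p, m in zip(predicted, measured)) / len(measured)


def rmse(predicted, measured):
    """Root-mean-square error, in the same units as the runtimes."""
    _check(predicted, measured)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured))


if __name__ == "__main__":
    # Hypothetical runtimes in minutes, for illustration only.
    predicted = [50.0, 170.0, 520.0]
    measured = [45.0, 180.0, 500.0]
    print(f"MAPE = {mape(predicted, measured):.1f}%")     # MAPE = 6.9%
    print(f"RMSE = {rmse(predicted, measured):.1f} min")  # RMSE = 13.2 min
```

MAPE is scale-independent, which makes it convenient for comparing accuracy across jobs whose runtimes differ by an order of magnitude (such as the CONUS and benchmark jobs); RMSE weights large misses more heavily.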

Thank You!

Any Questions?