A Modeling Approach for Estimating Execution Time of Long-Running Scientific Applications

Seyed Masoud Sadjadi 1, Shu Shimizu 2, Javier Figueroa 1,3, Raju Rangaswami 1, Javier Delgado 1, Hector Duran 4, Xabriel J. Collazo-Mojica 5

Presented by: Xabriel J. Collazo-Mojica 5

1: Florida International University (FIU), Miami, Florida, USA; 2: IBM Tokyo Research Laboratory, Tokyo, Japan; 3: University of Miami, Coral Gables, Florida, USA; 4: University of Guadalajara, CUCEA, Mexico; 5: University of Puerto Rico, Mayagüez Campus, Puerto Rico

Miami, Florida – April 2008


Presentation Outline

• Motivation

• Research Approach

• Research Validation

• Related Work

• Concluding Remarks

• Future Research

HPGC '08 - April 14 - LA Grid 2

Motivation

• The impact of hurricanes is devastating

• The Weather Research and Forecasting (WRF) model

• Most popular

• It is computationally and storage intensive

• We need higher-resolution and more precise forecasts

• Many organizations are willing to share resources

• But these resources are dynamic and unpredictable

Motivation

• At the time of a hurricane, we need to act fast

• What resources should we allocate?

• We need to finish within a strict deadline (i.e., in time for the hurricane forecast)

• We need to make a decision within seconds

• We need to model the execution time of WRF based on the target resources

• In our case: clusters with different parameters

Approach to Modeling Resource Usage

[Figure: WRF resource usage diagram]

Approach to Modeling Execution Parallelism

• Platform heterogeneity

• We assume identical individual resource characteristics of computation, communication and storage power.

• Execution scale

• We add a parameter to model the number of nodes utilized during execution.

[Figure: compute nodes 1, 2, 3, …, N]

Application Resource Usage Model

• Characterize applications according to their resource usage characteristics (i.e., application “profiles”)

• Assumptions:

• Execution time is based on contributors

• Product of contributors determines total execution time

• Computation nodes are homogeneous (e.g. Beowulf cluster)

• Non-ad-hoc application characteristics

Application Resource Usage Model - Contributors

• The model aims to allow as many contributors as necessary

• This paper focuses on two contributors

• First contributor: Parallelism

• Ppara = degree of parallelism

• α0 = constant contribution

• α1 = variable contribution

• Second contributor: CPU Performance

• Pclock = clock speed of compute node

• β0 = constant contribution related to CPU performance

• β1 = variable contribution related to CPU performance
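The slides name the coefficients of each contributor but do not show the formula itself. The sketch below is one plausible reading, not the authors' stated equation: it assumes each contributor has the form constant + variable / power, and that total execution time is the product of the two contributors, as the assumptions slide says. The function names and coefficient values are hypothetical.

```python
def contributor(constant, variable, power):
    """One contributor: a constant part plus a part that shrinks as power grows.

    The constant + variable / power form is an assumption, not taken from the slides.
    """
    if power <= 0:
        raise ValueError("power must be positive")
    return constant + variable / power


def predict_time(p_para, p_clock, alpha, beta):
    """Predicted execution time as the product of the two contributors.

    alpha = (alpha0, alpha1) for parallelism, beta = (beta0, beta1) for CPU.
    """
    parallel_term = contributor(alpha[0], alpha[1], p_para)
    cpu_term = contributor(beta[0], beta[1], p_clock)
    return parallel_term * cpu_term


# Made-up coefficients, for illustration only.
alpha = (0.5, 8.0)
beta = (1.0, 30.0)
t_4_nodes = predict_time(p_para=4, p_clock=3.0, alpha=alpha, beta=beta)
t_8_nodes = predict_time(p_para=8, p_clock=3.0, alpha=alpha, beta=beta)
```

Under this form, doubling the number of nodes shrinks only the variable part of the parallelism contributor, so the predicted time falls by less than half whenever α0 is nonzero.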

Experimental Approach - Environment

• GCB cluster: Rocks ver. 4.0, 8 nodes, each with a 32-bit x86 Intel 3.0 GHz processor, 1 GB of main memory, and a gigabit network connection

• Mind cluster: Rocks ver. 4.0, 16 nodes, each with dual Xeon 3.6 GHz processors, 2 GB of main memory, and a gigabit network connection

• CPU vs. number of nodes: CPU percentages from 100% down to 10%, in steps of 10%

• We use CPULimit
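As a rough illustration of the experiment design above, the following sketch enumerates every combination of node count and CPU percentage for one cluster. It assumes that every node count from 1 up to the cluster size is tested and that a CPU percentage can be treated as scaling the effective clock speed; neither assumption is stated in the slides, and `experiment_grid` is a hypothetical helper.

```python
from itertools import product


def experiment_grid(clock_ghz, max_nodes):
    """Yield (nodes, cpu_percent, effective_ghz) for every combination."""
    percentages = range(100, 0, -10)        # 100, 90, ..., 10
    node_counts = range(1, max_nodes + 1)   # assumed: every count from 1 to max
    for nodes, pct in product(node_counts, percentages):
        # Assumption: capping CPU at pct% behaves like a proportionally slower clock.
        yield nodes, pct, clock_ghz * pct / 100.0


# GCB figures from the slide: 8 nodes at 3.0 GHz.
gcb_runs = list(experiment_grid(clock_ghz=3.0, max_nodes=8))
```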

Experimental Approach - Monitoring and Prediction

• Two tools were used

• Amon – A Monitoring Tool

• Daemon-like application that collects and reports exploratory variables

• Aprof – A Profiling Tool

• Statistical Prediction Program

• Listens to Amon reports from compute nodes

• Stores collected data as matrix for each application
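The slides do not describe Aprof's internals, so the following is only a minimal sketch of what "one matrix per application" could look like. The class name, method names, and column layout (nodes, clock speed, measured seconds) are all hypothetical and are not Aprof's actual interface.

```python
from collections import defaultdict


class ProfileStore:
    """Hypothetical per-application store of monitoring reports."""

    def __init__(self):
        # Maps an application name to its list of observation rows.
        self._rows = defaultdict(list)

    def record(self, app, nodes, clock_ghz, seconds):
        """Append one report as a row: [nodes, clock_ghz, seconds]."""
        self._rows[app].append((nodes, clock_ghz, seconds))

    def matrix(self, app):
        """Return the application's rows as a list of lists (empty if unknown)."""
        return [list(row) for row in self._rows.get(app, [])]


store = ProfileStore()
store.record("wrf", 4, 3.0, 350.0)
store.record("wrf", 8, 3.0, 225.0)
wrf_matrix = store.matrix("wrf")
```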

Experimental Approach - Monitoring and Prediction

Application Resource Usage Model - Validation

• Intuitive assumption: execution time decreases linearly with the inverse of total computational power.

• Predictions within a cluster (e.g., GCB to GCB)

• GCB - FE 5.34% ME 5.86%

• Mind - FE 5.66% ME 3.80%

• Predictions across clusters

• GCB to Mind - FE 9.97% ME 5.86%

• Mind to GCB - FE 5.83% ME 4.13%

• These results validate our simple model.
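The linear assumption above can be tried on toy data with an ordinary least-squares line. This is a sketch under stated assumptions, not the authors' prediction procedure: total computational power is taken to be nodes × clock speed, the training and test numbers are synthetic, and the relative error computed here is a generic metric that may differ from the FE and ME figures on the slide.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def inverse_power(nodes, clock_ghz):
    """Inverse of total computational power, taken here as nodes * clock."""
    return 1.0 / (nodes * clock_ghz)


def relative_error_percent(predicted, measured):
    """Absolute prediction error as a percentage of the measured value."""
    return abs(predicted - measured) / measured * 100.0


# Synthetic training runs on one cluster: (nodes, clock in GHz, seconds).
train = [(1, 3.0, 1100.0), (2, 3.0, 600.0), (4, 3.0, 350.0), (8, 3.0, 225.0)]
xs = [inverse_power(n, c) for n, c, _ in train]
ys = [t for _, _, t in train]
intercept, slope = fit_line(xs, ys)

# Predict a run on a different, hypothetical cluster and compare it
# with a made-up measurement.
predicted = intercept + slope * inverse_power(4, 3.6)
error = relative_error_percent(predicted, measured=320.0)
```

Fitting on one cluster and evaluating on another mirrors the cross-cluster setup on the slide, with the node count and clock speed of the target cluster as the only inputs.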

Application Resource Usage Model - Mind to GCB prediction

Concluding Remarks

• We've proposed a new approach for modeling resource usage and execution time of a distributed application

• Experimental results from running WRF on two different clusters show good accuracy: within 10% error for cross-cluster predictions

• The model uses only two parameters: CPU speed and number of nodes.

• For WRF specifically, we are one step closer to a complete solution for our goal of higher-resolution weather predictions and simulations.

Related Work

• S. Shimizu, R. Rangaswami, and H. A. Duran-Limon. “Platform-independent Modeling and Prediction of Application Resource Usage Characteristics.”

• Basis for prediction model

• It is limited to one node

• D. M. Swany and R. Wolski. “Multivariate Resource Performance Forecasting In the Network Weather Service.”

• High-accuracy prediction model

• They emphasize latency and bandwidth

Related Work

• R. Badia, F. Escale, E. Gabriel, J. Gimenez, R. Keller, J. Labarta, and M. S. Müller. “Performance Prediction in a Grid Environment.”

• Offline prediction

• Need to link their library to the application to be profiled

Future Research

• Extend our parallelism model to address heterogeneous resources.

• Include more resource parameters in the model

• Started joint research with Barcelona Supercomputing Center

• We acknowledge that Amon & Aprof have limitations

• We will integrate our tools with their simulation application - DIMEMAS

Acknowledgements

• National Science Foundation

• REU Grant # IIS-0552555

• PIRE Grant # OISE-0730065

• CREST Grant # HRD-0317692

• GCB Grant # OCI-0636031

• IBM Research

• LA Grid

• FIU SCIS

Questions?