Presented by: Marlon Bright
19 June 2008
Advisor: Masoud Sadjadi, Ph.D.
REU – Florida International University
Outline
o Grid Enablement of Weather Research and Forecasting Code (WRF)
o Profiling and Prediction Tools
o Research Goals
o Project Timeline
o Current Progress
o Challenges
Motivation
o Weather prediction can:
  ○ Save lives
  ○ Help business owners and emergency response
o How?
  ○ Accurate and timely results
  ○ Precise location information
o What do we have?
  ○ WRF – Weather Research and Forecasting
  ○ “The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research needs.”
Motivation (Cont.) - WRF
o WRF status
  ○ Over 160,000 lines (mostly FORTRAN and C)
  ○ Single machine/cluster compatible
  ○ Single domain
  ○ Fine resolution -> resource requirements
o How to overcome this?
  ○ Through grid enablement
o Expected benefits to WRF
  ○ More available resources – different domains
  ○ Faster results
  ○ Improved accuracy
Grid Enablement
o “Grid-enabling is the practice of taking existing applications, which currently run on a single node or on a cluster of homogeneous nodes, and adapt them (either automatically or manually) so that they can be deployed over non-homogeneous computing resources connected through the Internet across multiple organizational boundaries (e.g., multiple clusters from different organizations) without major modifications to the underlying source code.”
o The grid-enablement process is successful if the resulting Grid-enabled application “performs better” than the original application.
o “Performs better” can be interpreted differently:
  ○ Improved execution time, better resource utilization, enabling collaboration, …
System Overview
o Web-Based Portal
o Grid Middleware (Plumbing)
  ○ Job-Flow Management
  ○ Meta-Scheduling
    ○ Performance Prediction
  ○ Profiling and Benchmarking
o Development Tools and Environments
  ○ Transparent Grid Enablement (TGE)
    ○ TRAP: Static and Dynamic adaptation of programs
    ○ TRAP/BPEL, TRAP/J, TRAP.NET, etc.
  ○ GRID superscalar: Programming Paradigm for parallelizing a sequential application dynamically in a Computational Grid
Meta-Scheduling
o IMPORTANT: WRF cannot be gridified trivially!
o “Global” scheduler of the grid environment, sitting above the Local Resource Manager
o Selects resources for jobs to run on if they are not run on local resources
o Submits user jobs to optimal remote resources (different domain/Virtual Organization):
  ○ Analyzes application and hardware characteristics to find the best match
  ○ Uses application performance prediction models
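The selection step described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's meta-scheduler: the `Resource` fields, the `predict_runtime` function, its coefficients, and the cluster names are all hypothetical, and the predictor simply stands in for whatever application performance prediction model is available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str          # hypothetical cluster label
    free_nodes: int    # nodes currently available for the job
    clock_ghz: float   # per-node clock speed


def predict_runtime(res, a0=50.0, a1=4000.0, b0=0.2, b1=2.0):
    """Hypothetical predictor: (a0 + a1/nodes) * (b0 + b1/clock), in seconds."""
    return (a0 + a1 / res.free_nodes) * (b0 + b1 / res.clock_ghz)


def select_resource(resources):
    """Return the resource with the smallest predicted runtime."""
    candidates = [r for r in resources if r.free_nodes > 0]
    if not candidates:
        raise ValueError("no resource has free nodes")
    return min(candidates, key=predict_runtime)


# Made-up resources; a real meta-scheduler would query each domain instead.
clusters = [
    Resource("local", free_nodes=4, clock_ghz=2.0),
    Resource("remote-a", free_nodes=16, clock_ghz=2.0),
    Resource("remote-b", free_nodes=8, clock_ghz=3.0),
]
best = select_resource(clusters)
print(best.name, round(predict_runtime(best), 1))
```

The point of the sketch is only that the quality of the choice depends on the quality of the predictor; data transfer and queue wait times, which a real meta-scheduler would also weigh, are left out.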
Performance Prediction
o Allows for:
  ○ Optimal usage of grid resources through “smarter” meta-scheduling
    ○ Many users overestimate job requirements
    ○ Reduced idle time for compute resources
    ○ Could save costs and energy
  ○ Optimal resource selection for most expedient job return time
Better Scheduling by Modeling WRF Behavior
[Equation: linear model of WRF behavior with coefficients b0..b4 over CPU, cache, memory, disk, and network terms]
Modeling WRF Behavior
[Figure: modeling WRF behavior as an iterative and incremental process. Labeled stages: Start, Code Inspection & Modeling, Profiling, Parameter Estimation, Mathematical Modeling.]
Model shown in the figure: $T_{exe} = (\alpha_0 + \alpha_1 / \#nodes)\,(\beta_0 + \beta_1 / clock)$
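The execution-time model on this slide multiplies a term in the number of nodes by a term in the clock speed. A small sketch shows how such a model behaves as nodes are added. The coefficient names `a0`, `a1`, `b0`, `b1` and their values are assumptions chosen to keep the arithmetic easy to follow; they are not fitted values from the experiments.

```python
def predict_exec_time(nodes, clock_ghz, a0, a1, b0, b1):
    """Evaluate T_exe = (a0 + a1 / nodes) * (b0 + b1 / clock)."""
    if nodes <= 0 or clock_ghz <= 0:
        raise ValueError("nodes and clock_ghz must be positive")
    return (a0 + a1 / nodes) * (b0 + b1 / clock_ghz)


# Hypothetical coefficients, for illustration only.
coeffs = dict(a0=100.0, a1=1600.0, b0=0.5, b1=1.0)

for nodes in (2, 4, 8, 16):
    t = predict_exec_time(nodes, clock_ghz=2.0, **coeffs)
    print(f"{nodes:2d} nodes -> {t:7.1f} s")
```

Because of the constant term `a0`, doubling the node count does not halve the predicted time; with these made-up coefficients the gain from each doubling shrinks, which is the kind of behavior a meta-scheduler needs to know about before requesting more nodes.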
Amon / Aprof
Amon – monitoring program that runs on each compute node recording processes
Aprof – regression analysis program running on head node; receives input from Amon to make execution time predictions (within cluster & between clusters)
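As a rough illustration of the regression step, the sketch below fits runtime against the inverse of the node count with ordinary least squares and then extrapolates to a larger node count. It is not Aprof's implementation: the function name, the single-variable model, and the sample measurements are all hypothetical.

```python
def fit_inverse_nodes(samples):
    """Least-squares fit of runtime = c0 + c1 * (1 / nodes).

    samples: iterable of (nodes, runtime_seconds) pairs.
    Returns the tuple (c0, c1).
    """
    samples = list(samples)
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    xs = [1.0 / n for n, _ in samples]
    ys = [t for _, t in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must cover at least two node counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1


# Made-up measurements (nodes, seconds); not data from the experiments.
observed = [(2, 905.0), (4, 498.0), (8, 302.0)]
c0, c1 = fit_inverse_nodes(observed)
predicted_16 = c0 + c1 / 16
print(f"c0={c0:.1f} c1={c1:.1f} predicted(16 nodes)={predicted_16:.1f} s")
```

In practice the inputs would come from the per-node records collected by Amon, and more predictors (clock speed, CPU load) would enter the fit.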
Amon / Aprof Monitoring and Prediction
Amon / Aprof Approach to Modeling Resource Usage
mv wrfjob.${jobid}.out ${RESULTS_DIR}/${cpu_limit}/${i}.out
Previous Findings for Amon / Aprof
o Experiments were performed on two clusters at FIU: Mind (16 nodes) and GCB (8 nodes)
o Experiments were run to predict for different numbers of nodes and CPU loads (i.e., 2, 3, …, 14, 15 nodes and 20%, 30%, …, 90%, 100% load)
o Aprof predictions were within 10% error versus actual recorded runtimes, both within Mind and GCB and between Mind and GCB
o Conclusion: the first-step assumption was valid. -> Move to extending the research to a higher number of nodes.
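The 10% criterion can be checked mechanically once predicted and recorded runtimes are paired up per configuration. The sketch below is one way to do that; the function names and the three sample runs are hypothetical, and only the grid of node counts and CPU loads is taken from the slide.

```python
def relative_error(predicted, actual):
    """Absolute prediction error as a fraction of the actual runtime."""
    if actual <= 0:
        raise ValueError("actual runtime must be positive")
    return abs(predicted - actual) / actual


def find_misses(runs, tolerance=0.10):
    """Return (config, error) pairs whose error exceeds the tolerance."""
    misses = []
    for config, predicted, actual in runs:
        err = relative_error(predicted, actual)
        if err > tolerance:
            misses.append((config, err))
    return misses


# Experiment grid from the slide: 2..15 nodes, 20%..100% CPU load in steps of 10.
configs = [(nodes, load) for nodes in range(2, 16) for load in range(20, 101, 10)]

# Made-up (config, predicted_s, actual_s) triples; not results from the study.
runs = [
    ((2, 100), 900.0, 950.0),
    ((8, 50), 610.0, 580.0),
    ((15, 20), 1500.0, 1300.0),
]
misses = find_misses(runs)
print(f"{len(configs)} configurations in the full grid")
for config, err in misses:
    print(f"nodes={config[0]} load={config[1]}%: error {err:.1%}")
```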
Paraver / Dimemas
o Dimemas: simulation tool for the parametric analysis of the behavior of message-passing applications on a configurable parallel platform.
o Paraver: tool that allows for performance visualization and analysis of trace files generated from actual executions and by Dimemas.
o Trace files are generated by MPItrace, which is linked into the execution code.
Paraver/Dimemas – DiP Environment
Goals
1. Extend Amon/Aprof research to a larger number of nodes, a different architecture, and a different version of WRF (Version 2.2.1).
2. Compare/contrast Aprof predictions to Dimemas predictions in terms of accuracy and prediction computation time.
3. Analyze if/how Amon/Aprof could be used in conjunction with Dimemas/Paraver for optimized application performance prediction and, ultimately, meta-scheduling
Timeline
o End of June:
  ○ Get MPItrace linking properly with the WRF version compiled on GCB, then Mind
  ○ a) Install Amon and Aprof on MareNostrum and ensure proper functioning
  ○ b) Run benchmarks on MareNostrum
o Early July:
  ○ Use Amon/Aprof to predict within MareNostrum (and possibly between MareNostrum, GCB, and Mind)
  ○ Use generated MPI/OpenMP tracefiles (Paraver/Dimemas) to predict within/between Mind, GCB, and MareNostrum
o Late July/Early August:
  ○ Experiment with how well Amon and Aprof relate to/could possibly be combined with Dimemas
  ○ Analyze how findings relate to the bigger picture. Make optimizations on grid-enablement of WRF.
  ○ Compose paper presenting significant findings.
Current Progress
o Familiarized and up-to-speed on current state of research
o Completed reading of most essential related works papers
o Functional user of Paraver
  ○ In final stages of being fully functional on Linux Platform
o Amon/Aprof installed on MareNostrum
Current Progress (cont’d)
o Becoming functional Amon/Aprof driver on MareNostrum Supercomputer
  ○ Developing research plan for experiments
  ○ Developing benchmarking scripts for executing experiments
o Working out bugs/becoming functional user of Dimemas on GCB and Mind
  ○ Working to properly generate Dimemas tracefiles on GCB
Current Challenges
o Compiling version 2.2 of WRF in Mind (and possibly MareNostrum), or:
  ○ Compiling version 2.2.1 of WRF in GCB and Mind
o Linking MPItrace into compiled WRF in GCB/Mind cluster to generate accurate Paraver/Dimemas trace files
o Adapting/developing benchmarking scripts to new architecture of MareNostrum
References
o S. Masoud Sadjadi, Liana Fong, Rosa M. Badia, Javier Figueroa, Javier Delgado, Xabriel J. Collazo-Mojica, Khalid Saleem, Raju Rangaswami, Shu Shimizu, Hector A. Duran Limon, Pat Welsh, Sandeep Pattnaik, Anthony Praino, David Villegas, Selim Kalayci, Gargi Dasgupta, Onyeka Ezenwoye, Juan Carlos Martinez, Ivan Rodero, Shuyi Chen, Javier Muñoz, Diego Lopez, Julita Corbalan, Hugh Willoughby, Michael McFail, Christine Lisetti, and Malek Adjouadi. Transparent grid enablement of weather research and forecasting. In Proceedings of the Mardi Gras Conference 2008 - Workshop on Grid-Enabling Applications, Baton Rouge, Louisiana, USA, January 2008.
  http://www.cs.fiu.edu/~sadjadi/Presentations/Mardi-Gras-GEA-2008-TGE-WRF.ppt
o S. Masoud Sadjadi, Shu Shimizu, Javier Figueroa, Raju Rangaswami, Javier Delgado, Hector Duran, and Xabriel Collazo. A modeling approach for estimating execution time of long-running scientific applications. In Proceedings of the 22nd IEEE International Parallel & Distributed Processing Symposium (IPDPS-2008), the Fifth High-Performance Grid Computing Workshop (HPGC-2008), Miami, Florida, April 2008.
  http://www.cs.fiu.edu/~sadjadi/Presentations/HPGC-2008-WRF%20Modeling%20Paper%20Presentationl.ppt
o “Performance/Profiling”. Presented by Javier Figueroa in Special Topics in Grid Enablement of Scientific Applications Class. 13 May 2008.
Acknowledgements
o REU
o PIRE
o BSC
o Masoud Sadjadi, Ph.D. - FIU
o Rosa Badia, Ph.D. - BSC
o Javier Delgado - FIU
o Javier Figueroa - UM