Cyber Infrastructure for
Coastal Modeling
Chirag Dekate
Department of Computer Science
Louisiana State University
Report submitted to the Faculty of the Louisiana State University
in partial fulfillment of the requirements for the degree of
Master of Science
December 2004
I would like to dedicate this thesis to my loving parents ...
Acknowledgements
During the course of the project, expertise from various domains was
employed to design and develop the framework that is presented in this
report. Coastal Studies Institute (CSI) researchers Dr. Zhang and Dr.
Prasad provided coastal modeling knowledge and expertise on staging
the models on computational resources. Gridlab researchers Hartmut
Kaiser and Andre Merzsky at the Albert Einstein Institute helped us become
familiar with the Grid Application Toolkit. Gridsphere developers
Chongjie Zhang and Ian Kelly at LSU developed the SCOOP portal
interface.
I would like to sincerely thank Dr. Gabrielle Allen and Tom Goodale
for teaching me the principles of Grid Computing which lie at the core
of making such interdisciplinary collaborative research possible.
Abstract
Coastal modeling provides a unique problem set in the sense that the
data sources are varied, modeling is performed using legacy and usually
closed-source code, and models are generally not versatile or standards
compliant. These features have restricted current models to run
only in constrained environments. Such practices are not limited
to coastal modeling and are common in many other fields, including
other environmental sciences and petroleum engineering. Adapting
such legacy applications to use HPC resources poses additional
challenges, including dealing with primitive data transport mechanisms.
In this project we design and implement a prototype cyber
infrastructure for coastal modeling, using current middleware technologies
including Globus, the Grid Application Toolkit, and Gridsphere.
The resulting modular and interoperable framework will allow LSU
researchers to better leverage available computational resources and
use the current infrastructures to the fullest extent. In doing so we
develop foundations for a more generic and extensible cyber infrastructure
for coastal modeling. The software developed in this project
is currently deployed in operational (real-time) mode in the WAVCIS
project at CSI, and provides a starting point for the Grid efforts in
the SURA Coastal Ocean Observing and Prediction (SCOOP) project
at CCT and CSI.
Contents

1 Introduction
  1.1 Background
  1.2 Wave Model (WAM)
  1.3 Simulating Waves Nearshore (SWAN) Model
      1.3.1 Physics
      1.3.2 Computation
      1.3.3 Output Quantities
      1.3.4 Limitations

2 Infrastructure for Modeling
  2.1 Hardware
      2.1.1 SCOOP Grid
      2.1.2 GumboGrid
  2.2 Grid Middleware
      2.2.1 Globus Toolkit
      2.2.2 The Grid Application Toolkit

3 Coastal Modeling Scenarios
  3.1 Operational SWAN Modeling
  3.2 Coupled Modeling
      3.2.1 Operational Coupled Modeling
      3.2.2 Coupled Modeling over Historical Data

4 Current Work in Progress
  4.1 Coupled and Nested Modeling
  4.2 Cyber Infrastructure for Coastal Modeling

References
List of Figures

1.1 WAVCIS Satellite based data collection, archival and dissemination
2.1 GAT Architecture (image from the Gridlab website)
3.1 Operational SWAN using HTTP Transport for data movement
3.2 Operational SWAN using GAT where the preliminary HTTP transport layer was abstracted using the GAT API
3.3 Coupled Modeling Region
3.4 Wind data obtained from JSU, Mississippi
3.5 Coupled modeling between multiple CA Domains
3.6 Historical Data based coupled modeling
4.1 Comparing observed data with model outputs for skill assessment
4.2 Coupled and Nested Modeling Map
Chapter 1
Introduction
1.1 Background
The WAVCIS program at LSU operates a number of offshore sensors on platforms to
monitor atmospheric and oceanographic conditions around the Louisiana coast. The
data collected from these sensors are transmitted to LSU using satellite-based
transport mechanisms. Once the raw data is collected at LSU, it is post-processed
and archived in a relational DBMS (MySQL). The data is displayed on the WAVCIS
website using ASP and GIS.

Data stored in the RDBMS was previously used for the singular purpose
described above. In this project we leverage this data source to run various
modeling scenarios.
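The post-processing and archival step above can be sketched as follows. The table layout, field names and the -999 sentinel for flawed readings are illustrative assumptions, and sqlite3 stands in for the MySQL store used by WAVCIS:

```python
import sqlite3

def archive_observations(records, conn):
    """Post-process raw sensor records and archive them.

    Each record is (station, timestamp, wave_height_m); values outside a
    plausible physical range are discarded before archival.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations "
        "(station TEXT, obs_time TEXT, wave_height REAL)"
    )
    # Drop flawed values (e.g. the assumed -999 error sentinel)
    clean = [r for r in records if 0.0 <= r[2] < 30.0]
    conn.executemany("INSERT INTO observations VALUES (?, ?, ?)", clean)
    conn.commit()
    return len(clean)

# Example: two valid readings and one sentinel error value
conn = sqlite3.connect(":memory:")
n = archive_observations(
    [("CSI-6", "2004-10-01 00:00", 1.2),
     ("CSI-6", "2004-10-01 01:00", -999.0),
     ("CSI-6", "2004-10-01 02:00", 1.4)],
    conn,
)
```

The same cleaned table can then serve both the website display and, as described in Chapter 3, the generation of model input files.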
1.2 Wave Model (WAM)
WAM stands for WAve Model Komen & Janssen (1994), a widely used deep-water
model used to study quantities such as wave height and wave spectrum.
WAM is primarily written in Fortran and is run in a single-processor
environment. There are no compiler restrictions for this model, as it compiles
using standard GNU compilers. Since this is a deep-water model, the domain or area
on which it runs is not a limiting factor: the WAM model can
be used on the entire region of the Atlantic coast, as well as for regional runs
covering smaller areas such as the Gulf of Mexico.

Figure 1.1: WAVCIS Satellite based data collection, archival and dissemination.
1.3 Simulating Waves Nearshore (SWAN) Model
SWAN stands for Simulating WAves Nearshore (SWAN Website), a widely used shallow-water
model generally used to study wave interactions near coastal shorelines.
SWAN is a relatively well designed model available in both serial and parallel
codes. SWAN can only be compiled using compilers recommended by the developers;
GNU compilers do not work well with this model. Since our
standard processor base is x86, we use the Intel Fortran compilers for both the serial
and parallel versions of the program. Following are some of the features of SWAN
as described by the model developers (this information has been obtained from
the SWAN Website):
1.3.1 Physics
SWAN accounts for the following physics:
• Wave propagation in time and space, shoaling, refraction due to current
and depth, frequency shifting due to currents and non-stationary depth.
• Wave generation by wind.
• Three- and four-wave interactions.
• White-capping, bottom friction and depth-induced breaking.
• Wave-induced setup.
• Propagation from laboratory up to global scales.
• Transmission through and reflection from obstacles.
1.3.2 Computation
SWAN computations can be made on a regular or a curvilinear grid in a Cartesian
or spherical coordinate system. Nested runs, using input from either SWAN,
WAVEWATCH III or WAM, can be made with SWAN. SWAN runs can be done in
serial, i.e. one SWAN program on one processor, as well as in parallel. For the
latter, two parallelization strategies are available:
• distributed-memory paradigm using MPI and
• shared-memory paradigm using OpenMP.
1.3.3 Output Quantities
SWAN generates the following output quantities (numerical files containing
tables, maps and time series):
• one- and two-dimensional spectra,
• significant wave height and mean wave period,
• average wave direction and directional spreading,
• one- and two-dimensional spectral source terms,
• root-mean-square of the orbital near-bottom motion,
• dissipation,
• wave-induced force (based on the radiation-stress gradient),
• wave-induced setup,
• and more.
1.3.4 Limitations
SWAN does not account for
• diffraction and
• scattering reflections.
Chapter 2
Infrastructure for Modeling
2.1 Hardware
In order to mimic real-world environments where such models are usually run,
diverse hardware and software platforms were used in this project. The resources
used included single-processor systems, multiprocessor systems and
HPC clusters. A test bed of six Pentium-class workstations
was used to create GumboGrid during the course of this project. The
resources used could be classified into two distinct virtual organizations,
since the SCOOP Grid had policy restrictions on the kind of middleware available
and the CCT computational resources had different computational and policy
implications. Additionally, each of these entities had its own CA domain, much
like real-world environments where simulations transcend multiple CA domains.
This section describes the hardware resources used for the scenarios; how they
are used is described in the next chapter.
2.1.1 SCOOP Grid
The SCOOP Grid is a combination of single-processor and multiprocessor machines
at the Coastal Studies Institute. Following are the details of the computational
resources used:
• Carmen, Eloise - single-processor machines, each with a Pentium 4 2 GHz processor,
512 megabytes of RAM and 60 gigabytes of storage.
• Hilda - 2-processor Xeon PowerEdge server with 2 gigabytes of memory and
80 gigabytes of storage.
• Hugo - 4-processor Xeon PowerEdge server with 4 gigabytes of memory and
80 gigabytes of storage.
Each of the above machines runs Debian Linux. Only packages from the NMI
(NSF Middleware Initiative) were used for this resource due to policy constraints;
Globus packages from the latest NMI 5.0 release were used for this project. A
SCOOP LSU certificate authority was set up using the Globus Simple CA package.
2.1.2 GumboGrid
GumboGrid is a 6-node computational cluster/Grid developed by CCT graduate
students to experiment with the latest Grid technologies. Most of the nodes are
Pentium III machines, with an average of 192 megabytes of memory per system, running
Red Hat Enterprise Linux AS. The Grid middleware installed on GumboGrid includes Globus
Toolkit 3.2.1, Condor, MPICH-G2 and development tools such as the Intel Fortran
compilers and the standard g77 and gcc compilers. We also set up a CCT CA using the Globus
Simple CA package. Following are some of the activities carried out to
set up the machines.
• Operating systems were freshly installed on the systems for the purpose of
this project and future use. Red Hat Enterprise Linux AS was used as the base Linux
distribution for this cluster.
• The nodes were configured to use the existing LDAP-based infrastructure for
authentication of common users.
• Globus 3.2.1 source packages, GSISSH from NCSA and Condor were installed.
• MPICH 1.2.6 was installed using globus2 as a device. GumboGrid is one
of the few clusters to host a working installation of MPICH-G2. Significant
changes from GLOBUS IO to GLOBUS XIO in the latest versions
of the Globus distributions were identified as problem sources, and respective
patches were applied to facilitate use of MPICH-G2 with Globus 3.2.1.

GumboGrid was set up to use the CCT CA domain, which provided transparent
access to other HPC resources such as SuperMike and SuperHelix.
2.2 Grid Middleware
Like other scientific communities, the coastal community is diverse in its technical
expertise. While some research centers, like those at LSU, are familiar
with the latest high-performance tools, others rely on more traditional ways of
operating. As a result, the technologies used to perform standard tasks, for instance
file transfers, vary a great deal. Additionally, it is nearly impossible to guarantee that all
participating computational sites use a standard set of applications. For these
reasons and more, it is imperative that the Grid middleware chosen can transcend such
capability limitations. The GAT (Gabrielle Allen, 2003, 2004) alleviates some of the
problems posed by heterogeneous infrastructure by providing an API to facilitate
transparent access to resources and applications. Following is a brief description
of some of the Grid middleware used in the project. This section describes the
software resources used for the scenarios; how they are used is described in the
next chapter.
2.2.1 Globus Toolkit
The Globus Toolkit provides a set of fundamental services for computational Grids.
The toolkit is composed of the following components, which can be used
independently (summarized from the Globus Toolkit Website):
• GRAM (Grid Resource Allocation Management): provides resource management
functionality including resource allocation, process creation, monitoring
and management services. RSL (Resource Specification Language)
is used to describe the requests, which GRAM maps to available
resources.
• GSI (Grid Security Infrastructure): provides a certificate-based single sign-on,
run-anywhere capability.
• Monitoring and Discovery Service: a Lightweight Directory Access Protocol
(LDAP) based infrastructure to store information such as compute
server configuration, network status, etc.
• Global Access to Secondary Storage: provides functionality for the programmer
to manage data movement and access strategies to provide data
transparency.
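As a brief illustration of how a GRAM request might be composed, the following sketch builds a minimal RSL string. The attribute names (executable, arguments, count) are standard GRAM attributes, but the executable path is hypothetical and a production request would carry more attributes:

```python
def make_rsl(executable, arguments=None, count=1):
    """Compose a minimal GRAM Resource Specification Language request."""
    parts = [f'(executable = "{executable}")']
    if arguments:
        quoted = " ".join(f'"{a}"' for a in arguments)
        parts.append(f'(arguments = {quoted})')
    parts.append(f'(count = {count})')
    # RSL conjunction: "&" joins the attribute relations
    return "& " + " ".join(parts)

# Hypothetical request: run the SWAN executable on 4 processors
rsl = make_rsl("/usr/local/swan/swan.exe", count=4)
```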
2.2.2 The Grid Application Toolkit
The Grid Application Toolkit (Gabrielle Allen, 2003, 2004) is a simple yet powerful
API that allows programmers to develop code irrespective of the underlying
infrastructure/framework. For instance, a programmer wanting to move or copy
data from one location to another uses the GAT File Copy. When the code is compiled with
the GAT libraries, the GAT Engine uses the appropriate drivers available at the
location. Thus, if the machine has Globus installed, it uses the GridFTP adaptor
to perform the file movement; otherwise it cycles through a list of adaptors and selects
available applications to move the file from source to destination. This allows
the programmer to focus on the problem rather than worry about the underlying
infrastructure.
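The adaptor-cycling behaviour described above can be sketched as follows. This is an illustration of the pattern, not the actual GAT C API: the adaptor names and the failure signalling are invented for the example.

```python
class AdaptorError(Exception):
    """Raised when a transport adaptor cannot perform the operation."""

def gridftp_copy(src, dst):
    # Simulate a site without a Globus installation
    raise AdaptorError("no Globus installation found")

def http_copy(src, dst):
    # Fallback transport: record the copy as having succeeded
    copied.append((src, dst))

copied = []

def file_copy(src, dst, adaptors=(gridftp_copy, http_copy)):
    """Try each transport adaptor in turn, as the GATEngine does."""
    for adaptor in adaptors:
        try:
            adaptor(src, dst)
            return adaptor.__name__  # report which adaptor succeeded
        except AdaptorError:
            continue  # fall through to the next adaptor
    raise AdaptorError("no adaptor could perform the copy")

used = file_copy("gsiftp://src/wind.dat", "file:///tmp/wind.dat")
```

The caller never names a transport; the engine's ordered list of adaptors decides, which is what makes the application code portable across sites with different capabilities.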
Figure 2.1: GAT Architecture (image from the Gridlab website)
Chapter 3
Coastal Modeling Scenarios
3.1 Operational SWAN Modeling
The term operational in this context means that data collected in real time
is used to drive models which run on a regular schedule. The SWAN model describes the
physics of waves near the Louisiana coast more accurately than other models;
therefore SWAN was chosen as the model to be run in operational mode. The
region on which we ran the model depends on the availability of in-situ observations
in the study region. The in-situ observations provided by the sensor stations
help enforce the boundary conditions by passing in the wind information
required by the model.

Once the model region and the model to run were selected, trial runs were carried out
with data from the sensors to test the accuracy and relevance of the results and to help develop
a process flow. Doing so allowed us to locate areas that needed attention during
the automation phase, such as automated generation of input parameters on the
basis of in-situ observations. The input files required by the model can be divided
into two kinds: those which are static (non-changing) and those which are
dynamic (generated in real time). The automation process needed to
be designed to account for the creation of such input files. To accomplish this task, a
number of scripts were written which parsed the relevant sensor readings from the
sensor data file and generated the input files required by the model. The scripts
also handle inconsistencies in the datasets, including flawed input data values.
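A script of the kind described might look like the following sketch. The -999 sentinel for flawed readings and the SWAN control-file line it emits are assumptions for illustration, not the exact WAVCIS or SWAN formats:

```python
MISSING = -999.0  # assumed sentinel for flawed sensor values

def latest_valid_wind(rows):
    """Return the most recent (speed, direction) pair with no flawed fields.

    rows is a chronological list of (speed_mps, direction_deg) tuples.
    """
    for speed, direction in reversed(rows):
        if speed != MISSING and direction != MISSING and speed >= 0:
            return speed, direction
    raise ValueError("no valid wind observation available")

def swan_wind_command(speed, direction):
    """Render the dynamic part of a SWAN control file (hypothetical layout)."""
    return f"WIND {speed:.1f} {direction:.0f}"

# The middle reading is flawed, so the script falls back over it
rows = [(6.0, 120.0), (MISSING, 140.0), (7.5, 135.0)]
line = swan_wind_command(*latest_valid_wind(rows))
```

The static part of the control file is kept as a fixed template; only lines like the one generated here change between runs.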
Figure 3.1: Operational SWAN using HTTP Transport for data movement
Test runs with the new scripts for data access and generation in place were carried
out to check the completeness of the process. Initially the data transport mechanism
was based solely on the HTTP protocol and wget. There are pros and cons
to such a scenario. Simple text protocols such as HTTP and FTP are common
forms of data transport supported by nearly everyone in the coastal community.
However, the disadvantage of such a scenario is its relative inflexibility and reliance on
the availability of a web server: we are limited to the sources which have one
available. Clearly a more dynamic architecture is needed, one which provides the functionality to
cycle between different data transport mechanisms. For this purpose we investigated
the GAT (Grid Application Toolkit) as a possible API to use in our scenarios.
Following is an image of the scenario using GAT.
Several reasons motivated the use of GAT to abstract the data transport layer.
To account for future expandability, the architecture should be abstract enough
that data from different sources can be used to stage the models. Different
sites have varied transport mechanisms such as scp, sftp, GridFTP,
FTP and HTTP, depending on the expertise and the constraints at each site. GAT
provides an abstract way to deal with such complex environments. The Grid Application
Toolkit works by providing a simple API which invokes underlying adaptors
via the GATEngine. Each of the transport methodologies has an adaptor associated
with it which binds the application API to the GAT adaptor functionality.
So when the user invokes GAT File Copy(source, destination), the GAT Engine
picks the user-specified adaptor, or cycles through a list of available adaptors compatible
with the scenario at hand, and performs the copy operation. Thus GAT
provides an ideal solution for transparency at the transport layer.
Figure 3.2: Operational SWAN using GAT where the preliminary HTTP transport layer was abstracted using the GAT API
For our specific case we use the GridFTP adaptors provided by Gridlab/LSU
developers to facilitate data transfers between different locations. The sequence
of events is as follows:
• Retrieve data from the sensor data source using GAT.
• Generate parameter files for the model based on the sensor data.
• Stage the SWAN model on the computational resource.
• Copy output files from the computational resource to the visualization resource.
• Visualize using a GIS-enabled client.
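The sequence above can be sketched as a simple driver that runs each stage in order and stops at the first failure. The stage names are illustrative stand-ins for the actual scripts:

```python
def run_pipeline(steps):
    """Run the operational stages in order; report how far we got.

    steps is a list of (name, callable) pairs; a callable returning a
    falsy value marks that stage as failed and halts the pipeline.
    """
    completed = []
    for name, step in steps:
        if not step():
            return completed, name  # stages done so far, and the failed one
        completed.append(name)
    return completed, None

log = []
steps = [
    ("retrieve",    lambda: log.append("retrieve") or True),
    ("generate",    lambda: log.append("generate") or True),
    ("stage_swan",  lambda: log.append("stage_swan") or True),
    ("copy_output", lambda: log.append("copy_output") or True),
    ("visualize",   lambda: log.append("visualize") or True),
]
done, failed = run_pipeline(steps)
```

In the deployed system each stage would wrap a GAT call or a model invocation; halting on the first failed stage prevents a model run from being staged against stale or missing input.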
3.2 Coupled Modeling
When two models with different basic physics equations interact or feed into one
another, it is known as coupled modeling. In our scenarios we couple the two
models WAM and SWAN. The physics of WAM is more suited to deep-water regions,
and the model is usually run over large areas (the entire Atlantic or Pacific region).
In contrast, the physics of SWAN is more suited to shallow-water regions, where
the coastline or reef breaks the progression of the wave and causes a reverse current
countering wave propagation. On this basis we designed a scenario where
we use the WAM model for the entire Gulf of Mexico and the results generated by
the WAM model are fed into the SWAN model, which runs over a
very limited region across the Louisiana coast. This facilitates the study of
waves in the Gulf of Mexico and their impact on the Louisiana coastline, which is
especially useful in cases when there are tropical storms or hurricanes lurking in
the gulf. Following is a map of the proposed model region.
The challenges for this section included the sequential execution of the WAM and
SWAN models, in that order, and moving the boundary forcing conditions generated
by WAM into the SWAN model for further analysis. There are two
possible scenarios which utilize the coupled modeling framework:
• Operational coupled modeling.
• Model runs over historical data.
The two scenarios differ in that coupled modeling based on historical
data uses static input (datasets from previous hurricanes, etc.), whereas in the
operational modeling scenario we use data from different sources in real time and
the model is staged on a schedule.
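The WAM-to-SWAN hand-off can be sketched as follows. The output imitates a simplified TPAR-style nonstationary boundary file; the exact field selection and the fixed directional spreading value are assumptions for illustration, not the production format:

```python
def wam_to_swan_boundary(wam_records):
    """Convert WAM output at a boundary point into a SWAN TPAR-style
    boundary file (simplified sketch; field layout assumed).

    Each record: (time 'yyyymmdd.HHMMSS', Hs [m], period [s], direction [deg]).
    The trailing "20." is an assumed constant directional spreading.
    """
    lines = ["TPAR"]
    for t, hs, per, direc in wam_records:
        lines.append(f"{t} {hs:.2f} {per:.1f} {direc:.0f} 20.")
    return "\n".join(lines) + "\n"

# Invented WAM output at one SWAN boundary point
text = wam_to_swan_boundary([
    ("20040915.000000", 2.31, 7.8, 145.0),
    ("20040915.030000", 2.55, 8.1, 150.0),
])
```

One such file would be written per SWAN boundary segment, which is the "copy one set of output results to the SWAN modeling resource" stage of the operational sequence.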
3.2.1 Operational Coupled Modeling
In operational coupled modeling we run the coupled WAM and SWAN models
periodically, depending on the availability of input data. A lack of readily
available input data for running the models was a major impediment to achieving
quality results. Collaborations with Jackson State University (Department of
Meteorology) helped us overcome this problem by providing a suite of
atmospheric model results, covering the Gulf of Mexico region, that we could use
for our simulations (shown below).

Figure 3.3: Coupled Modeling Region

This data is generated twice every day and provides a 72-hour forecast
of surface winds. Our operational product is therefore planned to run when the
latest set of input data becomes available. The various stages are listed
below.
• Obtain wind data files from the storage repository.
• Stage the large-area WAM model.
• Copy the output results to the visualization resource.
• Copy one set of output results to the SWAN modeling resource.
• Stage the SWAN model on the smaller region, using the output obtained in the
previous step as the boundary condition.
• Copy the SWAN output to the visualization resource.
• Visualize both outputs using a GIS-enabled visualization client.

Figure 3.4: Wind data obtained from JSU, Mississippi

Following is a schematic describing the scenario.
Figure 3.5: Coupled modeling between multiple CA Domains
3.2.2 Coupled Modeling over Historical data
This scenario uses the same resources, scripts and design as above.
The only variation is that instead of using real-time input data we use
archived data sets of historical hurricanes and thunderstorms. Following is the
result obtained for a historical hurricane, Hurricane 215 (September 1915):
Figure 3.6: Historical Data based coupled modeling
The arrows indicate the direction of the waves during the hurricane event. The
above results show that the two models appear to be in close agreement with one
another. Time-series-based comparison at specific points can pinpoint differences,
if any, with greater accuracy. We are currently working on these scenarios to display
all relevant information using graphs, charts, etc.
Note on Operational Wave Modeling
Although we have the process flow issues worked out, the input data that JSU
provides is very limited in that it covers only a portion of the Gulf
of Mexico. A model region which covers the entire gulf is more relevant, as it
comprehensively demonstrates the dynamics of extreme events such as hurricanes
and tropical storms. To facilitate such a plan we are currently working with
NCEP/NOAA datasets and plan to put them into operation shortly after QA/QC
and testing.
Chapter 4
Current Work in Progress
Prior to the implementation of the operational modeling scenarios, the data obtained
from the sensors was not used to its fullest extent. In this project we have
demonstrated how such datasets can be put to innovative use. Globus and GAT
have enabled real-time usage of the sensor data to produce relevant results, by
allowing us to focus on the modeling aspects and providing a transparent layer
for file movement. Running models in operational mode allows researchers to
compare the characteristics of the results obtained from the models with in-situ
observations and to detect conditions where the models can be improved. The
time series graph in Figure 4.1 is one example of such comparisons. This project is
the first of its kind, and activities such as operational coupled modeling have never
been carried out using current technologies at any of the research centers. Inspired
by these activities, researchers at CCT and WAVCIS are devising new scenarios
that push the limits of the technologies involved, for instance the coupled and
nested modeling scenario.
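Such a comparison typically reduces to simple skill metrics. The following sketch computes bias and root-mean-square error between an observed and a modeled wave-height series; the sample values are invented:

```python
import math

def skill(observed, modeled):
    """Bias and root-mean-square error between paired time series."""
    n = len(observed)
    diffs = [m - o for o, m in zip(observed, modeled)]
    bias = sum(diffs) / n                              # mean model - obs
    rmse = math.sqrt(sum(d * d for d in diffs) / n)    # overall misfit
    return bias, rmse

# Invented significant wave heights (m): sensor vs. SWAN output
obs = [1.0, 1.2, 1.4, 1.1]
mod = [1.1, 1.3, 1.3, 1.2]
bias, rmse = skill(obs, mod)
```

A persistent positive bias would suggest a systematic over-prediction worth tracing back to the wind forcing or the model physics, whereas a large RMSE with a small bias points to timing or phase errors.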
4.1 Coupled and Nested Modeling
This is a far more complicated scenario than those described
above. The basic idea behind this scenario is that a large-area model couples with an
intermediate-region model, which in turn nests with multiple instances of small-area
models with finer resolution (and hence higher computational cost). Following
is a map of the planned model areas:
Figure 4.1: Comparing observed data with model outputs for skill assessment
4.2 Cyber Infrastructure for Coastal Modeling
Since this is a vibrant research and development project that is far from conclusion,
I would like to propose a scenario for the future.
The cyber infrastructure would seek to integrate data sources, provide resource
transparency for secure execution of models, and offer reliable archival and efficient
retrieval of results. A future ocean modeler would be able to log in to an advanced
portal interface and use an interactive GIS-based web service to select a model
region. The GIS web service, in conjunction with Grid middleware, would query all
the available replica catalogs for the region of interest. Using complex rules, the
middleware matches the user's model selections to the available data and selects the
best possible data set. The ocean modeler then selects the time constraints, if the
results are time critical, and submits the job. An advanced resource broker stages
the jobs on the best possible resources based on the user constraints, and dynamically
checkpoints and migrates jobs to faster, newer resources as they become
available.

Figure 4.2: Coupled and Nested Modeling Map
Bibliography
Allen, G., Davis, K., et al. (2003). Enabling applications on the Grid: a GridLab
overview. International Journal of High Performance Computing Applications:
Special Issue on Grid Computing: Infrastructure and Applications, 22.

Allen, G., Davis, K., et al. (2004). The Grid Application Toolkit: towards generic
and easy application programming interfaces for the Grid. Submitted to IEEE, 16.

GAT (2004). Gridlab Website. Web page: http://www.gridlab.org.

Globus Toolkit Website (2004). Globus FAQ. Web page: http://www.globus.org.

Komen, G. J. & Janssen, P. A. E. M., et al. (1994). Dynamics and Modelling of
Ocean Waves. Cambridge University Press, 532.

SWAN Website (2004). SWAN model website. Web page: http://fluidmechanics.tudelft.nl/swan/default.htm.

WAVCIS (2004). WAVCIS. Web page: http://wavcis.csi.lsu.edu.