17 May 2006 Rapid Prototyping Capability for Earth-Sun System Sciences Preliminary Design Robert J. Moorhead Mississippi State University

Page 1: Presentation

17 May 2006

Rapid Prototyping Capability

for Earth-Sun System Sciences

Preliminary Design

Robert J. Moorhead

Mississippi State University

Page 2: Presentation

Approach

Formulate architectures and develop baseline capacities that integrate applied sciences systems tools into configurations to support efficient evaluation of the prospects of integrating research results from NASA’s Earth observation systems (with emphasis on spacecraft instruments on missions recently launched or planned for near-term launch) and associated Earth system models

• systems engineering tools

• enterprise architecture tools

• information visualization and analysis tools

• uncertainty characterization tools

• performance assessment tools

“NASA Earth Science and Space Systems benefiting Society: Evolving Systems Engineering Capacity,” presentation by Ron Birk, August 24, 2005, SSC

Page 3: Presentation

System Scope

• Reduce the amount of time that has typically been required to assess the effect of new or future data streams on model outcomes.

• Systematically evaluate research capabilities in a simulated operational environment in order to evaluate components and/or configurations that could be considered for verification, validation, and benchmarking for transition from research to operations and/or into an integrated system solution (ISS).

• Figure 1 illustrates the interface between the RPC and external systems that include the SN and ISS components of NASA’s Earth Science Application Plan.

Page 4: Presentation

RPC Interface

Page 5: Presentation

System Context

• The RPC will provide the capability to integrate and provide access to the tools needed to evaluate the use of a wide variety of current and future NASA sensors and research results, model outputs, and knowledge, collectively referred to as “resources”.

• The resources are assumed to be geographically distributed, so the RPC will provide location transparency for them.

Page 6: Presentation

[Figure: the RPC node and its context. Labels: RPC node; local and remote computing and storage facilities; remote data providers; model configuration; input data sets configuration; experiment design and execution; analysis; system administration and maintenance]

Page 7: Presentation

System modes and states

• Before an experiment can be performed (a particular model using a particular data source), two conditions must be satisfied:

– First, the model must be installed at some computing facility accessible to RPC users, and configured to run.

– Second, the data must be configured so that it can be used by the model. The data configuration may involve developing tools for the data conversions (format translations, subsetting, deriving values of variables not included in the original data products, geo-processing, etc.).

• From the point of view of performing a particular experiment and analysis, the RPC can be in two distinct states:

– ready for the experiment and analysis by end users

– requiring action of specialists for installing and configuring the model and its data

• Over the life cycle of the RPC node, new resources and tools will be integrated with it, increasing the repertoire of experiments and analyses that can be performed.
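
The two states above can be sketched as a simple readiness check. This is a minimal illustration only: the slides do not define a data model, so `Experiment`, `NodeState`, and the three boolean fields are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class NodeState(Enum):
    """The two states named on the slide, seen from one experiment."""
    READY = "ready for the experiment and analysis by end users"
    NEEDS_SPECIALIST = "requires specialists to install or configure the model and its data"


@dataclass
class Experiment:
    # Hypothetical fields: one flag per precondition listed on the slide.
    model_installed: bool    # installed at a facility accessible to RPC users
    model_configured: bool   # configured to run
    data_configured: bool    # data converted so the model can use it


def node_state(exp: Experiment) -> NodeState:
    """Return READY only when both the model and the data conditions hold."""
    model_ok = exp.model_installed and exp.model_configured
    if model_ok and exp.data_configured:
        return NodeState.READY
    return NodeState.NEEDS_SPECIALIST


if __name__ == "__main__":
    # The model is in place but the data still needs conversion tools.
    print(node_state(Experiment(True, True, False)).name)
```

Keeping the check per experiment reflects the slide's wording that the state is defined "from the point of view of performing a particular experiment".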

Page 8: Presentation

Major Categories of Experiments

[Figure: two panels, "Different sources" and "Different models". Labels: numerical model, model results (three sets), analysis; numerical model 1, numerical model 2, model results (two sets), analysis]

Page 9: Presentation

Capabilities Required

1. Discovery, semantic understanding, secure access, and transport mechanisms for data products available from known data providers (Science Data Manager)

2. Data assimilation and geo-processing tools for all data transformations needed to match a given data product (or products) to the model input requirements, and support for organizing the data processing into workflows built from reusable and interoperable modules, including both the workflow specification mechanisms and the workflow enacting engine (Interoperable Geo-processing Environment)

Page 10: Presentation

Capabilities Required (cont.)

3. Model management:

a. Catalog of available models, model metadata catalog (including input and output model requirements), and mechanisms for integrating new models with RPC

b. Mechanisms for creating runtime environments; data staging (in and out); job scheduling, remote execution, and monitoring

c. Mechanisms for storing model outputs together with metadata and provenance information (all information needed to recreate the output data set); the metadata necessary to enable search and discovery of model outputs

4. Tools for model output analysis (including visualizations), tools for quantitatively comparing model outputs, and tools for model benchmarking (Performance Metrics Workbench)
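
As an illustration of item 4, a quantitative comparison of two model outputs could start from simple paired statistics. The slides do not say which metrics the Performance Metrics Workbench uses; mean bias and root-mean-square error are assumptions chosen here because they are common first checks, and the function names are hypothetical.

```python
import math
from typing import Sequence


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    """Both outputs must cover the same, non-empty set of points."""
    if len(a) == 0 or len(a) != len(b):
        raise ValueError("outputs must be non-empty and the same length")


def bias(candidate: Sequence[float], baseline: Sequence[float]) -> float:
    """Mean of (candidate - baseline) over paired values."""
    _check(candidate, baseline)
    return sum(c - b for c, b in zip(candidate, baseline)) / len(candidate)


def rmse(candidate: Sequence[float], baseline: Sequence[float]) -> float:
    """Root-mean-square difference over paired values."""
    _check(candidate, baseline)
    squared = sum((c - b) ** 2 for c, b in zip(candidate, baseline))
    return math.sqrt(squared / len(candidate))


if __name__ == "__main__":
    baseline_run = [1.0, 2.0, 3.0]    # e.g. output with the current data source
    candidate_run = [2.0, 3.0, 4.0]   # e.g. output with a new data source
    print(bias(candidate_run, baseline_run), rmse(candidate_run, baseline_run))
```

Both functions assume the two outputs have already been brought onto the same grid and time steps, which in the RPC would be the job of the geo-processing workflow.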

Page 11: Presentation

Major System Constraints

• Only models and data made available to RPC users and integrated with the RPC node can be used to perform experiments.

• Installation and/or integration of models, as well as integration and geo-processing of data, needs to be performed by a respective specialist, and the time needed to accomplish that task will depend on the complexity of the particular model and data set(s).

• Running a model may take a long time, depending on the complexity and configuration of the model. The experiments will not necessarily be performed in real time.

Page 12: Presentation

User Categories

1. System administrators – responsible for deployment, configuration, and maintenance of the system, and for managing its users (for access control purposes)

2. Application specialists – responsible for installation and configuration of the model on computational systems accessible to the RPC users, and integrating these models with the RPC (which includes definition of the input and output data requirements)

3. Data processing specialists – responsible for the development and the deployment of the tools for data transformations

4. Domain specialists – responsible for defining, configuring (creating workflows for data processing, setting model parameters, etc.), and executing experiments

5. Analysts – domain specialists performing the data analysis

Page 13: Presentation

Assumptions and Dependencies

• The RPC will depend on data and models provided by third parties.

• Access to remote computational and storage facilities will be controlled according to policies established by the facility owners (stakeholders).

• It is assumed that these policies will allow RPC users to submit and monitor jobs on these systems, which may require penetrating firewalls.

• Access privileges may differ between users, depending on organizational membership, nationality, or other factors beyond the control of the RPC system developers.

Page 14: Presentation

Operational Scenario Summary

• Design of experiment – identification of models and data sets to be used

• Assessment of whether the models and data are currently integrated with the RPC node

• Filing requests to model and data specialists, as needed; the specialists issue a notification when the models and data are available

• Configuration of the experiment (setting the model parameters, configuring the data, e.g., ROI, timeframe, etc.)

• Asynchronous run and monitoring of the model

• Analysis

Page 15: Presentation

Physical Issues

• The RPC node will be installed on a dedicated, stand-alone system consisting of standard commercially available computing nodes, data storage, and hosting middleware servers.

• Core RPC modular capabilities (SDM, IGE, MM, PMW) will be executed on separate computing nodes.

• The RPC node will be complemented with remote resources – high performance computing and storage facilities as needed by the models to be used in the experiments.

• The RPC node can be moved from one geographical location to another.

• Access to the remote resources will require standard internet connections.

Page 16: Presentation

System Performance Characteristics

• The primary goal of the RPC node is to provide the capability to rapidly prototype the assimilation of new or future NASA data products and/or model derived data streams into model applications that have generated demonstrable scientific results of merit and stakeholder interest.

• There is, however, no established benchmark to quantitatively specify what “rapid” means. The reference point is current practice (manual configuration of data and models); the expectation is that the RPC approach will considerably speed up the process, in particular for repeated experiments, after the baseline data and models are set up.

• The initial phase (setting up the baseline data and models) may nevertheless prove to be time-consuming, as it will involve model integration, data acquisition and simulation, and the development of new components for geoprocessing the data.

Page 17: Presentation

System Performance Characteristics

• “Rapid Prototyping” performance benefits will be best realized through the reusability of configured geoprocessing tasks to provide model-ready input data to a model that has been fully integrated into the RPC.

• It is this “reuse” capability that will enable the rapid evaluation of new data types.

• By associating existing geoprocessing workflows with new data types, the rapid assimilation of next-generation data into configured models should be readily achievable.

Page 18: Presentation

Policy and Regulation

• As the RPC develops into a viable simulation system, it is expected that activities requiring RPC resources will be requested and coordinated among those selecting an RPC for evaluation, the RPC team conducting a specific evaluation, and RPC developers who will be required to maintain and evolve the RPC to support requirements for integrating new model applications, data products, and geoprocessing tasks.

• As the RPC evolves to meet new or changing requirements, configuration management practices, version control, and developmental practices will be followed to ensure that capabilities in development will be isolated from operational RPC capabilities.

Page 19: Presentation

Policy and Regulation

• Simply stated, development activities, testing, and integration of new functionalities into the RPC should be “contained” through the use of segregated physical or virtual systems that may be isolated from the operational instance of the RPC.

• As new capabilities mature through development processes, configuration “check-in” procedures will be followed to ensure the orderly integration of the new “proven” capabilities.

• It is likely that such activities will involve proactive participation of an RPC technical working group.

Page 20: Presentation

System Interfaces

• The RPC node has 5 categories of users, each requiring a dedicated interface.

• In addition, the RPC interacts with two classes of external systems: data providers and remote computing and storage facilities.

• Each interface is described in the slides that follow.

Page 21: Presentation

System Administrator Interface

The administrator interface must support the administrator tasks:

• registering and de-registering users and assigning roles

• maintaining the user credentials needed to access remote resources

• monitoring the system status and usage

• backing up and restoring data and software; recovery from faults

• deployment of new software components and services
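
The first administrator task (registering users and assigning roles) could be sketched as follows. The role names mirror the five user categories from the User Categories slide; the `UserRegistry` class and its methods are hypothetical and say nothing about how the RPC actually stores users or credentials.

```python
from enum import Enum
from typing import Dict, Set


class Role(Enum):
    """One role per user category listed in the presentation."""
    SYSTEM_ADMINISTRATOR = "system administrator"
    APPLICATION_SPECIALIST = "application specialist"
    DATA_PROCESSING_SPECIALIST = "data processing specialist"
    DOMAIN_SPECIALIST = "domain specialist"
    ANALYST = "analyst"


class UserRegistry:
    """In-memory stand-in for the user store behind the administrator interface."""

    def __init__(self) -> None:
        self._roles: Dict[str, Set[Role]] = {}

    def register(self, user: str, *roles: Role) -> None:
        """Add the user if new and grant the given roles."""
        self._roles.setdefault(user, set()).update(roles)

    def deregister(self, user: str) -> None:
        """Remove the user and all role assignments; unknown users are ignored."""
        self._roles.pop(user, None)

    def has_role(self, user: str, role: Role) -> bool:
        return role in self._roles.get(user, set())


if __name__ == "__main__":
    registry = UserRegistry()
    registry.register("alice", Role.DOMAIN_SPECIALIST, Role.ANALYST)
    print(registry.has_role("alice", Role.ANALYST))
```

A user may hold several roles at once, which fits the slide's description of the analyst as a domain specialist performing data analysis.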

Page 22: Presentation

Model Specialist Interface

• The model specialist is responsible for deploying and integrating the models into the RPC environment.

• The models can be installed locally on RPC node hardware, at a remote computing facility, or both.

• To integrate the model with the RPC, the specialist must “register” the model, that is, generate a metadata record that describes the model in terms of its functionality, the runtime requirements (location of the executable, environment variables, the structure of the working directory, etc.), model parameters, and definition of the input and output datasets.

• The model specialist interface must thus support the registration of new models and editing of the metadata of the existing models.

• In addition, the model specialist interface must provide support for the testing of the correctness of the model deployment.
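
A metadata record of the kind described above might look like the sketch below. The field names follow the items listed on the slide, but the exact schema, the `ModelCatalog` class, and the validation rules are assumptions made for illustration; the presentation does not specify them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelRecord:
    """Hypothetical metadata record generated when a model is registered."""
    name: str
    functionality: str                     # what the model computes
    executable: str                        # location of the executable
    environment: Dict[str, str] = field(default_factory=dict)    # environment variables
    working_dir_layout: List[str] = field(default_factory=list)  # working directory structure
    parameters: Dict[str, float] = field(default_factory=dict)   # model parameters
    inputs: List[str] = field(default_factory=list)              # input dataset definitions
    outputs: List[str] = field(default_factory=list)             # output dataset definitions

    def problems(self) -> List[str]:
        """List what is missing before the record can be registered."""
        issues = []
        if not self.name:
            issues.append("name is empty")
        if not self.executable:
            issues.append("executable location is empty")
        if not self.inputs:
            issues.append("no input datasets defined")
        if not self.outputs:
            issues.append("no output datasets defined")
        return issues


class ModelCatalog:
    """In-memory stand-in for the model metadata catalog."""

    def __init__(self) -> None:
        self._records: Dict[str, ModelRecord] = {}

    def register(self, record: ModelRecord) -> None:
        """Register a new model or replace the metadata of an existing one."""
        issues = record.problems()
        if issues:
            raise ValueError("; ".join(issues))
        self._records[record.name] = record

    def get(self, name: str) -> ModelRecord:
        return self._records[name]
```

Because `register` replaces a record with the same name, the same call covers both registration of new models and editing the metadata of existing ones.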

Page 23: Presentation

Data Specialist Interface

• The data specialist identifies the data providers and designs the geo-processing procedure for transforming the original data product to match the model input data requirements.

• The design of the geo-processing may require the development and deployment of software components to perform specified tasks.

• The data specialist interface must provide support for:

– searching data products from known data providers

– assessing the structure and syntax of available data products

– assessing the model input data requirements

– discovering and evaluating the geo-processing modules already integrated with the RPC node

– integrating new geo-processing modules within the RPC node

– composing the geo-processing process from available components

– testing of the correctness of the geo-processing procedure
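
Composing a geo-processing procedure from reusable components can be sketched as plain function composition. The two modules shown (latitude subsetting and deriving a Celsius temperature) and the record layout are hypothetical examples of the "subsetting" and "deriving values of variables" conversions mentioned earlier, not components of the actual RPC.

```python
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, float]
Step = Callable[[List[Record]], List[Record]]


def subset_latitude(lat_min: float, lat_max: float) -> Step:
    """Build a reusable step that keeps records inside a latitude band."""
    def step(rows: List[Record]) -> List[Record]:
        return [r for r in rows if lat_min <= r["lat"] <= lat_max]
    return step


def derive_celsius(rows: List[Record]) -> List[Record]:
    """Add a derived variable that the original product does not carry."""
    return [{**r, "temp_c": r["temp_k"] - 273.15} for r in rows]


def compose(*steps: Step) -> Step:
    """Chain steps left to right into a single workflow."""
    def workflow(rows: List[Record]) -> List[Record]:
        return reduce(lambda data, step: step(data), steps, rows)
    return workflow


if __name__ == "__main__":
    product = [
        {"lat": 10.0, "temp_k": 300.0},
        {"lat": 50.0, "temp_k": 280.0},
    ]
    to_model_input = compose(subset_latitude(0.0, 30.0), derive_celsius)
    print(to_model_input(product))
```

Because each step has the same signature, an existing workflow can be pointed at a new data product by swapping only the steps that depend on its format.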

Page 24: Presentation

Domain Specialist Interface

• To support the design and execution of experiments, the domain specialist interface must support:

– Discovery of available models and data through the RPC facilities

– Receiving and filling requests for new models and data

– Configuring experiments by

  • Connecting a particular model with particular data

  • Setting the model parameters

  • Configuring datasets (region of interest, timeframe, etc.)

– Submitting models for execution

– Monitoring the model progress

– Controlling the model execution (e.g., aborting it, if needed)

– Verifying that the model completed successfully (e.g., by examining a log file generated by the model, running a test application, etc.)
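
The submit, monitor, and control steps above imply a small life cycle for each model run. The sketch below models it as a state machine; the state names and allowed transitions are assumptions made for illustration, since the slides list the operations but not the states.

```python
from enum import Enum
from typing import List


class JobState(Enum):
    CONFIGURED = "configured"
    SUBMITTED = "submitted"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    ABORTED = "aborted"


# Hypothetical transition table: terminal states allow no further moves.
ALLOWED = {
    JobState.CONFIGURED: {JobState.SUBMITTED},
    JobState.SUBMITTED: {JobState.RUNNING, JobState.ABORTED},
    JobState.RUNNING: {JobState.COMPLETED, JobState.FAILED, JobState.ABORTED},
    JobState.COMPLETED: set(),
    JobState.FAILED: set(),
    JobState.ABORTED: set(),
}


class ModelRun:
    """Tracks one asynchronous model execution."""

    def __init__(self) -> None:
        self.state = JobState.CONFIGURED
        self.history: List[JobState] = [self.state]   # kept for monitoring

    def transition(self, new_state: JobState) -> None:
        """Move to new_state, rejecting moves the table does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state.name} to {new_state.name}")
        self.state = new_state
        self.history.append(new_state)

    def abort(self) -> None:
        """Control operation from the slide: abort a submitted or running job."""
        self.transition(JobState.ABORTED)


if __name__ == "__main__":
    run = ModelRun()
    run.transition(JobState.SUBMITTED)
    run.transition(JobState.RUNNING)
    run.abort()
    print([s.name for s in run.history])
```

Verification of a completed run (for example, by inspecting the model's log file) would sit on top of this and is left out because the slides do not describe a log format.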

Page 25: Presentation

Analyst Interface

• The analyst analyses the experiment outcome. The analyst interface must:

– Allow queries of the output databases to find the model outputs of interest

– Provide access to model outputs

– Provide access to model provenance (when and in what circumstances the model was run, e.g., what input data sets were used, the values of the model parameters, etc.)

– Provide access to tools (visualizations or otherwise) enabling access to the results of the experiments
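
A query over model outputs and their provenance could be sketched as follows. The `OutputRecord` fields correspond to the provenance items named on the slide (run time, input data sets, parameter values); the record layout, the identifiers, and the `find_outputs` helper are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Iterable, List, Optional


@dataclass
class OutputRecord:
    """Hypothetical catalog entry: one model output plus its provenance."""
    output_id: str
    model: str
    run_at: datetime                                   # when the model was run
    input_datasets: List[str] = field(default_factory=list)
    parameters: Dict[str, float] = field(default_factory=dict)


def find_outputs(
    records: Iterable[OutputRecord],
    model: Optional[str] = None,
    input_dataset: Optional[str] = None,
) -> List[OutputRecord]:
    """Return records matching every filter that was supplied."""
    hits = []
    for record in records:
        if model is not None and record.model != model:
            continue
        if input_dataset is not None and input_dataset not in record.input_datasets:
            continue
        hits.append(record)
    return hits


if __name__ == "__main__":
    catalog = [
        OutputRecord("out-1", "model_a", datetime(2006, 5, 1), ["baseline_v1"], {"dt": 60.0}),
        OutputRecord("out-2", "model_a", datetime(2006, 5, 2), ["new_sensor_v1"], {"dt": 60.0}),
    ]
    for hit in find_outputs(catalog, input_dataset="new_sensor_v1"):
        print(hit.output_id, hit.run_at.date(), hit.parameters)
```

Storing the inputs and parameters with each output is what allows an analyst to pair a baseline run with a run that differs only in its data source.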

Page 26: Presentation

Data Provider Interface

• The RPC must define interfaces that allow acceptance of data streams coming from data providers.

Page 27: Presentation

Remote Resources Interface

• The RPC must define interfaces for invoking Grid services such as allocating and monitoring remote resources, accepting notifications about status changes (e.g., a job has completed), and transferring data between the RPC node and remote resources, as well as between remote resources.

• Defined interfaces must support delegation of user credentials to satisfy the access control requirements and policies of the remote resources.

Page 28: Presentation

The End

Backup slides follow

Page 29: Presentation

The baseline system. This four-tier architecture follows OGSA recommendations

Page 30: Presentation

Functional computational capabilities of the RPC system

[Figure. Labels: Evaluations leading to new understanding & ideas for ISS; MyRPC; LIS; IGE; WorldWinds. Label groups: (Authorization, Authentication, Notification, Monitoring, Workflow, Security); (ESMF, GCMD, THREDDS, ESML, Ontology, Query); (MyRPC, Host environment, GPIR, Execution description, Application description, Grid enabled OGC Services)]

Page 31: Presentation

Service oriented architecture for Computational RPC Node [based on NSF LEAD (Droegemeier et al., 2006)]

[Figure. Labels: RPC Portal; MyRPC; GCMD; WRF, HSPF, LIS, RAMS; DAACs, CLASS; Evaluation; ESMF, GEOLEM; OGC Services]

Page 32: Presentation

Systems framework for CRPN, consisting of interacting subsystems in the secure and stable RPC computational grid [based on NSF LEAD (Droegemeier et al., 2006)]

[Figure. Labels: CRPN; WRF; ESMF; IGE; GCMD; MyRPC workspace; LIS; WorldWinds]