Data Grids for HPC: Geographical Information System Grids

Marlon Pierce, Geoffrey Fox
Indiana University
December 7 2004 Internet Seminar


Overview from Previous Lectures


Parallel Computing

Parallel processing is built on breaking problems up into parts and simulating each part on a separate computer node.

There are several ways of expressing this breakup into parts in software:
• Message passing, as in MPI
• The OpenMP model for annotating traditional languages
• Explicitly parallel languages like High Performance Fortran

And several computer architectures are designed to support this breakup:
• Distributed memory, with or without custom interconnect
• Shared memory, with or without good cache
• Vectors, usually with good memory bandwidth


What are Web Services?

Web Services are distributed computer programs that can be written in any language (Fortran, Java, Perl, Python, ...).

The simplest implementations involve XML messages (SOAP) and programs written in net-friendly languages like Java and Python.

Here is a typical e-commerce use:

[Figure: e-commerce services (Security, Catalog, Payment, Credit Card, Warehouse, Shipping) connected through WSDL interfaces]


What Is the Connection?

Both MPI and Web Services rely upon messaging to interact. The difference is in the speed of message transmission:
• MPI is useful for microsecond communication speeds: clusters, traditional parallel computing.
• Web Services communicate at Internet speeds: millisecond communication times at best.

This implies that we have (at least) a two-level programming model:
• Level 1: MPI within science applications on clusters and HPC.
• Level 2: Programming between science applications.


Two-level Programming I

The Web Service (Grid) paradigm implicitly assumes a two-level programming model.

We make a Service (the same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies:
• A C++, Java or Fortran Monte Carlo module, perhaps running with MPI on a parallel machine
• Data streaming from a sensor or satellite
• Specialized (JDBC) database access

Such services accept and produce data from other services, files and databases.

The Grid is used to coordinate such services, assuming we have solved the problem of programming the service.

[Figure: a Service exchanging Data]


Two-level Programming II

The Grid is about the composition of distributed services, with the runtime interfaces to the Grid taking the place of UNIX pipes/data streams.

This is familiar from the use of UNIX shell, Perl or Python scripts to produce real applications from core programs.

Such interpretive environments are the single-processor analog of Grid programming.

Some projects, like GrADS from Rice University, are looking at integration between the service and composition levels, but the dominant effort looks at each level separately.

[Figure: Service1, Service2, Service3 and Service4 composed into an application]
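The scripting analogy can be sketched in a few lines of Python. This is only an illustration of the composition level: the three "services" below are hypothetical local functions standing in for remote Web Services, and the names sensor_filter, simulate, summarize and compose are not from the lecture.

```python
from typing import Callable, List

# Each "service" maps a list of numbers to a list of numbers.
# In a real Grid these would be remote calls; here they are local stand-ins.
Service = Callable[[List[float]], List[float]]


def sensor_filter(readings: List[float]) -> List[float]:
    """Hypothetical data-filter service: drop negative (invalid) readings."""
    return [r for r in readings if r >= 0.0]


def simulate(readings: List[float]) -> List[float]:
    """Hypothetical simulation service: a stand-in for an HPC code (doubles each value)."""
    return [2.0 * r for r in readings]


def summarize(values: List[float]) -> List[float]:
    """Hypothetical analysis service: reduce the output to [min, max]."""
    return [min(values), max(values)] if values else []


def compose(services: List[Service]) -> Service:
    """Chain services so each one's output feeds the next, like a | b | c in a shell."""
    def run(data: List[float]) -> List[float]:
        for service in services:
            data = service(data)
        return data
    return run


workflow = compose([sensor_filter, simulate, summarize])
result = workflow([3.0, -1.0, 0.5, 2.0])
print(result)
```

The point of the sketch is that the composition code never looks inside a service; it only wires outputs to inputs, which is the role the Grid plays at level 2.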


3 Layer Programming Model

[Figure: three-layer diagram. Labels: Application (Level 1 Programming; MPI, Fortran, C++, etc.); Application Semantics (Metadata, Ontology; Level 2 “Programming”; Semantic Web); Workflow (Level 3 Programming; BPEL); Basic Web Service Infrastructure with Web Service 1, WS 2, WS 3, WS 4]

The Semantic Web adds another layer between workflow and the services representing traditional applications.


Data and Science Applications

Two- (or three-) level programming applies to all applications.

Typically we need to bind together HPC and non-HPC parts.
• How do you provide data to your application?
• How do you share data between applications?
• How do you communicate results to analysis and visualization programs?

This is particularly important as the size and quality of observational data are growing rapidly.

Q: How do you easily bind together science apps and remote data sources?
• A: Web Services (and Grids) provide the unifying architecture.


Grid Libraries

Programming the Grid has many similarities with programming in conventional languages.
• In HPSearch you use similar scripting languages.

Grids are particularly good at supporting user interfaces, as the browser is a particular service.
• Portal technology is an important “gift” of Grids for HPC.

Most promising (and not often exploited) is building Grid “Libraries”: collections of services that can be re-used in several applications.
• A Mastercard service is a typical business Grid library.
• Visualization, sensor processing and GIS are naturally distributed components of an HPC application that can be developed as Grid libraries.


Data Grids for HPC


Data Deluged Science

In the past, we worried about data in the form of parallel I/O or MPI-IO, but we didn’t consider it as an enabler of new algorithms and new ways of computing.

Data assimilation was not central to HPCC.

ASC set up because didn’t want test data!

Now particle physics will get 100 petabytes from CERN.
• Nuclear physics (Jefferson Lab) is in the same situation.
• Use around 30,000 CPUs simultaneously 24x7.

Weather forecasting, climate, solid earth (EarthScope, Earth System Grid, GEON).
• We discussed our project SERVOGrid in the October 2004 lecture.

Bioinformatics curated databases (Biocomplexity has only 1000’s of data points at present).

Virtual Observatory and SkyServer in astronomy.

Environmental sensor nets.


Data Deluge @ Home

In 2003, all of Marion County, IN (including Indianapolis) was surveyed using Light Detection and Ranging (LiDAR) sensing.

GRW, Inc. flew a Cessna 337 airplane over the entire county to produce digitized maps.
• 1 point per square meter.
• 495 square miles total.

Can be used to create high resolution contour maps…. But what do you do with all of the data?
• LiDAR data represents a 3 orders of magnitude increase in data resolution over what is used today in conventional flood prediction (B. Engles, Purdue).
• Flood modeling codes thus must become HPC codes to handle the size of newly available data.
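A quick back-of-the-envelope calculation shows the scale implied by these two numbers. The point density and area come from the slide; the 24 bytes per point (three 8-byte coordinates) is an assumption made here only to get a rough storage figure.

```python
# 1 international mile is exactly 1609.344 m, so 1 square mile = 1609.344**2 m^2.
SQ_METERS_PER_SQ_MILE = 1609.344 ** 2


def lidar_point_count(area_sq_miles: float, points_per_sq_meter: float = 1.0) -> float:
    """Number of LiDAR points for a survey area at a given point density."""
    return area_sq_miles * SQ_METERS_PER_SQ_MILE * points_per_sq_meter


points = lidar_point_count(495.0)   # Marion County survey area from the slide

# Assumption (not from the slide): x, y, z stored as three 8-byte floats.
BYTES_PER_POINT = 24
raw_gigabytes = points * BYTES_PER_POINT / 1e9

print(f"{points:.3e} points, roughly {raw_gigabytes:.0f} GB of raw coordinates")
```

So a single county at this density is on the order of a billion points before any derived products are computed.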


Example Data Grid: The Earth System Grid

U.S. DOE SciDAC funded R&D effort.

Build an “Earth System Grid” that enables management, discovery, distributed access, processing, & analysis of distributed terascale climate research data.

A “Collaboratory Pilot Project”.

Build upon ESG-I, Globus Toolkit, DataGrid technologies, and deploy.

Potential broad application to other areas.

http://www.earthsystemgrid.org


ESG Data Sets

Community Climate System Model data
• This is data compatible with the National Center for Atmospheric Research (NCAR) global climate model, CCSM.
    CCSM couples atmospheric, land surface, ocean, and sea ice models.
• This is a US government model for climate modeling and prediction.
• http://www.ccsm.ucar.edu/

Parallel Climate Model data
• Data compatible with extensions to CCSM.
• Uses the same atmospheric model but different ocean and sea ice models.


ESG Challenges

By the end of 2003, DOE-sponsored climate change research had produced 100 TB of scientific data.
• Stored across several DOE sites and NCAR.

This is a consequence of HPC, and it will only escalate as models simulate global weather patterns at increasingly fine resolution.

Basic problems in data management:
• What is in the data files (metadata)?
• How were the data created and by whom (provenance)?
• How can data be stored and moved between sites efficiently?
• How can data be delivered to the scientific community?

ESG web portal


ESG Data Sets

Community Climate System Model data
• This is data compatible with the National Center for Atmospheric Research (NCAR) global climate model, CCSM.
    CCSM couples atmospheric, land surface, ocean, and sea ice models.
• This is the US government’s workhorse code for climate modeling and prediction.
• http://www.ccsm.ucar.edu/

Parallel Climate Model data
• Data compatible with extensions to CCSM.
• Uses the same atmospheric model but different ocean and sea ice models.


Example Data Grid: GEON

Project goal: prototype interpretive environments of the future in Earth Sciences.

Use advanced information technologies to facilitate collaborative, inter-disciplinary science efforts.

Scientists will be able to discover data, tools, and models via portals, using advanced, semantics-based search engines and query tools, in a uniform authentication environment that provides controlled access to a wide range of resources.
• A prototype “Semantic Grid”

A services-based environment facilitates creation of scientific workflows that are executed in the distributed environment.

Advanced GIS mapping, 3D, and 4D visualization tools allow scientists to interact with the data.

www.geongrid.org


GEON Grid Application: SYNSEIS

• SYNSEIS is a grid application that provides an opportunity for seismologists and other earth science partners to compute and study 3D seismic records to understand complex subsurface structures.

• SYNSEIS is built using a service-based architecture. While it provides users an easy-to-use GUI to access data, models and compute resources, it also provides “connectors” (APIs) for developers should they choose to utilize any of its components in other applications.


SYNSEIS Architecture

[Figure: SYNSEIS architecture. Labels: SYNSEIS (Flash GUI), GEON Portal, SynSeis Engine, IRIS DMC, Cornell Map Server, Crustal Models, TeraGrid NCSA, TeraGrid SDSC, LLNL MCR; connections labeled Web service/SOAP, Corba, GridFTP, GSI, GASS, GRAM]

Waveform and seismic event catalogs: www.iris.edu


GEON SYNSEIS Conclusions

Using Grid technology, the GEON team was able to bring an extremely complex and cumbersome seismic data analysis procedure to a level where anyone can use it efficiently and effectively; hence SYNSEIS is a first step towards faster discovery.

Democratization of community resources allows not only GEON researchers but also external community members to access state-of-the-art software and tools.

Although the tool was developed for GEON applications, it holds tremendous potential for projects like EarthScope. SYNSEIS can be used by EarthScope researchers to conduct timely analysis of collected data.

SYNSEIS also has high potential for use in educational environments, allowing students to experiment with data and make their own earthquakes.

SYNSEIS has allowed us to practice building distributed data and computational resources.


SERVOGrid Example: GeoFEST

SERVOGrid was discussed in more detail in the October lecture of this series.
• But it is worth another mention in this context.

GeoFEST is the Geophysical Finite Element Simulation Tool.
• GeoFEST solves solid mechanics forward models with these characteristics:
    2-D or 3-D irregular domains
    1-D, 2-D or 3-D displacement fields
    Static elastic or time-evolving viscoelastic problems
    Driven by faults, boundary conditions or distributed loads
• GeoFEST runs in a variety of computing environments:
    UNIX workstations (including Linux, Mac OS X, etc.)
    Web portal environment
    Parallel cluster/supercomputer environment

GeoFEST output can be compared directly with current and future InSAR satellite data.


GeoFEST and Data Grids

GeoFEST works directly with Earth fault data.

Luckily for us, there is a Web Service data source for earth faults in California.
• QuakeTables: accessible for human use through
    http://infogroup.usc.edu:8080/public.html
    http://complexity.ucs.indiana.edu:8282/jetspeed/index.jsp
• USC, UC-Irvine, and IU designed and built this as part of the SERVO project.

But GeoFEST needs programmatic access to the fault data.
• Users design layer and fault geometry problems and create finite element meshes through a Web portal interface.
    Like GEON, we use portlets.
    Portlets are a standard way to make Java-based (and other) portals out of reusable components.
• This information must then be passed to GeoFEST as an input file.
• GeoFEST may run on a host that is remote from the data.


[Figure: portal architecture. Labels: Browser Interface, HTTP(S), User Interface Server, SOAP, WSDL; Host 1, Host 2, Host 3; DB Service 1 (JDBC, DB); Job Sub/Mon and File Services (Operating and Queuing Systems); Viz Service (IDL, GMT)]


[Figure: site-specific irregular scalar measurements and constellations for plate boundary-scale vector measurements. Labels: Topography (1 km), Stress Change, Earthquakes, PBO, Ice Sheets, Volcanoes, Long Valley, CA, Northridge, CA, Hector Mine, CA, Greenland]


Data Deluged Science Computing Architecture

[Figure: distributed Data Filters massage data for the HPC Simulation. Other labels: Grid, OGSA-DAI Grid Services, Grid Data Assimilation, Analysis, Control, Visualize, Other Grid and Web Services]


Data Assimilation

Data assimilation implies one is solving some optimization problem, which might have Kalman-filter-like structure.

Due to the data deluge, one will become more and more dominated by the data (Nobs much larger than the number of simulation points).

The natural approach is to form, for each local (position, time) patch, the “important” data combinations so that the optimization doesn’t waste time on large-error or insensitive data.

Data reduction is done in a natural distributed fashion, NOT on the HPC machine, as distributed computing is most cost effective if the calculations are essentially independent.
• Filter functions must be transmitted from the HPC machine.

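$$\min_{\text{Theoretical Unknowns}} \; \sum_{i=1}^{N_{\mathrm{obs}}} \frac{\left(\mathrm{Data}_i(\mathrm{position}, \mathrm{time}) - \mathrm{Simulated\_Value}_i\right)^2}{\mathrm{Error}_i^2}$$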
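The objective on this slide (a sum over observations of squared, error-weighted differences between data and simulated value) can be written down directly. The sketch below is a toy: the two-parameter linear model, the four observations and the brute-force grid search are all assumptions made for illustration, and the grid search merely stands in for whatever optimizer (Kalman filter or otherwise) a real assimilation system would use.

```python
from typing import Callable, Sequence, Tuple

# One observation: (position, time, measured value, measurement error)
Observation = Tuple[float, float, float, float]
Unknowns = Tuple[float, float]


def chi_squared(
    observations: Sequence[Observation],
    simulate: Callable[[float, float, Unknowns], float],
    unknowns: Unknowns,
) -> float:
    """Sum over observations of ((data - simulated value) / error) squared."""
    total = 0.0
    for position, time, value, error in observations:
        residual = (value - simulate(position, time, unknowns)) / error
        total += residual * residual
    return total


def toy_model(position: float, time: float, unknowns: Unknowns) -> float:
    """Hypothetical simulation: value = a * position + b * time."""
    a, b = unknowns
    return a * position + b * time


# Made-up observations, generated from a = 2.0, b = 0.5 with no noise.
observations = [
    (0.0, 1.0, 0.5, 0.1),
    (1.0, 1.0, 2.5, 0.1),
    (2.0, 2.0, 5.0, 0.2),
    (3.0, 2.0, 7.0, 0.5),
]

# Crude grid search over the unknowns (a in 0.0..4.0, b in 0.0..2.0, step 0.1).
candidates = [(a / 10.0, b / 10.0) for a in range(41) for b in range(21)]
best = min(candidates, key=lambda u: chi_squared(observations, toy_model, u))
print(best, chi_squared(observations, toy_model, best))
```

Note how observations with a large error contribute little to the sum, which is why the slide suggests discarding large-error or insensitive data before it ever reaches the HPC machine.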


Distributed Filtering

[Figure: geographically distributed sensor patches feed Data Filters on distributed machines. The HPC machine sends the needed filter to each patch and receives filtered data; each filter reduces Nobs(local patch 1), Nobs(local patch 2), ... to Nfiltered(local patch 1), Nfiltered(local patch 2), ...]

Nobs(local patch) >> Nfiltered(local patch) ≈ Number_of_Unknowns(local patch)

In the simplest approach, filtered data are obtained by linear transformations of the original data, based on a Singular Value Decomposition of the least squares matrix.

Factorize the matrix into a product of local patches.
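The reduction from Nobs to roughly Number_of_Unknowns values per patch can be illustrated with a simpler linear filter than the SVD named on the slide. In the sketch below each patch projects its observations onto the weighted normal equations of a two-unknown linear model, and only that small summary travels to the HPC machine. This is a swapped-in technique chosen because it fits in a few lines of plain Python; the function names, the two-unknown model and the data are all assumptions.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, float]        # sensitivity of one observation to the 2 unknowns
Obs = Tuple[Row, float, float]   # (sensitivity row, data value, error)
Summary = Tuple[List[List[float]], List[float]]


def filter_patch(patch: Sequence[Obs]) -> Summary:
    """Runs near the sensors: reduce any number of observations to a 2x2 matrix and a 2-vector."""
    ata = [[0.0, 0.0], [0.0, 0.0]]
    atd = [0.0, 0.0]
    for row, value, error in patch:
        weight = 1.0 / (error * error)
        for j in range(2):
            atd[j] += weight * row[j] * value      # linear in the data
            for k in range(2):
                ata[j][k] += weight * row[j] * row[k]
    return ata, atd


def solve_on_hpc(filtered: Sequence[Summary]) -> Tuple[float, float]:
    """Runs on the HPC machine: add the patch summaries and solve the 2x2 system."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    v = [0.0, 0.0]
    for ata, atd in filtered:
        for j in range(2):
            v[j] += atd[j]
            for k in range(2):
                m[j][k] += ata[j][k]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0.0:
        raise ValueError("unknowns are not determined by the filtered data")
    x0 = (v[0] * m[1][1] - m[0][1] * v[1]) / det   # Cramer's rule
    x1 = (m[0][0] * v[1] - m[1][0] * v[0]) / det
    return x0, x1


# Two made-up sensor patches; real patches would hold many more observations.
patch_1 = [((0.0, 1.0), 0.5, 0.1), ((1.0, 1.0), 2.5, 0.1)]
patch_2 = [((2.0, 2.0), 5.0, 0.2), ((3.0, 2.0), 7.0, 0.5)]

estimate = solve_on_hpc([filter_patch(patch_1), filter_patch(patch_2)])
print(estimate)
```

The summary each patch sends has a fixed size no matter how many observations the patch holds, which is the property the inequality above describes.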


Standards For Geographic Data Services


The Story So Far…

HPC applications generate huge amounts of data.
• Constant problem for all HPC centers, including DOD MSRCs.
• Managing scientific information about these applications is just as important as storage technology.

HPC applications use observational data as input.
• Projects like the ESG, GEON, and SERVO illustrate how HPC applications need to be coupled to data sources.
• Quantity of observational data is growing rapidly, opening fields for non-traditional HPC (LiDAR and flood modeling).

Huge amounts of new data potentially drive new HPC applications (LiDAR -> flood modeling).

Earth sciences are a focus of our examples, but really, many applications have data sources that are geographically described.
• Weather prediction is an obvious example.

Thus we see the importance of coupling GIS data grid services to HPC applications for both data access and visualization/interpretation.


What is GIS?

Geographic Information Systems
• ESRI: commercial company with many popular GIS products.
• Open Geospatial Consortium (OGC, formerly the OpenGIS Consortium).
• We will focus on OGC since it defines open and interoperable standards.

What are the characteristics of a GIS?
• Need data models to represent information.
• Need services for remotely accessing data.
• Need metadata for determining what is stored in the services.


GML: A Data Model For GIS

GML 3.x is an interconnected suite of over 20 XML schemas.

GML is an abstract model for geography.

With GML, you can encode

• Features: abstract representations of map entities.

• Geometry: encode abstractly how to represent a feature pictorially.

• Coordinate reference systems

• Topology

• Time, units of measure

• Observation data.


Example Use of GML

The SCIGN (Southern California Integrated GPS Network) maintains online catalogs of GPS stations.

Collective data for each site is made available through online catalogs.
• Using various text formats.

These text formats are not suitable for automated processing, but GML is.

GML can be used to describe GPS stations using the Feature.xsd schema, with values encoded as GPS observations.

www.crisisgrid.org


Open GIS Services

GML abstract data models can encode data, but you need services to interact with the remote data.

Some example OGC services include
• Web Feature Service: for retrieving GML-encoded features, like faults, roads, county boundaries, GPS station locations, ….
• Web Map Service: for creating maps out of Web Features.
• Sensor Grid Services: for working with streaming, time-stamped data.

Problems with OGC services
• Not (yet) Web Service compliant
    “Pre” web service: no SOAP or WSDL.
    Use HTTP GET/POST conventions instead.
• Often define general Web Service services as specialized standards
    Information services
    Notification services in sensor grids


Anatomy of WFS (G. Aydin)

WFS provides three major services as described in the OGC specification:
• GetCapabilities: The client (a WMS server or a user) starts by requesting a document from the WFS that describes its abilities. When a GetCapabilities request arrives, the server dynamically creates a capabilities document and returns it.
    This is OGC’s formalization of metadata, so important to GEON, ESG, etc.
• DescribeFeatureType: After the client receives the capabilities document, it can request a more detailed description of any of the features listed in the WFS capabilities document.
    The WFS returns an XML schema that describes the requested feature.
    Metadata about a specific entry.
• GetFeature: The client can ask the WFS to return a particular portion of any feature data.
    GetFeature requests contain some property names of the feature and a Filter element to describe the query.
    The WFS extracts the query and bounding box from the filter and queries the feature databases.
    The results obtained from the DB query are converted to the feature’s GML format and returned to the client as a FeatureCollection.
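Since OGC services of this generation use HTTP GET/POST conventions, the three operations can be sketched as plain URLs with key-value parameters. The endpoint address, the feature type name "fault" and the choice of version 1.0.0 are assumptions for illustration; the sketch only builds the URLs and does not contact any server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the address of a real WFS.
WFS_ENDPOINT = "http://example.org/wfs"


def wfs_url(request: str, **extra: str) -> str:
    """Build an HTTP GET URL for a WFS operation using key-value-pair encoding."""
    params = {"SERVICE": "WFS", "VERSION": "1.0.0", "REQUEST": request}
    params.update(extra)
    return WFS_ENDPOINT + "?" + urlencode(params)


# 1. Ask the server what it offers.
capabilities_url = wfs_url("GetCapabilities")

# 2. Ask for the XML schema of one feature type listed in the capabilities.
describe_url = wfs_url("DescribeFeatureType", TYPENAME="fault")

# 3. Ask for the features inside a bounding box (min lon, min lat, max lon, max lat).
feature_url = wfs_url("GetFeature", TYPENAME="fault", BBOX="-119.0,34.0,-118.0,35.0")

for url in (capabilities_url, describe_url, feature_url):
    print(url)
```

A client such as a WMS would fetch these URLs in order: capabilities first, then the schema for a feature type of interest, then the features themselves.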


Example WFS Capability Entries

ELEMENT NAME: DESCRIPTION
Name: A name the service provider assigns to the web feature service instance.
Title: Human-readable title to briefly identify this server in menus.
Abstract: Descriptive narrative for more information about the server.
Keyword: Contains short words to aid catalog searching.
OnlineResource: Defines the top-level HTTP URL of this service. Typically the URL of a "home page" for the service.
Fees: Contains a text block indicating any fees imposed by the service provider for usage of the service or for data retrieved from the WFS. The keyword NONE is reserved to mean no fees.
AccessConstraints: Text block describing any access constraints imposed by the service provider on the WFS or data retrieved from that service. The keyword NONE is reserved to indicate no access constraints are imposed.


Sample Feature: CA Fault Lines

<gml:featureMember>
  <fault>
    <name>Northridge2</name>
    <segment>Northridge2</segment>
    <author>Wald D. J.</author>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates>-118.72,34.243 -118.591,34.176</gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>

After receiving a GetFeature request, the WFS decodes the request, creates a DB query from it and queries the database.

The WFS then retrieves the features from the database and converts them into GML documents.

Each feature instance is wrapped as a gml:featureMember element.

The WFS returns a wfs:FeatureCollection document that includes all featureMembers returned in the query result.
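On the client side, a feature like the one on this slide can be read with a standard XML parser. The sketch below uses Python's xml.etree.ElementTree on the slide's sample; the xmlns:gml declaration is added here because the excerpt on the slide omits it, and parse_fault is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# The sample feature, with a namespace declaration added so it parses on its own.
SAMPLE = """
<gml:featureMember xmlns:gml="http://www.opengis.net/gml">
  <fault>
    <name>Northridge2</name>
    <segment>Northridge2</segment>
    <author>Wald D. J.</author>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates>-118.72,34.243 -118.591,34.176</gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>
"""


def parse_fault(member_xml: str) -> dict:
    """Extract the name, author and line-string points from one featureMember."""
    member = ET.fromstring(member_xml.strip())
    fault = member.find("fault")
    coords_text = fault.find(f".//{{{GML_NS}}}coordinates").text
    # Coordinates are "lon,lat" pairs separated by whitespace.
    points = [tuple(float(v) for v in pair.split(",")) for pair in coords_text.split()]
    return {
        "name": fault.findtext("name"),
        "author": fault.findtext("author"),
        "points": points,
    }


fault = parse_fault(SAMPLE)
print(fault)
```

This kind of parsing is what gives a simulation code programmatic access to fault geometry, as opposed to reading the same data off a human-oriented web page.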


[Figure: a client or WMS sends GetFeature requests to a WFS Server and receives FeatureCollections; the server issues SQL queries against feature databases for Railroads [a-b], Rivers [a-d], Bridges [1-5] and Interstate Highways [12-18]]

• A WFS can serve data for multiple feature types.
• The WFS returns the results of GetFeature requests as GML documents (Feature Collections).
• Clients may include other services as well as humans.
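The server-side path (bounding box in, SQL query, GML out) can be sketched with an in-memory SQLite database. The table layout, the made-up second row and the get_feature function are assumptions for illustration; a real WFS would parse a Filter element, and the output here omits namespace declarations for brevity.

```python
import sqlite3
from typing import Tuple
from xml.sax.saxutils import escape


def make_feature_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for a feature database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE fault (name TEXT, author TEXT, "
        "lon1 REAL, lat1 REAL, lon2 REAL, lat2 REAL)"
    )
    db.executemany(
        "INSERT INTO fault VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("Northridge2", "Wald D. J.", -118.72, 34.243, -118.591, 34.176),
            ("FarAway", "nobody", -121.0, 37.0, -120.9, 37.1),  # made-up row
        ],
    )
    return db


def get_feature(db: sqlite3.Connection, bbox: Tuple[float, float, float, float]) -> str:
    """Return faults lying entirely inside bbox = (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    rows = db.execute(
        "SELECT name, lon1, lat1, lon2, lat2 FROM fault "
        "WHERE MIN(lon1, lon2) >= ? AND MAX(lon1, lon2) <= ? "
        "AND MIN(lat1, lat2) >= ? AND MAX(lat1, lat2) <= ?",
        (min_lon, max_lon, min_lat, max_lat),
    ).fetchall()
    # Wrap each database row as one gml:featureMember.
    members = [
        "<gml:featureMember><fault>"
        f"<name>{escape(name)}</name>"
        "<gml:lineStringProperty><gml:LineString>"
        f"<gml:coordinates>{lon1},{lat1} {lon2},{lat2}</gml:coordinates>"
        "</gml:LineString></gml:lineStringProperty>"
        "</fault></gml:featureMember>"
        for name, lon1, lat1, lon2, lat2 in rows
    ]
    return "<wfs:FeatureCollection>" + "".join(members) + "</wfs:FeatureCollection>"


collection = get_feature(make_feature_db(), (-119.0, 34.0, -118.0, 35.0))
print(collection)
```

Serving several feature types, as in the figure, would amount to one such table (or database) per type behind the same request interface.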


Schematic Interactions Between GIS Services

[Figure: clients, a WMS, a central IS block and three WFS instances serving California fault data (@complexity), California boundary data (@gf1) and California river data (@gf1)]


Defining IS

The central IS block in the preceding diagram represents nebulous “information services.”

Information services are needed to bind together various GIS and other services.
• What are their URLs? How do you interact with them (WSDL)? What do they do (capabilities)?

The OGC defines information services, but they are specialized to GIS.
• Web Catalogue Service: state appears uncertain.
• Web Registry Service: a common mechanism to classify, register, describe, search, maintain and access information about OGC Web resources.

But if they adopt Web Service standards, they get Web Service information system solutions for free.
• IS is a more general problem than just GIS.


Universal Description, Discovery and Integration

UDDI is the standard for building service registries and for describing their contents.
• UDDI is part of the WS-I core: http://www.ws-i.org/

But no one seems to like it…

Centralized solution
• Single point of failure

Poor discovery model
• No uniform way of querying about services, service interfaces and classifications.
• Limited query capabilities: search for services is restricted to WS name and its classification.

Stale data in registries
• Out-of-date service documents in UDDI registries.
• Need a leasing system.
• Registry entries need to be dynamically updated.
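The leasing idea can be sketched as a registry whose entries expire unless the service renews them. This is a toy in-memory class, not the UDDI API; the class name, method names and the injectable clock are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class LeasedRegistry:
    """Toy registry whose entries lapse unless renewed (a sketch, not UDDI)."""

    def __init__(self, lease_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._lease_seconds = lease_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (url, expiry time)

    def register(self, name: str, url: str) -> None:
        """Publish or renew an entry; calling this periodically acts as a heart-beat."""
        self._entries[name] = (url, self._clock() + self._lease_seconds)

    def lookup(self, name: str) -> Optional[str]:
        """Return the service URL, or None if unknown or its lease has lapsed."""
        self._purge()
        entry = self._entries.get(name)
        return entry[0] if entry else None

    def _purge(self) -> None:
        """Drop every entry whose lease has expired."""
        now = self._clock()
        expired = [n for n, (_, expiry) in self._entries.items() if expiry <= now]
        for n in expired:
            del self._entries[n]


# Usage with a hand-driven clock so the behavior is easy to follow.
fake_now = [0.0]
registry = LeasedRegistry(lease_seconds=10.0, clock=lambda: fake_now[0])
registry.register("wfs-faults", "http://example.org/wfs")
fake_now[0] = 12.0
print(registry.lookup("wfs-faults"))   # lease lapsed, so the stale entry is gone
```

A service that stops renewing simply disappears from lookups, which addresses the stale-entry problem without anyone having to delete records by hand.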


UDDI Has Other Problems

Many Web Services need to maintain the concept of state between themselves during complicated interactions.

For example, for better performance, I may wish to cache maps in a Web Map Server instead of reconstructing them via calls to a Web Feature Service every time.

This is basically a glorified HTTP cookie problem.

We need a way to store this kind of volatile session state data in a lightweight store.
• UDDI == heavyweight.

So IS must support both registries and contexts.


GIS Service Registries

The functional capabilities of a GIS service are defined in a “capabilities.xml” file.

An information service can gather metadata about the functional requirements of a GIS service
• By processing the capabilities file in an automated fashion when a service is registered.
• By having the service provider declare these capabilities when publishing a service.
• The Information System API introduces a library for XML Schema processing of different capability files.

UDDI with the geospatial focus of GIS services
• Data layers (features) of a GIS service may have varying geospatial coverage.
• UDDI registries do not natively support spatial queries.
• We use existing geographic taxonomies, such as the QuadCode taxonomy, to associate service descriptions with spatial coverage.
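One way a taxonomy code can stand in for a spatial query is to label each service with a hierarchical cell key and compare keys as strings. The sketch below uses a generic quadtree subdivision of longitude/latitude space; it is an assumption-laden illustration and does not claim to reproduce the QuadCode taxonomy itself. The function names and the example bounding boxes are hypothetical.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def quad_key(bbox: BBox, max_depth: int = 12) -> str:
    """Return the key of the smallest quadtree cell that fully contains bbox."""
    min_lon, min_lat, max_lon, max_lat = bbox
    west, south, east, north = -180.0, -90.0, 180.0, 90.0
    key = ""
    for _ in range(max_depth):
        mid_lon = (west + east) / 2.0
        mid_lat = (south + north) / 2.0
        if max_lon <= mid_lon:
            col = 0
        elif min_lon >= mid_lon:
            col = 1
        else:
            break  # bbox straddles the vertical split; stop refining
        if max_lat <= mid_lat:
            row = 0
        elif min_lat >= mid_lat:
            row = 1
        else:
            break  # bbox straddles the horizontal split; stop refining
        key += str(2 * row + col)
        west, east = (west, mid_lon) if col == 0 else (mid_lon, east)
        south, north = (south, mid_lat) if row == 0 else (mid_lat, north)
    return key


def may_overlap(key_a: str, key_b: str) -> bool:
    """Two coverages can only overlap if one cell key is a prefix of the other."""
    return key_a.startswith(key_b) or key_b.startswith(key_a)


california = quad_key((-124.5, 32.5, -114.0, 42.0))          # rough state extent
northridge = quad_key((-118.72, 34.176, -118.591, 34.243))   # extent of the sample fault
europe = quad_key((5.0, 45.0, 15.0, 55.0))                   # made-up unrelated region

print(california, northridge, europe)
print(may_overlap(california, northridge), may_overlap(california, europe))
```

Because the key is just a string, it can be stored as an ordinary classification value in a registry that has no spatial index, and a prefix comparison then acts as a coarse, conservative spatial filter.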


WS-Context: Session State Service

Repository of context information

Allows for
• Sharing context info
    Info related to a particular transaction in multiple Web Service interactions
• Sharing data
    Data in multiple Web Service interactions

Simply put, it’s a distributed variation of shared memory.

See http://www.arjuna.com/library/specs/ws_caf_1-0/WS-CTX.pdf
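The "shared memory" reading can be sketched as a small keyed store that several service calls consult within one session. The ContextStore class below is a toy in-memory stand-in, not the WS-Context interface, and get_map is a hypothetical map-service call that reuses a cached map instead of asking the feature service again.

```python
import uuid
from typing import Any, Callable, Dict


class ContextStore:
    """Toy in-memory session store (a sketch, not the WS-Context specification)."""

    def __init__(self) -> None:
        self._contexts: Dict[str, Dict[str, Any]] = {}

    def begin(self) -> str:
        """Start a session and return its context identifier."""
        context_id = uuid.uuid4().hex
        self._contexts[context_id] = {}
        return context_id

    def set(self, context_id: str, key: str, value: Any) -> None:
        self._contexts[context_id][key] = value

    def get(self, context_id: str, key: str) -> Any:
        return self._contexts[context_id].get(key)

    def end(self, context_id: str) -> None:
        """Discard the volatile session state."""
        self._contexts.pop(context_id, None)


def get_map(
    store: ContextStore,
    context_id: str,
    layer: str,
    fetch_features: Callable[[str], str],
) -> str:
    """Hypothetical map-service call: reuse the session's cached map if there is one."""
    cached = store.get(context_id, layer)
    if cached is not None:
        return cached
    rendered = "map(" + fetch_features(layer) + ")"  # stand-in for real rendering
    store.set(context_id, layer, rendered)
    return rendered


# Usage: the second request in the same session does not hit the feature service.
calls = []
store = ContextStore()
session = store.begin()
fetch = lambda layer: (calls.append(layer), layer.upper())[1]
first = get_map(store, session, "faults", fetch)
second = get_map(store, session, "faults", fetch)
print(first, second, len(calls))
store.end(session)
```

The state is deliberately volatile: ending the session throws it away, which is the contrast with the long-lived, heavyweight entries kept in a registry.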


[Figure: an Information Service with a WSDL interface, reached over HTTP(S) and SOAP by WMS clients, in front of a UDDI Replica Group (UDDI Registry Service replicas I, II and III) and a WS-Context Replica Group (WS-Context Service replicas I and II); each replica has a WSDL interface and a DB accessed through JDBC]

An Information Service with both WS-Registry and WS-Context capability


GIS FTHPIS Implementation Status (M. S. Aktas)

UDDI v.3E implementation metadata extension [completed]

Processing geographic taxonomies to enable UDDI to support spatial queries [completed]

WSDL interface to UDDI v.3 [completed]

WSDL interface to WS-Context 1.0

Monitoring scheme
• Leasing [completed]
• Heart-beat

WS-Discovery implementation metadata extension [completed]

WSDL interface to Information Service

Message dissemination via Soap Handler Environment

Caching mechanism

Replication mechanism


Concluding Remarks

High Performance Computing will be increasingly data driven.

High volumes of observational data will push many applications into the realms of HPC.

There must be an overarching architecture to integrate data sources, HPC applications, visualization applications and users.
• Web Service architectures provide this.
• Use them to build Grid libraries.

Large amounts of data are related to the earth’s surface.

GIS data and service standards need to be integrated into HPC applications.