e-Science, e-Business, e-Government and their Technologies
Introduction
Bryan Carpenter, Geoffrey Fox, Marlon Pierce
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47404
January 12 2004
[email protected] [email protected] [email protected]
http://www.grid2004.org/spring2004


Class Structure
Grading based on a mixture of homework and a single final project
• Up to 2 students can collaborate together on the final project
Homework 70%, Final Project 30% of grade; NO midterm or final
Homework will mainly be programming based, but there may be reports, either in the final project or in one or two homework assignments
• Reports will use Internet, book and “Gap Analysis”


What are we doing?
This is a semester-long course on Grids (viewed as technologies and infrastructure) and their application, mainly to science but also to business and government
We will assume a basic knowledge of the Java language and then interweave 6 topic areas; the first four cover technologies that will be used by students
1) Advanced Java: including networking, Java Server Pages and perhaps servlets
2) XML: Specification, Tools, Linkage to Java
3) Web Services: Basic Ideas, WSDL, Axis and Tomcat
4) Grid Systems: GT3/Cogkit, Gateway, XSOAP, Portlet
5) Advanced Technology Discussions: CORBA as history, OGSA-DAI, security, Semantic Grid, Workflow
6) Applications: Bioinformatics, Particle Physics, Engineering, Crises, Computing-on-demand Grid, Earth Science


Course Topics 1 and 2: Background/Core
Advanced Java Programming
• We will assume basic Java programming proficiency
• We will cover Java client/server, three-tiered and network programming
• Ancillary but interesting Java topics to be covered include Apache Ant, XML-Beans, and Java Message Service
XML and XML Schema
• We will provide introductory material
• Necessary to understand Web Service standards
• Examples include RDF (semantic web) and SOAP (Web services)
XML Tools
• XML Databases (Xindice, Sleepycat)
• Search: XPath, XQuery


Course Topics 3 and 4: Web and Grid Services
Overview Material
• Grid and Web Service Architectures
Basic Web Service Standards
• WSDL, SOAP: structure and definitions
• Building services in Java: Apache Axis
Advanced Web Services: Emerging capabilities
• WS-ReliableMessaging, WS-Security, WS-Transaction
Computational Grids
• Globus Toolkit 2
• Java CoG Kit for Globus programming
Grids Meet Web Services
• Open Grid Service Architecture/Infrastructure
• Implementations: GSX from Indiana University
The Semantic Grid: Information Models for Describing Resources
• RDF, DAML+OIL, and OWL


Grid Computing: Making The Global Infrastructure a Reality
Based on work done in preparing the book edited with Fran Berman and Anthony J.G. Hey
ISBN: 0-470-85319-0, Hardcover, 1080 Pages, Published March 2003
http://www.grid2002.org


Other
See the webcast in an Oracle technology series
http://webevents.broadcast.com/techtarget/Oracle/100303/index.asp?loc=10
See also the “Gap Analysis” http://grids.ucs.indiana.edu/ptliupages/publications/GapAnalysis30June03v2.pdf
• We can send you nicely printed versions of this
• The end of this report has a good collection of references, and it gives both a general survey of current Grids and specific examples from the UK
The Appendix with more details is: http://grids.ucs.indiana.edu/ptliupages/publications/Appendix30June03.pdf
See also GlobusWorld http://www.globusworld.org/ and the Grid Forum http://www.gridforum.org


e-moreorlessanything and the Grid
e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
• The growing use of outsourcing is one example

e-Science is the similar vision for scientific research with international participation in large accelerators, satellites or distributed gene analyses.

The Grid integrates the best of the Web, traditional enterprise software, high performance computing and Peer-to-peer systems to provide the information technology e-infrastructure for e-moreorlessanything.

A deluge of data of unprecedented and inevitable size must be managed and understood.

People, computers, data and instruments must be linked.
On demand assignment of experts, computers, networks and storage resources must be supported.


So what is a Grid?
Supporting human decision making with a network of at least four large computers, perhaps six or eight small computers, and a great assortment of disc files and magnetic tape units - not to mention remote consoles and teletype stations - all churning away. (Licklider 1960)

Coordinated resource sharing and problem solving in dynamic multi-institutional virtual organizations

Infrastructure that will provide us with the ability to dynamically link together resources as an ensemble to support the execution of large-scale, resource-intensive, and distributed applications.

Realizing the thirty-year dream of science fiction writers who have spun yarns featuring worldwide networks of interconnected computers that behave as a single entity.


What is a High Performance Computer?
We might wish to consider three classes of multi-node computers:
1) Classic MPP with microsecond latency and scalable internode bandwidth (tcomm/tcalc ~ 10 or so)
2) Classic Cluster, which can vary from configurations like 1) to 3) but typically has millisecond latency and modest bandwidth
3) Classic Grid or distributed systems of computers around the network
• Latencies of inter-node communication are 100’s of milliseconds, but bandwidth can be good
All have the same peak CPU performance, but synchronization costs increase as one goes from 1) to 3)
Cost of system (dollars per gigaflop) decreases by factors of 2 at each step from 1) to 2) to 3)
One should NOT use a classic MPP if class 2) or 3) suffices, unless some security or data issue dominates over cost-performance
One should not use a Grid as a true parallel computer; it can link parallel computers together for convenient access etc.
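The latency argument above can be made concrete with a simple latency-plus-bandwidth model of a single message. This Java sketch is illustrative only. The latencies follow the slide's orders of magnitude (microseconds, milliseconds, hundreds of milliseconds). The bandwidth figures and the names `MachineClasses`, `messageTime` and `relativeCost` are assumptions, not measurements. For a small 8-byte synchronization message, the Grid comes out roughly 100,000 times slower than the MPP. That is why a Grid is better used to link parallel computers than to act as one.

```java
/** A latency-plus-bandwidth sketch of the three machine classes (assumed figures). */
final class MachineClasses {
    // Latencies follow the slide's orders of magnitude; the bandwidths are assumptions.
    enum Kind {
        MPP(1e-6, 1e9),      // ~1 microsecond latency, assumed 1 GB/s
        CLUSTER(1e-3, 1e8),  // ~1 millisecond latency, assumed 100 MB/s
        GRID(1e-1, 1e8);     // ~100 milliseconds latency, assumed 100 MB/s ("good bandwidth")

        final double latencySec;
        final double bytesPerSec;

        Kind(double latencySec, double bytesPerSec) {
            this.latencySec = latencySec;
            this.bytesPerSec = bytesPerSec;
        }
    }

    /** Time for one message: fixed latency plus size divided by bandwidth. */
    static double messageTime(Kind k, double bytes) {
        return k.latencySec + bytes / k.bytesPerSec;
    }

    /** Relative dollars per gigaflop, halving at each step MPP -> CLUSTER -> GRID. */
    static double relativeCost(Kind k) {
        return Math.pow(0.5, k.ordinal());
    }

    public static void main(String[] args) {
        for (Kind k : Kind.values()) {
            System.out.printf("%-8s 8-byte message: %.6f s, relative $/gigaflop: %.2f%n",
                k, messageTime(k, 8), relativeCost(k));
        }
    }
}
```

The model deliberately ignores contention and software overheads. It only shows why latency, not bandwidth, dominates the fine-grained synchronization that true parallel computing needs.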


e-Science
e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it. This is a major UK Program
e-Science reflects the growing importance of international laboratories, satellites and sensors and their integrated analysis by distributed teams
CyberInfrastructure is the analogous US initiative


[Figure: imaging instruments, computational resources, large-scale databases, data acquisition and analysis, and advanced visualization]

Grid Technology supports e-Science and CyberInfrastructure. It is software (middleware) built on top of networks


Global Terabit Research Network

The Grid software and resources run on top of high performance global networks

USA Network

Terabit Networks
Network performance will increase faster than Moore’s law, partly because optical fiber has almost unlimited bandwidth and partly because there are many old networks to be replaced
Home dial-ups (56 kbit/sec), DSL/Cable Modem (2 megabits/sec), FTTP (Fiber to the Premise at gigabit performance)
2006 Goal of Global Terabit Research Network: International : National Backbone : Organization : Optical Desktop : Copper Desktop is 1000 : 1000 : 100 : 10 : 1 Gigabit/sec
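As a worked example of the tier ratio, this sketch computes the ideal time to move a data set at each tier of the 2006 goal. It assumes a hypothetical 1 GB (10^9 byte) payload and ignores latency and protocol overhead. The class and method names are illustrative. At 1000 Gigabit/sec the transfer takes 0.008 s, and at the 1 Gigabit/sec copper desktop it takes 8 s. Over a 56 kbit dial-up it takes about 40 hours.

```java
/** Ideal transfer times at each tier of the 2006 Global Terabit Research Network goal. */
final class TerabitTiers {
    // Tier bandwidths in Gigabit/sec, from the ratio 1000:1000:100:10:1
    static final String[] TIERS = {
        "International", "National Backbone", "Organization", "Optical Desktop", "Copper Desktop"
    };
    static final double[] GBIT_PER_SEC = {1000, 1000, 100, 10, 1};
    static final double DIAL_UP_GBIT_PER_SEC = 56e-6;  // 56 kbit/sec home dial-up, for contrast

    /** Ideal transfer time in seconds, ignoring latency and protocol overhead. */
    static double seconds(double gigabytes, double gbitPerSec) {
        return gigabytes * 8 / gbitPerSec;  // 8 bits per byte
    }

    public static void main(String[] args) {
        double payloadGB = 1.0;  // hypothetical 1 GB data set
        for (int i = 0; i < TIERS.length; i++) {
            System.out.printf("%-18s %10.3f s%n", TIERS[i], seconds(payloadGB, GBIT_PER_SEC[i]));
        }
        System.out.printf("%-18s %10.1f h%n", "Dial-up (56 kbit)",
            seconds(payloadGB, DIAL_UP_GBIT_PER_SEC) / 3600);
    }
}
```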


e-Business and (Virtual) Organizations
Enterprise Grid supports the information system of an organization; includes “university computer center”, “(digital) library”, sales, marketing, manufacturing …
Outsourcing Grid links different parts of an enterprise together (Gridsourcing)
• Manufacturing plants with designers
• Animators with electronic game or film designers and producers
• Coaches with aspiring players (e-NCAA or e-NFL etc.)

Customer Grid links businesses and their customers as in many web sites such as amazon.com

e-Multimedia can use secure peer-to-peer Grids to link creators, distributors and consumers of digital music, games and films respecting rights

Distance education Grid links teacher at one place, students all over the place, mentors and graders; shared curriculum, homework, live classes …


e-Defense and e-Crisis
Grids support Command and Control and provide Global Situational Awareness
• Link commanders and frontline troops to each other and to archival and real-time data; link to what-if simulations
• Dynamic heterogeneous wired and wireless networks
• Security and fault tolerance essential
System of Systems; Grid of Grids
• The command and information infrastructure of each ship is a Grid; each fleet is linked together by a Grid; the President is informed by and informs the national defense Grid
• Grids must be heterogeneous and federated
Crisis Management and Response enabled by a Grid linking sensors, disaster managers, and first responders with decision support


Classes of Computing Grid Applications
Running “Pleasing Parallel Jobs” as in United Devices, Entropia (Desktop Grid) “cycle stealing systems”
Can be managed (“inside” the enterprise as in Condor) or more informal (as in SETI@Home)
Computing-on-demand in Industry, where jobs spawned are perhaps very large (SAP, Oracle …)
Support distributed file systems as in Legion (Avaki), Globus with (web-enhanced) UNIX programming paradigm
• Particle Physics will run some 30,000 simultaneous jobs this way
Pipelined applications linking data/instruments, compute, visualization
Seamless Access, where Grid portals allow one to choose one of multiple resources with a common interface


Utility Computing
An important business application of Grids is utility computing
Namely, support a pool of computers to be assigned as needed to take up extra demand
• Pool shared between multiple applications
One can say this application is common in academia, where different simulations share resources, while in industry we have
• Web Servers
• Financial Modeling
• Data-mining
• Simulation response to crisis like forest fire or earthquake
Architecture is a “Farm of Grid Services” connected to the Internet, not a cluster of computers connected to each other


Resources-on-demand
Computing-on-demand uses a dynamically assigned (shared) pool of resources to support excess demand in a flexible, cost-effective fashion

[Figure: Static Assignment with redundancy: Programs A to Z run on Computers 1 to 26, with Computers 27 to 52 as Spares. Dynamic on-demand Assignment: Programs A to Z share Pool Computers 1 to N, with N < 52.]
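This is a minimal sketch of dynamic on-demand assignment. It assumes programs simply borrow computers from a shared pool and return them when their demand drops. The pool size of 30 and all names are hypothetical; the slide only says N < 52. Static assignment with redundancy needs one primary and one spare per program (2 × 26 = 52 computers). The shared pool only has to cover the peak combined demand.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** A hypothetical shared pool of computers assigned to programs on demand. */
final class OnDemandPool {
    private final Deque<Integer> free = new ArrayDeque<>();        // idle computer ids
    private final Map<Integer, String> assigned = new HashMap<>(); // computer id -> program

    OnDemandPool(int size) {
        for (int id = 1; id <= size; id++) free.add(id);
    }

    /** Assigns a free computer to the program, or returns -1 if the pool is exhausted. */
    int acquire(String program) {
        Integer id = free.poll();
        if (id == null) return -1;
        assigned.put(id, program);
        return id;
    }

    /** Returns a computer to the shared pool once the program no longer needs it. */
    void release(int id) {
        if (assigned.remove(id) != null) free.add(id);
    }

    int freeCount() {
        return free.size();
    }

    /** Static assignment with redundancy: one primary plus one dedicated spare per program. */
    static int staticComputers(int programs) {
        return 2 * programs;
    }

    public static void main(String[] args) {
        OnDemandPool pool = new OnDemandPool(30);  // hypothetical pool size N = 30
        int a = pool.acquire("Program A");
        pool.acquire("Program Z");
        pool.release(a);                           // Program A's extra demand has passed
        System.out.println("static: " + staticComputers(26) + " computers, pool free: " + pool.freeCount());
    }
}
```

Here `acquire` returns -1 when the pool is exhausted. A real scheduler would more likely queue the request or fall back to another resource.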


Some Important Styles of Grids
Computational Grids were the origin of the concepts and link computers across the globe; high latency stops this from being used as a parallel machine
Knowledge and Information Grids link sensors and information repositories as in Virtual Observatories or BioInformatics
• More detail on next slide
Education Grids link teachers, learners, parents as a VO with learning tools, distant lectures etc.
e-Science Grids link multidisciplinary researchers across laboratories and universities
Community Grids focus on Grids involving large numbers of peers rather than on linking major resources; they link Grid and Peer-to-peer network concepts
Semantic Grid links Grid and AI community with Semantic web (ontology/meta-data enriched resources) and Agent concepts


Information/Knowledge Grids
Distributed (10’s to 1000’s of) data sources (instruments, file systems, curated databases …)
Data Deluge: 1 (now) to 100’s of petabytes/year (2012)
• Moore’s law for Sensors
Possible filters assigned dynamically (on-demand)
• Run image processing algorithm on telescope image
• Run Gene sequencing algorithm on compiled data

Needs decision support front end with “what-if” simulations

Metadata (provenance) critical to annotate data

Integrate across experiments as in multi-wavelength astronomy

Data Deluge comes from pixels/year available

2.4 Petabytes Today


SERVOGrid for e-Geoscience
[Figure: Databases, Repositories/Federated Databases, Sensor Nets and Streaming Data, Loosely Coupled Filters, Closely Coupled Compute Nodes, Analysis and Visualization, and Discovery Services]

SERVOGrid – Solid Earth Research Virtual Observatory will link Australia, Japan, USA ……


SERVOGrid Requirements
Seamless Access to Data repositories and large scale computers
Integration of multiple data sources including sensors, databases, file systems with analysis system
• Including filtered OGSA-DAI (Grid database access)
Rich meta-data generation and access with SERVOGrid-specific Schema extending openGIS (Geography as a Web service) standards and using Semantic Grid
Portals with component model for user interfaces and web control of all capabilities
Collaboration to support world-wide work
Basic Grid tools: workflow and notification


DAME: Distributed Aircraft Maintenance Environment (Rolls Royce and UK e-Science Program)
[Figure: in-flight data, Ground Station, Global Network such as SITA, Engine Health (Data) Center, Airline, Maintenance Centre, and Internet, e-mail, pager links]
~ Gigabyte per aircraft per engine per transatlantic flight
~5000 engines


NASA Aerospace Engineering Grid
[Figure: Engine, Airframe, Wing, Landing Gear, Stabilizer and Human Models, with the capabilities they represent: lift, drag, deflection, thrust and reverse thrust performance, fuel consumption, braking performance, steering, traction, dampening, responsiveness, and crew capabilities (accuracy, perception, stamina, reaction times, SOPs)]
Whole system simulations are produced by coupling all of the sub-system simulations

It takes a distributed virtual organization to design, simulate and build a complex system like an aircraft


Virtual Observatory Astronomy Grid: Integrate Experiments
[Figure: sky images in Radio, Far-Infrared, Visible and Visible + X-ray, with a Dust Map and a Galaxy Density Map]


e-Chemistry Laboratory: Experiments-on-demand
[Figure: X-Ray e-Lab (Diffractometer), Properties e-Lab, Analysis, Properties, Simulation, Video, Structures Database, Globus, Grid Resources, and Grid-enabled Output Streams]


CERN LHC Data Analysis Grid


Typical Grid Architecture
[Figure: Raw (HPC) Resources and Database under Middleware; a “Core” Grid of System Services; Application Service, User Services and Portal Services above it]
Each Blob is a Computer Program!


Sources of Grid Technology
Grids support distributed collaboratories or virtual organizations integrating concepts from
• The Web
• Agents
• Distributed Objects (CORBA, Java/Jini, COM)
• Globus, Legion, Condor, NetSolve, Ninf and other High Performance Computing activities
• Peer-to-peer Networks
With perhaps the Web and P2P networks being the most important for “Information Grids” and Globus for “Compute Grids”


The Essence of Grid Technology?
We will start from the Web view and assert that the basic paradigm is
• Meta-data rich Web Services communicating via messages
These have some basic support from some runtime such as .NET, Jini (pure Java), Apache Tomcat+Axis (Web Service toolkit), Enterprise JavaBeans, WebSphere (IBM) or GT3 (Globus Toolkit 3)
• These are the distributed equivalent of operating system functions as in UNIX Shell
• Called Hosting Environment or platform
W3C standard WSDL defines the IDL (Interface Definition Language) for Web Services
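To show WSDL acting as an interface definition, this sketch reads the operations declared in a WSDL 1.1 `portType` using the JDK's standard DOM parser. The WSDL fragment is hypothetical and heavily trimmed. A real `operation` also declares `input` and `output` messages, and a `binding` maps it to SOAP. Only the namespace `http://schemas.xmlsoap.org/wsdl/` is the real WSDL 1.1 one; the class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Lists the operations a WSDL 1.1 document declares in its portTypes. */
final class WsdlOperations {
    static final String WSDL_NS = "http://schemas.xmlsoap.org/wsdl/";

    // Hypothetical, trimmed WSDL: one portType with two operations.
    static final String SAMPLE =
        "<definitions xmlns='http://schemas.xmlsoap.org/wsdl/' name='Catalog'>"
      + "  <portType name='CatalogPortType'>"
      + "    <operation name='lookupItem'/>"
      + "    <operation name='listItems'/>"
      + "  </portType>"
      + "</definitions>";

    /** Returns "portType.operation" for every operation declared inside a portType. */
    static List<String> operations(String wsdl) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);  // WSDL elements live in the WSDL namespace
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(wsdl.getBytes(StandardCharsets.UTF_8)));
            List<String> result = new ArrayList<>();
            NodeList portTypes = doc.getElementsByTagNameNS(WSDL_NS, "portType");
            for (int i = 0; i < portTypes.getLength(); i++) {
                Element portType = (Element) portTypes.item(i);
                NodeList ops = portType.getElementsByTagNameNS(WSDL_NS, "operation");
                for (int j = 0; j < ops.getLength(); j++) {
                    String op = ((Element) ops.item(j)).getAttribute("name");
                    result.add(portType.getAttribute("name") + "." + op);
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parsable WSDL document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(operations(SAMPLE)); // [CatalogPortType.lookupItem, CatalogPortType.listItems]
    }
}
```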


Meta-data
Meta-data is usually thought of as “data about data”
The Semantic Web is at its simplest considered as adding meta-data to web pages
• For example, the hospital web page has meta-data telling you its location, phone number and specialties, which can be used to automate Google-style searches to allow planning of disease/accident treatment from the web
Modern trend (Semantic Grid) is meta-data about web services, e.g. specifying details of interface and usage
• Such as that a bioinformatics service is free, or that the bandwidth of its input is limited
Provenance (history and ownership) of data is very important
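Here is a toy version of the bioinformatics example. Each service carries meta-data about its usage terms, and a client selects services by querying that meta-data rather than by name. The `ServiceDescription` fields (`free`, `maxInputMBps`) and the service names are hypothetical. A Semantic Grid would express such properties in RDF or OWL rather than in a Java record.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Selecting services by their meta-data (hypothetical fields and names). */
final class ServiceMetadata {
    /** Hypothetical meta-data describing a web service's usage terms. */
    record ServiceDescription(String name, String domain, boolean free, double maxInputMBps) {}

    /** Names of services in the domain that are free and accept at least the needed input bandwidth. */
    static List<String> find(List<ServiceDescription> registry, String domain, double neededMBps) {
        return registry.stream()
            .filter(s -> s.domain().equals(domain))
            .filter(ServiceDescription::free)
            .filter(s -> s.maxInputMBps() >= neededMBps)
            .map(ServiceDescription::name)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ServiceDescription> registry = List.of(
            new ServiceDescription("BlastA", "bioinformatics", true, 10),
            new ServiceDescription("BlastB", "bioinformatics", false, 100),
            new ServiceDescription("AlignC", "bioinformatics", true, 1));
        System.out.println(find(registry, "bioinformatics", 5)); // [BlastA]
    }
}
```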


A typical Web Service
In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI Messages, CGI Web invocations, or totally compiled away (inlining)

The simplest implementations involve XML messages (SOAP) and programs written in net friendly languages like Java and Python

[Figure: Web Services with WSDL interfaces: Payment/Credit Card, Warehouse/Shipping control, Security, Catalog, Portal Service]
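The simplest case above, an XML (SOAP) message, can be sketched with the JDK's standard StAX writer (`javax.xml.stream`). The envelope namespace is the real SOAP 1.1 one. The service namespace `urn:example:catalog`, the operation `lookupItem` and its parameter `itemId` are hypothetical. Writing through an XML API rather than string concatenation guarantees that text content is escaped.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Builds a SOAP 1.1 request envelope with one operation element and one parameter. */
final class SoapMessage {
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"; // SOAP 1.1 namespace

    static String request(String serviceNs, String operation, String param, String value) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("soapenv", "Envelope", SOAP_ENV);
            w.writeNamespace("soapenv", SOAP_ENV);
            w.writeStartElement("soapenv", "Body", SOAP_ENV);
            w.writeStartElement("m", operation, serviceNs);  // the operation being invoked
            w.writeNamespace("m", serviceNs);
            w.writeStartElement("m", param, serviceNs);      // a single parameter
            w.writeCharacters(value);                         // escaped by the writer
            w.writeEndElement();  // parameter
            w.writeEndElement();  // operation
            w.writeEndElement();  // Body
            w.writeEndElement();  // Envelope
            w.flush();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write SOAP message", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(request("urn:example:catalog", "lookupItem", "itemId", "42"));
    }
}
```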


Services and Distributed Objects
A web service is a computer program running on either the local or remote machine with a set of well-defined interfaces (ports) specified in XML (WSDL)
Web Services (WS) have many similarities with Distributed Object (DO) technology, but there are some (important) technical and religious points (not easy to distinguish)
• CORBA, Java and COM are typical DO technologies
• Agents are typically SOA (Service Oriented Architecture)
Both involve distributed entities, but Web Services are more loosely coupled
• WS interact with messages; DO with RPC (Remote Procedure Call)
• DO have “factories”; WS manage instances internally, and interaction-specific state is not exposed and hence need not be managed
• DO have explicit state (stateful services); WS use context in the messages to link interactions (stateful interactions)
Claim: DOs do NOT scale; WS build on experience (with CORBA) and do scale
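The sketch below gives one way to picture the state contrast above; all names are hypothetical. In the distributed-object style, the client holds a reference to an object whose fields carry the state. In the service style, the handler keeps no per-client fields. Instead, each reply carries the context (here a conversation id and a running count) that links it to the next message.

```java
/** Contrasting where interaction state lives (hypothetical names). */
final class StateStyles {
    /** Distributed-object style: the object's fields hold the client's state. */
    static final class CounterObject {
        private int count;

        int increment() {
            return ++count;
        }
    }

    /** Service style: the message itself carries the context that links interactions. */
    record Message(String conversationId, int countSoFar) {}

    /** A stateless handler: the reply is computed only from the incoming message. */
    static Message handle(Message in) {
        return new Message(in.conversationId(), in.countSoFar() + 1);
    }

    public static void main(String[] args) {
        CounterObject obj = new CounterObject();
        obj.increment();
        System.out.println("object state: " + obj.increment());  // the count lives in the object

        Message reply = handle(handle(new Message("conv-1", 0)));
        System.out.println("message context: " + reply);        // the count travels in the message
    }
}
```

Because `handle` keeps nothing between calls, any replica of the service could process the next message. This is one reason message-based services are argued to scale better than stateful objects.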


Details of Web Service Protocol Stack
UDDI finds where programs are
• remote (distributed) programs are just Web Services
• (not clearly a great success)
WSFL links programs together (under revision as BPEL)
WSDL defines interface (methods, parameters, data formats)
SOAP defines structure of message including serialization of information
HTTP is negotiation/transport protocol
TCP/IP is layers 3-4 of OSI
Physical Network is layer 1 of OSI

[Figure: protocol stack, top to bottom: UDDI or WSIL; WSFL; WSDL; SOAP or RMI; HTTP or SMTP or IIOP or RMTP; TCP/IP; Physical Network]
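Moving one layer down the stack, a SOAP 1.1 message is commonly carried as the body of an HTTP POST. This sketch frames an envelope as HTTP/1.1 request text. The `Content-Type: text/xml` and `SOAPAction` headers are those used by the SOAP 1.1 HTTP binding. The host, path and action values are hypothetical. The sketch does not open a connection; TCP/IP would carry these bytes.

```java
import java.nio.charset.StandardCharsets;

/** Frames a SOAP 1.1 envelope as the text of an HTTP/1.1 POST request. */
final class SoapOverHttp {
    static String frame(String host, String path, String soapAction, String envelope) {
        // Content-Length counts bytes on the wire, not Java chars
        int length = envelope.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1\r\n"
            + "Host: " + host + "\r\n"
            + "Content-Type: text/xml; charset=utf-8\r\n"
            + "SOAPAction: \"" + soapAction + "\"\r\n"
            + "Content-Length: " + length + "\r\n"
            + "\r\n"  // blank line separates headers from the SOAP body
            + envelope;
    }

    public static void main(String[] args) {
        String envelope = "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\">"
            + "<soapenv:Body/></soapenv:Envelope>";
        System.out.print(frame("grid.example.org", "/services/Catalog", "urn:example:lookupItem", envelope));
    }
}
```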


Classic Grid Architecture
[Figure: Clients (Users and Devices), a Middle Tier of Brokers and Service Providers, and Resources; components shown include Databases, Netsolve, Computing, Security, Collaboration, Composition and Content Access]

Middle Tier becomes Web Services


Grid Services for the Education Process
“Learning Object” XML standards already exist
Registration
Performance (grading)
Authoring of Curriculum
Online laboratories for real and virtual instruments
Homework submission
Quizzes of various types (multiple choice, random parameters)
Assessment data access and analysis
Synchronous Delivery of Curricula including Audio/Video Conferencing and other synchronous collaborative tools as Web Services
Scheduling of courses and mentoring sessions
Asynchronous access, data-mining and knowledge discovery
Learning Plan agents to guide students and teachers


Grid Learning Model
Education and Research Grids share some services, both for content and “process”
• For example, collaboration services are largely identical
• Research will use much larger simulation engines to get high resolution results
• Maybe a researcher uses a CAVE to visualize; education a Macintosh
Both can share data services, but run them through different filters to select for precision (research) or pedagogical value (education)
Education has a “digital textbook” front end to resources of the research Grid
Both use the same workflow technologies to link services together


SERVOGrid for e-Education
[Figure: Databases, Repositories/Federated Databases, Field Trip Data, Sensors and Streaming Data, Loosely Coupled Filters, Coarse grain simulations, Analysis and Visualization, and Discovery Services]


Some Observations
“Traditional” Grids manage and share asynchronous resources in a rather centralized fashion
Peer-to-peer networks are “just like” Grids, with different implementations of message-based services like registration and look-up
Collaboration systems like WebEx/Placeware (Application sharing) or Polycom (audio/video conferencing) can be viewed as Grids
Computers are fast and getting faster. One can afford many strategies that used to be unrealistic, including rich, usually XML-based, messaging

Web Services interact with messages

• Everything (including applications like PowerPoint) will be a Web Service?

• Grids, P2P Networks, Collaborative Environments are (will be) managed message-linked Web Services


Peer to Peer Grid
[Figure: Peer to Peer Grid, “a democratic organization”: Peers and Databases linked by Event/Message Brokers through User Facing and Service Facing Web Service Interfaces]
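This is a minimal, in-process sketch of the publish/subscribe idea behind the event/message brokers. Peers subscribe to a topic, and the broker delivers each published message to every subscriber. Real brokers are distributed across the network and deliver asynchronously. This version is synchronous, and the topic name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** A toy, synchronous topic-based message broker. */
final class MessageBroker {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    /** Registers a peer's handler for messages published on a topic. */
    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    /** Delivers the message to every subscriber of the topic; returns how many received it. */
    int publish(String topic, String message) {
        List<Consumer<String>> handlers = subscribers.getOrDefault(topic, List.of());
        handlers.forEach(h -> h.accept(message));
        return handlers.size();
    }

    public static void main(String[] args) {
        MessageBroker broker = new MessageBroker();
        List<String> inboxA = new ArrayList<>();
        List<String> inboxB = new ArrayList<>();
        broker.subscribe("sensor/quake", inboxA::add);  // two peers interested in the same events
        broker.subscribe("sensor/quake", inboxB::add);
        int delivered = broker.publish("sensor/quake", "magnitude=4.1");
        System.out.println(delivered + " " + inboxA + " " + inboxB);
    }
}
```

Publishers and subscribers never reference each other directly, only the topic. That decoupling is what lets the same broker serve both user-facing and service-facing interfaces.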


System and Application Services?
There are generic Grid system services: security, collaboration, persistent storage, universal access
• OGSA (Open Grid Service Architecture) is implementing these as extended Web Services
An Application Web Service is a capability used either by another service or by a user
• It has input and output ports; data is from sensors or other services
Consider Satellite-based Sensor Operations as a Web Service
• Satellite management (with a web front end)
• Each tracking station is a service
• Image Processing is a pipeline of filters, which can be grouped into different services
• Data storage is an important system service
• Big services built hierarchically from “basic” services
Portals are the user (web browser) interfaces to Web services
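The image-processing pipeline above, and the idea of building big services hierarchically from basic ones, can be sketched with function composition. Each filter maps an array of pixel values to a new array. A composed pipeline is itself a filter, so it can be reused as one stage of a larger pipeline. The three filters are hypothetical stand-ins. In a Grid, each would be a separate Web Service connected by messages rather than a local method call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/** Filters composed into pipelines; a pipeline is itself a filter, so pipelines nest. */
final class FilterPipeline {
    /** Applies the filters left to right and returns the combined filter. */
    static UnaryOperator<double[]> pipeline(List<UnaryOperator<double[]>> filters) {
        return data -> {
            double[] current = data;
            for (UnaryOperator<double[]> f : filters) current = f.apply(current);
            return current;
        };
    }

    // Hypothetical image-processing filters on a row of pixel values.
    static final UnaryOperator<double[]> SUBTRACT_BACKGROUND =
        px -> Arrays.stream(px).map(v -> Math.max(0, v - 10)).toArray();
    static final UnaryOperator<double[]> NORMALIZE = px -> {
        double max = Arrays.stream(px).max().orElse(1);
        return max == 0 ? px.clone() : Arrays.stream(px).map(v -> v / max).toArray();
    };
    static final UnaryOperator<double[]> THRESHOLD =
        px -> Arrays.stream(px).map(v -> v >= 0.5 ? 1 : 0).toArray();

    public static void main(String[] args) {
        // Two "basic" filters grouped into one service, then reused as a single stage.
        UnaryOperator<double[]> clean = pipeline(List.of(SUBTRACT_BACKGROUND, NORMALIZE));
        UnaryOperator<double[]> detect = pipeline(List.of(clean, THRESHOLD));
        System.out.println(Arrays.toString(detect.apply(new double[] {10, 30, 50, 20})));
    }
}
```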


Satellite Science Grid Environment
[Figure: Sensor Data as a Web service (WS), Sensor Management WS, Data Analysis WS, Visualization WS and Simulation WS; one component is built as multiple Filter Web Services (Filter1 WS, Filter2 WS, Filter3 WS), another as multiple interdisciplinary Programs (Prog1 WS, Prog2 WS)]


What is Happening?
Grid ideas are being developed in (at least) three communities
• Web Service: W3C, OASIS
• Grid Forum (High Performance Computing, e-Science)
Service Standards are being debated
Grid Operational Infrastructure is being deployed
Grid Architecture and core software are being developed
Particular System Services are being developed “centrally”; OGSA is the framework for this
Lots of fields are setting domain-specific standards and building domain-specific services
There is a lot of hype
Grids are viewed differently in different areas

• Largely “computing-on-demand” in industry (IBM, Oracle, HP, Sun)

• Largely distributed collaboratories in academia


OGSA OGSI & Hosting Environments
Start with Web Services in a hosting environment
Add OGSI to get a Grid service and a component model
Add OGSA to get an Interoperable Grid, “correcting” differences in base platform and adding key functionalities
[Figure: OGSA Environment built over the Network: Hosting Environment for WS; OGSI on Web Services; broadly applicable services (registry, authorization, monitoring, data access, etc.); more specialized services (data replication, workflow, etc.); domain-specific services. Layers are labelled “Possibly OGSA”, “Not OGSA” and “Given to us from on high”.]


Technical Activities of Note
Look at different styles of Grids such as Autonomic (Robust Reliable Resilient)
New Grid architectures are hard due to the investment required
Critical Services Such as
• Security: build message based, not connection based
• Notification: event services
• Metadata: Use Semantic Web, provenance
• Databases and repositories: instruments, sensors
• Computing: Submit job, scheduling, distributed file systems
• Visualization, Computational Steering
• Fabric and Service Management
• Network performance
Program the Grid: Workflow
Access the Grid: Portals, Grid Computing Environments


Issues and Types of Grid Services
1) Types of Grid
• R3
• Lightweight
• P2P
• Federation and Interoperability
2) Core Infrastructure and Hosting Environment
• Service Management
• Component Model
• Service wrapper/Invocation
• Messaging
3) Security Services
• Certificate Authority
• Authentication
• Authorization
• Policy
4) Workflow Services and Programming Model
• Enactment Engines (Runtime)
• Languages and Programming
• Compiler
• Composition/Development
5) Notification Services
6) Metadata and Information Services
• Basic including Registry
• Semantically rich Services and meta-data
• Information Aggregation (events)
• Provenance
7) Information Grid Services
• OGSA-DAI/DAIT
• Integration with compute resources
• P2P and database models
8) Compute/File Grid Services
• Job Submission
• Job Planning, Scheduling, Management
• Access to Remote Files, Storage and Computers
• Replica (cache) Management
• Virtual Data
• Parallel Computing
9) Other services including
• Grid Shell
• Accounting
• Fabric Management
• Visualization, Data-mining and Computational Steering
• Collaboration
10) Portals and Problem Solving Environments
11) Network Services
• Performance
• Reservation
• Operations


Technology Components of (Services in) a Computing Grid
[Figure: Remote Grid Services and Data, with numbered components: 1: Job Management Service (Grid Service interface to user or program client); 2: Schedule and control execution; 1: Plan execution; 3: Access to remote computers; 4: Job submittal; 5: Data transfer; 6: File and storage access; 7: Cache data replicas; 8: Virtual data; 9: Grid MPI; 10: Job status.]
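A minimal sketch of the job-management part of this picture (job submittal, scheduling onto remote computers, job status), assuming a fixed number of compute slots and an in-memory table. `JobManager`, `JobStatus` and the `job-N` ID format are hypothetical, and planning, data transfer and file access are left out.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// States reported by the (hypothetical) job-status query.
enum JobStatus { PENDING, RUNNING, DONE }

final class JobManager {
    private final Map<String, JobStatus> states = new HashMap<>();
    private final Map<String, String> executables = new HashMap<>();
    private final Queue<String> queue = new ArrayDeque<>();
    private final int slots; // remote compute slots available (assumed fixed)
    private int running = 0;
    private int nextId = 1;

    JobManager(int slots) {
        this.slots = slots;
    }

    // Job submittal: record the job and queue it; the client gets back an ID.
    String submit(String executable) {
        String id = "job-" + nextId++;
        executables.put(id, executable);
        states.put(id, JobStatus.PENDING);
        queue.add(id);
        return id;
    }

    // One scheduling pass: start queued jobs, in submission order, while slots are free.
    void schedule() {
        while (running < slots && !queue.isEmpty()) {
            states.put(queue.poll(), JobStatus.RUNNING);
            running++;
        }
    }

    // Called when the remote computer reports that a job has finished.
    void complete(String id) {
        if (states.get(id) != JobStatus.RUNNING) {
            throw new IllegalStateException(id + " is not running");
        }
        states.put(id, JobStatus.DONE);
        running--;
    }

    JobStatus status(String id) {
        return states.get(id);
    }

    String executable(String id) {
        return executables.get(id);
    }
}
```

With one slot, a second job stays `PENDING` until the first completes and the next scheduling pass runs; the client only ever sees the job ID and its status.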


Approach
Build on e-Science methodology and Grid technology
Science applications with multi-scale models, scalable parallelism and data assimilation as key issues
• Data-driven models for earthquakes, climate, environment …
Use existing code/database technology (SQL/Fortran/C++) linked to “Application Web/OGSA services”
• XML specification of models, computational steering and scale supported at the “Web Service” level, as we don’t need “high performance” here
• Allows use of Semantic Grid technology

[Figure: Application WS around typical codes, with WS linking to the user and to other WS (data sources).]
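One way to read “XML specification of models” is that the application Web service receives a small XML document and turns it into the arguments of an existing code. The sketch below assumes a hypothetical `<model>`/`<parameter>` vocabulary and a hypothetical `--name=value` command-line convention for the wrapped code; it uses only the standard JAXP DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class ModelSpec {
    // Parses a hypothetical <model name="..."><parameter name="..." value="..."/></model>
    // document into ordered name/value pairs.
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> params = new LinkedHashMap<>();
            params.put("model", doc.getDocumentElement().getAttribute("name"));
            NodeList nodes = doc.getElementsByTagName("parameter");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                params.put(p.getAttribute("name"), p.getAttribute("value"));
            }
            return params;
        } catch (Exception e) {
            throw new IllegalArgumentException("bad model specification", e);
        }
    }

    // Builds the command line for the wrapped legacy code (hypothetical convention).
    static String commandLine(String executable, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(executable);
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(" --").append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }
}
```

The Fortran or C++ code itself is untouched; only the thin wrapper understands XML, which is why this level does not need to be high-performance.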


[Figure: Grid Computing Environments architecture. Components: User Services; Portal Services; Application Service with Application Metadata and the Actual Application; several System Services; “Core” Grid; Middleware; Database; Raw (HPC) Resources.]


Why we can dream of using HTTP and that slow stuff
We have at least three tiers in the computing environment:
• Client (user portal)
• “Middle Tier” (Web servers/brokers)
• Back end (databases, files, computers etc.)
In Grid programming, we use HTTP (and used to use CORBA and Java RMI) in the middle tier ONLY to manipulate a proxy for the real job
• Proxy holds metadata
• Control communication in the middle tier only uses metadata
• “Real” (data transfer) high-performance communication happens in the back end
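The proxy idea can be sketched as follows: the middle tier manipulates a small metadata object and sends short control messages, while the bulk transfer happens entirely in the back end. `JobProxy`, `MiddleTier` and `BackEnd` are hypothetical names, the in-memory copy stands in for a real high-performance transfer, and control-message size is counted in characters as a rough stand-in for bytes.

```java
import java.util.HashMap;
import java.util.Map;

// Middle-tier proxy for a real job: holds only metadata, never the data itself.
final class JobProxy {
    final String jobId;
    final Map<String, String> metadata = new HashMap<>();
    String state = "CREATED";

    JobProxy(String jobId) {
        this.jobId = jobId;
    }
}

// Back end: owns the actual datasets and performs the heavy transfers.
final class BackEnd {
    final Map<String, byte[]> datasets = new HashMap<>();
    long bytesMoved = 0;

    // Stands in for a high-performance back-end transfer.
    void transfer(String source, String destination) {
        byte[] data = datasets.get(source);
        if (data == null) throw new IllegalArgumentException("no dataset " + source);
        datasets.put(destination, data.clone());
        bytesMoved += data.length;
    }
}

final class MiddleTier {
    long controlChars = 0; // size of control messages sent over "slow" HTTP

    // The control message names datasets by reference (metadata) only.
    void startStaging(JobProxy proxy, BackEnd backEnd) {
        String src = proxy.metadata.get("input");
        String dst = proxy.metadata.get("scratch");
        String msg = "stage job=" + proxy.jobId + " src=" + src + " dst=" + dst;
        controlChars += msg.length();
        backEnd.transfer(src, dst);
        proxy.state = "STAGED";
    }
}
```

Staging a 10 MB dataset this way costs a control message of about fifty characters, so the speed of HTTP in the middle tier hardly matters.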


Virtualization
The Grid could, and sometimes does, virtualize various concepts; it should do more
Location: URI (Uniform Resource Identifier) virtualizes URL (WS-Addressing goes further)
Replica management (caching) virtualizes file location, generalized by the GriPhyN virtual data concept
Protocol: message transport and WSDL bindings virtualize the transport protocol as a QoS request
P2P or publish-subscribe messaging virtualizes the matching of source and destination services
Semantic Grid virtualizes knowledge as a metadata query
Brokering virtualizes resource allocation
Virtualization implies all references can be indirect, and needs powerful mapping (look-up) services: metadata
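Replica management is a compact example of this indirection: clients hold a logical file name, and a catalog maps it to physical copies. The sketch assumes hypothetical `lfn://` logical names and example host names; the `gsiftp://` URLs only echo GridFTP naming, and nothing is actually transferred.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Maps a location-independent logical name to physical replicas, so clients
// only ever hold the indirect (logical) reference.
final class ReplicaCatalog {
    private final Map<String, List<String>> replicas = new HashMap<>();

    void register(String logicalName, String physicalUrl) {
        replicas.computeIfAbsent(logicalName, k -> new ArrayList<>()).add(physicalUrl);
    }

    List<String> lookup(String logicalName) {
        return Collections.unmodifiableList(
                replicas.getOrDefault(logicalName, Collections.emptyList()));
    }

    // Resolve to the first replica on a preferred (e.g. nearby) host, else the first registered one.
    String resolve(String logicalName, Set<String> preferredHosts) {
        List<String> candidates = lookup(logicalName);
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no replica for " + logicalName);
        }
        for (String url : candidates) {
            if (preferredHosts.contains(URI.create(url).getHost())) return url;
        }
        return candidates.get(0);
    }
}
```

Adding or moving a replica changes only the catalog; every client reference to the logical name stays valid, which is exactly why virtualization needs a good look-up service.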


Integration of Data and Filters
One has the OGSA-DAI data repository interface combined with the WSDL of the (Perl, Fortran, Python …) filter
The user only sees the WSDL, not the data syntax
There are some non-trivial issues as to where the filtering compute power is
• Microsoft says filter next to data

[Figure: DB and Filter, with the WSDL of the Filter and the OGSA-DAI Interface.]
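The “filter next to data” option can be sketched as a filter service that runs its scan beside the repository and returns only matching records, so the user sees the filter's interface (standing in for its WSDL) rather than the repository's data syntax. `FilterService`, `EventRepository`, `ColocatedFilter` and the `{time, magnitude}` record layout are hypothetical; OGSA-DAI itself is not used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// What the user sees: only the filter's interface, never the repository's storage format.
interface FilterService {
    List<double[]> query(double minMagnitude);
}

// Hypothetical repository of event records {time, magnitude}.
final class EventRepository {
    final List<double[]> rows = new ArrayList<>();
    int rowsScanned = 0;

    List<double[]> scan(Predicate<double[]> keep) {
        List<double[]> out = new ArrayList<>();
        for (double[] r : rows) {
            rowsScanned++;
            if (keep.test(r)) out.add(r);
        }
        return out;
    }
}

// Filter co-located with the data: the full scan happens beside the repository,
// so only the matching rows would need to cross the network.
final class ColocatedFilter implements FilterService {
    private final EventRepository repo;

    ColocatedFilter(EventRepository repo) {
        this.repo = repo;
    }

    @Override
    public List<double[]> query(double minMagnitude) {
        return repo.scan(r -> r[1] >= minMagnitude);
    }
}
```

In this design all rows are scanned locally but only the selected ones are returned; placing the filter elsewhere would instead ship every row to the filter first, which is the trade-off the slide points to.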


[Figure: SERVOGrid Complexity Computing Environment. Components: Users; CCE Control Portal Aggregation; Middle Tier with XML Interfaces; Database Service; Sensor Service; Compute Service; Parallel Simulation Service; Visualization Service; Application Services 1, 2 and 3; XML Meta-data Service; Complexity Simulation Service; Database.]


SERVOGrid (Complexity) Computing Model
[Figure: HPC Simulation with several Data Filters (“Distributed Filters massage data for simulation”), OGSA-DAI Grid Services, Other Grid and Web Services, Analysis and Control, Visualize, and Grid Data Assimilation.]
This type of Grid integrates with parallel computing
• Multiple HPC facilities, but only use one at a time
• Many simultaneous data sources and sinks


Two-level Programming I
The paradigm implicitly assumes a two-level programming model
We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies
• C++, Java or Fortran Monte Carlo module
• Data streaming from a sensor or satellite
• Specialized (JDBC) database access
Such services accept and produce data from users, files and databases
The Grid is built by coordinating such services, assuming we have solved the problem of programming the service

[Figure: a Service exchanging Data]
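The first level can be sketched as a conventional Monte Carlo module plus a thin service wrapper that accepts a request and returns a response; the core code knows nothing about the wrapper. The `samples=<n>;seed=<s>` request format and the class names are hypothetical, and a plain string stands in for an XML/SOAP message.

```java
import java.util.Random;

// Conventional "core program": a Monte Carlo estimate of pi.
final class MonteCarloModule {
    static double estimatePi(long samples, long seed) {
        Random rng = new Random(seed);
        long inside = 0;
        for (long i = 0; i < samples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) inside++; // point falls inside the quarter circle
        }
        return 4.0 * inside / samples;
    }
}

// Service wrapper: parses a request, calls the core code, formats a response.
final class MonteCarloService {
    String handle(String request) {
        long samples = 0;
        long seed = 0;
        for (String field : request.split(";")) {
            String[] kv = field.split("=", 2);
            if (kv.length != 2) throw new IllegalArgumentException("bad field: " + field);
            if (kv[0].equals("samples")) samples = Long.parseLong(kv[1]);
            else if (kv[0].equals("seed")) seed = Long.parseLong(kv[1]);
        }
        if (samples <= 0) throw new IllegalArgumentException("samples must be positive");
        return "pi=" + MonteCarloModule.estimatePi(samples, seed);
    }
}
```

Because the wrapper is the only part that knows about requests and responses, the same core module could equally be exposed by another service technology; coordinating many such services is the second level.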


Two-level Programming II
The Grid level concerns the composition of distributed services, with runtime interfaces to the Grid as opposed to UNIX pipes/data streams
This is familiar from the use of UNIX shell, Perl or Python scripts to produce real applications from core programs
Such interpreted environments are the single-processor analog of Grid programming
Some projects, like GrADS from Rice University, are looking at integration between the service and composition levels, but the dominant effort looks at each level separately

[Figure: Service1, Service2, Service3 and Service4 composed together]
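The composition level can be sketched in-process with `java.util.function.Function.andThen`, the single-processor analog mentioned above: each stage stands for an independently built service, and the pipeline plays the role of a script or UNIX pipe. `Workflow` and the two example stages are hypothetical; a real Grid workflow engine would invoke remote services and handle failures.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class Workflow {
    // Compose stages left to right, like `stage1 | stage2 | stage3` in a shell.
    @SafeVarargs
    static <T> Function<T, T> pipeline(Function<T, T>... stages) {
        Function<T, T> composed = Function.identity();
        for (Function<T, T> stage : stages) {
            composed = composed.andThen(stage);
        }
        return composed;
    }

    // Hypothetical stand-ins for independently written services.
    static final Function<List<Double>, List<Double>> DROP_INVALID =
            xs -> xs.stream().filter(x -> x >= 0).collect(Collectors.toList());
    static final Function<List<Double>, List<Double>> TO_KM =
            xs -> xs.stream().map(x -> x / 1000.0).collect(Collectors.toList());
}
```

For example, `Workflow.pipeline(Workflow.DROP_INVALID, Workflow.TO_KM).apply(List.of(1500.0, -1.0, 250.0))` drops the negative reading and converts the rest, giving `[1.5, 0.25]`. The stages never refer to each other; only the composing level knows the order.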


Conclusions
Grids are inevitable and pervasive
We can expect Web Services and Grids to merge, with a common set of general principles but different implementations with different scaling and functionality trade-offs
e-Science will grow in importance as Science grows as an international “team sport”; this affects scientists and organizations
Enough is known that one can start today
We will be flooded with data, information and purported knowledge
One should be learning about Grids, understanding relevant Web and Grid standards and developing new domain-specific standards
Note that many existing (standards) efforts assume a client-server rather than a brokered service model; these will need to change!