Building and Operating Grid Infrastructures for e-Science: Lessons Learned and Recommendations. Wolfgang Gentzsch. PNC 2007 Annual Conference, Berkeley, 18 - 20 October 2007


Page 1: PNC 2007 Annual Conference   Berkeley, 18 - 20 October 2007 Building and Operating

Wolfgang Gentzsch, PNC 2007, October 2007

PNC 2007 Annual Conference Berkeley, 18 - 20 October 2007

Building and Operating Grid Infrastructures for e-Science

Lessons Learned and Recommendations

Wolfgang Gentzsch

Page 2:

What can we learn from our existing Service Infrastructures?

Water, Gas, Electrical Power, Transportation

• The Why: driving force, need, pain, lack, etc. …
• … or: desire for a ‘better’ life
• The How: idea > prototype > architecture > implementation
• The What: organizational and operational structure, market concepts, providers and consumers, business models (QoS, SLA, ROI, TCO), and so on

. . . to finally result in a sustainable infrastructure

Page 3:

e-Infrastructure

1. Resources: Networks with computing and data nodes, etc.

2. Development/support of standard middleware & grid services

3. Internationally agreed AAA infrastructure

4. Discovery services and collaborative tools

5. Data provenance, curation and preservation

6. Open access to data and publications via interoperable repositories

7. Remote access to large-scale facilities: Telescopes, LHC, ITER, ..

8. Application- and community-specific portals

9. Industrial collaboration

10. Service Centers for maintenance, support, training, utility, applications, etc.

Courtesy Tony Hey

Page 4:

Many Grid Projects:

Grid5000

Page 5:

Before we started with D-Grid, we studied other Grid Initiatives

Initiative         Time          Funding (EUR)   People *)     Users **)

UK e-Science-I     2001 - 2004   140M            900           Res.
UK e-Science-II    2004 - 2006   160M            1100          Res., Ind.
TeraGrid-I         2001 - 2004   70M             500           Res.
TeraGrid-II        2005 - 2007   120M *)         850           Res.
ChinaGrid-I        2003 - 2006   4M              400           Res.
ChinaGrid-II       2007 - 2010   4M *)           1000          Res.
NAREGI-I           2003 - 2005   25M             150           Res.
NAREGI-II          2006 - 2010   40M *)          250           Res., Ind.
EGEE-I             2004 - 2006   30M             800           Res.
EGEE-II            2006 - 2008   35M             1000          Res., Ind.

For comparison:
D-Grid-1           2005 - 2008   25M             220           Res.
D-Grid-2           2007 - 2010   35M             220 (= 440)   Res., Ind.
D-Grid-3           2008 - 2011   -               -             Ind., Res.

*) estimated
**) Res = Research, Ind = Industry

Page 6:

Before we started with D-Grid, we studied other Grid Initiatives

Initiative         Time          Funding     People *)     Users **)

UK e-Science-I     2001 - 2004   $180M       900           Res.
UK e-Science-II    2004 - 2006   $220M       1100          Res., Ind.
TeraGrid-I         2001 - 2004   $90M        500           Res.
TeraGrid-II        2005 - 2010   $150M       850           Res.
ChinaGrid-I        2003 - 2006   $4M         400           Res.
ChinaGrid-II       2007 - 2010   $5M *)      1000          Res.
NAREGI-I           2003 - 2005   $25M        150           Res.
NAREGI-II          2006 - 2010   $40M *)     250           Res., Ind.
EGEE-I             2004 - 2006   $40M        800           Res.
EGEE-II            2006 - 2008   $45M        1000          Res., Ind.

For comparison:
D-Grid-1           2005 - 2008   $35M        220           Res.
D-Grid-2           2007 - 2010   $45M        220 (= 440)   Res., Ind.
D-Grid-3           2008 - 2011   -           -             Ind., Res.

*) estimated
**) Res = Research, Ind = Industry

Report available from www.RENCI.org

Page 7:

Enabling Grids for E-sciencE (EGEE-II INFSO-RI-031688)

To 2, 3, 4: gLite Grid Middleware

[Diagram: gLite services]
Service groups: Access; Security; Information & Monitoring; Workload Management; Data Management
Components: API; CLI; Authentication; Authorization; Auditing; Information & Monitoring; Application Monitoring; Computing Element; Workload Management; Job Provenance; Package Manager; Accounting; Site Proxy; Metadata Catalog; Storage Element; Data Movement; File & Replica Catalog

Page 8:

To 6, 7, 8, Access: e.g. Biomedical Scenario

Bioinformatics scientists have to execute complex tasks

[Diagram labels: Tools; Computational Power; Storage and Data Services; (SOA)]

There is the need to orchestrate these services in workflows

Courtesy Livia Torterolo
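
The need to orchestrate services in workflows can be illustrated with a minimal Python sketch. It is not taken from the presentation or from any grid middleware: the three step functions, the toy sequence data, and `run_workflow` are hypothetical stand-ins for a data service, a tool, and a compute step, and the "workflow" is simply an ordered list of steps that pass a shared state along.

```python
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]
Step = Callable[[State], State]


def fetch_sequences(state: State) -> State:
    """Data service (hypothetical): returns hard-coded toy sequences."""
    return {**state, "sequences": ["ACGT", "GGCA", "TTAG"]}


def gc_content(state: State) -> State:
    """Tool service (hypothetical): fraction of G/C bases per sequence."""
    fractions = [
        sum(base in "GC" for base in seq) / len(seq)
        for seq in state["sequences"]
    ]
    return {**state, "gc_content": fractions}


def summarize(state: State) -> State:
    """Compute step (hypothetical): mean GC fraction over all sequences."""
    values = state["gc_content"]
    return {**state, "mean_gc": sum(values) / len(values)}


def run_workflow(steps: List[Step], initial: Optional[State] = None) -> State:
    """Run the steps in order, feeding each step the previous step's output."""
    state: State = dict(initial or {})
    trace: List[str] = []
    for step in steps:
        state = step(state)
        trace.append(step.__name__)  # record which services ran, in order
    state["trace"] = trace
    return state


if __name__ == "__main__":
    result = run_workflow([fetch_sequences, gc_content, summarize])
    print(result["trace"])
    print(result["mean_gc"])
```

In a real deployment each step would call a remote service rather than a local function; the sketch only shows the chaining idea, under the assumption that every service consumes and returns a dictionary.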

Page 9:

Gridified Scenario

[Diagram labels: Tools; Computational Power; Storage and Data Services; (SOA); Grid; Appl.; Grid Portal/Gateway]

Grid technology leverages both the computational and data management resources, providing optimisation, scalability, reliability, fault tolerance, QoS, …

Courtesy Livia Torterolo

Page 10:

Example D-Grid e-Infrastructure *)

Building a National e-Infrastructure for Research and Industry

• 01/2003: Pre-D-Grid Working Groups, Recommendation to Government
• 09/2005: D-Grid-1: early adopters, ‘Services for Science’
• 07/2007: D-Grid-2: new communities, ‘Service Grids’
• …/2008: D-Grid-3: Service Grids for research and industry

• D-Grid-1: 25 MEuro > 100 Orgs > 200 researchers
• D-Grid-2: 30 MEuro > 100 additional Orgs > 200 additional researchers and industry
• D-Grid-3: Call in 2007

Important:
• Sustainable production grid infrastructure after the end of the funding
• Integration of new communities
• Evaluating business models (operational models) for grid services

*) funded by the German Federal Ministry for Science and Education

Page 11:

D-Grid-1, 2005 - 2008

[Diagram layers]
Generic Grid Middleware and Grid Services
Integration Project DGI
Projects: Astro-Grid, C3-Grid, HEP-Grid, IN-Grid, MediGrid, ONTOVERSE, WIKINGER, WISENT, TextGrid, ..., Im Wissensnetz
User-friendly Access Layer, Portals

Page 12:

D-Grid-1, -2, -3, 2005 - 2011

[Diagram layers]
Generic Grid Middleware and Grid Services
Integration Project DGI-2
Projects: Astro-Grid, C3-Grid, HEP-Grid, IN-Grid, MediGrid, ONTOVERSE, WIKINGER, WISENT, TextGrid, ..., Im Wissensnetz
Knowledge Management
Business Services, SLAs, SOA Integration, Virtualization
User-friendly Access Layer, Portals

Page 13:

D-Grid Middleware

[Diagram labels]
Layers: User; Application Development and User Access; High-level Grid Services; Basic Grid Services; Resources in D-Grid
Components: GAT API; GridSphere; Plug-In; UNICORE; LCG/gLite; Globus 4.0.1; Scheduling, Workflow Management; Data Management; Monitoring; Accounting, Billing; User/VO Management; Security; Data/Software; Distributed Data Archive; Distributed Compute Resources; Network Infrastructure

Page 14:

D-Grid Integration Project (DGI)

The DGI Infrastructure (10/2007)

2,200 CPU cores, 800 TB disk, 1,400 TB tape

Page 15:

Challenges

Sustainable Competitive Advantage

CULTURAL

TECHNICAL

LEGAL & REGULATORY

Page 16:

Potential Grid Inhibitors

• Sensitive data, sensitive applications (medical patient records)
• Different organizations have different ROI
• Accounting: who pays for what (sharing!)
• Security policies: consistent and enforced across the grid!
• Lack of standards prevents interoperability of components
• Current IT culture is not predisposed to sharing resources
• Not all applications are grid-ready or grid-enabled
• Open source is not always equal to open source (read the fine print)
• SLAs based on open source (liability?)
• “Static” licensing models don’t embrace grid
• Protection of intellectual property
• Legal issues (privacy, national laws, multi-country grids)

Page 17:

Lessons Learned and Recommendations

– During development and operation, the grid infrastructure should be modified and improved in large cycles only, because all applications depend on this infrastructure!

– Continuity especially for the infrastructure part of grid projects is important. Therefore, funding should be available after the project, to guarantee services, support and continuous improvement and adjustment to new developments.

– Interoperability: Use software components and standards from open-source and standards initiatives especially in the infrastructure and application middleware layer.

– Close collaboration is mandatory between developers of the grid infrastructure and the applications to best utilize grid services and to avoid application silos.

– Infrastructure should be user-friendly so that new communities can adopt it easily. The infrastructure group should offer installation/operation service and support.

– Centers of Excellence should specialize in specific services, e.g. integration of new communities, grid operation, utility services, training, support, etc.

Page 18:

Lessons Learned and Recommendations

– For complex projects (infrastructure and application projects), a management board (consisting of the leaders of the different projects) should steer coordination and collaboration among the projects.

– New projects should utilize the general infrastructure, and focus on an application or on a specific service, to avoid complexity, re-inventing wheels, and building grid application silos.

– Participation of industry has to be industry-driven. Push from outside, even with government funding, is not promising. Success will come only from real needs, e.g. by building on existing collaborations between research and industry as a first step.

– Implement utility computing in small steps, enhancing existing service models moderately, testing utility models first as pilots. Often, today’s government funding models are counter-productive for utility services.

More info: www.renci.org => Publications => Reports

Page 19:

… resulting in D-Grid-3 Call in 2007

User-friendly access: intuitive, interactive, informative, participative, collaborative, collective => Portals and Web 2.0

Community Service Grids: new application communities and service providers in research and industry; using the D-Grid platform as the basis; industrial consortium leader

Business Layer: Service Level Agreements; sustainable support of requirements of users in research and industry

Grid-based Knowledge Layer: integration of digital content with suitable technologies and tools

Transformation of D-Grid into a sustainable service infrastructure for research and industry (DGI => DGI-2, gap-projects in agreements with DGI)

Page 20:

Challenge: D-Grid and Industry Grids vs SOA

[Diagram labels: department, enterprise, global; industry interest; SOA; research activity; Grid; direction of technology adaptation]

Page 21:

Challenge: D-Grid and User-Friendly Access

Web 2.0: SciVee: YouTube for Scientists

SciVee is a collaboration between the
- National Science Foundation
- Public Library of Science
- San Diego Supercomputer Center

SciVee is about the free and widespread dissemination and comprehension of science. Created for scientists, by scientists, SciVee moves science beyond the printed word and lecture theater, taking advantage of the internet as a communication medium where scientists have a place and a voice.

Page 22:

Challenge: D-Grid and Knowledge Management

[Diagram, translated from German]
Service infrastructure for “information brokering”; data life-cycle management

Cross-disciplinary tools and infrastructure:
• Persistent identifier resolver
• Long-term archiving (LZA) services
• Repository systems
• Information extraction
• Service catalog, service registry
• Visualization
• Ontology registry and services
• Metadata registry and services
• Grid/VO search
• Data: redundancy avoidance, replica management
• Annotation and referencing of objects and object parts

Courtesy Dr. Lossau

Page 23:

Challenge: D-Grid and new Application Communities

• Sciences
• Business
• Healthcare
• Education (K-20)
• Social science, social systems
• Arts and humanities
• Web 2.0, from peer reviews to interactive masses
• Grid service providers, application service providers
• Etc.

Page 24:

Challenge: Towards a Sustainable Infrastructure for Science and Industry

D-Grid is the Core of the German e-Science Initiative

3rd Call: Focus on Service Provisioning for Sciences & Industry

Close collaboration with: Globus Project, EGEE, Deisa, CrossGrid, CoreGrid, GridCoord, GRIP, UniGrids, NextGrid, …, EGI

Application and user-driven, not infrastructure-driven => NEED

Focus on implementation and production, not grid research, in a multi-technology environment (Globus, UNICORE, gLite, etc.)

The government is (thinking of) changing policies for resource acquisition (HBFG!) to enable a service model

Page 25:

Steam Engine: 19th Century

Combustion Engine: 20th Century

Grid Engine: 21st Century

Thank you! Slides are available.

Report is available at www.renci.org => Reports

[email protected]