Geoff Cawood, Terry Sloan Edinburgh Parallel Computing Centre (EPCC) Telephone: +44 131 650 5155 Email: [email protected] EPCC Sun Data and Compute Grids NeSC Review 18 March 2004


Page 1: NeSC Review 18 March 2004

Geoff Cawood, Terry Sloan

Edinburgh Parallel Computing Centre (EPCC)

Telephone: +44 131 650 5155

Email: [email protected]

EPCC Sun Data and Compute Grids

NeSC Review

18 March 2004

Page 2: NeSC Review 18 March 2004

Overview

– Description and Aims
– Project Status
– Technical Achievements
– Dissemination/Exploitation
– Future Plans

Page 3: NeSC Review 18 March 2004

Description and Aims

Page 4: NeSC Review 18 March 2004

Project Goal

“Develop a fully Globus-enabled compute and data scheduler based around Grid Engine, Globus and a wide variety of data technologies”

Partners
– Sun Microsystems
– National e-Science Centre represented by EPCC

Timescales
– 23 (+2) months duration
  ● Due to project staff involvement in ODDGenes
– Start Feb 2002, end Feb 2004

Grid Engine: open source distributed resource management (DRM) system
Globus integration enables sharing of resources amongst collaborating enterprises

Page 5: NeSC Review 18 March 2004

Project Scenario

If enterprises A and B could expose some of their machines to each other across the internet …

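– Both A and B could enjoy throughput efficiency improvements
– Large gains when one enterprise is busy and the other is idle

[Figure: two Grid Engine installations, one per enterprise. Enterprise A's Grid Engine, used by Users (A), shows machines a b c d together with e f g h; enterprise B's Grid Engine, used by Users (B), shows machines e f g h together with a b c d.]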
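The claim that gains are largest when one enterprise is busy and the other idle can be illustrated with a small calculation. This is only a sketch under assumptions that are not in the slides: machines are identical, every job takes one time unit, and sharing is perfect with no transfer cost. The function names are hypothetical.

```python
import math


def makespan(jobs: int, machines: int) -> int:
    """Time units needed to run `jobs` unit-length jobs on identical machines."""
    if machines <= 0:
        raise ValueError("need at least one machine")
    return math.ceil(jobs / machines)


def compare(jobs_a: int, jobs_b: int, machines_a: int = 4, machines_b: int = 4) -> dict:
    """Compare two isolated sites with one fully shared pool (idealised model)."""
    # Isolated: each enterprise only uses its own machines.
    isolated = max(makespan(jobs_a, machines_a), makespan(jobs_b, machines_b))
    # Shared: all jobs can run on any machine of either enterprise.
    shared = makespan(jobs_a + jobs_b, machines_a + machines_b)
    return {"isolated": isolated, "shared": shared}


if __name__ == "__main__":
    # Enterprise A is busy (12 jobs) while enterprise B is idle.
    print(compare(jobs_a=12, jobs_b=0))
    # Both enterprises are equally busy, so sharing brings no gain in this model.
    print(compare(jobs_a=8, jobs_b=8))
```

In this idealised model the busy/idle case finishes in 2 time units instead of 3, while the balanced case is unchanged.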

Page 6: NeSC Review 18 March 2004

Functional Aims

What does the project goal mean in practice?
Identify five key functional aims

1. Job scheduling across Globus to remote Grid Engines
2. File transfer between local client site and remote jobs
3. File transfer between any site and remote jobs
4. Allow 'datagrid aware' jobs to work remotely
5. Data-aware job scheduling

Derived from questioning existing Grid Engine users during Requirements WP

Page 7: NeSC Review 18 March 2004

Project Status

Page 8: NeSC Review 18 March 2004

Workpackages

WP 1: Analysis of existing Grid components
– WP 1.1: UML analysis of core Globus 2.0
– WP 1.2: UML analysis of Grid Engine
– WP 1.3: UML analysis of other Globus 2.0
– WP 1.4: Globus toolkit V3.0 Investigations
– WP 1.5: Data Technologies Investigations

WP 2: Requirements Capture & Analysis
WP 3: Prototype Development
WP 4: Hierarchical Scheduler Design
WP 5: Hierarchical Scheduler Development

Page 9: NeSC Review 18 March 2004

Deliverables

WP 1: Analysis of existing Grid components (FINISHED)
– D1.1 Analysis of Globus Toolkit V2.0
– D1.2 Grid Engine UML Analysis
– D1.3 Globus Toolkit 2.0 GRAM Client API Functions
– D1.4 Globus 3.0 Features and Use
– D1.5.2 Datagrids In Practice
– D1.5.3 GridFTP
– D1.5.4 OGSA-DAI
– D1.5.5 Storage Resource Broker (SRB)

WP 2: Requirements Capture & Analysis (FINISHED)
– D2.1 Use cases and requirements
– D2.2 Questionnaire Report

WP 3: Prototype Development (FINISHED)
– D3.1 Prototype Development: Requirements
– D3.2 Prototype Development: Design
– D3.3 Prototype Development: Test plan
– D3.4 Prototype Development: TOG software
– D3.6 Prototype Development: How-To

WP 4: Hierarchical Scheduler Design (FINISHED)
– D4.1 JOSH Functional Specification
– D4.2 JOSH Systems Design

WP 5: Hierarchical Scheduler Development (FINISHED)
– JOSH User Guide
– JOSH Software
– JOSH Client Install Guide
– JOSH Server Install Guide
– JOSH Known Problems & Solutions

All WPs finished
Deliverables available from project public web site
http://www.epcc.ed.ac.uk/sungrid
Or from the Grid Engine community web site (for software)
http://gridengine.sunsource.net/

Page 10: NeSC Review 18 March 2004

Technical Achievements

"From Sun's perspective, the SunDCG project has been tremendously successful.  Together, EPCC and Sun have produced very high quality software and documents, providing real added value to Sun's Grid Engine suite and addressing some of the key issues in robust and usable Grid middleware."

Fritz Ferstl, Sun Microsystems

Page 11: NeSC Review 18 March 2004

TOG (Transfer-queue Over Globus)

[Figure: Site A (User A) runs Grid Engine with a b c d plus a transfer queue e; Site B (User B) runs Grid Engine with e f g h plus a transfer queue d; the two sites are linked by Globus 2.2.x.]

– WP 3 deliverable – prototype compute scheduler
– Integrates GE and Globus 2.2.x/2.4 (Software library)
– Supply GE execution methods (starter method etc.) to implement a 'transfer queue' which sends jobs over Globus to a remote GE
– GE complexes used for configuration
– Globus GSI for security, GRAM for interaction with remote GE
– GASS for small data transfer, GridFTP for large datasets
– Written in Java - Globus functionality accessed through Java COG kit

Page 12: NeSC Review 18 March 2004

TOG Software

Functionality
1. Job scheduling across Globus to remote Grid Engines
2. File transfer between local client site and remote jobs
   – Add special comments to job script to specify set of files to transfer between local site and remote site
4. Allow 'datagrid aware' jobs to work remotely
   – Use of Globus GRAM ensures proxy certificate is present in remote environment

Absent
3. File transfer between any site and remote jobs
   – Files are transferred between remote site and local site only
5. Data-aware job scheduling
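The idea of marking files for transfer with special comments in the job script can be sketched as follows. The slides do not give TOG's actual comment syntax, so the `#%stage_in` and `#%stage_out` prefixes below are hypothetical placeholders, as are the `TransferPlan` and `parse_job_script` names. Only the `#$` prefix for embedded Grid Engine options is standard.

```python
from dataclasses import dataclass, field

# Hypothetical directive prefixes; the real TOG comment syntax is not shown in the slides.
STAGE_IN = "#%stage_in"
STAGE_OUT = "#%stage_out"


@dataclass
class TransferPlan:
    stage_in: list = field(default_factory=list)   # local site -> remote site, before the job
    stage_out: list = field(default_factory=list)  # remote site -> local site, after the job


def parse_job_script(text: str) -> TransferPlan:
    """Collect the files named in staging comments of a job script."""
    plan = TransferPlan()
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(STAGE_IN):
            plan.stage_in.extend(line[len(STAGE_IN):].split())
        elif line.startswith(STAGE_OUT):
            plan.stage_out.extend(line[len(STAGE_OUT):].split())
    return plan


EXAMPLE = """#!/bin/sh
#$ -N align
#%stage_in genome.fa params.cfg
#%stage_out result.txt
./align genome.fa params.cfg > result.txt
"""

if __name__ == "__main__":
    print(parse_job_script(EXAMPLE))
```

Because the directives are ordinary shell comments, the same script still runs unchanged on a local Grid Engine queue, which fits the "existing Grid Engine interface" point made for TOG.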

Page 13: NeSC Review 18 March 2004

TOG Software

Pros
– Simple approach
– Usability
  ● Existing Grid Engine interface
  ● Only addition is Globus certificate for authentication/authorisation
– Remote administrators still have full control over their resources

Cons
– Low quality scheduling decisions
  ● State of remote resource – is it fully loaded?
  ● Ignores data transfer costs
– Scales poorly - one local transfer queue for each remote queue
– Manual set-up
  ● Configuring the transfer queue with same properties as remote queue
– Java virtual machine invocation per job submission

Page 14: NeSC Review 18 March 2004

JOSH (JOb Scheduling Hierarchically)

– Developing JOSH software
– Address the shortcomings of TOG
– Incorporate Globus 3 and grid services
– WP 5 deliverable - compute/data scheduler

Adds a new 'hierarchical' scheduler above Grid Engine

hiersched submit_ge
– Takes GE job script as input (embellished with data requirements)
– Queries grid services at each compute site to find best match and submits job

[Figure labels: User, Job Spec, hiersched user interface, Hierarchical Scheduler, two compute sites each with a Grid Service Layer above Grid Engine, Input Data Site, Output Data Site.]
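The "query each site, find the best match" step can be sketched as below. This is not JOSH's actual algorithm, which the slides do not describe. The `SiteReport` fields, the scoring formula and the `load_weight` value are all assumptions chosen for illustration: sites that cannot run the job are filtered out, and the rest are ranked by estimated data transfer cost plus a load penalty.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SiteReport:
    """What a compute site's grid service might report (hypothetical fields)."""
    name: str
    can_run: bool            # whether the site's Grid Engine says the job could run
    load: float              # 0.0 (idle) to 1.0 (fully loaded)
    transfer_seconds: float  # estimated cost of moving input and output data


def score(report: SiteReport, load_weight: float = 100.0) -> float:
    """Lower is better. The weighting is an arbitrary illustrative choice."""
    return report.transfer_seconds + load_weight * report.load


def choose_site(reports: List[SiteReport]) -> Optional[SiteReport]:
    """Pick the best eligible site, or None if no site can run the job."""
    candidates = [r for r in reports if r.can_run]
    if not candidates:
        return None
    return min(candidates, key=score)


if __name__ == "__main__":
    reports = [
        SiteReport("site-a", can_run=True, load=0.9, transfer_seconds=10.0),
        SiteReport("site-b", can_run=True, load=0.2, transfer_seconds=40.0),
        SiteReport("site-c", can_run=False, load=0.0, transfer_seconds=0.0),
    ]
    best = choose_site(reports)
    print(best.name if best else "no site can run this job")
```

Here `site-b` wins despite its higher transfer cost because `site-a` is heavily loaded, and `site-c` is excluded because it cannot run the job. Taking both load and transfer cost into account addresses the two scheduling weaknesses listed for TOG.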

Page 15: NeSC Review 18 March 2004

JOSH

Pros
– Satisfies the 5 functionality goals
– Fulfills the project goal
– Remote administrators still have full control over their GEs
– Makes use of existing GE functionality, e.g. 'can run'

Cons
– Latency in decision making
– Not so much 'scheduling' as 'choosing'
– Grid Engine specific solution

Page 16: NeSC Review 18 March 2004

Dissemination/Exploitation

Page 17: NeSC Review 18 March 2004

Presentations
– Ernst & Young, WestInfo Services, Strategy & Performance Associates, SingTel Optus, Executive Briefing Centre, Curtin Business School, Curtin University of Technology, Perth, Australia, February 24th and 26th, 2004
– Curtin Business School Information Systems Seminar, Curtin University of Technology, Perth, Australia, February 20th, 2004
– GlobusWORLD 2004, San Francisco, USA, January 22nd, 2004
– White Rose Grid, EPCC Sun Data & Compute Grids, UCL Workshop, York University, November 11th, 2003
– Sun HPC Consortium, Phoenix, USA, November 2003
– Open Issues in Grid Scheduling, National e-Science Centre, Edinburgh, UK, October 21st, 2003
– 2nd Grid Engine Workshop, Regensburg, Germany, September 22-24, 2003
– SunLabs Europe, Edinburgh, September 1st, 2003
– Sun HPC Consortium, Grid and Portal Computing SIG, Heidelberg, Germany, June 21st, 2003
– Resource Management and Scheduling for the Grid, National e-Science Centre, Edinburgh, UK, February 13th, 2003
– Sun HPC Consortium, Grid and Portal Computing SIG, Baltimore, USA, November 15th, 2002
– EPCC Sun Data and Compute Grids / White Rose Computational Grid Meeting, EPCC, Edinburgh, UK, November 7th, 2002
– Sun HPC Consortium, Grid and Portal Computing SIG, Glasgow, UK, July 18th, 2002
– Grid Engine Workshop, Regensburg, Germany, April 22-24, 2002

Page 18: NeSC Review 18 March 2004

Software Take-up

Transfer-queue Over Globus (TOG) take-up includes

ODD-Genes
– Uses SunDCG TOG and OGSA-DAI to demonstrate a scientific use for the grid (bioinformatics), presented at
  ● UK All Hands Meeting 2003 in Sept 2003
  ● Supercomputing 2003 in Nov 2003 on Sun, UK e-Science and Globus Alliance booths
  ● Poster/Demo at GlobusWORLD 2
  ● Numerous visitors to Edinburgh University

INWA
– Uses SunDCG TOG, OGSA-DAI and FirstDIG browser to demonstrate data mining of commercial bank and telco data over the grid with Curtin Business School, Perth, Australia

Liverpool University’s ULGrid
– Using SunDCG TOG to enable users to access resources from various departments

Raytheon Inc (USA)
– Use SunDCG TOG in grid evaluations

Sun Singapore

Page 19: NeSC Review 18 March 2004

Software Take-up

Job Scheduling Hierarchically (JOSH) known interest includes

– White Rose Grid
– Raytheon Inc.
– Academic Technology Services at UCLA
– School of Pharmaceutical Sciences at the University of Nottingham
– Texas Advanced Computing Center
– Forecast Systems Laboratory of NOAA

Page 20: NeSC Review 18 March 2004

Downloads

10,300 document downloads between Feb 27th 2003 and Feb 26th 2004

No specific figures on TOG/JOSH software downloads
– Hosted at Grid Engine community web site
– Figures are not available

BUT from EPCC web site …
– > 400 TOG Requirements document
– > 400 JOSH Functional Specification
– > 300 JOSH Systems design

JOSH documents only available since *Feb 3rd 2004*

Community Scheduler Framework
– Does not have data aware scheduling
– Platform have asked if they could get the JOSH algorithms included

So LOTS of interest in JOSH

Page 21: NeSC Review 18 March 2004

Future Plans

Page 22: NeSC Review 18 March 2004

Future Plans

Effort budget ran out in February 2004
Sun will integrate TOG/JOSH into Grid Engine source from March 2004
Open Source development via Grid Engine community web site

If funds made available
– WS-RF update
– Access to other DRMs, e.g. LoadLeveler, LSF
– WS-Agreement compliance, JSDL
– Further functionality

All are straightforward due to good design in JOSH

"I just recommended TOG and JOSH as a starting point for a partner who wants to build Grid middleware for nuclear plants."

Fritz Ferstl, Sun Microsystems

Page 23: NeSC Review 18 March 2004

Demo