SAGA-based Frameworks: Supporting Application Usage Modes. Shantenu Jha; Director, Cyber-Infrastructure Development, CCT; Asst Research Professor, CS; e-Science Institute, Edinburgh. http://www.cct.lsu.edu/~sjha http://saga.cct.lsu.edu

Page 1: SAGA-based Frameworks:  Supporting Application Usage Modes

SAGA-based Frameworks: Supporting Application Usage Modes

Shantenu Jha

Director, Cyber-Infrastructure Development, CCT

Asst Research Professor, CS

e-Science Institute, Edinburgh

http://www.cct.lsu.edu/~sjha

http://saga.cct.lsu.edu

Page 2: SAGA-based Frameworks:  Supporting Application Usage Modes

Outline (1)

• Understanding Distributed Applications (DA)
  • How DA differ from HPC or parallel applications; challenges of DA
  • DA development objectives (IDEAS)
• Understanding SAGA (and the SAGA landscape)
  • Rough taxonomy of distributed applications
  • Using SAGA to develop distributed applications
  • Examples: applications and application frameworks
• Discuss how IDEAS are met
• Some SAGA-based tools and projects
  • Advantages of standards
• Derive (initial) user requirements for FutureGrid

Page 3: SAGA-based Frameworks:  Supporting Application Usage Modes

Understanding Distributed Applications: Critical Perspectives

• The number of applications that utilize multiple sites sequentially, concurrently or asynchronously is low (~5%):
  • Not referring to tightly-coupled applications across multiple sites
  • Distributed CI: Is the whole greater than the sum of the parts?
• Managing data and applications across multiple resources is (increasingly) hard:
  • Distributed Data/Jobs vs Bring it to the Computing
  • Compute where the data is, or move data to where the computing is
• Challenges are qualitatively and quantitatively set to get worse:
  • Increasing complexity, heterogeneity and scale

Page 4: SAGA-based Frameworks:  Supporting Application Usage Modes

Understanding Distributed Applications

• Distributed Applications Require:
  • Coordination over Multiple & Distributed sites: Scale-up and Scale-out
  • Peta/Exa/Atta - Scientific Applications requiring multiple-runs, ensembles, workflows etc.
• Core characteristics of logically and physically distributed applications are the SAME
• Application Usage Mode:
  • Composed using the Application as the UNIT of execution
  • Not a workflow (i.e., composed using control and data flow)
• Usage Mode: Closer to an Abstract Workflow (template)
  • Examples: Run once; or a set of copies of an application with varied input data (Ensemble); Loosely-Coupled ensembles..
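
The ensemble usage mode can be illustrated with a short sketch. It assumes a hypothetical `run_application` function that stands in for one complete application run; neither name is part of SAGA. The point is that the unit of composition is a whole application run, not a control-flow or data-flow step inside it.

```python
from concurrent.futures import ThreadPoolExecutor


def run_application(input_value):
    """Hypothetical stand-in for one complete run of an application."""
    return input_value ** 2


def run_ensemble(inputs, max_concurrent=4):
    """Run one copy of the application per input; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_application, inputs))
```

Calling `run_ensemble([1, 2, 3, 4])` runs four independent copies with varied input data. In a distributed setting each copy would be a job on some resource rather than a local thread.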

Page 5: SAGA-based Frameworks:  Supporting Application Usage Modes

Understanding Distributed Applications: Development Challenges

• Fundamentally a hard problem:
  • Dynamic resources, heterogeneous resources
  • Add to it: Complex underlying infrastructure
• Programming Systems for Distributed Applications:
  • Incomplete? Customization? Extensibility?
  • What should the end-user control? Must control?
• Computational Models of Distributed Computing
  • Range of DA, no clear taxonomy
  • More than (peak) performance
  • Application Usage Mode
• Inter-play of Application, Infrastructure, Usage Mode

Page 6: SAGA-based Frameworks:  Supporting Application Usage Modes

Understanding Distributed Applications: Implicit vs Explicit?

• Which approach (implicit vs explicit) is used depends on:
  • How is the application used?
  • Is there a need to control/marshal more than one resource?
  • Why are distributed resources being used?
  • How much can be kept out of the application?
  • Can’t predict in advance? Not obvious what to do; application-specific metric
• If possible, applications should not be explicitly distributed
• GATEWAYS approach:
  • Implicit for the end-users
  • Supporting Applications? Or Application Usage Modes?

Page 7: SAGA-based Frameworks:  Supporting Application Usage Modes

Understanding Distributed Applications: Development Objectives

Interoperability: Ability to work across multiple distributed resources

Distributed Scale-Out: The ability to utilize multiple distributed resources concurrently

Extensibility: Support new patterns/abstractions, different programming systems, functionality & Infrastructure

Adaptivity: Respond to fluctuations in dynamic resources and in the availability of dynamic data

Simplicity: Accommodate above distributed concerns at different levels easily…

Challenge: How to develop DA effectively and efficiently with the above as first-class objectives?

Page 8: SAGA-based Frameworks:  Supporting Application Usage Modes

SAGA: Basic Philosophy

• There exists a lack of programmatic approaches that:
  • Provide general-purpose, common grid functionality for applications and thus hide underlying complexity, varying semantics..
  • Hide “bad” heterogeneity and provide the means to address “good” heterogeneity
  • Provide building blocks upon which to construct higher levels of functionality and abstractions
  • Meet the needs of a broad spectrum of applications: Simple Distributed Scripts, Gateways, Smart Applications and Production Grade Tooling, Workflow…
• Simple, integrated, stable, uniform and high-level interface
  • Simple and Stable: 80:20 restricted scope and Standard
  • Integrated: Similar semantics & style across commonly used distributed functional requirements
  • Uniform: Same interface for different distributed systems
• SAGA: Provides Application* developers with the basic units required to compose high-level functionality across different distributed systems

(*) One person’s Application is another person’s Tool

Page 9: SAGA-based Frameworks:  Supporting Application Usage Modes

SAGA: In a Thousand Words

Page 10: SAGA-based Frameworks:  Supporting Application Usage Modes

SAGA Job Submission: Role of Adaptors (middleware binding)
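
The role of adaptors named in this slide title can be illustrated with a small sketch. It is not the SAGA API: the classes `ForkAdaptor`, `SSHAdaptor` and `JobService` are hypothetical, and nothing is actually executed. It only shows the idea of one uniform front end that selects a middleware binding from the URL scheme.

```python
from urllib.parse import urlparse


class ForkAdaptor:
    """Hypothetical binding for jobs on the local host."""
    scheme = "fork"

    def submit(self, executable):
        return f"fork adaptor would run {executable}"


class SSHAdaptor:
    """Hypothetical binding for jobs on a remote host over SSH."""
    scheme = "ssh"

    def submit(self, executable):
        return f"ssh adaptor would run {executable}"


class JobService:
    """Uniform front end: the caller never names the middleware directly."""
    _adaptors = {a.scheme: a for a in (ForkAdaptor(), SSHAdaptor())}

    def __init__(self, url):
        scheme = urlparse(url).scheme
        if scheme not in self._adaptors:
            raise ValueError(f"no adaptor available for scheme {scheme!r}")
        self._adaptor = self._adaptors[scheme]

    def submit(self, executable):
        # Same call for every backend; the adaptor hides the differences.
        return self._adaptor.submit(executable)
```

Switching from `JobService("fork://localhost")` to `JobService("ssh://host.example.org")` changes the backend without changing the application code that calls `submit`.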

Page 11: SAGA-based Frameworks:  Supporting Application Usage Modes

SAGA Job API: Example

Page 12: SAGA-based Frameworks:  Supporting Application Usage Modes

SAGA Job Package

Page 13: SAGA-based Frameworks:  Supporting Application Usage Modes

SAGA File Package

Page 14: SAGA-based Frameworks:  Supporting Application Usage Modes

File API: Example

Page 15: SAGA-based Frameworks:  Supporting Application Usage Modes

SAGA Advert

Page 16: SAGA-based Frameworks:  Supporting Application Usage Modes

SAGA Advert API: Example
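
The slide carries only a title, so the following is an illustration of the general idea rather than the SAGA Advert API. It assumes the advert service acts as a shared store in which distributed components publish attributes under a path and look them up later. SQLite (via Python's standard `sqlite3` module) is used because SQLite3 is listed among the advert adaptors; the class `AdvertStore` and its methods are hypothetical.

```python
import sqlite3


class AdvertStore:
    """Hypothetical advert-style store: (path, key) -> value."""

    def __init__(self, database=":memory:"):
        self._db = sqlite3.connect(database)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS adverts ("
            "path TEXT, key TEXT, value TEXT, PRIMARY KEY (path, key))"
        )

    def set_attribute(self, path, key, value):
        # The connection as context manager commits on success.
        with self._db:
            self._db.execute(
                "INSERT OR REPLACE INTO adverts VALUES (?, ?, ?)",
                (path, key, value),
            )

    def get_attribute(self, path, key):
        row = self._db.execute(
            "SELECT value FROM adverts WHERE path = ? AND key = ?",
            (path, key),
        ).fetchone()
        return row[0] if row else None

    def find(self, key, value):
        """Return all paths whose attribute `key` equals `value`, sorted."""
        rows = self._db.execute(
            "SELECT path FROM adverts WHERE key = ? AND value = ? ORDER BY path",
            (key, value),
        )
        return [r[0] for r in rows]
```

A master could, for example, call `find("state", "idle")` to discover workers that have advertised themselves as idle.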

Page 17: SAGA-based Frameworks:  Supporting Application Usage Modes

SAGA: Other Packages

Page 18: SAGA-based Frameworks:  Supporting Application Usage Modes

SAGA: Implementations

• Currently there are several implementations under active development:
  • C++ Reference Implementation (LSU), OMII-UK: http://saga.cct.lsu.edu/cpp/
  • Java Implementation (VU Amsterdam), part of the OMII-UK project: http://saga.cct.lsu.edu/java/
  • JSAGA (IN2P3/CNRS): http://grid.in2p3.fr/jsaga/
  • DEISA (partial): job, file package
• C++: Currently at v1.3.3 (October 2009)
• Python bindings to the C++ available
• Good faith effort to keep things working

Page 19: SAGA-based Frameworks:  Supporting Application Usage Modes

SAGA: Available Adaptors

• Job Adaptors: Fork (localhost), SSH, Condor, Globus GRAM2, OMII GridSAM, Amazon EC2, Platform LSF
• File Adaptors: Local FS, Globus GridFTP, Hadoop Distributed Filesystem (HDFS), CloudStore KFS, OpenCloud Sector-Sphere
• Replica Adaptors: PostgreSQL/SQLite3, Globus RLS
• Advert Adaptors: PostgreSQL/SQLite3, Hadoop H-Base, Hypertable

Page 20: SAGA-based Frameworks:  Supporting Application Usage Modes

SAGA: Available Adaptors

• Other Adaptors: Default RPC / Stream / SD
• Planned Adaptors: CURL file adaptor, gLite job adaptor
• Open issues:
  • Consolidating the adaptor code base and adding rigorous tests in order to improve adaptor quality
  • Capability Provider Interface (CPI - the ‘Adaptor API’) is not documented or standardized (yet), but looking at existing adaptor code should get you started if you want to develop your own adaptor
• Proof by example..

Page 21: SAGA-based Frameworks:  Supporting Application Usage Modes

SAGA and Distributed Applications

Page 22: SAGA-based Frameworks:  Supporting Application Usage Modes

Taxonomy of Distributed Applications

• Example of Distributed Execution Mode:
  • Implicitly Distributed 1000 job submissions on the TG
  • SAGA shell example/tutorial
• Example of Explicit Coordination and Distribution:
  • Explicitly Distributed DAG-based Workflows
  • EnKF-HM application
• Example of SAGA-based Frameworks:
  • MapReduce, Pilot-Jobs

Page 23: SAGA-based Frameworks:  Supporting Application Usage Modes

Development: Distributed Application Frameworks

• Frameworks: Logical structure for capturing application requirements, characteristics & patterns
• Pattern: Commonly recurring modes of computation
  • Programming, Deployment, Execution, Data-access..
• Abstraction: Mechanism to support patterns and application characteristics
• Frameworks designed to either:
  • Support Patterns: Map-Reduce, Master-Worker, Hierarchical Job-Submission
  • Provide the abstractions and/or support the requirements & characteristics of applications
  • i.e. Encode a Usage-Mode using a Framework
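
As an illustration of what "supporting a pattern" means, here is a minimal, single-process sketch of the Map-Reduce pattern. It is not the SAGA-based MapReduce framework; `map_reduce`, `word_mapper` and `count_reducer` are hypothetical names. The framework part (`map_reduce`) owns grouping and coordination, while the application supplies only the mapper and reducer.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Apply mapper to each record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Application-supplied map step: emit (word, 1) for every word."""
    return [(word, 1) for word in line.split()]


def count_reducer(word, counts):
    """Application-supplied reduce step: sum the counts for one word."""
    return sum(counts)
```

A distributed framework would run the map and reduce steps as jobs on different resources; the division of labour between framework and application stays the same.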

Page 24: SAGA-based Frameworks:  Supporting Application Usage Modes

Abstractions for Distributed Computing (1): BigJob, a Container Task

Adaptive:
• Type A: Fix the number of replicas; vary the cores assigned to each replica.
• Type B: Fix the size of each replica; vary the number of replicas (Cool Walking)
  • Same temperature range (adaptive sampling)
  • Greater temperature range (enhanced dynamics)
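
The two adaptive strategies differ only in which quantity is held fixed when the available core count changes. A minimal sketch, assuming cores are divided evenly and any remainder is left unused (the function names are hypothetical):

```python
def type_a_cores_per_replica(total_cores, num_replicas):
    """Type A: replica count is fixed; cores per replica follow the allocation."""
    if num_replicas <= 0:
        raise ValueError("num_replicas must be positive")
    return total_cores // num_replicas


def type_b_num_replicas(total_cores, cores_per_replica):
    """Type B: replica size is fixed; the number of replicas follows the allocation."""
    if cores_per_replica <= 0:
        raise ValueError("cores_per_replica must be positive")
    return total_cores // cores_per_replica
```

When the container task grows from 64 to 128 cores, Type A gives each of 8 replicas 16 cores instead of 8, while Type B with 16-core replicas runs 8 replicas instead of 4.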

Page 25: SAGA-based Frameworks:  Supporting Application Usage Modes

Abstractions for Distributed Computing (2): SAGA Pilot-Job (Glide-In)

Page 26: SAGA-based Frameworks:  Supporting Application Usage Modes

Coordinate Deployment & Scheduling of Multiple Pilot-Jobs
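
A pilot-job acquires a block of cores once and then runs sub-jobs inside it, so coordinating several pilots reduces to placing sub-jobs onto whichever pilot has room. The sketch below uses a simple first-fit policy as an assumption; the `Pilot` class and `schedule` function are hypothetical and are not the SAGA Pilot-Job interface.

```python
class Pilot:
    """Hypothetical pilot: a container holding a fixed number of cores."""

    def __init__(self, name, cores):
        self.name = name
        self.free_cores = cores
        self.subjobs = []

    def try_assign(self, subjob_name, cores):
        """Reserve cores for a sub-job if enough are free."""
        if cores > self.free_cores:
            return False
        self.free_cores -= cores
        self.subjobs.append(subjob_name)
        return True


def schedule(pilots, subjobs):
    """First-fit placement of (name, cores) sub-jobs; None means 'must wait'."""
    placement = {}
    for name, cores in subjobs:
        placement[name] = None
        for pilot in pilots:
            if pilot.try_assign(name, cores):
                placement[name] = pilot.name
                break
    return placement
```

A sub-job mapped to `None` would wait until a running sub-job releases cores or another pilot becomes active.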

Page 27: SAGA-based Frameworks:  Supporting Application Usage Modes

Distributed Adaptive Replica Exchange (DARE): Scale-Out, Dynamic Resource Allocation and Aggregation
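
For reference, the exchange step in temperature replica exchange is usually decided with the standard Metropolis criterion. This is textbook background, not something stated on the slide: replicas $i$ and $j$, with potential energies $E_i$, $E_j$ and temperatures $T_i$, $T_j$, swap configurations with probability

```latex
\[
P_{\mathrm{acc}}(i \leftrightarrow j)
  = \min\left\{ 1,\ \exp\left[ (\beta_i - \beta_j)(E_i - E_j) \right] \right\},
\qquad
\beta_k = \frac{1}{k_B T_k}.
\]
```

Because each replica runs independently between exchange attempts, the replicas can be spread across several resources and only need to coordinate at the exchange step.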

Page 28: SAGA-based Frameworks:  Supporting Application Usage Modes

Multi-Physics Runtime Frameworks: Extensibility

• Coupled multi-physics requires two distinct but concurrent simulations
• Can co-scheduling be avoided?
  • Adaptive execution model: Yes
• Load-balancing required. Capability comes for free!
• First demonstrated multi-platform Pilot-Job: TG (MD) – Condor (CFD)

Page 29: SAGA-based Frameworks:  Supporting Application Usage Modes

Dynamic Execution: Reduced Time to Solution

Page 30: SAGA-based Frameworks:  Supporting Application Usage Modes

Ensemble Kalman Filters: Heterogeneous Sub-Tasks

Ensemble Kalman filters (EnKF) are recursive filters for handling large, noisy data sets; here the EnKF is used for history matching and reservoir characterization
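
For readers unfamiliar with the method, the recursive step is the standard EnKF analysis update, given here as textbook background rather than as content from the slide. Each of the $N$ ensemble members $x_i^f$ (forecast) is corrected towards the (perturbed) observations $d_i$:

```latex
\[
K = P^f H^{\mathsf T} \left( H P^f H^{\mathsf T} + R \right)^{-1},
\qquad
x_i^a = x_i^f + K \left( d_i - H x_i^f \right), \quad i = 1, \dots, N,
\]
```

where $P^f$ is the sample covariance of the forecast ensemble, $H$ the observation operator and $R$ the observation-error covariance. The $N$ forecast runs between updates are independent of one another, which is what makes the ensemble a set of separately schedulable sub-tasks.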

EnKF is a particularly interesting case of irregular, hard-to-predict run time characteristics:

Page 31: SAGA-based Frameworks:  Supporting Application Usage Modes

Results: Scale-Out Performance

Using more machines decreases the time to completion (TTC) and the variation between experiments

Using BQP decreases the TTC & variation between experiments further

Lowest time to completion achieved when using BQP and all available resources
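
One way to read these results is as a resource-selection problem. The sketch below assumes that BQP supplies a predicted queue wait time per resource (an assumption; the slides do not define BQP or its interface) and simply prefers the resources with the shortest predicted wait. The function names and any numbers passed in are hypothetical.

```python
def rank_by_predicted_wait(predicted_wait_seconds):
    """Order resource names from shortest to longest predicted queue wait."""
    return sorted(predicted_wait_seconds, key=predicted_wait_seconds.get)


def choose_resources(predicted_wait_seconds, how_many):
    """Pick the `how_many` resources expected to start jobs soonest."""
    return rank_by_predicted_wait(predicted_wait_seconds)[:how_many]
```

With predictions available, work can be submitted first to the resources expected to start it soonest, while still using every resource when `how_many` equals the number available.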

Page 32: SAGA-based Frameworks:  Supporting Application Usage Modes

Performance Advantage from Scale-Out

But Why does BQP Help?

Page 33: SAGA-based Frameworks:  Supporting Application Usage Modes

Understanding Distributed Applications: Development Objectives Redux

• Interoperability: Ability to work across multiple distributed resources
  • SAGA: Middleware Agnostic
• Distributed Scale-Out: The ability to utilize multiple distributed resources concurrently
  • Support Multiple Pilot-Jobs: Ranger, Abe, QB
• Extensibility: Support new patterns/abstractions, different programming systems, functionality & Infrastructure
  • Pilot-Job also Coupled CFD-MD, Integrated BQP
• Adaptivity: Respond to fluctuations in dynamic resources and in the availability of dynamic data
• Simplicity: Accommodate above distributed concerns at different levels easily…

Page 34: SAGA-based Frameworks:  Supporting Application Usage Modes

SAGA: Bridging the Gap between Infrastructure and Applications

Focus on Application Development and Characteristics, not infrastructure details

Page 35: SAGA-based Frameworks:  Supporting Application Usage Modes

SAGA-based Tools and Projects

• JSAGA from IN2P3 (Lyon)
  • http://grid.in2p3.fr/jsaga/index.html
  • Slides Ack: Sylvain Renaud
• GANGA-DIANE (EGEE)
  • http://faust.cct.lsu.edu/trac/saga/wiki/Applications/GangaSAGA
  • Slides Ack: Jakub Moscicki, Massimo L, O. Weidner
• NAREGI/KEK (Active)
• DESHL
  • DEISA-based Shell and Workflow library
• XtreemOS
• SD Specification
  • With gLite adaptors
• Advantage of Standards

Page 36: SAGA-based Frameworks:  Supporting Application Usage Modes

JSAGA: Implementer and user of SAGA

• JSAGA uses SAGA in a module, which hides heterogeneity of grid infrastructures
• JSAGA implements SAGA to hide heterogeneity of middlewares

[Figure: layered diagram with labels Applications, jobs collection, JSAGA, SAGA, core engine + plug-ins, JSAGA, Legacy APIs]

Page 37: SAGA-based Frameworks:  Supporting Application Usage Modes

Projects using JSAGA

• Elis@: a web portal for submitting jobs to industrial and research grid infrastructures
• SimExplorer: a set of tools for managing simulation experiments; includes a workflow engine that submits jobs to heterogeneous distributed computing resources
• JJS: a tool for efficiently running short-lived jobs on EGEE
• JUX: a multi-protocol file browser

Page 38: SAGA-based Frameworks:  Supporting Application Usage Modes

DIANE Integration (cont.)

[Figure panels: Diane without SAGA; Diane with SAGA]

Page 39: SAGA-based Frameworks:  Supporting Application Usage Modes

Applications on heterogeneous resources

[Figure: Master and Agents scheduling; heterogeneous resources allocation (Ganga + Ganga/SAGA); payload distribution via Ganga/gLite, Ganga/SAGA (to TeraGrid) and Ganga/SAGA (to *); application-aware (and resource-aware) scheduling]

Federating resources! (Not in this demo: cloud resources, additional Grid infrastructures…)

Page 40: SAGA-based Frameworks:  Supporting Application Usage Modes

Acknowledgements

SAGA Team and DPA Team and the UK-EPSRC (UK EPSRC: DPA, OMII-UK, OMII-UK PAL)

People:
• SAGA D&D: Hartmut Kaiser, Ole Weidner, Andre Merzky, Joohyun Kim, Lukasz Lacinski, João Abecasis, Chris Miceli, Bety Rodriguez-Milla
• SAGA Users: Andre Luckow, Yaakoub el-Khamra, Kate Stamou, Cybertools (Abhinav Thota, Jeff, N. Kim), Owain Kenway
• Google SoC: Michael Miceli, Saurabh Sehgal, Miklos Erdelyi
• Collaborators and Contributors: Steve Fisher & Group, Sylvain Renaud (JSAGA), Go Iwai & Yoshiyuki Watase (KEK)
• DPA: Dan Katz, Murray Cole, Manish Parashar, Omer Rana, Jon Weissman

Page 41: SAGA-based Frameworks:  Supporting Application Usage Modes