Upload
timothy-beasley
View
22
Download
1
Tags:
Embed Size (px)
DESCRIPTION
Runtime Fault-Handling for Job-Flow Management In Grid Environments. Team: Gargi Dasgupta 3 , Onyeka Ezenwoye 2 , Liana Fong 1 , Selim Kalayci 2 , S. Masoud Sadjadi 2 , Balaji Viswanathan 3 , 1: IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, - PowerPoint PPT Presentation
Citation preview
Generic Proxy sits between Job-Flow Manager and Meta-Scheduler, and transparently intercepts the calls between these two layers. It employs recovery policies upon a failed invocation.
Recovery Policies contain rules to detect failures and a sequence of recovery actions to follow on failure detection. These policies are defined in the form of patterns that comprise common reusable recovery actions.
Two sample fault-tolerant patterns:
Runtime Fault-Handling for Job-Flow Management In Grid Environments
Motivation Provide flexible job-flow orchestration and submission for
scientists
Integrate service-based flow orchestration with job scheduling
Provide fault-tolerance handling that is transparent to the job-flow
Approach Isolate job-flow orchestration from fault handling using
layered design
Top Layer: Job-flow (Service) orchestration; using BPEL for job-flows
Second layer: Job trapping & fault handling; using JSDL for jobs
Introduce fault-handling proxy to implement job recovery policies
Use the Generic Proxy from TRAP/BPEL framework
Fault handling policies can be job specific depending on
Type of job
Type of failure
Level of fault-tolerance specified by user
Team: Gargi Dasgupta3, Onyeka Ezenwoye2, Liana Fong1, Selim Kalayci2,
S. Masoud Sadjadi2, Balaji Viswanathan3, 1: IBM T. J. Watson Research Center, Yorktown Heights, NY 10598,
2: Florida International University, Miami, FL 33199, 3: IBM India Research Labs, New Delhi
IBM FIU
Job-Flow Manager
Job-Flow Manager
Peer-to-peerProtocols
Application or Portal
Application or Portal
User
Local Resources
Local Resources
Local Resources
Local Resources
Meta-Scheduler
Meta-Scheduler
Local scheduler
Local scheduler
Local scheduler
Localscheduler
1
2 3 4
5 6
7
1
4
6
12357 1 4 6
Resource Policies
Resource Policies
User
1
2 3
5
7
4. Prototypical Architecture
2. Two-Layered Architecture
GlobusGridwayMeta-Scheduler
ActiveBPELEngine
Portal C
lient
IBM TDWB Meta-Scheduler
Re-submit job to remote domain
Portal C
lientPortal C
lient
IBM TDWB Meta-Scheduler
Re-submit job to remote domain
Generic Proxy Generic Proxy
IBM’s WebsphereProcess Server
Local SchedulerLocal Scheduler
Local SchedulerLocal Scheduler
Local Scheduler
Local Scheduler
Local Scheduler
Domain 1: IBM Domain 2: FIU
Re-poll job at remote domain
Portal C
lientPortal C
lient
1. Overview
3. Runtime Fault-Handling
Re-submit Job Pattern Re-stage Data Pattern