CoreGRID Summer School Bonn, July 25, 2006
Resource Orchestration in Grids
Wolfgang Ziegler
Department of Bioinformatics
Fraunhofer Institute SCAI
Outline
o What is Orchestration and (why) do we need it?
o Overview on existing Brokers & MetaSchedulers
- General Architecture
- GridWay
- EGEE Workload Manager Service
- Condor-G
- Nimrod & Co
- Grid Service Broker
- Calana
- MP-Synergy
- SUN N1 & SGE
- MOAB
- Platform CSF
- KOALA
- MetaScheduling Service
o Grid Scheduling Architecture - OGF Standardisation Activities
- GSA Research Group
- OGSA-BES & OGSA-RSS Working Groups
- Examples (VIOLA-ISS/PHOSPHORUS/BODEGA/SAROS)
o VIOLA MetaScheduling Service
o What next
What is Orchestration and (why) do we need it?
o It is one of the Grid/Web buzz words
o Here is a definition for the term:
Orchestration or arrangement is the study and practice of arranging music for an orchestra or musical ensemble. In practical terms it consists of deciding which instruments should play which notes in a piece of music.
o In the Web-Service domain the term Orchestration is used for the execution of specific business processes; WS-BPEL is a language for defining such processes
o In the Grid domain the term Orchestration is used for coordination (usually including reservation) of multiple resources for usage by one single application or a workflow.
o No (de-facto) standards are available so far for description and management; WS-Agreement + JSDL + other term languages could become one.
o The basic idea is: planning through negotiation for controlled (better) QoS instead of queuing for best effort
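A minimal sketch of what planning through negotiation means, reduced to one provider and one request. The Provider/Offer classes, the back-to-back booking model and all numbers are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Offer:
    start: int      # proposed start time
    duration: int   # reserved runtime
    nodes: int      # reserved nodes

class Provider:
    """Hypothetical provider that books reservations back to back."""
    def __init__(self):
        self.next_free = 0
    def negotiate(self, duration, nodes, deadline):
        start = self.next_free                  # earliest slot we can offer
        if start + duration > deadline:
            return None                         # cannot meet the constraint
        return Offer(start, duration, nodes)
    def commit(self, offer):
        self.next_free = offer.start + offer.duration  # slot is now guaranteed

provider = Provider()
offer = provider.negotiate(duration=60, nodes=8, deadline=100)
if offer:                       # planning: accept only a guaranteed slot
    provider.commit(offer)
    print(f"SLA: job runs at t={offer.start} on {offer.nodes} nodes")
else:                           # contrast: best-effort queuing gives no such answer
    print("no acceptable offer - try the next provider")
```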
Scenarios where Orchestration is needed
o Running applications with time constraints
- Applications must deliver results at a certain time
- Applications may not start before a fixed time
o Co-allocation: applications requiring more than one resource, e.g.
- distributed visualisation
- multi-physics simulations
- data-intensive applications requiring more storage than locally available
- dedicated QoS of network connections for distributed applications
o Workflows
- Individual interdependent components
Example: What might happen without Orchestration
[Diagram: a UNICORE Client submits through the UNICORE Gateways of Site A and Site B; behind each Gateway, an NJS and TSI feed the local scheduler's job queue in front of a cluster.]
1. The user describes his Job.
2. The Job is passed to the UNICORE System.
3. The Primary NJS distributes the Job to all sites.
4. The Job is submitted to the local batch queues of all systems.
5. The components of the Job are started depending on the state of the local batch queues.
6. The quality of the network connections depends on the actual load.
Constraints of Orchestration
o Process has to respect site autonomy and site policies
- done through negotiation and use of the local RMS/scheduling systems
o Reservation on behalf of the requesting user
- done through mapping to the local id of the user
o Without an SLA no guarantee at all; with an SLA guarantees are in place, but may be cancelled
- failure of resources or a service provider's decision to prefer another, e.g. more profitable, job may cause an unforeseen break of contract
- if one SLA fails, what happens to the other ones?
o Penalties agreed upon as part of the SLA can cut one's losses
o Need to identify acting and responsible parties beforehand
- e.g. broker, local scheduling system, adapter, other instance of service/resource provider, client-side process/component
o Need a tool/service to manage the orchestration - might be either local or remote
o Local RMS/schedulers must provide support for advance reservation
Crucial Properties of local Scheduling Systems
- Full backfill algorithm
- Estimation of worst-case start/stop for each job (preview)
- Node range specification
- Start time specification
- Special resource requirement specification
- "very low priority" jobs (Z-jobs)
- Communication-friendly node allocation strategy
- Portable: available on different parallel machines
- Graphical user interface
- Status information available via WEB interface
- Priority scheme (project, resources, waited time)
- Reserved slots are fixed and are no longer subject to scheduling
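To make the first two properties concrete, here is a toy full-backfill scheduler: every job receives a fixed reservation at the earliest start that cannot displace an earlier reservation, so worst-case start/stop times are known in advance. The job tuples, the single-resource model and all names are invented for illustration, not taken from any of the systems discussed:

```python
def backfill(jobs, total_nodes):
    allocs = []                                   # fixed slots: (start, end, nodes)

    def free(t):                                  # free nodes at time t
        return total_nodes - sum(n for s, e, n in allocs if s <= t < e)

    plan = {}
    for name, nodes, runtime in jobs:             # jobs in priority order
        # capacity only changes at slot boundaries, so test those points
        events = sorted({0} | {s for s, _, _ in allocs} | {e for _, e, _ in allocs})
        for t in events:
            span = [t] + [u for u in events if t < u < t + runtime]
            if all(free(u) >= nodes for u in span):
                allocs.append((t, t + runtime, nodes))
                plan[name] = t                    # reserved slot is now fixed
                break
    return plan

# C backfills into the hole next to A without delaying B's reservation
print(backfill([("A", 4, 10), ("B", 8, 5), ("C", 2, 5)], total_nodes=8))
# -> {'A': 0, 'B': 10, 'C': 0}
```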
Overview on Brokers & MetaSchedulers
[Figure: possible components and interactions of a scheduling infrastructure for Global Grids, as discussed in the OGF GSA-RG]
Overview - GridWay
Environment: Globus GT2.4, GT4
Features & Scope:
- Works on top of multiple local schedulers (LSF, PBS (Open, Pro), SGE, N1)
- Supports migration of jobs based on monitoring of job performance
- Support for self-adaptive applications (modification of requirements and migration request)
- Provides the OGF DRMAA API for local systems able to provide the DRMAA bindings (currently SGE and N1)
License: Open Source, GPL v2
Support: GridWay on-line support forum
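Since GridWay exposes the OGF DRMAA API, a submission can look like the following sketch; it assumes the Python "drmaa" binding on a DRMAA-enabled system underneath (the command is illustrative):

```python
import drmaa

with drmaa.Session() as session:
    jt = session.createJobTemplate()
    jt.remoteCommand = "/bin/hostname"        # any batch-executable command
    job_id = session.runJob(jt)
    info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
    print(f"job {info.jobId} finished with exit status {info.exitStatus}")
    session.deleteJobTemplate(jt)
```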
Overview - EGEE Workload Manager Service
Environment: LCG, gLite (Globus GT2.4, GT4)
Features & Scope:
- Two modes of job scheduling: push mode, which submits jobs through Condor-G, or pull mode, in which the computational Grid takes the jobs from the queue
- Eager or lazy policy for job scheduling: early binding to resources (one job/multiple resources) or matching against one resource becoming free (one resource/multiple jobs)
- Works on top of multiple local schedulers (LSF, PBS (Open, Pro), SGE, N1)
- Supports definition of workflows with specification of dependencies
- Support for VOs, accounting
License: Open Source license
Support: mailing lists and bug reporting tools
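The eager/lazy distinction fits in a few lines of Python; the resource and job fields and the rank functions are invented for illustration:

```python
# Eager matching binds a job to the best resource at submission time;
# lazy matching waits until a resource frees up and picks the best queued job.
def eager_match(job, resources, rank):
    ok = [r for r in resources if r["free_cpus"] >= job["cpus"]]
    return max(ok, key=rank, default=None)          # one job, many resources

def lazy_match(freed_resource, queue, rank):
    ok = [j for j in queue if j["cpus"] <= freed_resource["free_cpus"]]
    return max(ok, key=rank, default=None)          # one resource, many jobs

resources = [{"name": "cluster-a", "free_cpus": 16},
             {"name": "cluster-b", "free_cpus": 4}]
job = {"name": "sim-1", "cpus": 8}
print(eager_match(job, resources, rank=lambda r: r["free_cpus"]))

queue = [{"name": "sim-1", "cpus": 8}, {"name": "sim-2", "cpus": 2}]
print(lazy_match(resources[1], queue, rank=lambda j: j["cpus"]))
```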
Overview - Condor-G
Environment: Globus GT2.4 - GT4, UNICORE, NorduGrid
Features & Scope:
- Fault-tolerant job submission system, supports Condor's ClassAd match-making for resource selection
- Can submit jobs to local scheduling systems (Condor, PBS (Open, Pro), SGE, N1)
- Supports workflow interdependency specification through DAGMan
- Allows query of job status and provides callback mechanisms for termination or problems
- Provides the OGF DRMAA API for local systems with DRMAA bindings
License: Open Source
Support: free (mailing list) & fee-based (telephone, email)
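ClassAd match-making is symmetric: job and machine each publish attributes plus a Requirements expression over the other side, and Rank orders the matches. A plain-Python rendering of that idea (attribute values are invented; this is not the ClassAd language itself):

```python
job_ad = {
    "ImageSize": 2000,
    "Requirements": lambda m, j: m["Memory"] >= j["ImageSize"] and m["Arch"] == "X86_64",
    "Rank": lambda m, j: m["Mips"],                 # prefer faster machines
}
machine_ads = [
    {"Name": "node1", "Memory": 4096, "Arch": "X86_64", "Mips": 900,
     "Requirements": lambda m, j: j["ImageSize"] <= m["Memory"]},
    {"Name": "node2", "Memory": 1024, "Arch": "X86_64", "Mips": 1200,
     "Requirements": lambda m, j: True},
]

# a match requires both Requirements to hold; the highest Rank wins
matches = [m for m in machine_ads
           if job_ad["Requirements"](m, job_ad) and m["Requirements"](m, job_ad)]
best = max(matches, key=lambda m: job_ad["Rank"](m, job_ad), default=None)
print(best["Name"] if best else "no match")   # -> node1 (node2 lacks memory)
```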
Overview - Nimrod/G & Co
Environment: Globus, Legion, Condor
Features & Scope:
- Focused on parametric experiments
- Collaborates with multiple local schedulers (LSF, PBS (Open, Pro), SGE)
- Follows an economic approach based on auctioning mechanisms including resource providers and resource consumers
- API to write user-defined scheduling policies
License: Royalty-free license for non-commercial use. EnFuzion is a commercially licensed variant provided by Axceleon.
Support: Limited support by email
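A parametric experiment is essentially the cross product of parameter ranges, one independent job per grid point; a minimal sketch with invented parameter names (not Nimrod's plan-file syntax):

```python
from itertools import product

parameters = {"viscosity": [0.1, 0.5, 1.0], "mesh": [64, 128]}
jobs = [dict(zip(parameters, values)) for values in product(*parameters.values())]

for i, p in enumerate(jobs):    # 3 x 2 = 6 independent jobs to broker out
    print(f"job {i}: simulate --viscosity {p['viscosity']} --mesh {p['mesh']}")
```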
Overview - Grid Service Broker
Environment: Globus GT2.4, GT4, UNICORE in preparation
Features & Scope:
- MetaScheduler from the GridBus project supporting multiple heterogeneous local schedulers (Condor, PBS (Open, Pro), SGE)
- Interface to local systems either SSH or GRAM
- Supports integration of user-defined custom schedulers
- Supports scheduling based on deadline and budget constraints
License: GNU Lesser General Public License
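Deadline-and-budget-constrained selection can be pictured as filtering on both constraints and then minimising cost; all figures and field names below are invented:

```python
def pick(resources, work_units, deadline, budget):
    feasible = []
    for r in resources:
        runtime = work_units / r["speed"]          # time to finish the job
        cost = runtime * r["price"]                # cost at the provider's rate
        if runtime <= deadline and cost <= budget:
            feasible.append((cost, runtime, r["name"]))
    return min(feasible, default=None)             # cheapest feasible resource

resources = [{"name": "fast", "speed": 10, "price": 5},
             {"name": "slow", "speed": 2, "price": 1}]
print(pick(resources, work_units=100, deadline=30, budget=60))
# -> (50.0, 10.0, 'fast'); "slow" is cheaper but misses the deadline
```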
Overview - Calana
Environment: Globus Toolkit 4, UNICORE
Features & Scope:
- Agent-based MetaScheduler for research and commercial environments
- Follows an economic approach based on auctioning mechanisms, with an auctioneer running the auction and resource provider agents acting as bidders
- Collaboration with different local resources possible through implementation of appropriate agents
License: Research prototype used in the Fraunhofer Resource Grid
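One auction round in that style, reduced to its core; the Agent class, the bidding strategy and the award policy (earliest start, price as tie-breaker) are invented for illustration:

```python
import random

class Agent:
    """Hypothetical resource agent bidding on behalf of its site."""
    def __init__(self, name, busy_until, price_per_min):
        self.name, self.busy_until, self.ppm = name, busy_until, price_per_min
    def bid(self, runtime):
        return {"agent": self.name,
                "start": self.busy_until,                       # earliest start
                "price": runtime * self.ppm * random.uniform(0.9, 1.1)}

def auction(agents, runtime):
    bids = [a.bid(runtime) for a in agents]        # auctioneer collects bids
    return min(bids, key=lambda b: (b["start"], b["price"]))

agents = [Agent("siteA", busy_until=40, price_per_min=2.0),
          Agent("siteB", busy_until=10, price_per_min=3.5)]
print(auction(agents, runtime=30))   # siteB wins on the earlier start
```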
Overview - MP Synergie
Environment: Relies on Globus Toolkit 2.4. Enterprise Grids
Features & Scope:
- Scheduling decisions based on various parameters, e.g. load of the local scheduling systems, data transfer time
- Accounting mechanism
- Supports other local schedulers (LSF, PBS (Open/Pro), UD GridMP, SGE, LoadLeveler, Condor)
- Job submission may respect availability of e.g. licenses
License: Proprietary commercial License
Support: Paid support by United Devices
Overview - SUN N1 & SGE
Environment: Stand alone, can optionally be integrated with other Grid middleware.
Features & Scope:
- Allows exchange of the built-in scheduler with a user-provided scheduler
- Supports advance reservation if the built-in scheduler is replaced by an appropriate scheduler
License: Proprietary commercial License for N1, SGE is Open Source
Support: Paid support by SUN for N1
Overview - MOAB Grid Scheduler
Environment: Stand-alone, can optionally rely on Globus Toolkit middleware for security and user account management. Enterprise Grids.
Features & Scope:
- The bundle of MOAB, Torque, and the MOAB workload manager builds a complete stack for computational Grids
- Supports other local schedulers (LSF, OpenPBS, SGE, N1)
- Supports advance reservation, query of reservations of various resources, e.g. hosts, software licenses, network bandwidth
- Local schedulers must support advance reservation
License: Proprietary commercial License; the Maui Grid Cluster Scheduler (limited variant) is available under a specific Open Source license
Support: Paid support by Cluster Resources Inc.
Overview - Platform CSF & CSF Plus
Environment: Globus GT4
Features & Scope:
- Coordinates communication among multiple heterogeneous local schedulers (LSF, OpenPBS, SGE, N1)
- Supports advance reservation, query of reservations of various resources, e.g. hosts, software licenses, network bandwidth
- Local schedulers must support advance reservation
- API to write user-defined scheduling policies
License: Open Source
Support: Paid support by Platform and other companies
Overview - KOALA
Environment: Globus Toolkit
Features & Scope:
- MetaScheduler of the Dutch DAS-2 multicluster system; supports co-allocation of compute and disk resources
- Collaboration with the local schedulers openPBS, SGE
- Local schedulers must support advance reservation
- Support for MPI jobs (mpich-g2)
License:
Overview - MetaScheduling Service
Environment: UNICORE, GT4 in preparation
Features & Scope:
- MetaScheduling Web-Service supporting reservation, co-allocation and SLAs between the MetaScheduler and the client
- Collaboration with different local schedulers through adapters (EASY, PBS (Open, Pro)); SGE in preparation
- Local schedulers must support advance reservation
- Supports orchestration of arbitrary resources, e.g. compute resources and network; storage and licenses in preparation
- Multiple MSS may be organised hierarchically
- Support for MPI jobs (MetaMpich)
- Support for workflows under work
- End-to-end SLAs between service provider and service consumer in the next version
License: First version used in the VIOLA Grid testbed; available for collaborating partners.
Support: by email and bug reporting tools
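The co-allocation step at the heart of such a service is finding a start time at which every reservation can be granted simultaneously; a toy version over free-slot lists (all slot data and names are invented):

```python
def common_start(free_slots_per_site, duration):
    # free_slots_per_site: {site: [(start, end), ...]} as reported by adapters
    def fits(slots, t):
        return any(s <= t and t + duration <= e for s, e in slots)
    # only slot starts can be the earliest common start time
    starts = sorted(s for slots in free_slots_per_site.values() for s, _ in slots)
    for t in starts:
        if all(fits(slots, t) for slots in free_slots_per_site.values()):
            return t                    # reserve this slot at every site
    return None                         # no common window: renegotiate

slots = {"siteA": [(0, 50), (80, 200)],
         "siteB": [(30, 120)],
         "network": [(0, 300)]}
print(common_start(slots, duration=40))   # -> 80
```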
Grid Scheduling Architectures (1)
Integrating Calana with other Schedulers
[Diagram, left ("Calana submits jobs to another scheduler"): the Calana broker drives a PhastGrid agent, backed by a PhastGrid resource, and an adapter agent that forwards jobs through an adapter to the other scheduler's workload manager. Right ("Another scheduler submits jobs to Calana"): a source scheduler submits through a Unicore agent to the Calana broker, which again serves a PhastGrid agent and its PhastGrid resource.]
Grid Scheduling Architectures (2)
GridWay
Federation of Grid Infrastructures with GridGateWays (Grid4Utility Project)
[Diagram: two Globus Grid infrastructures, VO A and VO B. Each contains resources 1..n (SGE, PBS and LSF clusters, each fronted by GRAM, RFT and MDS services) and a GridWay instance acting as VO meta-scheduler: the GridWay core with execution, transfer and information drivers plus a scheduling module, accessed by users through a CLI & API or via globus-job-run, Condor/G, Nimrod/G, etc. A GridGateWay (GRAM, RFT, MDS in front of infrastructure B's GridWay) lets infrastructure A schedule into infrastructure B over standard protocols and interfaces (GT GRAM, OGSA BES, ...).]
Grid Scheduling Architectures (3)
Viola MetaScheduling Service
Multi-level MetaScheduling
Open Grid Forum Standardisation Activities (1)
o Grid Scheduling Architecture Research Group (GSA-RG)
o Addressing the definition of a scheduling architecture supporting all kinds of resources,
o interaction between resource management and data management,
o co-allocation and the reservation of resources, including the integration of user or provider defined scheduling policies.
o Two sub-groups of the Open Grid Service Architecture Working Group:
o Basic Execution Service Working Group (OGSA-BES-WG)
o OGSA-Resource Selection Services Working Group (OGSA-RSS-WG)
o will provide protocols and interface definitions for the Selection Services portion of the Execution Management Services (components: CSG and EPS)
o Grid Resource Allocation Agreement Protocol Working Group (GRAAP-WG)
o Addressing proposed recommendation for Service Level Agreements
o WS-Agreement template and protocol
o Allows definition of Guarantee Terms, e.g. SLOs, Business Values, KPI, Penalties
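The shape of such an agreement, rendered as plain data structures rather than the XML the GRAAP-WG defines; the field names follow the concepts named above (SLO, business value, penalty), while the values are invented:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Penalty:
    assessment_interval: str       # e.g. "per violation"
    value: float                   # charged when the SLO is violated

@dataclass
class GuaranteeTerm:
    name: str
    slo: str                       # the service level objective
    business_value: float          # importance of meeting the SLO
    penalty: Optional[Penalty] = None

@dataclass
class Agreement:
    job_description: str           # e.g. a JSDL document
    terms: list = field(default_factory=list)

sla = Agreement(
    job_description="<jsdl:JobDefinition>...</jsdl:JobDefinition>",
    terms=[GuaranteeTerm(
        name="start-time",
        slo="job.start <= 2006-07-25T14:00Z",
        business_value=100.0,
        penalty=Penalty("per violation", 25.0))],
)
print(sla.terms[0].slo)
```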
Open Grid Forum Standardisation Activities (2)
Current diagram focuses on the creation of jobs
[Diagram: the OGSA Execution Management Services around a Job Manager - Execution Planning Services, a Candidate Set Generator (work-to-resource mapping), Information Services, Accounting Services, a Service Container and a Data Container, Provisioning (deployment, configuration), and Reservation.]
o Execution Management Services (OGSA-EMS-WG) (focusing on creation of jobs)
o Subset: Basic Execution Service Working Group (OGSA-BES-WG)
Resource Pre-selection
o Resource pre-selection is necessary to reduce the number of resources/service providers to negotiate with
o The RSS can exploit multiple criteria, e.g.
- User/Application-supplied selection criteria
- Monitoring data from Grid monitoring services
o The Orchestration Service focuses on negotiation, reservation and the resulting SLAs
o Final selection of resources from the set provided by the RSS, based e.g. on
- Availability of resources
- Costs depending on possible reservation times or computing environment
- Costs caused by delay
o Ongoing or planned national/EU projects:
- VIOLA-ISS (pre-selection based on monitoring data of previous application runs)
- PHOSPHORUS / BODEGA (pre-selection based on semantic annotation of applications)
- SAROS (pre-selection based on actual Grid monitoring data)
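A sketch of the pre-selection idea: cut the candidate set down with hard user criteria first, then order what is left by monitoring-derived scores, so the orchestration service negotiates with few, promising providers. All data and field names are invented:

```python
candidates = [
    {"site": "siteA", "cpus": 128, "arch": "x86_64", "past_runtime": 210, "load": 0.4},
    {"site": "siteB", "cpus": 16,  "arch": "x86_64", "past_runtime": 350, "load": 0.1},
    {"site": "siteC", "cpus": 256, "arch": "ppc64",  "past_runtime": 180, "load": 0.7},
]

required = lambda r: r["cpus"] >= 32 and r["arch"] == "x86_64"   # user criteria
score = lambda r: (r["past_runtime"], r["load"])                 # monitoring data

shortlist = sorted(filter(required, candidates), key=score)
print([r["site"] for r in shortlist])   # negotiate with these, in this order
```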
VIOLA MetaScheduling Service
o Developed in a German project for the evaluation of the next generation of NREN
o Focus on co-allocation and support for MPI applications:
- Compute resources (nodes of different geographically dispersed clusters)
- End-to-end network bandwidth between cluster nodes
o Implements WS-Agreement for SLAs
o The negotiation protocol will be incorporated into an OGF draft (WS-Negotiation, extending the WS-Agreement protocol)
Allocation Agreement Protocol
MetaScheduler - Integration of local Schedulers
• Negotiation of timeslot & nodes with the local schedulers for each job
• UNICORE initiates the reservation and submits the job data
• UNICORE Client / MetaScheduler Service interface uses the WS-Agreement protocol
• Interface MetaScheduler / Adapters based on HTTPS/XML (SOAP)
• Interface between MetaScheduler Service and local RMS implemented with the adapter pattern (see the sketch below)
• Authentication and communication between Adapter and local Scheduler via ssh
[Diagram: the MetaScheduler talks HTTPS/XML to an adapter at each of Site 1 ... Site n and to a Network RMS adapter driving a switch/router; each site adapter fronts the local scheduler that receives its partial job; UNICORE exchanges WS-Agreement messages with the MetaScheduler and submits the job data to the sites over HTTPS.]
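The adapter pattern mentioned above, sketched in Python: the MetaScheduler speaks one interface, and each local RMS gets a thin adapter translating it. Method names and the ssh/PBS commands are illustrative, not the VIOLA implementation:

```python
from abc import ABC, abstractmethod

class SchedulerAdapter(ABC):
    @abstractmethod
    def reserve(self, start, duration, nodes): ...   # returns a reservation id
    @abstractmethod
    def submit(self, reservation_id, job_script): ...

class PBSAdapter(SchedulerAdapter):
    def reserve(self, start, duration, nodes):
        # would call the local RMS over ssh, e.g. pbs_rsub on PBS Pro
        print(f"ssh site 'pbs_rsub -R {start} -D {duration} -l nodes={nodes}'")
        return "R123"
    def submit(self, reservation_id, job_script):
        print(f"ssh site 'qsub -q {reservation_id} {job_script}'")

adapter: SchedulerAdapter = PBSAdapter()   # the MetaScheduler sees only the ABC
rid = adapter.reserve(start=1000, duration=600, nodes=4)
adapter.submit(rid, "partial_job_1.sh")
```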
Example: What happens with SLA
[Diagram: as before, a UNICORE Client, the Gateways, NJS and TSI of Site A and Site B, and local schedulers with job queues in front of the clusters - but now each local scheduler has an adapter, a MetaScheduler coordinates the sites, and the Network RMS ARGON controls the link usage between them.]
1. The user describes his Job.
2. MetaScheduling Request (WS-Agreement template).
3. Negotiations and Reservations.
4. MetaScheduler Response (WS-Agreement).
5. The Job is passed to the UNICORE System.
6. All components of the Job are started at the point in time agreed upon; at the same time the network connections are switched on.
MetaScheduler – Running Jobs
• UNICORE generates UNICORE Wrapper with Job Data
• Local adapter generates local wrapper for the MetaScheduler and for the execution of the UNICORE Job
• Local Adapter submits Job with MetaScheduler Wrapper
• Local Scheduler generates Wrapper for the Execution of the MetaScheduler Wrapper
[Diagram: at the local site, UNICORE submits the job data and generates the UNICORE Wrapper; on the request for MetaScheduling, the adapter generates the MetaScheduler Wrapper and makes the reservation; the local scheduler generates the Local Wrapper, whose execution runs the MetaScheduler Wrapper, which in turn runs the UNICORE Wrapper.]
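The chain of generated wrappers can be made concrete with a few lines that write the three scripts; the file names, shell contents and the notification hook are invented for illustration:

```python
unicore_wrapper = "#!/bin/sh\nexec ./unicore_job.sh   # runs the actual job\n"
ms_wrapper = ("#!/bin/sh\n"
              "notify_metascheduler started   # hypothetical reporting hook\n"
              "exec sh unicore_wrapper.sh\n")
local_wrapper = ("#!/bin/sh\n"
                 "# generated by the local scheduler for its own conventions\n"
                 "exec sh ms_wrapper.sh\n")

for name, body in [("unicore_wrapper.sh", unicore_wrapper),
                   ("ms_wrapper.sh", ms_wrapper),
                   ("local_wrapper.sh", local_wrapper)]:
    with open(name, "w") as script:
        script.write(body)              # each layer wraps the next
print("qsub local_wrapper.sh")          # what the adapter would submit
```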
Network Resource Management System
[Diagram: the MetaScheduler coordinates the local schedulers and the Network Resource Manager; the Network Resource Manager configures the routers that connect the networks of Site A, Site B, ..., Site n, e.g. with 1 GB/s and 2 GB/s site-to-site links.]
1.) Reservation of required Resources
• Submission of a reservation with QoS specification to the Network Resource Manager
• Acknowledgement of the reservation
2.) Bind of IP-Addresses at run-time
• IP-Addresses are published at run-time of the job through the local Adapter
• Bind of the IP-Addresses by the Network Resource Manager
• Without an explicit Bind, the QoS parameters for the Site-to-Site interconnection are used
Network Resource Manager – Supported Features
[Chart: available bandwidth over time; reservations occupy slots between the current time and the end of the book-ahead timeframe.]
• Immediate Reservation / Advance Reservation
• Reservation: within the "book-ahead" timeframe (i.e. the timeframe in which the system manages future reservations)
• Class: determines the QoS level
• Network User: id of the user the QoS shall be guaranteed for
• Data for a Reservation:
- Job ID, start time, duration, network user
- List of 3-tuples {start-/endpoint, class}
Network Resource Manager Application Interface
[Diagram: the Network Resource Manager exposes the operations ResourceAvailableAt, Submit, Cancel, Status and Bind.]
Necessary Functions:
• ResourceAvailableAt (preview)
- Returns time slots when a resource (end-to-end connection with QoS level) will be available
• Submit
- Start time, duration, class, start-/end-point (site), user
- Returns a Resource Identifier (RESID)
• Cancel <RESID>
- The Resource Manager frees the resources attached to the Resource Identifier (RESID)
• Status <RESID>
- Returns the state of a connection (submitted, active, released, class, start time, end time, user, etc.)
• Bind <RESID>
- Binding of the IP-Addresses of nodes
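The five functions as a Python sketch; the signatures follow the slide, while the id scheme, state handling and the stubbed availability preview are invented:

```python
import itertools

class NetworkResourceManager:
    _ids = itertools.count(1)
    def __init__(self):
        self.reservations = {}
    def resource_available_at(self, endpoints, cls, duration):
        return [(0, duration)]          # preview of free time slots (stubbed)
    def submit(self, start, duration, cls, endpoints, user):
        resid = f"RES{next(self._ids)}"
        self.reservations[resid] = {"state": "submitted", "start": start,
                                    "end": start + duration, "class": cls,
                                    "endpoints": endpoints, "user": user}
        return resid                    # the Resource Identifier (RESID)
    def cancel(self, resid):
        self.reservations.pop(resid)    # frees the attached resources
    def status(self, resid):
        return self.reservations[resid]
    def bind(self, resid, ip_addresses):
        # without an explicit bind, the site-to-site QoS defaults apply
        self.reservations[resid]["bound_ips"] = ip_addresses
        self.reservations[resid]["state"] = "active"

nrm = NetworkResourceManager()
rid = nrm.submit(start=100, duration=60, cls="gold",
                 endpoints=[("siteA", "siteB")], user="u123")
nrm.bind(rid, ["10.0.0.5", "10.0.1.7"])     # IPs published at job run-time
print(nrm.status(rid)["state"])             # -> active
```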
MetaScheduler – UNICORE Client Interface
What Next?
Grid - SOA convergence: supporting resources as services
Composition of small services needs agile and lightweight orchestration services
Timing problems with dynamic apps, e.g. when doing parallel I/O on demand
Currently high latency
Full support for workflows - based on which description language?
Semantic support - GSO: a scheduling ontology for automatic determination of scheduler capabilities and selection of an appropriate one
Acknowledgements
Some of the work presented in this lecture is funded
by the German Federal Ministry of Education and
Research through the VIOLA project under grant
#01AK605F. This presentation also includes work
carried out jointly within the CoreGRID Network of
Excellence funded by the European Commission’s IST
programme under grant #004265.