Enabling Grids for E-sciencE
www.eu-egee.org

Workload Management System on gLite middleware
Matthieu Reichstadt, CNRS/IN2P3
ACGRID School, Hanoi (Vietnam), November 5th, 2007
Credits: Valeria Ardizzone and other EGEE colleagues…


ACGrid School 5-9/11 2007


Outline

• Overview of WMS
  – Architecture: Task Queue, Information Supermarket, MatchMaker, Scheduling Policies, Job Submission Service, Job Logging & Bookkeeping
• Job Description Language
  – Overview
  – Basic attributes
  – Advanced attributes
• Practice
  – Command line
  – Exercises


Workload Management System (WMS)

• Is the gLite3 component that allows users to submit jobs.

• Performs all tasks required to execute jobs.

• Comprises a set of Grid middleware components responsible for distribution and management of tasks across Grid resources.

• Hides the complexity of the Grid from the user.

WMS’s Architecture

[Figure: WMS architecture diagram, annotated step by step with the callouts below]

• Job management requests (submission, cancellation) are expressed via a Job Description Language (JDL).
• Finds an appropriate CE for each submission request, taking into account job requests and preferences, Grid status, and utilization policies on resources.
• Repository of resource information available to the matchmaker; updated via notifications and/or active polling on resources.
• Keeps submission requests; requests are kept for a while if no resources are immediately available.
• Performs the actual job submission and monitoring.


WMS Components (1)

• Network Server (NS) / WMProxy
  – Accepts incoming requests from the UI (job submission, job removal)
  – If valid, passes them to the Workload Manager


WMS Components (2)

• Workload Manager (WM)
  – Core component of the WMS
  – Takes appropriate actions to satisfy requests
  – Resource Broker (MatchMaker), RB
      Finds the resources that best match the request
  – Information SuperMarket (ISM)
      Repository of resource information available in read-only mode to the RB
  – Task Queue
      Makes it possible to keep a request if no resources are immediately available
      A request that does not match will be retried periodically (eager scheduling)
      Or it waits for notification of available resources (lazy scheduling)

WMS Components (3)

• Eager scheduling (“push” model): a job is bound to a resource as soon as possible. Once the decision has been taken, the job is passed to the selected resource for execution.

• Lazy scheduling (“pull” model): the job is held by the WM until a resource becomes available. When this happens, the resource is matched against the submitted job.
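
The difference between the two models can be sketched in a few lines of Python. This is only an illustration, not WMS code: the function names, the dictionary layout, and the "free CPUs" matching test are all assumptions made for the example.

```python
from collections import deque


def matches(job, resource):
    """Hypothetical match test: the resource offers enough free CPUs."""
    return resource["free_cpus"] >= job["cpus"]


def eager_schedule(job, resources):
    """Push model: bind the job to a matching resource as soon as possible.

    Returns the chosen resource name, or None so that the caller can keep
    the request in the task queue and retry later.
    """
    for resource in resources:
        if matches(job, resource):
            return resource["name"]
    return None


def lazy_schedule(task_queue, resource):
    """Pull model: a resource became available and is matched against the
    jobs held by the WM. Returns the id of the first matching job and
    removes that job from the queue, or returns None if no held job fits.
    """
    for job in list(task_queue):
        if matches(job, resource):
            task_queue.remove(job)
            return job["id"]
    return None


# Push: the job is given; we look for a resource.
resources = [{"name": "ce-a", "free_cpus": 1}, {"name": "ce-b", "free_cpus": 4}]
pushed_to = eager_schedule({"id": "job-1", "cpus": 2}, resources)

# Pull: the resource is given; we look for a held job.
held = deque([{"id": "job-2", "cpus": 8}, {"id": "job-3", "cpus": 2}])
pulled = lazy_schedule(held, {"name": "ce-c", "free_cpus": 2})
```

In the push case the loop runs over resources for one job; in the pull case it runs over held jobs for one resource, which is the inversion the slide describes.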


WMS Components (4)

WMS components that handle the job during its lifetime and perform the submission:

• Job Adapter (JA)
  – is responsible for making the final touches to the JDL expression for a job before it is passed to CondorC for the actual submission
  – creates the job wrapper script that sets up the appropriate execution environment on the CE worker node
      • transfer of the input and output sandboxes

• CondorC
  – is responsible for performing the actual job management operations
      • job submission, job removal


WMS Components (5)

• Log Monitor (LM)
  – is responsible for watching the CondorC log file and intercepting interesting events concerning active jobs

• Proxy Renewal Service
  – is responsible for ensuring that a valid user proxy exists within the WMS for the whole lifetime of a job
  – the MyProxy Server is contacted in order to renew the user's credential

• Logging & Bookkeeping (LB)
  – is responsible for storing events generated by the various components of the WMS
  – by querying the LB, the user can retrieve information about the job status

Jobs State Machine

• Submitted: the job has been entered by the user on the User Interface but not yet transferred to the Network Server for processing.
• Waiting: the job has been accepted by the NS and is waiting for Workload Manager processing, or is being processed by WMHelper modules.
• Ready: the job has been processed by the WM but not yet transferred to the CE (local batch system queue).
• Scheduled: the job is waiting in the queue on the CE.
• Running: the job is running on a Worker Node.
• Done: the job has exited or is considered to be in a terminal state by CondorC (e.g., submission to the CE has failed in an unrecoverable way).
• Aborted: job processing was aborted by the WMS (waiting in the WM queue or on the CE for too long, expiration of user credentials).
• Cancelled: the job has been successfully cancelled on user request.
• Cleared: the output sandbox was transferred to the user or removed due to the timeout.
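
When writing scripts that follow a job, it can help to have the nine states as named constants. The sketch below is an assumption-laden illustration: the state names come from the slides, but the "normal path" ordering is inferred from the state descriptions (the original state diagram is not reproduced here), and `JobState`, `NORMAL_PATH` and `next_state` are hypothetical names.

```python
from enum import Enum


class JobState(Enum):
    SUBMITTED = "Submitted"
    WAITING = "Waiting"
    READY = "Ready"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    DONE = "Done"
    ABORTED = "Aborted"
    CANCELLED = "Cancelled"
    CLEARED = "Cleared"


# Assumed order for a job that completes without problems.
NORMAL_PATH = [
    JobState.SUBMITTED,
    JobState.WAITING,
    JobState.READY,
    JobState.SCHEDULED,
    JobState.RUNNING,
    JobState.DONE,
    JobState.CLEARED,
]


def next_state(state):
    """Next state on the assumed normal path.

    Returns None at the end of the path, and also for states that are not
    on it (Aborted, Cancelled).
    """
    if state not in NORMAL_PATH:
        return None
    index = NORMAL_PATH.index(state)
    if index + 1 < len(NORMAL_PATH):
        return NORMAL_PATH[index + 1]
    return None
```

For example, `next_state(JobState.RUNNING)` gives `JobState.DONE`, while Aborted and Cancelled have no successor in this simplified view.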


Job Description Language


The JDL language

• The Job Description Language (JDL) describes jobs for execution on the Grid.

• The JDL adopted within the gLite middleware is based upon Condor’s CLASSified ADvertisement language (ClassAd).

• A ClassAd is a record-like structure composed of a finite number of attributes separated by semicolons (;).

• A ClassAd is highly flexible and can be used to represent arbitrary services.

• The JDL file is processed by the match-making process to select the best resource that satisfies the job’s requirements.

The JDL language

JDL file lines have the format:

    Attribute = expression;

There are 2 categories of attributes:

1. Job attributes define the job itself

2. Resource attributes indicate the job constraints in terms of:

• Computing resources

• Data and storage resources

Comments are indicated by # or //. The JDL is sensitive to blank characters and tabs: no blank characters or tabs should follow the semicolon at the end of a line.
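
Trailing blanks after a semicolon are easy to miss by eye, so a small check before submission can save a failed attempt. The following is a minimal sketch, assuming the JDL text is available as a Python string; `check_jdl_lines` is a hypothetical helper and not part of gLite.

```python
def check_jdl_lines(text):
    """Return (line_number, message) pairs for lines that break the rule
    about blanks or tabs after the closing semicolon."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Blank lines and comment lines (# or //) are skipped.
        if not stripped or stripped.startswith(("#", "//")):
            continue
        # Flag a line whose semicolon is followed by blanks or tabs.
        if line != line.rstrip() and line.rstrip().endswith(";"):
            problems.append((number, "whitespace after semicolon"))
    return problems


sample = (
    'Executable = "test.sh";\n'
    "# a comment\n"
    'StdOutput = "std.out"; \t\n'
    "// another comment\n"
)
issues = check_jdl_lines(sample)
```

Here only the third line of `sample` is reported, because a space and a tab follow its semicolon.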

JDL : basic attributes

• In a JDL, some attributes are mandatory while others are optional.

• An “essential” JDL is the following:

    [
    Executable = "test.sh";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh"};
    OutputSandbox = {"std.out","std.err"};
    ]

If needed, arguments to the executable can be passed:

    Arguments = "arguments list";
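
JDL files are often generated by scripts when many similar jobs are submitted. As a hedged sketch, the hypothetical helper `render_jdl` below turns a Python dictionary into the text of the essential JDL. It only handles string values and lists of strings; attributes whose value is an expression (for example Requirements or Rank) would need different handling and are left out on purpose.

```python
def render_jdl(attributes):
    """Render a dict of attribute name -> str or list of str as JDL text."""
    lines = ["["]
    for name, value in attributes.items():
        if isinstance(value, (list, tuple)):
            # Lists become {"a","b"} with each item in straight double quotes.
            rendered = "{" + ",".join(f'"{item}"' for item in value) + "}"
        else:
            rendered = f'"{value}"'
        # No blank or tab is emitted after the closing semicolon.
        lines.append(f"  {name} = {rendered};")
    lines.append("]")
    return "\n".join(lines)


essential = {
    "Executable": "test.sh",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "InputSandbox": ["test.sh"],
    "OutputSandbox": ["std.out", "std.err"],
}
jdl_text = render_jdl(essential)
```

Note that the generated text uses plain ASCII double quotes; typographic quotes pasted from a slide or a word processor are a common source of JDL parse errors.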

JDL : basic attributes

Executable = < string > (mandatory)

• represents the executable/command name

• you can specify an executable that:

  – already exists on the remote WN

  – will be copied from the UI to the WN

• the arguments are given in a separate attribute (Arguments)

    Executable = "test.sh";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh"};
    OutputSandbox = {"std.out","std.err"};

JDL : basic attributes

Arguments = < string > (optional)

• arguments for the executable file, for example:

    "-out outputfile.dat"

  with: Executable = "execprog";

  on the Worker Node (WN) we will have:

    $ execprog -out outputfile.dat

• the character " should be preceded by \

    " -a \"quoted string\" -bcd" becomes:

    $ execprog -a "quoted string" -bcd

• special characters (&, |, >, <) should be preceded by a triple \ :

    Arguments = "-f file1\\\&file2";
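
The two escaping rules above are mechanical, so they can be applied by a short function. This sketch implements exactly the rules as stated on the slide and nothing more; `escape_jdl_arguments` and `arguments_attribute` are hypothetical names, and input that already contains backslashes is not handled.

```python
def escape_jdl_arguments(arguments):
    """Apply the two escaping rules from the slide to a raw argument string."""
    escaped = []
    for char in arguments:
        if char == '"':
            # A double quote is preceded by one backslash.
            escaped.append('\\"')
        elif char in "&|<>":
            # A special character is preceded by three backslashes.
            escaped.append("\\\\\\" + char)
        else:
            escaped.append(char)
    return "".join(escaped)


def arguments_attribute(arguments):
    """Build the full Arguments line for a JDL file."""
    return 'Arguments = "' + escape_jdl_arguments(arguments) + '";'


line_with_ampersand = arguments_attribute("-f file1&file2")
line_with_quotes = arguments_attribute('-a "quoted string" -bcd')
```

The first call reproduces the `-f file1\\\&file2` example from the slide.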

JDL : basic attributes

StdOutput, StdError, StdInput = < string > (optional)

• paths of the output / error / input files

• StdOutput and StdError:

  – must also be listed in the OutputSandbox

  – can have the same value

    Executable = "test.sh";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh"};
    OutputSandbox = {"std.out","std.err"};

JDL : basic attributes

InputSandbox = < string | string list > (optional)

• contains the input files to be copied from the UI to the WN before the job execution

• only local UI files (for LFNs use the InputData attribute)

• each file must not exceed 10 MB

• the files must have different names (the destination directory is the same for all of them)

    Executable = "test.sh";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh"};
    OutputSandbox = {"std.out","std.err"};
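
The InputSandbox constraints can be verified on the UI before submitting. Below is a minimal sketch under stated assumptions: `check_input_sandbox` is a hypothetical helper, and "10 MB" is read as 10 × 2^20 bytes, which the slide does not specify.

```python
import os
import tempfile

# Assumption: 10 MB is interpreted as 10 * 2**20 bytes.
MAX_SANDBOX_FILE_BYTES = 10 * 1024 * 1024


def check_input_sandbox(paths):
    """Return a list of problems found in the files meant for InputSandbox."""
    problems = []
    seen_names = set()
    for path in paths:
        name = os.path.basename(path)
        # All files land in the same directory on the WN, so base names
        # must be unique.
        if name in seen_names:
            problems.append(f"duplicate file name: {name}")
        seen_names.add(name)
        if not os.path.isfile(path):
            problems.append(f"not a local file: {path}")
        elif os.path.getsize(path) > MAX_SANDBOX_FILE_BYTES:
            problems.append(f"larger than 10 MB: {path}")
    return problems


# Usage with two throwaway files that share the same base name.
with tempfile.TemporaryDirectory() as top:
    first = os.path.join(top, "a", "test.sh")
    second = os.path.join(top, "b", "test.sh")
    for path in (first, second):
        os.makedirs(os.path.dirname(path))
        with open(path, "w") as handle:
            handle.write("#!/bin/sh\necho hello\n")
    sandbox_report = check_input_sandbox([first, second])
```

The two files are small and exist locally, so the only problem reported is the name clash.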

JDL : basic attributes

OutputSandbox = < string | string list >

• contains the output files to be transferred from the WN to the UI after the job execution

• the files must have different names (the destination directory is the same for all of them)

    Executable = "test.sh";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh"};
    OutputSandbox = {"std.out","std.err"};

JDL : advanced attributes

Requirements (mandatory)

• Job requirements on Grid resources (CE, SE, …)

• Evaluation performed by the MatchMaker

• Specified using attributes published by the Information Service

• If not specified, the default value is:

    Requirements = other.GlueCEStateStatus == "Production";

Examples:

    Requirements = other.GlueCEUniqueID == "clrlcgce01.in2p3.fr:2119/jobmanager-lcgpbs-auvergrid";

    Requirements = Member("AUVERGRID-3.07.01", other.GlueHostApplicationSoftwareRunTimeEnvironment);

    Requirements = other.GlueCEInfoTotalCPUs > 2 && other.GlueCEPolicyMaxRunningJobs < 2;
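
Conceptually, a Requirements expression is a boolean test evaluated once per CE, with `other.` referring to the attributes that the CE publishes. The Python sketch below mimics this filtering step. It is an illustration only: the CE records are made up, and `member`, `default_requirements`, `software_requirements` and `match` are hypothetical helpers, not the MatchMaker implementation.

```python
def member(value, values):
    """Counterpart of the JDL Member() call: True if value is in the list."""
    return value in values


def default_requirements(ce):
    """Mirrors: other.GlueCEStateStatus == "Production"."""
    return ce["GlueCEStateStatus"] == "Production"


def software_requirements(ce):
    """Mirrors the Member(...) example with the AUVERGRID-3.07.01 tag."""
    return member(
        "AUVERGRID-3.07.01",
        ce["GlueHostApplicationSoftwareRunTimeEnvironment"],
    )


def match(ces, requirements):
    """Keep the identifiers of the CEs for which the requirements hold."""
    return [ce["GlueCEUniqueID"] for ce in ces if requirements(ce)]


# Made-up CE records using the Glue attribute names from the slide.
ces = [
    {
        "GlueCEUniqueID": "ce-one.example.org:2119/jobmanager-pbs-short",
        "GlueCEStateStatus": "Production",
        "GlueHostApplicationSoftwareRunTimeEnvironment": ["AUVERGRID-3.07.01"],
    },
    {
        "GlueCEUniqueID": "ce-two.example.org:2119/jobmanager-pbs-short",
        "GlueCEStateStatus": "Draining",
        "GlueHostApplicationSoftwareRunTimeEnvironment": ["AUVERGRID-3.07.01"],
    },
    {
        "GlueCEUniqueID": "ce-three.example.org:2119/jobmanager-pbs-long",
        "GlueCEStateStatus": "Production",
        "GlueHostApplicationSoftwareRunTimeEnvironment": [],
    },
]

in_production = match(ces, default_requirements)
with_software = match(ces, software_requirements)
```

Only the CEs that pass this filter are then compared with the Rank expression.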

JDL : advanced attributes

Rank (mandatory)

• Floating-point expression used to rank CEs that have already met the Requirements expression.

• Can contain attributes that describe the CE in the Information System (IS).

• Evaluation performed by the Resource Broker (RB) during the match-making phase.

• A higher numeric value means a better rank.

• If not specified, the default value is:

    Rank = -other.GlueCEStateEstimatedResponseTime;

E.g.: Rank = other.GlueCEStateFreeCPUs;
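
Since a higher Rank value is better, the default expression (the negated estimated response time) favours the CE expected to answer soonest. The sketch below illustrates the selection with made-up CE records; `default_rank`, `free_cpus_rank` and `best_ce` are hypothetical names, and how the real Resource Broker breaks ties is not covered here (in this sketch the first CE with the highest rank wins).

```python
def default_rank(ce):
    """Mirrors: Rank = -other.GlueCEStateEstimatedResponseTime."""
    return -ce["GlueCEStateEstimatedResponseTime"]


def free_cpus_rank(ce):
    """Mirrors: Rank = other.GlueCEStateFreeCPUs."""
    return ce["GlueCEStateFreeCPUs"]


def best_ce(ces, rank):
    """Return the identifier of the CE with the highest rank value."""
    return max(ces, key=rank)["GlueCEUniqueID"]


# Made-up CEs that are assumed to have already passed Requirements.
matched = [
    {"GlueCEUniqueID": "ce-one", "GlueCEStateEstimatedResponseTime": 600, "GlueCEStateFreeCPUs": 2},
    {"GlueCEUniqueID": "ce-two", "GlueCEStateEstimatedResponseTime": 120, "GlueCEStateFreeCPUs": 1},
    {"GlueCEUniqueID": "ce-three", "GlueCEStateEstimatedResponseTime": 3000, "GlueCEStateFreeCPUs": 16},
]

by_response_time = best_ce(matched, default_rank)
by_free_cpus = best_ce(matched, free_cpus_rank)
```

The two Rank expressions pick different CEs from the same candidates, which is why the choice of Rank matters for short jobs versus wide ones.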

JDL : advanced attributes

Environment = < string | string list > (optional)

• environment variables

• string format: < variable name > = < string >

• example:

    Environment = { "JOB_LOG_FILE=/tmp/job.log",
                    "INP_DIR=/tmp/input_files" };
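
Each entry of the Environment list is a single "name=value" string. As a small illustration of that format, the hypothetical helper below splits such entries into a dictionary, cutting at the first "=" only so that values may themselves contain "=".

```python
def parse_environment(entries):
    """Split "NAME=value" strings into a dict, splitting at the first '='."""
    variables = {}
    for entry in entries:
        name, separator, value = entry.partition("=")
        if not separator or not name:
            raise ValueError(f"expected NAME=value, got {entry!r}")
        variables[name] = value
    return variables


environment = parse_environment(
    ["JOB_LOG_FILE=/tmp/job.log", "INP_DIR=/tmp/input_files"]
)
```

With the example from the slide this yields two variables, JOB_LOG_FILE and INP_DIR.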

References

• EGEE User Guide: https://edms.cern.ch/file/722398//gLite-3-UserGuide.pdf

• JDL Attributes: https://edms.cern.ch/file/555796/1/EGEE-JRA1-TEC-555796-JDL-Attributes-v0-8.pdf


Thank you