A brief guide to DAGMan
A Guide to the DAGMan (7.0) “Specification”
Information provided by the folks at Condor
WARNING!!! This presentation lacks images
DAGMan
• “DAGMan (Directed Acyclic Graph Manager) is a meta-scheduler for Condor”
• Manages dependencies between compute and data jobs at a high level
What this means to us?
• Provides users a simple way to denote simple dependencies between jobs
An Example
# Filename: aBoringExample.dag
JOB A a.condor
JOB B b.condor
JOB C c.condor
JOB D d.condor
PARENT A CHILD B C
PARENT B C CHILD D

# Filename: a.condor
Executable = foo
# Memory is measured in megabytes
Requirements = Memory >= 32
Error = err.$(Process)
Input = in.$(Process)
Output = out.$(Process)
Log = foo.log
Queue 150
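To make the dependency structure of the example concrete, here is a minimal Python sketch that reads the JOB and PARENT..CHILD lines and derives one valid execution order. It is not how DAGMan itself is implemented; the function name parse_dag is hypothetical, and the sketch only understands the two keywords used in aBoringExample.dag.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

DAG_TEXT = """\
# Filename: aBoringExample.dag
JOB A a.condor
JOB B b.condor
JOB C c.condor
JOB D d.condor
PARENT A CHILD B C
PARENT B C CHILD D
"""


def parse_dag(text):
    """Hypothetical helper: return (jobs, parents).

    jobs maps node name -> submit description file;
    parents maps node name -> set of parent node names.
    Only JOB and PARENT..CHILD lines are handled.
    """
    jobs, parents = {}, {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = tokens[0].upper()
        if keyword == "JOB":
            jobs[tokens[1]] = tokens[2]
            parents.setdefault(tokens[1], set())
        elif keyword == "PARENT":
            split = [t.upper() for t in tokens].index("CHILD")
            # every listed child depends on every listed parent
            for child in tokens[split + 1:]:
                parents.setdefault(child, set()).update(tokens[1:split])
    return jobs, parents


jobs, parents = parse_dag(DAG_TEXT)
# TopologicalSorter takes a mapping of node -> predecessors
order = list(TopologicalSorter(parents).static_order())
print(order)
```

A always comes first and D always comes last; B and C may appear in either order, since nothing in the DAG relates them to each other.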
Nodes
• A node is composed of
  – A “cluster” of compute or data jobs defined by one Condor or Stork description file, respectively
    • A cluster is a group of executions defined by one queue command (e.g. 150 instances of the same program)
  – (optionally) associated pre or post scripts
• Only one cluster can be defined per submit file for use with DAGMan
Directed Links
• Simple Dependencies
  – Tells Condor that child nodes cannot be executed until all of their parents have been executed
• No complex relationships / dependencies can be given to DAGMan
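The parent/child rule can be sketched in a few lines of Python. This is only an illustration of the rule, assuming the four-node diamond from the earlier example; the function name ready_nodes and the "wave" loop are hypothetical and say nothing about how DAGMan actually schedules work.

```python
def ready_nodes(parents, done):
    """Return the nodes that have not run yet and whose parents have all run."""
    return sorted(
        node for node, ps in parents.items()
        if node not in done and ps <= done
    )


# node -> set of parents (the diamond from aBoringExample.dag)
parents = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

done = set()
waves = []
while len(done) < len(parents):
    wave = ready_nodes(parents, done)
    if not wave:
        # nothing is runnable but work remains: the graph is not acyclic
        raise ValueError("cycle detected: no node is ready")
    waves.append(wave)
    done.update(wave)

print(waves)  # [['A'], ['B', 'C'], ['D']]
```

B and C land in the same wave because each depends only on A, so they can run at the same time.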
Specification (the basics)
JOB / DATA
  {JOB | DATA} jobName jobDescFile.condor [DONE] [DIR WD]
SCRIPT
  SCRIPT {PRE|POST} jobName scriptName.sh [arguments]
PARENT..CHILD
  PARENT p1 [p2 …] CHILD c1 [c2 …]
RETRY
  RETRY jobName numRetries [UNLESS-EXIT value]
Others: priority, category, vars, maxjobs, abort-dag-on, config (see documentation or feel free to ask)
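As a rough illustration of how these keywords fit together, the Python sketch below assembles a small DAG file from the JOB, SCRIPT, PARENT..CHILD, and RETRY forms listed above. The helper dag_lines and the script names stage_in.sh and cleanup.sh are made up for this example; the optional DONE, DIR, and UNLESS-EXIT parts are left out.

```python
def dag_lines(jobs, edges, scripts=(), retries=None):
    """Hypothetical helper: build the lines of a DAG file.

    jobs:    dict of node name -> submit description file
    edges:   list of (parent names, child names) pairs
    scripts: list of (PRE or POST, node name, script) triples
    retries: dict of node name -> number of retries
    """
    lines = [f"JOB {name} {submit}" for name, submit in jobs.items()]
    lines += [f"SCRIPT {when} {name} {script}" for when, name, script in scripts]
    lines += [f"PARENT {' '.join(ps)} CHILD {' '.join(cs)}" for ps, cs in edges]
    lines += [f"RETRY {name} {count}" for name, count in (retries or {}).items()]
    return lines


lines = dag_lines(
    jobs={"A": "a.condor", "B": "b.condor"},
    edges=[(["A"], ["B"])],
    scripts=[("PRE", "A", "stage_in.sh"), ("POST", "B", "cleanup.sh")],  # made-up scripts
    retries={"B": 3},
)
print("\n".join(lines))
```

With these inputs the sketch prints:

```
JOB A a.condor
JOB B b.condor
SCRIPT PRE A stage_in.sh
SCRIPT POST B cleanup.sh
PARENT A CHILD B
RETRY B 3
```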
Other Features
• When a DAG is submitted, a submit description file is produced
  – Optionally use this file to build a hierarchy of DAGs (DAGs within DAGs)
• Can monitor progress by watching myFile.dag.dagman.out
• Job Recovery
  – On failure, DAGMan produces a new “rescue” DAG
  – Can be used to restart the DAG at the nodes where the failure occurred
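One way to picture the recovery idea is with the optional DONE keyword from the JOB syntax: a copy of the DAG in which finished nodes are marked DONE, so a resubmission picks up at the nodes that still need to run. The Python sketch below is only an illustration under that assumption. The function mark_done is hypothetical, and the file DAGMan actually writes may differ in format.

```python
def mark_done(dag_text, finished):
    """Hypothetical helper: append DONE to the JOB/DATA line of each finished node.

    Illustrates the idea only; this is not DAGMan's own recovery code.
    """
    out = []
    for line in dag_text.splitlines():
        tokens = line.split()
        is_node = len(tokens) >= 3 and tokens[0].upper() in ("JOB", "DATA")
        if is_node and tokens[1] in finished and tokens[-1].upper() != "DONE":
            line = line.rstrip() + " DONE"
        out.append(line)
    return "\n".join(out) + "\n"


original = """\
JOB A a.condor
JOB B b.condor
JOB C c.condor
JOB D d.condor
PARENT A CHILD B C
PARENT B C CHILD D
"""

# Suppose A and B finished, and C failed before D could start.
rescued = mark_done(original, finished={"A", "B"})
print(rescued)
```

Here only the JOB lines for A and B gain a trailing DONE; C and D are left untouched and would run again on resubmission.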