A Guide to DAGMan

Page 1: A Guide to DAGMan

A Guide to the DAGMan (7.0) “Specification”

Information provided by the folks at Condor

WARNING!!! This presentation lacks images

Page 2: A Guide to DAGMan

DAGMan

• “DAGMan (Directed Acyclic Graph Manager) is a meta-scheduler for Condor”

• Manages dependencies between compute and data jobs at a high level

What does this mean to us?

• Provides users a simple way to denote simple dependencies between jobs

Page 3: A Guide to DAGMan

An Example

# Filename: aBoringExample.dag
JOB A a.condor
JOB B b.condor
JOB C c.condor
JOB D d.condor
PARENT A CHILD B C
PARENT B C CHILD D

# Filename: a.condor
Executable = foo
# Memory is expressed in megabytes
Requirements = Memory >= 32
Error = err.$(Process)
Input = in.$(Process)
Output = out.$(Process)
Log = foo.log
Queue 150
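
As a rough illustration of what "Queue 150" does to the $(Process) macro in a.condor, the Python sketch below lists the file names the cluster would use. The helper name expand_process_macro is made up for this example and is not part of Condor; the sketch relies on process numbers starting at 0 within a cluster.

```python
def expand_process_macro(template: str, queue_count: int) -> list[str]:
    """Return one file name per queued process (hypothetical helper).

    Each of the queue_count processes gets its own number, starting at 0.
    """
    return [template.replace("$(Process)", str(i)) for i in range(queue_count)]


# File names for the Input line of a.condor with "Queue 150".
inputs = expand_process_macro("in.$(Process)", 150)
print(inputs[0], inputs[-1], len(inputs))
```

So node A alone stands for 150 executions of foo, each with its own input, output, and error file.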

Page 4: A Guide to DAGMan

Nodes

• A node is composed of
  – A “cluster” of compute or data jobs defined by one Condor or Stork description file, respectively
    • A cluster is a group of executions defined by one queue command (e.g. 150 instances of the same program)
  – (Optionally) associated pre or post scripts
• Only one cluster can be defined per submit file for use with DAGMan

Page 5: A Guide to DAGMan

Directed Links

• Simple Dependencies
  – Tell Condor that child nodes cannot be executed until all of their parents have completed successfully
• No complex relationships / dependencies can be given to DAGMan
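
To make the ordering rule concrete, here is a minimal Python sketch that groups the nodes of aBoringExample.dag into "waves" that could run at the same time. It uses the standard-library graphlib module (Python 3.9+) and only illustrates the parent/child rule; it is not how DAGMan itself is implemented, and execution_waves is a name invented for this example.

```python
from graphlib import TopologicalSorter

# node -> set of parents, taken from aBoringExample.dag
parents = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}


def execution_waves(parent_map):
    """Group nodes into waves; a node is ready once all its parents are done."""
    sorter = TopologicalSorter(parent_map)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend every node in this wave succeeded
    return waves


waves = execution_waves(parents)
print(waves)  # [['A'], ['B', 'C'], ['D']]
```

B and C share a wave because neither depends on the other; D has to wait for both.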

Page 6: A Guide to DAGMan

Specification (the basics)

JOB / DATA
  {JOB | DATA} jobName jobDescFile.condor [DIR WD] [DONE]

SCRIPT
  SCRIPT {PRE | POST} jobName scriptName.sh [arguments]

PARENT..CHILD
  PARENT p1 [p2 …] CHILD c1 [c2 …]

RETRY
  RETRY jobName numRetries [UNLESS-EXIT value]

Others: PRIORITY, CATEGORY, VARS, MAXJOBS, ABORT-DAG-ON, CONFIG (see the documentation or feel free to ask)
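
A small Python sketch of how the basic lines above could be read into data structures may help when checking a DAG file by hand. This is a toy reader written for this guide, not DAGMan's parser: parse_dag is a hypothetical name, only JOB / DATA, PARENT..CHILD, and RETRY are handled, and keywords are assumed to be written in upper case as on the slides.

```python
def parse_dag(text):
    """Toy reader for JOB/DATA, PARENT..CHILD and RETRY lines (illustrative only)."""
    jobs = {}      # jobName -> description file
    parents = {}   # jobName -> set of parent jobNames
    retries = {}   # jobName -> number of retries
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = tokens[0]
        if keyword in ("JOB", "DATA"):
            jobs[tokens[1]] = tokens[2]
            parents.setdefault(tokens[1], set())
        elif keyword == "PARENT":
            split = tokens.index("CHILD")
            for child in tokens[split + 1:]:
                parents.setdefault(child, set()).update(tokens[1:split])
        elif keyword == "RETRY":
            retries[tokens[1]] = int(tokens[2])
        # other keywords (SCRIPT, VARS, ...) are ignored in this sketch
    return jobs, parents, retries


example = """\
# Filename: aBoringExample.dag
JOB A a.condor
JOB B b.condor
JOB C c.condor
JOB D d.condor
PARENT A CHILD B C
PARENT B C CHILD D
RETRY D 3
"""

jobs, parents, retries = parse_dag(example)
print(jobs["A"], sorted(parents["D"]), retries)
```

The RETRY line is added to the example only to exercise that branch; it is not in the original aBoringExample.dag.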

Page 7: A Guide to DAGMan

Other Features

• When a DAG is submitted, a submit description file for DAGMan itself is produced
  – Optionally use this file to build a hierarchy of DAGs (DAGs within DAGs)
• Progress can be monitored by watching myFile.dag.dagman.out
• Job Recovery
  – On failure, DAGMan produces a new “rescue” DAG
  – It can be used to restart the DAG at the nodes where the failure occurred
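
The idea behind restarting at the failed nodes can be sketched in Python. The sketch assumes that the restart file is the original DAG with the already finished nodes flagged DONE (the optional flag from the JOB / DATA line), so that a resubmission skips them. The function mark_done is a made-up helper; DAGMan writes this file itself, and this only mimics the result.

```python
def mark_done(dag_text, finished):
    """Append DONE to the JOB/DATA lines of nodes that already finished."""
    out = []
    for line in dag_text.splitlines():
        tokens = line.split()
        is_node = len(tokens) >= 3 and tokens[0] in ("JOB", "DATA")
        if is_node and tokens[1] in finished and tokens[-1] != "DONE":
            line = line.rstrip() + " DONE"
        out.append(line)
    return "\n".join(out)


dag = """\
JOB A a.condor
JOB B b.condor
JOB C c.condor
JOB D d.condor
PARENT A CHILD B C
PARENT B C CHILD D"""

# Suppose A and B succeeded and C failed: only C and D still need to run.
rescued = mark_done(dag, {"A", "B"})
print(rescued)
```

With A and B flagged DONE, a resubmission would pick up at C and then run D.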