Transcript
Page 1: Dan Bradley Computer Sciences Department University of Wisconsin-Madison danb@cs.wisc.edu  Schedd On The Side

Dan Bradley
Computer Sciences Department
University of Wisconsin-Madison
danb@cs.wisc.edu

http://www.cs.wisc.edu/condor

Schedd On The Side

Page 2:

What is it?
Specialized scheduler operating on schedd’s jobs.

[Diagram: Schedd and Schedd On The Side attached to one job queue holding Job 1, Job 2, Job 3, Job 4, Job 5, …, Job 4*]

Page 3:

Condor Farm Story

[Diagram: Application, condor_submit, Schedd with its job queue, and Startd resources running RandomSeed jobs]

• Now that this is working, how can I use my collaborator’s resources too?

Page 4:

Option #1: Merge Farms

› Combine machines with collaborator into one Condor resource pool.
  o Everything works just like it did before.
  o Excellent option for small to medium clusters.
  o Requires bidirectional connectivity to all startds, or equivalent via GCB.
  o Requires some administrative coordination (e.g. upgrades, negotiator policy, security).

Page 5:

Option #2: Flocking Together

[Diagram: Schedd, Local Startds, and Remote Startds, with RandomSeed jobs running at both]

• full featured (std universe etc.)
• automatic matchmaking
• easy to configure

• requires bidirectional connectivity
• both sites must run Condor

Page 6:

Option #3: Grid Universe

[Diagram: Schedd, local Startds (vanilla), and Gatekeeper X in front of site X, with RandomSeed jobs running at both]

• easier to live with private networks
• may use non-Condor resources

• restricted Condor feature set (e.g. no std universe over grid)
• must pre-allocate jobs between vanilla and grid universes

Page 7:

Option #4: Routing Jobs

[Diagram: Schedd with Local Startds (vanilla), and a Schedd On The Side in front of Gatekeepers X, Y, and Z for site X, site Y, and site Z, with RandomSeed jobs running locally and at the sites]

• dynamic allocation of jobs between vanilla and grid universes.
• not every job is appropriate for transformation into a grid job.
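The idea of transforming a vanilla job into a grid job can be pictured with a small toy model. This is only a sketch, not Condor code: the dictionary keys (`id`, `universe`, `grid_resource`, `routed_from`), the `route_job` helper, and the starred job id (borrowed from the "Job 4*" label on the earlier slide) are all hypothetical names chosen for illustration.

```python
from copy import deepcopy


def route_job(job, grid_resource):
    """Return a grid-universe copy of a vanilla job (toy model).

    The original job dictionary is left untouched, so the copy can be
    discarded without affecting the job in the queue.
    All key names here are illustrative assumptions, not Condor attributes.
    """
    if job["universe"] != "vanilla":
        raise ValueError("only vanilla jobs are routed in this sketch")
    routed = deepcopy(job)
    routed["universe"] = "grid"
    routed["grid_resource"] = grid_resource  # hypothetical: which site to use
    routed["routed_from"] = job["id"]        # remember the source job
    routed["id"] = f'{job["id"]}*'           # e.g. "4" becomes "4*"
    return routed


job4 = {"id": "4", "universe": "vanilla", "cmd": "RandomSeed"}
job4_star = route_job(job4, "site X")
```

In this sketch the decision is made per job at routing time, which is the contrast with Option #3, where the split between universes is fixed at submission.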

Page 8:

What About Flow Control?

› May restrict routing to jobs that have been rejected by the negotiator.

› May limit the maximum number of actively routed jobs on a per-site basis.

› May limit the maximum number of idle routed jobs per site.

› Periodic removal of idle routed jobs is possible, but there is no guarantee of optimal rescheduling.

› Routing table may be reconfigured dynamically.

› Multicast? Might be interesting to try.
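The first three controls above can be sketched as a small simulation. This is a toy model under stated assumptions, not the actual scheduler logic: the `Route` class, its field names, the `rejected_by_negotiator` flag, and the first-fit choice of site are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Route:
    """One routing-table entry with per-site limits (hypothetical fields)."""
    name: str
    max_jobs: int        # cap on jobs actively routed to this site
    max_idle_jobs: int   # cap on routed jobs still idle at this site
    routed: int = 0
    idle: int = 0

    def has_room(self):
        return self.routed < self.max_jobs and self.idle < self.max_idle_jobs


def route_rejected_jobs(jobs, routes):
    """Assign negotiator-rejected jobs to the first route that has room."""
    assignments = {}
    for job in jobs:
        if not job["rejected_by_negotiator"]:
            continue  # leave jobs that can still match locally alone
        route = next((r for r in routes if r.has_room()), None)
        if route is None:
            break  # every site is at one of its limits
        route.routed += 1
        route.idle += 1  # a newly routed job starts out idle at the site
        assignments[job["id"]] = route.name
    return assignments


routes = [Route("X", max_jobs=2, max_idle_jobs=1),
          Route("Y", max_jobs=5, max_idle_jobs=2)]
jobs = [{"id": i, "rejected_by_negotiator": i != 3} for i in range(1, 6)]
assignments = route_rejected_jobs(jobs, routes)
```

With these example limits, site X stops accepting jobs after one idle job and site Y after two, so the last job stays in the local queue until an idle routed job starts running or is removed.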

Page 9:

What About I/O?

› Jobs must be sandboxable (i.e. specifying input/output via transfer-files mechanism).

› Routing of standard universe is not supported.

› Additional restrictions may apply, depending on site network and disk.
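The first two restrictions above amount to an eligibility check that could run before a job is considered for routing. The following is a minimal sketch, assuming a job is described by a plain dictionary; the keys `universe` and `uses_file_transfer` and the function name are hypothetical, and site-specific network or disk restrictions are not modeled.

```python
def routing_obstacles(job):
    """Return the reasons a job cannot be routed (empty list means eligible).

    Toy model: 'universe' and 'uses_file_transfer' are illustrative keys,
    not real Condor job attributes.
    """
    reasons = []
    if job.get("universe") == "standard":
        reasons.append("standard universe jobs are not routed")
    if not job.get("uses_file_transfer", False):
        reasons.append("job is not sandboxable: I/O must use file transfer")
    return reasons


sandboxed = {"universe": "vanilla", "uses_file_transfer": True}
shared_fs = {"universe": "vanilla", "uses_file_transfer": False}
eligible = not routing_obstacles(sandboxed)
blocked = routing_obstacles(shared_fs)
```

Returning reasons rather than a bare boolean makes it easier to tell a user why a particular job stayed in the local queue.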

Page 10:

What Types of Grids?

› Routing table may contain any combination of grid types supported by the grid universe.

› Example: Condor-C

[Diagram: Schedd, Schedd On The Side, and Schedd X at site X, with RandomSeed jobs running at site X]

• for two Condor sites, schedd-to-schedd submission requires no additional software
• however, still not as trivial to use as flocking

Page 11:

Routing Behind the Scenes

[Diagram labels: Schedd, Gatekeeper X, Schedd On The Side, Schedd X3, X2]

• navigate internal firewalls
• provide custom routes for special users
• improve scalability
• However, keep in mind I/O requirements etc.

Page 12:

Future Step: Glidein Factory

[Diagram labels: home (Schedd, Schedd On The Side), glidein jobs, Gatekeeper X, site X (Startds running RandomSeed jobs)]

• true late binding of jobs to resources
• may run on top of non-Condor sites
• supports full feature set of Condor (e.g. standard universe)

• requires GCB on network boundary (initiated by schedd-on-the-side?)

Page 13:

Gliding in the Works

[Diagram labels: Schedd, Schedd On The Side, glidein factory, site X, schedd-to-schedd, schedd-to-gatekeeper, RandomSeed jobs]

• hierarchical strategy for scalability and reliability
• better match for private networks

• may require some additional horsepower from gatekeeper machine, perhaps a dedicated element for “edge services”.

Page 14:

Thanks

Interested? Let us know.

Dan Bradley
danb@cs.wisc.edu

We are currently using job routing for specific users at UW.

Future development will focus on more use-cases.

