Wedding convenience and control with RemoteCondor

by Igor Sfiligoi, UC San Diego
RemoteCondor co-developed with J. Dost
UCSD HEP Group Trainings, Apr 2012


This presentation explains why Condor is not suitable for use on user-owned machines, and why RemoteCondor is the best available solution to the problem.


The Condor Batch System

● Condor is a Workload Management System
  ● i.e. a batch system
● Strong points
  ● Fault tolerant
  ● Robust feature set
  ● Flexible
● Large community base
  ● Both commercial and scientific

http://research.cs.wisc.edu/condor/

Condor Architecture

● Clearly separates
  ● Resource providers: machines (aka worker nodes) with CPUs, memory, IO, ...
  ● Resource consumers: job queues (aka submit nodes) with jobs submitted by users
● Each has a daemon process to represent it
  ● Startd for resource providers
  ● Schedd for resource consumers
● A central service connects them all
  ● Managed by a Collector/Negotiator pair

Condor Architecture in a picture

[Diagram: several Schedd and Startd daemons, connected through the central Collector/Negotiator pair.]
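The division of roles can be sketched with a toy model. This is not Condor code: the class names, the field names, and the use of memory as the only matching criterion are all assumptions made for brevity, and real Condor matchmaking is far richer. The sketch only shows the shape of the architecture: worker nodes and job queues advertise themselves to a central registry, and a negotiation step pairs jobs with machines.

```python
from dataclasses import dataclass, field


@dataclass
class MachineAd:
    """Toy stand-in for what a Startd advertises about a worker node."""
    name: str
    memory_mb: int
    busy: bool = False


@dataclass
class JobAd:
    """Toy stand-in for what a Schedd advertises about a queued job."""
    job_id: str
    request_memory_mb: int


@dataclass
class Collector:
    """Toy central registry holding the ads from Startds and Schedds."""
    machines: list = field(default_factory=list)
    jobs: list = field(default_factory=list)


def negotiate(collector):
    """Pair each job with the first free machine that has enough memory."""
    matches = {}
    for job in collector.jobs:
        for machine in collector.machines:
            if not machine.busy and machine.memory_mb >= job.request_memory_mb:
                machine.busy = True
                matches[job.job_id] = machine.name
                break
    return matches


# Hypothetical pool: two worker nodes, three queued jobs.
collector = Collector()
collector.machines += [MachineAd("worker1", 2048), MachineAd("worker2", 8192)]
collector.jobs += [JobAd("1.0", 4096), JobAd("2.0", 1024), JobAd("3.0", 4096)]
matches = negotiate(collector)
print(matches)
```

In this toy pool the third job stays unmatched because both machines are already taken; it would wait in its queue until a later negotiation round.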

The truth about submit nodes

● Corollary
  ● The submit node is a server!
● There is no real “Condor client”
  ● The cmdline tools are just a convenience to talk to the daemon process

[Diagram: on the submit node, condor_submit and condor_q talk to the local Schedd, which connects to the Collector/Negotiator and the Startd.]

Implications

● Being a server has several implications
  ● Security implications
    ● Will have incoming connectivity
    ● All security configuration on the submit node
    ● Submit node controls user authentication and authorization
  ● Unfriendly to non-dedicated hardware
    ● Requires always-on operation
    ● Must be on a public and static IP address

High exploit risk
Requires high trust between all nodes in the cluster
Impossible to use on a laptop

Not suitable for an unmanaged user machine

What are the alternatives?

● Out of the box, Condor provides
  ● Remote submission
  ● Condor-C
● In the contrib section, you can find
  ● RemoteCondor (this presentation argues that this is the best solution)

So what is wrong with the two out-of-the-box options?

Remote submission

● Essentially, connecting to a remote Schedd
  ● condor_submit -remote … + condor_transfer_data
  ● and condor_q -name ..., condor_rm -name ..., …
● So no daemon processes on the submit node
  ● A true client solution!

[Diagram: condor_submit, condor_q and condor_transfer_data on the submit node authenticate to the Schedd on a separate Schedd node; the Collector/Negotiator and the Startd complete the pool.]

http://research.cs.wisc.edu/condor/manual/v7.6/condor_submit.html
http://research.cs.wisc.edu/condor/manual/v7.6/condor_transfer_data.html
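As a minimal sketch of what this workflow looks like in practice, the helper below only assembles the command lines named on the slide; it does not run them. The Schedd name, submit file name, and job id are hypothetical placeholders, and a real pool may need further options (for example to select the pool or the authentication method) that are not shown here.

```python
import shlex


def remote_submission_commands(schedd_name, submit_file, job_id):
    """Build the command lines for one remote-submission round trip."""
    return {
        # Queue the job on the remote Schedd instead of a local one.
        "submit": ["condor_submit", "-remote", schedd_name, submit_file],
        # No local user log, so progress is polled from the remote queue.
        "monitor": ["condor_q", "-name", schedd_name],
        # Output stays on the remote Schedd until explicitly fetched.
        "fetch_output": ["condor_transfer_data", "-name", schedd_name, job_id],
        # Removal also has to name the remote Schedd.
        "remove": ["condor_rm", "-name", schedd_name, job_id],
    }


# Hypothetical values, for illustration only.
commands = remote_submission_commands("schedd.example.edu", "job.sub", "12.0")
for step, argv in commands.items():
    print(f"{step}: {shlex.join(argv)}")
```

Every step has to name the remote Schedd explicitly, which is part of why this mode feels different from a local submission.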

So, what's the problem?

● No local user log file
  ● Must use condor_q to monitor progress
  ● Annoying at best
  ● High monitoring load
  ● And it does not work with DAGMan
● Fully Condor-based user authentication
  ● While rich, not what users expect (e.g. no user/password)
  ● Hard to tie into campus-wide auth
● Staged input data not shared
  ● Could be a problem with large datasets

Condor-C

● Based on the Grid paradigm
  ● Submit locally, then delegate to remote Schedd
● Still running a daemon process
  ● But requires no incoming connections
    ● Secure
    ● Laptop friendly

[Diagram: condor_submit and condor_q talk to a Schedd on the submit node, which authenticates to the Schedd on the Schedd node; the Collector/Negotiator and the Startd complete the pool.]

http://research.cs.wisc.edu/condor/manual/v7.6/5_3Grid_Universe.html#sec:Condor-C

What are the drawbacks?

● Awkward syntax
  ● At least compared to Vanilla universe
  ● See the Condor manual for examples
  ● Can be mitigated with Job Router (but adds another layer of complexity)
● Has scalability problems
  ● Could likely be improved, but this is the current state of the art
● Fully Condor-based user authentication (same as remote submission)
● Staged input data not shared (same as remote submission)
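To give a feel for the syntax difference, the sketch below renders a bare-bones Vanilla submit description next to a bare-bones Condor-C one. Host names and the executable are hypothetical, and both descriptions are deliberately incomplete: real submit files usually also set output, error, and log files, and Condor-C jobs typically need further attributes describing how the job should run on the remote side. The Condor manual remains the reference for working examples.

```python
def vanilla_submit(executable):
    """Minimal Vanilla universe submit description (illustrative only)."""
    return "\n".join([
        "universe = vanilla",
        f"executable = {executable}",
        "queue",
    ])


def condor_c_submit(executable, remote_schedd, remote_collector):
    """Minimal Condor-C submit description (illustrative only).

    Condor-C uses the grid universe and must name both the remote
    Schedd and the remote pool's collector.
    """
    return "\n".join([
        "universe = grid",
        f"grid_resource = condor {remote_schedd} {remote_collector}",
        f"executable = {executable}",
        "queue",
    ])


# Hypothetical names, for illustration only.
vanilla_text = vanilla_submit("analyze.sh")
condor_c_text = condor_c_submit("analyze.sh", "schedd.example.edu", "cm.example.edu")
print(vanilla_text)
print()
print(condor_c_text)
```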

Introducing RemoteCondor

What's the big idea?

● Let the users log in to a remote machine
  ● And run the cmdline tools there
  ● True client approach

Advantages:
● True local Condor experience
● Standard system authentication and authorization
  ● No admin privileges for the users
● Trust based on “central” Schedd admin skills
  ● Can regulate and transform Condor submissions

● Minimize security risk
● Central handling
● Familiar to users
No exceptions

Big deal! Where's the news?

What's the big idea?

● Let the users log in to a remote machine
  ● And run the cmdline tools there
● … while preserving the local look-and-feel
  ● RemoteCondor provides
    ● Wrappers around major Condor cmdline tools
    ● Integration with sshfs

https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=RemoteCondor

RemoteCondor wrappers

● Provide wrappers that use ssh under the hood
  ● Users (almost) unaware of the trick
● But may be prompted for a password
  ● Works best with public key authentication

[Diagram: the condor_submit and condor_q wrappers on the submit node authenticate to sshd on the Schedd node, where the real condor_submit and condor_q talk to the Schedd; the Collector/Negotiator and the Startd complete the pool.]
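The wrapper idea can be sketched in a few lines. This is not the actual RemoteCondor implementation, which is not reproduced in these slides; the function name, user, and host are hypothetical. The sketch shows only the core trick: a local command with the familiar name turns its arguments into an ssh invocation of the real tool on the Schedd node.

```python
import shlex


def build_ssh_command(user, host, tool, args):
    """Build the ssh invocation that runs a Condor tool on the Schedd node.

    Each argument is quoted so that it survives the remote shell intact.
    """
    remote_cmd = " ".join(shlex.quote(part) for part in [tool, *args])
    return ["ssh", f"{user}@{host}", remote_cmd]


# Hypothetical user and host, for illustration only.
query = build_ssh_command("alice", "schedd.example.edu", "condor_q", ["12.0"])
submit = build_ssh_command("alice", "schedd.example.edu", "condor_submit", ["my job.sub"])
print(query)
print(submit)
```

A real wrapper would pass the resulting list to something like `subprocess.run` and forward the exit code. With public key authentication in place the extra hop needs no password prompt, which is why the slide recommends it.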

RemoteCondor and sshfs

● But being able to talk to Condor is not enough
  ● Users must be able to create and read data!
● Using sshfs solves the problem
  ● Schedd-local disk mounted on submit node (disk local to Schedd for maximum performance)
  ● Using ssh as a tunnel
  ● All in user space (FUSE)
● RemoteCondor will properly convert paths (within certain limits)

http://fuse.sourceforge.net/sshfs.html

[Diagram: the real disk on the Schedd node appears on the submit node as an sshfs mount, through the authenticated connection to sshd.]
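Path conversion can be illustrated with a small sketch. The slides do not describe how RemoteCondor actually performs it, so the function below is an assumption about the general idea: a path under the local sshfs mount point is rewritten to the corresponding path on the Schedd node. All directory names are hypothetical. Rejecting paths outside the mount is one plausible reading of "within certain limits", not a documented rule.

```python
from pathlib import PurePosixPath


def to_remote_path(local_path, mount_point, remote_root):
    """Map a path under the local sshfs mount to its path on the Schedd node."""
    local = PurePosixPath(local_path)
    try:
        relative = local.relative_to(mount_point)
    except ValueError:
        # The file does not live on the Schedd-local disk, so the
        # remote tools would not be able to see it.
        raise ValueError(
            f"{local_path} is not under the sshfs mount {mount_point}"
        ) from None
    return str(PurePosixPath(remote_root) / relative)


# Hypothetical layout: /data/alice on the Schedd node is mounted
# at /home/alice/condor_mnt on the submit node.
remote = to_remote_path(
    "/home/alice/condor_mnt/run1/job.sub",
    "/home/alice/condor_mnt",
    "/data/alice",
)
print(remote)
```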

Using RemoteCondor

● Distributed in the Condor src tarball
  ● In the Contrib section
● Requires a “make install”
  ● To put the proper files in place
● Plus minimal configuration
  ● Where is the remote Schedd node?
  ● What username to use?
  ● Where to mount the sshfs partition?

https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=RemoteCondor
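The three configuration questions map naturally onto three settings. The slides do not show RemoteCondor's real configuration file or option names, so the class and field names below are hypothetical. The sketch only shows how those three answers are enough to derive an sshfs mount command of the standard form `sshfs [user@]host:[dir] mountpoint`; the optional remote directory is an extra assumption and defaults to the remote home directory when left empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteCondorSettings:
    """Hypothetical container for the three answers plus one optional extra."""
    schedd_host: str       # Where is the remote Schedd node?
    username: str          # What username to use?
    mount_point: str       # Where to mount the sshfs partition?
    remote_dir: str = ""   # Assumption: empty means the remote home directory


def sshfs_mount_command(settings):
    """Build the sshfs command line that mounts the Schedd-local disk."""
    source = f"{settings.username}@{settings.schedd_host}:{settings.remote_dir}"
    return ["sshfs", source, settings.mount_point]


# Hypothetical values, for illustration only.
settings = RemoteCondorSettings(
    schedd_host="schedd.example.edu",
    username="alice",
    mount_point="/home/alice/condor_mnt",
)
mount_cmd = sshfs_mount_command(settings)
print(mount_cmd)
```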

Summary

● Traditional Condor not suitable for user machines
● Keeping Schedd nodes professionally maintained highly desirable
  ● To minimize security risks and control job flow
● RemoteCondor allows this operation mode while preserving the local look-and-feel
  ● Requires minimal local install

Acknowledgements

This work is partially sponsored by
● the US National Science Foundation under Grants No. OCI-0943725 (STCI) and PHY-0612805 (CMS Maintenance & Operations), and
● the US Department of Energy under Grant No. DE-FC02-06ER41436 subcontract No. 647F290 (OSG).