Hunter of Idle Workstations. Miron Livny and Marvin Solomon, University of Wisconsin-Madison. Email: condor-admin@cs.wisc.edu. URL: http://www.cs.wisc.edu/condor

  • Hunter of Idle Workstations
    Miron Livny, Marvin Solomon
    University of Wisconsin-Madison
    Email: condor-admin@cs.wisc.edu
    URL: http://www.cs.wisc.edu/condor

  • Outline
    Condor overview
    Potential uses of Java in Condor
    Current use of Java in Condor: Classified Advertisements

  • What is Condor?
    All jobs
      Resource finder
      Batch queue manager
      Scheduler
    Jobs linked with the Condor library
      Checkpoint/Restart
      Process migration
      Remote system calls

  • Condor is Real
    In production use at dozens (hundreds?) of sites
    In production use for over a decade
    Basis of commercial products
      LoadLeveler
      LCF
    Evolving

  • Condor System Structure
    (Figure: Submit Machine with the Customer Agent (CA); Central Manager with the Collector (C) and Negotiator (N); Execution Machine with the Resource Agent (RA); ad labels [...A], [...B], [...C])

  • Customer Agent
    Maintains queue of submitted jobs
    Advertises status
    Selects jobs to run

  • Resource Agent
    Monitors system status
      Load average
      Keyboard and mouse idle time
      Memory, disk space, ...
    Advertises status
    Listens for requests to run jobs

  • Central Manager
    Collector
      Accepts ads from resource agents and customer agents
    Negotiator
      Matches customers with resources
    Accountant
      Records resource usage by customers
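The three Central Manager roles can be illustrated with a minimal Python sketch. This is not Condor's implementation: the class names, the `compatible` callback, the agent names, and the idea of keying ads by agent name are all assumptions made for illustration. The sketch stops at matching and does not model claiming.

```python
from dataclasses import dataclass, field


@dataclass
class Collector:
    """Accepts ads from agents (hypothetical: one current ad per agent name)."""
    ads: dict = field(default_factory=dict)

    def advertise(self, name, ad):
        self.ads[name] = ad  # a newer ad replaces the older one

    def by_type(self, ad_type):
        return {n: a for n, a in self.ads.items() if a["Type"] == ad_type}


@dataclass
class Accountant:
    """Records resource usage (here: seconds) per customer."""
    usage: dict = field(default_factory=dict)

    def record(self, owner, seconds):
        self.usage[owner] = self.usage.get(owner, 0) + seconds


def negotiate(collector, compatible):
    """Pair each job ad with at most one machine ad that is still free."""
    free = dict(collector.by_type("Machine"))
    matches = []
    for job_name, job in collector.by_type("Job").items():
        for machine_name, machine in list(free.items()):
            if compatible(job, machine):
                matches.append((job_name, machine_name))
                del free[machine_name]
                break
    return matches


collector = Collector()
collector.advertise("ca@submit", {"Type": "Job", "Owner": "raman", "Memory": 31})
collector.advertise("ra@exec", {"Type": "Machine", "Memory": 64})

# Hypothetical compatibility test: the machine has enough memory for the job.
pairs = negotiate(collector, lambda job, m: m["Memory"] >= job["Memory"])

accountant = Accountant()
accountant.record("raman", 120)
```

Here `pairs` holds one (job agent, machine agent) pair; in a fuller model the compatibility test would be replaced by evaluating the two ads against each other.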

  • Condor System Structure
    (Figure: Submit Machine with the Customer Agent (CA); Central Manager with the Collector (C) and Negotiator (N); Execution Machine with the Resource Agent (RA); ad labels [...A], [...B], [...C])

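  • Advertising Protocol
    (Figure: CA, C, N, RA; ad labels [...A], [...B], [...C] and [...N], [...M], [...M])

  • Advertising Protocol
    (Figure: CA, C, N, RA; ad labels [...A], [...B], [...C] and [...M], [...N])

  • Matching Protocol
    (Figure: CA, C, N, RA; ad labels [...A], [...B], [...C] and [...M], [...N])

  • Claiming Protocol
    (Figure: CA, C, N, RA; ad labels [...A], [...C] and [...S])

  • Claiming Protocol
    (Figure: CA, C, N, RA; ad labels [...A], [...C] and [...S]; Job)

  • Remote System Calls
    (Figure: CA, C, N, RA; ad labels [...A], [...C] and [...S]; Job; Shadow)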

  • Condor Meets Java
    Java jobs
    Java for Condor implementation

  • Running Java Jobs
    Run JVM as vanilla job
      Class files are treated as ordinary jobs
      Requires uniform environment (same CLASSPATH everywhere)
      No checkpointing
    Re-link JVM as standard job
      Remote system calls for class loader
      Checkpoint/restart of vanilla jobs

  • Java-Aware Condor
    Class file as job
      Requires pre-installed JVM, class libraries and/or job package (code + files)
      Also useful for remote compilation
    Checkpoint JVM state
      Platform-independent checkpoint

  • Java for Implementing Condor

  • Classified Advertisements
    Simple yet powerful
    Extensible
    Active matching
    Symmetric matching

  • Symmetric Active Matching
    Job requires a workstation
      X86 architecture
      Solaris 2.6
      1 GB memory
    Resource is only available
      Between 6pm and 6am
      If the keyboard is idle at least 15 minutes
      To DOE contractors

  • The ClassAd Language
    Set of bindings of Attribute Names to Expressions
    Self-describing (no separate schema)
    Combine query and data
    Arbitrarily composed and nested

  • Examples
    [
      Type = "Job";
      Owner = "raman";
      Cmd = "run_sim";
      Args = "-Q 17 3200";
      Cwd = "/u/raman";
      Memory = 31;
      Qdate = 886799469;
      ...
      Rank = other.Kflops...
      Constraint = other.Type == ...
    ]

    [
      Type = "Machine";
      Name = "xxy.cs. ...";
      Arch = "iX86";
      OpSys = "Solaris";
      Mips = 104;
      Kflops = 21893;
      State = "Unclaimed";
      LoadAvg = 0.042969;
      ...
      Rank = ...;
      Constraint = ...;
    ]

  • Attribute Expressions
    Constants: 104, 0.042969, "iX86"
    References: attr, self.attr, other.attr, expr.attr
    Operators: +, *, >>, =, &&, ...
    Functions: strcat, substr, floor, member, ...
    Lists: { expr, expr, ... }
    ClassAds: [ name=expr; name=expr; ... ]

  • Example Attributes
    Descriptive attributes
      Type = "Job";
      Owner = "raman";
      Arch = "iX86";
      OpSys = "Solaris";
      Memory = 64; // megabytes
      Disk = 323496; // kilobytes

  • Example Attributes
    Current state
      Daytime = 36017; // seconds past midnight
      KeyboardIdle = 1432; // seconds
      State = "Unclaimed";
      LoadAvg = 0.042969;

  • Example Attributes
    Parameters
      ResearchGrp = { "raman", "miron", "solomon", "jbasney" };
      Friends = { "tannenba", "wright" };
      Untrusted = { "rival", "riffraff" };
      WantCheckpoint = 1;

  • Complex Attributes
    Derived data
      Rank = // machine's rank for job
        10 * member(other.Owner, ResearchGrp) + member(other.Owner, Friends);
      Rank = // job's rank for machine
        Kflops/1E3 + other.Memory/32;
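The two Rank expressions can be worked through in Python as a rough illustration. This sketch assumes that `member` yields 1 or 0 (it is used arithmetically on the slide), that the bare `Kflops` reference in the job's Rank resolves to the machine ad because the job ad has no such attribute, and that the machine has 64 MB of memory; the function names are hypothetical.

```python
def member(item, items):
    """Assumed behaviour of member(): 1 if item is in the list, else 0."""
    return 1 if item in items else 0


machine = {
    "ResearchGrp": ["raman", "miron", "solomon", "jbasney"],
    "Friends": ["tannenba", "wright"],
    "Kflops": 21893,
    "Memory": 64,  # assumed value, in megabytes
}
job = {"Owner": "raman", "Memory": 31}


def machine_rank(machine, job):
    """Machine's rank for a job: research-group members outrank friends."""
    owner = job["Owner"]
    return 10 * member(owner, machine["ResearchGrp"]) + member(owner, machine["Friends"])


def job_rank(job, machine):
    """Job's rank for a machine: faster machines with more memory score higher."""
    return machine["Kflops"] / 1e3 + machine["Memory"] / 32


print(machine_rank(machine, job))
print(job_rank(job, machine))
```

With these values a research-group member scores 10 on the machine's scale, a friend scores 1, and anyone else scores 0, which is what lets a machine constraint test `Rank >= 10` or `Rank > 0`.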

  • Constraints
    Job constraint
      Constraint = other.Type == "Machine"
                   && Arch == "iX86"
                   && OpSys == "Solaris"
                   && Disk > 10000
                   && other.Memory >= self.Memory;

  • Constraints
    Machine constraint
      Constraint = !member(other.Owner, Untrusted) && Rank >= 10 ? true :
                   Rank > 0 ? (LoadAvg < 0.3 && KeyboardIdle > 15*60) :
                   DayTime < 6*60*60 || DayTime > 18*60*60;
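The nested conditional in the machine constraint is easier to follow when unrolled into branches. The Python sketch below assumes C-style precedence, in which `?:` binds more loosely than `&&` and `||`, so the first test is `(!member(...) && Rank >= 10)`. The function name and the practice of passing `rank` in as an argument are illustrative choices, not part of the ClassAd language.

```python
def machine_constraint(machine, job, rank):
    """Unrolled form of the machine's Constraint expression."""
    # Branch 1: trusted owner with a high rank may always run.
    if job["Owner"] not in machine["Untrusted"] and rank >= 10:
        return True
    # Branch 2: any positive rank may run when the machine looks idle.
    if rank > 0:
        return machine["LoadAvg"] < 0.3 and machine["KeyboardIdle"] > 15 * 60
    # Branch 3: everyone else may run only before 6am or after 6pm.
    return machine["DayTime"] < 6 * 60 * 60 or machine["DayTime"] > 18 * 60 * 60


machine = {
    "Untrusted": ["rival", "riffraff"],
    "LoadAvg": 0.042969,
    "KeyboardIdle": 1432,  # seconds
    "DayTime": 36017,      # seconds past midnight (about 10am)
}

print(machine_constraint(machine, {"Owner": "raman"}, rank=10))    # research group
print(machine_constraint(machine, {"Owner": "wright"}, rank=1))    # friend, idle machine
print(machine_constraint(machine, {"Owner": "stranger"}, rank=0))  # daytime, no rank
```

Under this reading an untrusted owner is not rejected outright; the request simply falls through to the later branches, which is worth keeping in mind when writing such expressions.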

  • Matching Algorithm
    To match two ads A and B
      Set up environment such that in A
        self evaluates to A
        other evaluates to B
        other attributes are searched for first in A and then in B
      and vice versa (with A and B interchanged)
      Check if A.Constraint and B.Constraint both evaluate to true
      Use A.Rank and B.Rank for preferences
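A minimal Python sketch of this symmetric evaluation follows. It is an illustration under stated assumptions, not the ClassAd library: ads are plain dictionaries, `Constraint` and `Rank` are Python callables rather than parsed expressions, the `Env` class and the `matches` and `ranks` functions are hypothetical names, and every referenced attribute is assumed to exist (undefined values are not handled here).

```python
class Env:
    """Evaluation environment for one ad while it is matched against another."""

    def __init__(self, self_ad, other_ad):
        self.self_ad = self_ad    # what `self` evaluates to
        self.other_ad = other_ad  # what `other` evaluates to

    def __getitem__(self, name):
        # Bare reference: search this ad first, then the other ad.
        if name in self.self_ad:
            return self.self_ad[name]
        return self.other_ad[name]


def matches(a, b):
    """Two ads match only if both constraints evaluate to true."""
    return a["Constraint"](Env(a, b)) is True and b["Constraint"](Env(b, a)) is True


def ranks(a, b):
    """Each side's preference for the other; used to choose among matches."""
    return a["Rank"](Env(a, b)), b["Rank"](Env(b, a))


job = {
    "Type": "Job", "Owner": "raman", "Memory": 31,
    "Constraint": lambda e: (e.other_ad["Type"] == "Machine"
                             and e["Arch"] == "iX86"
                             and e["OpSys"] == "Solaris"
                             and e["Disk"] > 10000
                             and e.other_ad["Memory"] >= e.self_ad["Memory"]),
    "Rank": lambda e: e["Kflops"] / 1e3 + e.other_ad["Memory"] / 32,
}
machine = {
    "Type": "Machine", "Arch": "iX86", "OpSys": "Solaris",
    "Memory": 64, "Disk": 323496, "Kflops": 21893,
    "ResearchGrp": ["raman", "miron", "solomon", "jbasney"],
    "Untrusted": ["rival", "riffraff"],
    "Constraint": lambda e: e.other_ad["Owner"] not in e["Untrusted"],
    "Rank": lambda e: 10 * (e.other_ad["Owner"] in e["ResearchGrp"]),
}

print(matches(job, machine))
print(ranks(job, machine))
```

Because the same `matches` function evaluates both constraints with the roles of `self` and `other` swapped, neither side is privileged: `matches(job, machine)` and `matches(machine, job)` give the same answer.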

  • Three-valued Logic
    The following all evaluate to UNDEFINED if other has no "Memory" attribute:
      other.Memory > 32
      other.Memory == 32
      other.Memory != 32
      !(other.Memory == 32)

    The following evaluates to TRUE if either attribute exists and satisfies the given condition:
      other.Mips >= 10 || other.Kflops >= 1000
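The behaviour above can be imitated in a few lines of Python. This is a sketch, not the ClassAd evaluator: the `UNDEFINED` sentinel and the helpers `lookup`, `compare`, `not3`, and `or3` are hypothetical names, and the case where one operand of `||` is false and the other undefined (treated here as undefined) is an assumption that goes beyond what the slide states.

```python
import operator


class _Undefined:
    def __repr__(self):
        return "UNDEFINED"


UNDEFINED = _Undefined()


def lookup(ad, name):
    """A reference to a missing attribute evaluates to UNDEFINED."""
    return ad.get(name, UNDEFINED)


def compare(op, left, right):
    """Any comparison with an UNDEFINED operand is itself UNDEFINED."""
    if left is UNDEFINED or right is UNDEFINED:
        return UNDEFINED
    return op(left, right)


def not3(value):
    return UNDEFINED if value is UNDEFINED else (not value)


def or3(a, b):
    """TRUE if either side is TRUE, even when the other is UNDEFINED."""
    if a is True or b is True:
        return True
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED  # assumption: not covered by the slide
    return False


other = {"Mips": 104}  # no Memory and no Kflops attribute

print(compare(operator.gt, lookup(other, "Memory"), 32))
print(not3(compare(operator.eq, lookup(other, "Memory"), 32)))
print(or3(compare(operator.ge, lookup(other, "Mips"), 10),
          compare(operator.ge, lookup(other, "Kflops"), 1000)))
```

The point of the third value is that a missing attribute neither satisfies nor violates a test on it, while a disjunction can still succeed on the attribute that is present.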

  • Summary
    Distributed resource allocation
      Distributed clients, servers
      Heterogeneous resources
      Distributed ownership
    Classified advertisements
      Semi-structured data model
      Schema, data, and query in one language
      Separation of matching from claiming

  • Summary
    ClassAds are currently in use throughout Condor
      Flexible
      Robust
    C++ and Java implementations
    Freely available as part of Condor and as stand-alone libraries

  • Future Work
    Get Java customers
    Support Java customers
      Vanilla jobs
      Standard jobs
      Java-aware Condor execution engine

  • Future Work
    Application of ClassAds to other distributed resource-allocation and discovery problems
    Bulk operations and aggregation
      Structural regularity
      Value regularity
    User interfaces
    Tools

  • Information About Condor
    WWW: http://www.cs.wisc.edu/condor
    Email: condor-admin@cs.wisc.edu, solomon@cs.wisc.edu