Condor-G: A Case in Distributed Job Delegation
Jaime Frey, Computer Sciences Department, University of Wisconsin-Madison, [email protected]

  • Slide 1
  • Condor-G: A Case in Distributed Job Delegation
      • Jaime Frey, Computer Sciences Department, University of Wisconsin-Madison, [email protected]
  • Slide 2
  • Job Delegation
      • Transfer of responsibility to schedule and execute a job
      • Multiple delegations can form a chain
  • Slide 3
  • Job Delegation in Condor-G Today
      • Condor-G -> Globus GRAM -> Batch System Front-end -> Execute Machine
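The delegation chain on this slide can be modeled as a linked list of schedulers, each of which either runs the job itself or hands responsibility to the next hop. This is a minimal Python sketch; the `Scheduler` class and its `submit` method are hypothetical and are not part of Condor-G or Globus.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scheduler:
    """A hypothetical scheduling layer that accepts a job and may delegate it onward."""
    name: str
    next_hop: Optional["Scheduler"] = None
    jobs: dict = field(default_factory=dict)  # job_id -> this layer's view of the job

    def submit(self, job_id: str) -> list:
        """Take responsibility for the job, delegate it if possible, and return the path."""
        if self.next_hop is None:
            self.jobs[job_id] = "running"      # end of the chain: the job runs here
            return [self.name]
        self.jobs[job_id] = "delegated"        # responsibility passed to the next hop
        return [self.name] + self.next_hop.submit(job_id)


# The chain shown on the slide, built back to front.
execute = Scheduler("Execute Machine")
front_end = Scheduler("Batch System Front-end", execute)
gram = Scheduler("Globus GRAM", front_end)
condor_g = Scheduler("Condor-G", gram)

path = condor_g.submit("job-1")
print(" -> ".join(path))
# Condor-G -> Globus GRAM -> Batch System Front-end -> Execute Machine
```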
  • Slide 4
  • Expanding the Model
      • What can we do with new forms of job delegation?
      • Some ideas:
          • Mirroring
          • Load-balancing
          • Glide-in schedd
          • Multi-hop grid scheduling
  • Slide 5
  • Mirroring
      • What it does:
          • Jobs are mirrored on two Condor-Gs
          • If the primary Condor-G crashes, the secondary one starts running jobs
          • On recovery, the primary Condor-G gets job status from the secondary one
      • Removes the Condor-G submit point as a single point of failure
  • Slide 6
  • Mirroring Example (diagram): Condor-G 1, Condor-G 2, Matchmaker, Execute Machine
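A minimal sketch of the mirroring behavior follows. The slides do not say how the secondary detects that the primary has crashed, so a heartbeat timeout is assumed here; `CondorG`, `mirror_submit`, `check_primary`, and `recover_primary` are hypothetical names, and the matchmaker is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class CondorG:
    """A hypothetical submit point holding its own copy of the job queue."""
    name: str
    running_jobs: bool = False
    queue: dict = field(default_factory=dict)  # job_id -> last known status


def mirror_submit(job_id, primary, secondary):
    # Every job is recorded on both submit points.
    for node in (primary, secondary):
        node.queue[job_id] = "idle"


def check_primary(last_heartbeat, now, timeout, primary, secondary):
    """Return the node responsible for running jobs.

    Assumed failure detector: the primary counts as crashed when its most
    recent heartbeat is more than `timeout` seconds old.
    """
    if now - last_heartbeat > timeout:
        primary.running_jobs = False
        secondary.running_jobs = True
    return secondary if secondary.running_jobs else primary


def recover_primary(primary, secondary):
    # The recovered primary pulls job status from the secondary and takes over again.
    primary.queue.update(secondary.queue)
    secondary.running_jobs = False
    primary.running_jobs = True


cg1 = CondorG("Condor-G 1", running_jobs=True)
cg2 = CondorG("Condor-G 2")
mirror_submit("job-7", cg1, cg2)

owner = check_primary(last_heartbeat=0.0, now=120.0, timeout=60.0, primary=cg1, secondary=cg2)
owner.queue["job-7"] = "running"  # the secondary starts the job while the primary is down

recover_primary(cg1, cg2)
print(owner.name, cg1.queue["job-7"], cg1.running_jobs)  # Condor-G 2 running True
```

A real deployment would also have to keep both Condor-Gs from running the same jobs at once (for example after a network partition); this sketch ignores that problem.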
  • Slide 8
  • Load-Balancing
      • What it does:
          • A front-end Condor-G distributes all jobs among several back-end Condor-Gs
          • The front-end Condor-G keeps updated job status
      • Improves scalability
      • Maintains a single submit point for users
  • Slide 9
  • Load-Balancing Example (diagram): Front-end Condor-G -> Back-end Condor-Gs 1, 2, and 3
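One way a front-end could distribute jobs is to send each new job to the back-end with the fewest jobs assigned so far. The slides do not name a balancing policy, so this least-loaded rule and the `FrontEnd` class are assumptions for illustration.

```python
import heapq


class FrontEnd:
    """Hypothetical front-end Condor-G that spreads jobs over back-end Condor-Gs."""

    def __init__(self, backends):
        # Min-heap of (jobs assigned so far, back-end name): pop yields the least-loaded back-end.
        self.heap = [(0, name) for name in backends]
        heapq.heapify(self.heap)
        self.status = {}  # job_id -> (back-end, last reported state)

    def submit(self, job_id):
        load, backend = heapq.heappop(self.heap)
        heapq.heappush(self.heap, (load + 1, backend))
        self.status[job_id] = (backend, "idle")
        return backend

    def update(self, job_id, state):
        # Back-ends report state changes, so users can query a single submit point.
        backend, _ = self.status[job_id]
        self.status[job_id] = (backend, state)


fe = FrontEnd(["Back-end 1", "Back-end 2", "Back-end 3"])
assignments = [fe.submit(f"job-{i}") for i in range(6)]
fe.update("job-0", "running")
print(assignments)
# ['Back-end 1', 'Back-end 2', 'Back-end 3', 'Back-end 1', 'Back-end 2', 'Back-end 3']
print(fe.status["job-0"])  # ('Back-end 1', 'running')
```

Because the heap counts assignments rather than currently active jobs, this degenerates to round-robin when jobs are equal; a fuller version would use load reported by the back-ends.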
  • Slide 10
  • Glide-In Schedd
      • What it does:
          • Drop a Condor-G onto the front-end machine of a cluster
          • Delegate jobs to the cluster through the glide-in schedd
          • Apply cluster-specific policies to jobs
  • Slide 11
  • Glide-In Schedd Example (diagram): Condor-G -> Glide-In Schedd -> Batch System
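The sketch below shows a glide-in schedd applying cluster-specific policy before jobs reach the local batch system. The two policy knobs (a wall-time limit and a per-owner job cap) and all class names are hypothetical examples of site policy, not settings taken from Condor.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    owner: str
    wall_time_min: int  # requested wall-clock time, in minutes


@dataclass
class ClusterPolicy:
    # Hypothetical site policy knobs; a real cluster defines its own.
    max_wall_time_min: int
    max_jobs_per_owner: int


class GlideInSchedd:
    """Sketch of a schedd glided in on a cluster front-end machine."""

    def __init__(self, policy):
        self.policy = policy
        self.batch_queue = []  # stands in for submissions to the local batch system
        self.rejected = []     # (job_id, reason) pairs reported back upstream

    def delegate(self, job):
        per_owner = sum(1 for j in self.batch_queue if j.owner == job.owner)
        if job.wall_time_min > self.policy.max_wall_time_min:
            self.rejected.append((job.job_id, "wall time exceeds site limit"))
        elif per_owner >= self.policy.max_jobs_per_owner:
            self.rejected.append((job.job_id, "owner at site job limit"))
        else:
            self.batch_queue.append(job)


schedd = GlideInSchedd(ClusterPolicy(max_wall_time_min=240, max_jobs_per_owner=2))
for job in [Job("a", "alice", 60), Job("b", "alice", 600),
            Job("c", "alice", 30), Job("d", "alice", 30)]:
    schedd.delegate(job)

print([j.job_id for j in schedd.batch_queue])  # ['a', 'c']
print(schedd.rejected)
# [('b', 'wall time exceeds site limit'), ('d', 'owner at site job limit')]
```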
  • Slide 12
  • Multi-Hop Grid Scheduling
      • Match a job to a Virtual Organization (VO), then to a resource within that VO
      • Easier to schedule jobs across multiple VOs and grids
  • Slide 13
  • Multi-Hop Grid Scheduling Example (diagram): Experiment Condor-G and Experiment Resource Broker -> VO Condor-G and VO Resource Broker -> Globus GRAM -> Batch Scheduler
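Two-level matching can be sketched as two brokers: an experiment-level broker that only chooses among VOs, and a per-VO broker that chooses among that VO's resources. The resource attributes, VO names, and ranking rule (most free CPUs) are illustrative assumptions; real ClassAd matchmaking uses `Requirements` and `Rank` expressions rather than Python functions.

```python
# Hypothetical resource ads, grouped by VO. Attribute names are illustrative,
# not real ClassAd attributes.
VOS = {
    "vo-physics": [
        {"name": "cluster-a", "arch": "x86_64", "free_cpus": 0},
        {"name": "cluster-b", "arch": "x86_64", "free_cpus": 64},
    ],
    "vo-bio": [
        {"name": "cluster-c", "arch": "ppc64", "free_cpus": 128},
    ],
}


def vo_broker(job, resources):
    """Second hop: a VO's broker picks one of its own resources (most free CPUs)."""
    candidates = [r for r in resources
                  if r["arch"] == job["arch"] and r["free_cpus"] >= job["cpus"]]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["free_cpus"])["name"]


def experiment_broker(job, vos):
    """First hop: try each VO the job may use and delegate resource choice to it."""
    for vo_name in job["allowed_vos"]:
        resource = vo_broker(job, vos.get(vo_name, []))
        if resource is not None:
            return vo_name, resource
    return None  # no VO could place the job


job = {"arch": "x86_64", "cpus": 8, "allowed_vos": ["vo-bio", "vo-physics"]}
print(experiment_broker(job, VOS))  # ('vo-physics', 'cluster-b')
```

In a real deployment each VO broker would own its resource catalog; the single `VOS` dictionary stands in for those separate catalogs.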
  • Slide 14
  • Endless Possibilities
      • These new models can be combined with each other or with other new models
      • The resulting system can be arbitrarily sophisticated
  • Slide 15
  • Job Delegation Challenges
      • New complexity introduces new issues and exacerbates existing ones
      • A few:
          • Transparency
          • Representation
          • Scheduling control
          • Active job control
          • Revocation
          • Error handling and debugging
  • Slide 16
  • Transparency
      • Full information about a job should be available to the user
          • Information from the full delegation path
          • No manual tracing across multiple machines
      • Users need to know what's happening with their jobs
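To spare users from tracing a job by hand across machines, the per-hop logs can be merged into one timeline. This is a minimal sketch using Python's `heapq.merge`, assuming each hop's log is already in time order and the hops' clocks agree; the log records are made up for illustration.

```python
import heapq

# Hypothetical per-hop event logs, each already sorted by time:
# (timestamp in seconds, hop, message).
condor_g_log = [(100, "Condor-G", "job submitted"),
                (105, "Condor-G", "delegated to GRAM")]
gram_log = [(106, "GRAM", "job accepted"),
            (110, "GRAM", "submitted to batch system")]
batch_log = [(111, "Batch", "job queued"),
             (180, "Batch", "job started")]


def unified_history(*logs):
    """Merge sorted per-hop logs into one timeline for the user.

    Assumes the hops' clocks are synchronized; with clock skew the
    merged order can be wrong.
    """
    return list(heapq.merge(*logs))


history = unified_history(condor_g_log, gram_log, batch_log)
for ts, hop, msg in history:
    print(f"{ts:>5}  {hop:<9} {msg}")
```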
  • Slide 17
  • Representation
      • Job state is a vector
      • How best to show this to the user?
          • Summary: current delegation endpoint and job state at the endpoint
          • Full information available if desired
          • Series of nested ClassAds?
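The "summary by default, full detail on request" idea can be sketched with nested records that mirror the delegation path. Plain Python dictionaries stand in for nested ClassAds here, and the `Hop`, `State`, and `Delegated` attribute names are hypothetical.

```python
# Job state as a chain of nested records, loosely following the
# "series of nested ClassAds" idea. Attribute names are hypothetical.
job_ad = {
    "Hop": "Condor-G", "State": "Delegated",
    "Delegated": {
        "Hop": "Globus GRAM", "State": "Delegated",
        "Delegated": {
            "Hop": "Batch System", "State": "Running",
            "Delegated": None,
        },
    },
}


def state_vector(ad):
    """Flatten the nested records into the full (hop, state) vector."""
    vector = []
    while ad is not None:
        vector.append((ad["Hop"], ad["State"]))
        ad = ad["Delegated"]
    return vector


def summary(ad):
    """Default view: only the current delegation endpoint and its state."""
    endpoint, state = state_vector(ad)[-1]
    return f"{endpoint}: {state}"


print(summary(job_ad))       # Batch System: Running
print(state_vector(job_ad))  # full information, shown only on request
```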
  • Slide 18
  • Scheduling Control
      • Avoid loops in the delegation path
      • Give the user control of scheduling
          • Allow limiting of delegation path length?
          • Allow the user to specify part or all of the delegation path
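The scheduling-control checks on this slide can be combined into a single predicate evaluated before each delegation. This is a sketch under assumed semantics: `max_hops` caps the total path length and `user_path` fixes the leading hops; neither parameter comes from Condor-G, and the hop names are placeholders.

```python
def may_delegate(path, next_hop, max_hops=None, user_path=None):
    """Decide whether a job that has passed through `path` may go to `next_hop`.

    path      -- hops the job has already visited, in order
    max_hops  -- optional user limit on total delegation path length
    user_path -- optional user-specified leading part of the path
    Returns (allowed, reason).
    """
    if next_hop in path:
        return False, "loop"
    if max_hops is not None and len(path) + 1 > max_hops:
        return False, "path too long"
    position = len(path)  # index that next_hop would occupy in the path
    if (user_path is not None and position < len(user_path)
            and user_path[position] != next_hop):
        return False, "user specified a different hop"
    return True, "ok"


print(may_delegate(["A", "B"], "A"))                                    # (False, 'loop')
print(may_delegate(["A", "B"], "C", max_hops=2))                        # (False, 'path too long')
print(may_delegate(["A"], "C", user_path=["A", "B"]))                   # (False, 'user specified a different hop')
print(may_delegate(["A", "B"], "C", max_hops=3, user_path=["A", "B"]))  # (True, 'ok')
```

To pin down the entire path, a user would pass the full path as `user_path` and set `max_hops` to its length.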
  • Slide 19
  • Active Job Control
      • User may request certain actions: hold, suspend, vacate, checkpoint
      • Actions cannot be completed synchronously for the user
          • Must forward along the delegation path
          • User checks completion later
  • Slide 20
  • Active Job Control (cont.)
      • Endpoint systems may not support actions
          • If possible, execute them at the furthest point that does support them
      • Allow the user to apply an action in the middle of the delegation path
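Asynchronous, forwarded job control might look like the sketch below: a request returns an id at once, is routed to the furthest hop that supports the action (or to a named hop in the middle of the path), and the user polls for completion later. The per-hop capability sets and all names are hypothetical.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Hop:
    name: str
    supports: frozenset  # control actions this system can carry out


@dataclass
class ControlRequest:
    action: str
    target: str        # hop that will carry out the action
    done: bool = False


_request_ids = itertools.count(1)
pending = {}  # request id -> ControlRequest


def request_action(path, action, at=None):
    """Queue `action` and return a request id immediately (no waiting).

    With `at=None`, the action goes to the furthest hop that supports it;
    otherwise it is applied at the named hop in the middle of the path.
    """
    candidates = path if at is None else [h for h in path if h.name == at]
    for hop in reversed(candidates):
        if action in hop.supports:
            req_id = next(_request_ids)
            pending[req_id] = ControlRequest(action, hop.name)
            return req_id
    raise ValueError(f"no eligible hop supports {action!r}")


def complete(req_id):
    # Stands in for the target hop reporting back later over the network.
    pending[req_id].done = True


def status(req_id):
    req = pending[req_id]
    return f"{req.action} at {req.target}: {'done' if req.done else 'pending'}"


path = [Hop("Condor-G", frozenset({"hold", "vacate"})),
        Hop("Globus GRAM", frozenset({"hold"})),
        Hop("Batch System", frozenset({"hold", "suspend", "checkpoint"}))]

r1 = request_action(path, "vacate")                 # the batch system cannot vacate
r2 = request_action(path, "hold", at="Globus GRAM")
print(status(r1))  # vacate at Condor-G: pending
complete(r1)
print(status(r1))  # vacate at Condor-G: done
print(status(r2))  # hold at Globus GRAM: pending
```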
  • Slide 21
  • Revocation
      • Leases
          • A lease must be renewed periodically for the delegation to remain valid
          • Allows revocation during long-term failures
      • What are good values for lease lifetime and update interval?
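A lease can be sketched as an expiry time that each renewal pushes forward, with the delegatee dropping jobs whose lease has lapsed. The 300 s lifetime and 100 s renewal interval are arbitrary values chosen for illustration, since the slide leaves good values as an open question; `Lease` and `revoke_expired` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    """Hypothetical delegation lease: valid only while renewed in time."""
    lifetime: float          # seconds that one renewal keeps the delegation valid
    expires_at: float = 0.0

    def renew(self, now):
        self.expires_at = now + self.lifetime

    def valid(self, now):
        return now < self.expires_at


def revoke_expired(jobs, now):
    """Delegatee side: keep only jobs whose lease is still valid."""
    return {job_id: lease for job_id, lease in jobs.items() if lease.valid(now)}


jobs = {"job-1": Lease(lifetime=300.0), "job-2": Lease(lifetime=300.0)}
for t in (0.0, 100.0, 200.0):        # assumed renewal interval: 100 s
    jobs["job-1"].renew(now=t)
jobs["job-2"].renew(now=0.0)         # the delegator of job-2 fails after t = 0

print(sorted(revoke_expired(jobs, now=450.0)))  # ['job-1']
print(sorted(revoke_expired(jobs, now=500.0)))  # []
```

The lifetime bounds how long an orphaned job keeps running after its delegator is lost, and the ratio of lifetime to renewal interval sets roughly how many consecutive renewals can be missed before the delegation lapses. Shorter lifetimes revoke faster at the cost of more renewal traffic.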
  • Slide 22
  • Error Handling and Debugging
      • Many more places for things to go horribly wrong
      • Need clear, simple error semantics
      • Logs, logs, logs
          • Have them everywhere
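One way to get simple error semantics is to raise a single error type that records the failing hop and the path that led there, while every hop writes to its own log. This is a sketch using Python's standard `logging` module; `DelegationError`, `forward`, and the hop handlers are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(name)-9s %(levelname)-5s %(message)s")


class DelegationError(Exception):
    """Hypothetical error type: says which hop failed and how the job got there."""

    def __init__(self, job_id, hop, path, reason):
        super().__init__(f"job {job_id} failed at {hop} "
                         f"(path: {' -> '.join(path)}): {reason}")
        self.job_id, self.hop, self.path, self.reason = job_id, hop, path, reason


def forward(job_id, hops, path=()):
    """Pass the job down a chain of (name, handler) hops, logging at every hop."""
    name, handler = hops[0]
    log = logging.getLogger(name)
    path = path + (name,)
    log.info("job %s received", job_id)
    try:
        handler(job_id)
    except Exception as exc:
        log.error("job %s failed: %s", job_id, exc)
        # Raised once, at the failing hop; the recursive call below sits outside
        # this try block, so outer hops do not re-wrap the error.
        raise DelegationError(job_id, name, path, str(exc)) from exc
    if len(hops) > 1:
        log.info("job %s delegated to %s", job_id, hops[1][0])
        forward(job_id, hops[1:], path)


def accept(job_id):
    pass  # this hop accepts the job without error


def disk_full(job_id):
    raise OSError("scratch disk full")


try:
    forward("job-3", [("Condor-G", accept), ("GRAM", accept), ("Batch", disk_full)])
except DelegationError as err:
    print(err)  # job job-3 failed at Batch (path: Condor-G -> GRAM -> Batch): scratch disk full
```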
  • Slide 23
  • Current Status
      • Done: mirroring
      • In progress:
          • Condor-G -> Condor-G delegation (user must specify hops)
          • Glide-in schedd (set up by hand)
  • Slide 24
  • Thank You! Questions?