Peter Couvares
Computer Sciences Department
University of Wisconsin-Madison

pfc@cs.wisc.edu
http://www.cs.wisc.edu/~pfc

High-Throughput Computing With Condor

www.cs.wisc.edu/condor

Who Are We?

The Condor Project (Established ’85)

Distributed systems CS research performed by a team that faces:

› software engineering challenges in a Unix/Linux/NT environment,

› active interaction with users and collaborators,

› daily maintenance and support challenges of a distributed production environment, and

› educating and training students.

Funding - NSF, NASA, DoE, DoD, IBM, INTEL, Microsoft and the UW Graduate School

The Condor System

› Unix and NT

› Operational since 1986

› More than 1300 CPUs at UW-Madison

› Available on the web

› More than 150 clusters worldwide in academia and industry

What is Condor?

› Condor converts collections of distributively owned workstations and dedicated clusters into a high-throughput computing facility.

› Condor uses matchmaking to make sure that everyone is happy.

What is High-Throughput Computing?

› High-performance: CPU cycles/second under ideal circumstances. “How fast can I run simulation X on this machine?”

› High-throughput: CPU cycles/day (week, month, year?) under non-ideal circumstances. “How many times can I run simulation X in the next month using all available machines?”

What is High-Throughput Computing?

› Condor does whatever it takes to run your jobs, even if some machines…
  › Crash! (or are disconnected)
  › Run out of disk space
  › Don’t have your software installed
  › Are frequently needed by others
  › Are far away & admin’ed by someone else

What is Matchmaking?

› Condor uses Matchmaking to make sure that work gets done within the constraints of both users and owners.

› Users (jobs) have constraints: “I need an Alpha with 256 MB RAM”

› Owners (machines) have constraints: “Only run jobs when I am away from my desk and never run jobs owned by Bob.”
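
The two-sided constraints above can be sketched in a few lines of Python. This is only an illustration of the idea that a match requires both the job and the machine to accept each other; it is not Condor's actual matchmaking code. The `Party` class, the attribute names (`Arch`, `Memory`, `OwnerAway`, `Owner`) and the greedy first-fit loop are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Ad = Dict[str, object]  # attribute name -> value (hypothetical format)


@dataclass
class Party:
    """A job or a machine: its own attributes plus a constraint on the other side."""
    ad: Ad
    # requirements(my_ad, other_ad) -> True if the other side is acceptable
    requirements: Callable[[Ad, Ad], bool]


def mutually_acceptable(job: Party, machine: Party) -> bool:
    """A match needs BOTH sides to be satisfied."""
    return (job.requirements(job.ad, machine.ad)
            and machine.requirements(machine.ad, job.ad))


def matchmake(jobs: List[Party], machines: List[Party]) -> List[Optional[int]]:
    """Greedy first-fit (an assumption): for each job, the index of the
    first free machine that is mutually acceptable, or None."""
    free = list(range(len(machines)))
    assignment: List[Optional[int]] = []
    for job in jobs:
        chosen = next(
            (i for i in free if mutually_acceptable(job, machines[i])), None)
        if chosen is not None:
            free.remove(chosen)
        assignment.append(chosen)
    return assignment


def needs_alpha_256mb(my_ad: Ad, machine_ad: Ad) -> bool:
    """User constraint: "I need an Alpha with 256 MB RAM"."""
    return machine_ad.get("Arch") == "ALPHA" and machine_ad.get("Memory", 0) >= 256


def owner_policy(my_ad: Ad, job_ad: Ad) -> bool:
    """Owner constraint: only when I am away, and never jobs owned by Bob."""
    return bool(my_ad.get("OwnerAway")) and job_ad.get("Owner") != "bob"


if __name__ == "__main__":
    jobs = [Party({"Owner": "bob"}, needs_alpha_256mb),
            Party({"Owner": "alice"}, needs_alpha_256mb)]
    machines = [Party({"Arch": "INTEL", "Memory": 512, "OwnerAway": True}, owner_policy),
                Party({"Arch": "ALPHA", "Memory": 256, "OwnerAway": True}, owner_policy)]
    print(matchmake(jobs, machines))
```

In this sketch Bob's job is turned away by the owner's policy even though the Alpha machine satisfies the job's own constraint, which is the point of checking both directions.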

“What can Condor do for me?”

Condor can…

› …do your housekeeping.

› …improve reliability.

› …give performance feedback.

› …increase your throughput!

Some Numbers: UW-CS Pool

6/98-6/00                 4,000,000 hours   ~450 years

“Real” Users              1,700,000 hours   ~260 years
  CS-Optimization           610,000 hours
  CS-Architecture           350,000 hours
  Physics                   245,000 hours
  Statistics                 80,000 hours
  Engine Research Center     38,000 hours
  Math                       90,000 hours
  Civil Engineering          27,000 hours
  Business                      970 hours

“External” Users            165,000 hours   ~19 years
  MIT                        76,000 hours
  Cornell                    38,000 hours
  UCSD                       38,000 hours
  CalTech                    18,000 hours

Condor & Physics

Current CMS Activity

› Simulation (CMSIM) for CalTech
  › provided >135,000 CPU hours to date
  › peak day ~4000 CPU hours
  › via NCSA Alliance, Condor has allocated 1,000,000 hours total to CalTech

› Simulation and Reconstruction (CMSIM + ORCA) for HEP group at UW-Madison

INFN Condor Pool - Italy

› Italian National Institute for Research in Nuclear and Subnuclear Physics

› 19 locations, each running a Condor pool

› from as few as 1 CPU to >100 CPUs

› each locally controlled

› each “flocks” jobs to other pools when available

Particle Physics Data Grid

› The PPDG Project is... a software engineering effort to design, implement, experiment, evaluate, and prototype HEP-specific data-transfer and caching software tools for Grid environments

› For example...

Condor PPDG Work

› Condor Data Manager
  › technology to automate & coordinate data movement from a variety of long-term repositories to available Condor computing resources & back again
  › keeping the pipeline full!
  › SRB (SDSC), SAM (Fermi), PPDG HRM

PPDG Collaborators

California Institute of Technology: Harvey B. Newman, Julian J. Bunn, Koen Holtman, Asad Samar, Takako Hickey, Iosif Legrand, Vladimir Litvin, Philippe Galvez, James C.T. Pool, Roy Williams

Argonne National Laboratory: Ian Foster, Steven Tuecke, Lawrence Price, David Malon, Ed May

Berkeley Laboratory: Stewart C. Loken, Ian Hinchcliffe, Doug Olson, Alexandre Vaniachine, Arie Shoshani, Andreas Mueller, Alex Sim, John Wu

Brookhaven National Laboratory: Bruce Gibbard, Richard Baker, Torre Wenaus

Fermi National Laboratory: Victoria White, Philip Demar, Donald Petravick, Matthias Kasemann, Ruth Pordes, James Amundson, Rich Wellner, Igor Terekhov, Shahzad Muzaffar

University of Florida: Paul Avery

San Diego Supercomputer Center: Margaret Simmons, Reagan Moore

Stanford Linear Accelerator Center: Richard P. Mount, Les Cottrell, Andrew Hanushevsky, Davide Salomoni

Thomas Jefferson National Accelerator Facility: Chip Watson, Ian Bird, Jie Chen

University of Wisconsin: Miron Livny, Peter Couvares, Tevfik Kosar

National Grid Efforts

› GriPhyN (Grid Physics Network)

› National Technology Grid - NCSA Alliance (NSF-PACI)

› Information Power Grid - IPG (NASA)

› close collaboration with the Globus project

I have 600 simulations to run.

How can Condor help me?

My Application …

Simulate the behavior of F(x,y,z) for 20 values of x, 10 values of y and 3 values of z (20*10*3 = 600)

› F takes on the average 3 hours to compute on a “typical” workstation (total = 1800 hours)

› F requires a “moderate” (128MB) amount of memory

› F performs “moderate” I/O - (x,y,z) is 5 MB and F(x,y,z) is 50 MB
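
The arithmetic behind these figures can be checked with a short Python sketch. The constants come from the slide; the function names and the idealized wall-clock estimate (identical machines that are always available, no scheduling or transfer overhead, each run kept on a single machine) are assumptions made for illustration and will be optimistic for a real pool.

```python
import math

# Figures taken from the slide.
X_VALUES, Y_VALUES, Z_VALUES = 20, 10, 3
HOURS_PER_RUN = 3        # average on a "typical" workstation
INPUT_MB_PER_RUN = 5     # size of one (x,y,z) input
OUTPUT_MB_PER_RUN = 50   # size of one F(x,y,z) result


def total_runs() -> int:
    """Number of (x,y,z) combinations to simulate."""
    return X_VALUES * Y_VALUES * Z_VALUES


def total_cpu_hours() -> int:
    """CPU time needed if every run takes the average time."""
    return total_runs() * HOURS_PER_RUN


def total_io_mb() -> tuple:
    """(total input MB, total output MB) across all runs."""
    runs = total_runs()
    return runs * INPUT_MB_PER_RUN, runs * OUTPUT_MB_PER_RUN


def ideal_days(machines: int) -> float:
    """Idealized wall-clock days on `machines` identical, always-free machines.

    Assumption: a run cannot be split, so work proceeds in "waves" of at
    most `machines` runs at a time.
    """
    if machines < 1:
        raise ValueError("need at least one machine")
    waves = math.ceil(total_runs() / machines)
    return waves * HOURS_PER_RUN / 24


if __name__ == "__main__":
    print(total_runs(), "runs,", total_cpu_hours(), "CPU hours")
    print("input/output MB:", total_io_mb())
    for n in (1, 10, 100, 600):
        print(n, "machines ->", ideal_days(n), "days")
```

On a single workstation the estimate is 1800 hours, or 75 days of continuous computing.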

Step I - get organized!

› Write a script that creates 600 input files, one for each of the (x,y,z) combinations

› Write a script that will collect the data from the 600 output files

› Turn your workstation into a “Personal Condor”

› Submit a cluster of 600 jobs to your personal Condor

› Go on a long vacation … (2.5 months)
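
The scripts in Step I might look something like the following Python sketch. The file naming scheme (`input.N`, `output.N`), the one-line file contents, the parameter values and the executable name `simulate_f` are all hypothetical; the real inputs are about 5 MB each. The submit description uses standard Condor submit-file commands (`universe`, `executable`, `input`, `output`, `error`, `log`, `queue`) and the `$(Process)` macro, but treat it as a starting point rather than a tested configuration.

```python
import itertools
from pathlib import Path
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def parameter_grid(xs: Iterable[float], ys: Iterable[float],
                   zs: Iterable[float]) -> List[Point]:
    """Every (x, y, z) combination, in a fixed order."""
    return list(itertools.product(xs, ys, zs))


def write_input_files(directory: Path, grid: Sequence[Point]) -> List[Path]:
    """Write one input file per combination, named input.0, input.1, ..."""
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, (x, y, z) in enumerate(grid):
        path = directory / f"input.{index}"
        path.write_text(f"{x} {y} {z}\n")  # placeholder content
        paths.append(path)
    return paths


def submit_description(executable: str, count: int) -> str:
    """Text of a submit description file for a cluster of `count` jobs."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "input = input.$(Process)",
        "output = output.$(Process)",
        "error = error.$(Process)",
        f"log = {executable}.log",
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


def collect_outputs(directory: Path, count: int) -> Dict[int, str]:
    """Gather the contents of whichever output.N files exist so far."""
    results = {}
    for index in range(count):
        path = directory / f"output.{index}"
        if path.exists():
            results[index] = path.read_text()
    return results


if __name__ == "__main__":
    work = Path("f_runs")  # hypothetical working directory
    grid = parameter_grid(range(20), range(10), range(3))
    write_input_files(work, grid)
    (work / "f.submit").write_text(submit_description("simulate_f", len(grid)))
```

The generated `f.submit` file would then be handed to `condor_submit` from the working directory, and `collect_outputs` can be run at any time to see which results have arrived.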

[Diagram labels: your workstation; personal Condor; 600 Condor jobs]

Step II - build your personal Grid

› Install Condor on the desktop machine next door

› …and on the machines in the classroom.

› Install Condor on the department’s Linux cluster or the O2K in the basement.

› Configure these machines to be part of your Condor pool.

› Go on a shorter vacation ...

[Diagram labels: your workstation; personal Condor; 600 Condor jobs; Group Condor]

Step III - take advantage of your friends

› Get permission from “friendly” Condor pools to access their resources

› Configure your personal Condor to “flock” to these pools

› Reconsider your vacation plans ...

[Diagram labels: your workstation; friendly Condor; personal Condor; 600 Condor jobs; Group Condor]

Think BIG.

Go to the Grid.

Upgrade to Condor-G

A Grid-enabled version of Condor that uses the inter-domain services of Globus to bring Grid resources into the domain of your Personal Condor

› Easy to use on different platforms

› Robust

› Supports SMPs & dedicated schedulers

Step IV - Go for the Grid

› Get access (account(s) + certificate(s)) to a “Computational” Grid

› Submit 599 “Grid Universe” Condor glide-in jobs to your personal Condor

› Take the rest of the afternoon off ...

[Diagram labels: your workstation; friendly Condor; personal Condor; 600 Condor jobs; Group Condor; 599 glide-ins; Globus Grid (PBS, LSF, Condor)]

What Have We Done with the Grid Already?

› NUG30 quadratic assignment problem
  › 30 facilities, 30 locations
    • minimize cost of transferring materials between them
  › posed in 1968 as challenge, long unsolved
  › but with a good pruning algorithm & high-throughput computing...

NUG30 Personal Condor Grid

For the run we will be flocking to

-- the main Condor pool at Wisconsin (600 processors)

-- the Condor pool at Georgia Tech (190 Linux boxes)

-- the Condor pool at UNM (40 processors)

-- the Condor pool at Columbia (16 processors)

-- the Condor pool at Northwestern (12 processors)

-- the Condor pool at NCSA (65 processors)

-- the Condor pool at INFN (200 processors)

We will be using glide_in to access the Origin 2000 (through LSF) at NCSA.

We will use "hobble_in" to access the Chiba City Linux cluster and Origin 2000 here at Argonne.

NUG30 - Solved!!!

Sender: [email protected]
Subject: Re: Let the festivities begin.

Hi dear Condor Team,

you all have been amazing. NUG30 required 10.9 years of Condor Time. In just seven days !

More stats tomorrow !!! We are off celebrating !

condor rules !

cheers,

JP.
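
To put the figures in the message in perspective, here is a small Python sketch of the implied average parallelism. It assumes that "10.9 years of Condor Time" means accumulated CPU time and that a year is 365 days; the function name is made up for the example.

```python
DAYS_PER_YEAR = 365  # assumption: calendar year, no leap days


def average_busy_processors(cpu_years: float, wall_clock_days: float) -> float:
    """Average number of processors kept busy over the whole run."""
    if wall_clock_days <= 0:
        raise ValueError("wall_clock_days must be positive")
    return cpu_years * DAYS_PER_YEAR / wall_clock_days


if __name__ == "__main__":
    print(round(average_busy_processors(10.9, 7)))
```

Under these assumptions the run kept on the order of 570 processors busy on average for the full week.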

Conclusion

Computing power is everywhere, we try to make it usable by anyone.

Need more info?

› Condor Web Page (http://www.cs.wisc.edu/condor)

› Peter Couvares (pfc@cs.wisc.edu)