Quill Tutorial
Condor Week 2006

Peter Keller
Computer Sciences Department
University of Wisconsin-Madison
psilord@cs.wisc.edu
http://www.cs.wisc.edu/condor



What is Quill?

A non-invasive method of storing a read-only version of the job queue and job history data in a relational database.


Why Do We Need It?

› Presents the job queue information as a set of tables in a relational database (Big Win!)

› Fault tolerance

› Provides performance enhancements in very large and busy pools


Job Queue Management

[Figure: two side-by-side panels. Without Quill: schedd and Job Queue. With Quill: schedd, Job Queue, quilld, and Database.]


Deployment

› One Quill daemon per schedd

› Quill daemons must be uniquely named

› Each Quill daemon uses a unique DB name

› Multiple Quill daemons may utilize one database server

› Currently uses PostgreSQL
  • Recommend PostgreSQL 8.1 or later for automatic vacuuming of tables
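
The naming rules above can be checked mechanically before a pool is deployed. The following is a minimal sketch, not part of Condor or Quill: the function names, the dictionary keys, and the sample daemon and schedd names are all hypothetical. It also treats schedd names as part of the same namespace as Quill names, as the configuration slides require.

```python
from collections import Counter


def find_duplicates(values):
    """Return the values that occur more than once, sorted."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n > 1)


def check_deployment(quill_daemons, schedd_names):
    """Report name clashes in a planned deployment.

    quill_daemons: list of dicts with the hypothetical keys
    'quill_name' and 'db_name'.
    schedd_names: list of schedd names in the pool.
    """
    # Quill names must not collide with each other or with schedd names.
    names = [d["quill_name"] for d in quill_daemons] + list(schedd_names)
    # Each Quill daemon needs its own database name.
    db_names = [d["db_name"] for d in quill_daemons]
    return {
        "duplicate_names": find_duplicates(names),
        "duplicate_db_names": find_duplicates(db_names),
    }


if __name__ == "__main__":
    # Hypothetical pool: two Quill daemons sharing one database server.
    daemons = [
        {"quill_name": "quill@hostA", "db_name": "hosta_db"},
        {"quill_name": "quill@hostB", "db_name": "hosta_db"},  # clash
    ]
    print(check_deployment(daemons, ["schedd@hostA", "schedd@hostB"]))
```

An empty list under both keys means the plan satisfies the uniqueness rules; sharing one database server is fine as long as the database names differ.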


Condor’s Interface to Quill

› Modified two tools to utilize the DB
  • condor_q
  • condor_history

› Very minor modifications to schedd

› Multiple sources for Job Queue & History pose an interesting problem


Job Queue Discovery Sequence (Local Query)

[Figure: condor_q, schedd, Job Queue, quilld, and Database, with the discovery steps numbered 1, 2, 3.]


Job Queue Discovery Sequence (Remote Query)

[Figure: condor_q, collector, schedd, Job Queue, quilld, and Database, with the discovery steps numbered 0, 1, 2, 3.]


A User Perspective: condor_q

› condor_q changes
  • -name takes a ScheddName or QuillName
  • -avgqueuetime details average time in queue for all jobs


A User Perspective: condor_q
Example: condor_q -name

Linux merlin > condor_q -name [email protected]

-- DB: [email protected] : <merlin.cs.wisc.edu:42999> : psilord_db
ID    OWNER    SUBMITTED    RUN_TIME    ST  PRI  SIZE  CMD
92.0  psilord  4/21 09:21   0+00:00:00  I   0    9.8   foo

1 jobs; 1 idle, 0 running, 0 held


A User Perspective
Example: condor_q -avgqueuetime

Linux merlin > condor_q -avgqueuetime

-- DB: [email protected] : <merlin.cs.wisc.edu:42999> : psilord_db

Average time in queue for uncompleted jobs (in hh:mm:ss)
00:40:47.011993


Job History Discovery Sequence (Local Query)

[Figure: condor_history, Job Queue, quilld, Database, and History File, with the discovery steps numbered 1, 2.]

The quilld is never queried directly!


Job History Discovery (Remote Query) NEW!

[Figure: condor_history, collector, Job Queue, quilld, Database, and History File, with the discovery steps numbered 0, 1.]

The quilld is never queried directly!


A User Perspective: condor_history

› condor_history changes
  • -name takes a Quill Name to retrieve job histories from a remote quill’s database
  • -completedsince returns all jobs completed since a PostgreSQL-formatted date


A User Perspective: condor_history
Example: condor_history -name

Linux merlin > condor_history -name [email protected]

-- DB: [email protected] : <merlin.cs.wisc.edu:42999> : psilord_db
ID    OWNER    SUBMITTED    RUN_TIME    ST  COMPLETED    CMD
91.0  psilord  4/20 14:23   0+00:00:00  X   ???          /scratch/psilor
92.0  psilord  4/21 09:21   0+00:00:00  X   ???          /scratch/psilor
93.0  psilord  4/21 10:12   0+00:00:01  C   4/21 10:12   /scratch/psilor


A User Perspective: condor_history
Example: condor_history -completedsince

Linux merlin > condor_history -completedsince "2006-01-01 00:00:01"

-- DB: [email protected] : <merlin.cs.wisc.edu:42999> : psilord_db
ID    OWNER    SUBMITTED    RUN_TIME    ST  COMPLETED    CMD
93.0  psilord  4/21 10:12   0+00:00:01  C   4/21 10:12   /scratch/psilor
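
A script that calls condor_history -completedsince has to produce the date string itself. The following is a minimal sketch, assuming the "YYYY-MM-DD HH:MM:SS" form used in the example above is what is wanted; the helper name and the seven-day window are hypothetical.

```python
from datetime import datetime, timedelta


def completedsince_arg(now, days_back):
    """Return a 'YYYY-MM-DD HH:MM:SS' string for days_back days before now."""
    cutoff = now - timedelta(days=days_back)
    return cutoff.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    # Build the command line for "jobs completed in the last 7 days".
    stamp = completedsince_arg(datetime.now(), days_back=7)
    print(f'condor_history -completedsince "{stamp}"')
```

The sketch only prints the command; the quotes matter because the timestamp contains a space.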


Short Circuiting the Discovery Sequence

› Use the -direct option!

› Examples
  • condor_q -direct rdbms
  • condor_q -direct quilld
  • condor_q -direct schedd

› “rdbms”, “quilld”, and “schedd” are the actual parameters.

› Invaluable for debugging!


PostgreSQL 8.1 Installation

› ./configure

› gmake && gmake install

› mkdir /path/to/pgsql/data

› initdb -D /path/to/pgsql/data

› postmaster -D /path/to/pgsql/data

› Note: Default port binding is 5432.


PostgreSQL Configuration

› Add two special user accounts: quillreader and quillwriter
  • createuser quillreader --no-createdb --no-adduser --pwprompt
  • createuser quillwriter --createdb --no-adduser --pwprompt


PostgreSQL Configuration (cont)

› Allow TCP/IP connections
  • Edit file postgresql.conf
    • Add listen_addresses = '*'

› Allow connections from specific hosts
  • Edit file pg_hba.conf
    • host all quillreader 128.105.0.0 255.255.0.0 password
    • host all quillwriter 128.105.0.0 255.255.0.0 password

› Note: only use ‘password’ authentication at this time.
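
Each pg_hba.conf line above pairs an IP address with a netmask, and a client matches when its address falls inside that network. The following is a minimal sketch of that address test using Python's standard ipaddress module. The function name and the sample client addresses are made up for illustration, and the sketch models only the address match, not PostgreSQL's full rule processing.

```python
import ipaddress


def allowed_by_rule(client_ip, address, mask):
    """Return True if client_ip lies in the network given by address/mask."""
    # ip_network accepts the dotted netmask form, e.g. 128.105.0.0/255.255.0.0
    network = ipaddress.ip_network(f"{address}/{mask}")
    return ipaddress.ip_address(client_ip) in network


if __name__ == "__main__":
    # Address and mask copied from the pg_hba.conf lines on this slide.
    for client in ("128.105.14.7", "128.104.1.1"):
        ok = allowed_by_rule(client, "128.105.0.0", "255.255.0.0")
        print(client, "matches" if ok else "does not match")
```

With a 255.255.0.0 mask only the first two octets are compared, so any host in 128.105.x.x can reach the database as quillreader or quillwriter, subject to the password check.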


Quill Configuration

› User quillwriter needs a write password.

› Store it in a file called .quillwritepassword in the $(SPOOL) directory.

› Ensure only the condor uid can read it if Condor is running as root
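
One way to create the password file with restrictive permissions is sketched below. Assumptions: the spool path is passed in by the caller, the file holds the password and nothing else (the slides do not specify the exact file format), and the script runs as the account that should own the file. The helper name is hypothetical.

```python
import os


def write_quill_password(spool_dir, password):
    """Create .quillwritepassword in spool_dir, readable by its owner only."""
    path = os.path.join(spool_dir, ".quillwritepassword")
    # Create with mode 0600 so a new file is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # The mode given to os.open does not apply to a pre-existing file,
        # so set the permissions explicitly as well.
        os.fchmod(fd, 0o600)
        os.write(fd, password.encode("utf-8"))
    finally:
        os.close(fd)
    return path


if __name__ == "__main__":
    import tempfile

    # A temporary directory stands in for $(SPOOL) in this demonstration.
    with tempfile.TemporaryDirectory() as spool:
        created = write_quill_password(spool, "example-password")
        print(created, oct(os.stat(created).st_mode & 0o777))
```

If Condor runs as root, the file would additionally need to be owned by the condor uid, for example via os.chown; that step is left out here because it requires root.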


Quill Configuration (cont)

› Condor system specific attributes in file condor_config.local
  QUILL = $(SBIN)/condor_quill
  QUILL_LOG = $(LOG)/QuillLog
  QUILL_ADDRESS_FILE = $(LOG)/.quill_address
  DAEMON_LIST = …, QUILL
  VALID_SPOOL_FILES = …, .quillwritepassword
  DC_DAEMON_LIST = …, QUILL


Quill Configuration (cont)

› Quill specific attributes
  QUILL_ENABLED = TRUE
  # The quill name must be unique across all
  # quill daemons AND schedds
  QUILL_NAME = [email protected]
  QUILL_DB_NAME = psilord_db
  QUILL_DB_IP_ADDR = merlin.cs.wisc.edu:42999
  QUILL_POLLING_PERIOD = 10 (seconds)


Quill Configuration (cont)

› QUILL_HISTORY_CLEANING_INTERVAL = 24 (hours)

› QUILL_HISTORY_DURATION = 30 (days)

› QUILL_MANAGE_VACUUM = FALSE

› QUILL_IS_REMOTELY_QUERYABLE = TRUE

› QUILL_DB_QUERY_PASSWD = xxx
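
The two history settings can be read together. The sketch below assumes, going by the parameter names and units alone, that a cleaning pass runs every QUILL_HISTORY_CLEANING_INTERVAL hours and drops history older than QUILL_HISTORY_DURATION days; the slides do not spell out these semantics. The parser and helper names are hypothetical, and the parser handles only plain NAME = value lines, not Condor's full configuration syntax.

```python
from datetime import datetime, timedelta

# Values copied from the slide, without the "(hours)" / "(days)" notes.
SAMPLE_CONFIG = """
# Quill history settings
QUILL_HISTORY_CLEANING_INTERVAL = 24
QUILL_HISTORY_DURATION = 30
"""


def parse_settings(text):
    """Parse NAME = value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip()
    return settings


def history_cutoff(settings, now):
    """Oldest completion time still kept, under the assumed semantics."""
    days = int(settings["QUILL_HISTORY_DURATION"])
    return now - timedelta(days=days)


def next_cleaning(settings, last_cleaning):
    """Time of the next cleaning pass after last_cleaning."""
    hours = int(settings["QUILL_HISTORY_CLEANING_INTERVAL"])
    return last_cleaning + timedelta(hours=hours)


if __name__ == "__main__":
    cfg = parse_settings(SAMPLE_CONFIG)
    now = datetime(2006, 4, 21, 9, 0)
    print("keep history since:", history_cutoff(cfg, now))
    print("next cleaning pass:", next_cleaning(cfg, now))
```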


DB Storage Method

› Schema designed to store and query classads
  • 4 tables to represent the job queue classads
  • 2 for history data
  • 1 for metadata

› Some queries are easier than others

› Ask more questions at the BOF!
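
Why some queries are easier than others depends on how classads map onto tables. The sketch below assumes, for illustration only, an attribute-per-row layout in which each classad attribute becomes one row. The table and column names are hypothetical and are not Quill's actual schema, and SQLite stands in for PostgreSQL so that the example runs without a server. Looking up one attribute of one job is a single-table query, while filtering on two attributes at once needs a self-join.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical attribute-per-row table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_attrs ("
        "cluster_id INTEGER, proc_id INTEGER, attr TEXT, val TEXT)"
    )
    rows = [
        (92, 0, "Owner", "psilord"),
        (92, 0, "JobStatus", "1"),  # 1 = idle in Condor's numbering
        (93, 0, "Owner", "psilord"),
        (93, 0, "JobStatus", "2"),  # 2 = running
    ]
    conn.executemany("INSERT INTO job_attrs VALUES (?, ?, ?, ?)", rows)
    return conn


def owner_of(conn, cluster_id, proc_id):
    """Easy case: one attribute of one job, one table, no join."""
    row = conn.execute(
        "SELECT val FROM job_attrs "
        "WHERE cluster_id = ? AND proc_id = ? AND attr = 'Owner'",
        (cluster_id, proc_id),
    ).fetchone()
    return row[0] if row else None


def idle_jobs_of(conn, owner):
    """Harder case: two attributes of the same job require a self-join."""
    return conn.execute(
        "SELECT o.cluster_id, o.proc_id "
        "FROM job_attrs o JOIN job_attrs s "
        "ON o.cluster_id = s.cluster_id AND o.proc_id = s.proc_id "
        "WHERE o.attr = 'Owner' AND o.val = ? "
        "AND s.attr = 'JobStatus' AND s.val = '1' "
        "ORDER BY o.cluster_id, o.proc_id",
        (owner,),
    ).fetchall()


if __name__ == "__main__":
    db = build_db()
    print(owner_of(db, 92, 0))
    print(idle_jobs_of(db, "psilord"))
```

Each additional attribute in a filter adds another join under this layout, which is one plausible reading of why query difficulty varies.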


Thank you!

› Want more information?

› BOF “Databases in Condor: Now and in the Future”