Peter Keller
Computer Sciences Department
University of Wisconsin-Madison
[email protected]
http://www.cs.wisc.edu/condor
Quill Tutorial
Condor Week 2006
www.cs.wisc.edu/condor
What is Quill?
A non-invasive method of storing a read-only copy of the job queue and job history data in a relational database.
Why Do We Need It?
› Presents the job queue information as a set of tables in a relational database (Big Win!)
› Fault tolerance
› Provides performance enhancements in very large and busy pools
Job Queue Management
[Diagram: Without Quill, the schedd manages the Job Queue alone. With Quill, a quilld daemon and a Database sit alongside the schedd and its Job Queue.]
Deployment
› One Quill daemon per schedd
› Quill daemons must be uniquely named
› Each Quill daemon uses a unique DB name
› Multiple Quill daemons may utilize one database server
› Currently uses PostgreSQL
• PostgreSQL 8.1 or later is recommended for its automatic vacuuming of tables
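The deployment rules above can be checked mechanically before a pool is configured. What follows is a hypothetical Python model, not part of Condor: QuillDeployment and check_deployments are invented names, and the sketch assumes database names must be unique across the whole pool. It also applies the rule from the configuration notes that a Quill name may not reuse any schedd's name.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class QuillDeployment:
    """One schedd and its Quill daemon (hypothetical model, not a Condor API)."""
    schedd_name: str
    quill_name: str
    db_name: str
    db_server: str  # host:port; several Quill daemons may share one server


def check_deployments(deployments: Iterable[QuillDeployment]) -> List[str]:
    """Return a list of rule violations; an empty list means the layout passes."""
    deps = list(deployments)
    problems: List[str] = []

    # One Quill daemon per schedd.
    for schedd, count in Counter(d.schedd_name for d in deps).items():
        if count > 1:
            problems.append(f"schedd {schedd} has {count} Quill daemons")

    schedd_names = {d.schedd_name for d in deps}
    seen_quills: set = set()
    seen_dbs: set = set()
    for d in deps:
        # Quill names must be unique among Quill daemons...
        if d.quill_name in seen_quills:
            problems.append(f"duplicate Quill name: {d.quill_name}")
        # ...and must not reuse any schedd's name.
        if d.quill_name in schedd_names:
            problems.append(f"Quill name clashes with a schedd: {d.quill_name}")
        # Each Quill daemon needs its own database, even on a shared server.
        if d.db_name in seen_dbs:
            problems.append(f"duplicate database name: {d.db_name}")
        seen_quills.add(d.quill_name)
        seen_dbs.add(d.db_name)
    return problems
```

For example, two Quill daemons that both set QUILL_DB_NAME = psilord_db would be reported as a duplicate database name. Sharing one db_server between them is allowed.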
Condor’s Interface to Quill
› Two tools were modified to use the database:
• condor_q
• condor_history
› Very minor modifications to the schedd
› Multiple sources for the job queue and history pose an interesting problem: which source should a tool query?
Job Queue Discovery Sequence
(Local Query)
[Diagram: condor_q, the schedd with its Job Queue, the quilld, and the Database, with query steps numbered 1, 2, 3.]
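Job Queue Discovery Sequence
(Remote Query)
[Diagram: condor_q first contacts the collector (step 0), then proceeds through steps 1, 2, 3 involving the schedd with its Job Queue, the quilld, and the Database.]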
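To make the fallback idea concrete, here is a minimal Python model of a discovery sequence: try each source in order and use the first one that answers. The order rdbms, quilld, schedd used in the usage below is an assumption that mirrors the three -direct parameter values; the slides only number the steps. The collector lookup of a remote query (step 0) is not modeled, and all names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A source pairs a label with a query function that returns job records
# or raises if the source cannot answer (for example, the daemon is down).
Source = Tuple[str, Callable[[], list]]


def discover_job_queue(sources: Sequence[Source]) -> Tuple[str, list]:
    """Return (label, jobs) from the first source that answers, in order."""
    errors: List[str] = []
    for label, query in sources:
        try:
            return label, query()
        except Exception as exc:  # remember the failure and fall back
            errors.append(f"{label}: {exc}")
    raise RuntimeError("no source answered: " + "; ".join(errors))
```

A call such as discover_job_queue([("rdbms", q_db), ("quilld", q_quill), ("schedd", q_schedd)]) returns the schedd's answer only if the first two raise. Because one source can silently stand in for another, a way to bypass the sequence is useful when debugging.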
A User Perspective: condor_q
› condor_q changes
• -name takes a schedd name or a Quill name
• -avgqueuetime reports the average time in queue for all uncompleted jobs
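-avgqueuetime reports the mean time that uncompleted jobs have spent in the queue. The sketch below is a rough stand-alone equivalent in Python, assuming only each job's submission time is known. Quill presumably computes this inside the database instead. average_queue_time and format_hms are illustrative names.

```python
from datetime import datetime, timedelta
from typing import Iterable


def average_queue_time(submitted: Iterable[datetime], now: datetime) -> timedelta:
    """Mean time in queue for uncompleted jobs submitted at the given times."""
    waits = [now - t for t in submitted]
    if not waits:
        return timedelta(0)
    # timedelta / int yields a timedelta (rounded to microseconds).
    return sum(waits, timedelta(0)) / len(waits)


def format_hms(delta: timedelta) -> str:
    """Format as hh:mm:ss.ffffff, the layout of the sample condor_q output."""
    total_us = delta.days * 86_400_000_000 + delta.seconds * 1_000_000 + delta.microseconds
    hours, rem = divmod(total_us, 3_600_000_000)
    minutes, rem = divmod(rem, 60_000_000)
    seconds, micros = divmod(rem, 1_000_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{micros:06d}"
```

For two jobs waiting 30 and 50 minutes, average_queue_time returns 40 minutes, and format_hms renders it as 00:40:00.000000.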
A User Perspective: condor_q
Example: condor_q -name
Linux merlin > condor_q -name [email protected]
-- DB: [email protected] : <merlin.cs.wisc.edu:42999> : psilord_db
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
 92.0    psilord         4/21 09:21   0+00:00:00 I  0   9.8  foo

1 jobs; 1 idle, 0 running, 0 held
A User Perspective: condor_q
Example: condor_q -avgqueuetime
Linux merlin > condor_q -avgqueuetime
-- DB: [email protected] : <merlin.cs.wisc.edu:42999> : psilord_db
Average time in queue for uncompleted jobs (in hh:mm:ss)
00:40:47.011993
Job History Discovery Sequence
(Local Query)
[Diagram: condor_history, the quilld, the Database, the History File, and the Job Queue, with query steps numbered 1 and 2.]
The quilld is never queried directly!
Job History Discovery Sequence
(Remote Query) NEW!
[Diagram: condor_history, the collector, the quilld, the Database, the History File, and the Job Queue, with steps numbered 0 and 1.]
The quilld is never queried directly!
A User Perspective: condor_history
› condor_history changes
• -name takes a Quill name and retrieves job histories from that remote Quill's database
• -completedsince returns all jobs completed since a PostgreSQL-formatted date
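The -completedsince filter can be modeled as a comparison against a parsed timestamp. Several assumptions apply in this sketch:
• It accepts only the "YYYY-MM-DD HH:MM:SS" form used in the example, while PostgreSQL accepts many more.
• It treats the cutoff as inclusive.
• It represents each history record as a job id mapped to its completion time, or None for jobs that never completed.
• completed_since is a hypothetical name.

```python
from datetime import datetime
from typing import Dict, List, Optional


def completed_since(history: Dict[str, Optional[datetime]], since: str) -> List[str]:
    """Job ids whose completion time is at or after `since` (inclusive, assumed)."""
    cutoff = datetime.strptime(since, "%Y-%m-%d %H:%M:%S")
    # Jobs removed before completing (completion time None) never match.
    return [job for job, done in history.items() if done is not None and done >= cutoff]
```

Fed the three jobs from the example (91.0 and 92.0 removed, 93.0 completed on 4/21, with the year 2006 assumed) and the cutoff "2006-01-01 00:00:01", it returns only 93.0. This matches the sample output.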
A User Perspective: condor_history
Example: condor_history -name
Linux merlin > condor_history -name [email protected]
-- DB: [email protected] : <merlin.cs.wisc.edu:42999> : psilord_db
 ID      OWNER            SUBMITTED     RUN_TIME ST   COMPLETED CMD
 91.0    psilord         4/20 14:23   0+00:00:00 X   ???        /scratch/psilor
 92.0    psilord         4/21 09:21   0+00:00:00 X   ???        /scratch/psilor
 93.0    psilord         4/21 10:12   0+00:00:01 C   4/21 10:12 /scratch/psilor
A User Perspective: condor_history
Example: condor_history -completedsince
Linux merlin > condor_history -completedsince "2006-01-01 00:00:01"
-- DB: [email protected] : <merlin.cs.wisc.edu:42999> : psilord_db
 ID      OWNER            SUBMITTED     RUN_TIME ST   COMPLETED CMD
 93.0    psilord         4/21 10:12   0+00:00:01 C   4/21 10:12 /scratch/psilor
Short-Circuiting the Discovery Sequence
› Use the -direct option to query only the named source, with no fallback to the others!
› Examples
• condor_q -direct rdbms
• condor_q -direct quilld
• condor_q -direct schedd
› "rdbms", "quilld", and "schedd" are the literal parameter values.
› Invaluable for debugging!
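Because the -direct values are literal strings, a small helper that builds the command line can keep debugging scripts from mistyping them. It also prevents pasting a typographic dash in place of an ASCII hyphen. This is a hypothetical convenience wrapper: it only assembles the arguments shown on the slide and never runs condor_q itself.

```python
import shlex
from typing import List

# The literal values accepted by condor_q -direct, as listed on the slide.
DIRECT_SOURCES = ("rdbms", "quilld", "schedd")


def direct_query_argv(source: str) -> List[str]:
    """Build the argv for a condor_q query that skips the discovery sequence."""
    if source not in DIRECT_SOURCES:
        raise ValueError(f"unknown -direct source {source!r}; expected one of {DIRECT_SOURCES}")
    return ["condor_q", "-direct", source]  # ASCII hyphen, not a typographic dash


if __name__ == "__main__":
    # One command per source, so each can be run and checked in isolation.
    for src in DIRECT_SOURCES:
        print(shlex.join(direct_query_argv(src)))
```

Each printed line can be pasted into a shell to query one source in isolation. Alternatively, pass the list to subprocess.run.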
PostgreSQL 8.1 Installation
› ./configure
› gmake && gmake install
› mkdir /path/to/pgsql/data
› initdb -D /path/to/pgsql/data
› postmaster -D /path/to/pgsql/data
› Note: The default port is 5432.
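After starting postmaster, a quick TCP probe can confirm that something is accepting connections on the expected port. This Python sketch assumes the server runs on localhost with the default port 5432, and is_listening is a hypothetical name. A successful connection shows only that some process is listening, not that it is PostgreSQL or that Quill can log in.

```python
import socket


def is_listening(host: str = "localhost", port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if some process accepts TCP connections on host:port."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unresolvable
        return False


if __name__ == "__main__":
    print("postmaster reachable:", is_listening())
```

If the database runs on a non-default port, as in the later examples (42999), pass that port explicitly.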
PostgreSQL Configuration
› Add two special user accounts, quillreader and quillwriter:
• createuser quillreader --no-createdb --no-adduser --pwprompt
• createuser quillwriter --createdb --no-adduser --pwprompt
PostgreSQL Configuration (cont)
› Allow TCP/IP connections: edit postgresql.conf
• Add listen_addresses = '*'
› Allow connections from specific hosts: edit pg_hba.conf
• host all quillreader 128.105.0.0 255.255.0.0 password
• host all quillwriter 128.105.0.0 255.255.0.0 password
› Note: Use only 'password' authentication at this time.
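Each pg_hba.conf host line above pairs a network address with a netmask. Python's standard ipaddress module can check whether a given client falls inside such a range. allowed_by_hba is a hypothetical helper that models only the address columns, not the database, user, or authentication fields. The client addresses in the test are made up.

```python
import ipaddress


def allowed_by_hba(client_ip: str, network: str, netmask: str) -> bool:
    """Check whether client_ip falls inside a pg_hba.conf-style address/mask pair."""
    # IPv4Network accepts the "address/netmask" form, e.g. 128.105.0.0/255.255.0.0.
    net = ipaddress.IPv4Network(f"{network}/{netmask}")
    return ipaddress.IPv4Address(client_ip) in net
```

PostgreSQL uses the first pg_hba.conf line that matches a connection, so line order matters when ranges overlap.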
Quill Configuration
› Quill needs the quillwriter user's password to write to the database.
› Store it in a file called .quillwritepassword in the $(SPOOL) directory.
› If Condor runs as root, ensure that only the condor uid can read it.
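One way to create the password file with owner-only permissions is sketched below. The slides do not specify the file format, so the sketch assumes the file holds just the password with no trailing newline. It takes the spool directory as an argument, and write_quill_password is a hypothetical name. Changing ownership to the condor uid requires root, so that step is left out.

```python
import os


def write_quill_password(spool_dir: str, password: str) -> str:
    """Create <spool_dir>/.quillwritepassword readable and writable only by its owner."""
    path = os.path.join(spool_dir, ".quillwritepassword")
    # Creating the file with mode 0o600 avoids a moment where it is world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        # A pre-existing file keeps its old mode under O_CREAT, so tighten it explicitly.
        os.fchmod(f.fileno(), 0o600)
        f.write(password)  # assumed format: the bare password, no newline
    return path
```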
Quill Configuration (cont)
› Condor system-specific attributes in file condor_config.local
  QUILL = $(SBIN)/condor_quill
  QUILL_LOG = $(LOG)/QuillLog
  QUILL_ADDRESS_FILE = $(LOG)/.quill_address
  DAEMON_LIST = …, QUILL
  VALID_SPOOL_FILES = …, .quillwritepassword
  DC_DAEMON_LIST = …, QUILL
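The $(NAME) references in these settings are macros that Condor expands from other configuration values. The sketch below is a simplified Python model of that substitution, not Condor's parser:
• It ignores case-insensitivity and other macro forms.
• It expands undefined names to an empty string, which is an assumption here.
• The directory paths in the test are made up.

```python
import re
from typing import Dict

MACRO = re.compile(r"\$\(([A-Za-z0-9_]+)\)")


def expand(value: str, config: Dict[str, str], depth: int = 0) -> str:
    """Recursively replace $(NAME) references with values from config."""
    if depth > 20:
        raise ValueError("macro expansion too deep (circular reference?)")

    def substitute(match):
        # Expand the referenced value too, so $(QUILL) -> $(SBIN)/... -> /path/...
        return expand(config.get(match.group(1), ""), config, depth + 1)

    return MACRO.sub(substitute, value)
```

With SBIN set to a hypothetical /opt/condor/sbin, expand("$(QUILL)", config) resolves through QUILL to /opt/condor/sbin/condor_quill.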
Quill Configuration (cont)
› Quill-specific attributes
  QUILL_ENABLED = TRUE
  # The quill name must be unique across all
  # quill daemons AND schedds
  QUILL_NAME = [email protected]
  QUILL_DB_NAME = psilord_db
  QUILL_DB_IP_ADDR = merlin.cs.wisc.edu:42999
  QUILL_POLLING_PERIOD = 10 (seconds)
Quill Configuration (cont)
› QUILL_HISTORY_CLEANING_INTERVAL = 24 (hours)
› QUILL_HISTORY_DURATION = 30 (days)
› QUILL_MANAGE_VACUUM = FALSE
› QUILL_IS_REMOTELY_QUERYABLE = TRUE
› QUILL_DB_QUERY_PASSWD = xxx
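Read as a cleaning period and a retention period, QUILL_HISTORY_CLEANING_INTERVAL and QUILL_HISTORY_DURATION bound how long job history stays in the database. A minimal Python model of one cleaning pass follows. It assumes a record's age is measured from the time it entered the history, and history_to_purge is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def history_to_purge(recorded: Dict[str, datetime], now: datetime,
                     duration_days: int = 30) -> List[str]:
    """Job ids whose history entry is older than duration_days (QUILL_HISTORY_DURATION)."""
    cutoff = now - timedelta(days=duration_days)
    return [job for job, when in recorded.items() if when < cutoff]


if __name__ == "__main__":
    # Hypothetical data; a real cleaning pass would repeat every
    # QUILL_HISTORY_CLEANING_INTERVAL hours (24 on the slide).
    now = datetime(2006, 4, 21, 12, 0)
    history = {"1.0": datetime(2006, 3, 1), "93.0": datetime(2006, 4, 21, 10, 12)}
    print(history_to_purge(history, now))  # prints ['1.0']
```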
DB Storage Method
› Schema designed to store and query ClassAds
• 4 tables represent the job queue ClassAds
• 2 tables hold history data
• 1 table holds metadata
› Some queries are easier than others
› Ask more questions at the BOF!
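Every ClassAd can carry a different set of attributes, so one common relational layout stores one row per (job, attribute, value). The following sketch uses Python's built-in sqlite3 in place of PostgreSQL. Its table and column names are invented, so it is not Quill's actual schema. It shows why some queries are easier than others: filtering on one attribute is a single-table query, while combining attributes needs a self-join per extra attribute.

```python
import sqlite3

# Illustrative "vertical" layout: one row per (job, attribute, value).
# Table and column names are invented; this is not Quill's actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job_attrs (
        job_id TEXT NOT NULL,
        attr   TEXT NOT NULL,
        value  TEXT,
        PRIMARY KEY (job_id, attr)
    );
""")
rows = [
    ("92.0", "Owner", "psilord"), ("92.0", "JobStatus", "1"), ("92.0", "Cmd", "foo"),
    ("93.0", "Owner", "psilord"), ("93.0", "JobStatus", "2"), ("93.0", "Cmd", "bar"),
]
conn.executemany("INSERT INTO job_attrs VALUES (?, ?, ?)", rows)

# Easy: filtering on a single attribute (JobStatus 1 = idle) touches one row per job.
idle = [r[0] for r in conn.execute(
    "SELECT job_id FROM job_attrs WHERE attr = 'JobStatus' AND value = '1'")]

# Harder: combining attributes needs a self-join for each additional attribute.
owner_cmd = conn.execute("""
    SELECT o.job_id, o.value, c.value
    FROM job_attrs o JOIN job_attrs c ON o.job_id = c.job_id
    WHERE o.attr = 'Owner' AND c.attr = 'Cmd'
    ORDER BY o.job_id
""").fetchall()
```

The job 93.0 row and its command "bar" are made-up sample data.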
Thank you!
› Want more information?
› BOF “Databases in Condor: Now and in the Future”