Introduction
IU has around 2000 Windows PCs in public Student Technology Centers
Condor is used to harvest unused cycles
Simple Message Brokering Library (SMBL) is used to parallelize applications on Windows
Web portal for user interaction
Project History
SETI@home was used as an initial test of Condor
SMBL was created to address the lack of a general-purpose parallel library on Windows that could tolerate sporadically available systems
FastDNAml was ported to SMBL
Web portal was created
Other applications were ported to SMBL (MEME, BLAST)
System Architecture
Condor “server” running on Linux
BLAST databases served via Samba on a second Linux machine
Apache/MySQL/PHP web portal
Windows “clients”
What is SMBL?
Simple Message Brokering Library
Open source (http://smbl.sf.net)
Uses a master/worker model
Process and Port Manager (PPM) manages SMBL servers and master processes
Number of master/foreman processes is different for each application
SMBL workers contact the SMBL master to get work
SMBL server terminates workers when they are no longer needed
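The pull-based master/worker pattern described above can be sketched in Python, with threads standing in for workers that Condor delivers to pool machines. This is a minimal sketch under stated assumptions, not SMBL's actual API: the names `master`, `worker` and the `STOP` sentinel are hypothetical, the squaring step is a placeholder for real work, and a real SMBL worker talks to the master over the network rather than through an in-process queue.

```python
import queue
import threading

STOP = None  # hypothetical sentinel: "no more work, terminate"


def worker(tasks: queue.Queue, results: queue.Queue) -> None:
    """Hypothetical worker: keeps asking the master for work until told to stop."""
    while True:
        task = tasks.get()               # contact the master to get work
        if task is STOP:                 # master has no work left: terminate
            break
        results.put((task, task * task))  # placeholder for the real computation


def master(work_items, n_workers: int = 4) -> dict:
    """Hypothetical master: queues all work, then one STOP per worker."""
    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    for item in work_items:
        tasks.put(item)
    for _ in range(n_workers):           # each worker exits after one STOP
        tasks.put(STOP)
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    out = {}
    while not results.empty():
        task, value = results.get()
        out[task] = value
    return out


if __name__ == "__main__":
    print(master(range(10)))
```

Calling `master(range(10))` returns a dict mapping each item to its square. Which worker handles which item is not deterministic, but the set of results is.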
Condor and SMBL
Condor is used as the scheduling and delivery system for SMBL workers
SMBL workers contact the SMBL server when they start running to begin receiving work
SMBL server separates the work to be done into smaller pieces depending on the number of workers
Work is redistributed if a worker is “lost”
SMBL server terminates workers when there is no work left
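One way to picture the splitting and redistribution steps is the following Python sketch. It assumes, hypothetically, that the work is a list of items cut into near-equal contiguous pieces, one per worker, and that the set of lost workers is already known. SMBL's real splitting policy and failure detection are not described in these slides, and `split_work` and `schedule` are illustrative names only.

```python
from collections import deque


def split_work(items: list, n_workers: int) -> list:
    """Split items into at most n_workers contiguous pieces of near-equal size."""
    k, r = divmod(len(items), n_workers)
    pieces, start = [], 0
    for i in range(n_workers):
        size = k + (1 if i < r else 0)  # first r pieces get one extra item
        if size:
            pieces.append(items[start:start + size])
        start += size
    return pieces


def schedule(items: list, workers: list, lost: set) -> dict:
    """Hand out pieces; a piece held by a lost worker is requeued and given
    to a surviving worker. Returns {worker: [pieces it completed]}."""
    pending = deque(split_work(items, len(workers)))
    survivors = [w for w in workers if w not in lost]
    if not survivors:
        raise RuntimeError("no surviving workers")
    completed = {w: [] for w in survivors}
    # First pass: every worker, including ones that will vanish, gets a piece.
    for w in workers:
        if not pending:
            break
        piece = pending.popleft()
        if w in lost:
            pending.append(piece)        # worker disappeared: requeue its piece
        else:
            completed[w].append(piece)
    # Redistribute leftover pieces round-robin over the surviving workers.
    i = 0
    while pending:
        completed[survivors[i % len(survivors)]].append(pending.popleft())
        i += 1
    return completed
```

With `schedule(list(range(10)), ["a", "b", "c"], lost={"b"})`, worker `b`'s piece `[4, 5, 6]` is requeued and finished by worker `a`, so every item is still processed exactly once.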
Applications using SMBL
FastDNAml – Generates phylogenetic trees from molecular data
MEME – Detects patterns in nucleotide and protein sequences
NCBI BLAST (blastall) – Queries molecular sequences against sequence databases
The Challenges of porting BLAST to SMBL
BLAST relies on the availability of large database files
Files are too large for efficient delivery via Condor
Local copies of databases on pool machines would be difficult to manage
Sharing DB files via Samba is the best solution
Samba was moved to a separate server to increase performance
The Challenges of porting BLAST to SMBL(cont.)
BLAST jobs take more time to complete than FastDNAml and MEME jobs
Disappearing worker problem
Pool machines would end up in the CLAIMED/IDLE state
The size of our Condor pool made the problem hard to track
Only jobs taking more than 30 minutes were affected
The problem was determined to be state-table “sessions” timing out on the machine room firewall
Machines were removed from the firewall and switched to a host-based iptables firewall
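The failure mode can be illustrated with a toy model of a stateful firewall that expires idle sessions. The 30-minute timeout is an assumption inferred from the observation that only jobs longer than 30 minutes were affected, and the model assumes the worker's connection to the SMBL server carries no traffic while BLAST runs. The class and function names are hypothetical and do not correspond to any real firewall's configuration.

```python
from dataclasses import dataclass, field


@dataclass
class StatefulFirewall:
    """Toy stateful firewall: forwards packets only for sessions in its state table."""
    idle_timeout: float                              # seconds of silence before expiry
    last_seen: dict = field(default_factory=dict)    # session -> time of last packet

    def open(self, session: str, now: float) -> None:
        """Record a newly established session (worker connecting to the server)."""
        self.last_seen[session] = now

    def packet(self, session: str, now: float) -> bool:
        """Return True if the packet is forwarded, False if it is dropped."""
        seen = self.last_seen.get(session)
        if seen is None or now - seen > self.idle_timeout:
            self.last_seen.pop(session, None)  # expired or unknown: forget and drop
            return False
        self.last_seen[session] = now          # traffic refreshes the idle timer
        return True


def worker_reports_back(job_minutes: float, timeout_minutes: float = 30) -> bool:
    """Assumes the worker is silent while computing, then sends its result."""
    fw = StatefulFirewall(idle_timeout=timeout_minutes * 60)
    fw.open("worker->smbl-server", now=0)
    return fw.packet("worker->smbl-server", now=job_minutes * 60)
```

Under these assumptions `worker_reports_back(10)` returns True and `worker_reports_back(45)` returns False: the worker finishes, but its session has already been dropped from the state table, so from the server's side the worker simply disappears.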
Web portal
Apache/MySQL/PHP based
Jobs are submitted via the portal ONLY
Condor submit files are dynamically generated based on user input
Status of jobs can be checked using the portal
Results are retrieved from the portal
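The portal itself is PHP, but the idea of generating a Condor submit file from user input can be sketched in Python. The keywords used (`universe`, `executable`, `arguments`, `log`, `output`, `error`, `should_transfer_files`, `when_to_transfer_output`, `queue`) are standard Condor submit-description commands. The function name, its parameters, the validation rules, and the example executable name and arguments are hypothetical and are not the portal's actual code.

```python
import re

# Hypothetical whitelist for job identifiers that end up in file names.
_JOB_ID = re.compile(r"[A-Za-z0-9_-]+")


def make_submit_file(executable: str, arguments: str, n_workers: int, job_id: str) -> str:
    """Build a Condor submit description that queues n_workers copies of a worker."""
    if not _JOB_ID.fullmatch(job_id):
        raise ValueError(f"invalid job id: {job_id!r}")
    # The submit file is line-oriented, so a newline could inject extra commands.
    if any(c in value for value in (executable, arguments) for c in "\r\n"):
        raise ValueError("newlines are not allowed in submit-file values")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        f"log = {job_id}.log",
        # $(Process) is expanded by Condor to 0, 1, ... for each queued copy.
        f"output = {job_id}.$(Process).out",
        f"error = {job_id}.$(Process).err",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        f"queue {n_workers}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_submit_file("smbl_worker.exe", "--server portal-host", 5, "job42"), end="")
```

Validating the job identifier and rejecting newlines matters because every value comes from a web form: without these checks, a user could smuggle additional submit commands into the generated file.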