NSF Site Visit 2-23-2006 HYDRA Using Windows Desktop Systems in Distributed Parallel Computing


Introduction…

• Windows desktop systems at IUB student labs
– 2300 systems, 3 year replacement cycle
– Pentium IV (>=1.6 GHz), 256/512/1024 MB memory, 10/100 Mbps/GigE, Windows XP
– More than 1.5 TF


Possibly Utilize Idle Cycles?

Figure legend: Red = total owner, Blue = total idle, Green = total Condor


Problem Description

• Once again... Windows desktop systems at IUB student labs:

– As a scientific resource

– Harvest idle cycles


Constraints

• Systems are dedicated to students using desktop office applications, not to parallel scientific computing, so their availability is unpredictable and sporadic

• Microsoft Windows environment

• Daily software rebuild (updates)


What could these systems be used for?

• Many small computations and a few small messages
– Foreman-worker
– Parameter studies
– Monte Carlo
• Goal: High Throughput Computing (not HPC)
– Parallel runs of the aforementioned small computations to make better use of resource
– Parallel libraries (MPI, PVM, etc.) have constraints if availability of resources is ephemeral, i.e. not predictable
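The foreman-worker pattern on unreliable machines can be sketched roughly as follows. This is a minimal single-process simulation, not Hydra's code: the function names, the `loss_rate` parameter, and the choice of a Monte Carlo estimate of pi as the workload are all assumptions made for illustration. The point it shows is that small independent tasks can simply be re-queued when a worker disappears.

```python
import random
from collections import deque

def estimate_pi_chunk(seed, samples=10_000):
    """One small, independent task: count random points inside the unit quarter circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def run_foreman(num_tasks, samples=10_000, loss_rate=0.3, seed=0):
    """Hand out tasks one by one and re-queue any task whose worker vanished.

    loss_rate is a hypothetical stand-in for a student reclaiming a desktop
    before the worker reports its result.
    """
    failures = random.Random(seed)
    pending = deque(range(num_tasks))
    results = {}
    attempts = 0
    while pending:
        task = pending.popleft()
        attempts += 1
        if failures.random() < loss_rate:
            pending.append(task)  # worker lost: the task goes back in the queue
            continue
        results[task] = estimate_pi_chunk(task, samples)
    total_hits = sum(results.values())
    return 4.0 * total_hits / (num_tasks * samples), attempts
```

Because each task depends only on its own seed, the final estimate is the same no matter how many workers were lost along the way; only the number of attempts changes.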


Solution

• Simple Message Brokering Library (SMBL)
– Limited replacement for MPI
  • Both server and client library based on TCP socket abstraction
– Porting from MPI is fairly straightforward
• Process and Port Manager (PPM)
• Plus …
– Condor for job management, file transfer, no checkpointing or parallelism
– Web portal for job submission


The Big Picture

We’ll discuss each part in more detail next…

The shaded box indicates components hosted on multiple desktop computers


SMBL (Server)

• SMBL server maintains a dynamic pool of client process connections

• Worker job manager hides details of ephemeral workers at the application level

SMBL Rank      Condor Assigned Node
0 (Foreman)    Wrubel Computing Center, sacramento
1              Chemistry Student Lab, computer_14
2              CS Student Lab, computer_8
3              Library, computer_6

SMBL Server Process Table for 4 CPU parallel session


SMBL Rank      Condor Assigned Node
0 (Foreman)    Wrubel Computing Center, sacramento
1              Chemistry Student Lab, computer_14
2              Physics Student Lab, computer_11
3              Library, computer_6

SMBL Server Process Table for 4 CPU parallel session (rank 2 now held by a different node)
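One way to picture the dynamic pool is a table that keeps ranks stable while the node behind a rank changes. The sketch below is only an illustration of that idea: the `RankTable` class and its methods are hypothetical names, and the slides do not describe how the SMBL server actually stores or reassigns ranks.

```python
class RankTable:
    """Maps stable ranks to whichever node currently holds them (illustrative only)."""

    def __init__(self, size):
        # Index is the rank; value is the node name, or None if the rank is vacant.
        self.nodes = [None] * size

    def join(self, node):
        """Give a newly connected node the lowest vacant rank; return it, or None if full."""
        for rank, current in enumerate(self.nodes):
            if current is None:
                self.nodes[rank] = node
                return rank
        return None

    def leave(self, node):
        """Vacate the rank held by a node that disconnected; return that rank."""
        rank = self.nodes.index(node)
        self.nodes[rank] = None
        return rank


table = RankTable(4)
for name in ["Wrubel Computing Center, sacramento",
             "Chemistry Student Lab, computer_14",
             "CS Student Lab, computer_8",
             "Library, computer_6"]:
    table.join(name)

# The CS lab machine goes away and a Physics lab machine takes over its rank.
freed = table.leave("CS Student Lab, computer_8")
reused = table.join("Physics Student Lab, computer_11")
```

In this sketch the application keeps addressing rank 2 throughout; only the table entry behind it changes.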


SMBL (Client)

• Client library implements selected MPI-like calls
– MPI_Send () -> SMBL_Send ()
– MPI_Recv () -> SMBL_Recv ()
• In charge of message delivery for each parallel process
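A send/receive pair layered on a TCP-style socket might look something like the sketch below. It is not SMBL's wire format or API: the helper names `send_msg` and `recv_msg`, the header layout (source rank, tag, payload length), and the use of Python rather than the library's own language are all assumptions chosen to illustrate length-prefixed framing over a stream socket.

```python
import socket
import struct

# Hypothetical header: source rank, message tag, payload length (network byte order).
HEADER = struct.Struct("!iiI")

def send_msg(sock, source, tag, payload):
    """Send one framed message: fixed-size header followed by the payload bytes."""
    sock.sendall(HEADER.pack(source, tag, len(payload)) + payload)

def _recv_exact(sock, n):
    """Read exactly n bytes, since a stream socket may return partial data."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def recv_msg(sock):
    """Receive one framed message and return (source, tag, payload)."""
    source, tag, length = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return source, tag, _recv_exact(sock, length)


# Demo on a connected socket pair standing in for a worker-to-server connection.
worker_end, server_end = socket.socketpair()
send_msg(worker_end, 1, 7, b"partial result")
received = recv_msg(server_end)
worker_end.close()
server_end.close()
```

The explicit length in the header is what lets the receiver find message boundaries, since TCP delivers a byte stream rather than discrete messages.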


Process and Port Manager (PPM)

• Starts the SMBL server and application processes on demand
• Assigns port/host to each parallel session
• Directs workers to their servers


PPM (cont’d ...)

PPM with two SMBL servers (two parallel sessions)

Parallel Session 1

SMBL Rank      Condor Assigned Node
0 (Foreman)    Wrubel Computing Center, sacramento
1              Chemistry Student Lab, computer_14
2              CS Student Lab, computer_8
3              Wells Library, computer_6

Parallel Session 2

SMBL Rank      Condor Assigned Node
0 (Foreman)    Wrubel Computing Center, sacramento
1              Wells Library, computer_27
2              Biology Student Lab, computer_4
3              CS Student Lab, computer_2
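The bookkeeping side of a port manager with several concurrent sessions can be sketched as below. This is a hypothetical illustration only: the `PortManager` class, its method names, the host name, and the starting port number are invented for the example, and the sketch does not launch any processes as the real PPM is described as doing.

```python
class PortManager:
    """Tracks one (host, port) endpoint per parallel session (illustrative only)."""

    def __init__(self, host, first_port):
        self.host = host
        self.next_port = first_port
        self.sessions = {}  # session id -> (host, port) of that session's server

    def start_session(self, session_id):
        """Reserve an endpoint for a new session; repeated calls return the same one."""
        if session_id not in self.sessions:
            self.sessions[session_id] = (self.host, self.next_port)
            self.next_port += 1
        return self.sessions[session_id]

    def lookup(self, session_id):
        """Tell a worker which server endpoint belongs to its session."""
        return self.sessions[session_id]


ppm = PortManager("server.example", 5000)  # hypothetical host and port
first = ppm.start_session("session-1")
second = ppm.start_session("session-2")
```

Each session gets its own endpoint, so a worker assigned to session 2 is directed to a different server than a worker assigned to session 1, even though both foremen run on the same host.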


Once again … the big picture

The shaded box indicates components hosted on multiple desktop computers


Recent Development

• Hydra cluster Teragrid enabled! (Nov 2005)
– Allow TG users to use resource
– Virtual Host based solution: two different URLs for IU and Teragrid users
– Teragrid users authenticate against PSC’s Kerberos server


System Layout

• PPM, SMBL server, Condor and web portal running on Linux server
– Dual Intel Xeon 3.0 GHz, 4 GB memory, GigE
• Second Linux server running Samba to serve BLAST database


Portal

• Creates and submits Condor files, handles data files
• Apache/PHP based
• Kerberos authentication
• URLs:
– http://hydra.indiana.edu (IU users)
– http://hydra.iu.teragrid.org (Teragrid users)
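As a rough idea of what creating a Condor file involves, the sketch below renders a small submit description from a few job parameters. It is an assumption-laden illustration: the portal itself is described as Apache/PHP based, Python is used here only for brevity, the function name and all file names are hypothetical, and the set of submit commands Hydra's portal actually writes is not given in the slides.

```python
def render_submit_file(executable, arguments, input_files, num_workers):
    """Build the text of a Condor submit description for num_workers identical jobs."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        "output = worker.$(Process).out",  # one output file per queued job
        "error = worker.$(Process).err",
        "log = session.log",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
    ]
    if input_files:
        lines.append("transfer_input_files = " + ", ".join(input_files))
    lines.append(f"queue {num_workers}")
    return "\n".join(lines) + "\n"


# Hypothetical job: three workers sharing one parameter file.
submit_text = render_submit_file("worker.exe", "--session 1", ["params.txt"], 3)
```

The resulting text would then be written to a file and handed to `condor_submit`; that step is left out here.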


Utilization of Idle Cycles

Figure legend: Red = total owner, Blue = total idle, Green = total Condor


Summary

• Large parallel computing facility created at a low cost
– SMBL: parallel message passing library that can deal with ephemeral resources
– PPM: port broker that can handle multiple parallel sessions
• SMBL Homepage
– http://smbl.sourceforge.net (Open Source)


Links and References

• Hydra Portal
– http://hydra.indiana.edu (IU users)
– http://hydra.iu.teragrid.org (Teragrid users)
• SMBL home page: http://smbl.sourceforge.net
• Condor home page: http://www.cs.wisc.edu/condor/
• IU Teragrid home page: http://iu.teragrid.org


Links and References (cont’d..)

• Parallel FastDNAml: http://www.indiana.edu/~rac/hpc/fastDNAml
• Blast: http://www.ncbi.nlm.nih.gov/BLAST
• Meme: http://meme.sdsc.edu/meme/intro.html
