SGE Training
NASA LaRC ASDC
Delivered May 5, 6, 7, 2009
Chris Dwan, Bioteam
Bioteam Inc.
Independent Consulting Shop
Vendor/technology agnostic
Staffed by:
  Scientists forced to learn High Performance IT
  Many years of industry & academic experience
Our specialty: Bridging the gap between Science & IT
Session Goals
Interactive / Small Group Goals
1 - 2 hours
1 - 5 people
Users log into systems.
Users type examples, run jobs.
If code is available, bring it.
If specific use cases exist, bring them.
Selected ASDC Systems
Apple Cluster
  Online and in use at SCF since 2007
  ~40 dual processor OS X systems (80+ CPUs)
  Access through manila and corregidor
Magneto
  ~28 quad core Linux servers (100+ CPUs)
  Online and in production use since 2006
New Magneto (ORR May 15)
  Large, mixed purpose Linux cluster / file store
  176 CPUs dedicated to SCF
  576 CPUs dedicated to production
  Disk based archive: 1.1PB
NASA LaRC Science Directorate
Picture taken 9/2/08
1.2PB usable space
Fibre connected (384+ fibre ports)
2,560 individual disk drives
  16 disks per chassis
  10 chassis per rack
  16 racks of disks
IBM Linux servers, mixed P6 and x86 CPUs to support legacy codes
Filesystem: IBM GPFS
Interactive hosts
Please do not copy, put online or redistribute [email protected]
Most “grids” look like this on paper…
Private Network
Local Area Network
Portal node(s)
Dedicated File services
Compute Nodes
… and in reality:
Sun Grid Engine History
http://blogs.sun.com/templedf/entry/a_little_history_lesson
1996: Codine 4.02, Grid Resource Director (GRD) 1.0
2000: SGE 5.2. Sun acquires Gridware Inc.
2001: SGE 5.3. Sun releases source code. Last version called GRD
2004: SGE(EE) vs. SGE, N1GE vs. SGE
Sun Grid Engine References
http://gridengine.sunsource.net/
  Generally, the user manuals are awful
http://gridengine.info/
  Very useful blog run by Chris Dagdigian
My slides / examples are going to be online in-house.
Deep, in house expertise.
Compute Farm Logical View
Cluster Network
User 1 ... User N
Distributed Resource Manager
Grid Engine does the following:
Accepts work requests (jobs) from users
Puts jobs in a pending area
Sends jobs from the pending area to the best available machine
Manages the job while it runs
Returns results and logs accounting data when the job is finished
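The accept / pend / dispatch cycle above can be pictured with a toy model. This is only a sketch for intuition: the "most free slots wins" rule, the host names and the job names are assumptions made up for illustration, and this is not how the real Grid Engine scheduler ranks machines.

```python
from collections import deque


def dispatch(pending, free_slots):
    """Toy dispatcher: move jobs from the pending area to hosts.

    "Best available machine" is modelled here as "host with the most
    free slots", which is an illustrative assumption only.
    """
    running = []
    while pending and free_slots:
        host = max(free_slots, key=free_slots.get)
        if free_slots[host] == 0:
            break  # no free slots anywhere; the rest stay pending
        job = pending.popleft()
        free_slots[host] -= 1
        running.append((job, host))
    return running


# Hypothetical jobs and hosts.
pending = deque(["job1", "job2", "job3", "job4"])
free_slots = {"node01": 2, "node02": 1}

running = dispatch(pending, free_slots)
print("running:", running)
print("still pending:", list(pending))
```

With three free slots and four jobs, one job stays in the pending area until a slot frees up, which is the behaviour users see as a job in the "qw" state.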
Huh? What you need to know:
Don’t worry about queues or specific machines.
All you need to do when submitting a job is describe the resources your job will need to run successfully.
Grid Engine will take care of the rest.
The ‘default’ settings are good enough for most cases.
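"Describe the resources" usually means passing `-l name=value` requests to `qsub`. The sketch below only builds and prints the command line; nothing is submitted. The helper `build_qsub`, the script name `analysis.sh` and the job name are hypothetical, and whether `h_rt` (run time limit) and `h_vmem` (memory limit) can be requested depends on how the local cluster is configured.

```python
import shlex


def build_qsub(script, name=None, resources=None, cwd=True):
    """Return a qsub argument list. Nothing is executed."""
    argv = ["qsub"]
    if cwd:
        argv.append("-cwd")  # run the job in the submission directory
    if name:
        argv += ["-N", name]  # job name shown by qstat
    for key, value in sorted((resources or {}).items()):
        argv += ["-l", f"{key}={value}"]  # one resource request per -l
    argv.append(script)
    return argv


# Hypothetical job: one hour of run time, 2 GB of memory.
cmd = build_qsub(
    "analysis.sh",
    name="demo",
    resources={"h_rt": "01:00:00", "h_vmem": "2G"},
)
print(shlex.join(cmd))
```

Note that no queue or host name appears anywhere in the request; the scheduler picks a machine that satisfies it.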
Most useful SGE commands
qsub / qdel
  Submit jobs & delete jobs
qstat & qhost
  Status info for queues, hosts and jobs
qacct
  Summary info and reports on completed jobs
qrsh
  Get an interactive shell on a cluster node
  Quickly run a command on a remote host
qmon
  Launch the X11 GUI interface
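As a quick reference, typical invocations of the commands above might look like the following. The snippet only prints a cheat sheet and runs nothing; the job id `12345`, the user `someuser` and the script `job.sh` are placeholders, not values from this training.

```python
import shlex

# (task, command line) pairs; ids, user and script names are placeholders.
EXAMPLES = [
    ("submit a job script", ["qsub", "-cwd", "job.sh"]),
    ("delete job 12345", ["qdel", "12345"]),
    ("list one user's jobs", ["qstat", "-u", "someuser"]),
    ("show execution hosts", ["qhost"]),
    ("accounting for finished job 12345", ["qacct", "-j", "12345"]),
    ("interactive shell on a node", ["qrsh"]),
    ("run one command remotely", ["qrsh", "hostname"]),
    ("launch the X11 GUI", ["qmon"]),
]


def cheat_sheet(examples=EXAMPLES):
    """Format the examples as aligned 'task  command' lines."""
    width = max(len(task) for task, _ in examples)
    return [f"{task.ljust(width)}  {shlex.join(argv)}" for task, argv in examples]


for line in cheat_sheet():
    print(line)
```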
Examples
Live Examples
  Single job
  Single job with resource requirements
  Job dependency
  Task array job
  Demand a whole compute node
  Consumable resources
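The live examples are not reproduced in these slides, so the following is only a sketch of what the submissions could look like, built as argument lists without running anything. Script names, the job name `prep` and the resource name `matlab_lic` are hypothetical. Requesting a whole node is site specific: `-l exclusive=true` works only if the administrators have defined such a resource, so treat it as an assumption.

```python
import os


def single_job(script):
    return ["qsub", "-cwd", script]


def with_dependency(script, after):
    # -hold_jid keeps the job pending until job id/name `after` has finished
    return ["qsub", "-cwd", "-hold_jid", after, script]


def task_array(script, first, last):
    # -t first-last starts one task per index; each task sees SGE_TASK_ID
    return ["qsub", "-cwd", "-t", f"{first}-{last}", script]


def with_resource(script, resource, amount):
    # Consumable resources are requested like any other -l resource
    return ["qsub", "-cwd", "-l", f"{resource}={amount}", script]


def input_for_this_task(pattern="input.{}.dat"):
    """Inside an array task: pick an input file from SGE_TASK_ID.

    For non-array jobs SGE_TASK_ID is not a number, so fall back to 1.
    """
    raw = os.environ.get("SGE_TASK_ID", "1")
    task_id = int(raw) if raw.isdigit() else 1
    return pattern.format(task_id)


print(single_job("job.sh"))
print(with_dependency("analyze.sh", after="prep"))
print(task_array("task.sh", 1, 10))
print(with_resource("whole_node.sh", "exclusive", "true"))  # site-defined, assumed
print(with_resource("licensed.sh", "matlab_lic", 1))        # hypothetical consumable
```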