SGE Training
NASA LaRC ASDC
Delivered May 5, 6, 7, 2009
Chris Dwan
Bioteam
cdwan@bioteam.net
Bioteam Inc.
Independent consulting shop
Vendor/technology agnostic
Staffed by:
Scientists forced to learn high performance IT
Many years of industry & academic experience
Our specialty: bridging the gap between science & IT
Session Goals
Interactive / Small Group Goals
1 - 2 hours
1 - 5 people
Users log into systems.
Users type examples, run jobs.
If code is available, bring it.
If specific use cases exist, bring them.
Selected ASDC Systems
Apple Cluster
Online and in use at SCF since 2007
~40 dual processor OS X systems (80+ CPUs)
Access through manila and corregidor

Magneto
~28 quad core Linux servers (100+ CPUs)
Online and in production use since 2006

New Magneto (ORR May 15)
Large, mixed purpose Linux cluster / file store
176 CPUs dedicated to SCF
576 CPUs dedicated to production
Disk based archive: 1.1PB
Apple Cluster Access:
LDAP account
manila or corregidor
NASA LaRC Science Directorate
Picture taken 9/2/08
1.2PB usable space
Fibre connected (384+ fibre ports)
2,560 individual disk drives
16 disks per chassis
10 chassis per rack
16 racks of disks
IBM Linux servers, mixed P6 and x86 CPUs to support legacy codes
Filesystem: IBM GPFS
Operational Readiness Review: mid May 2009
Stay tuned
Interactive hosts
Sun Grid Engine
Technical Introduction
Please do not copy, put online or redistribute info@bioteam.net
Most “grids” look like this on paper…
Private Network
Local Area Network
Portal node(s)
Dedicated file services
Compute Nodes
… and in reality:
Sun Grid Engine History
http://blogs.sun.com/templedf/entry/a_little_history_lesson
1996: Codine 4.02. Grid Resource Director (GRD) 1.0
2000: SGE 5.2. Sun acquires Gridware Inc.
2001: SGE 5.3. Sun releases source code. Last version called GRD
2004: SGE(EE) vs. SGE. N1GE vs. SGE
Sun Grid Engine References
http://gridengine.sunsource.net/
Generally, the user manuals are awful
http://gridengine.info/
Very useful blog run by Chris Dagdigian
My slides / examples are going to be online in-house.
Deep, in-house expertise.
Compute Farm Logical View
Cluster Network
User 1 through User N
Distributed Resource Manager
Grid Engine does the following:
Accepts work requests (jobs) from users
Puts jobs in a pending area
Sends jobs from the pending area to the best available machine
Manages the job while it runs
Returns results and logs accounting data when the job is finished
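The job lifecycle listed above can be sketched as a toy model. This is only an illustration of the accept / pend / dispatch / account cycle, not Grid Engine's actual scheduling algorithm. The class names, host names, and the "most free slots wins" rule are all assumptions made up for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    free_slots: int


@dataclass
class ToyScheduler:
    """Illustrative stand-in for a resource manager; not Grid Engine's algorithm."""
    hosts: list
    pending: deque = field(default_factory=deque)
    running: dict = field(default_factory=dict)      # job name -> host name
    accounting: list = field(default_factory=list)   # records for finished jobs

    def submit(self, job_name):
        # Accept a work request and park it in the pending area.
        self.pending.append(job_name)

    def dispatch(self):
        # Send pending jobs to the host with the most free slots
        # (a hypothetical definition of "best available machine").
        while self.pending:
            best = max(self.hosts, key=lambda h: h.free_slots)
            if best.free_slots == 0:
                break  # cluster is full; remaining jobs stay pending
            job = self.pending.popleft()
            best.free_slots -= 1
            self.running[job] = best.name

    def finish(self, job_name, exit_status=0):
        # Release the slot and log an accounting record.
        host_name = self.running.pop(job_name)
        for host in self.hosts:
            if host.name == host_name:
                host.free_slots += 1
        self.accounting.append(
            {"job": job_name, "host": host_name, "exit_status": exit_status}
        )


sched = ToyScheduler(hosts=[Host("node01", 1), Host("node02", 2)])
for name in ["job_a", "job_b", "job_c", "job_d"]:
    sched.submit(name)
sched.dispatch()        # three slots available, so one job stays pending
sched.finish("job_a")   # frees a slot and writes an accounting record
sched.dispatch()        # the waiting job can now start
```

With three slots and four jobs, the fourth job waits in the pending area until the first one finishes, which is the behavior users see as a job sitting in the queue.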
Huh? What you need to know:
Don’t worry about queues or specific machines.
All you need to do when submitting a job is describe the resources your job will need to run successfully.
Grid Engine will take care of the rest.
The ‘default’ settings are good enough for most cases.
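As a sketch of "describe the resources, not the machine", the helper below builds a qsub argument list with a resource request and no queue or host name. The function name, the script name my_job.sh, and the requested values are hypothetical. h_rt (wall clock limit) and h_vmem (virtual memory limit) are standard Grid Engine resource names, but whether a given site enforces them depends on local configuration.

```python
def qsub_command(script, resources=None, name=None):
    """Build a qsub argument list; nothing is executed here."""
    argv = ["qsub", "-cwd"]          # -cwd: run the job from the current directory
    if name:
        argv += ["-N", name]         # -N: job name
    if resources:
        # -l takes a comma-separated list of resource=value requests
        request = ",".join(f"{key}={value}" for key, value in resources.items())
        argv += ["-l", request]
    argv.append(script)
    return argv


# Hypothetical job: one hour of wall clock time and 2 GB of virtual memory.
argv = qsub_command("my_job.sh", {"h_rt": "01:00:00", "h_vmem": "2G"}, name="demo")
print(" ".join(argv))
```

On a submit host with Grid Engine installed, the list could be passed to subprocess.run to submit the job for real.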
Most useful SGE commands
qsub / qdel: Submit jobs & delete jobs
qstat & qhost: Status info for queues, hosts and jobs
qacct: Summary info and reports on completed jobs
qrsh: Get an interactive shell on a cluster node, or quickly run a command on a remote host
qmon: Launch the X11 GUI
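A minimal sketch of how the status commands above are typically invoked, written as argument lists so it can be read without a cluster. The user name jdoe and job id 1234 are placeholders, and the function name is made up for this example.

```python
def status_commands(user, job_id):
    """Return argument lists for common follow-up commands; nothing is run."""
    job = str(job_id)
    return {
        "my_jobs": ["qstat", "-u", user],      # pending and running jobs for one user
        "job_detail": ["qstat", "-j", job],    # scheduler details for one job
        "hosts": ["qhost"],                    # load and memory per execution host
        "accounting": ["qacct", "-j", job],    # usage record once the job has finished
        "cancel": ["qdel", job],               # remove the job from the system
    }


cmds = status_commands("jdoe", 1234)
for label, argv in cmds.items():
    print(f"{label:12s} {' '.join(argv)}")
```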
Examples
Live Examples
Single job
Single job with resource requirements
Job dependency
Task array job
Demand a whole compute node
Consumable resources
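The live examples were typed during the session and are not part of these slides. The sketch below shows one plausible qsub command line for each item, built as argument lists. The script name, job names, and time limit are placeholders. The resource names exclusive and matlab_lic are assumptions: both would have to be defined by the cluster administrator, and the names on the ASDC systems may differ.

```python
def live_example_commands(script="my_job.sh"):
    """Return one illustrative qsub argument list per example; nothing is run."""
    base = ["qsub", "-cwd"]
    return {
        # Plain submission with default settings.
        "single": base + [script],
        # Request 30 minutes of wall clock time with -l.
        "resources": base + ["-l", "h_rt=00:30:00", script],
        # -hold_jid: do not start until the job named step1 has finished.
        "dependency": base + ["-N", "step2", "-hold_jid", "step1", script],
        # -t: run 10 tasks; each task sees its index in $SGE_TASK_ID.
        "task_array": base + ["-t", "1-10", script],
        # Hypothetical site-defined resource for exclusive host access.
        "whole_node": base + ["-l", "exclusive=true", script],
        # Hypothetical consumable resource, e.g. one software license.
        "consumable": base + ["-l", "matlab_lic=1", script],
    }


examples = live_example_commands()
for label, argv in examples.items():
    print(f"{label:12s} {' '.join(argv)}")
```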