CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
HIGH PERFORMANCE COMPUTING: MODELS, METHODS, &
MEANS
COMMODITY CLUSTERS
Prof. Thomas Sterling
Department of Computer Science
Louisiana State University
January 25, 2011
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Topics
• Introduction to Commodity Clusters
• A brief history of Cluster computing
• Dominance of Clusters
• Core systems elements of Clusters
• SMP Nodes
• Operating Systems
• DEMO 1 : Arete Cluster Environment
• Throughput Computing
• Networks
• Resource Management / Scheduling Systems
• Message-passing/Cooperative programming model
• Cluster programming/application runtime environment
• Performance measurement & profiling of applications
• Summary Materials for Test
2
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Topics
• Introduction to Commodity Clusters
• A brief history of Cluster computing
• Dominance of Clusters
• Core systems elements of Clusters
• SMP Nodes
• Operating Systems
• DEMO 1 : Arete Cluster Environment
• Throughput Computing
• Networks
• Resource Management / Scheduling Systems
• Message-passing/Cooperative programming model
• Cluster programming/application runtime environment
• Performance measurement & profiling of applications
• Summary Materials for Test
3
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
4
What is a Commodity Cluster
• It is a distributed/parallel computing system
• It is constructed entirely from commodity subsystems
– All subcomponents can be acquired commercially and separately
– Computing elements (nodes) are employed as fully operational
standalone mainstream systems
• Two major subsystems:
– Compute nodes
– System area network (SAN)
• Employs industry standard interfaces for integration
• Uses industry standard software for majority of services
• Incorporates additional middleware for interoperability among
elements
• Uses software for coordinated programming of elements in parallel
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
6
Earth Simulator and
TSUBAME
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
7
Red Sky
• One of the largest clusters in the world (located at Sandia National Laboratories, USA)
• Sun Blade x6275 system family
• 41,616 cores
• Intel EM64T Xeon X55xx (Nehalem-EP), 2930 MHz (11.72 GFlops)
• 22,104 GB main memory
• Number 10 on the TOP500
• InfiniBand interconnect
• Peak performance: 487 Tflops
• R_max: 423 Tflops
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
8
Commodity Clusters vs “Constellations”
[Diagram: two 64-processor systems, each joined by a System Area Network – a constellation of four 16-processor SMP nodes (4 × 16) and a commodity cluster of sixteen 4-processor nodes (16 × 4)]
• An ensemble of N nodes, each comprising p computing elements
• The p elements are tightly coupled via shared memory (e.g., SMP, DSM)
• The N nodes are loosely coupled, i.e., distributed memory
• In a constellation, p is greater than N; in a commodity cluster, N is greater than or equal to p
• The distinction is which layer gives us the most power through parallelism
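Restating the bullets above as a worked example: with N nodes of p processors each, the machine is a constellation when p > N and a commodity cluster when N ≥ p. The two 64-processor systems in the diagram illustrate this: 4 nodes × 16 processors (p = 16 > N = 4) is a constellation, while 16 nodes × 4 processors (N = 16 > p = 4) is a commodity cluster.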
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
9
Columbia
• NASA’s largest computer
• NASA Ames Research Center
• A Constellation
– 20 nodes
– SGI Altix, 512 processors per node
– Total: 10,240 Intel Itanium-2 processors
• 400 terabytes of RAID storage
• 2.5 petabytes of silo-farm tape storage
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Topics
• Introduction to Commodity Clusters
• A brief history of Cluster computing
• Dominance of Clusters
• Core systems elements of Clusters
• SMP Nodes
• Operating Systems
• DEMO 1 : Arete Cluster Environment
• Throughput Computing
• Networks
• Resource Management / Scheduling Systems
• Message-passing/Cooperative programming model
• Cluster programming/application runtime environment
• Performance measurement & profiling of applications
• Summary Materials for Test
10
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
11
A Brief History of Clusters
• 1957 – SAGE by IBM & MIT-LL for Air Force NORAD
• 1976 – Ethernet
• 1984 – Cluster of 160 Apollo workstations by NSA
• 1985 – M31 Andromeda by DEC, 32 VAX 11/750s
• 1986 – Production Condor cluster operational
• 1990 – PVM released
• 1993 – First NOW workstation cluster at UC Berkeley
• 1993 – Myrinet introduced
• 1994 – First Beowulf PC cluster at NASA Goddard
• 1994 – MPI standard
• 1996 – >1 Gflops
• 1997 – Gordon Bell Prize for price-performance
• 1997 – Berkeley NOW first cluster on the TOP500
• 1997 – >10 Gflops
• 1998 – Avalon by LANL on the TOP500 list
• 1999 – >100 Gflops
• 2000 – Compaq and PSC awarded 5 Tflops system by NSF
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
12
UC-Berkeley NOW Project
• NOW-1 – 1995
• 32–40 SPARCstation 10s and 20s
• Originally ATM
• First large Myrinet network
• NOW-2 – 1997
• 100+ UltraSPARC 170s
• 128 MB memory, two 2 GB disks, Ethernet, Myrinet
• Largest Myrinet configuration in the world
• First cluster on the TOP500 list
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
13
NOW Accomplishments
• Early prototypes in 1993 & 1994
• First Inktomi
• Complete GLUnix + virtual network environment – able to page many processes onto dedicated user-level network resources
• NPACI production resource since 1998
• Active Messages demonstrated user-level communication in a full Unix environment
• First cluster on the TOP500 list
• Set all parallel disk-to-disk sort records (2 yrs)
– 500 MB/s disk bandwidth
– 1,000 MB/s network bandwidth
• Basis for studies in novel OS structures
[Chart: Minute Sort – gigabytes sorted (0–9) versus number of processors (0–100), comparing NOW against the SGI Power Challenge and SGI Origin]
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
14
NASA Beowulf Project
Wiglaf – 1994
• 16 Intel 80486, 100 MHz
• VESA Local Bus
• 256 MB memory
• 6.4 GB of disk
• Dual 10BASE-T Ethernet
• 72 Mflops sustained
• $40K

Hrothgar – 1995
• 16 Intel Pentium, 100 MHz
• PCI
• 1 GB memory
• 6.4 GB of disk
• 100BASE-T Fast Ethernet (hub)
• 240 Mflops sustained
• $46K

Hyglac – 1996 (Caltech)
• 16 Pentium Pro, 200 MHz
• PCI
• 2 GB memory
• 49.6 GB of disk
• 100BASE-T Fast Ethernet (switch)
• 1.25 Gflops sustained
• $50K
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
15
Beowulf Accomplishments
• An experiment in parallel computing systems
• Established the vision of low-cost HPC
• Demonstrated effectiveness of PC clusters for some classes of applications
• Provided networking software in Linux
• Mass storage with PVFS
• Provided cluster management tools
• Achieved >10 Gflops performance
• Gordon Bell Prize for price-performance
• Conveyed findings to the broad community
• Tutorials and the book
• Provided a design standard to rally the community
• Spin-off of Scyld Computing Corp.
[Photos: Hive at GSFC; Naegling at Caltech CACR]
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
16
“Do it Yourself Supercomputers”
• Synthesis of just-ready hardware/software elements
• Narrow window of opportunity
• PCs just capable of a few Mflops
• Ethernet LAN (10 base-T) just cheap enough
• A cost-constrained requirement with funding
• An open source Unix, albeit immature
• Experience with clustering
• A stable message passing library
• Talent availability to fill the gaps
• Willingness to win or fail
• Modest and well defined goals, vision, and path
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
17
Dominance of Clusters in HPC
• Every major HPC vendor (but one) has a cluster product
– IBM
– HP
– SUN
– NEC
– Fujitsu
– SGI
– Cray
• Additional vendors dedicated to clusters
– Penguin
– Dell
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Topics
• Introduction to Commodity Clusters
• A brief history of Cluster computing
• Dominance of Clusters
• Core systems elements of Clusters
• SMP Nodes
• Operating Systems
• DEMO 1 : Arete Cluster Environment
• Throughput Computing
• Networks
• Resource Management / Scheduling Systems
• Message-passing/Cooperative programming model
• Cluster programming/application runtime environment
• Performance measurement & profiling of applications
• Summary Materials for Test
18
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
19
Clusters Dominate Top-500
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
20
Why are Clusters so Prevalent
• Excellent performance to cost for many workloads
– Exploits economy of scale
• Mass-produced device types
• Mainstream standalone subsystems
– Many competing vendors for similar products
• Just-in-place configuration
– Scalable up and down
– Flexible in configuration
• Rapid tracking of technology advances
– First to exploit the newest component types
• Programmable
– Uses industry-standard programming languages and tools
• User empowerment
• Low-cost, ubiquitous systems
• Programming systems make it relatively easy for expert users to program
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
21
[Book cover]
1st printing: May 1999
2nd printing: Aug. 1999
MIT Press
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Topics
• Introduction to Commodity Clusters
• A brief history of Cluster computing
• Dominance of Clusters
• Core systems elements of Clusters
• SMP Nodes
• Operating Systems
• DEMO 1 : Arete Cluster Environment
• Throughput Computing
• Networks
• Resource Management / Scheduling Systems
• Message-passing/Cooperative programming model
• Cluster programming/application runtime environment
• Performance measurement & profiling of applications
• Summary Materials for Test
22
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
23
What You Need to Know about Clusters
• Key system elements
– SMP Node
– Interconnect Networks
– Operating Systems
– Resource Management / Scheduling systems
• Programming & Runtime environment
– Message-passing/Cooperative programming model
– Programming languages & compilers, debuggers
• Performance Measurement & Profiling
– How performance is affected
– How to measure how well the applications behave
– How to optimize application behavior
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
24
Key Parameters for Cluster Computing
• Peak floating point performance
• Sustained floating point performance
• Main memory capacity
• Bisection bandwidth
• I/O bandwidth
• Secondary storage capacity
• Organization
– Processor architecture
– # processors per node
– # nodes
– Accelerators
– Network topology
• Logistical issues
– Power consumption
– HVAC / cooling
– Floor space (sq. ft.)
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
25
Where’s the Parallelism
• Inter-node
– Multiple nodes
– Primary level for commodity clusters
– Secondary level for constellations
• Multi socket, intra-node
– Routinely 1, 2, 4, 8
– Heterogeneous computing with accelerators
• Multi-core, intra-socket
– 2, 4 cores per socket
• Multi-thread, intra-core
– Usually none, or two (two-way SMT)
• ILP, intra-core
– Multiple operations issued per instruction
• Out of order, reservation stations
• Prefetching
• Accelerators
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
26
Cluster System
[Diagram: four compute nodes, each containing microprocessors (MP) with L1/L2/L3 caches, memory banks (M1..Mn-1) behind a controller, storage (S), and network interface cards (NIC), all joined by an interconnect network; a resource management & scheduling subsystem and a login/cluster-access front end sit in front of the compute nodes]
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
27
Constituent Hardware Elements
• Compute Nodes (“nodes”)
– Standalone mainstream products
– Processors and accelerators
– Memory and caches
– Chip set
– Interfaces
• System Area Network(s)
– Network interface controllers (NIC)
– Switches
– Cables
• External I/O
– File system
– Internet access
– User interface
– Management and administration
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Topics
• Introduction to Commodity Clusters
• A brief history of Cluster computing
• Dominance of Clusters
• Core systems elements of Clusters
• SMP Nodes
• Operating Systems
• DEMO 1 : Arete Cluster Environment
• Throughput Computing
• Networks
• Resource Management / Scheduling Systems
• Message-passing/Cooperative programming model
• Cluster programming/application runtime environment
• Performance measurement & profiling of applications
• Summary Materials for Test
28
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
29
Microprocessor Clock Rate
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
31
Compute Node Diagram
[Diagram: compute node with two processor sockets, each with L1/L2 caches and a shared L3, memory banks M0..Mn-1 behind a controller, storage (S), two NICs, plus USB peripherals, JTAG, Ethernet, and PCI-e interfaces]
Legend: MP – microprocessor; L1, L2, L3 – caches; M0.. – memory banks; S – storage; NIC – network interface card
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
33
Parameters for Cluster Nodes
• Processor architecture family (AMD Opteron, Intel Xeon, IBM Power)
• Number of processor chips (2)
• Number of processor cores per chip (multicore) (3–4)
• Memory capacity per processor chip (2 GB per core)
• Processor core clock rate (~3 GHz)
• Operations per instruction issue, ILP (2–4 floating point operations)
• Cache size per core (L1, L2, L3)
• Distributed or shared memory (SMP) structure
– Cache coherent?
• Number and class of network ports
• Latency to main memory (100–400 cycles)
– Measured in processor clock cycles
• Disk spindles and capacity (0, 1, or 2)
• Ancillary I/O ports
• Packaging issues
– Power
– Size (1 to 4 U) (http://en.wikipedia.org/wiki/Rack_unit)
– Cost
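The key parameter of peak floating point performance (slide 24) follows directly from these node parameters. A minimal sketch of the arithmetic, with illustrative counts chosen to match the 11.72 GFlops-per-processor Nehalem-EP figure quoted on the Red Sky slide, not a description of any specific machine:

```c
/* Hypothetical peak-performance arithmetic for one node:
   sockets x cores x (FP operations per cycle) x clock rate. */
#include <stdio.h>

int main(void) {
    int sockets = 2, cores_per_socket = 4, flops_per_cycle = 4;
    double clock_ghz = 2.93;                                  /* ~3 GHz, as above */
    double per_core = flops_per_cycle * clock_ghz;            /* 11.72 GFlops */
    double per_node = sockets * cores_per_socket * per_core;  /* 93.76 GFlops */
    printf("peak: %.2f GFlops/core, %.2f GFlops/node\n", per_core, per_node);
    return 0;
}
```

Sustained performance, the second parameter, is what an application actually achieves and is typically only a fraction of this peak.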
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Topics
• Introduction to Commodity Clusters
• A brief history of Cluster computing
• Dominance of Clusters
• Core systems elements of Clusters
• SMP Nodes
• Operating Systems
• DEMO 1 : Arete Cluster Environment
• Throughput Computing
• Networks
• Resource Management / Scheduling Systems
• Message-passing/Cooperative programming model
• Cluster programming/application runtime environment
• Performance measurement & profiling of applications
• Summary Materials for Test
34
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
35
The History of Linux
• Started with Linus Torvalds' frustration with the affordable operating systems available for the PC
• He put together a rudimentary scheduler, and later added more features until he could bootstrap the kernel (1991)
• The source was released on the Internet in the hope that more people would contribute to the kernel
• GCC was ported; a C library and primitive serial and tty driver code were added
• Networking and file systems were added
• Slackware
• RedHat
• Extreme Linux
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
36
Open Source Software
• Evolution of PC Clusters has benefited from Open Source Software
• Early examples
– GNU compiler tools, FreeBSD, Linux, PVM
• Advantages
– Provides shared infrastructure – avoids duplication of effort
– Permits wide collaborations
– Facilitates exploratory studies and innovation
• Free software is not necessarily OSS
• Business model in state of flux: how to fund free deliverables
• Important synergy between OSS standard infrastructure software and
proprietary ISV target-specific software:
– OSS provides common framework
– For-profit software provides incentive and resources
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011 37
Linux Distributions
Alphanet Linux
Alzza Linux
Andrew Linux
Apokalypse
Armed Linux
ASPLinux
Bad Penguin
Bastille Linux
Best Linux
BlackCat Linux
Blue Linux
Bluecat Linux
BluePoint Linux
Brutalware
Caldera OpenLinux
Cclinux
ChainSaw Linux
CLEClIeNUX
Conectiva
CoolLinux
Coyote Linux
Corel
COX-Linux
Darkstar Linux
Debian
Definite Linux
deepLINUX
Delix
Dlite (Debian Lite)
DragonLinux
Eagle Linux M68K
easyLinux
Elfstone Linux
Embedix
Enoch
Eonova Linux
ESware
Etlinux
Eurielec Linux
Finnix
Floppix
Gentoo Linux
Gentus Linux
Green Frog Linux
Halloween Linux
Hard Hat Linux
HispaFuentes
HVLinux
Icepack
Immunix OS
Independence
InfoMagick Workgroup Server
Ivrix
ix86 Linux
JBLinux
Jurix Linux
Kondara
Krud
KW Linux
KSI Linux
L13Plus
Laser5
Leetnux
Lightening
Linpus Linux
Linux Antarctica
Linux by Linux
Linux GT Server Edition
Linux Mandrake
Linux MX
LinuxOne
LinuxPPC
LinuxPPP
LinuxSIS
LinuxWare
Linux-YeS
LNX System
Lunet
LuteLinux
LST
Mastodon
MaxOS™
MIZI Linux OS
MkLinux
MNIS Linux
MicroLinux
Monkey Linux
NeoLinux
Newlix OfficeServer
NoMad Linux
Ocularis
Open Kernel Linux
Open Share Linux
OS2000
Peanut Linux
PhatLINUX
PingOO
Plamo Linux
Platinum Linux
Power Linux
Progeny Debian
Project Freesco
Prosa Debian
Pygmy Linux
Red Flag Linux
Red Hat Linux
Redmond Linux
Rock Linux
RT-Linux
Scrudge Ware
Secure Linux
Skygate Linux
Slacknet Linux
Slackware
Slinux
SOT Linux
Spiro
Stampede Linux
Storm Linux
S.u.SE
Thin Linux
TINY Linux
Trinux
Trustix Secure Linux
TurboLinux
Turquaz
UltraPenguin
Ute-Linux
VA-enhanced RedHat Linux
VectorLinux
Vedova Linux
Vine Linux
White Dwarf Linux
Whole Linux
WinLinux 2000
WorkGroup Solutions
Linux Pro Plus
Xdenu
Xpresso Linux 2000
XTeam Linux
Yellow Dog Linux
Yggdrasil Linux
ZiiF Linux
ZipHam
ZipSlack
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
38
Operating System
• What is an Operating System?
– A program that controls the execution of application programs
– An interface between applications and hardware
• Primary functionality
– Exploits the hardware resources of one or more processors
– Provides a set of services to system users
– Manages secondary memory and I/O devices
• Objectives
– Convenience: Makes the computer more convenient to use
– Efficiency: Allows computer system resources to be used in an
efficient manner
– Ability to evolve: Permits effective development, testing, and
introduction of new system functions without interfering with service
Source: William Stallings “Operating Systems: Internals and Design Principles (5th Edition)”
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
39
Services Provided by the OS
• Program development
– Editors and debuggers
• Program execution
• Access to I/O devices
• Controlled access to files
• System access
• Protection
• Error detection and response
– Internal and external hardware errors
– Software errors
– Operating system cannot grant request of application
• Accounting
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
40
Layers of a Computer System
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
41
Resources Managed by the OS
• Processor
• Main Memory
– volatile
– referred to as real memory or primary memory
• I/O modules
– secondary memory devices
– communications equipment
– terminals
• System bus
– communication among processors, memory, and I/O modules
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Topics
• Introduction to Commodity Clusters
• A brief history of Cluster computing
• Dominance of Clusters
• Core systems elements of Clusters
• SMP Nodes
• Operating Systems
• DEMO 1 : Arete Cluster Environment
• Throughput Computing
• Networks
• Resource Management / Scheduling Systems
• Message-passing/Cooperative programming model
• Cluster programming/application runtime environment
• Performance measurement & profiling of applications
• Summary Materials for Test
42
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Topics
• Introduction to Commodity Clusters
• A brief history of Cluster computing
• Dominance of Clusters
• Core systems elements of Clusters
• SMP Nodes
• Operating Systems
• DEMO 1 : Arete Cluster Environment
• Throughput Computing
• Networks
• Resource Management / Scheduling Systems
• Message-passing/Cooperative programming model
• Cluster programming/application runtime environment
• Performance measurement & profiling of applications
• Summary Materials for Test
43
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
44
Programming on Clusters
• Several ways of programming applications on clusters:
− Throughput – job stream
− Decoupled work queue model – SPMD for parameter studies
− Communicating sequential processes (CSP)
− Multithreaded
• Throughput: job stream
– PBS, Maui
• Decoupled work queue model: SPMD, e.g., parametric studies
– Condor
• Communicating sequential processes
– Message passing
– Distributed memory
– Global barrier synchronization
– e.g., MPI
• Multithreaded (see the sketch after this list)
– Limited to intra-node programming
– Shared memory
– e.g., OpenMP
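To make the last model concrete, here is a minimal multithreaded sketch using OpenMP, the package named above; the loop body is an arbitrary illustrative computation, and the compile flag shown is GCC's.

```c
/* Shared-memory, intra-node parallelism with OpenMP.
   Build with e.g.:  gcc -fopenmp -o psum psum.c */
#include <stdio.h>
#include <omp.h>

int main(void) {
    double sum = 0.0;
    int i;
    /* Threads on one SMP node split the iterations and
       combine their partial sums via the reduction clause. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 1; i <= 1000000; i++)
        sum += 1.0 / ((double)i * (double)i);   /* approaches pi^2/6 */
    printf("sum = %.6f (up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}
```

The contrast with CSP/MPI is that all threads here see one shared address space, so no explicit messages are needed; this is also why the model is limited to a single node.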
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Throughput Computing
• Simplest form of parallel computing
• Separate jobs on separate compute nodes
– Independent tasks on independent nodes
• No intra application / cross node communication
• “job stream” workflow
• Capacity computing
– Distinguished from cooperative and capability computing
– Scaling dependent on number of concurrent jobs
• Performance
– Throughput
– Total aggregate operations per second achieved
• Widely used for servers
45
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Decoupled Work Queue Model
• Concurrent disjoint tasks
• Parametric Studies
– SPMD (single program multiple data)
• Very coarse grained
• Example software package : Condor
• Processor farms and clusters
• Throughput Computing Lecture covers this model of
parallelism in greater depth
46
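Since Condor is the example package named on this slide, below is a sketch of a Condor submit description for such a parametric study; the executable name and argument convention are hypothetical.

```
# Hypothetical Condor submit file: 16 independent instances of the same
# program (SPMD), each told which parameter point to run via Condor's
# $(Process) macro, which here takes the values 0..15.
universe   = vanilla
executable = sweep
arguments  = $(Process)
output     = sweep.$(Process).out
error      = sweep.$(Process).err
log        = sweep.log
queue 16
```

Submitted with condor_submit, the 16 disjoint tasks are queued and matched to idle machines independently, which is exactly the decoupled, very coarse-grained behavior described above.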
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Topics
• Introduction to Commodity Clusters
• A brief history of Cluster computing
• Dominance of Clusters
• Core systems elements of Clusters
• SMP Nodes
• Operating Systems
• DEMO 1 : Arete Cluster Environment
• Throughput Computing
• Networks
• Resource Management / Scheduling Systems
• Message-passing/Cooperative programming model
• Cluster programming/application runtime environment
• Performance measurement & profiling of applications
• Summary Materials for Test
47
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011 48
Some Node Interconnect Options
• Current Generation
– Gigabit Ethernet (~1000 Mb/s)
– 10 Gigabit Ethernet
– 40 Gigabit Ethernet and 100 Gigabit Ethernet (100GbE)
standards are in draft as of 2009
– Infiniband (IBA)
• Previous Generation
– Fast Ethernet (~100 Mb/s)
– Myricom’s Myrinet-2000 (~1600 Mb/s)
– SCI (~4000 Mb/s)
– OC-12 ATM (~622 Mb/s)
– Fiber Channel (~100 MB/s)
– USB (12 Mb/s)
– Firewire (IEEE 1394 400 Mb/s)
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
49
Fast and Gigabit Ethernet
• Cost effective
• Lucent, 3Com, Cisco, etc.
• Directly leverages LAN technology and market
• Up to 384 100 Mbps ports in one switch
• Switches can be stacked or connected with multiple gigabit links
• 100BASE-T:
– Bandwidth: > 11 MB/s
– Latency: < 90 microseconds
• 1000BASE-T:
– Bandwidth: ~ 50 MB/s
– Latency: < 90 microseconds
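A quick worked example using only the figures above (the latency-plus-bandwidth model itself is a standard first-order approximation, not from the slides): message time is roughly T(m) ≈ α + m/β, where α is the latency and β the bandwidth. For 1000BASE-T with α ≈ 90 microseconds and β ≈ 50 MB/s, a 1 MB message costs about 90 µs + 20 ms ≈ 20 ms and is bandwidth-dominated, while a 64-byte message costs essentially the 90 µs latency alone. This is why fine-grained communication fares poorly on commodity Ethernet compared to Myrinet or InfiniBand, whose latencies (next slides) are an order of magnitude or more lower.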
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
50
Myrinet
• High Performance: 2+2 Gbps
• Low latency: 11 microseconds
• Fiber and copper interconnects
• High Availability – auto reroute
• 4-, 8-, 16-, and 64-port switches, stackable
• Scalable to 1000s of hosts
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
InfiniBand
51
• High Performance: 10 - 20 Gbps
• Low latency: 1.2 microseconds
• Copper interconnects
• High availability - IEEE 802.3ad Link Aggregation / Channel Bonding
http://www.hpcwire.com/hpc/1342206.html
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Network Interconnect Topologies
52
TORUS
FAT-TREE (CLOS)
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
53
Dell PowerEdge SC1435
Opteron, IBA
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
54
Example: 320-host Clos topology of
16-port switches
[Diagram: five groups of 64 hosts each, 320 hosts total]
(From Myricom)
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Arete Infiniband Network
55
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Topics
• Introduction to Commodity Clusters
• A brief history of Cluster computing
• Dominance of Clusters
• Core systems elements of Clusters
• SMP Nodes
• Operating Systems
• DEMO 1 : Arete Cluster Environment
• Throughput Computing
• Networks
• Resource Management / Scheduling Systems
• Message-passing/Cooperative programming model
• Cluster programming/application runtime environment
• Performance measurement & profiling of applications
• Summary Materials for Test
56
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
57
Schedulers : PBS
• Workload management system – coordinates resource utilization policy and user job requirements
– Multi-user, multi-job, multi-node
• Both open source and commercially supported (Veridian)
• Functionality
– Manages parallel job execution
– Interactive and batch cross-system scheduling
– Security and access control lists
– Dynamic distribution and automatic load-leveling of workload
– Job and user accounting
• Accomplishments
– Runs on all Unix and Linux platforms
– Supports MPI
– First release 1995
– 2000 sites registered, 1000 people on the mailing list
– PBSPro sales at >5000 CPUs
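To make the batch workflow concrete, here is a sketch of a PBS job script; the job name, node counts, and executable are illustrative, and site-specific details such as queue names are omitted.

```bash
#!/bin/bash
# Hypothetical PBS batch script: 4 nodes, 8 processors per node, 10 min.
#PBS -N hello_cluster
#PBS -l nodes=4:ppn=8
#PBS -l walltime=00:10:00
#PBS -j oe                        # merge stdout and stderr into one file

cd $PBS_O_WORKDIR                 # the directory qsub was invoked from
# PBS writes the allocated host list to $PBS_NODEFILE
mpirun -np 32 -machinefile $PBS_NODEFILE ./hello_cluster
```

The script is submitted with qsub and monitored with qstat; the scheduler (e.g., Maui, next slide) decides when and where the 32 processors are granted.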
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
58
Schedulers : Maui (Moab)
• Cluster Resources Inc.
• Advanced systems software tool for improved job scheduling
• Improved administration and statistical reporting
capabilities
• Analytical simulation capabilities to evaluate different
allocation and prioritization schemes.
• Offers different classes of services to users, allowing
high priority users to be scheduled first, while
preventing long-term starvation of low priority jobs.
• SMP Enabled
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
59
Schedulers : Condor
• Distributed Task Scheduler
• Emphasis on throughput or capacity computing
• Services
– Automates cycle harvesting from workstation farms
– Distributed time-sharing and batch processing resource
– Exploits opportunistic versus dedicated resources
– Permits preemptive acquisition of resources
– Transparent checkpointing
– Remote I/O – preserves local execution environment (requires relinking)
– Asynchronous process management, master-worker processing
• Accomplishments
– First production system operational in 1986
– U. of Wisconsin: 1300 CPUs under Condor control on campus
– Used by:
• a large software house for builds and testing,
• Xerox for printer simulation,
• Core Digital Pictures for rendering of movies,
• INFN for high energy physics,
• 250 machines at NAS, half a million hours
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Topics
• Introduction to Commodity Clusters
• A brief history of Cluster computing
• Dominance of Clusters
• Core systems elements of Clusters
• SMP Nodes
• Operating Systems
• DEMO 1 : Arete Cluster Environment
• Throughput Computing
• Networks
• Resource Management / Scheduling Systems
• Message-passing/Cooperative programming model
• Cluster programming/application runtime environment
• Performance measurement & profiling of applications
• Summary Materials for Test
60
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
61
MPI Software
• Community-wide standards process
– Leveraged experience with NX, PVM, P4, Zipcode, and others
• Dominant programming model for clusters
• Multiple implementations, both OSS and commercial (MPI Soft Tech)
– All of MPI-1
– MPI I/O
– All of MPI-2
– MPI-3 under development
• Functionality
– Message passing model for distributed memory platforms
– Support for truly scalable operations (1000s of nodes)
• Rich set of collective operations (gathers, reduces, scans, all-to-all)
• Scalable one-sided operations (fence barrier synchronization, group-oriented synchronization)
– Dynamic processes (MPI-2): spawn, disconnect, etc., with scalability
• MPICH2 is an entirely new rewrite
• Open MPI includes fault-tolerance capability
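A minimal sketch of the message-passing model described above, using only core MPI calls; the program and its output format are illustrative.

```c
/* Each rank contributes its rank number; rank 0 prints the total
   via a collective reduction. Build with mpicc, run with
   e.g.:  mpirun -np 4 ./mpi_sum */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size, sum = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* One of the collective operations named above (a reduce). */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d ranks, sum of ranks = %d\n", size, sum);
    MPI_Finalize();
    return 0;
}
```

Every rank runs the same program in its own distributed memory; all sharing happens through explicit MPI calls, in contrast to the shared-memory OpenMP sketch earlier.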
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Topics
• Introduction to Commodity Clusters
• A brief history of Cluster computing
• Dominance of Clusters
• Core systems elements of Clusters
• SMP Nodes
• Operating Systems
• DEMO 1 : Arete Cluster Environment
• Throughput Computing
• Networks
• Resource Management / Scheduling Systems
• Message-passing/Cooperative programming model
• Cluster programming/application runtime environment
• Performance measurement & profiling of applications
• Summary Materials for Test
62
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Compilers & Debuggers
• Compilers:
– Intel C / C++ / Fortran
– PGI C / C++ / Fortran
– GNU C / C++ / Fortran
• Libraries:
– Each compiler is linked against MPICH
– Mesh/grid partitioning software: METIS, etc.
– Math kernel libraries: Intel MKL, AMD ACML, GNU Scientific Library (GSL)
– Data format libraries: NetCDF, HDF5, etc.
– Linear algebra packages: BLAS, LAPACK, etc.
• Debuggers
– gdb
– TotalView
• Performance & profiling tools:
– PAPI
– TAU
– gprof
– perfctr
63
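A hedged sketch of how these tools chain together on the command line; the source file name is hypothetical, and exact flags vary by compiler and MPICH version.

```bash
# Compile an MPI program with the MPICH wrapper, instrumented for gprof.
mpicc -O2 -pg -o app app.c

# Run it; with -pg each process writes a gmon.out profile on exit
# (ranks sharing a directory may overwrite one another's files).
mpirun -np 4 ./app

# Print the flat profile and call graph from the collected data.
gprof ./app gmon.out
```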
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Distributed File Systems
• A distributed file system is a file system that is stored locally on one system (server) but is accessible by processes on many systems (clients).
• Multiple processes access multiple files simultaneously.
• Other attributes of a DFS may include :
– Access control lists (ACLs)
– Client-side file replication
– Server- and client- side caching
• Some examples of DFSes:
– NFS (Sun)
– AFS (CMU)
– PVFS (Clemson, Argonne), OrangeFS
– Lustre (Sun)
– GPFS (IBM)
• Distributed file systems can be used by parallel programs, but they have significant disadvantages :
– The network bandwidth of the server system is a limiting factor on performance
– To retain UNIX-style file consistency, the DFS software must implement some form of locking which has significant performance implications
64
Ohio Supercomputer Center
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Distributed File System : NFS
• Popular means for accessing remote file
systems in a local area network.
• Based on the client-server model: remote file systems are “mounted” via NFS and accessed through the Linux virtual file system (VFS) layer.
• NFS clients cache file data, periodically
checking with the original file for any changes.
• The loosely-synchronous model makes for
convenient, low-latency access to shared
spaces.
• NFS avoids the common locking systems used
to implement POSIX semantics.
65
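As a concrete illustration of the mounting step described above (the server name and export path are hypothetical):

```bash
# Mount a remote NFS export by hand on a client node...
mount -t nfs fileserver:/export/home /home

# ...or persistently, via an /etc/fstab entry on each client:
# fileserver:/export/home  /home  nfs  rw,hard,intr  0 0
```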
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
66
Parallel Virtual File System (PVFS)
• Clemson University - 1993
• Objective: high throughput file system – DOE, NASA, (GPL)
• Strategy:
– exploit parallelism of bandwidth
– provide a user interface so that applications can make powerful requests, such as reading a large collection of non-contiguous data from a multidimensional data set with a single request,
– allow application direct access to server:
• multiple application tasks directly access/spawn multiple file servers without going through kernel or central mechanism.
• N-clients and N-servers
• Single file spread across multiple disks and nodes and accessed by multiple tasks in an application.
• Scaling facilitated by eliminating single bottleneck
• Actual distribution of a file is configurable on a file by file basis.
• Reactive scheduling addresses the problem of network contention and adapts to file system load
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Topics
• Introduction to Commodity Clusters
• A brief history of Cluster computing
• Dominance of Clusters
• Core systems elements of Clusters
• SMP Nodes
• Operating Systems
• DEMO 1 : Arete Cluster Environment
• Throughput Computing
• Networks
• Resource Management / Scheduling Systems
• Message-passing/Cooperative programming model
• Cluster programming/application runtime environment
• Performance measurement & profiling of applications
• Summary Materials for Test
67
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Measuring Performance on Clusters
• Ways of measuring performance
– Wall clock time
– Benchmarks
– Processor efficiency factors
– Scalability
– MPI communications and synchronization overhead
– System operations
• Tools
– PAPI
– TAU
– Ganglia
– Many others
68
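Of the measures listed above, wall clock time is the simplest to capture from inside an MPI program; a minimal sketch (the timed region is a placeholder):

```c
/* Time a region of an MPI program with MPI_Wtime (seconds). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);       /* start all ranks together  */
    double t0 = MPI_Wtime();

    /* ... computation being measured goes here ... */

    MPI_Barrier(MPI_COMM_WORLD);       /* wait for the slowest rank */
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("elapsed wall clock: %f s\n", t1 - t0);
    MPI_Finalize();
    return 0;
}
```

Tools such as PAPI (hardware counters), TAU (profiles and traces), and Ganglia (cluster-wide monitoring) layer the remaining measures on top of this kind of raw timing.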
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
MPI Performance Measurement : VAMPIR
69
src: http://mumps.enseeiht.fr/
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
MPI Performance : Tau
70
src: http://www.cs.uoregon.edu/research/tau
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
Topics
• Introduction to Commodity Clusters
• A brief history of Cluster computing
• Dominance of Clusters
• Core systems elements of Clusters
• SMP Nodes
• Operating Systems
• DEMO 1 : Arete Cluster Environment
• Throughput Computing
• Networks
• Resource Management / Scheduling Systems
• Message-passing/Cooperative programming model
• Cluster programming/application runtime environment
• Performance measurement & profiling of applications
• Summary Materials for Test
71
CSC 7600 Lecture 3 : Commodity Clusters,Spring 2011
72
Summary – Material for the Test
• What is a commodity cluster – slide 4
• Commodity clusters vs “Constellations” – slide 8
• Key parameters for cluster computing – slide 24
• Where is the parallelism – slide 25
• Parameters for cluster nodes – slide 33
• Node operating system – slides 38, 39, 40, 41
• Programming clusters – slide 44
• Throughput computing – slide 45
• Decoupled work queue model – slide 46
• Interconnect options – slide 48
• Scheduling systems – slides 57, 58, 59
• Message passing : MPI software – slide 61
• Distributed file systems – slide 64
• Measuring performance on cluster: Metrics & Tools – slide 68