Lecture 11: Unix Clusters. Assoc. Prof. Guntis Barzdins, Assist. Girts Folkmanis. University of Latvia, Dec 10, 2004.


Page 1: Lecture 11: Unix Clusters

Lecture 11: Unix Clusters

Assoc. Prof. Guntis Barzdins, Assist. Girts Folkmanis

University of Latvia, Dec 10, 2004

Page 2: Lecture 11: Unix Clusters

Moore’s Law - Density

Page 3: Lecture 11: Unix Clusters

Moore's Law and Performance

• The performance of computers is determined by architecture and clock speed.
• Clock speed doubles over a 3-year period due to on-chip scaling laws.
• Processors using identical or similar architectures gain performance directly as a function of Moore's Law.
• Improvements in internal architecture can yield gains beyond what Moore's Law alone provides.

Page 4: Lecture 11: Unix Clusters

Future of Moore's Law

Short-term (1-5 years):
• Will keep operating (prototypes already exist in the lab)
• Fabrication cost will go up rapidly

Medium-term (5-15 years):
• Exponential growth rate will likely slow
• A trillion-dollar industry is motivated to keep it going

Long-term (>15 years):
• May need new technology (chemical or quantum)
• We can do better (e.g., the human brain)
• "I would not close the patent office"

Page 5: Lecture 11: Unix Clusters

Different kinds of PC cluster

• High Performance Computing cluster
• Load Balancing cluster
• High Availability cluster

Page 6: Lecture 11: Unix Clusters

High Performance Computing Cluster (Beowulf)

• Started in 1994, when Donald Becker of NASA assembled the world's first cluster from 16 DX4 PCs and 10 Mb/s Ethernet
• Also called a Beowulf cluster
• Built from commodity off-the-shelf hardware
• Applications: data mining, simulations, parallel processing, weather modelling, computer graphics rendering, etc.

Page 7: Lecture 11: Unix Clusters

Examples of Beowulf cluster

• Scyld Cluster O.S. by Donald Becker: http://www.scyld.com
• ROCKS from NPACI: http://www.rocksclusters.org
• OSCAR from the open cluster group: http://oscar.sourceforge.net
• OpenSCE from Thailand: http://www.opensce.org

Page 8: Lecture 11: Unix Clusters

Cluster Sizing Rule of Thumb

System software (Linux, MPI, filesystems, etc.) scales from 64 nodes up to at most about 2048 nodes for most HPC applications, limited by:
• Maximum socket connections
• Direct-access message tag lists and buffers
• NFS / storage system clients
• Debugging
• Etc.

It is probably hard to rewrite MPI and all Linux system software for O(100,000)-node clusters.

Page 9: Lecture 11: Unix Clusters
Page 10: Lecture 11: Unix Clusters

Apple Xserve G5 with Xgrid Environment

• Alternative to a Beowulf PC cluster
• Server node + 10 compute nodes
• Dual-CPU G5 processors (2 GHz, 1 GB memory)
• Gigabit Ethernet interconnect
• 3 TB Xserve RAID array
• Xgrid offers an 'easy' pool-of-processors computing model
• MPI available for heritage code


Page 11: Lecture 11: Unix Clusters

Xgrid Computing Environment

Suitable for loosely coupled distributed computing:
• The controller distributes tasks to agent processors (tasks include data and code)
• Collects results when agents finish
• Distributes more chunks to agents as they become free and join the cluster/grid (a generic master/worker sketch follows below)

Diagram components: Xgrid client, Xgrid controller, Xgrid agents, server storage.
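
Xgrid's actual protocol is not shown in the slides, so the following is only a rough, hypothetical sketch in C of the controller/agent pattern described above: a parent process (the "controller") keeps a fixed pool of worker processes (the "agents") busy, handing out the next work chunk as soon as any worker finishes. The pool size, chunk count, and process_chunk() are made-up placeholders, not Xgrid APIs.

/* Minimal master/worker sketch (not Xgrid): the parent hands out one
 * chunk per worker process and assigns the next chunk as soon as any
 * worker finishes, similar in spirit to a controller feeding agents. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define NUM_WORKERS 4   /* illustrative pool size */
#define NUM_CHUNKS  16  /* illustrative number of work chunks */

static void process_chunk(int chunk)
{
    /* Placeholder for real work (data + code would come from the controller). */
    printf("worker %d processing chunk %d\n", (int)getpid(), chunk);
    sleep(1);
}

int main(void)
{
    int next_chunk = 0, running = 0;

    /* Keep up to NUM_WORKERS chunks in flight; refill as workers exit. */
    while (next_chunk < NUM_CHUNKS || running > 0) {
        while (running < NUM_WORKERS && next_chunk < NUM_CHUNKS) {
            pid_t pid = fork();
            if (pid == 0) {             /* child: acts as an "agent" */
                process_chunk(next_chunk);
                _exit(0);
            } else if (pid > 0) {
                running++;
                next_chunk++;
            } else {
                perror("fork");
                exit(1);
            }
        }
        if (running > 0) {
            wait(NULL);                 /* collect a finished "agent" */
            running--;
        }
    }
    return 0;
}

In the real Xgrid environment the controller also ships the data and code for each task over the network to the agents; this sketch only reproduces the scheduling pattern.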

Page 12: Lecture 11: Unix Clusters

Xgrid Work Flow

Page 13: Lecture 11: Unix Clusters

Cluster Status

• Offline: turned off
• Unavailable: turned on, but busy with other non-cluster tasks
• Working: computing on this cluster job
• Available: waiting to be assigned cluster work

Page 14: Lecture 11: Unix Clusters

Cluster Status Displays

The tachometer illustrates the total processing power available to the cluster at any time.

The level will change when running on a cluster of desktop workstations, but will stay steady when monitoring a dedicated cluster.

Rocky's Tachy Tach

Page 15: Lecture 11: Unix Clusters

Load Balancing Cluster

• A PC cluster can deliver load-balancing performance
• Commonly used for busy FTP and web servers with a large client base
• A large number of nodes share the load

Page 16: Lecture 11: Unix Clusters
Page 17: Lecture 11: Unix Clusters

High Availability Cluster

• Avoid downtime of services
• Avoid single points of failure
• Always built with redundancy
• Almost all load-balancing clusters also have HA capability

Page 18: Lecture 11: Unix Clusters

Examples of Load Balancing and High Availability Cluster

• RedHat HA cluster: http://ha.redhat.com
• Turbolinux Cluster Server: http://www.turbolinux.com/products/tcs
• Linux Virtual Server Project: http://www.linuxvirtualserver.org/

Page 19: Lecture 11: Unix Clusters

High Availability Approach: Redundancy + Failover

• Redundancy eliminates single points of failure (SPOF)
• Automatic detection of failures (hardware, network, applications)
• Automatic recovery from failures (no human intervention)
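
As a hedged illustration of the auto-detect and auto-recover idea above (not the Linux-HA Heartbeat implementation), here is a toy monitor in C: the standby node checks the active node with a single ping and, after three consecutive misses, runs a takeover action. The peer address, interval, threshold, and take_over_services() are all hypothetical.

/* Toy failover monitor (illustration only, not Heartbeat/Linux-HA code):
 * the standby node periodically checks the active node and, after
 * MAX_MISSES consecutive failures, runs a takeover action. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define CHECK_INTERVAL 2   /* seconds between heartbeat checks */
#define MAX_MISSES     3   /* consecutive misses before failover */

/* Returns nonzero if the peer answered; one ICMP echo serves as the "heartbeat". */
static int peer_alive(const char *peer)
{
    char cmd[256];
    snprintf(cmd, sizeof cmd, "ping -c 1 -W 1 %s > /dev/null 2>&1", peer);
    return system(cmd) == 0;
}

static void take_over_services(void)
{
    /* Placeholder: acquire the virtual IP, mount shared storage,
     * start the services that were running on the failed node. */
    printf("peer is down: taking over services\n");
}

int main(void)
{
    const char *peer = "10.0.0.1";     /* hypothetical active node */
    int misses = 0;

    while (1) {
        if (peer_alive(peer)) {
            misses = 0;                /* healthy: reset the counter */
        } else if (++misses >= MAX_MISSES) {
            take_over_services();      /* automatic recovery, no human step */
            break;
        }
        sleep(CHECK_INTERVAL);
    }
    return 0;
}

A production HA stack would use dedicated heartbeat channels and guard against split-brain situations, which this sketch ignores.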

Page 20: Lecture 11: Unix Clusters

Real-Time Disk Replication: DRBD (Distributed Replicated Block Device)

Page 21: Lecture 11: Unix Clusters

IBM Supported Solutions

Linux-HA (Heartbeat)
• Open-source project
• Multi-platform solution for IBM eServers, Solaris, BSD
• Packaged with several Linux distributions
• Strong focus on ease of use, security, simplicity, low cost
• > 10K clusters in production since 1999

Tivoli System Automation (TSA) for Multi-Platform
• Proprietary IBM solution
• Used across all eServers, and on ia32 from any vendor
• Available on Linux, AIX, OS/400
• Rules-based recovery system
• Over 1000 licenses since 2003

Page 22: Lecture 11: Unix Clusters
Page 23: Lecture 11: Unix Clusters

HPCC Cluster and parallel computing applications

• Message passing: MPICH (http://www-unix.mcs.anl.gov/mpi/mpich/), LAM/MPI (http://lam-mpi.org) (see the MPI sketch after this list)
• Mathematical: fftw (fast Fourier transform), pblas (parallel basic linear algebra software), atlas (a collection of mathematical libraries), sprng (scalable parallel random number generator), MPITB (MPI toolbox for MATLAB)
• Quantum chemistry: Gaussian, Q-Chem
• Molecular dynamics: NAMD, GROMACS, GAMESS
• Weather modelling: MM5 (http://www.mmm.ucar.edu/mm5/mm5-home.html)
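
To give a feel for the message-passing model that MPICH and LAM/MPI implement, here is a minimal MPI sketch in C (standard MPI calls; the problem size is an arbitrary example): every rank sums its share of 1..N and rank 0 collects the total with MPI_Reduce.

/* Minimal MPI sketch: each process sums a slice of 1..N and rank 0
 * gathers the grand total with MPI_Reduce. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    const long N = 1000000;            /* illustrative problem size */
    int rank, size;
    long i, local = 0, total = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank sums every size-th number starting at rank+1. */
    for (i = rank + 1; i <= N; i += size)
        local += i;

    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum(1..%ld) = %ld (computed on %d processes)\n", N, total, size);

    MPI_Finalize();
    return 0;
}

With either MPI implementation this would typically be built with mpicc and started across the cluster with something like mpirun -np 8 ./sum.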

Page 24: Lecture 11: Unix Clusters

MOSIX and openMosix

MOSIX: MOSIX is a software package that enhances the Linux kernel with cluster capabilities. The enhanced kernel supports clusters of any size built from x86/Pentium-based boxes. MOSIX allows the automatic and transparent migration of processes to other nodes in the cluster, while standard Linux process-control utilities, such as 'ps', show all processes as if they were running on the node the process originated from.

openMosix: openMosix is a spin-off of the original MOSIX. The first version of openMosix is fully compatible with the last version of MOSIX, but it is going to go in its own direction.

Page 25: Lecture 11: Unix Clusters

MOSIX architecture (3/9)

Preemptive process migration:
• Any user's process, transparently and at any time, can migrate to any available node.
• The migrating process is divided into two contexts:
  • the system context (deputy), which may not be migrated from the "home" workstation (UHN);
  • the user context (remote), which can be migrated to a diskless node.

Page 26: Lecture 11: Unix Clusters

MOSIX architecture (4/9)

Preemptive process migration

(Diagram: a process migrating between the master node and a diskless node.)

Page 27: Lecture 11: Unix Clusters

Multi-CPU Servers

Page 28: Lecture 11: Unix Clusters

Benchmark - Memory (STREAM)

                                        1x Stream:         2x Stream:        4x Stream:
2x Opteron, 1.8 GHz, HyperTransport:    1006 - 1671 MB/s   975 - 1178 MB/s   924 - 1133 MB/s
2x Xeon, 2.4 GHz, 400 MHz FSB:          1202 - 1404 MB/s   561 - 785 MB/s    365 - 753 MB/s

(Both systems: 4x DIMM, 1 GB DDR266; Avent Techn.)
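
The bandwidth numbers above come from the STREAM benchmark; purely as an illustration of what such a test measures, here is a simplified triad-style loop in C (array size and repetition count are arbitrary, and this toy does not follow the official STREAM rules).

/* Simplified STREAM-triad-style memory bandwidth estimate:
 * a[i] = b[i] + scalar * c[i], timed over several repetitions. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N    (8 * 1000 * 1000)   /* about 64 MB per double array, well past cache */
#define REPS 10

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    double scalar = 3.0;
    long i;
    int r;

    if (!a || !b || !c) { fprintf(stderr, "out of memory\n"); return 1; }

    for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    clock_t start = clock();
    for (r = 0; r < REPS; r++)
        for (i = 0; i < N; i++)
            a[i] = b[i] + scalar * c[i];     /* triad kernel: 2 reads + 1 write */
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    /* 3 arrays of N doubles move per repetition. */
    double mbytes = (double)REPS * 3.0 * N * sizeof(double) / 1e6;
    printf("approx. bandwidth: %.0f MB/s\n", mbytes / secs);

    free(a); free(b); free(c);
    return 0;
}

Running 1, 2, or 4 such loops concurrently corresponds to the 1x/2x/4x columns; the Opteron degrades less because each CPU has its own memory controller and HyperTransport links, whereas the Xeons share a single front-side bus.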

Page 29: Lecture 11: Unix Clusters

Sybase DBMS Performance

Page 30: Lecture 11: Unix Clusters

Multi-CPU Hardware and Software

Page 31: Lecture 11: Unix Clusters

Service Processor (SP)

• Dedicated on-board SP, PowerPC-based
• Own IP name/address
• Front panel
• Command-line interface
• Web server

Remote administration:
• System status
• Boot / reset / shutdown
• Flash the BIOS

Page 32: Lecture 11: Unix Clusters

Unix Scheduling

Page 33: Lecture 11: Unix Clusters

Process Scheduling

When to run the scheduler:
1. Process creation
2. Process exit
3. Process blocks
4. System interrupt

• Non-preemptive: a process runs until it blocks or gives up the CPU (cases 1-3)
• Preemptive: a process runs for some time unit, then the scheduler selects the next process to run (cases 1-4); see the sketch below
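
To make the preemptive case concrete, here is a tiny round-robin simulation in C (process names, burst lengths, and the quantum are made-up values): each process runs for at most one quantum before the scheduler moves on to the next runnable process.

/* Toy round-robin (preemptive) scheduling simulation: each process gets
 * at most QUANTUM time units before the scheduler picks the next one. */
#include <stdio.h>

#define QUANTUM 2
#define NPROCS  3

int main(void)
{
    const char *name[NPROCS] = { "A", "B", "C" };
    int remaining[NPROCS]    = { 5, 3, 4 };   /* made-up CPU bursts */
    int time = 0, done = 0, p = 0;

    while (done < NPROCS) {
        if (remaining[p] > 0) {
            int run = remaining[p] < QUANTUM ? remaining[p] : QUANTUM;
            printf("t=%2d: process %s runs for %d unit(s)\n", time, name[p], run);
            time += run;
            remaining[p] -= run;
            if (remaining[p] == 0) {
                printf("t=%2d: process %s exits\n", time, name[p]);
                done++;
            }
        }
        p = (p + 1) % NPROCS;   /* preempt: move to the next runnable process */
    }
    return 0;
}

With these values the output interleaves A, B, and C until each burst is used up, which is exactly the "runs for some time unit, then the scheduler selects a process" behaviour described above.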

Page 34: Lecture 11: Unix Clusters

Solaris Overview

• Multithreaded, symmetric multi-processing (SMP)
• Preemptive kernel with protected data structures
• Interrupts handled using threads
• MP support: per-CPU dispatch queues, one global kernel preempt queue
• System threads
• Priority inheritance
• Turnstiles rather than wait queues

Page 35: Lecture 11: Unix Clusters

Linux Today

• Linux scales very well in SMP systems with up to 4 CPUs.
• Linux on 8 CPUs is still competitive, but between 4-way and 8-way systems the price per CPU increases significantly.
• For SMP systems with more than 8 CPUs, classic Unix systems are the best choice.
• With Oracle Real Application Clusters (RAC), small 4-way or 8-way systems can be clustered to get past today's Linux limitations.
• Commodity, inexpensive 4-way Intel boxes, clustered with Oracle 9i RAC, help to reduce TCO.