Cluster Computing with Linux
Prabhaker Mateti, Wright State University

  • Cluster Computing with Linux. Prabhaker Mateti, Wright State University.

  • Abstract: Cluster computing distributes the computational load to collections of similar machines. This talk describes what cluster computing is, the typical Linux packages used, and examples of large clusters in use today. It also reviews cluster-computing modifications of the Linux kernel.

  • What Kind of Computing, did you say? Sequential, Concurrent, Parallel, Distributed, Networked, Migratory, Cluster, Grid, Pervasive, Quantum, Optical, Molecular.

  • Fundamentals Overview

  • Fundamentals Overview: Granularity of Parallelism; Synchronization; Message Passing; Shared Memory.

  • Granularity of Parallelism: Fine-Grained Parallelism; Medium-Grained Parallelism; Coarse-Grained Parallelism; NOWs (Networks of Workstations).

  • Fine-Grained Machines: tens of thousands of Processor Elements; Processor Elements are slow (bit serial), with small, fast private RAM; communication via shared memory, interconnection networks, or message passing; Single Instruction Multiple Data (SIMD).

  • Medium-Grained Machines: typical configurations have thousands of processors; processor power is between coarse- and fine-grained; either shared or distributed memory; traditionally research machines; Single Code Multiple Data (SCMD).

  • Coarse-Grained Machines: typical configurations have hundreds to thousands of processors; processors are powerful (fast CPUs) and large (cache, vectors, multiple fast buses); memory is shared or distributed-shared; Multiple Instruction Multiple Data (MIMD).

  • Networks of Workstations: exploit inexpensive workstations/PCs and a commodity network; the NOW becomes a distributed-memory multiprocessor; workstations send and receive messages; C and Fortran programs use PVM, MPI, etc. libraries; programs developed on NOWs are portable to supercomputers for production runs.

  • Definition of Parallel: S1 begins at time b1 and ends at e1; S2 begins at time b2 and ends at e2. S1 || S2 begins at min(b1, b2) and ends at max(e1, e2), and is commutative (equivalent to S2 || S1).

  • Data Dependency: x := a + b; y := c + d is equivalent to x := a + b || y := c + d and to y := c + d; x := a + b, because x depends only on a and b, and y depends only on c and d (a, b, c, d are assumed independent).
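
    Since the two assignments touch disjoint variables, they can run in either order or in parallel. A minimal C sketch using OpenMP sections (OpenMP is an assumption here; the talk itself names only PVM/MPI):

        /* compile with: cc -fopenmp dep.c */
        #include <stdio.h>

        int main(void) {
            int a = 1, b = 2, c = 3, d = 4, x, y;
        #pragma omp parallel sections
            {
        #pragma omp section
                x = a + b;                    /* uses only a and b */
        #pragma omp section
                y = c + d;                    /* uses only c and d */
            }
            printf("x=%d y=%d\n", x, y);      /* same values as the sequential program */
            return 0;
        }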

  • Types of Parallelism. Result: the data structure can be split into parts of the same structure. Specialist: each node specializes; pipelines. Agenda: there is a list of things to do, and each node can generalize.

  • Result Parallelism: also called Embarrassingly Parallel or Perfect Parallel. Computations that can be subdivided into sets of independent tasks requiring little or no communication, e.g., Monte Carlo simulations, evaluating F(x, y, z).
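
    A minimal sketch of result parallelism in C with MPI (the library family the talk cites): each rank computes an independent Monte Carlo sample count and one reduction combines them. The sample count and the use of rand() are illustrative only:

        #include <mpi.h>
        #include <stdio.h>
        #include <stdlib.h>

        int main(int argc, char **argv) {
            int rank, size;
            long n = 1000000, hits = 0, total = 0;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            srand(rank + 1);                         /* each rank gets its own (crude) stream */
            for (long i = 0; i < n; i++) {
                double x = rand() / (double)RAND_MAX;
                double y = rand() / (double)RAND_MAX;
                if (x * x + y * y <= 1.0) hits++;    /* point falls inside the quarter circle */
            }

            MPI_Reduce(&hits, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0)
                printf("pi is about %f\n", 4.0 * total / ((double)n * size));
            MPI_Finalize();
            return 0;
        }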

  • Specialist Parallelism: different operations performed simultaneously on different processors. E.g., simulating a chemical plant: one processor simulates the preprocessing of chemicals, one simulates reactions in the first batch, another simulates refining the products, etc.

  • Agenda Parallelism: the Manager-Worker (MW) model. The manager initiates the computation, tracks progress, handles workers' requests, and interfaces with the user. Workers are spawned and terminated by the manager, make requests to the manager, and send results to the manager.
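
    A compact manager-worker sketch in C with MPI, under the assumption that rank 0 is the manager and a task is just an integer index; the WORK/STOP tags and do_task() are illustrative, not from the talk:

        #include <mpi.h>
        #define WORK 1
        #define STOP 2

        static double do_task(int t) { return t * 2.0; }     /* stand-in for real work */

        int main(int argc, char **argv) {
            int rank, size, ntasks = 100;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            if (rank == 0) {                                  /* manager */
                int next = 0, active = 0;
                double res;
                MPI_Status st;
                /* seed every worker with one task, or stop it if none are left */
                for (int w = 1; w < size; w++) {
                    if (next < ntasks) {
                        MPI_Send(&next, 1, MPI_INT, w, WORK, MPI_COMM_WORLD);
                        next++; active++;
                    } else {
                        MPI_Send(&next, 1, MPI_INT, w, STOP, MPI_COMM_WORLD);
                    }
                }
                /* collect results; hand out remaining tasks as workers become idle */
                while (active > 0) {
                    MPI_Recv(&res, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                             MPI_COMM_WORLD, &st);
                    active--;
                    if (next < ntasks) {
                        MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, WORK, MPI_COMM_WORLD);
                        next++; active++;
                    } else {
                        MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, STOP, MPI_COMM_WORLD);
                    }
                }
            } else {                                          /* worker */
                int task;
                MPI_Status st;
                for (;;) {
                    MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
                    if (st.MPI_TAG == STOP) break;
                    double r = do_task(task);
                    MPI_Send(&r, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
                }
            }
            MPI_Finalize();
            return 0;
        }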

  • Embarrassingly Parallel: the result parallelism is obvious. Ex1: compute the square root of each of a million given numbers. Ex2: search for a given set of words among a billion web pages.

  • Reduction: combine several sub-results into one. Reducing r1, r2, ..., rn with op becomes r1 op r2 op ... op rn. Hadoop is based on this idea.
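
    In MPI terms (one possible realization, not the only one), the combining step is a single MPI_Reduce call with the chosen operator:

        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv) {
            int rank;
            double r_local, r_all;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            r_local = rank + 1.0;        /* stand-in for this node's sub-result r_i */
            /* computes r1 op r2 op ... op rn, here with op = MPI_SUM, at rank 0 */
            MPI_Reduce(&r_local, &r_all, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0) printf("combined result = %f\n", r_all);
            MPI_Finalize();
            return 0;
        }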

  • Shared Memory: process A writes to a memory location, and process B reads from that memory location. Synchronization is crucial. Excellent speed. Semantics?

  • Shared Memory: needs hardware support (multi-ported memory); atomic operations such as Test-and-Set; semaphores.
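
    A minimal spinlock sketch built on an atomic test-and-set, using C11 stdatomic.h (the particular primitive is an assumption; the slide names only test-and-set and semaphores):

        #include <stdatomic.h>

        static atomic_flag lock = ATOMIC_FLAG_INIT;

        void acquire(void) {
            /* atomically set the flag and get its old value; spin while it was already set */
            while (atomic_flag_test_and_set(&lock))
                ;                            /* busy-wait */
        }

        void release(void) {
            atomic_flag_clear(&lock);        /* back to 0: the lock is free again */
        }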

  • Shared Memory Semantics: Assumptions. Global time is available, in discrete increments. Shared variable s = vi at ti, i = 0, 1, .... Process A: s := v1 at time t1; assume no other assignment occurs after t1. Process B reads s at time t and gets value v.

  • Shared Memory: Semantics. Value of the shared variable: v = v1 if t > t1; v = v0 if t < t1; v = ?? if t = t1 (t = t1 ± one discrete quantum). The next update of the shared variable occurs at t2; t2 = t1 + ?

  • Distributed Shared Memory: simultaneous read/write access by spatially distributed processors; an abstraction layer over an implementation built from message-passing primitives; semantics not so clean.

  • Semaphores

    Semaphore s; V(s) ::= s := s + 1; P(s) ::= when s > 0 do s := s - 1

    Deeply studied theory.
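
    On Linux the same P/V operations are available as POSIX semaphores; a minimal sketch (the wrapper names are illustrative):

        #include <semaphore.h>

        sem_t s;

        void init_sem(void) { sem_init(&s, 0, 1); }  /* 0 = shared among threads, initial count 1 */
        void P(void)        { sem_wait(&s); }        /* when s > 0 do s := s - 1 (blocks otherwise) */
        void V(void)        { sem_post(&s); }        /* s := s + 1 */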

  • Condition Variables: Condition C; C.wait(); C.signal()
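
    A pthreads sketch of C.wait()/C.signal(); the mutex and the ready flag are illustrative assumptions, but waiting in a loop under a mutex is the standard pattern:

        #include <pthread.h>

        pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
        pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
        int ready = 0;

        void waiter(void) {
            pthread_mutex_lock(&m);
            while (!ready)                    /* re-test: wakeups may be spurious */
                pthread_cond_wait(&c, &m);    /* releases m while blocked, reacquires on wakeup */
            /* ... the condition now holds ... */
            pthread_mutex_unlock(&m);
        }

        void signaler(void) {
            pthread_mutex_lock(&m);
            ready = 1;
            pthread_cond_signal(&c);          /* wake one waiting thread */
            pthread_mutex_unlock(&m);
        }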

  • Distributed Shared Memory: a common address space that all the computers in the cluster share. Its semantics are difficult to describe.

  • Distributed Shared Memory: Issues. The memory is spatially distributed, over a LAN or a WAN, and no global time is available.

  • Distributed Computing: no shared memory; processes communicate by sending and receiving messages, asynchronously or synchronously; synergy among processes.

  • Messages: messages are sequences of bytes moving between processes. The sender and receiver must agree on the type structure of the values in the message. Marshalling: data layout so that there is no ambiguity, such as four chars vs. one integer.
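
    A small marshalling sketch in C: the sender commits to a byte layout and byte order so the receiver cannot confuse four chars with one integer (htonl/ntohl are the usual POSIX helpers; the record format here is invented for illustration):

        #include <arpa/inet.h>
        #include <stdint.h>
        #include <string.h>

        /* message layout: 4-byte integer id in network byte order, then 4 raw chars */
        size_t marshal(unsigned char buf[8], uint32_t id, const char tag[4]) {
            uint32_t net_id = htonl(id);      /* host -> network byte order */
            memcpy(buf, &net_id, 4);
            memcpy(buf + 4, tag, 4);
            return 8;
        }

        void unmarshal(const unsigned char buf[8], uint32_t *id, char tag[4]) {
            uint32_t net_id;
            memcpy(&net_id, buf, 4);
            *id = ntohl(net_id);              /* network -> host byte order */
            memcpy(tag, buf + 4, 4);
        }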

  • Message Passing: process A sends a data buffer as a message to process B. Process B waits for a message from A and, when it arrives, copies it into its own local memory. No memory is shared between A and B.

  • Message Passing: obviously, messages cannot be received before they are sent; a receiver waits until there is a message. Asynchronous: the sender never blocks, even if infinitely many messages are waiting to be received. Semi-asynchronous is a practical version of the above with a large but finite amount of buffering.

  • Message Passing: Point to Point. Q: send(m, P) sends message m to process P. P: recv(x, Q) receives a message from process Q and places it in variable x. The message data type of x must match that of m; the effect is as if x := m.
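
    The same exchange written with MPI point-to-point calls (a sketch; rank 1 plays the sender Q and rank 0 the receiver P):

        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv) {
            int rank, m = 42, x = 0;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            if (rank == 1) {
                MPI_Send(&m, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);   /* send(m, P) */
            } else if (rank == 0) {
                MPI_Recv(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);                      /* recv(x, Q) */
                printf("x = %d\n", x);                            /* as if x := m */
            }
            MPI_Finalize();
            return 0;
        }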

  • Broadcast: one sender Q, multiple receivers P; not all receivers may receive at the same time. Q: broadcast(m) sends message m to all processes. P: recv(x, Q) receives the message from process Q and places it in variable x.
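
    The MPI form of this is MPI_Bcast, where every rank makes the same call and the root supplies the value; a minimal sketch:

        #include <mpi.h>

        int main(int argc, char **argv) {
            int rank, m = 0;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            if (rank == 0) m = 42;                         /* only the root has the value */
            MPI_Bcast(&m, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* afterwards every rank's m == 42 */
            MPI_Finalize();
            return 0;
        }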

  • Synchronous Message Passing: the sender blocks until the receiver is ready to receive. Cannot send messages to self. No buffering.

  • Asynchronous Message Passing: the sender never blocks; the receiver receives when ready. Can send messages to self. Infinite buffering.
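
    MPI offers rough counterparts of both models, a mapping assumed here rather than stated in the talk: MPI_Ssend completes only after the receiver has started receiving (synchronous), while MPI_Isend returns immediately and is finished later with MPI_Wait (a bounded, buffered approximation of asynchronous sending). A sketch:

        #include <mpi.h>

        void send_both_ways(int *m, int dest) {
            MPI_Request req;
            /* synchronous: blocks until the matching receive has started */
            MPI_Ssend(m, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
            /* non-blocking: returns at once; completion is checked separately */
            MPI_Isend(m, 1, MPI_INT, dest, 1, MPI_COMM_WORLD, &req);
            /* ... overlap computation with communication here ... */
            MPI_Wait(&req, MPI_STATUS_IGNORE);     /* buffer is reusable after this */
        }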

  • Message Passing: speed is not so good, because the sender copies the message into system buffers, the message travels the network, and the receiver copies the message from system buffers into local memory; special virtual memory techniques help. Programming quality: less error-prone compared with shared memory.

  • Computer Architectures

  • Architectures of Top 500 Sys

  • Architectures of Top 500 Sys

  • Parallel Computers: traditional supercomputers (SIMD, MIMD, pipelines; tightly coupled shared memory; bus-level connections; expensive to buy and to maintain) and cooperating networks of computers.

  • Traditional Supercomputers: very high starting cost; expensive hardware; expensive software; high maintenance; expensive to upgrade.

  • Computational Grids: Grids are persistent environments that enable software applications to integrate instruments, displays, and computational and information resources that are managed by diverse organizations in widespread locations.

  • Computational Grids: individual nodes can be supercomputers or NOWs; high availability; accommodate peak usage. LAN : Internet :: NOW : Grid.

  • Buildings-Full of Workstations

    Distributed operating systems have not taken a foothold. Powerful personal computers are ubiquitous, and mostly idle: more than 90% of their up-time? 100 Mb/s LANs are common. Windows and Linux are the top two OSs in terms of installed base.

  • Networks of Workstations (NOW): workstations, a network, an operating system, cooperation, distributed+parallel programs.

  • What is a Workstation? PC? Mac? Sun? Workstation OS.

  • Workstation OS: authenticated users; protection of resources; multiple processes; preemptive scheduling; virtual memory; hierarchical file systems; network centric.

  • Clusters of Workstations: an inexpensive alternative to traditional supercomputers; high availability; lower down time; easier access; a development platform with production runs on traditional supercomputers.

  • Clusters of Workstations: Dedicated Nodes; Come-