1
Making Parallel Processing on Clusters Efficient, Transparent and Easy for Programmers
Andrzej M. Goscinski
School of Computing and Mathematics
Deakin University
Joint work with Michael Hobbs, Jackie Silcock and Justin Rough
2
Overview and Aims
Basic issues and solutions
– Parallel processing: user expectations, clusters, phases
– Parallelism management
– Transparency
– Communication paradigms
– What to do?
– Related systems
Cluster execution environments
– Middleware
– Cluster operating systems
GENESIS
– Architecture
– Services for parallelism management and transparency
GENESIS programming interface
– Message passing
– DSM
– Primitives
Easy to Use and Program Environment
Performance Study
Summary and Future Work
3
Parallel Processing: User Expectations
Affordable
– Supercomputers for a “poor man”
Performance
– Good performance
Ease of Use
– Free from creation and placement concerns
Transparency
– Unaware of location of processes
Ease of Programming
– Choice and easy use of communication paradigm
4
Parallel Processing: Clusters
Clusters are an ideal platform for the execution of parallel applications
Many institutions (universities, banks, industries) are moving toward homogeneous non-dedicated clusters
Advantages:
– Cheap to build: commodity PCs, networks
– Widely available
– Idle during weekends
– Low utilization during working hours
Disadvantages:
– Poor and difficult to use software (operating systems and runtime systems)
– User unfriendly
– Distribution of resources (CPUs and peripherals)
5
Parallel Processing Phases
Three distinct phases:
– Initialization
– Execution
– Termination
Researchers and manufacturers mainly concentrate on execution to achieve the best performance
Ease of use of parallel systems and programmer’s time are neglected
Application developers are discouraged as they have to program many activities that are of an operating system nature
6
Parallelism Management
Present operating systems that manage clusters are not built to support parallel processing
Reason: these operating systems do not provide services to manage parallelism
Parallelism management is the management of parallel processes and computational resources
– Achieve high performance
– Use computational resources efficiently
– Make programming and use of parallel systems easy
7
Parallelism Management
Parallelism management in parallel programming tools, Distributed Shared Memory and enhanced operating system environments
– has been neglected
– left to the application developers
Application developers must deal
– not only with parallel application development
– but also with the problems of initiation and control for the execution on the cluster
Transparency and reliability (SSI) have been neglected: users do not see a cluster as a single powerful computer
8
Services for Parallelism Management on Clusters
Services for parallelism management and transparency
– Establishment of a virtual machine
– Mapping of processes to computers
– Parallel processes instantiation
– Data (including shared) distribution
– Initialisation of synchronization variables
– Coordination of parallel processes
– Dynamic load balancing
9
Transparency
Users should see a cluster as a single powerful computer
Dimensions of parallel processing transparency
– Location transparency
– Process relation transparency
– Execution transparency
– Device transparency
10
Communication Paradigms
Two communication paradigms:
Message Passing (MP)
Explicit communication between processes of a parallel application
– Fast
– Difficult to use for programmers
Distributed Shared Memory (DSM)
Implicit communication between processes of a parallel application through shared memory objects
– Easy to use
– Demonstrates reduced performance
Claim: Operating environments that offer MP and DSM should be provided as a part of a cluster operating system as they manage system resources
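The contrast between explicit and implicit communication can be illustrated with a small sketch. This is not GENESIS code: Python threads stand in for the processes of a parallel application, a queue stands in for message passing, and a lock-protected dictionary stands in for a shared memory object. The function names are hypothetical.

```python
import queue
import threading


def sum_with_message_passing(chunks):
    """Each worker explicitly sends its partial sum to the parent."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))  # explicit send

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # explicit receive: one message per worker
    return sum(mailbox.get() for _ in chunks)


def sum_with_shared_memory(chunks):
    """Workers update one shared object; no send or receive appears."""
    shared = {"total": 0}
    lock = threading.Lock()

    def worker(chunk):
        partial = sum(chunk)
        with lock:  # synchronization replaces explicit messages
            shared["total"] += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["total"]
```

In the first function the programmer writes every send and receive; in the second the only extra burden is synchronization, which is the ease-of-use argument made for DSM above.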
11
What to do?
Affordable
– Clusters
Performance
– Introduce special services
Ease of Use
– Parallelism management
Transparency
– Operating systems
Ease of Programming
– Message passing and DSM
Development of cluster operating systems supporting parallel processing
Services of cluster operating systems:
– Distributed services for transparent communication and management of basic system resources
– Services for parallelism management and transparency
12
Related Systems: Message Passing Systems
PVM
– A set of cooperating server processes and specialized libraries that support process communication, execution and synchronization
– A virtual machine must be set up by the user
– Provides transparent process creation and termination
MPI
– Objective is to standardize and coordinate the direction of various message passing applications, tools and environments
– Provides limited process management functions to support parallel processing
HARNESS
– Does not provide transparency
– Programmers are forced to specify computers, map processes to these computers
– Load imbalance is neglected
13
Related Systems: DSM Systems
Research concentrates mainly on improving performance
Ease of use has been neglected
Munin
– Programmers must label different variables according to the consistency protocol they require
– The initialisation stage requires the application developer to define the number of computers to be used
– Programmers must create a thread on each computer, initialise shared data and create synchronization variables
TreadMarks
– The application developer has a substantial input into initialisation of DSM processes
– Full transparency is not provided
14
Related Systems: Execution Environments
The PVM, MPI and DSM approach of running on top of an operating system can be improved by enhancing the operating system itself to support parallel processing
Beowulf
– Exploits distributed process space to manage parallel processes
– Processes can be started on a remote computer only after a logon operation into that computer has completed successfully
– Does not address resource allocation or load balancing
– Transparent process migration is not provided
15
Related Systems: Execution Environments
NOW
– Combines specialized libraries and server processes with enhancement to the kernel
– Enhancement: scheduling and communication kernel modules, with GLUnix to provide network-wide process, file and VM management
– Parallelism management service: process initialisation on any cluster computer, support for semi-transparent start of parallel processes on multiple nodes (how to select nodes?), barriers, MPI
MOSIX
– Provides enhanced and transparent communication and scheduling within the kernel
– Employs PVM to provide parallelism support (initial placement)
– Process migration transparently migrates processes
– Provides dynamic load balancing and data collection
– Remote communication is handled through the originating computer
16
Related Systems: Summary
All systems but MOSIX are based on middleware; there has been no attempt to develop a comprehensive operating system to support parallel processing on clusters
The solutions are performance driven; little work has been done on making them programmer friendly
Problems from the parallel processing point of view:
– Processes are created one at a time, although the primitives provided enable the user to create multiple processes
– These systems (with the exception of MOSIX) do not provide complete transparency
– Virtual machine is not set up automatically
– These systems do not provide load balancing
17
Cluster Execution Environments
Execution environments that support parallel processing on clusters can be developed using
– Middleware approach: at the application level
– Underware: at the kernel level
18
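Middleware
[Figure: two alternative middleware stacks (PVM or DSM). In each, user processes run above the middleware (PVM software, provided as library functions or separate software, or DSM software), which runs above the operating system (Unix). Layer labels: application processes, operating system.]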
19
Middleware - summary
Middleware allows programmers to
– develop parallel applications (PVM, MPI)
– execute parallel applications on clusters (Beowulf)
– employ shared memory based programming (Munin)
– achieve good execution performance
– take advantage of portability
Middleware
– does not offer complete transparency
– reduces potential execution performance (services are duplicated)
– forces programmers to be involved in many time consuming and error prone activities that are of the operating system nature
Conclusion: to provide parallelism management, offer transparency, and make programming and use of a system easy, develop the needed services at the operating system level
20
Cluster operating systems
A cluster is a special kind of distributed system
A cluster operating system supporting parallel processing should
– possess the features of a distributed operating system to deal with distributed resources and their management and hide distribution
– exploit additional services to manage parallelism for applications and offer complete transparency
– provide an enhanced programming environment
Three logical levels of a cluster operating system
– Basic distributed operating system
– Parallelism management and transparency system
– Programming environment
21
Logical architecture of a cluster operating system
[Figure: layered architecture. Labels shown: Programming Environment (Message Passing/PVM, Shared Memory); Communication Services; DSM Services; Parallelism Management System; Enhanced Subset of a Distributed Operating System (Microkernel, Communication/File Management).]
22
GENESIS Cluster Operating System
Proof of concept
Client-server model, microkernel approach and object based approach (all entities have names)
All basic resources (processor, main memory, network, interprocess communication, files) are managed by relevant servers
IPC - Message passing services
– basic communication paradigm
– cornerstone of the architecture
– provided by IPC Manager and local IPC component of microkernel
IPC placement and relationship with other services designed to achieve high performance and transparency
DSM provided by Space (memory) and IPC Managers
23
The GENESIS Architecture
[Figure: the GENESIS architecture. Layers shown: Parallel Processes (MP, PVM, DSM), Parallelism Management System, Kernel Servers, RHODOS Microkernel. Components shown: Global Scheduler, DSM System, Execution Manager, IPC Manager, Space Manager, Process Manager, File/Cache Manager, Network Manager, Resource Discovery, Migration Manager.]
24
GENESIS Services for Parallelism Management and Transparency
Basic services that provide parallelism management and offer transparency:
– Establishment of a virtual machine
– Process creation
– Process duplication
– Process migration
– Global scheduling
25
Establishment of a Virtual Machine
Resource Discovery Server supports adaptive establishment of a virtual machine
Resource Discovery Server
– Identifies
    – Idle and lightly loaded computers
    – Computer resources: e.g., processor model, memory size
    – Computational load and available memory
    – Communication patterns for each process
– Passes information to the Global Scheduling Server per
    – Process
    – Server
    – Averaged over an entire cluster
Virtual machine changes dynamically
– Some computers become overloaded or out of order
– Some computers become idle
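As a rough illustration of adaptive virtual machine establishment, the sketch below recomputes the membership of the virtual machine from the latest load reports. It is an assumption-laden toy, not the GENESIS Resource Discovery Server: the `ComputerReport` fields, the `LIGHT_LOAD` threshold and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ComputerReport:
    name: str
    load: float            # hypothetical load metric, 0.0 means idle
    free_memory_mb: int
    alive: bool = True     # False models a computer that is out of order


LIGHT_LOAD = 0.5  # assumed cut-off for "lightly loaded"


def establish_virtual_machine(reports, min_memory_mb=0):
    """Return the names of computers currently fit to join the virtual machine."""
    members = [
        r.name
        for r in reports
        if r.alive and r.load <= LIGHT_LOAD and r.free_memory_mb >= min_memory_mb
    ]
    return sorted(members)
```

Calling the function again with fresh reports models the dynamic behaviour: overloaded or failed computers drop out and newly idle ones join.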
26
Process Creation
Requirements
– Multiple process creation: to create many instances of a process on a single or over many computers
– Scalability: must be scalable to many computers
– Complete transparency: must hide the location of all resources and processes
Three forms of process creation:
– Single
– Multiple
– Group
Creation is invoked when the Execution Manager receives a process create request from a parent process
– Execution Manager notifies Global Scheduler
– Global Scheduler sends location on which process should be created
– Execution Manager on selected computer manages process creation
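The request flow above can be sketched in a few lines. This is a simplified model under stated assumptions, not the GENESIS implementation: the class and method names are hypothetical, the scheduler policy (pick the least loaded computer) is an assumption, and ordinary method calls stand in for IPC messages.

```python
class GlobalScheduler:
    """Chooses where a new process should be created (assumed policy: least loaded)."""

    def __init__(self, loads):
        self.loads = dict(loads)  # computer name -> number of processes

    def choose_location(self):
        target = min(self.loads, key=self.loads.get)
        self.loads[target] += 1
        return target


class ExecutionManager:
    def __init__(self, computer, scheduler, peers):
        self.computer = computer
        self.scheduler = scheduler
        self.peers = peers        # computer name -> ExecutionManager
        self.processes = []

    def handle_create_request(self, program):
        # Steps 1 and 2: notify the Global Scheduler and receive a location.
        target = self.scheduler.choose_location()
        # Step 3: the Execution Manager on the selected computer creates the process.
        return self.peers[target].create_locally(program)

    def create_locally(self, program):
        self.processes.append(program)
        return (self.computer, program)


def build_cluster(loads):
    """Create one Execution Manager per computer, all sharing one scheduler."""
    scheduler = GlobalScheduler(loads)
    managers = {}
    for name in loads:
        managers[name] = ExecutionManager(name, scheduler, managers)
    return managers
```

The parent only ever talks to its local Execution Manager; the location of the child is decided and hidden by the system, which is the transparency requirement listed above.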
27
Process Creation: Single and Multiple Services
Single process creation service
– Similar to the services found in traditional systems supporting parallel processing
– Requires executable image to be downloaded from disk for each parallel process to be created
Multiple process creation service
– Supports the concurrent instantiation of a number of processes on a given computer through one creation call
– When many computers are involved in multiple process creation, each computer is addressed in a sequential manner
– Executable image of a parallel child process must be downloaded separately for each computer involved: a scalability problem
28
Process Creation: Group
Group process creation combines multiple process creation and group communication
Group process creation service
– allows multiple processes to be created concurrently on many computers
– Single executable is downloaded from a file server using group communication
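The scalability difference between the three creation services comes down to how many times the executable image is fetched. The following sketch counts downloads as the slides describe them: once per process for single creation, once per computer for multiple creation, and once overall for group creation. The function name and the list-of-counts input are hypothetical, and real costs would also depend on image size and network behaviour.

```python
def image_downloads(service, processes_per_computer):
    """Count executable image downloads for a given placement.

    processes_per_computer: list with the number of child processes
    to create on each computer.
    """
    computers_used = sum(1 for n in processes_per_computer if n > 0)
    if service == "single":
        # one download for every process created
        return sum(processes_per_computer)
    if service == "multiple":
        # one download for every computer involved
        return computers_used
    if service == "group":
        # one download, delivered to all computers by group communication
        return 1 if computers_used else 0
    raise ValueError(f"unknown creation service: {service}")
```

For 16 processes spread evenly over 4 computers, the counts are 16, 4 and 1 respectively.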
29
Group Process Creation: Behavior
[Figure: group process creation. The Parent and Child 1 run on Computer 1, Child 2 on Computer 2, and Child n on Computer n; each computer has an Exec Manager, and a Global Scheduler and a File Server take part. Numbered arrows 1 to 9 mark the order of the interactions.]
30
Process Duplication: Single Local and Remote
Parallel processes are instantiated on selected computers by employing process duplication supported by process migration
Three forms of process duplication
– Single local and remote
– Multiple local and remote
– Group remote
Single local and remote process duplication
– Duplication is invoked when the Execution Manager receives a twin request from a parent process
    – Execution Manager notifies Global Scheduler
    – Global Scheduler sends a location on which twin should be placed
    – If this computer is remote, process migration is employed
31
Process Duplication: Multiple Local and Remote
Multiple local and remote process duplication is an enhancement of single process duplication
Duplication is invoked when the Execution Manager receives a multiple duplication request from a parent process
– Execution Manager notifies Global Scheduler
– Global Scheduler sends a location on which twin should be placed
– If computer is local
    – Process Manager and Space Manager are requested to duplicate multiple copies of process entries and memory spaces
– If computer is remote
    – the parent process is migrated to this destination
    – multiple copies of the parent process are duplicated
    – the parent process on the remote computer is killed
Child processes should be duplicated on many computers
– Remote process duplication is performed for each selected computer
32
Process Duplication: Group Remote
When more than one remote computer is involved in process duplication the overall performance decreases
Decrease is caused by migrating a parent process to each remote computer sequentially
Performance is improved by employing group process migration
– Process Managers and Execution Managers each join a relevant group and use group communication
– The parent process is concurrently migrated to all selected remote computers involved in process duplication
33
Group Remote Process Duplication: Behavior
[Figure: group remote process duplication. Computer 1 holds the Parent and Child 1; Computer 2 to Computer n each hold a migrated copy of the Parent and a Child. Each computer has an Exec Manager and a Migration Manager, and a Global Scheduler takes part. Numbered arrows 1 to 10, including a step labelled 5M, mark the order of the interactions.]
34
Process Migration
Designed to separate policy from mechanism
– Process Migration Manager acts as the coordinator for migration of various resources that combine to form a process
– Migration of resources (memory, process entries, buffers) is carried out by the Space, Process and IPC Managers, respectively
Two forms of process migration: single and group
Single process migration
– Global Scheduler provides “which” process to “where” computer
– Local Manager requests its remote peer to prepare for a process
– Local Migration Manager requests Space, Process and IPC Managers to migrate respective resources
– Remote Manager informs its local peer of successful migration
– Local Manager requests Space, Process and IPC Managers to delete the respective resources of the migrated process
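The single process migration steps can be walked through with a toy model. It is a sketch only: dictionaries stand in for the Space, Process and IPC Managers, the `migrate` function plays the role of the cooperating Migration Managers, and all names are hypothetical. The "which" and "where" decision of the Global Scheduler is represented by the function arguments.

```python
KINDS = ("space", "process", "ipc")  # one resource manager per kind


class Computer:
    def __init__(self, name):
        self.name = name
        # manager kind -> {pid: resource held for that process}
        self.managers = {kind: {} for kind in KINDS}
        self.prepared = set()  # pids the computer has been told to expect


def migrate(pid, source, destination):
    """Move every resource of process `pid` from source to destination."""
    events = []
    # The local manager asks its remote peer to prepare for the process.
    destination.prepared.add(pid)
    events.append("prepare")
    # Each resource manager transfers its own part of the process.
    for kind in KINDS:
        destination.managers[kind][pid] = source.managers[kind][pid]
    events.append("transfer")
    # The remote side confirms that all parts arrived.
    if not all(pid in destination.managers[kind] for kind in KINDS):
        raise RuntimeError("migration incomplete")
    events.append("confirm")
    # Only after confirmation are the source copies deleted.
    for kind in KINDS:
        del source.managers[kind][pid]
    events.append("cleanup")
    return events
```

Deleting the source copies only after confirmation mirrors the order of the last two steps above: the process exists on at least one computer at every point in the protocol.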
35
Process Migration: Behavior
[Figure: single process migration between a Source Computer and a Destination Computer. Each computer has a Migration Manager, Process Manager, Space Manager and IPC Manager; the Global Scheduler triggers the migration (label: Event). Process State, Spaces and IPC Buffers are transferred between the peer managers. Numbered arrows 1 to 7 mark the order of the interactions.]
36
Group Process Migration
Enhancement of single process migration
Modifies the single communication between the peer Migration Managers, Process Managers, Space Managers and IPC Managers to that of group communication
Global Scheduler provides “which” process to “where” computers
– Each server migrates its respective resources to multiple destination computers in a single message using group communication
– Parent process is duplicated on each remote computer
– At the end of successful migration the parent process on each remote computer is killed
37
Global Scheduling
Makes policy decisions of which processes should be mapped to which computers
Input provided by the Resource Discovery Manager
Relies on mechanisms of
– Single, multiple and group process creation and duplication services
– Single and group process migration
The server combines services of
– Static allocation – at the initial stage of parallel processing
– Dynamic load balancing – to react to load fluctuations
Currently, the Global Scheduler is implemented as a centralized server
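The two services combined by the Global Scheduler can be illustrated with a minimal sketch. It is not the GENESIS implementation: the `Computer` class, the least-loaded placement policy, the `threshold` parameter and the use of a process count as the load metric are all assumptions made for illustration. The `load` field stands in for the input that the Resource Discovery Manager would provide.

```python
from dataclasses import dataclass


@dataclass
class Computer:
    name: str
    load: int = 0  # hypothetical load metric: number of runnable processes


def static_allocation(computers, n_processes):
    """Static allocation: place each new process on the least-loaded computer."""
    placement = []
    for _ in range(n_processes):
        target = min(computers, key=lambda c: c.load)
        target.load += 1
        placement.append(target.name)
    return placement


def balance_step(computers, threshold=2):
    """Dynamic load balancing: choose one migration if the load gap is large.

    Returns (source, destination) or None when no migration is worthwhile.
    """
    busiest = max(computers, key=lambda c: c.load)
    idlest = min(computers, key=lambda c: c.load)
    if busiest.load - idlest.load < threshold:
        return None
    busiest.load -= 1  # one process leaves the busiest computer...
    idlest.load += 1   # ...and is migrated to the idlest one
    return busiest.name, idlest.name


cluster = [Computer("c1", 2), Computer("c2", 0), Computer("c3", 1)]
print(static_allocation(cluster, 3))  # initial mapping of three new processes
cluster[0].load = 5                   # simulate a load fluctuation on c1
print(balance_step(cluster))          # migration chosen in response
```

Both functions live in one place here, which mirrors the centralized server mentioned above. A distributed scheduler would need to exchange load information between computers first.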
38
GENESIS Programming Interface
Designed and developed to provide both communication paradigms:
– Message passing
– Shared memory
[Diagram: layered GENESIS architecture. Labels: Programming Environment, Message Passing/PVM, Shared Memory, Communication Services, DSM Services, Parallelism Management System, Enhanced Subset of a Distributed Operating System (Microkernel, Communication/File Management)]
39
Message Passing
Basic Message Passing
– Exploits basic interprocess communication concepts
– Transparent and reliable local and remote IPC
– Integral component of GENESIS
– Offers standard message passing and RPC primitives
GENESIS PVM
– PVM added to provide a well-known parallelism programming tool
– Ported from the UNIX-based PVM
– Implemented within a ‘library’ in GENESIS
– Mapping of the standard PVM services onto the GENESIS services
– Performance improvement of PVM on GENESIS
  No additional “classic” PVM server processes required
  Direct interprocess communication model instead of the default model
  Load balancing provided
40
Architecture of PVM on Unix
[Diagram: Computer 1 and Computer 2, each running a PVM Server and a user task (User Task 1, User Task 2) linked with libpvm on top of the Kernel. Link labels: TCP Connections, UDP Datagrams]
41
Architecture of PVM on GENESIS
[Diagram: Computer 1 and Computer 2, each running User PVM Parallel Processes linked with libpvm, an Execution Manager, Migration Manager, IPC Manager and Network Manager on top of a Microkernel. Other labels: Global Scheduler, PVM Comms, Network]
42
Distributed Shared Memory
DSM is an integral component of the operating system
Since DSM is a memory management function the DSM system is integrated into the Space Manager
– Shared memory used as though it were physically shared
– Easy to use shared memory
– Low overhead, improved performance
Two consistency models supported:
– Sequential – implemented using invalidation model
– Release – implemented using write-update model
Synchronization and coordination of processes
– Semaphores – owned by Space Manager on particular computer
– Gaining ownership is distributed and mutually exclusive
– Barriers used for coordination – their management is centralized
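The difference between the invalidation and write-update models named above can be shown with a toy model of one replicated page. This is only a sketch under stated assumptions: the `ReplicatedPage` class, its message counting and the single-owner refetch are hypothetical and do not describe the GENESIS Space Manager. The sketch also ignores when updates are propagated, which under release consistency can be deferred until a lock is released.

```python
class ReplicatedPage:
    """Toy model: one shared page with a copy on every computer."""

    def __init__(self, nodes, value=0):
        self.copies = {n: value for n in nodes}  # node -> local copy
        self.valid = {n: True for n in nodes}    # node -> is the copy usable?
        self.owner = nodes[0]                    # node holding the latest value
        self.messages = 0                        # protocol messages sent so far

    def write_invalidate(self, writer, value):
        """Invalidation model: the writer invalidates all other valid copies."""
        for node in self.copies:
            if node != writer and self.valid[node]:
                self.valid[node] = False
                self.messages += 1               # one invalidation message
        self.copies[writer] = value
        self.valid[writer] = True
        self.owner = writer

    def write_update(self, writer, value):
        """Write-update model: the writer pushes the new value to all copies."""
        for node in self.copies:
            if node != writer:
                self.messages += 1               # one update message
            self.copies[node] = value
            self.valid[node] = True
        self.owner = writer

    def read(self, reader):
        """Reads are local unless the copy was invalidated (request + reply)."""
        if not self.valid[reader]:
            self.copies[reader] = self.copies[self.owner]
            self.valid[reader] = True
            self.messages += 2
        return self.copies[reader]


page = ReplicatedPage(["a", "b", "c"])
page.write_invalidate("a", 7)
print(page.read("b"), page.messages)  # the read misses and refetches the page
```

In this model invalidation pays on the next read of a stale copy, while write-update pays on every write whether or not the other computers read the value again.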
43
Distributed Shared Memory
[Diagram: Computer 1 and Computer 2, each running User DSM Parallel Processes, an IPC Manager, a Process Manager and a Space Manager containing DSM, on top of a Microkernel. Other labels: Shared Memory, Network]
44
GENESIS Primitives: Execution
Two groups of primitives
– to support execution services
– for the provision of communication and coordination services

MP              PVM           DSM
proc-ncreate()  pvm_spawn()   proc-ncreate()
proc-exit()     pvm_exit()    proc-exit()
45
GENESIS Primitives: Communication and Coordination

MP         PVM             DSM
send()     pvm_send()      read access
recv()     pvm_recv()      write access
-          pvm_pkbuf()     wait()
-          pvm_unpkbuf()   signal()
barrier()  pvm_barrier()   barrier()
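The two programming styles in the table can be contrasted with a small analogy using Python threads. None of this is GENESIS or PVM code. `queue.Queue` stands in for send()/recv(), `threading.Semaphore` for wait()/signal() and `threading.Barrier` for barrier(). The function names are hypothetical.

```python
import queue
import threading


def sum_by_message_passing(chunks):
    """MP style: each worker sends its partial sum; the parent receives them."""
    mailbox = queue.Queue()                      # stands in for an IPC channel

    def worker(chunk):
        mailbox.put(sum(chunk))                  # send()

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    total = sum(mailbox.get() for _ in chunks)   # recv(), once per worker
    for t in threads:
        t.join()
    return total


def sum_by_shared_memory(chunks):
    """DSM style: workers write a shared variable under a semaphore."""
    shared = {"total": 0}
    mutex = threading.Semaphore(1)
    done = threading.Barrier(len(chunks) + 1)    # all workers plus the parent

    def worker(chunk):
        partial = sum(chunk)
        mutex.acquire()                          # wait()
        shared["total"] += partial               # write access
        mutex.release()                          # signal()
        done.wait()                              # barrier()

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    done.wait()                                  # parent joins the barrier
    for t in threads:
        t.join()
    return shared["total"]                       # read access


data = [[1, 2, 3], [4, 5], [6]]
print(sum_by_message_passing(data), sum_by_shared_memory(data))
```

In the message passing version the data movement is explicit in the program. In the shared memory version only the synchronization is explicit and the data access looks like an ordinary variable access.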
46
Easy to Use and Program Environment
GENESIS system
Provides an efficient and transparent environment for execution of parallel applications
Offers transparency
Relieves programmers from activities such as:
– Selection of computers for a virtual machine for the given application
– Setting up a virtual machine
– Mapping processes to virtual machine
– Process instantiation using process creation and duplication supported by process migration
– Load balancing
47
Easy to Use and Program Environment
In the GENESIS system
Location of the remote computer(s) of the cluster is selected automatically by Global Scheduler
Users do not know process location
Programming of parallel applications has been made easy by providing
– Message passing: standard and PVM
– Distributed Shared Memory
– Powerful primitives: implement sequences of operations and provide transparency
  process_ncreate(GROUP_CREATE, n, “child_prog”)
– Process instantiation using process creation and duplication supported by process migration
– Load balancing
48
Performance of Standard Parallel Applications
GENESIS System
– 13 Sun3/50 Workstations
  12 Computation + 1 File Server
– 10 Mbit/sec shared Ethernet
Influence of process instantiation on execution performance
GENESIS PVM vs. Unix PVM
Standard parallel applications
– Successive Over Relaxation
– Quicksort
– Traveling Salesman Problem
49
Influence of Process Instantiation on Execution Performance
Parallel Simulation (5, 25, 50 Second Workload)
Simulation: amount of work relates to the overall execution time
Two parameters:
– Work load (5, 25, 50 Seconds)
– Number of workstations (1..12)
Global scheduler & migration
Speedups for #comp = #proc
[Figure: three charts (one for each workload) of Speed-up vs. Number of Workstations; series: Ideal, Group, Multi, Single]
50
GENESIS PVM vs. Unix PVM: IPC Latency
Support for IPC provided by the PVM server in Unix was substituted with GENESIS operating system mechanisms
To measure the time saved by removing the server, a simple PVM application that exchanges messages (1 Kbyte – 100 Kbytes) was used
Round-trip time (including data packing and unpacking) was measured
[Chart: Round Trip Time (ms) vs. Message Size (KBytes); series: Genesis PVM, Unix PVM (Default Route, No Encoding)]
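The shape of such a round-trip measurement can be sketched as follows. This is not the PVM application used in the study. It is a hypothetical stand-in that echoes length-prefixed messages over a local `socket.socketpair()` between two threads on one machine, and times packing, sending, receiving and unpacking together. The function names and the choice of the best of `repeats` runs are assumptions.

```python
import socket
import struct
import threading
import time


def recv_exact(sock, n):
    """Read exactly n bytes (a single recv() may return fewer)."""
    buf = bytearray()
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError("peer closed the connection")
        buf.extend(part)
    return bytes(buf)


def echo_server(sock, rounds):
    """Echo each length-prefixed message back to the sender."""
    for _ in range(rounds):
        (size,) = struct.unpack("!I", recv_exact(sock, 4))
        payload = recv_exact(sock, size)
        sock.sendall(struct.pack("!I", size) + payload)


def round_trip_ms(size_bytes, repeats=5):
    """Best round-trip time in ms, including packing and unpacking."""
    client, server_end = socket.socketpair()
    server = threading.Thread(target=echo_server, args=(server_end, repeats))
    server.start()
    values = list(range(size_bytes // 4))        # payload of 4-byte integers
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        payload = struct.pack(f"!{len(values)}i", *values)             # pack
        client.sendall(struct.pack("!I", len(payload)) + payload)
        (size,) = struct.unpack("!I", recv_exact(client, 4))
        reply = struct.unpack(f"!{size // 4}i", recv_exact(client, size))  # unpack
        best = min(best, (time.perf_counter() - start) * 1000.0)
        if list(reply) != values:
            raise ValueError("echoed data does not match what was sent")
    server.join()
    client.close()
    server_end.close()
    return best


for kbytes in (1, 10, 100):
    print(kbytes, "KBytes:", round(round_trip_ms(kbytes * 1024), 3), "ms")
```

The absolute numbers from this sketch say nothing about GENESIS or Unix PVM. It only shows which operations fall inside the timed interval.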
51
GENESIS PVM vs. Unix PVM: Speedup
The application used to study the influence of process instantiation (amount of work relates to the overall execution time) was studied
Parameters:
– Number of workstations
– GENESIS with and without load balancing
[Chart: Maximum Speedup Achieved vs. Number of Workstations; series: Optimal, Genesis PVM with Load Balancing, Genesis PVM without Load Balancing, Unix PVM]
52
Successive Over Relaxation
Parallel applications developed based on algorithms of Rice University
Rice superior cluster hardware: DECstation 5000/240 + fast ATM net
For 8 computers – array size: Rice – 512 x 2048 elements with 101 iterations; GENESIS – 128 x 128 elements with 10 iterations
– DSM: TreadMarks – 6.3; GENESIS – 4.4
– PVM: Rice – 6.91; GENESIS – 5.1
[Chart: Speed-up vs. Number of Workstations; series: Ideal, MP, PVM, DSM]
53
Quicksort
Parallel applications developed based on algorithms of Rice University
Rice superior cluster hardware: DECstation 5000/240 + fast ATM net
For 8 computers – array size: Rice – 256 x 1024 integers; GENESIS – 256 x 256 integers
– DSM: TreadMarks – 5.3; GENESIS – 2.5
– PVM: Rice – 6.79; GENESIS – 6.07
[Chart: Speed-up vs. Number of Workstations; series: Ideal, MP, PVM, DSM]
54
Traveling Salesman Problem
Parallel applications developed based on algorithms of Rice University
Rice superior cluster hardware: DECstation 5000/240 + fast ATM net
For 8 computers – 18 city tour; with the minimum threshold set to 13 cities
– DSM: TreadMarks – 4.74; GENESIS – 6.33
– PVM: Rice – 5.63; GENESIS – 5.94
[Chart: Speed-up vs. Number of Workstations; series: Ideal, MP, PVM, DSM]
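The 8-computer figures quoted on the three application slides can be turned into parallel efficiency. The sketch below assumes those figures are speedups (this is inferred from the Speed-up charts, not stated on the slides) and uses the usual definition efficiency = speedup / number of computers. The dictionary layout and function names are illustrative only.

```python
COMPUTERS = 8

# Values copied from the three application slides (8 computers each).
reported = {
    "Successive Over Relaxation": {
        "DSM": {"TreadMarks": 6.3, "GENESIS": 4.4},
        "PVM": {"Rice": 6.91, "GENESIS": 5.1},
    },
    "Quicksort": {
        "DSM": {"TreadMarks": 5.3, "GENESIS": 2.5},
        "PVM": {"Rice": 6.79, "GENESIS": 6.07},
    },
    "Traveling Salesman Problem": {
        "DSM": {"TreadMarks": 4.74, "GENESIS": 6.33},
        "PVM": {"Rice": 5.63, "GENESIS": 5.94},
    },
}


def efficiency(speedup, computers=COMPUTERS):
    """Fraction of ideal (linear) speedup that was achieved."""
    return speedup / computers


def efficiency_rows(results):
    """Flatten the nested results into (application, paradigm, system, efficiency)."""
    rows = []
    for application, paradigms in results.items():
        for paradigm, systems in paradigms.items():
            for system, speedup in systems.items():
                rows.append((application, paradigm, system, efficiency(speedup)))
    return rows


for application, paradigm, system, eff in efficiency_rows(reported):
    print(f"{application:28s} {paradigm:4s} {system:10s} {eff:.2f}")
```

Because the Rice and GENESIS runs used different hardware, and for Successive Over Relaxation and Quicksort also different problem sizes, these efficiencies indicate scaling on each system rather than a like-for-like ranking.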
55
Summary
Nondedicated clusters are commonly available
– Force application developers to program operating system operations
– Do not offer transparency
Application developers need a computer system that
– Processes applications efficiently
– Uses cluster resources well
– Allows the cluster to be seen as a single powerful computer rather than as a set of connected computers
Proposal: employ a cluster operating system
Design: cluster operating system with three logical levels
– Distributed operating system
– Parallelism management and transparency system
– Programming environment
56
Summary
GENESIS – designed and developed as a “proof of concept”
GENESIS is a system that satisfies user requirements
GENESIS approach is unique
– Offers both message passing (MP and PVM) and DSM environment
– Services providing parallelism management are integral components of an operating system
– Provides a comprehensive environment to transparently manage system resources
Programmers do not have to be involved in parallelism management
Use of the cluster has been made easy
Complete transparency is offered
Good performance results have been achieved
57
Future Work
Port GENESIS to an Intel-like platform
Use virtual memory to support DSM
Offer reliable parallel computing services on clusters by employing
– Reliable group communication
– Checkpointing to offer fault tolerance