Distributed Shared Memory Systems and Programming
By: Kenzie MacNeil
Adapted from Parallel Programming Techniques and Applications using networked workstations and parallel computers by Barry Wilkinson and Michael Allen, and
Distributed Shared Memory Systems
• Shared memory programming model on a cluster
• Has physically distributed and separate memory
• Programming viewpoint:
  – Memory is grouped together and sharable between processes
• Known as Distributed Shared Memory (DSM)
Distributed Shared Memory Systems
• Can be achieved by software or hardware
• Software:
  – Easy to use on clusters
  – Inferior in performance to using explicit message passing on the same cluster
• Utilizes the same techniques as true shared memory systems (Chapter 8)
Distributed Shared Memory
• Shared memory programming is generally more convenient than message passing
• Data can be accessed by individual processors without explicitly sending data
• Shared data has to be controlled
  – Locks or other means
• Both message passing and shared memory often require synchronization
Distributed Shared Memory
• Distributed Shared Memory is a group of interconnected computers appearing to have a single memory with a single address space
• Each computer has its own memory, so the memory is physically distributed
• Any memory location can be accessed by any processor in the cluster
  – Regardless of whether the memory resides locally
Distributed Shared Memory
Advantages of DSM
• Normal shared memory programming techniques can be used
• Easily scalable, compared to traditional bus-connected shared memory multiprocessors
• Message passing is hidden from the user
• Can handle complex and large databases without replication or sending the data to processes
Disadvantages of DSM
• Lower performance than true shared memory multiprocessor systems
• Must provide for protection against simultaneous access to shared data
  – Locks, etc.
• Little programmer control over the actual messages being generated
• Incurs performance penalties when compared to message passing routines on a cluster
Hardware DSM Systems
• Special network interfaces and cache coherence circuits are required
• Several interfaces exist that support shared memory operations
• Higher level of performance
• More expensive
Software DSM Systems
• Requires no hardware changes
• Performed by software routines
• Software layer added between the operating system and the applications
  – Kernel may or may not be modified
• Software layer can be
  – Page based
  – Shared variable based
  – Object based
Page Based DSM
• Existing virtual memory mechanism is used to instigate movement of data between computers
• Movement occurs when the page referenced does not reside locally
• Referred to as a virtual shared memory system
• Page based systems include:
  – The first DSM system by Li (1986), TreadMarks (1996), Locust (1998)
Page Based DSM System
Page Based DSM Disadvantages
• Size of the unit of data, a page, can be too big
• More than the specific data referenced is usually transferred
  – Leads to longer messages
• Not portable, because page based systems are tied to particular virtual memory hardware and software
• False sharing effects appear at the page level
  – Situation in which different parts of a page are required by different processors without any actual sharing of information, but the whole page must be shared by each process to access the different parts
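The false sharing effect can be illustrated with a little address arithmetic. The sketch below assumes a 4096-byte page and made-up byte addresses for three variables; `PAGE_SIZE`, `page_of`, and `falsely_shared` are hypothetical names, not part of any DSM system mentioned here.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; real systems vary


def page_of(address: int) -> int:
    """Return the number of the page that contains a byte address."""
    return address // PAGE_SIZE


def falsely_shared(addr_a: int, addr_b: int) -> bool:
    """True if two distinct variables live on the same page.

    If different processors each use only one of the variables, the page
    still has to move between them even though no data is really shared.
    """
    return addr_a != addr_b and page_of(addr_a) == page_of(addr_b)


# Hypothetical layout: x and y are 8 bytes apart, z starts the next page.
x_addr, y_addr, z_addr = 4096, 4104, 8192
print(falsely_shared(x_addr, y_addr))  # x and y sit on the same page
print(falsely_shared(x_addr, z_addr))  # x and z sit on different pages
```

A smaller unit of sharing, such as a single variable, avoids the effect at the cost of more bookkeeping.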
Shared Variable DSM
• Only variables declared as shared are transferred
• Transferred on demand
  – Paging mechanism is not used
• Software routines perform the actions
• Shared variable DSM approach includes:
  – Munin (1990), JIAJIA (1999), Adsmith (1996)
Object Based DSM
• Shared data is embodied in objects
  – Includes data items and procedures/methods
  – Methods are used to access the data
• Similar to the shared variable approach, and can be considered an extension of it
• Easily implemented in OO languages
Managing Shared Data
• There are many ways a processor can be given access to shared data
• Simplest is the use of a central server
  – Responsible for all read and write operations on shared data
  – Requests are sent to this server
  – Operations occur sequentially on the server
  – Implements a single reader/single writer policy
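The central server idea can be sketched with a thread standing in for the server process and a queue standing in for the network. This is only an illustration under those assumptions; `CentralServer` and its methods are hypothetical names.

```python
import queue
import threading


class CentralServer:
    """Holds all shared variables and serves requests one at a time."""

    def __init__(self):
        self._data = {}
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        # Requests are taken off the queue and handled strictly in order,
        # so there is only ever one reader or one writer at a time.
        while True:
            op, name, value, reply = self._requests.get()
            if op == "write":
                self._data[name] = value
                reply.put(None)
            elif op == "read":
                reply.put(self._data.get(name))
            else:  # "stop"
                reply.put(None)
                return

    def _call(self, op, name=None, value=None):
        # Send one request and block until the server answers it.
        reply = queue.Queue(maxsize=1)
        self._requests.put((op, name, value, reply))
        return reply.get()

    def write(self, name, value):
        self._call("write", name, value)

    def read(self, name):
        return self._call("read", name)

    def stop(self):
        self._call("stop")
        self._thread.join()


server = CentralServer()
server.write("x", 10)
server.write("x", server.read("x") + 1)
print(server.read("x"))
server.stop()
```

Every access, including a read, costs a round trip to the one server, which is where the bottleneck comes from.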
Managing Shared Data
• Single reader/writer policy incurs a bottleneck
• Additional servers can be added to relieve this bottleneck by sharing the variables among them
• However, multiple copies of data are preferable
  – Allows simultaneous access to the data by different processors
  – A coherence policy must be used to maintain these copies
Multiple Reader / Single Writer
• Allows multiple processors to read shared data
  – Which can be achieved by replicating data
• Allows only one processor, the owner, to alter data at any instant
• When an owner alters data, two policies are available:
  – Update policy
  – Invalidate policy
Multiple Reader/Single Writer Policy
• Update policy
  – Utilizes broadcast
  – All copies are altered to reflect the broadcast message
• Invalidate policy
  – All unaltered copies of the data are flagged as invalid
  – A processor that needs the data must then request it from the owner
  – Any copies of the data that are not accessed remain invalid
• Both policies need to be reliable
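The difference between the two policies can be made concrete with a toy model that counts messages. It assumes one message per copy in place of a real broadcast and one request plus one reply for a fetch from the owner; the class and attribute names are hypothetical.

```python
class SharedVariable:
    """One owner copy plus replicated reader copies (multiple reader/single writer)."""

    def __init__(self, value, readers, policy):
        assert policy in ("update", "invalidate")
        self.policy = policy
        self.owner_value = value
        # Each reader holds [value, valid] for its local copy.
        self.copies = {r: [value, True] for r in readers}
        self.messages = 0  # messages exchanged with the owner

    def write(self, value):
        """Only the owner writes; the policy decides what the readers are told."""
        self.owner_value = value
        for copy in self.copies.values():
            if self.policy == "update":
                copy[0] = value          # new value is pushed to the copy
                self.messages += 1
            elif copy[1]:
                copy[1] = False          # copy is only flagged as invalid
                self.messages += 1

    def read(self, reader):
        copy = self.copies[reader]
        if not copy[1]:
            # Invalid copy: fetch the current value from the owner.
            self.messages += 2           # request + reply
            copy[0], copy[1] = self.owner_value, True
        return copy[0]


for policy in ("update", "invalidate"):
    v = SharedVariable(0, ["p1", "p2", "p3"], policy)
    v.write(1)
    v.write(2)
    print(policy, v.read("p1"), v.messages)
```

In this model the invalidate policy sends nothing for the second write because every copy is already invalid, and only the reader that actually accesses the data pays for a fetch.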
Multiple Reader/Single Writer Policy
• Page based approach
• Complete page, which holds the variable, is transferred
• A variable stored on the page which is not shared will be moved or invalidated along with it
• Systems like TreadMarks offer protocols for dual writing to a single page
Achieving Consistent Memory in DSM
• Memory consistency addresses when the current value of a shared variable is seen by other processors
• Various models are available:
  – Strict Consistency
  – Sequential Consistency
  – Relaxed Consistency
  – Weak Consistency
  – Release Consistency
  – Lazy Release Consistency
Strict Consistency
• A read of a shared variable obtains the value from the most recent write to that variable
• As soon as a variable is altered, all other processors are informed
  – Can be done by update or invalidate
• Disadvantages are the large number of messages and the fact that changes are not instantaneous
• In relaxed memory consistency, writes are delayed to reduce message passing
Strict Consistency
Sequential and Weak Consistency
• Sequential consistency: the result of any execution is the same as some interleaving of the individual programs
• Weak consistency: synchronization operations are used by the programmer to enforce sequential consistency
• Any accesses to shared data can be controlled with synchronization operations
  – Locks, etc.
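What "the same as some interleaving of the individual programs" permits can be checked by brute force on a tiny case. The sketch below uses two made-up two-instruction programs and a single shared memory; the instruction encoding and the names `interleavings`, `run`, `P1`, and `P2` are assumptions made for illustration.

```python
from itertools import combinations


def interleavings(prog_a, prog_b):
    """Yield every merge of two programs that keeps each program's own order."""
    n, m = len(prog_a), len(prog_b)
    for slots in combinations(range(n + m), n):
        a, b = iter(prog_a), iter(prog_b)
        chosen = set(slots)  # positions in the merged sequence taken by prog_a
        yield [next(a) if i in chosen else next(b) for i in range(n + m)]


def run(schedule):
    """Execute one interleaving against a single shared memory."""
    mem = {"x": 0, "y": 0}
    seen = {}
    for op, var, arg in schedule:
        if op == "write":
            mem[var] = arg
        else:  # "read": arg names the register that receives the value
            seen[arg] = mem[var]
    return seen["r1"], seen["r2"]


# Two hypothetical programs: each writes one variable, then reads the other.
P1 = [("write", "x", 1), ("read", "y", "r1")]
P2 = [("write", "y", 1), ("read", "x", "r2")]

outcomes = {run(s) for s in interleavings(P1, P2)}
print(sorted(outcomes))
```

Under sequential consistency the pair (r1, r2) = (0, 0) cannot occur for these two programs, because whichever read runs last always follows the other program's write.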
Release Consistency
• Extension of weak consistency
• Specified synchronization operations
  – Acquire operation, used before a shared variable or variables are to be read
  – Release operation, used after the shared variable or variables have been altered
• Acquire is performed with a lock operation
• Release is performed with an unlock operation
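A rough way to picture the acquire/release pairing is a store whose writes stay private until the release. This is a simplified single-machine model, not how any particular DSM system implements it, and `ReleaseConsistentStore` and its methods are hypothetical names.

```python
import threading


class ReleaseConsistentStore:
    """Writes made between acquire and release become visible only at release."""

    def __init__(self):
        self._lock = threading.Lock()
        self._visible = {}   # what other processes can see
        self._pending = {}   # writes made since the last acquire

    def acquire(self):
        self._lock.acquire()             # acquire is performed with a lock

    def write(self, name, value):
        self._pending[name] = value      # buffered, not yet visible to others

    def read(self, name):
        # The lock holder sees its own pending writes first.
        return self._pending.get(name, self._visible.get(name))

    def visible(self, name):
        """What a process outside the critical section would see."""
        return self._visible.get(name)

    def release(self):
        self._visible.update(self._pending)  # changes are published here
        self._pending.clear()
        self._lock.release()             # release is performed with an unlock


store = ReleaseConsistentStore()
store.acquire()
store.write("x", 1)
print(store.visible("x"))   # the write has not been released yet
store.release()
print(store.visible("x"))   # the write is visible after the release
```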
Release Consistency
Lazy Release Consistency
• Version of release consistency
• Update is only done at the time of acquire rather than at release
• Generates fewer messages than release consistency
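The saving in messages can be seen in a toy model where four nodes share data but only two of them ever take the lock. It assumes one message per pushed or pulled update; `Node`, `LockManager`, and `run_lock_demo` are hypothetical names.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.copy = {}  # this node's local copy of the shared data


class LockManager:
    """Tracks the latest released writes for one lock; `lazy` picks the policy."""

    def __init__(self, nodes, lazy):
        self.nodes = nodes
        self.lazy = lazy
        self.released = {}  # most recently released values
        self.messages = 0

    def acquire(self, node):
        stale = any(node.copy.get(k) != v for k, v in self.released.items())
        if self.lazy and stale:
            node.copy.update(self.released)    # pull updates at acquire time
            self.messages += 1

    def release(self, node, writes):
        node.copy.update(writes)
        self.released.update(writes)
        if not self.lazy:
            for other in self.nodes:
                if other is not node:
                    other.copy.update(writes)  # push updates at release time
                    self.messages += 1


def run_lock_demo(lazy):
    nodes = [Node(f"p{i}") for i in range(4)]
    mgr = LockManager(nodes, lazy)
    for step in range(6):
        node = nodes[step % 2]  # only p0 and p1 ever use the lock
        mgr.acquire(node)
        mgr.release(node, {"x": step})
    return mgr.messages, nodes[1].copy["x"]


print("release consistency:", run_lock_demo(lazy=False))
print("lazy release consistency:", run_lock_demo(lazy=True))
```

In the lazy variant the two nodes that never acquire the lock receive no updates at all, which is where the reduction comes from in this model.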
Lazy Release Consistency
Distributed Shared Memory Programming Primitives
• Four fundamental and necessary operations of shared memory programming:
  – Process/thread creation and termination
  – Shared data creation
  – Mutual exclusion synchronization, controlled access to shared data
  – Process/thread and event synchronization
• Typically provided by user-level library calls
Process Creation
• A set of routines is defined by DSM systems
  – Such as Adsmith and TreadMarks
• Used to start a new process if process creation is supported
  – dsm_spawn(filename, num_processes);
Shared Data Creation
• A routine is necessary to declare shared data
  – dsm_shared(&x); or shared int x;
  – Dynamically creates memory space for shared data in the manner of a C malloc
• Afterwards, the memory space can be discarded
Shared Data Access
• Various forms of data access are provided depending on the memory consistency used
• Some systems provide efficient routines for different classes of accesses
• Adsmith provides three types of accesses:
  – Ordinary Access
  – Synchronization Access
  – Non-Synchronization Access
Synchronization Accesses
• Two principal forms:
  – Global synchronization and process-process pair synchronization
• Global is usually done through barrier routines
• Process-process pair can be done by the same routine or by separate routines through simple synchronous send/receive routines
• DSM systems could also provide their own routines
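Global synchronization with a barrier can be shown with threads standing in for DSM processes and Python's standard `threading.Barrier` standing in for a DSM barrier routine. The two-phase workload and the names `partial`, `totals`, and `worker` are made up for the example.

```python
import threading

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
partial = [0] * NUM_WORKERS   # shared data written in phase 1
totals = [0] * NUM_WORKERS    # shared data written in phase 2


def worker(rank):
    partial[rank] = rank + 1        # phase 1: each worker writes its own slot
    barrier.wait()                  # nobody continues until all have arrived
    totals[rank] = sum(partial)     # phase 2: safe to read every slot


threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(totals)
```

Without the barrier, a fast worker could sum `partial` before the others had written their slots.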
Overlapping Computations with Communications
• Can be provided by starting a nonblocking communication before its results are needed
  – Called a prefetch routine
• Program continues execution after the prefetch has been called and while the data is being fetched
• Could even be done speculatively
• A special mechanism must be in place to handle memory exceptions
• Similar to the speculative load mechanism used in advanced processors that overlap memory operations with program execution
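The prefetch pattern can be sketched with Python's standard `concurrent.futures`, where `submit` starts the fetch and returns at once. The remote memory is faked with a dictionary and a short sleep; `REMOTE_MEMORY`, `remote_fetch`, and `compute_with_prefetch` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor

REMOTE_MEMORY = {"a": 3, "b": 4}  # stand-in for data held on another computer


def remote_fetch(name):
    time.sleep(0.05)  # stand-in for network latency
    return REMOTE_MEMORY[name]


def compute_with_prefetch(executor):
    pending = executor.submit(remote_fetch, "a")  # prefetch: returns immediately
    local = sum(i * i for i in range(1000))       # useful work overlaps the fetch
    return local + pending.result()               # block only when the value is needed


with ThreadPoolExecutor(max_workers=1) as pool:
    result = compute_with_prefetch(pool)
print(result)
```

A speculative version would issue the `submit` before knowing whether the value is needed and simply discard the result if it is not.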
Distributed Shared Memory Programming
• DSM programming on a cluster uses the same concepts as shared memory programming on a shared memory multiprocessor system
• Uses user level library routines or methods
• Message passing is hidden from the user
Basic Shared-Variable Implementation
• Simplest DSM implementation is to use a shared variable approach with user level DSM library routines
  – Sitting on top of an existing message passing system, such as MPI
  – Routines can be embodied into classes and methods
• The routines could send messages to a central location that is responsible for the shared variables
Simple DSM System using a Centralized Server
Single reader/writer protocol
Basic Shared-Variable Implementation
• A simple DSM system using a centralized server can easily result in a bottleneck
• One method to reduce this bottleneck is to have multiple servers running on different processors
• Each server is responsible for specific shared variables
• This is a single reader / single writer protocol
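One way to make each server responsible for specific shared variables is to derive the server from the variable name. The sketch below assumes a CRC-based mapping, which is only one possible choice, and uses plain objects in place of server processes; `VariableServer`, `MultiServerDSM`, and `home` are hypothetical names.

```python
import zlib


class VariableServer:
    def __init__(self, server_id):
        self.server_id = server_id
        self.data = {}
        self.requests = 0  # how many operations this server handled

    def write(self, name, value):
        self.requests += 1
        self.data[name] = value

    def read(self, name):
        self.requests += 1
        return self.data.get(name)


class MultiServerDSM:
    def __init__(self, num_servers):
        self.servers = [VariableServer(i) for i in range(num_servers)]

    def home(self, name):
        # A stable hash, so every process agrees on the responsible server.
        return self.servers[zlib.crc32(name.encode()) % len(self.servers)]

    def write(self, name, value):
        self.home(name).write(name, value)

    def read(self, name):
        return self.home(name).read(name)


dsm = MultiServerDSM(num_servers=3)
for i in range(12):
    dsm.write(f"var{i}", i)
print([s.requests for s in dsm.servers])  # load is split across the servers
```

Each variable still has exactly one reader or writer at a time on its home server, so the protocol remains single reader/single writer.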
Simple DSM System using Multiple Servers
Basic Shared-Variable Implementation
• Can also provide multiple reader capability
• A specific server is responsible for the shared variable
• Other local copies are invalidated
Simple DSM System using Multiple Servers and Multiple Reader Policy
Overlapping Data Groups
• Existing interconnection structure
• Access patterns of the application
• Static overlapping
  – Defined by the programmer prior to execution
• Shared variables can migrate according to usage
Symmetrical Multiprocessor System with Overlapping Data Regions
Questions or Comments?