TECHNIQUES FOR REDUCING CONSISTENCY-RELATED COMMUNICATION IN DISTRIBUTED
SHARED-MEMORY SYSTEMS
J. B. Carter, University of Utah
J. K. Bennett and W. Zwaenepoel, Rice University
INTRODUCTION
• Distributed shared memory is a software abstraction allowing a set of workstations connected by a LAN to share a single paged virtual address space
• Key issue in building a software DSM is minimizing the amount of data communication among the workstation memories
Why bother with DSM?
• Key idea is to build fast parallel computers that
– are cheaper than conventional architectures
– are convenient to use
• Conventional parallel computer architecture was the shared memory multiprocessor
[Figure: conventional parallel architecture, with several CPUs, each with its own cache, sharing one main memory]
Today’s architecture
• Clusters of workstations are much more cost-effective
– No need to develop complex bus and cache structures
– Can use off-the-shelf networking hardware
  • Gigabit Ethernet
  • Myrinet (1.5 Gb/s)
– Can quickly integrate the newest microprocessors
Limitations of cluster approach
• Communication within a cluster of workstations is through message passing
– Much harder to program than concurrent access to a shared memory
• Many big programs were written for shared memory architectures
– Converting them to a message passing architecture is a nightmare
Distributed shared memory
[Figure: DSM combines the main memories of the workstations into one shared global address space]
Distributed shared memory
• DSM makes a cluster of workstations look like a shared memory parallel computer
– Easier to write new programs
– Easier to port existing programs
• Key problem is that DSM only provides the illusion of having a shared memory architecture
– Data must still move back and forth among the workstations
Characterizing a DSM (I)
• Four important issues:
1. Size of transfer units (level of granularity)
• Big units are more efficient
– Virtual memory pages
• Can have false sharing whenever a page contains different variables that are accessed at the same time by different processors
False Sharing
[Figure: one processor accesses x while another accesses y; x and y lie on the same page]
The page containing x and y will move back and forth between the main memories of the workstations
Characterizing a DSM (II)
2. Consistency model
• Strict consistency is not possible
• Various authors have proposed weak consistency models
– Cheaper to implement
– Harder to use in a correct fashion
Characterizing a DSM (III)
3. Portability of programs
• Some DSMs allow programs written for a multiprocessor architecture to run on a cluster of workstations without any modifications (dusty decks)
• More efficient DSMs require more changes
4. Portability of DSM
• Some DSMs require specific OS features
MUNIN
• Developed at Rice University
• Based on software objects (variables)
• Uses the processor's virtual memory to detect accesses to the shared objects
• Includes several techniques for reducing consistency-related communication
• Only runs on top of the V kernel
Key features
• Software release consistency: only requires the memory to be consistent at specific synchronization points
• Multiple consistency protocols: allow the user to select the best consistency protocol for each data item
• Write-shared protocols: reduce false sharing
• An update-with-timeout mechanism
SW RELEASE CONSISTENCY (I)
• Well-written parallel programs use locks to achieve mutual exclusion when they access shared variables
– P(&mutex) and V(&mutex)
– lock(&csect) and unlock(&csect)
– request() and release()
• Unprotected accesses can produce unpredictable results
SW RELEASE CONSISTENCY (II)
• SW release consistency will only guarantee correctness of operations within a request/release pair
• No need to propagate new values of shared variables until the release
• Must guarantee that a workstation has received the most recent values of all shared variables when it completes a request
SW RELEASE CONSISTENCY (III)
First workstation:
    shared int x;
    request();
    x = 1;
    release();   // propagate x = 1
Second workstation:
    shared int x;
    request();   // wait for new value of x
    x++;
    release();   // propagate x = 2
SW RELEASE CONSISTENCY (IV)
• Munin uses eager release: new values of shared variables are propagated at release time
– Lazy release delays propagation until a request is issued (TreadMarks)
• A workstation issuing a request gets the current values of all shared variables
– Shared variables are not associated with a particular critical section (as they are in Midway)
Munin Implementation (I)
• Three kinds of variables:
1. Ordinary variables: can only be accessed by the process that created them
2. Shared data variables: should always be accessed from within critical regions
3. Synchronization variables:
• locks, barriers or condition variables
• must be accessed through special library procedures
Munin Implementation (II)
• When a processor modifies shared data inside a critical region, all update messages are buffered and delayed until the processor leaves the critical region
• Processes accessing shared data variables outside critical regions do so at their own risk
– Same as with the shared memory model
– Risk is higher
FOUR CONSISTENCY PROTOCOLS
1. Conventional shared variables:
– Replicated on demand
– Single writer/multiple readers policy uses an invalidation-based protocol
2. Read-only variables:
– Replicated on demand
– Any attempt to modify them will result in a runtime error
3. Migratory variables:
– Migrated among the processes accessing them
– Every process accessing them will always get full read and write access
4. Write-shared variables:
– Can be updated concurrently because different processes access different portions of the page
Implementation
• Programmer uses annotations to specify any of the last three consistency protocols
– Read-only variables
– Migratory variables
– Write-shared variables
• Incorrect annotations may result in poor performance or in runtime errors, but not in incorrect results
WRITE-SHARED PROTOCOL (I)
• Designed to fight false sharing
• Uses a copy-on-write mechanism
• Whenever a process is granted access to write-shared data, the page containing these data is marked copy-on-write
• The first attempt to modify the contents of the page will result in the creation of a copy of the unmodified page (the twin)
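Example
• Before the first write access, the page holds x = 1 and y = 2
• The first write access creates a twin, also holding x = 1 and y = 2
• After the write, the page holds x = 3 and y = 2
• Comparing the page with its twin shows that the new value of x is 3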
WRITE-SHARED PROTOCOL (II)
• At release time, the DSM performs a word-by-word comparison of the page and its twin, stores the diff in the space used by the twin page, and notifies all processors having a copy of the shared data of the update
• A runtime switch can be set to check for conflicting updates to write-shared data.
UPDATE TIME-OUT MECHANISM
• Munin does not send updates to processors holding stale replicas
• Any time a processor receives an update for a page for which it does not have a twin, the page is marked supervisor-only and the time of receipt of the update is recorded.
• First local access to the page will cause a trap that will remove the restriction
• When a process receives an update for a page that is still marked supervisor-only, it checks the timestamp of the last update
• If more than 50 ms have elapsed, the process notifies the originator of the update not to send more updates and invalidates the page
CONCLUSIONS (I)
• The strongest point of Munin is its excellent performance
– typically within 5% to 33% of the performance of hand-coded message passing versions of the same programs
• Its major limitation is its dependence on some features of the V kernel
CONCLUSIONS (II)
• Munin requires programs to access shared data from within critical regions or after barriers
– Appears to be a reasonable requirement
• Munin allows users to tune the performance of their programs by selecting the best consistency protocol for each shared variable
– Can quickly become a tedious process
FURTHER DEVELOPMENTS
• The same team has come up with a successor to Munin named TreadMarks
• Key differences are:
– TreadMarks uses a more complex lazy release protocol
– TreadMarks is UNIX-based
• More portable