TECHNIQUES FOR REDUCING CONSISTENCY-RELATED COMMUNICATION IN DISTRIBUTED SHARED-MEMORY SYSTEMS

J. B. Carter, University of Utah
J. K. Bennett and W. Zwaenepoel, Rice University

INTRODUCTION

• Distributed shared memory is a software abstraction allowing a set of workstations connected by a LAN to share a single paged virtual address space

• Key issue in building a software DSM is minimizing the amount of data communication among the workstation memories

Why bother with DSM?

• Key idea is to build fast parallel computers that
  – are cheaper than conventional architectures
  – are convenient to use

• Conventional parallel computer architecture was the shared memory multiprocessor

Conventional parallel architecture

[Figure: four CPUs, each with its own cache, connected to a single shared memory]

Today’s architecture

• Clusters of workstations are much more cost effective
  – No need to develop complex bus and cache structures
  – Can use off-the-shelf networking hardware
    • Gigabit Ethernet
    • Myrinet (1.5 Gb/s)
  – Can quickly integrate newest microprocessors

Limitations of cluster approach

• Communication within a cluster of workstations is through message passing
  – Much harder to program than concurrent access to a shared memory
• Many big programs were written for shared memory architectures
  – Converting them to a message passing architecture is a nightmare

Distributed shared memory

[Figure: DSM = one shared global address space spanning the main memories of the workstations]

Distributed shared memory

• DSM makes a cluster of workstations look like a shared memory parallel computer
  – Easier to write new programs
  – Easier to port existing programs
• Key problem is that DSM only provides the illusion of having a shared memory architecture
  – Data must still move back and forth among the workstations

Characterizing a DSM (I)

• Four important issues:
  1. Size of transfer units (level of granularity)
    • Big units are more efficient
      – Virtual memory pages
    • Can have false sharing whenever a page contains different variables that are accessed at the same time by different processors

False Sharing

[Figure: one workstation accesses x while another accesses y; x and y reside on the same page]

The page containing x and y will move back and forth between the main memories of the workstations
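The cost of false sharing can be illustrated with a toy model. The sketch below is not Munin code: it assumes a 4096-byte page, treats every access as a write that needs exclusive ownership of the page, and uses hypothetical names (`count_page_transfers`, `addresses`) chosen for this example only.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def count_page_transfers(accesses, addresses):
    """Count how often a page must move from one node to another.

    accesses:  sequence of (node, variable) pairs in program order
    addresses: mapping from variable name to byte address
    Toy model: every access needs exclusive ownership of the page.
    """
    owner = {}      # page number -> node currently holding the page
    transfers = 0
    for node, var in accesses:
        page = addresses[var] // PAGE_SIZE
        if page in owner and owner[page] != node:
            transfers += 1      # page is shipped to the accessing node
        owner[page] = node
    return transfers


# Node A only touches x and node B only touches y, in alternation.
accesses = [("A", "x"), ("B", "y")] * 100

same_page = count_page_transfers(accesses, {"x": 0, "y": 8})
separate_pages = count_page_transfers(accesses, {"x": 0, "y": PAGE_SIZE})
print(same_page, separate_pages)
```

Although the two nodes never touch the same variable, placing x and y on one page forces a transfer on almost every access, while placing them on separate pages forces none.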

Characterizing a DSM (II)

  2. Consistency model
    • Strict consistency is not possible
    • Various authors have proposed weak consistency models
      – Cheaper to implement
      – Harder to use in a correct fashion

Characterizing a DSM (III)

  3. Portability of programs
    • Some DSMs allow programs written for a multiprocessor architecture to run on a cluster of workstations without any modifications (dusty decks)
    • More efficient DSMs require more changes
  4. Portability of DSM
    • Some DSMs require specific OS features

MUNIN

• Developed at Rice University
• Based on software objects (variables)
• Uses the processor virtual memory to detect access to the shared objects
• Includes several techniques for reducing consistency-related communication
• Only runs on top of V kernel

Key features

• Software release consistency: only requires the memory to be consistent at specific synchronization points
• Multiple consistency protocols: allow the user to select the best consistency protocol for each data item
• Write-shared protocols: reduce false sharing
• An update-with-timeout mechanism

SW RELEASE CONSISTENCY (I)

• Well-written parallel programs use locks to achieve mutual exclusion when they access shared variables
  – P(&mutex) and V(&mutex)
  – lock(&csect) and unlock(&csect)
  – request( ) and release( )

• Unprotected accesses can produce unpredictable results

SW RELEASE CONSISTENCY (II)

• SW release consistency will only guarantee correctness of operations within a request/release pair

• No need to propagate new values of shared variables until the release

• Must guarantee that workstation has received the most recent values of all shared variables when it completes a request

SW RELEASE CONSISTENCY (III)

First process:

    shared int x;
    request( );
    x = 1;
    release( );    // propagate x=1

Second process:

    shared int x;
    request( );    // wait for new value of x
    x++;
    release( );    // propagate x=2
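The same two-process exchange can be mimicked in a small single-threaded simulation. This is only a sketch under stated assumptions: the `Node` class and its methods are hypothetical names invented for this example, locking itself is not modeled, and updates are pushed to all peers at release time (the eager policy).

```python
class Node:
    """One workstation with its own copy of the shared variables."""

    def __init__(self, name):
        self.name = name
        self.memory = {}        # local copies of shared variables
        self.pending = {}       # writes buffered since the last release
        self.peers = []         # other nodes holding copies
        self.messages_sent = 0

    def write(self, var, value):
        self.memory[var] = value
        self.pending[var] = value   # buffered: nothing is sent yet

    def read(self, var):
        return self.memory[var]

    def release(self):
        # Eager release: push every buffered write to every peer now.
        if self.pending:
            for peer in self.peers:
                peer.memory.update(self.pending)
                self.messages_sent += 1
            self.pending = {}


a, b = Node("A"), Node("B")
a.peers, b.peers = [b], [a]

a.write("x", 1)                      # x = 1 inside A's critical region
seen_by_b_early = b.memory.get("x")  # B has not received anything yet
a.release()                          # propagate x = 1

b.write("x", b.read("x") + 1)        # x++ inside B's critical region
b.release()                          # propagate x = 2
print(a.read("x"), b.read("x"))
```

However many writes a node makes inside its critical region, it sends one message per peer, and only at release time.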

SW RELEASE CONSISTENCY (IV)

• Munin uses eager release: new values of shared variables are propagated at release time
  – Lazy release delays propagation until a request is issued (TreadMarks)
• A workstation issuing a request gets the current values of all shared variables
  – Shared variables are not associated with a particular critical section (as in Midway)

Munin Implementation (I)

• Three kinds of variables:
  1. Ordinary variables: can only be accessed by the process that created them
  2. Shared data variables: should always be accessed from within critical regions
  3. Synchronization variables:
    • locks, barriers or condition variables
    • must be accessed through special library procedures

Munin Implementation (II)

• When a processor modifies shared data inside a critical region, all update messages are buffered and delayed until the processor leaves the critical region

• Processes accessing shared data variables outside critical regions do so at their own risk
  – Same as with the shared memory model
  – Risk is higher

FOUR CONSISTENCY PROTOCOLS

1. Conventional shared variables:
  – Replicated on demand
  – Single writer/multiple readers policy uses an invalidation-based protocol
2. Read-only variables:
  – Replicated on demand
  – Any attempt to modify them will result in a runtime error

FOUR CONSISTENCY PROTOCOLS

3. Migratory variables:
  – Migrated among the processes accessing them
  – Every process accessing them will always get full read and write access
4. Write-shared variables:
  – Can be updated concurrently because different portions of the page are accessed

Implementation

• Programmer uses annotations to specify any of the last three consistency protocols
  – Read-only variables
  – Migratory variables
  – Write-shared variables

• Incorrect annotations may result in inefficient performance or in runtime errors but not in incorrect results
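One way to picture the effect of an annotation is as a per-variable protocol tag that is checked at run time. The sketch below does not reproduce Munin's annotation syntax; `Protocol` and `SharedVar` are hypothetical names, and only the read-only check is modeled. It shows why a wrong read-only annotation surfaces as a runtime error rather than as a silently wrong result.

```python
from enum import Enum


class Protocol(Enum):
    CONVENTIONAL = "conventional"   # default: single writer, invalidation-based
    READ_ONLY = "read_only"
    MIGRATORY = "migratory"
    WRITE_SHARED = "write_shared"


class SharedVar:
    """A shared variable tagged with the protocol chosen by the programmer."""

    def __init__(self, value, protocol=Protocol.CONVENTIONAL):
        self.value = value
        self.protocol = protocol

    def write(self, value):
        if self.protocol is Protocol.READ_ONLY:
            # The write is rejected instead of corrupting shared state.
            raise RuntimeError("write to a read-only shared variable")
        self.value = value


table = SharedVar([1, 2, 3], Protocol.READ_ONLY)
try:
    table.write([4, 5, 6])
    outcome = "written"
except RuntimeError:
    outcome = "runtime error"
print(outcome, table.value)
```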

WRITE-SHARED PROTOCOL (I)

• Designed to fight false sharing
• Uses a copy-on-write mechanism
• Whenever a process is granted access to write-shared data, the page containing these data is marked copy-on-write
• First attempt to modify the contents of the page will result in the creation of a copy of the page as it was before the modification (the twin)

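Example

[Figure: before the first write access, the page holds x = 1 and y = 2. The first write access creates a twin holding x = 1 and y = 2. After the write, the page holds x = 3 and y = 2; comparing it with the twin shows that the new value of x is 3.]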

WRITE-SHARED PROTOCOL (II)

• At release time, the DSM will perform a word by word comparison of the page and its twin, store the diff in the space used by the twin page and notify all processors having a copy of the shared data of the update

• A runtime switch can be set to check for conflicting updates to write-shared data.
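The twin-and-diff idea can be sketched with a page modeled as a list of words. This is a simplified illustration, not Munin's implementation: `WriteSharedPage` and `conflicting_words` are hypothetical names, copy-on-write is emulated in `write()` rather than by a page fault, and the diff is returned as a dictionary instead of being stored in the twin's space.

```python
class WriteSharedPage:
    """A local replica of a write-shared page, modeled as a list of words."""

    def __init__(self, words):
        self.words = list(words)
        self.twin = None            # created lazily on the first write

    def write(self, index, value):
        if self.twin is None:
            self.twin = list(self.words)    # first write: save the twin
        self.words[index] = value

    def release(self):
        """Word-by-word comparison with the twin; returns {index: new value}."""
        if self.twin is None:
            return {}
        diff = {i: new
                for i, (old, new) in enumerate(zip(self.twin, self.words))
                if old != new}
        self.twin = None
        return diff

    def apply_diff(self, diff):
        for index, value in diff.items():
            self.words[index] = value


def conflicting_words(diff_a, diff_b):
    """Indexes modified by both writers (the optional runtime check)."""
    return sorted(set(diff_a) & set(diff_b))


# Two replicas of a page holding x (word 0) and y (word 1).
page_a = WriteSharedPage([1, 2])
page_b = WriteSharedPage([1, 2])
page_a.write(0, 3)                  # one process updates x
page_b.write(1, 5)                  # another concurrently updates y

diff_a, diff_b = page_a.release(), page_b.release()
page_a.apply_diff(diff_b)
page_b.apply_diff(diff_a)
print(page_a.words, page_b.words, conflicting_words(diff_a, diff_b))
```

Because each diff carries only the words its writer changed, the two updates merge cleanly and the page never has to bounce between the writers.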

UPDATE TIME-OUT MECHANISM

• Munin does not send updates to processors holding stale replicas

• Anytime a processor receives an update for a page for which it does not have a twin, the page is marked supervisor-only and the time of receipt of the update is recorded.

• First local access to the page will cause a trap that will remove the restriction

UPDATE TIME-OUT MECHANISM

• When a process receives an update for a page that is still marked supervisor-only, it checks the timestamp of the last update

• If more than 50 ms have elapsed, the process notifies the originator of the update not to send more updates and invalidates the page
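The receiving side of the time-out mechanism can be sketched as a small state machine. This is one reading of the description above, not Munin's code: `Replica` and its fields are hypothetical names, a boolean stands in for the supervisor-only page protection, time is passed in explicitly in milliseconds, and the timestamp is refreshed on every accepted update.

```python
TIMEOUT_MS = 50  # threshold quoted above


class Replica:
    """Receiver-side state for one replicated page."""

    def __init__(self, has_twin=False):
        self.has_twin = has_twin    # True while the page is being modified locally
        self.protected = False      # stands in for "supervisor-only"
        self.last_update_ms = None  # time of receipt of the last update
        self.valid = True

    def local_access(self):
        # The first local access traps; the handler removes the restriction.
        self.protected = False

    def receive_update(self, now_ms):
        """Return True if the sender should keep sending updates."""
        if self.protected and now_ms - self.last_update_ms > TIMEOUT_MS:
            # Not accessed locally since the last update, and that was long ago.
            self.valid = False
            self.protected = False
            return False
        if not self.has_twin:
            self.protected = True
            self.last_update_ms = now_ms
        return True


replica = Replica()
first = replica.receive_update(0)      # page becomes protected
second = replica.receive_update(30)    # still protected, only 30 ms elapsed
replica.local_access()                 # local use lifts the protection
third = replica.receive_update(200)    # protected again, timestamp = 200
fourth = replica.receive_update(260)   # unused for 60 ms: stop updates
print(first, second, third, fourth, replica.valid)
```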

CONCLUSIONS (I)

• The strongest point of Munin is its excellent performance
  – Typically within 5 to 33% of the performance of hand-coded message-passing versions of the same programs
• Its major limitation is its dependence on some features of the V kernel

CONCLUSIONS (II)

• Munin requires programs to access shared data from within critical regions or after barriers
  – Appears to be a reasonable requirement
• Munin allows users to tune the performance of their programs by selecting the best consistency protocol for each shared variable
  – Can quickly become a tedious process

FURTHER DEVELOPMENTS

• The same team has come up with a successor to Munin named TreadMarks
• Key differences are:
  – TreadMarks uses a more complex lazy release protocol
  – TreadMarks is UNIX-based
    • More portable