Wait-Free Multi-Word Compare-And-Swap using Greedy Helping and Grabbing Håkan Sundell PDPTA 2009


Wait-Free Multi-Word Compare-And-Swap using Greedy Helping and Grabbing Håkan Sundell

PDPTA 2009


Outline

Synchronization Methods
Multi-Word Compare-And-Swap
• Problems
New Wait-Free Algorithm
Experiments
Conclusions


Synchronization

Shared memory easily enables shared (i.e. multi-thread accessible) data structures.

Shared data structures need synchronization!

Accesses, and especially updates, must be coordinated to establish consistency. Updates should be done as atomic transactions.

[Figure: threads T1, T2 and T3]


Hardware Synchronization Primitives

Consensus number 1
• Atomic Read/Write
Consensus number 2
• Atomic Test-And-Set (TAS), Fetch-And-Add (FAA), Swap
Consensus number infinite
• Atomic Compare-And-Swap (CAS)
• Atomic Load-Linked/Store-Conditional (LL/SC)

[Figure labels: Read, Write, M=f(M,…)]


Universal and Conditional Synchronization Primitive

Compare-And-Swap (CAS)

    bool CAS(int *p, int old, int new) {
      atomic {  // the whole body executes as one indivisible step
        if (*p == old) { *p = new; return true; }
        else return false;
      }
    }

This single-word transaction primitive (or an equivalent) is supported in hardware on all contemporary systems

However, multi-word transactions must be done in software
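As an illustration, the CAS pseudocode above can be expressed with the C11 atomics library. This is a minimal sketch; the wrapper name cas and its parameter names are chosen here for illustration and are not from the talk.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Same contract as the slide's CAS: store new_val only if *p still holds old_val. */
static bool cas(_Atomic int *p, int old_val, int new_val) {
    /* The C11 call writes the observed value back into `expected` on failure;
       a local copy keeps the caller's old_val untouched. */
    int expected = old_val;
    return atomic_compare_exchange_strong(p, &expected, new_val);
}
```

A call such as cas(&x, 5, 7) returns true and stores 7 only when x currently holds 5; otherwise x is left unchanged and the call returns false.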


Mutual Exclusion

Mutual exclusion (e.g. locks) can be used for multi-word atomicity in software
• Access to shared data is atomic because of the lock

Reduced parallelism by definition

Blocking, with danger of priority inversion and deadlocks
• Solutions exist, but with high overhead, especially for multi-processor systems

[Figure: threads T1, T2 and T3]
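For comparison with the non-blocking approaches that follow, a lock-based multi-word update might look like the sketch below. It assumes a single global spin lock built on the C11 atomic_flag (a Test-And-Set primitive); the function name locked_casn and its array-based signature are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One global spin lock guards every word that takes part in a multi-word update. */
static atomic_flag casn_lock = ATOMIC_FLAG_INIT;

/* Hypothetical lock-based CASN: compares n words and, if all match, writes all n. */
static bool locked_casn(size_t n, int *ptrs[], const int old_vals[], const int new_vals[]) {
    bool ok = true;
    while (atomic_flag_test_and_set(&casn_lock)) {
        /* spin: this thread is blocked until the holder releases the lock */
    }
    for (size_t i = 0; i < n; i++) {
        if (*ptrs[i] != old_vals[i]) { ok = false; break; }
    }
    if (ok) {
        for (size_t i = 0; i < n; i++) *ptrs[i] = new_vals[i];
    }
    atomic_flag_clear(&casn_lock);
    return ok;
}
```

Every reader of these words would also have to take the same lock, and a thread that is preempted while holding it stalls all others, which is the blocking behavior described above.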


Non-blocking Synchronization

Avoids blocking by performing the operation/changes using atomic primitives

Lock-Free Synchronization
• Optimistic approach
  • Retries until succeeding
• Guarantees progress of at least one operation

Wait-Free Synchronization
• Always finishes in a finite number of its own steps
  • Requires coordination with all concurrent operations
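The difference between the two progress guarantees can be seen on a shared counter. The sketch below uses C11 atomics; the function names are illustrative, and the wait-free claim assumes the hardware executes fetch-and-add in a bounded number of steps.

```c
#include <stdatomic.h>

/* Lock-free: retries until its CAS succeeds. Some thread always makes progress,
   but one unlucky thread may loop an unbounded number of times. */
static int increment_lock_free(_Atomic int *counter) {
    int seen = atomic_load(counter);
    /* On failure the call reloads `seen` with the current value, so just retry. */
    while (!atomic_compare_exchange_weak(counter, &seen, seen + 1)) {
    }
    return seen;  /* value before the increment */
}

/* Wait-free, assuming the underlying fetch-and-add is itself wait-free:
   completes in a bounded number of the caller's own steps. */
static int increment_wait_free(_Atomic int *counter) {
    return atomic_fetch_add(counter, 1);
}
```

For a single counter the wait-free version is trivial; for multi-word updates no such primitive exists, which is why coordination between concurrent operations becomes necessary.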


Wait-Free Synchronization

Wait-Free Algorithms
• Usually very complex
  • Hard to design and prove correct
• Offer strong real-time guarantees
• Usually offer significantly worse average performance than lock-free algorithms
• Dynamic memory allocation needs wait-free memory management
  • By definition, all sub-operations of a wait-free operation also have to be wait-free
  • Atomic primitives are assumed to be wait-free


Multi-Word Compare-And-Swap

Operations:

    bool CASN(int *p1, int o1, int n1, … , int *pN, int oN, int nN);
    int Read(int *p);

Not supported by hardware
• Contemporary hardware only supports atomic update of one memory word

Achieved by lifting the abstraction level
• All operations on the affected memory words have to go via the new abstraction layer
• The layer is built using the underlying hardware primitives


Multi-Word Compare-And-Swap

Standard setup:
• Assign a lock to each individual memory word.

Standard algorithmic approach (CASN):
1. Try to acquire a lock on all positions of interest.
2. If a lock is already taken, help (i.e. perform) the corresponding operation.
3. If all locks are taken and all values match, change the status of the operation.
4. Remove the locks and possibly write the new values.

A concurrent Read() must check whether the word is locked in order to decide the current value.
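The steps above can be walked through with a deliberately simplified, single-threaded model. It is not thread safe and is not the algorithm from the talk: a real implementation performs each step with CAS and helps a conflicting operation in step 2 instead of giving up. All type and function names below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum status { UNDECIDED, SUCCEEDED, FAILED };

struct casn;  /* forward declaration */

/* A memory word plus its per-word lock: owner is NULL when unlocked. */
struct word {
    int value;
    struct casn *owner;
};

#define MAX_N 4

/* Descriptor for one ongoing CASN operation (hypothetical layout). */
struct casn {
    enum status status;
    size_t n;
    struct word *ptr[MAX_N];
    int old_val[MAX_N];
    int new_val[MAX_N];
};

/* Step 1: lock every word. Returns false on meeting another operation's lock,
   which is where a real implementation would help that operation (step 2). */
static bool casn_acquire(struct casn *d) {
    for (size_t i = 0; i < d->n; i++) {
        if (d->ptr[i]->owner != NULL && d->ptr[i]->owner != d) return false;
        d->ptr[i]->owner = d;
    }
    return true;
}

/* Step 3: with all words locked, decide the outcome by comparing old values. */
static void casn_decide(struct casn *d) {
    d->status = SUCCEEDED;
    for (size_t i = 0; i < d->n; i++)
        if (d->ptr[i]->value != d->old_val[i]) d->status = FAILED;
}

/* Step 4: write the new values if the operation succeeded, then unlock. */
static void casn_release(struct casn *d) {
    for (size_t i = 0; i < d->n; i++) {
        if (d->ptr[i]->owner != d) continue;
        if (d->status == SUCCEEDED) d->ptr[i]->value = d->new_val[i];
        d->ptr[i]->owner = NULL;
    }
}

/* Read: the current value of a locked word depends on its owner's status. */
static int read_word(const struct word *w) {
    const struct casn *d = w->owner;
    if (d == NULL || d->status != SUCCEEDED) return w->value;
    for (size_t i = 0; i < d->n; i++)
        if (d->ptr[i] == w) return d->new_val[i];
    return w->value;
}
```

Splitting the operation into three calls makes the Read rule visible: between casn_decide and casn_release, read_word already reports the new value even though the stored value has not yet been overwritten.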


Conflict resolution

Concurrent CASN operations may need to lock overlapping sets of words.
• If the locks are taken in different orders (i.e. without sorting the pointer arguments of the CASN call so that p1<…<pN), this can lead to deadlock scenarios.
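The conventional remedy, sorting the arguments by address so that every operation locks in the same global order, can be sketched as follows. The struct casn_entry type and the helper names are assumptions made for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* One CASN argument triple: target address, expected value, replacement. */
struct casn_entry {
    int *ptr;
    int old_val;
    int new_val;
};

/* Orders entries by target address so every operation locks in the same order. */
static int by_address(const void *a, const void *b) {
    uintptr_t pa = (uintptr_t)((const struct casn_entry *)a)->ptr;
    uintptr_t pb = (uintptr_t)((const struct casn_entry *)b)->ptr;
    return (pa > pb) - (pa < pb);
}

/* Sorts the triples in place; old and new values travel with their pointer. */
static void sort_entries(struct casn_entry *entries, size_t n) {
    qsort(entries, n, sizeof entries[0], by_address);
}
```

Sorting adds work to every call and constrains the caller, which is the cost that an algorithm accepting unsorted arguments avoids.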


Wait-Free Multi-Word Compare-And-Swap: New Approach

Wait-free memory management (IPDPS 2005) for handling allocation of descriptors (used for representing the state of an ongoing CASN).

Improved performance
• Greedy helping
  • Never help more than absolutely necessary to continue
• Fast look-up of word’s current value
  • Improves Read operation performance
  • Improves CASN operation performance

Allow unsorted pointer arguments
• Grabbing
  • Help until definitive conflict, then apply deterministic lock stealing and lock hand-over to resolve the deadlock


Descriptor structure allowing fast look-up of current value
• Allows 31 bits of a 32-bit memory word to represent the actual value.
• The corresponding old or new value can be indexed (1…N) directly.
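One way such a word layout could work is sketched below. The bit assignment is an assumption made for illustration and is not taken from the talk: the top bit marks a word that refers to a descriptor, and the remaining 31 bits hold either the plain value or a packed descriptor id plus index. The descriptor layout, the 8-bit index width and all names are likewise hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DESC_FLAG  UINT32_C(0x80000000)  /* assumed: top bit marks a descriptor reference */
#define INDEX_BITS 8                     /* assumed: low 8 payload bits hold the index 1..N */
#define INDEX_MASK ((UINT32_C(1) << INDEX_BITS) - 1)
#define MAX_N 16

/* Hypothetical descriptor: old and new values stored per position, indexed 1..N. */
struct descriptor {
    bool succeeded;
    uint32_t old_val[MAX_N + 1];
    uint32_t new_val[MAX_N + 1];
};

/* A plain value keeps its low 31 bits; the flag bit stays clear. */
static uint32_t make_value(uint32_t v) { return v & ~DESC_FLAG; }

/* A reference packs a descriptor id and a word index behind the flag bit. */
static uint32_t make_ref(uint32_t desc_id, uint32_t index) {
    return DESC_FLAG | ((desc_id << INDEX_BITS) & ~DESC_FLAG) | (index & INDEX_MASK);
}

static bool is_ref(uint32_t w)        { return (w & DESC_FLAG) != 0; }
static uint32_t ref_desc(uint32_t w)  { return (w & ~DESC_FLAG) >> INDEX_BITS; }
static uint32_t ref_index(uint32_t w) { return w & INDEX_MASK; }

/* Current value of a word: plain values are returned as is; a reference is
   resolved with one indexed load from the descriptor, without any search. */
static uint32_t current_value(uint32_t w, const struct descriptor *table) {
    if (!is_ref(w)) return w;
    const struct descriptor *d = &table[ref_desc(w)];
    uint32_t i = ref_index(w);
    return d->succeeded ? d->new_val[i] : d->old_val[i];
}
```

Because the index travels with the reference, a reader does not have to scan the descriptor for the matching address, which is the kind of direct look-up the slide refers to.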


Experiments

Micro benchmark using Read() and CASN()
• Each thread repeatedly performs updates of N memory words
  • Runs for 5 seconds, and the number of successful updates is measured
• Parameters are varied for each experiment
  • N is either 2, 4, 8, or 16 words
  • N is selected (for each update) randomly from 2, 4, … , 16384 words
  • Number of threads is 8, 16 or 32


Experiments

Each micro benchmark compares with 2 of the latest (e.g. fastest) CASN implementations in the literature

Harris et al., 2002. Lock-Free.
• Allows 30 bits of the word to be used for the value.
• Requires pointer arguments to be sorted.

Ha and Tsigas, 2004. Lock-Free.
• Needs an underlying LL/SC primitive implementation (e.g. Michael 2004).
• Allows (with the selected LL/SC) any number of bits for the value.
• Requires pointer arguments to be sorted.


Experiments – Some results


Conclusions

New Wait-Free Algorithm for Multi-Word Compare-And-Swap
• Greedy helping and Grabbing

Extraordinary Performance
• Even better average performance than the corresponding lock-free algorithms in many scenarios!
  • Especially under high contention