Wait-Free Multi-Word Compare-And-Swap using Greedy Helping and Grabbing
Håkan Sundell
PDPTA 2009
Outline
Synchronization Methods
Multi-Word Compare-And-Swap
Problems
New Wait-Free Algorithm
Experiments
Conclusions
Synchronization
Shared memory makes it easy to build shared (i.e. accessible to multiple threads) data structures.
Shared data structures need synchronization!
Accesses, and especially updates, must be coordinated to establish consistency.
Updates should be done as atomic transactions.
Hardware Synchronization Primitives
Consensus number 1
• Atomic Read/Write
Consensus number 2
• Atomic Test-And-Set (TAS), Fetch-And-Add (FAA), Swap
Consensus number infinite
• Atomic Compare-And-Swap (CAS)
• Atomic Load-Linked/Store-Conditional (LL/SC)
[Figure: Read, Write, M=f(M,…)]
Universal and Conditional Synchronization Primitive: Compare-And-Swap (CAS)
bool CAS(int *p, int old, int new) {
    atomic {    /* pseudocode: the whole block executes indivisibly */
        if (*p == old) {
            *p = new;
            return true;
        } else
            return false;
    }
}
This single-word transaction primitive (or an equivalent) is supported in hardware on all contemporary systems.
However, multi-word transactions must be done in software.
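In C11, the CAS() pseudocode above corresponds closely to atomic_compare_exchange_strong from <stdatomic.h>. The following is a minimal sketch, assuming a C11 compiler; the names cas_int and cas_int_demo are illustrative and do not come from the slides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Same contract as CAS() on the slide: write desired into *p only if
 * *p currently equals expected, and report whether the write happened. */
static bool cas_int(_Atomic int *p, int expected, int desired)
{
    /* On failure the library call stores the observed value into its second
     * argument; here that is only our local copy, so the caller is unaffected. */
    return atomic_compare_exchange_strong(p, &expected, desired);
}

/* Hypothetical usage: the second CAS fails because the word no longer holds 5. */
static void cas_int_demo(void)
{
    _Atomic int word = 5;
    assert(cas_int(&word, 5, 7));
    assert(!cas_int(&word, 5, 9));
    assert(atomic_load(&word) == 7);
}
```

The wrapper only covers a single word, which is exactly the limitation the following slides address.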
Mutual Exclusion
Mutual exclusion (e.g. locks) can provide multi-word atomicity in software.
Access to shared data is atomic because it is protected by the lock.
Reduces parallelism by definition.
Blocking: danger of priority inversion and deadlocks.
• Solutions exist, but with high overhead, especially on multi-processor systems.
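As an illustration of the lock-based approach, the sketch below makes a two-word compare-and-swap atomic with a single test-and-set spin lock. It assumes C11 atomics and one global lock for all words; the names cas2_lock, cas2_acquire, cas2_release and locked_cas2 are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: one global lock guards every word used in multi-word updates. */
static atomic_flag cas2_lock = ATOMIC_FLAG_INIT;

static void cas2_acquire(void)
{
    /* Test-and-set spin lock: a waiting thread is blocked here (busy-wait). */
    while (atomic_flag_test_and_set_explicit(&cas2_lock, memory_order_acquire)) {
    }
}

static void cas2_release(void)
{
    atomic_flag_clear_explicit(&cas2_lock, memory_order_release);
}

/* Two-word compare-and-swap: both words are checked and updated inside one
 * critical section, so other users of the lock never see a partial update. */
static bool locked_cas2(int *p1, int o1, int n1, int *p2, int o2, int n2)
{
    bool ok;
    cas2_acquire();
    ok = (*p1 == o1) && (*p2 == o2);
    if (ok) {
        *p1 = n1;
        *p2 = n2;
    }
    cas2_release();
    return ok;
}
```

Readers of the two words would have to take the same lock. If the lock holder is preempted, every other thread waits, which is the blocking problem named above.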
Non-blocking Synchronization
Avoids blocking by performing the operation/changes using atomic primitives
Lock-Free Synchronization
• Optimistic approach
  • Retries until succeeding
• Guarantees progress of at least one operation
Wait-Free Synchronization
• Always finishes in a finite number of its own steps
  • Requires coordination with all concurrent operations
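The optimistic retry pattern of lock-free synchronization can be sketched with a small example: raising a shared word to at least a given value. This is an assumed illustration using C11 atomics, and the name lockfree_max is hypothetical. A failed strong CAS means another thread changed the word in between, so some operation made progress, but this particular thread may have to retry an unbounded number of times, which is why the loop is lock-free and not wait-free.

```c
#include <assert.h>
#include <stdatomic.h>

/* Lock-free "raise *p to at least v". Returns the value the word held
 * when the operation took effect (v itself if this call installed it). */
static int lockfree_max(_Atomic int *p, int v)
{
    int seen = atomic_load(p);          /* optimistic read */
    while (seen < v) {
        /* Try to install v. On failure, seen is refreshed with the value
         * written by the interfering thread and the condition is re-checked. */
        if (atomic_compare_exchange_strong(p, &seen, v))
            return v;
    }
    return seen;                        /* word was already >= v */
}
```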
Wait-Free Synchronization
Wait-Free Algorithms
Usually very complex
• Hard to design and prove correct
Offer strong real-time guarantees
Usually offer significantly worse average performance than lock-free algorithms
Dynamic memory allocation needs wait-free memory management
• By definition, all sub-operations of a wait-free operation also have to be wait-free
• Atomic primitives are assumed to be wait-free
Multi-Word Compare-And-Swap
Operations:
bool CASN(int *p1, int o1, int n1, … , int *pN, int oN, int nN);
int Read(int *p);
Not supported by hardware
• Contemporary hardware only supports atomic update of one memory word
Achieved by lifting the abstraction level
• All operations on affected memory words have to go via the new abstraction layer
• Using the underlying hardware primitives
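The intended semantics of CASN can be written down as a sequential reference: all N words must match their old values, and only then are all N new values written. The sketch below is a specification aid, not a thread-safe implementation; it assumes an array-based argument form, and the names casn_entry and casn_reference are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One (pointer, old, new) triple of a CASN call. */
typedef struct {
    int *addr;
    int old_val;
    int new_val;
} casn_entry;

/* What an atomic CASN must appear to do, executed without interference.
 * NOT thread-safe: it only states the all-or-nothing behaviour. */
static bool casn_reference(const casn_entry *e, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (*e[i].addr != e[i].old_val)
            return false;               /* one mismatch: nothing is written */
    for (size_t i = 0; i < n; i++)
        *e[i].addr = e[i].new_val;      /* all matched: write every new value */
    return true;
}
```

A concurrent implementation has to make these two loops appear to happen at a single instant, using only single-word primitives.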
Multi-Word Compare-And-Swap
Standard setup:
• Assign a lock to each individual memory word.
Standard algorithmic approach (CASN):
1. Try to acquire a lock on all positions of interest.
2. If already taken, help (i.e. perform) the corresponding operation.
3. If all taken and all match, change the status of the operation.
4. Remove locks and possibly write new values.
A concurrent Read() must check whether the word is locked in order to decide its current value.
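The steps above, and the way Read() decides a locked word's value, can be traced with a single-threaded model. This is a sketch under strong assumptions: each word is modelled as a struct instead of one machine word, no atomic primitives are used, and step 2 (helping) is left out because it only arises under concurrency. All names (model_word, model_desc, model_read and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_WORDS 4

enum model_status { MODEL_UNDECIDED, MODEL_SUCCEEDED, MODEL_FAILED };

typedef struct model_desc model_desc;

/* Model of one shared word; a struct is used only for readability. */
typedef struct {
    int value;          /* value last written while unlocked */
    model_desc *owner;  /* NULL when unlocked, else the CASN holding the lock */
    size_t index;       /* position of this word in the owner's arguments */
} model_word;

struct model_desc {
    enum model_status status;
    size_t n, locked;                   /* words requested / locked so far */
    model_word *addr[MODEL_MAX_WORDS];
    int old_val[MODEL_MAX_WORDS];
    int new_val[MODEL_MAX_WORDS];
};

/* Read(): a locked word's current value is decided by its owner's status. */
static int model_read(const model_word *w)
{
    if (w->owner == NULL)
        return w->value;
    return w->owner->status == MODEL_SUCCEEDED ? w->owner->new_val[w->index]
                                               : w->owner->old_val[w->index];
}

/* Step 1: lock words in order until one does not match its old value. */
static void model_lock_all(model_desc *d)
{
    d->status = MODEL_UNDECIDED;
    for (d->locked = 0; d->locked < d->n; d->locked++) {
        model_word *w = d->addr[d->locked];
        assert(w->owner == NULL);       /* step 2 (helping) is not modelled */
        if (w->value != d->old_val[d->locked])
            break;
        w->owner = d;
        w->index = d->locked;
    }
}

/* Step 3: the status change is where the whole CASN takes effect. */
static void model_decide(model_desc *d)
{
    d->status = (d->locked == d->n) ? MODEL_SUCCEEDED : MODEL_FAILED;
}

/* Step 4: remove the locks, writing new values only on success. */
static void model_unlock_all(model_desc *d)
{
    for (size_t i = 0; i < d->locked; i++) {
        if (d->status == MODEL_SUCCEEDED)
            d->addr[i]->value = d->new_val[i];
        d->addr[i]->owner = NULL;
    }
}
```

Calling model_read between model_decide and model_unlock_all already returns the new values after a success, although the words themselves have not been rewritten yet.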
Conflict resolution
Concurrent CASN operations may need to lock overlapping sets of words.
If the locks are taken in different orders (i.e. the pointer arguments of the CASN call are not sorted, p1<…<pN), this can lead to deadlock scenarios.
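The conventional remedy, sorting the arguments by address so that every operation locks words in the same global order, can be sketched as follows. This assumes the array-based argument form; the names casn_arg, casn_arg_by_address and sort_casn_args are hypothetical, and pointers are compared through uintptr_t because relational comparison of unrelated pointers is not defined in standard C.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One (pointer, old, new) triple of a CASN call. */
typedef struct {
    int *addr;
    int old_val;
    int new_val;
} casn_arg;

/* qsort comparator: ascending by the numeric value of the word's address. */
static int casn_arg_by_address(const void *a, const void *b)
{
    uintptr_t x = (uintptr_t)((const casn_arg *)a)->addr;
    uintptr_t y = (uintptr_t)((const casn_arg *)b)->addr;
    return (x > y) - (x < y);
}

/* Reorder the triples so that p1 < ... < pN; each old/new pair moves
 * together with its pointer. */
static void sort_casn_args(casn_arg *args, size_t n)
{
    qsort(args, n, sizeof args[0], casn_arg_by_address);
}
```

With every operation locking in ascending address order, two operations can never each hold a word the other needs next.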
Wait-Free Multi-Word Compare-And-Swap: New Approach
Wait-free memory management (IPDPS 2005) for handling allocation of descriptors (used to represent the state of an ongoing CASN).
Improved performance
• Greedy helping
  • Never help more than absolutely necessary to continue
• Fast look-up of a word's current value
  • Improves Read operation performance
  • Improves CASN operation performance
Allows unsorted pointer arguments
• Grabbing
  • Help until a definitive conflict, then apply deterministic lock stealing and lock hand-over to resolve the deadlock
Descriptor structure allowing fast look-up of the current value
• Allows 31 bits of a 32-bit memory word to represent the actual value.
• The corresponding old or new value can be indexed (1…N) directly.
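The two properties above can be illustrated with a small encoding sketch. It assumes that the single reserved bit is the top bit of the word and that a descriptor stores its old and new values in parallel arrays; the slides state neither detail, so the layout and all names (WORD_FLAG, lookup_desc, lookup_current and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the top bit is the reserved one; the slides do not say which. */
#define WORD_FLAG UINT32_C(0x80000000)
#define VALUE_MAX UINT32_C(0x7FFFFFFF)

/* A word with the flag set refers to an ongoing CASN, not to a value. */
static bool word_is_reference(uint32_t w)
{
    return (w & WORD_FLAG) != 0;
}

static uint32_t word_from_value(uint32_t v)
{
    assert(v <= VALUE_MAX);             /* only 31 bits carry the value */
    return v;
}

static uint32_t word_to_value(uint32_t w)
{
    assert(!word_is_reference(w));
    return w;
}

/* Hypothetical descriptor layout: entry i (1..N) of the old or new values
 * is found with a single array access. */
typedef struct {
    bool succeeded;
    uint32_t old_val[16];
    uint32_t new_val[16];
} lookup_desc;

static uint32_t lookup_current(const lookup_desc *d, unsigned i)
{
    assert(i >= 1 && i <= 16);
    return d->succeeded ? d->new_val[i - 1] : d->old_val[i - 1];
}
```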
Experiments
Micro benchmark using Read() and CASN()
Each thread repeatedly performs updates of N memory words
• Runs for 5 seconds, and the number of successful updates is measured
Parameters are varied for each experiment
• N is either 2, 4, 8, or 16 words
• N is selected (for each update) randomly from 2,4, … , 16384 words
• Number of threads is varied between 8, 16 or 32
Experiments
Each micro benchmark compares with 2 of the latest (i.e. fastest) CASN implementations in the literature
Harris et al, 2002. Lock-Free.
• Allows 30 bits of the word to be used for the value.
• Requires pointer arguments to be sorted.
Ha and Tsigas, 2004. Lock-Free.
• Needs an underlying LL/SC primitive implementation (e.g. Michael 2004).
• Allows (with selected LL/SC) any number of bits for the value.
• Requires pointer arguments to be sorted.
Experiments – Some results
Conclusions
New Wait-Free Algorithm for Multi-Word Compare-And-Swap
• Greedy helping and Grabbing
Extraordinary Performance
• Even better average performance than corresponding lock-free in many scenarios!
  • Especially in high contention