Challenges in Non-Blocking Synchronization
Håkan Sundell, Ph.D.
Guest seminar at Department of Computer Science, University of Tromsö, Norway, 8 Dec 2005



Outline

- Shared Data Structures and Synchronization
- Non-Blocking Synchronization
  • Problems
  • Results
  • Open
- Applications and Performance
- Future
- Conclusions


Shared Memory

[Figure: two shared-memory multiprocessor diagrams; labels: CPU, Cache, Cache bus, Memory]

- Uniform Memory Access (UMA)

- Non-Uniform Memory Access (NUMA)


Synchronization

Shared data structures need synchronization!

Accesses and updates must be coordinated to establish consistency.


Mutual Exclusion

- Access to shared data is atomic because of the lock
- Reduced parallelism by definition
- Blocking, with danger of priority inversion and deadlocks
  • Solutions exist, but with high overhead, especially for multi-processor systems

[Figure: processes P1, P2 and P3]


Hardware Synchronization Primitives

- Weak
  • Atomic Read/Write
- Stronger
  • Atomic Test-And-Set (TAS), Fetch-And-Add (FAA), Swap
- Universal
  • Atomic Compare-And-Swap (CAS)
  • Atomic Load-Linked/Store-Conditional (LL/SC)

[Figure labels: Read, Write; Read, M=f(M,…)]
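As a rough illustration of the "stronger" primitives, the following sketch maps TAS, FAA and Swap onto C11 `<stdatomic.h>` operations. The names `tas_lock_t`, `tas_trylock`, `next_ticket` and `swap_int` are made up for this example and do not come from the talk; the spinlock is a deliberately blocking construct, shown only to demonstrate what TAS provides.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock built on Test-And-Set. */
typedef struct { atomic_flag held; } tas_lock_t;

void tas_init(tas_lock_t *l) { atomic_flag_clear(&l->held); }

/* TAS sets the flag and returns its previous value in one atomic step,
   so exactly one caller sees "was clear" and wins the lock. */
bool tas_trylock(tas_lock_t *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

void tas_lock(tas_lock_t *l) {
    while (!tas_trylock(l)) { /* busy-wait until the holder unlocks */ }
}

void tas_unlock(tas_lock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Fetch-And-Add: atomically add 1 and return the value before the addition. */
int next_ticket(atomic_int *counter) {
    return atomic_fetch_add(counter, 1);
}

/* Swap: atomically store v and return the value that was there before. */
int swap_int(atomic_int *p, int v) {
    return atomic_exchange(p, v);
}
```

Each of these is a read-modify-write of a single memory word, which is the M=f(M,…) pattern from the slide.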


Universal and Conditional Synchronization Primitive

- Compare-And-Swap (CAS)

bool CAS(int *p, int old, int new) {
  atomic {   // the whole block executes as one indivisible step
    if (*p == old) {
      *p = new;
      return true;
    } else
      return false;
  }
}
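The pseudocode above defines the semantics of CAS. As a minimal sketch of how it might be expressed in portable C11, the wrapper below (the name `cas_int` is an assumption of this example) delegates to `atomic_compare_exchange_strong`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Same contract as the slide's CAS: write new_val only if *p still holds
   old_val, and report whether the write happened. */
bool cas_int(atomic_int *p, int old_val, int new_val) {
    /* C11 takes the expected value by pointer and overwrites it with the
       observed value on failure; the by-value parameter hides that here. */
    return atomic_compare_exchange_strong(p, &old_val, new_val);
}
```

One difference from the slide's signature is that the C11 call hands back the observed value on failure, which retry loops usually exploit to avoid an extra load.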


Non-blocking Synchronization

- Perform operation/changes using atomic primitives
- Lock-Free Synchronization
  • Optimistic approach
  • Retries until succeeding
- Wait-Free Synchronization
  • Always finishes in a finite number of its own steps
  • Coordination with all participants
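The optimistic "retry until succeeding" pattern can be sketched with a small C11 example. The function `fetch_max` is hypothetical and chosen only because it has no single-instruction equivalent on most hardware; it is not taken from the talk.

```c
#include <stdatomic.h>

/* Lock-free "store the maximum": read the current value, decide on the
   update locally, then try to publish it with CAS; retry on conflict.
   Returns the last value observed in *m before any update by this call. */
int fetch_max(atomic_int *m, int candidate) {
    int seen = atomic_load(m);
    while (seen < candidate &&
           !atomic_compare_exchange_weak(m, &seen, candidate)) {
        /* CAS failed: seen now holds the value currently in *m (written by
           another thread, or unchanged after a spurious failure), so the
           loop condition is re-evaluated against it. */
    }
    return seen;
}
```

This loop is lock-free rather than wait-free: a failed CAS means some other thread made progress, but a single unlucky thread has no bound on how many times it may have to retry.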


Non-Blocking Synchronization: Example

- Lock-Free Stack (i.e. Free-list)
  • Create a linked list of the free nodes, allocate/reclaim using CAS
  • How to make sure that the next-pointer of the first item is not changed before CAS?

[Figure: free-list diagram; labels: Head, Mem 1, Mem 2, …, Mem i, Used 1, Reclaim, Allocate]
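One common answer to the question above is to attach a version counter to the head, so that a head that was popped and pushed back in the meantime no longer compares equal. The sketch below is one possible way to do that, not necessarily the approach used in the talk. It assumes a fixed pool addressed by index, a 64-bit CAS, and hypothetical names (`POOL_SIZE`, `pool_alloc`, `pool_free`); the 32-bit version can in principle wrap around.

```c
#include <stdatomic.h>
#include <stdint.h>

#define POOL_SIZE 4            /* hypothetical pool size for the sketch */
#define NIL       0xFFFFFFFFu  /* "no block" marker */

/* next_idx[i] is the free-list successor of block i. */
static _Atomic uint32_t next_idx[POOL_SIZE];
/* head packs (version << 32 | index); the version changes on every update. */
static _Atomic uint64_t head;

static uint64_t pack(uint32_t version, uint32_t index) {
    return ((uint64_t)version << 32) | index;
}

void pool_init(void) {
    for (uint32_t i = 0; i < POOL_SIZE; i++)
        atomic_store(&next_idx[i], i + 1 < POOL_SIZE ? i + 1 : NIL);
    atomic_store(&head, pack(0, 0));
}

/* Allocate: pop the first free block; returns NIL when the list is empty. */
uint32_t pool_alloc(void) {
    uint64_t old = atomic_load(&head);
    for (;;) {
        uint32_t idx = (uint32_t)old;
        if (idx == NIL) return NIL;
        uint32_t nxt = atomic_load(&next_idx[idx]);
        uint64_t desired = pack((uint32_t)(old >> 32) + 1, nxt);
        /* If the head changed after nxt was read, even back to the same
           index, the version differs and the CAS fails; old is refreshed. */
        if (atomic_compare_exchange_weak(&head, &old, desired)) return idx;
    }
}

/* Reclaim: push block idx back onto the free-list. */
void pool_free(uint32_t idx) {
    uint64_t old = atomic_load(&head);
    for (;;) {
        atomic_store(&next_idx[idx], (uint32_t)old);
        uint64_t desired = pack((uint32_t)(old >> 32) + 1, idx);
        if (atomic_compare_exchange_weak(&head, &old, desired)) return;
    }
}
```

Because blocks are addressed by index into a pool that is never returned to the system, reading the successor of a block that another thread has just allocated is harmless; the versioned CAS simply fails and the loop retries.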


Non-Blocking Synchronization: Problems

- Algorithmic design
  • Operations on shared data structures usually involve updates of several shared variables
    • Modern shared memory systems only support atomic primitives on single memory words!
  • Want parallelism: avoid bottlenecks
  • All sub-operations also have to be lock-free/wait-free


Non-Blocking Synchronization: Problems

- Correctness
  • Linearizability: for an implementation to be linearizable, every concurrent execution must have an equivalent sequential execution that respects the partial order of the operations in the concurrent execution
  • Proofs
    • Series of intuitive lemmas
    • Formal proofs using semi-automatic proof engines, requiring hundreds of invariants and several man-years of work


Non-Blocking Synchronization: Problems

- Implementation
  • Modern shared memory systems do not offer sequential consistency, or an equivalent level of consistency, by default
    • Out-of-order execution
  • The required relative order of reads and writes must be specified for each memory access as needed
    • Extra instructions inserted
    • Degrades out-of-order execution, i.e. significantly degrades speed!
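Specifying the required order per access can be sketched with C11 memory orders. The example below assumes a hypothetical one-shot hand-off of a plain `int` from one writer to one reader; the names `publish` and `try_consume` are made up for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* ordinary, non-atomic data */
static atomic_bool ready;  /* flag that publishes the payload; starts false */

/* Writer: the release store keeps the payload write from being
   reordered after the flag becomes visible. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader: the acquire load keeps the payload read from being
   reordered before the flag check. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

With `memory_order_relaxed` on both sides, the compiler and processor would be free to reorder the payload access around the flag. The default `memory_order_seq_cst` is also correct here, but on some architectures it costs additional fence instructions, which is the speed penalty the slide refers to.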


Non-Blocking Synchronization: Problems

Practical? Efficient?

Algorithmic Design

Correctness

Implementation


Non-Blocking Synchronization: Details

- Memory Management
  • Memory allocation
  • Memory reclamation (garbage collection)
- Atomic primitives
- Common shared data structures
  • Stack, Queue, Deque, Priority Queue, Dictionary, Hash Table, Linked Lists


Lock-Free Memory Management

- Memory Allocation
  • Valois 1995, fixed block-size, fixed purpose
  • Michael 2004, Gidenstam et al. 2004, any size, any purpose
- Garbage Collection
  • Valois 1995, (Detlefs et al. 2001); reference counting
  • Michael 2002, (Herlihy et al. 2002); hazard pointers
  • Gidenstam, Papatriantafilou, Sundell and Tsigas 2005, "Efficient and Reliable Memory Reclamation Based on Reference Counting"
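To give a feel for the reference-counting approach, here is a deliberately simplified C11 sketch. It is not the algorithm from any of the cited papers, and the names `node_t`, `node_new`, `node_acquire` and `node_release` are assumptions of this example. It covers only the easy case where the caller already holds a reference.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct node {
    atomic_int refs;  /* number of references currently held */
    int value;
} node_t;

/* Creates a node owned by the caller (reference count 1). */
node_t *node_new(int value) {
    node_t *n = malloc(sizeof *n);
    if (n) { atomic_init(&n->refs, 1); n->value = value; }
    return n;
}

/* Caller must already hold a reference, so the node cannot vanish here. */
void node_acquire(node_t *n) {
    atomic_fetch_add_explicit(&n->refs, 1, memory_order_relaxed);
}

/* Drops one reference; returns true if this call reclaimed the node. */
bool node_release(node_t *n) {
    if (atomic_fetch_sub_explicit(&n->refs, 1, memory_order_acq_rel) == 1) {
        free(n);  /* last reference gone: no other thread can reach n */
        return true;
    }
    return false;
}
```

The hard part, which this sketch sidesteps, is taking a reference through a shared pointer that another thread may be unlinking at the same moment: the count must not be incremented on a node that has already been freed. Handling that case safely is what the cited reference-counting and hazard-pointer schemes are about.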


Wait-Free Memory Management

Hesselink and Groote 2001. Limited to shared tokens.

Sundell 2005. "Wait-Free Reference Counting and Memory Management"

• Memory Allocation – fixed block-size, fixed purpose

• Garbage Collection – reference counting


Software Synchronization Primitives

- Atomic Read/Write. Several wait-free (WF) and lock-free (LF) results published.
- Multi-variable Read/Write, i.e. Snapshot. Several results published WF/LF.
- LL/SC. Several results published WF/LF.
- Multi-word Compare-And-Swap (CASN), i.e. transactions. Several results published LF.


Shared Data Structures

- Lock-Free
  • Stack: Valois 1995, Michael 2002
  • Queue: Valois 1995, Tsigas and Zhang 2001, Michael 2002 and many more
  • Deque: Michael 2003, Sundell and Tsigas 2004


Shared Data Structures

- Lock-Free
  • Priority Queue: Sundell and Tsigas 2003
  • Dictionary: (Harris et al. 2001), Sundell and Tsigas 2003
  • Hash Tables: Michael 2002 and many more
  • Linked Lists
    • Singly-Linked: Valois 1995, (Harris et al. 2001)
    • Doubly-Linked: Sundell and Tsigas 2004


Non-Blocking Synchronization: Open

- Wait-Free Memory Management.
  • Improvement
- Atomic primitives.
  • Wait-Free multi-word compare-and-swap
- Lock-Free Data Structures.
  • Tree
  • Improvement, disjoint-access-parallelism
- Wait-Free Data Structures.
  • Stack, queue, priority queue, dictionary, hash table, linked lists, trees


Applications and Performance

- Ocean: simulates eddy currents in an ocean basin.
- Radiosity: computes the equilibrium distribution of light in a scene using the radiosity method.
- Volrend: renders 3D volume data into an image using a ray-casting method.
- Water: evaluates forces and potentials that occur over time between water molecules.
- Spark98: a collection of sparse matrix kernels. Each kernel performs a sequence of sparse matrix-vector product operations using matrices that are derived from a family of three-dimensional finite element earthquake applications.

Tsigas and Zhang 2001


[Figure: plots from Tsigas and Zhang 2001; surviving labels: 58P (four times), 32P, 24P, 24P]


Applications

- Professional
  • Lock-Free Queues are used in music applications
  • Lock-Free Dictionary by Sundell-Tsigas is used by several American financial services companies
- Operating system kernels
- Software libraries
  • Sundell and Tsigas 2002. "NOBLE: A Non-Blocking Inter-Process Communication Library"
  • Java, J2SE 5.0


Future?

- Distributed Shared Memory
  • Atomic synchronization primitives?
  • Relaxed Memory Models?
    • Linearizability versus weak memory consistency + forced consistency?

[Figure: distributed shared-memory diagram; labels: CPU, Cache bus, Memory, I/O]


Conclusions

- Non-blocking synchronization
  • Can be practical and efficient
  • Large scope of lock-free shared data structures available
  • Used in practice
- Future work
  • Wait-free dynamic data structures
  • Exploit modern processor architectures
  • Distributed architectures


Questions?

Contact Information:
- Address: Håkan Sundell, Computing Science, Chalmers University of Technology
- Email: [email protected]
- Web: http://www.cs.chalmers.se/~phs