217
Spin Locks and Contention Management The Art of Multiprocessor Programming Spring 2007

Spin Locks and Contention Management The Art of Multiprocessor Programming Spring 2007

Embed Size (px)

Citation preview

Spin Locks and Contention

ManagementThe Art of

Multiprocessor Programming

Spring 2007

© Herlihy-Shavit 2007 2

Focus so far: Correctness

• Models– Accurate (we never lied to you)

– But idealized (so we forgot to mention a few things)

• Protocols– Elegant– Important– But naïve

© Herlihy-Shavit 2007 3

New Focus: Performance

• Models– More complicated (not the same as complex!)

– Still focus on principles (not soon obsolete)

• Protocols– Elegant (in their fashion)

– Important (why else would we pay attention)

– And realistic (your mileage may vary)

© Herlihy-Shavit 2007 4

Kinds of Architectures

• SISD (Uniprocessor)– Single instruction stream– Single data stream

• SIMD (Vector)– Single instruction– Multiple data

• MIMD (Multiprocessors)– Multiple instruction– Multiple data.

© Herlihy-Shavit 2007 5

Kinds of Architectures

• SISD (Uniprocessor)– Single instruction stream– Single data stream

• SIMD (Vector)– Single instruction– Multiple data

• MIMD (Multiprocessors)– Multiple instruction– Multiple data.

Our space

(1)

© Herlihy-Shavit 2007 6

MIMD Architectures

• Memory Contention• Communication Contention • Communication Latency

Shared Bus

memory

Distributed

© Herlihy-Shavit 2007 7

Today: Revisit Mutual Exclusion

• Think of performance, not just correctness and progress

• Begin to understand how performance depends on our software properly utilizing the multipprocessor machines hardware

• And get to know a collection of locking algorithms…

(1)

© Herlihy-Shavit 2007 8

What Should you do if you can’t get a lock?

• Keep trying– “spin” or “busy-wait”– Good if delays are short

• Give up the processor– Good if delays are long– Always good on uniprocessor

(1)

© Herlihy-Shavit 2007 9

What Should you do if you can’t get a lock?

• Keep trying– “spin” or “busy-wait”– Good if delays are short

• Give up the processor– Good if delays are long– Always good on uniprocessor

our focus

© Herlihy-Shavit 2007 10

Basic Spin-Lock

CS

Resets lock upon exit

spin lock

critical section

...

© Herlihy-Shavit 2007 11

Basic Spin-Lock

CS

Resets lock upon exit

spin lock

critical section

...

…lock introduces sequntial bottleneck

© Herlihy-Shavit 2007 12

Basic Spin-Lock

CS

Resets lock upon exit

spin lock

critical section

...

…lock suffers from contention

© Herlihy-Shavit 2007 13

Basic Spin-Lock

CS

Resets lock upon exit

spin lock

critical section

...Notice: these are two different phenomenon

…lock suffers from contention

Seq Bottleneck no parallelism

© Herlihy-Shavit 2007 14

Basic Spin-Lock

CS

Resets lock upon exit

spin lock

critical section

...Contention ???

…lock suffers from contention

© Herlihy-Shavit 2007 15

Review: Test-and-Set

• Boolean value• Test-and-set (TAS)

– Swap true with current value– Return value tells if prior value was

true or false

• Can reset just by writing false• TAS aka “getAndSet”

© Herlihy-Shavit 2007 16

Review: Test-and-Set

public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) {

boolean prior = value; value = newValue; return prior; }}

(5)

© Herlihy-Shavit 2007 17

Review: Test-and-Set

public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) {

boolean prior = value; value = newValue; return prior; }}

Packagejava.util.concurrent.atomic

© Herlihy-Shavit 2007 18

Review: Test-and-Set

public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) {

boolean prior = value; value = newValue; return prior; }}

Swap old and new values

© Herlihy-Shavit 2007 19

Review: Test-and-Set

AtomicBoolean lock = new AtomicBoolean(false)…boolean prior = lock.getAndSet(true)

© Herlihy-Shavit 2007 20

Review: Test-and-Set

AtomicBoolean lock = new AtomicBoolean(false)…boolean prior = lock.getAndSet(true)

(5)

Swapping in true is called “test-and-set” or TAS

© Herlihy-Shavit 2007 21

Test-and-Set Locks

• Locking– Lock is free: value is false– Lock is taken: value is true

• Acquire lock by calling TAS– If result is false, you win– If result is true, you lose

• Release lock by writing false

© Herlihy-Shavit 2007 22

Test-and-set Lock

class TASlock { AtomicBoolean state = new AtomicBoolean(false);

void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }}

© Herlihy-Shavit 2007 23

Test-and-set Lock

class TASlock { AtomicBoolean state = new AtomicBoolean(false);

void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }}

Lock state is AtomicBoolean

© Herlihy-Shavit 2007 24

Test-and-set Lock

class TASlock { AtomicBoolean state = new AtomicBoolean(false);

void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }}

Keep trying until lock acquired

© Herlihy-Shavit 2007 25

Test-and-set Lock

class TASlock { AtomicBoolean state = new AtomicBoolean(false);

void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }}

Release lock by resetting state to

false

© Herlihy-Shavit 2007 26

Space Complexity

• TAS spin-lock has small “footprint” • N thread spin-lock uses O(1) space• As opposed to O(n)

Peterson/Bakery • How did we overcome the (n)

lower bound? • We used a RMW operation…

© Herlihy-Shavit 2007 27

Performance

• Experiment– n threads– Increment shared counter 1 million

times

• How long should it take?• How long does it take?

© Herlihy-Shavit 2007 28

Graph

ideal

tim e

threads

no speedup because of sequential bottleneck

© Herlihy-Shavit 2007 29

Mystery #1ti

m e

threads

TAS lock

Ideal

(1)

What is going on?

Lets try and fix

it…

© Herlihy-Shavit 2007 30

Test-and-Test-and-Set Locks

• Lurking stage– Wait until lock “looks” free– Spin while read returns true (lock

taken)• Pouncing state

– As soon as lock “looks” available– Read returns false (lock free)– Call TAS to acquire lock– If TAS loses, back to lurking

© Herlihy-Shavit 2007 31

Test-and-test-and-set Lock

class TTASlock { AtomicBoolean state = new AtomicBoolean(false);

void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; }}

© Herlihy-Shavit 2007 32

Test-and-test-and-set Lock

class TTASlock { AtomicBoolean state = new AtomicBoolean(false);

void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; }} Wait until lock looks free

© Herlihy-Shavit 2007 33

Test-and-test-and-set Lock

class TTASlock { AtomicBoolean state = new AtomicBoolean(false);

void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; }}

Then try to acquire it

© Herlihy-Shavit 2007 34

Mystery #2

TAS lock

TTAS lock

Ideal

tim e

threads

© Herlihy-Shavit 2007 35

Mystery

• Both– TAS and TTAS– Do the same thing (in our model)

• Except that– TTAS performs much better than TAS– Neither approaches ideal

© Herlihy-Shavit 2007 36

Opinion

• Our memory abstraction is broken• TAS & TTAS methods

– Are provably the same (in our model)

– Except they aren’t (in field tests)

• Need a more detailed model …

© Herlihy-Shavit 2007 37

Bus-Based Architectures

Bus

cache

memory

cachecache

© Herlihy-Shavit 2007 38

Bus-Based Architectures

Bus

cache

memory

cachecache

Random access memory (10s of cycles)

© Herlihy-Shavit 2007 39

Bus-Based Architectures

cache

memory

cachecache

Shared Bus•broadcast medium•One broadcaster at a time•Processors and memory all “snoop”

Bus

© Herlihy-Shavit 2007 40

Bus-Based Architectures

Bus

cache

memory

cachecache

Per-Processor Caches•Small•Fast: 1 or 2 cycles•Address & state information

© Herlihy-Shavit 2007 41

Jargon Watch

• Cache hit– “I found what I wanted in my cache”– Good Thing™

© Herlihy-Shavit 2007 42

Jargon Watch

• Cache hit– “I found what I wanted in my cache”– Good Thing™

• Cache miss– “I had to shlep all the way to memory

for that data”– Bad Thing™

© Herlihy-Shavit 2007 43

Cave Canem

• This model is still a simplification– But not in any essential way– Illustrates basic principles

• Will discuss complexities later

© Herlihy-Shavit 2007 44

Bus

Processor Issues Load Request

cache

memory

cachecache

data

© Herlihy-Shavit 2007 45

Bus

Processor Issues Load Request

Bus

cache

memory

cachecache

data

Gimmedata

© Herlihy-Shavit 2007 46

cache

Bus

Memory Responds

Bus

memory

cachecache

data

Got your data right here data

© Herlihy-Shavit 2007 47

Bus

Processor Issues Load Request

memory

cachecachedata

data

Gimmedata

© Herlihy-Shavit 2007 48

Bus

Processor Issues Load Request

Bus

memory

cachecachedata

data

Gimmedata

© Herlihy-Shavit 2007 49

Bus

Processor Issues Load Request

Bus

memory

cachecachedata

data

I got data

© Herlihy-Shavit 2007 50

Bus

Other Processor Responds

memory

cachecache

data

I got data

datadata

Bus

© Herlihy-Shavit 2007 51

Bus

Other Processor Responds

memory

cachecache

data

datadata

Bus

© Herlihy-Shavit 2007 52

Modify Cached Data

Bus

data

memory

cachedata

data

(1)

© Herlihy-Shavit 2007 53

Modify Cached Data

Bus

data

memory

cachedata

data

data

(1)

© Herlihy-Shavit 2007 54

memory

Bus

data

Modify Cached Data

cachedata

data

© Herlihy-Shavit 2007 55

memory

Bus

data

Modify Cached Data

cache

What’s up with the other copies?

data

data

© Herlihy-Shavit 2007 56

Cache Coherence

• We have lots of copies of data– Original copy in memory – Cached copies at processors

• Some processor modifies its own copy– What do we do with the others?– How to avoid confusion?

© Herlihy-Shavit 2007 57

Write-Back Caches

• Accumulate changes in cache• Write back when needed

– Need the cache for something else– Another processor wants it

• On first modification– Invalidate other entries– Requires non-trivial protocol …

© Herlihy-Shavit 2007 58

Write-Back Caches

• Cache entry has three states– Invalid: contains raw seething bits– Valid: I can read but I can’t write– Dirty: Data has been modified

• Intercept other load requests• Write back to memory before using cache

© Herlihy-Shavit 2007 59

Bus

Invalidate

memory

cachedatadata

data

© Herlihy-Shavit 2007 60

Bus

Invalidate

Bus

memory

cachedatadata

data

Mine, all mine!

© Herlihy-Shavit 2007 61

Bus

Invalidate

Bus

memory

cachedatadata

data

cache

Uh,oh

© Herlihy-Shavit 2007 62

cache

Bus

Invalidate

memory

cachedata

data

Other caches lose read permission

© Herlihy-Shavit 2007 63

cache

Bus

Invalidate

memory

cachedata

data

Other caches lose read permission

This cache acquires write permission

© Herlihy-Shavit 2007 64

cache

Bus

Invalidate

memory

cachedata

data

Memory provides data only if not present in any cache, so no need

to change it now (expensive)

(2)

© Herlihy-Shavit 2007 65

cache

Bus

Another Processor Asks for Data

memory

cachedata

data

(2)

Bus

© Herlihy-Shavit 2007 66

cache data

Bus

Owner Responds

memory

cachedata

data

(2)

Bus

Here it is!

© Herlihy-Shavit 2007 67

Bus

End of the Day …

memory

cachedata

data

(1)

Reading OK, no writing

data data

© Herlihy-Shavit 2007 68

Mutual Exclusion

• What do we want to optimize?– Bus bandwidth used by spinning

threads– Release/Acquire latency– Acquire latency for idle lock

© Herlihy-Shavit 2007 69

Simple TASLock

• TAS invalidates cache lines• Spinners

– Miss in cache– Go to bus

• Thread wants to release lock– delayed behind spinners

© Herlihy-Shavit 2007 70

Test-and-test-and-set

• Wait until lock “looks” free– Spin on local cache– No bus use while lock busy

• Problem: when lock is released– Invalidation storm …

© Herlihy-Shavit 2007 71

Local Spinning while Lock is Busy

Bus

memory

busybusybusy

busy

© Herlihy-Shavit 2007 72

Bus

On Release

memory

freeinvalidinvalid

free

© Herlihy-Shavit 2007 73

On Release

Bus

memory

freeinvalidinvalid

free

miss miss

Everyone misses, rereads

(1)

© Herlihy-Shavit 2007 74

On Release

Bus

memory

freeinvalidinvalid

free

TAS(…) TAS(…)

Everyone tries TAS

(1)

© Herlihy-Shavit 2007 75

Problems

• Everyone misses– Reads satisfied sequentially

• Everyone does TAS– Invalidates others’ caches

• Eventually quiesces after lock acquired– How long does this take?

© Herlihy-Shavit 2007 76

Measuring Quiescence Time

P1

P2

Pn

X = time of ops that don’t use the busY = time of ops that cause intensive bus traffic

In critical section, run ops X then ops Y. As long as Quiescence time is less than X, no drop in performance.

By gradually varying X, can determine the exact time to quiesce.

© Herlihy-Shavit 2007 77

Quiescence Time

Increses linearly with the number of processors for bus architectureti

m e

threads

© Herlihy-Shavit 2007 78

Mystery Explained

TAS lock

TTAS lock

Ideal

tim e

threads

Better than TAS but still

not as good as ideal

© Herlihy-Shavit 2007 79

Solution: Introduce Delay

spin locktimedr1dr2d

• If the lock looks free• But I fail to get it

• There must be lots of contention• Better to back off than to collide again

© Herlihy-Shavit 2007 80

Dynamic Example: Exponential Backoff

timed2d4d spin lock

If I fail to get lock– wait random duration before retry– Each subsequent failure doubles

expected wait

© Herlihy-Shavit 2007 81

Exponential Backoff Lock

public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}}

© Herlihy-Shavit 2007 82

Exponential Backoff Lock

public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Fix minimum delay

© Herlihy-Shavit 2007 83

Exponential Backoff Lock

public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Wait until lock looks free

© Herlihy-Shavit 2007 84

Exponential Backoff Lock

public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} If we win, return

© Herlihy-Shavit 2007 85

Exponential Backoff Lock

public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}}

Back off for random duration

© Herlihy-Shavit 2007 86

Exponential Backoff Lock

public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}}

Double max delay, within reason

© Herlihy-Shavit 2007 87

Spin-Waiting Overhead

TTAS Lock

Backoff lock

tim e

threads

© Herlihy-Shavit 2007 88

Backoff: Other Issues

• Good– Easy to implement– Beats TTAS lock

• Bad– Must choose parameters carefully– Not portable across platforms

© Herlihy-Shavit 2007 89

Idea

• Avoid useless invalidations– By keeping a queue of threads

• Each thread– Notifies next in line– Without bothering the others

© Herlihy-Shavit 2007 90

Anderson Queue Lock

flags

next

T F F F F F F F

idle

© Herlihy-Shavit 2007 91

Anderson Queue Lock

flags

next

T F F F F F F F

acquiring

getAndIncrement

© Herlihy-Shavit 2007 92

Anderson Queue Lock

flags

next

T F F F F F F F

acquiring

getAndIncrement

© Herlihy-Shavit 2007 93

Anderson Queue Lock

flags

next

T F F F F F F F

acquired

Mine!

© Herlihy-Shavit 2007 94

Anderson Queue Lock

flags

next

T F F F F F F F

acquired acquiring

© Herlihy-Shavit 2007 95

Anderson Queue Lock

flags

next

T F F F F F F F

acquired acquiring

getAndIncrement

© Herlihy-Shavit 2007 96

Anderson Queue Lock

flags

next

T F F F F F F F

acquired acquiring

getAndIncrement

© Herlihy-Shavit 2007 97

acquired

Anderson Queue Lock

flags

next

T F F F F F F F

acquiring

© Herlihy-Shavit 2007 98

released

Anderson Queue Lock

flags

next

T T F F F F F F

acquired

© Herlihy-Shavit 2007 99

released

Anderson Queue Lock

flags

next

T T F F F F F F

acquired

Yow!

© Herlihy-Shavit 2007 100

Anderson Queue Lock

class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); int[] slot = new int[n];

© Herlihy-Shavit 2007 101

Anderson Queue Lock

class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); int[] slot = new int[n];

One flag per thread

© Herlihy-Shavit 2007 102

Anderson Queue Lock

class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); int[] slot = new int[n];

Next flag to use

© Herlihy-Shavit 2007 103

Anderson Queue Lock

class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal<Integer> mySlot;

Thread-local variable

© Herlihy-Shavit 2007 104

Anderson Queue Lock

public lock() { int mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false;}

public unlock() { flags[(mySlot+1) % n] = true;}

© Herlihy-Shavit 2007 105

Anderson Queue Lock

public lock() { int mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false;}

public unlock() { flags[(mySlot+1) % n] = true;} Take next slot

© Herlihy-Shavit 2007 106

Anderson Queue Lock

public lock() { int mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false;}

public unlock() { flags[(mySlot+1) % n] = true;} Spin until told to go

© Herlihy-Shavit 2007 107

Anderson Queue Lock

public lock() { int slot[i]=next.getAndIncrement(); while (!flags[slot[i] % n]) {}; flags[slot[i] % n] = false;}

public unlock() { flags[slot[i]+1 % n] = true;} Prepare slot for re-use

© Herlihy-Shavit 2007 108

Anderson Queue Lock

public lock() { int mySlot = next.getAndIncrement(); while (!flags[mySlot % n]) {}; flags[mySlot % n] = false;}

public unlock() { flags[(mySlot+1) % n] = true;}

Tell next thread to go

© Herlihy-Shavit 2007 109

Performance

• Curve is practically flat

• Scalable performance

• FIFO fairness

queue

TTAS

© Herlihy-Shavit 2007 110

Anderson Queue Lock

• Good– First truly scalable lock– Simple, easy to implement

• Bad– Space hog– One bit per thread

• Unknown number of threads?• Small number of actual contenders?

© Herlihy-Shavit 2007 111

CLH Lock

• FIFO order• Small, constant-size overhead per

thread

© Herlihy-Shavit 2007 112

Initially

false

tail

idle

© Herlihy-Shavit 2007 113

Initially

false

tail

idle

Queue tail

© Herlihy-Shavit 2007 114

Initially

false

tail

idle

Lock is free

© Herlihy-Shavit 2007 115

Initially

false

tail

idle

© Herlihy-Shavit 2007 116

Purple Wants the Lock

false

tail

acquiring

© Herlihy-Shavit 2007 117

Purple Wants the Lock

false

tail

acquiring

true

© Herlihy-Shavit 2007 118

Purple Wants the Lock

falsetail

acquiring

true

Swap

© Herlihy-Shavit 2007 119

Purple Has the Lock

false

tail

acquired

true

© Herlihy-Shavit 2007 120

Red Wants the Lock

false

tail

acquired acquiring

true true

© Herlihy-Shavit 2007 121

Red Wants the Lock

false

tail

acquired acquiring

true

Swap

true

© Herlihy-Shavit 2007 122

Red Wants the Lock

false

tail

acquired acquiring

true true

© Herlihy-Shavit 2007 123

Red Wants the Lock

false

tail

acquired acquiring

true true

© Herlihy-Shavit 2007 124

Red Wants the Lock

false

tail

acquired acquiring

true true

trueActually, it spins on cached copy

© Herlihy-Shavit 2007 125

Purple Releases

false

tail

release acquiring

false true

falseBingo

!

© Herlihy-Shavit 2007 126

Purple Releases

tail

released acquired

true

© Herlihy-Shavit 2007 127

Space Usage

• Let– L = number of locks– N = number of threads

• ALock– O(LN)

• CLH lock– O(L+N)

© Herlihy-Shavit 2007 128

CLH Queue Lock

class Qnode { AtomicBoolean locked = new AtomicBoolean(true);}

© Herlihy-Shavit 2007 129

CLH Queue Lock

class Qnode { AtomicBoolean locked = new AtomicBoolean(true);}

Not released yet

© Herlihy-Shavit 2007 130

CLH Queue Lockclass CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode = new Qnode(); public void lock() { Qnode pred = queue.getAndSet(myNode); while (pred.locked) {} }}

(3)

© Herlihy-Shavit 2007 131

CLH Queue Lockclass CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode = new Qnode(); public void lock() { Qnode pred = queue.getAndSet(myNode); while (pred.locked) {} }}

(3)

Tail of the queue

© Herlihy-Shavit 2007 132

CLH Queue Lockclass CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode = new Qnode(); public void lock() { Qnode pred = queue.getAndSet(myNode); while (pred.locked) {} }}

(3)

Thread-local Qnode

© Herlihy-Shavit 2007 133

CLH Queue Lockclass CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode = new Qnode(); public void lock() { Qnode pred = queue.getAndSet(myNode); while (pred.locked) {} }}

(3)

Swap in my node

© Herlihy-Shavit 2007 134

CLH Queue Lockclass CLHLock implements Lock { AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode = new Qnode(); public void lock() { Qnode pred = queue.getAndSet(myNode); while (pred.locked) {} }}

(3)

Spin until predecessorreleases lock

© Herlihy-Shavit 2007 135

CLH Queue LockClass CLHLock implements Lock { … public void unlock() { myNode.locked.set(false); myNode = pred; }}

(3)

© Herlihy-Shavit 2007 136

CLH Queue LockClass CLHLock implements Lock { … public void unlock() { myNode.locked.set(false); myNode = pred; }}

(3)

Notify successor

© Herlihy-Shavit 2007 137

CLH Queue LockClass CLHLock implements Lock { … public void unlock() { myNode.locked.set(false); myNode = pred; }}

(3)

Recycle predecessor’s

node

© Herlihy-Shavit 2007 138

CLH Lock

• Good– Lock release affects predecessor only– Small, constant-sized space

• Bad– Doesn’t work for uncached NUMA

architectures

© Herlihy-Shavit 2007 139

NUMA Architecturs

• Acronym:– Non-Uniform Memory Architecture

• Illusion:– Flat shared memory

• Truth:– No caches (sometimes)– Some memory regions faster than

others

© Herlihy-Shavit 2007 140

NUMA Machines

Spinning on local memory is fast

© Herlihy-Shavit 2007 141

NUMA Machines

Spinning on remote memory is slow

© Herlihy-Shavit 2007 142

CLH Lock

• Each thread spin’s on predecessor’s memory

• Could be far away …

© Herlihy-Shavit 2007 143

MCS Lock

• FIFO order• Spin on local memory only• Small, Constant-size overhead

© Herlihy-Shavit 2007 144

Initially

false

tail false

idle

© Herlihy-Shavit 2007 145

Acquiring

false

queuefalse

true

acquiring

(allocate Qnode)

© Herlihy-Shavit 2007 146

Acquiring

false

tail false

true

acquired

swap

© Herlihy-Shavit 2007 147

Acquiring

false

tail false

true

acquired

© Herlihy-Shavit 2007 148

Acquired

false

tail false

true

acquired

© Herlihy-Shavit 2007 149

Acquiring

tail

true

acquiredacquiring

trueswap

© Herlihy-Shavit 2007 150

Acquiring

tail

acquiredacquiring

true

true

© Herlihy-Shavit 2007 151

Acquiring

tail

acquiredacquiring

true

true

© Herlihy-Shavit 2007 152

Acquiring

tail

acquiredacquiring

true

true

© Herlihy-Shavit 2007 153

Acquiring

tail

acquiredacquiring

true

true

false

© Herlihy-Shavit 2007 154

Acquiring

tail

acquiredacquiring

true

trueYes!

false

© Herlihy-Shavit 2007 155

MCS Queue Lock

class Qnode { boolean locked = false; qnode next = null;}

© Herlihy-Shavit 2007 156

MCS Queue Lockclass MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}}

(3)

© Herlihy-Shavit 2007 157

MCS Queue Lockclass MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}}

(3)

Make a QNode

© Herlihy-Shavit 2007 158

MCS Queue Lockclass MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}}

(3)

add my Node to the tail of

queue

© Herlihy-Shavit 2007 159

MCS Queue Lockclass MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}}

(3)

Fix if queue was non-

empty

© Herlihy-Shavit 2007 160

MCS Queue Lockclass MCSLock implements Lock { AtomicReference tail; public void lock() { Qnode qnode = new Qnode(); Qnode pred = tail.getAndSet(qnode); if (pred != null) { qnode.locked = true; pred.next = qnode; while (qnode.locked) {} }}}

(3)

Wait until unlocked

© Herlihy-Shavit 2007 161

MCS Queue Lockclass MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (queue.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false;}}

(3)

© Herlihy-Shavit 2007 162

MCS Queue Lockclass MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (queue.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false;}}

(3)

Missingsuccessor?

© Herlihy-Shavit 2007 163

MCS Queue Lockclass MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (queue.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false;}}

(3)

If really no successor, return

© Herlihy-Shavit 2007 164

MCS Queue Lockclass MCSLock implements Lock { AtomicReference tail; public void unlock() { if (qnode.next == null) { if (queue.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false;}}

(3)

Otherwise wait for successor to catch up

© Herlihy-Shavit 2007 165

MCS Queue Lockclass MCSLock implements Lock { AtomicReference queue; public void unlock() { if (qnode.next == null) { if (queue.CAS(qnode, null) return; while (qnode.next == null) {} } qnode.next.locked = false;}}

(3)

Pass lock to successor

© Herlihy-Shavit 2007 166

Purple Release

false

releasing swap

false

(2)

© Herlihy-Shavit 2007 167

Purple Release

false

releasing swap

false

By looking at the queue, I see another

thread is active

(2)

© Herlihy-Shavit 2007 168

Purple Release

false

releasing swap

false

By looking at the queue, I see another

thread is active

I have to wait for that thread to finish

(2)

© Herlihy-Shavit 2007 169

Purple Release

false

releasing spinning

true

© Herlihy-Shavit 2007 170

Purple Release

false

releasing spinning

truefalse

© Herlihy-Shavit 2007 171

Purple Release

false

releasing spinning

true

locked

false

© Herlihy-Shavit 2007 172

Abortable Locks

• What if you want to give up waiting for a lock?

• For example– Timeout– Database transaction aborted by user

© Herlihy-Shavit 2007 173

Back-off Lock

• Aborting is trivial– Just return from lock() call

• Extra benefit:– No cleaning up– Wait-free– Immediate return

© Herlihy-Shavit 2007 174

Queue Locks

• Can’t just quit– Thread in line behind will starve

• Need a graceful way out

© Herlihy-Shavit 2007 175

Queue Locks

spinning

true

spinning

truetrue

spinning

© Herlihy-Shavit 2007 176

Queue Locks

spinning

true

spinning

truefalse

locked

© Herlihy-Shavit 2007 177

Queue Locks

spinning

true

locked

false

© Herlihy-Shavit 2007 178

Queue Locks

locked

false

© Herlihy-Shavit 2007 179

Queue Locks

spinning

true

spinning

truetrue

spinning

© Herlihy-Shavit 2007 180

Queue Locks

spinning

truetruetrue

spinning

© Herlihy-Shavit 2007 181

Queue Locks

spinning

truetruefalse

locked

© Herlihy-Shavit 2007 182

Queue Locks

spinning

truefalse

© Herlihy-Shavit 2007 183

Queue Locks

pwned

truefalse

© Herlihy-Shavit 2007 184

Abortable CLH Lock

• When a thread gives up– Removing node in a wait-free way is

hard

• Idea:– let successor deal with it.

© Herlihy-Shavit 2007 185

Initially

tail

idlePointer to

predecessor (or null)

A

© Herlihy-Shavit 2007 186

Initially

tail

idleDistinguished available

node means lock is free

A

© Herlihy-Shavit 2007 187

Acquiring

tail

acquiring

A

© Herlihy-Shavit 2007 188

Acquiringacquiring

A

Null predecessor means lock not

released or aborted

© Herlihy-Shavit 2007 189

Acquiringacquiring

A

Swap

© Herlihy-Shavit 2007 190

Acquiringacquiring

A

© Herlihy-Shavit 2007 191

Acquiredlocked

A

Pointer to AVAILABLE

means lock is free.

© Herlihy-Shavit 2007 192

Normal Case

spinningspinninglocked

Null means lock is not free & request not

aborted

© Herlihy-Shavit 2007 193

One Thread Aborts

spinningTimed outlocked

© Herlihy-Shavit 2007 194

Successor Notices

spinningTimed outlocked

Non-Null means predecessor

aborted

© Herlihy-Shavit 2007 195

Recycle Predecessor’s Node

spinninglocked

© Herlihy-Shavit 2007 196

Spin on Earlier Node

spinninglocked

© Herlihy-Shavit 2007 197

Spin on Earlier Node

spinningreleased

A

The lock is now mine

© Herlihy-Shavit 2007 198

Time-out Lockpublic class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode;

(3)

© Herlihy-Shavit 2007 199

Time-out Lockpublic class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode;

(3)

Distinguished node to signify free lock

© Herlihy-Shavit 2007 200

Time-out Lockpublic class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode;

(3)

Tail of the queue

© Herlihy-Shavit 2007 201

Time-out Lockpublic class TOLock implements Lock { static Qnode AVAILABLE = new Qnode(); AtomicReference<Qnode> tail; ThreadLocal<Qnode> myNode;

(3)

Remember my node …

© Herlihy-Shavit 2007 202

Time-out Lockpublic boolean lock(long timeout) { Qnode qnode = new Qnode(); myNode.set(qnode); qnode.prev = null; Qnode pred = tail.getAndSet(qnode); if (pred == null || pred.prev == AVAILABLE) { return true; }…

(3)

© Herlihy-Shavit 2007 203

Time-out Lockpublic boolean lock(long timeout) { Qnode qnode = new Qnode(); myNode.set(qnode); qnode.prev = null; Qnode pred = tail.getAndSet(qnode); if (pred == null || pred.prev == AVAILABLE) { return true; }

(3)

Create & initialize node

© Herlihy-Shavit 2007 204

Time-out Lockpublic boolean lock(long timeout) { Qnode qnode = new Qnode(); myNode.set(qnode); qnode.prev = null; Qnode pred = tail.getAndSet(qnode); if (pred == null || pred.prev == AVAILABLE) { return true; }

(3)

Swap with tail

© Herlihy-Shavit 2007 205

Time-out Lockpublic boolean lock(long timeout) { Qnode qnode = new Qnode(); myNode.set(qnode); qnode.prev = null; Qnode pred = tail.getAndSet(qnode); if (pred == null || pred.prev == AVAILABLE) { return true; } ...

(3)

If predecessor absent or released, we are

done

© Herlihy-Shavit 2007 206

Time-out Lock… long start = now(); while (now()- start < timeout) { Qnode predPred = pred.prev; if (predPred == AVAILABLE) { return true; } else if (predPred != null) { pred = predPred; } } …

(3)

© Herlihy-Shavit 2007 207

Time-out Lock… long start = now(); while (now()- start < timeout) { Qnode predPred = pred.prev; if (predPred == AVAILABLE) { return true; } else if (predPred != null) { pred = predPred; } } …

(3)

Keep trying for a while …

© Herlihy-Shavit 2007 208

Time-out Lock… long start = now(); while (now()- start < timeout) { Qnode predPred = pred.prev; if (predPred == AVAILABLE) { return true; } else if (predPred != null) { pred = predPred; } } …

(3)

Spin on predecessor’s prev field

© Herlihy-Shavit 2007 209

Time-out Lock… long start = now(); while (now()- start < timeout) { Qnode predPred = pred.prev; if (predPred == AVAILABLE) { return true; } else if (predPred != null) { pred = predPred; } } …

(3)

Predecessor released lock

© Herlihy-Shavit 2007 210

Time-out Lock… long start = now(); while (now()- start < timeout) { Qnode predPred = pred.prev; if (predPred == AVAILABLE) { return true; } else if (predPred != null) { pred = predPred; } } …

(3)

Predecessor aborted, advance one

© Herlihy-Shavit 2007 211

Time-out Lock…if (!tail.compareAndSet(qnode, pred)) qnode.prev = pred; return false; }}

(3)

© Herlihy-Shavit 2007 212

Time-out Lock…if (!tail.compareAndSet(qnode, pred)) qnode.prev = pred; return false; }}

(3)

Try to put predecessor back

© Herlihy-Shavit 2007 213

Time-out Lock…if (!tail.compareAndSet(qnode, pred)) qnode.prev = pred; return false; }}

(3)

Otherwise, if it’s too late, redirect to predecessor

© Herlihy-Shavit 2007 214

Time-out Lockpublic void unlock() { Qnode qnode = myNode.get(); if (!tail.compareAndSet(qnode, null)) qnode.prev = AVAILABLE;}

(3)

© Herlihy-Shavit 2007 215

Time-out Lockpublic void unlock() { Qnode qnode = myNode.get(); if (!tail.compareAndSet(qnode, null)) qnode.prev = AVAILABLE;}

(3)

Clean up if no one waiting

© Herlihy-Shavit 2007 216

Time-out Lockpublic void unlock() { Qnode qnode = myNode.get(); if (!tail.compareAndSet(qnode, null)) qnode.prev = AVAILABLE;}

(3)

Notify successor by pointing to AVAILABLE

node

© Herlihy-Shavit 2007 217

One Lock To Rule Them All?

• TTAS+Backoff, CLH, MCS, ToLock…• Each better than others in some

way• There is no one solution• Lock we pick really depends on:

– the application– the hardware– which properties are important