39
Transactional Locking Nir Shavit Tel Aviv University (Joint work with Dave Dice and Ori Shalev)

No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

  • Upload
    dodieu

  • View
    214

  • Download
    0

Embed Size (px)

Citation preview

Page 1: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Transactional Locking

Nir ShavitTel Aviv University

(Joint work with Dave Dice and Ori Shalev)

Page 2: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

object

object

Shared Memory

Concurrent Programming

How do we make the programmer’s life simple without slowing computation down to a halt?!

Page 3: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

A FIFO Queue

b c d

TailHead

a

Enqueue(d)Dequeue() => a

Page 4: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

A Concurrent FIFO Queuesynchronized{}

Object lock

b c d

TailHead

a

P: Dequeue() => a Q: Enqueue(d)

Page 5: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Fine Grain Locks

b c d

TailHead

a

P: Dequeue() => a Q: Enqueue(d)

Better Performance, More Complex Code

Verification nightmare: worry about deadlock, livelock…

Page 6: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Lock-Free (JSR-166)

b c d

TailHead

a

P: Dequeue() => a Q: Enqueue(d)

Even Better Performance, Even More Complex Code

Worry about deadlock, livelock, subtle bugs, hard to modify…

Page 7: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Transactional Memory[HerlihyMoss93]

Page 8: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Transactional Memory

b c d

TailHead

a

P: Dequeue() => a Q: Enqueue(d)

Don’t worry about deadlock, livelock, subtle bugs, etc…

Great Performance, Simple Code

Page 9: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Transactional Memory [Herlihy-Moss]

b c d

TailHead

a

P: Dequeue() => a Q: Enqueue(d)

Don’t worry about deadlock, livelock, subtle bugs, etc…

b

TailHead

a

Great Performance, Simple Code

Page 10: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

TM: How Does It Work

synchronized{<sequence of instructions>}

atomic

Execute all synchronized instructions as an atomic transaction…

Page 11: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Hardware TM [Herlihy-Moss]

• Limitations: atomic{<~10-20-30?…but not ~1000 instructions>}

• Machines will differ in their support

• When we build 1000 instruction transactions, it will not be for free…

Page 12: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Software Transactional Memory• Implement transactions in Software• All the flexibility of hardware…today• Ability to extend hardware when it is

available (Hybrid TM)• But there are problems:

– Performance– Interaction with memory system– Safety/Containment

Page 13: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

The Brief History of STM19

93ST

M (S

havi

t,Tou

itou)

2003

DST

M (H

erlih

y et

al)

2003

WST

M (F

rase

r, H

arris

)

Lock-free

2003

OST

M (F

rase

r, H

arris

)

2004

ASTM

(Mar

athe

et a

l)

2004

T-M

onito

r (Ja

gann

atha

n…)

Obstruction-free Lock-based

2005

Lock

-OST

M (E

nnal

s)

2004

Hyb

ridTM

(Moi

r)

2004

Met

a Tr

ans

(Her

lihy,

Sha

vit)

2005

McT

M (S

aha

et a

l)

2006

Atom

Java

(Hin

dman

…)

1997

Tran

s Su

ppor

t TM

(Moi

r)

2005

TL (D

ice,

Sha

vit))

2004

Soft

Tran

s (A

nani

an, R

inar

d)

Page 14: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

As Good As Fine GrainedPostulate (i.e. take it or leave it):

If we could implement fine-grained locking with the same simplicity of course grained, we would never think of building a transactional memory.

Implication:

Lets try to provide TMs that get as close as possible to hand-crafted fine-grained locking.

Page 15: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Premise of Lock-based STMs1. Performance: ballpark fine grained2. Memory Lifecycle: work with GC or any

malloc/free3. HardwareSoftware: support

voluptuous transactions

4. Safety: need to work on coherent stateUnfortunately: OSTM, HyTM, Ennals, Saha, AtomJava deliver only 1 and 3 (in some cases)…

Page 16: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Transactional Locking

• Focus on TL2 algorithm [Dice,Shalev,Shavit]

• Delivers all four properties• Combines experience of prior art + - Uses Commit time locking instead of

Encounter order locking - Introduces a Global Version Clock

mechanism for validation

Page 17: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Locking STM Design Choices

Map Array of Versioned-Write-Locks

Application Memory

PS = Lock per Stripe (separate array of locks)

PO = Lock per Object(embedded in object)

V#

Page 18: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Encounter Order Locking (Undo Log)

1. To Read: load lock + location2. Check unlocked add to Read-Set3. To Write: lock location, store value 4. Add old value to undo-set5. Validate read-set v#’s unchanged6. Release each lock with v#+1

V# 0 V# 0 V# 0 V# 0 V# 0 V# 0 V# 0

X V# 1

V# 0 Y V# 1

V# 0 V# 0

Mem Locks

V#+1 0

V#+1 0

V# 0

V# 0

V# 0 V#+1 0 V# 0 V# 0 V# 0 V# 0

V#+1 0

V# 0

X

Y

Quick read of values freshly written by the reading transaction

[Ennals,Saha,Harris,…]

Page 19: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Commit Time Locking (Write Buff)

1. To Read: load lock + location2. Location in write-set? (Bloom Filter)3. Check unlocked add to Read-Set4. To Write: add value to write set5. Acquire Locks6. Validate read/write v#’s unchanged7. Release each lock with v#+1

V# 0 V# 0 V# 0 V# 0 V# 0 V# 0 V# 0

V# 0

V# 0 V# 0

V# 0 V# 0

Mem Locks

V#+1 0

V# 0

V# 0

Hold locks for very short duration

V# 1

V# 1

V# 1 X

Y

V#+1 0

V# 1 V#+1 0

V# 0 V#+1 0 V# 0 V# 0 V# 0 V# 0

V#+1 0

V# 0

X

Y

[TL,TL2]

Page 20: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Why COM and not ENC?

1. Under low load they perform pretty much the same.

2. COM withstands high loads (small structures or high write %). ENC does not withstand high loads.

3. COM works seamlessly with Malloc/Free. ENC does not work with Malloc/Free.

Page 21: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

COM vs. ENC High Load

ENC

Hand

MCS

COM

Red-Black Tree 20% Delete 20% Update 60% Lookup

Page 22: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

COM vs. ENC Low Load

COMENC

Hand

MCS

Red-Black Tree 5% Delete 5% Update 90% Lookup

Page 23: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

COM: Works with Malloc/FreePS Lock ArrayA

B

To free B from transactional space: 1. Wait till its lock is free. 2. Free(B)

B is never written inconsistently because any write is preceded by a validation while holding lock

V# VALIDATEX FAILSIF INCONSISTENT

Page 24: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

ENC: Fails with Malloc/FreePS Lock ArrayA

B

Cannot free B from transactional space because undo-log means locations are written after every lock acquisition and before validation.

Possible solution: validate after every lock acquisition (yuck)

V# VALIDATEX

Page 25: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Problem: Application Safety

1. All current lock based STMs work on inconsistent states.

2. They must introduce validation into user code at fixed intervals or loops, use traps, OS support,…

3. And still there are cases, however rare, where an error could occur in user code…

Page 26: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Solution: TL2’s “Version Clock”

• Have one shared global version clock• Incremented by (small subset of) writing

transactions• Read by all transactions• Used to validate that state worked on is

always consistent

Later: how we learned not to worry about contention and love the clock

Page 27: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Version Clock: Read-Only COM Trans

1. RV VClock2. On Read: read lock, read mem,

read lock: check unlocked, unchanged, and v# <= RV

3. Commit.

87 0 87 0

34 0 88 0 V# 0 44 0 V# 0

34 0

99 0 99 0

50 0 50 0

Mem Locks

Reads form a snapshot of memory.No read set!

100 VClock

87 0 34 0

99 0

50 0

87 0 34 0 88 0 V# 0 44 0 V# 0

99 0

50 0

100 RV

Page 28: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Version Clock: Writing COM Trans

1. RV VClock2. On Read/Write: check

unlocked and v# <= RV then add to Read/Write-Set

3. Acquire Locks4. WV = F&I(VClock)5. Validate each v# <= RV6. Release locks with v# WV

Reads+Inc+Writes=Linearizable

100 VClock

87 0 87 0

34 0 88 0

44 0 V# 0

34 0

99 0 99 0

50 0 50 0

Mem Locks

87 0

34 0

99 0

50 0

34 1

99 1

87 0 X

Y

Commit

121 0

121 0

50 0

87 0 121 0 88 0 V# 0 44 0 V# 0

121 0

50 0

100 RV

100120121

X

Y

Page 29: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Version Clock Implementation

• On sys-on-chip like Sun T2000™ Niagara: almost no contention, just CAS and be happy

• On others: add TID to VClock, if VClock has changed since last write can use new value +TID. Reduces contention by a factor of N.

• Future: Coherent Hardware VClock that guarantees unique tick per access.

Page 30: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Performance Benchmarks

• Mechanically Transformed Sequential Red-Black Tree using TL2

• Compare to STMs and hand-crafted fine-grained Red-Black implementation

• On a 16–way Sun Fire™ running Solaris™ 10

Page 31: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Uncontended Large Red-Black Tree5% Delete 5% Update 90% Lookup Hand-

crafted

TL/PSTL2/PS

TL/PO TL2/P0

Ennals

FarserHarrisLock-free

Page 32: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Uncontended Small RB-Tree5% Delete 5% Update 90% Lookup TL/P0

TL2/P0

Page 33: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Contended Small RB-Tree30% Delete 30% Update 40% Lookup

Ennals

TL/P0

TL2/P0

Page 34: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Speedup: Normalized Throughput

Hand-Crafted

TL/POLarge RB-Tree 5% Delete 5% Update 90% Lookup

Page 35: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Overhead Overhead Overhead

• STM scalability is as good if not better than hand-crafted, but overheads are much higher

• Overhead is the dominant performance factor – bodes well for HTM

• Read set and validation cost (not locking cost) dominates performance

Page 36: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

On Sun T2000™ (Niagara): maybe a long way to go…

RB-tree 5% Delete 5% Update 90% LookupHand-crafted

STMs

Page 37: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Conclusions

• COM time locking, implemented efficiently, has clear advantages over ENC order locking: – No meltdown under contention– Seamless operation with malloc/free

• VCounter can guarantee safety so we – don’t need to embed repeated validation in

user code

Page 38: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

What Next?

• Cut TL read-set and validation overhead, maybe with hardware support?

• Global clock can help simplify validation of other algs (checkpointing, debugging…)

• Verification of interaction between runtime and TM.

Page 39: No Slide Titleshanir/TL2-TV06.… · PPT file · Web view · 2006-12-04Concurrent Programming A FIFO Queue A Concurrent FIFO Queue Fine Grain Locks Lock-Free (JSR-166 ... V# VALIDATE

Thank You