Wait-Free Linked-Lists

Shahar Timnat, Anastasia Braginsky, Alex Kogan, Erez Petrank
Technion, Israel

Presented by Shahar Timnat

[Figure: sorted list -∞ → 4 → 6 → 9 → +∞]

Our Contribution
• A fast, wait-free linked-list
• The first wait-free list fast enough to be used in practice

Agenda
• What is a wait-free linked-list?
• Related work and existing tools
• Wait-Free Linked-List design
• Performance


Concurrent Data Structures
• Allow several threads to read or modify the data structure simultaneously
• Increasing demand due to highly parallel systems

Progress Guarantees
• Obstruction Free – a thread running exclusively will make progress
• Lock Free – at least one of the running threads will make progress
• Wait Free – every thread that gets the CPU will make progress
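The gap between lock-free and wait-free can be seen on something as small as a counter. The following is a minimal sketch, assuming a hypothetical class `LockFreeCounter` that is not part of the talk. A failed CAS means some other thread succeeded, so the system as a whole progresses, yet nothing bounds how often one particular thread has to retry.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration class; not taken from the presentation.
final class LockFreeCounter {
    private final AtomicInteger value = new AtomicInteger();

    // Lock-free but not wait-free: every failed CAS implies that another
    // thread's CAS succeeded, but this thread may lose the race repeatedly.
    int increment() {
        while (true) {
            int current = value.get();
            if (value.compareAndSet(current, current + 1)) {
                return current + 1;
            }
            // CAS failed because of contention: retry.
        }
    }

    int get() {
        return value.get();
    }
}
```

A wait-free operation would instead have to finish within a bounded number of its own steps, whatever the other threads do.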

Wait Free Algorithms
• Provide the strongest progress guarantee
• Always desirable, particularly in real-time systems
• Relatively rare
• Hard to design
• Typically slower

The Linked List Interface
• Following the traditional choice: a sorted list-based set of integers
  insert(int x);
  delete(int x);
  contains(int x);

[Figure: sorted list -∞ → 4 → 6 → 9 → +∞]
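The interface above can be written down as a small Java type. The following is a sequential reference sketch only, with no concurrency at all; the names `IntSet` and `SequentialSortedList` are assumptions made for illustration. It shows how the -∞ and +∞ sentinels from the figure remove the special cases at both ends of the list.

```java
// Hypothetical names; a sequential sketch, not the wait-free algorithm.
interface IntSet {
    boolean insert(int x);   // true if x was added, false if already present
    boolean delete(int x);   // true if x was removed, false if absent
    boolean contains(int x);
}

final class SequentialSortedList implements IntSet {
    private static final class Node {
        final int key;
        Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // Sentinels stand in for -inf and +inf.
    // Assumption: user keys lie strictly between the two sentinel values.
    private final Node head =
        new Node(Integer.MIN_VALUE, new Node(Integer.MAX_VALUE, null));

    // Returns the last node whose key is smaller than x.
    private Node predecessor(int x) {
        Node pred = head;
        while (pred.next.key < x) {
            pred = pred.next;
        }
        return pred;
    }

    public boolean insert(int x) {
        Node pred = predecessor(x);
        if (pred.next.key == x) return false;
        pred.next = new Node(x, pred.next);
        return true;
    }

    public boolean delete(int x) {
        Node pred = predecessor(x);
        if (pred.next.key != x) return false;
        pred.next = pred.next.next;
        return true;
    }

    public boolean contains(int x) {
        return predecessor(x).next.key == x;
    }
}
```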

Prior Wait-Free Lists
• Only universal constructions
• Non-scalable (by nature?)
• Achieve good complexity, but poor performance
• The state-of-the-art construction (Chuong, Ellen, Ramachandran) significantly under-performs our construction

Our wait-free versus a universal construction

[Figure: two charts over 1 to 29 threads. First chart: operations done in 2 seconds (millions) against number of threads, series WF and Universal. Second chart: ratio against number of threads.]

Linked-Lists with Progress Guarantee
• No practical wait-free linked-lists available
• Lock-free linked-lists exist
• Most notably: Harris’s linked-list

Existing Lock-Free List (by Harris)
• Deletion in two steps
  • Logical: mark the next field using a CAS
  • Physical: remove the node

[Figure: list 4 → 6 → 9, shown twice]

Existing Lock-Free List (by Harris)
• Use the least significant bit in each next field as a mark bit
• The mark bit signals that a node is logically deleted
• The node’s next field cannot be changed (the CAS will fail) if it is logically deleted

[Figure: list 4 → 6 → 9, shown twice]
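The two deletion steps can be sketched in Java as follows. This is an illustration under assumptions, not Harris's code: Java cannot steal the low bit of a reference, so `java.util.concurrent.atomic.AtomicMarkableReference` stands in for the marked pointer, and the names `MarkedNode`, `TwoStepDelete`, `mark` and `snip` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicMarkableReference;

// Hypothetical names; a sketch of the mark-then-remove idea only.
final class MarkedNode {
    final int key;
    // Successor reference and mark bit, updated together by a single CAS.
    final AtomicMarkableReference<MarkedNode> next;

    MarkedNode(int key, MarkedNode successor) {
        this.key = key;
        this.next = new AtomicMarkableReference<>(successor, false);
    }
}

final class TwoStepDelete {
    // Step 1 (logical): set the mark bit on victim.next.
    // Fails if the node is already marked or its successor changed meanwhile;
    // a full algorithm would re-read and retry in the second case.
    static boolean mark(MarkedNode victim) {
        MarkedNode successor = victim.next.getReference();
        return victim.next.compareAndSet(successor, successor, false, true);
    }

    // Step 2 (physical): swing pred.next from victim to victim's successor.
    // Fails if pred is itself marked or no longer points at victim.
    static boolean snip(MarkedNode pred, MarkedNode victim) {
        MarkedNode successor = victim.next.getReference();
        return pred.next.compareAndSet(victim, successor, false, false);
    }
}
```

Because every CAS on a next field expects the mark to be false, any attempt to change the next field of a marked node fails, which is the property the last bullet relies on.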

Help Mechanism
• A common technique to achieve wait-freedom
• Each thread declares in a designated state array the operation it desires
• Many threads may attempt to execute it
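A state array of this kind might look as follows in Java. This is a minimal sketch under assumptions: the names `OpDesc`, `StateArray`, `announce`, `pendingSlots` and `report` are hypothetical, and descriptors are immutable so that a status change is published by swapping in a new object with a CAS, in the spirit of the `CAS(state[tid], s, success)` notation used on later slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical names; immutable descriptor of one declared operation.
final class OpDesc {
    enum Status { PENDING, SUCCESS, FAILURE }
    enum Kind { INSERT, DELETE, CONTAINS }

    final Kind kind;
    final int key;
    final Status status;

    OpDesc(Kind kind, int key, Status status) {
        this.kind = kind;
        this.key = key;
        this.status = status;
    }
}

final class StateArray {
    // One slot per thread, indexed by thread id.
    private final AtomicReferenceArray<OpDesc> state;

    StateArray(int threads) {
        state = new AtomicReferenceArray<>(threads);
    }

    // The owner declares the operation it wants executed.
    OpDesc announce(int tid, OpDesc.Kind kind, int key) {
        OpDesc desc = new OpDesc(kind, key, OpDesc.Status.PENDING);
        state.set(tid, desc);
        return desc;
    }

    OpDesc get(int tid) {
        return state.get(tid);
    }

    // Thread ids whose declared operation is still pending; helpers scan these.
    List<Integer> pendingSlots() {
        List<Integer> pending = new ArrayList<>();
        for (int i = 0; i < state.length(); i++) {
            OpDesc desc = state.get(i);
            if (desc != null && desc.status == OpDesc.Status.PENDING) {
                pending.add(i);
            }
        }
        return pending;
    }

    // Any helper may report an outcome; the CAS lets exactly one report win.
    boolean report(int tid, OpDesc seen, OpDesc.Status outcome) {
        OpDesc done = new OpDesc(seen.kind, seen.key, outcome);
        return state.compareAndSet(tid, seen, done);
    }
}
```

The sketch covers only declaring and reporting. Executing the declared operation safely when several helpers work on it at once is the hard part that the rest of the talk addresses.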

Help Mechanism - Difficulties
• Multiple threads should be able to work concurrently on the same operation
• Many potential races
• Difficult to design
• Usually slower

Complication: Deletion Owning
• T1, T2 both attempt delete(6)
• T1, T2 both declare in the state array
• T3 sees T1’s declaration and tries to help it, while T4 helps T2
• If both helpers T3, T4 “go to sleep” after the mark was done, which thread (T1 or T2) should return true and which false?

[Figure: sorted list -∞ → 4 → 6 → 9 → +∞]

Solution: use a “success bit”
• Each node holds an extra “success bit” (initially 0)
• Potential owners compete to CAS it to 1 (no help in this part)
• Note the node is deleted before it is decided which thread owns its deletion
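The ownership decision can be sketched with a single atomic flag per node. The following is an illustration under assumptions, with hypothetical names `DeletedNode`, `claimDeletion` and `countWinners`; it models only the competition for the success bit, after the node has already been marked.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical names; models only the "success bit" competition.
final class DeletedNode {
    final int key;
    // The extra success bit: false (0) until one deleter claims the node.
    final AtomicBoolean success = new AtomicBoolean(false);

    DeletedNode(int key) {
        this.key = key;
    }

    // Called by each potential owner itself (no helping in this part),
    // after the node is already logically deleted. Exactly one caller wins.
    boolean claimDeletion() {
        return success.compareAndSet(false, true);
    }

    // Lets several threads compete for the same node and counts the winners.
    static int countWinners(int contenders) {
        DeletedNode node = new DeletedNode(6);
        AtomicInteger winners = new AtomicInteger();
        Thread[] threads = new Thread[contenders];
        for (int i = 0; i < contenders; i++) {
            threads[i] = new Thread(() -> {
                if (node.claimDeletion()) {
                    winners.incrementAndGet();
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return winners.get();
    }
}
```

The thread whose CAS succeeds returns true from delete(6); every other contender returns false, regardless of which helper performed the mark.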

Helping an Insert Operation
• Search
• Direct
• Insert
• Report

[Figure: list 4 → 6 → 9 and a new node 7. The state entry reads "Status: Pending, Operation: Insert, New node: 7". One CAS links the new node into the list, and another CAS changes the status to Success.]
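The four steps can be lined up in code as one helping attempt. The sketch below is deliberately naive and is not the algorithm from the talk: it ignores marked nodes, and the races described on the slides that follow apply to it as written. All names (`HNode`, `InsertDesc`, `NaiveInsertHelper`, `helpInsert`) are hypothetical.

```java
import java.util.concurrent.atomic.AtomicMarkableReference;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical names; a naive single helping attempt, NOT race-free.
final class HNode {
    final int key;
    final AtomicMarkableReference<HNode> next;

    HNode(int key, HNode successor) {
        this.key = key;
        this.next = new AtomicMarkableReference<>(successor, false);
    }
}

final class InsertDesc {
    enum Status { PENDING, SUCCESS, FAILURE }

    final HNode newNode;
    final Status status;

    InsertDesc(HNode newNode, Status status) {
        this.newNode = newNode;
        this.status = status;
    }
}

final class NaiveInsertHelper {
    // Sentinels for -inf and +inf; keys are assumed to lie strictly between.
    final HNode head =
        new HNode(Integer.MIN_VALUE, new HNode(Integer.MAX_VALUE, null));

    // One attempt to complete the insert declared in `slot`.
    void helpInsert(AtomicReference<InsertDesc> slot) {
        InsertDesc desc = slot.get();
        if (desc.status != InsertDesc.Status.PENDING) return;
        HNode node = desc.newNode;

        // 1. Search: find the window (pred, succ) around the key.
        HNode pred = head;
        HNode succ = pred.next.getReference();
        while (succ.key < node.key) {
            pred = succ;
            succ = succ.next.getReference();
        }
        if (succ.key == node.key) {
            // Key already present: report failure.
            slot.compareAndSet(desc, new InsertDesc(node, InsertDesc.Status.FAILURE));
            return;
        }

        // 2. Direct: point the new node at its successor.
        node.next.set(succ, false);

        // 3. Insert: link the new node with a CAS on pred.next.
        if (pred.next.compareAndSet(succ, node, false, false)) {
            // 4. Report: publish the outcome in the state entry.
            slot.compareAndSet(desc, new InsertDesc(node, InsertDesc.Status.SUCCESS));
        }
        // If the CAS failed, the entry stays PENDING and a helper retries.
    }
}
```

When only one thread runs this code it behaves as expected. With several helpers working on the same entry, the unconditional write in step 2 and the failure report in step 1 are where things can go wrong.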

Incorrect Result Returned
Consider 2 threads helping insert(7)

T1 {
  found (6,9)
  node.next = &9
  inserts new node
  CAS(state[tid], s, success)
}

T2 {
  found (6,7)
  CAS(state[tid], s, failure)
}

[Figure: list 4 → 6 → 9, with the new node 7 linked between 6 and 9]

T2 searches after T1 has linked the node, finds 7 already in the list, and tries to report failure for an insert that actually succeeded.

Incorrect Result Returned 2

T1 {
  found (6,9)
  node.next = &9
  inserts new node
  CAS(->success)
}

T2 {
  found (6,7)
  CAS(->failure)
}

T3 {
  Delete(7)
  Insert(7)
}

[Figure: list 4 → 6 → 7 → 9; after T3's Delete(7) and Insert(7), a second node 7’ appears]

Ill-timed Direct
Consider 2 threads helping insert(7)

T1 {
  found (6,9)
  node.next = &9
}

T2 {
  found (6,9)
  node.next = &9
  inserts the new node
  CAS(->success)
  ...
  Insert(8) (after 7)
}

[Figure: list 4 → 6 → 9 with nodes 7 and 8 added]

If T1's delayed node.next = &9 takes effect after 8 was inserted after 7, node 7 points at 9 again and 8 is cut out of the list.

More Races Exist
• Additional races were handled in both the delete and insert operations
• We constructed a formal proof for the correctness of the algorithm

Main Invariant
• Each modification of a node’s next field falls into one of four categories
  • Marking (change the mark bit to true)
  • Snipping (removing a marked node)
  • Redirection (of an infant node)
  • Insertion (a non-infant to an infant)
• Proof by induction and by following the code lines

Fast-Path-Slow-Path (Kogan and Petrank, PPoPP 2012)
• Each thread:
  • Tries to complete the operation without help
  • Asks for help only if it failed due to contention
• (Almost) as fast as the lock-free algorithm
• Gives the stronger wait-free guarantee

Fast-Path-Slow-Path
• Previously implemented for a queue
• Requires the wait-free algorithm and the lock-free one to work concurrently
• Our algorithm was carefully chosen to allow a fast-path-slow-path execution
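The control flow of the technique can be sketched as a small wrapper. This is a schematic illustration under assumptions, not the published implementation: the class `FastPathSlowPath`, the bound `MAX_FAILURES`, and the convention that the fast attempt returns an empty `Optional` when it lost to contention are all hypothetical.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical names; shows only the fast-path / slow-path control flow.
final class FastPathSlowPath {
    // Assumed bound on failed lock-free attempts before asking for help.
    static final int MAX_FAILURES = 3;

    // fastAttempt: one lock-free try; empty means "failed due to contention".
    // slowPath: the wait-free, help-based version of the same operation.
    static <R> R run(Supplier<Optional<R>> fastAttempt, Supplier<R> slowPath) {
        for (int failures = 0; failures < MAX_FAILURES; failures++) {
            Optional<R> result = fastAttempt.get();
            if (result.isPresent()) {
                return result.get();   // finished without any help
            }
        }
        // Too much contention: declare the operation and let others help.
        return slowPath.get();
    }
}
```

Since the number of fast attempts is bounded and the slow path is wait-free, the combined operation stays wait-free, while uncontended operations pay only the lock-free cost.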

Performance
• We measured our algorithm against Harris’s lock-free algorithm
• We measured our algorithm using
  • Immediate help
  • Deferred help
  • FPSP

Performance
• We report the results of a micro-benchmark:
  • 1024 possible keys, 512 in the list on average
  • 60% contains, 20% insert, 20% delete
• Measured on:
  • Intel Xeon (8 concurrent threads)
  • Sun UltraSPARC (32 concurrent threads)
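The operation mix of such a micro-benchmark can be sketched as below. This is a single-threaded illustration under assumptions, not the authors' harness: `WorkloadSketch` and its parameters are hypothetical, and `ConcurrentSkipListSet` from the Java standard library merely stands in for the list under test.

```java
import java.util.Random;
import java.util.Set;

// Hypothetical names; generates the 60/20/20 mix over 1024 possible keys.
final class WorkloadSketch {
    static final int KEY_RANGE = 1024;

    // Runs `ops` operations and returns the counts {contains, insert, delete}.
    static int[] run(Set<Integer> set, int ops, long seed) {
        Random rnd = new Random(seed);
        int[] counts = new int[3];
        for (int i = 0; i < ops; i++) {
            int key = rnd.nextInt(KEY_RANGE);
            int dice = rnd.nextInt(100);
            if (dice < 60) {            // 60% contains
                set.contains(key);
                counts[0]++;
            } else if (dice < 80) {     // 20% insert
                set.add(key);
                counts[1]++;
            } else {                    // 20% delete
                set.remove(key);
                counts[2]++;
            }
        }
        return counts;
    }
}
```

With equal insert and delete rates on uniformly chosen keys, each key ends up present about half of the time, which is consistent with 512 of the 1024 possible keys being in the list on average.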

Performance

[Figure: two charts of operations (thousands) against number of threads, series LF, FPSP, Deferred-Help and Immed-Help. Panels: UltraSPARC T1; Intel(R) Xeon(R), 1 to 16 threads.]

Performance
• When employing the FPSP technique together with our algorithm:
  • 0-2% difference from the lock-free list on Intel(R) Xeon(R)
  • 9-11% difference from the lock-free list on UltraSPARC

Conclusions
• We designed the first practical wait-free linked-list
• Performance measurements show that our algorithm works almost as fast as the lock-free list while giving a stronger progress guarantee
• A formal correctness proof is available

Questions?
