IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free In-memory Algorithms to Turbo-charge High Volume Data Management

Slide 1

Using Lock-free and Wait-free In-memory Algorithms to Turbo-charge High Volume Data Management

Henning Andersen, Stibo Systems A/S

See all the presentations from the In-Memory Computing Summit at http://imcsummit.org

Slide 2

Bio

20 years of professional career at Stibo Systems A/S
Developed software for the last 30+ years
Technical lead on many projects, including:
  Migrating from C++ to Java platform (performance & scalability)
  Establishing a component platform
  In-Memory component

Slide 3

Get to Know Stibo Systems

Slide 4

Our Growing Family

Travel/Hospitality
Distribution
Retail
Manufacturing

Slide 5

2015 MQ MDM of Product Domain

Slide 6

Complete, Seamless Multidomain MDM Solution

Slide 7

Integrating In-Memory into STEP

[Diagram labels: STEP, STEP Server (J2EE), DB Server, In-Memory DB (OFF-HEAP)]

Slide 8

Benchmark Results

Large Retailer Data
Large Distributor Data
Scalability Test Data

Slide 9

Requirements

Great performance
Compact memory layout
  Data
  Per Entry Overhead
Lookup by Key
Complex Querying
  Indexing
Friendly to our existing architecture
Fast Startup/Initialization

Slide 10

Performance by Simplicity

MVCC/Immutability
Wait-free index scans
Code Generation
Custom API/Direct Access

mov (%rdi,%r11,1),%r11

Slide 11

Basic Hash Table: Closed Addressing

[Diagram: hash(key) % tablesize selects a slot in the Bucket Table; the slot points to an entry with fields Next, Key=K1, Value=10.]

Slide 12

Basic Hash Table: Closed Addressing

[Diagram: hash(key) % tablesize selects a slot in the Bucket Table; entries (Next, Key=K1, Value=10) and (Next, Key=K3, Value=20).]

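Slide 13

Basic Hash Table: Collision

[Diagram: hash(key) % tablesize maps two keys to the same Bucket Table slot; entries (Next, Key=K4, Value=30), (Next, Key=K3, Value=20) and (Next, Key=K1, Value=10), with colliding entries chained through Next.]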
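The closed-addressing table on the slides above can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, keys are plain strings, and inserting new entries at the head of a chain is an assumed policy that the slides do not specify.

```java
// Minimal sketch of a closed-addressing (separate chaining) hash table.
// Names and the head-insertion policy are assumptions, not taken from the talk.
final class ChainedHashTable {
    // One chain node with the fields shown on the slides: Next, Key, Value.
    private static final class Entry {
        final String key;
        int value;
        Entry next;

        Entry(String key, int value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry[] buckets;

    ChainedHashTable(int tableSize) {
        buckets = new Entry[tableSize];
    }

    // hash(key) % tablesize, kept non-negative with floorMod.
    private int slot(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, int value) {
        int i = slot(key);
        for (Entry e = buckets[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                e.value = value;   // key already present: overwrite in place
                return;
            }
        }
        // New key (possibly a collision): chain it in front of the existing entries.
        buckets[i] = new Entry(key, value, buckets[i]);
    }

    // Returns the value for key, or null if the key is absent.
    Integer get(String key) {
        for (Entry e = buckets[slot(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

Note that put overwrites a value in place, which is exactly what the MVCC variant on the following slides avoids.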

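Slide 14

MVCC Hash Table

TSN = Transaction Sequence Number

[Diagram: hash(key) % tablesize selects a slot in the Bucket Table; the slot points to an entry with fields Next, Prev, TSN, Key=K1, Value=10. Alongside: Transaction Table (columns Tx ID, TSN) and Published TSN = 2.]

Speaker notes: Two additional structures. TSN is a global time, like the Oracle system change number (SCN), and increases monotonically over time.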

Slide 15

Transaction/Commit Phases

Prepare, Commit, Finish, Publish, Vacuum, Abort

[Diagram: four STEP nodes, each with its own In-Memory DB, and a Leader. Label: Commit Phases.]

Slide 17

Transaction/Commit Phases

Prepare, Commit, Finish, Publish, Vacuum, Abort

[Diagram: four STEP nodes, each with its own In-Memory DB, and a Leader; three labels TSN=3. Label: Commit Phases.]

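Slide 19

MVCC Hash Table Update - put(K1,15)

Phases: Prepare, Finish, Publish (current phase: Prepare)

[Diagram: Bucket Table slot hash(key) % tablesize points to the entry (Next, Prev, TSN=2, Key=K1, Value=10). New entry: (Next, Prev, TSN=Infinite, Key=K1, Value=15). Transaction Table (Tx ID, TSN): UUID=1234, no TSN yet. Published TSN: 2.]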

Slide 20

MVCC Hash Table Update - put(K1,15)

Phases: Prepare, Finish, Publish (current phase: Finish)

Finish step 1. Pull new TSN

[Diagram: existing entry (Next, Prev, TSN=2, Key=K1, Value=10); new entry (Next, Prev, TSN=Infinite, Key=K1, Value=15). Transaction Table (Tx ID, TSN): UUID=1234 receives TSN 3. Published TSN: 2.]

Slide 21

MVCC Hash Table Update - put(K1,15)

Phases: Prepare, Finish, Publish (current phase: Finish)

Finish step 2. Apply TSN

[Diagram: existing entry (Next, Prev, TSN=2, Key=K1, Value=10); the new entry changes from TSN=Infinite to (Next, Prev, TSN=3, Key=K1, Value=15). Transaction Table (Tx ID, TSN): UUID=1234, TSN 3. Published TSN: 2.]

Slide 22

MVCC Hash Table Update - put(K1,15)

Phases: Prepare, Finish, Publish (current phase: Finish)

Finish step 3. Link to Prev

[Diagram: the new entry (Next, Prev, TSN=3, Key=K1, Value=15) is linked to the previous version (Next, Prev, TSN=2, Key=K1, Value=10). Transaction Table (Tx ID, TSN): UUID=1234, TSN 3. Published TSN: 2.]

Slide 23

MVCC Hash Table Update - put(K1,15)

Phases: Prepare, Finish, Publish (current phase: Finish)

Finish step 4. Update Bucket Table

[Diagram: Bucket Table slot hash(key) % tablesize; entries (Next, Prev, TSN=3, Key=K1, Value=15) and (Next, Prev, TSN=2, Key=K1, Value=10). Transaction Table (Tx ID, TSN): UUID=1234, TSN 3. Published TSN: 2.]

Slide 24

MVCC Hash Table Reader

Reader TSN=2, Lookup K1

[Diagram: Bucket Table slot hash(key) % tablesize; entries (Next, Prev, TSN=3, Key=K1, Value=15) and (Next, Prev, TSN=2, Key=K1, Value=10). Transaction Table (Tx ID, TSN): UUID=1234, TSN 3. Published TSN: 2.]

Slide 25

MVCC Hash Table Update (Publish)

Phases: Prepare, Finish, Publish (current phase: Publish)

Update Published TSN: from 2 to 3

[Diagram: entries (Next, Prev, TSN=3, Key=K1, Value=15) and (Next, Prev, TSN=2, Key=K1, Value=10). Transaction Table (Tx ID, TSN): UUID=1234, TSN 3.]

Slide 26

MVCC Hash Table Reader

Reader TSN=3, Lookup K1

[Diagram: Bucket Table slot hash(key) % tablesize; entries (Next, Prev, TSN=3, Key=K1, Value=15) and (Next, Prev, TSN=2, Key=K1, Value=10). Transaction Table (Tx ID, TSN): UUID=1234, TSN 3. Published TSN: 3.]
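The put(K1,15) walkthrough (Prepare, the four Finish steps, Publish) and the two reader slides can be condensed into a small Java sketch. It is a simplified illustration under stated assumptions, not the STEP implementation: it assumes a single writer at a time, keeps everything on the Java heap rather than off-heap, omits the Transaction Table, and uses hypothetical names throughout.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch of an MVCC hash table with one writer at a time and wait-free readers.
// All names are hypothetical; the Transaction Table is left out for brevity.
final class MvccHashTable {
    static final long INFINITE = Long.MAX_VALUE;

    static final class Version {
        final String key;
        final int value;
        volatile long tsn = INFINITE; // invisible to every reader until Finish applies a TSN
        volatile Version next;        // next entry in the bucket chain (a different key)
        volatile Version prev;        // older version of the same key

        Version(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    private final AtomicReferenceArray<Version> buckets;
    private final AtomicLong lastTsn;      // source of new TSNs, monotonically increasing
    private final AtomicLong publishedTsn; // newest TSN that readers may use

    MvccHashTable(int tableSize, long initialTsn) {
        buckets = new AtomicReferenceArray<>(tableSize);
        lastTsn = new AtomicLong(initialTsn);
        publishedTsn = new AtomicLong(initialTsn);
    }

    private int slot(String key) {
        return Math.floorMod(key.hashCode(), buckets.length());
    }

    // Prepare: build the new version off to the side with TSN = Infinite.
    Version prepare(String key, int value) {
        return new Version(key, value);
    }

    // Finish: the four steps from the slides. Returns the TSN that was assigned.
    long finish(Version v) {
        long tsn = lastTsn.incrementAndGet();   // 1. pull new TSN
        v.tsn = tsn;                            // 2. apply TSN
        int i = slot(v.key);
        Version head = buckets.get(i);
        Version before = null;
        Version cur = head;
        while (cur != null && !cur.key.equals(v.key)) {
            before = cur;
            cur = cur.next;
        }
        v.prev = cur;                           // 3. link to prev (null for a new key)
        v.next = (cur != null) ? cur.next : head;
        if (cur == null || before == null) {
            buckets.set(i, v);                  // 4. update bucket table
        } else {
            before.next = v;                    //    or the chain link that led to the old version
        }
        return tsn;
    }

    // Publish: from now on new readers pick up this TSN.
    void publish(long tsn) {
        publishedTsn.set(tsn);
    }

    // A reader takes its snapshot TSN once, when it starts.
    long snapshotTsn() {
        return publishedTsn.get();
    }

    // Wait-free lookup: find the key, then walk Prev until a version is old enough.
    Integer get(String key, long readerTsn) {
        for (Version e = buckets.get(slot(key)); e != null; e = e.next) {
            if (e.key.equals(key)) {
                for (Version v = e; v != null; v = v.prev) {
                    if (v.tsn <= readerTsn) {
                        return v.value;
                    }
                }
                return null;
            }
        }
        return null;
    }
}
```

In this sketch a reader with TSN=2 that looks up K1 after Finish follows Prev past the TSN=3 version and still sees Value=10; a reader that starts after Publish takes TSN=3 and sees Value=15.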

Slide 27

MVCC Hash Table

Slide 28

Indexing Using Skip Lists

Slide 29

Skip Lists

[Diagram: skip list with head H and values 10, 20, 30, 40, 50.]

50% have height >= 2
25% have height >= 3
Head height ~= log2(n)

Find Value=30? At each step, test 30 >= next.value?
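The height distribution quoted on the slide (50% of nodes with height >= 2, 25% with height >= 3, head height about log2(n)) is what repeated fair coin flips produce. A minimal sketch, with hypothetical names and an assumed cap on the height:

```java
import java.util.Random;

// Sketch of skip-list height selection; class and method names are hypothetical.
final class SkipListHeights {
    // Flip a fair coin until it comes up tails or the cap is reached,
    // so that P(height >= k) = 0.5^(k-1) below the cap.
    static int randomHeight(Random rnd, int maxHeight) {
        int height = 1;
        while (height < maxHeight && rnd.nextBoolean()) {
            height++;
        }
        return height;
    }

    // Head height of roughly log2(n): ceil(log2(n)), and at least 1.
    static int headHeight(int n) {
        int bits = 32 - Integer.numberOfLeadingZeros(Math.max(1, n) - 1);
        return Math.max(1, bits);
    }
}
```

With these proportions each level holds about half the nodes of the level below, which is what makes the top-down search skip roughly half of the remaining candidates per level.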

Slide 30

Skip Lists - Insertion

[Diagram: skip list with head H and values 10, 20, 30, 40, 50; new value 15 to be inserted.]

Pick random height

Slide 33

Skip Lists Insertion Result

[Diagram: skip list with head H and values 10, 15, 20, 30, 40, 50.]
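Search and insertion as pictured on the skip-list slides can be sketched as a plain, single-threaded structure. The names, the fixed maximum height, and the seeded random generator are assumptions made for the example; nothing here is concurrent yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Single-threaded skip list sketch: search, then insertion with a random height.
final class SimpleSkipList {
    private static final int MAX_HEIGHT = 16;

    private static final class Node {
        final int value;
        final Node[] next; // next[level] is the successor on that level

        Node(int value, int height) {
            this.value = value;
            this.next = new Node[height];
        }
    }

    private final Node head = new Node(Integer.MIN_VALUE, MAX_HEIGHT);
    private final Random rnd;

    SimpleSkipList(long seed) {
        rnd = new Random(seed);
    }

    private int randomHeight() {
        int height = 1;
        while (height < MAX_HEIGHT && rnd.nextBoolean()) {
            height++;
        }
        return height;
    }

    // Top-down search using the slide's test: advance while value >= next.value.
    boolean contains(int value) {
        Node n = head;
        for (int level = MAX_HEIGHT - 1; level >= 0; level--) {
            while (n.next[level] != null && value >= n.next[level].value) {
                n = n.next[level];
            }
        }
        return n != head && n.value == value;
    }

    void insert(int value) {
        // Record the predecessor on every level while descending.
        Node[] preds = new Node[MAX_HEIGHT];
        Node n = head;
        for (int level = MAX_HEIGHT - 1; level >= 0; level--) {
            while (n.next[level] != null && n.next[level].value < value) {
                n = n.next[level];
            }
            preds[level] = n;
        }
        // Pick a random height, then link the node bottom-up.
        Node node = new Node(value, randomHeight());
        for (int level = 0; level < node.next.length; level++) {
            node.next[level] = preds[level].next[level];
            preds[level].next[level] = node;
        }
    }

    // Level 0 links every node, so walking it yields the values in sorted order.
    List<Integer> toList() {
        List<Integer> out = new ArrayList<>();
        for (Node n = head.next[0]; n != null; n = n.next[0]) {
            out.add(n.value);
        }
        return out;
    }
}
```

Inserting 10, 20, 30, 40, 50 and then 15 leaves level 0 in the order 10, 15, 20, 30, 40, 50, matching the insertion result slide.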

Slide 34

MVCC Indexing Using Skip Lists

Phases: Prepare, Finish, Publish (current phase: Finish)

Finish step 5. Update Index

[Diagram: Bucket Table slot hash(key) % tablesize; entries (Next, Prev, TSN=3, Key=K1, Value=15, Index) and (Next, Prev, TSN=2, Key=K1, Value=10, Index). Transaction Table (Tx ID, TSN): UUID=1234, TSN 3. Published TSN: 2.]

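Slide 36

Skip Lists: 5. Update Index

Entry fields: Next, Prev, TSN, Key, Value, Index L0, Index L1, Index L2

[Diagram: index skip list from head H over the entries 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2); each entry carries P and N links.]

Speaker notes: Explain fields on the left.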

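Slide 38

Skip Lists Insertion Result

Entry fields: Next, Prev, TSN, Key, Value, Index L0, Index L1, Index L2

[Diagram: index skip list from head H over the entries 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2); each entry carries P and N links.]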

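Slide 39

Skip Lists: FIND [12-25], TSN=2

Entry fields: Next, Prev, TSN, Key, Value, Index L0, Index L1, Index L2

[Diagram: index skip list from head H over the entries 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2); each entry carries P and N links.]

Speaker notes: Let us find the values in the range 12-25 for TSN=2.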

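Slide 40

Skip Lists: FIND [12-25], TSN=3

Entry fields: Next, Prev, TSN, Key, Value, Index L0, Index L1, Index L2

[Diagram: index skip list from head H over the entries 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2); each entry carries P and N links.]

Speaker notes: However, had we used TSN=3, we would have seen both 15 and 20, since both are visible at TSN=3.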
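The two FIND [12-25] slides differ only in the reader's TSN. The sketch below reproduces that behaviour using java.util.concurrent.ConcurrentSkipListMap as a stand-in for the custom index, which is a swapped-in technique and not the structure from the talk. The supersededAt field is also an assumption, added so that an old version stops being returned once a newer version of the same key is visible; the slides do not show how the real index handles this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Sketch of a TSN-filtered range scan over a value index.
// ConcurrentSkipListMap stands in for the custom skip list; names are hypothetical.
final class MvccIndexScan {
    static final long CURRENT = Long.MAX_VALUE; // version has not been superseded

    static final class IndexEntry {
        final int value;
        final String key;
        final long tsn;          // TSN at which this version became visible
        final long supersededAt; // TSN of the version that replaced it (assumed field)

        IndexEntry(int value, String key, long tsn, long supersededAt) {
            this.value = value;
            this.key = key;
            this.tsn = tsn;
            this.supersededAt = supersededAt;
        }
    }

    // Ordered by indexed value; this sketch assumes at most one entry per value.
    private final ConcurrentSkipListMap<Integer, IndexEntry> index = new ConcurrentSkipListMap<>();

    void add(int value, String key, long tsn, long supersededAt) {
        index.put(value, new IndexEntry(value, key, tsn, supersededAt));
    }

    // Returns the values in [from, to] that a reader with the given TSN may see.
    List<Integer> find(int from, int to, long readerTsn) {
        List<Integer> out = new ArrayList<>();
        for (IndexEntry e : index.subMap(from, true, to, true).values()) {
            if (e.tsn <= readerTsn && readerTsn < e.supersededAt) {
                out.add(e.value);
            }
        }
        return out;
    }
}
```

Loaded with the entries from the slides, find(12, 25, 2) yields only 20, because the entry 15 carries TSN 3, while find(12, 25, 3) yields 15 and 20.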

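Slide 41

Lock-free Insertions Summary

CAS (compare-and-swap) on the previous entry: one winner
Bottom-up linking preserves the skip list on every level, allowing wait-free readers
Helping the vacuum ensures lock-freedom

[Diagram: skip list with head H and values 10, 20, 30, 40, 50; values 15 and 17 are being inserted.]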
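The "CAS on the previous entry, one winner" rule can be shown on the bottom level alone. The sketch below is an insert-only sorted linked list, with hypothetical names; it deliberately leaves out the upper index levels, removal, and the vacuum-helping step mentioned on the slide, so it illustrates the retry loop rather than the full algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Insert-only lock-free sorted list (level 0 of a skip list). Names are hypothetical.
final class LockFreeSortedList {
    private static final class Node {
        final int value;
        final AtomicReference<Node> next;

        Node(int value, Node next) {
            this.value = value;
            this.next = new AtomicReference<>(next);
        }
    }

    private final Node head = new Node(Integer.MIN_VALUE, null);

    void insert(int value) {
        while (true) {
            // Find the insertion point: pred.value < value <= succ.value.
            Node pred = head;
            Node succ = pred.next.get();
            while (succ != null && succ.value < value) {
                pred = succ;
                succ = pred.next.get();
            }
            Node node = new Node(value, succ);
            // Only one of several racing inserts after pred can win this CAS.
            if (pred.next.compareAndSet(succ, node)) {
                return;
            }
            // Lost the race: another thread linked a node after pred; search again.
        }
    }

    // Readers never wait: they follow next pointers and always see a sorted list.
    List<Integer> snapshot() {
        List<Integer> out = new ArrayList<>();
        for (Node n = head.next.get(); n != null; n = n.next.get()) {
            out.add(n.value);
        }
        return out;
    }
}
```

If two threads insert 15 and 17 between 10 and 20 at the same time, at most one CAS on the node 10 succeeds; the loser retries, finds its new predecessor, and links there, so both values end up in order.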

Slide 43

Vacuum, Epoch-based Deferred Reclamation

[Diagram: Bucket Table slot hash(key) % tablesize; Transaction Table (Tx ID, TSN): UUID=1234, TSN 3; Published TSN: 2, then 3; index skip list from head H to tail T over the entries 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2).]

Snapshot Registry
Reader       TSN  Epoch
Thread=1234  2    17
Thread=1235  3    17

Vacuum: wait

Speaker notes: Snapshot registry, readers and writers (finish phase). The old version cannot be removed yet, since a TSN=2 reader is still reading.

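Slide 44

Vacuum, Epoch-based Deferred Reclamation

[Diagram: Bucket Table slot hash(key) % tablesize; Transaction Table (Tx ID, TSN): UUID=1234, TSN 3; Published TSN: 2, then 3; index skip list from head H to tail T over the entries 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2).]

Snapshot Registry
Reader       TSN  Epoch
Thread=1234  2    17
Thread=1235  3    17

Vacuum: wait

Speaker notes: The last TSN=2 reader completes.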

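Slide 45

Vacuum, Epoch-based Deferred Reclamation

[Diagram: Bucket Table slot hash(key) % tablesize; Transaction Table (Tx ID, TSN): UUID=1234, TSN 3; Published TSN: 2, then 3; index skip list from head H to tail T over the entries 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2).]

Snapshot Registry
Reader       TSN  Epoch
Thread=1235  3    17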
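The waiting rule on the vacuum slides (an old version stays until the last reader that could still see it has completed) can be sketched with a registry of active reader snapshots. This is a deliberately reduced illustration: it tracks only the TSN column of the Snapshot Registry and ignores the Epoch column, whose exact role the slides do not spell out. Class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the vacuum's wait condition. Tracks reader TSNs only; epochs are omitted.
final class SnapshotRegistry {
    private final ConcurrentHashMap<Long, Long> readerTsnByThread = new ConcurrentHashMap<>();

    // A reader registers the snapshot TSN it is reading at.
    void register(long threadId, long tsn) {
        readerTsnByThread.put(threadId, tsn);
    }

    // A reader removes itself when it completes.
    void unregister(long threadId) {
        readerTsnByThread.remove(threadId);
    }

    // Smallest TSN among active readers, or Long.MAX_VALUE if there are none.
    long oldestReaderTsn() {
        long min = Long.MAX_VALUE;
        for (long tsn : readerTsnByThread.values()) {
            min = Math.min(min, tsn);
        }
        return min;
    }

    // A version replaced at TSN s is visible only to readers with TSN < s,
    // so it may be reclaimed once every active reader is at s or later.
    boolean canReclaim(long supersededAtTsn) {
        return oldestReaderTsn() >= supersededAtTsn;
    }
}
```

With readers Thread=1234 at TSN 2 and Thread=1235 at TSN 3 registered, canReclaim(3) is false for the old K1 version; once Thread=1234 unregisters it becomes true. The sketch relies on new readers always starting at the published TSN, so no reader with an older TSN can appear after the check.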

Slide 46

Reader       TSN  Epoch
Thread=1235  3    17

Transaction Table (Tx ID, TSN): UUID=1234, TSN 3