Transcript
Page 1: IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free In-memory Algorithms to Turbo-charge High Volume Data Management

USING LOCK-FREE AND WAIT-FREE IN-MEMORY ALGORITHMS TO TURBO-CHARGE HIGH VOLUME DATA MANAGEMENT

HENNING ANDERSEN, STIBO SYSTEMS A/S

See all the presentations from the In-Memory Computing Summit at http://imcsummit.org

Page 2:

BIO

• 20 years of professional career at Stibo Systems A/S
• Developed software for the last 30+ years
• Technical lead on many projects, including:
  • Migrating from C++ to Java platform (performance & scalability)
  • Establishing a component platform
  • In-Memory component

Page 3:

GET TO KNOW STIBO SYSTEMS

Page 4:

OUR GROWING FAMILY

Travel/Hospitality, Distribution, Retail, Manufacturing

Page 5:

2015 MQ MDM OF PRODUCT DOMAIN

Page 6:

COMPLETE, SEAMLESS MULTIDOMAIN MDM SOLUTION

Page 7:

INTEGRATING IN-MEMORY INTO STEP

[Diagram: STEP, STEP Server (J2EE) and DB Server, shown next to a second configuration that adds an OFF-HEAP In-Memory DB.]

Page 8:

BENCHMARK RESULTS

[Charts: Large Retailer Data, Large Distributor Data, Scalability Test Data.]

Page 9:

REQUIREMENTS

• Great performance
• Compact memory layout
  • Data
  • Per Entry Overhead
• Lookup by Key
• Complex Querying
  • Indexing
• “Friendly” to our existing architecture
• Fast Startup/Initialization

Page 10:

PERFORMANCE BY SIMPLICITY

• MVCC/Immutability
• Wait-free index scans
• Code Generation
• Custom API/Direct Access

mov (%rdi,%r11,1),%r11

Page 11:

BASIC HASH TABLE CLOSED ADDRESSING

[Diagram: Bucket Table indexed by hash(key)%tablesize; a bucket points to the entry (Next, Key=K1, Value=10).]

Page 12:

BASIC HASH TABLE CLOSED ADDRESSING

[Diagram: Bucket Table indexed by hash(key)%tablesize; entries (Next, Key=K1, Value=10) and (Next, Key=K3, Value=20).]

Page 13:

BASIC HASH TABLE COLLISION

[Diagram: Bucket Table indexed by hash(key)%tablesize; a new entry (Next, Key=K4, Value=30) lands in an occupied bucket and is chained through the Next pointer alongside the entries (Next, Key=K3, Value=20) and (Next, Key=K1, Value=10).]
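The closed-addressing layout on pages 11 to 13 can be sketched in a few lines of Java. This is a minimal illustration, not the STEP implementation: the class name `ChainedHashTable`, the `String` keys and the `int` values are assumptions made for the example.

```java
// Sketch only: a closed-addressing (separate chaining) hash table.
// All names here are hypothetical and chosen for illustration.
final class ChainedHashTable {
    // One chain entry: Next pointer, key and value, as drawn on the slides.
    private static final class Entry {
        final String key;
        int value;
        Entry next;

        Entry(String key, int value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry[] buckets;

    ChainedHashTable(int tableSize) {
        buckets = new Entry[tableSize];
    }

    // hash(key) % tablesize; floorMod keeps the index non-negative.
    private int bucketIndex(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, int value) {
        int i = bucketIndex(key);
        for (Entry e = buckets[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                e.value = value; // existing key: overwrite in place
                return;
            }
        }
        // New key: link it in at the head of the chain.
        // A collision simply makes the chain one entry longer.
        buckets[i] = new Entry(key, value, buckets[i]);
    }

    Integer get(String key) {
        for (Entry e = buckets[bucketIndex(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null; // key not present
    }
}
```

With a table size of 1 every key collides, so `put("K1", 10)`, `put("K3", 20)` and `put("K4", 30)` all end up in one chain and are still found by `get`.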

Page 14:

MVCC HASH TABLE

TSN = Transaction Sequence Number

[Diagram: Bucket Table indexed by hash(key)%tablesize points to the entry (Next, Prev, TSN, Key=K1, Value=10). Transaction Table with columns Tx ID and TSN. Published TSN: 2.]

Page 15:

TRANSACTION/COMMIT PHASES

Commit Phases: Prepare, Commit, Finish, Publish, Vacuum, Abort

[Diagram: four STEP nodes, each with an In-Memory DB, and a Leader.]

Page 16:

TRANSACTION/COMMIT PHASES

Commit Phases: Prepare, Commit, Finish, Publish, Vacuum, Abort

[Diagram: four STEP nodes, each with an In-Memory DB, and a Leader.]

Page 17:

TRANSACTION/COMMIT PHASES

Commit Phases: Prepare, Commit, Finish, Publish, Vacuum, Abort

[Diagram: four STEP nodes, each with an In-Memory DB, and a Leader; the label TSN=3 appears three times.]

Page 18:

TRANSACTION/COMMIT PHASES

Commit Phases: Prepare, Commit, Finish, Publish, Vacuum, Abort

[Diagram: four STEP nodes, each with an In-Memory DB, and a Leader; the label TSN=3 appears three times.]

Page 19:

MVCC HASH TABLE UPDATE - PUT(K1,15)

Phases: Prepare, Finish, Publish. Current phase: Prepare

[Diagram: Bucket Table indexed by hash(key)%tablesize points to the entry (Next, Prev, TSN=2, Key=K1, Value=10). New entry: (Next, Prev, TSN=Infinite, Key=K1, Value=15). Transaction Table (Tx ID, TSN): UUID=1234, no TSN yet. Published TSN: 2.]

Page 20:

MVCC HASH TABLE UPDATE - PUT(K1,15)

Phases: Prepare, Finish, Publish. Current phase: Finish

1. Pull new TSN

[Diagram: Transaction Table (Tx ID, TSN): UUID=1234 receives TSN 3. Entries: (Next, Prev, TSN=2, Key=K1, Value=10) and (Next, Prev, TSN=Infinite, Key=K1, Value=15). Published TSN: 2.]

Page 21:

MVCC HASH TABLE UPDATE - PUT(K1,15)

Phases: Prepare, Finish, Publish. Current phase: Finish

2. Apply TSN

[Diagram: the new entry changes from (Next, Prev, TSN=Infinite, Key=K1, Value=15) to (Next, Prev, TSN=3, Key=K1, Value=15). Existing entry: (Next, Prev, TSN=2, Key=K1, Value=10). Transaction Table: UUID=1234, TSN 3. Published TSN: 2.]

Page 22:

MVCC HASH TABLE UPDATE - PUT(K1,15)

Phases: Prepare, Finish, Publish. Current phase: Finish

3. Link to Prev

[Diagram: entries (Next, Prev, TSN=3, Key=K1, Value=15) and (Next, Prev, TSN=2, Key=K1, Value=10). Transaction Table: UUID=1234, TSN 3. Published TSN: 2.]

Page 23:

MVCC HASH TABLE UPDATE - PUT(K1,15)

Phases: Prepare, Finish, Publish. Current phase: Finish

4. Update Bucket Table

[Diagram: Bucket Table indexed by hash(key)%tablesize; entries (Next, Prev, TSN=3, Key=K1, Value=15) and (Next, Prev, TSN=2, Key=K1, Value=10). Transaction Table: UUID=1234, TSN 3. Published TSN: 2.]

Page 24:

MVCC HASH TABLE READER

Reader TSN=2, Lookup K1

[Diagram: Bucket Table indexed by hash(key)%tablesize; entries (Next, Prev, TSN=3, Key=K1, Value=15) and (Next, Prev, TSN=2, Key=K1, Value=10). Transaction Table: UUID=1234, TSN 3. Published TSN: 2.]

Page 25:

MVCC HASH TABLE UPDATE (PUBLISH)

Phases: Prepare, Finish, Publish. Current phase: Publish

Update Published TSN

[Diagram: entries (Next, Prev, TSN=3, Key=K1, Value=15) and (Next, Prev, TSN=2, Key=K1, Value=10). Transaction Table: UUID=1234, TSN 3. Published TSN: 2 → 3.]

Page 26:

MVCC HASH TABLE READER

Reader TSN=3, Lookup K1

[Diagram: Bucket Table indexed by hash(key)%tablesize; entries (Next, Prev, TSN=3, Key=K1, Value=15) and (Next, Prev, TSN=2, Key=K1, Value=10). Transaction Table: UUID=1234, TSN 3. Published TSN: 3.]
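The PUT(K1,15) walkthrough on pages 19 to 26 can be sketched as follows. This is a simplified illustration under stated assumptions, not the talk's implementation: it assumes a single writer thread, `String` keys and `int` values, and every name (`MvccHashMapSketch`, `Version`, `prepare`, `finish`, `publish`) is hypothetical. A reader picks the newest version whose TSN is not greater than its snapshot TSN.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch only: single-writer MVCC hash map following Prepare/Finish/Publish.
final class MvccHashMapSketch {
    static final long INFINITE = Long.MAX_VALUE; // TSN of an unfinished entry

    static final class Version {
        final String key;
        final int value;
        volatile long tsn = INFINITE;
        volatile Version prev; // older version of the same key
        volatile Version next; // next key in the same bucket chain

        Version(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    private final AtomicReferenceArray<Version> buckets;
    private long lastTsn;               // used by the single writer only
    private volatile long publishedTsn; // newest TSN that readers may see

    MvccHashMapSketch(int tableSize, long startTsn) {
        buckets = new AtomicReferenceArray<>(tableSize);
        lastTsn = startTsn;
        publishedTsn = startTsn;
    }

    private int index(String key) {
        return Math.floorMod(key.hashCode(), buckets.length());
    }

    // Prepare: build the new entry off to the side; no reader can reach it.
    Version prepare(String key, int value) {
        return new Version(key, value);
    }

    // Finish: steps 1 to 4 from the slides.
    long finish(Version v) {
        long tsn = ++lastTsn;                 // 1. pull new TSN
        v.tsn = tsn;                          // 2. apply TSN
        int i = index(v.key);
        Version pred = null;
        Version cur = buckets.get(i);
        while (cur != null && !cur.key.equals(v.key)) {
            pred = cur;
            cur = cur.next;
        }
        v.prev = cur;                         // 3. link to prev (null for a new key)
        v.next = (cur == null) ? null : cur.next;
        if (pred == null) {
            buckets.set(i, v);                // 4. update bucket table
        } else {
            pred.next = v;                    //    or the chain predecessor
        }
        return tsn;
    }

    // Publish: from now on new readers take their snapshot at this TSN.
    void publish(long tsn) {
        publishedTsn = tsn;
    }

    long publishedTsn() {
        return publishedTsn;
    }

    // Reader: find the key, then walk Prev until a visible version appears.
    Integer get(String key, long readerTsn) {
        for (Version v = buckets.get(index(key)); v != null; v = v.next) {
            if (!v.key.equals(key)) {
                continue;
            }
            for (Version c = v; c != null; c = c.prev) {
                if (c.tsn <= readerTsn) {
                    return c.value;
                }
            }
            return null; // key did not exist at readerTsn
        }
        return null;
    }
}
```

In this sketch a reader that took its snapshot at TSN=2 keeps seeing Value=10 for K1 even after the bucket table points to the TSN=3 version, and a reader that starts after publish sees Value=15.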

Page 27:

MVCC HASH TABLE

Page 28:

INDEXING USING SKIP LISTS

Page 29:

SKIP LISTS

[Diagram: skip list with head H and nodes 10, 20, 30, 40, 50.]

• 50% have height >= 2
• 25% have height >= 3
• Head height ~= log2(n)

Find Value=30? At each step test: 30 >= next.value?

Page 30:

SKIP LISTS - INSERTION

Insert 15: pick random height

[Diagram: skip list with head H and nodes 10, 20, 30, 40, 50; new node 15.]

Page 31:

SKIP LISTS - INSERTION

Insert 15: pick random height

[Diagram: skip list with head H and nodes 10, 20, 30, 40, 50; new node 15.]

Page 32:

SKIP LISTS - INSERTION

Insert 15: pick random height

[Diagram: skip list with head H and nodes 10, 20, 30, 40, 50; new node 15.]

Page 33:

SKIP LISTS – INSERTION RESULT

[Diagram: skip list with head H and nodes 10, 15, 20, 30, 40, 50.]
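The search and insertion steps on pages 29 to 33 can be illustrated with a small sequential skip list. This is a sketch assuming `int` values, a fixed maximum height of 16 and a coin flip per level (so about 50% of nodes reach height 2 and 25% reach height 3); it is single-threaded and the class name `SimpleSkipList` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch only: a sequential (not thread-safe) skip list of ints.
final class SimpleSkipList {
    private static final int MAX_HEIGHT = 16;

    private static final class Node {
        final int value;
        final Node[] next; // next[level] is the successor on that level

        Node(int value, int height) {
            this.value = value;
            this.next = new Node[height];
        }
    }

    private final Node head = new Node(Integer.MIN_VALUE, MAX_HEIGHT);
    private final Random random;

    SimpleSkipList(long seed) {
        random = new Random(seed);
    }

    // Each extra level is kept with probability 1/2.
    private int randomHeight() {
        int h = 1;
        while (h < MAX_HEIGHT && random.nextBoolean()) {
            h++;
        }
        return h;
    }

    boolean contains(int value) {
        Node n = head;
        // Start at the top level; move right while the next value is
        // smaller than the target, then drop down one level.
        for (int level = MAX_HEIGHT - 1; level >= 0; level--) {
            while (n.next[level] != null && n.next[level].value < value) {
                n = n.next[level];
            }
        }
        Node candidate = n.next[0];
        return candidate != null && candidate.value == value;
    }

    void insert(int value) {
        // Remember the predecessor on every level during the search.
        Node[] preds = new Node[MAX_HEIGHT];
        Node n = head;
        for (int level = MAX_HEIGHT - 1; level >= 0; level--) {
            while (n.next[level] != null && n.next[level].value < value) {
                n = n.next[level];
            }
            preds[level] = n;
        }
        Node node = new Node(value, randomHeight());
        // Link bottom-up so level 0 is always a complete sorted list.
        for (int level = 0; level < node.next.length; level++) {
            node.next[level] = preds[level].next[level];
            preds[level].next[level] = node;
        }
    }

    // Level 0 contains every node in sorted order.
    List<Integer> toList() {
        List<Integer> out = new ArrayList<>();
        for (Node n = head.next[0]; n != null; n = n.next[0]) {
            out.add(n.value);
        }
        return out;
    }
}
```

Inserting 10, 20, 30, 40, 50 and then 15 leaves level 0 in the order shown on the insertion-result slide, whatever heights the random generator picks.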

Page 34:

MVCC INDEXING USING SKIP LISTS

Phases: Prepare, Finish, Publish. Current phase: Finish

5. Update Index

[Diagram: Bucket Table indexed by hash(key)%tablesize; entries (Next, Prev, TSN=3, Key=K1, Value=15) and (Next, Prev, TSN=2, Key=K1, Value=10), each linked to the Index. Transaction Table: UUID=1234, TSN 3. Published TSN: 2.]

Page 35:

MVCC INDEXING USING SKIP LISTS

Phases: Prepare, Finish, Publish. Current phase: Finish

5. Update Index

[Diagram: Bucket Table indexed by hash(key)%tablesize; entries (Next, Prev, TSN=3, Key=K1, Value=15) and (Next, Prev, TSN=2, Key=K1, Value=10), each linked to the Index. Transaction Table: UUID=1234, TSN 3. Published TSN: 2.]

Page 36:

SKIP LISTS – 5. UPDATE INDEX

[Diagram: skip-list index with head H and levels Index L0, L1, L2 over entries laid out as Next, Prev, TSN, Key, Value: 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2).]

Page 37:

SKIP LISTS – 5. UPDATE INDEX

[Diagram: skip-list index with head H and levels Index L0, L1, L2 over entries laid out as Next, Prev, TSN, Key, Value: 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2).]

Page 38:

SKIP LISTS – INSERTION RESULT

[Diagram: skip-list index with head H and levels Index L0, L1, L2 over entries laid out as Next, Prev, TSN, Key, Value: 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2).]

Page 39:

SKIP LISTS – FIND [12-25], TSN=2

[Diagram: skip-list index with head H and levels Index L0, L1, L2 over entries laid out as Next, Prev, TSN, Key, Value: 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2).]

Page 40:

SKIP LISTS – FIND [12-25], TSN=3

[Diagram: skip-list index with head H and levels Index L0, L1, L2 over entries laid out as Next, Prev, TSN, Key, Value: 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2).]
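The two FIND [12-25] slides differ only in the reader's TSN. The slides do not spell out the visibility rule, so the sketch below assumes one: an index entry is visible to a reader when its TSN is not greater than the reader's TSN and no newer version of the same key is visible to that reader. A sorted `List` stands in for level 0 of the skip list, and all names (`MvccIndexScan`, `Version`, `findRange`, `slideExample`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: MVCC-filtered range scan over a sorted index level.
final class MvccIndexScan {
    static final class Version {
        final String key;
        final int value;  // the indexed value
        final long tsn;
        Version next;     // newer version of the same key, or null

        Version(String key, int value, long tsn) {
            this.key = key;
            this.value = value;
            this.tsn = tsn;
        }
    }

    // Assumed rule: created at or before readerTsn and not yet superseded
    // by a version that the same reader can also see.
    static boolean visible(Version v, long readerTsn) {
        boolean created = v.tsn <= readerTsn;
        boolean superseded = v.next != null && v.next.tsn <= readerTsn;
        return created && !superseded;
    }

    // Walk the sorted level once; no locks are taken and nothing is copied.
    static List<String> findRange(List<Version> level0, int from, int to, long readerTsn) {
        List<String> keys = new ArrayList<>();
        for (Version v : level0) {
            if (v.value > to) {
                break; // sorted order: nothing further can match
            }
            if (v.value >= from && visible(v, readerTsn)) {
                keys.add(v.key);
            }
        }
        return keys;
    }

    // The index contents from the slides, sorted by indexed value.
    static List<Version> slideExample() {
        Version k1Old = new Version("K1", 10, 2);
        Version k1New = new Version("K1", 15, 3);
        k1Old.next = k1New;
        return Arrays.asList(
            k1Old,
            k1New,
            new Version("K3", 20, 2),
            new Version("K4", 30, 2),
            new Version("K5", 40, 2),
            new Version("K6", 50, 2));
    }
}
```

Under that assumed rule, a scan of [12-25] at TSN=2 skips the entry with value 15 and finds only K3, while the same scan at TSN=3 finds K1 and K3.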

Page 41:

LOCK-FREE INSERTIONS SUMMARY

• CAS (compare-and-swap) on previous entity – one winner
• Bottom-up preserves skip-list for every level, allowing wait-free readers
• Help vacuum ensures lock-freedom

[Diagram: list with head H and nodes 10, 20, 30, 40; nodes 15 and 17 are being inserted.]

Page 42:

LOCK-FREE INSERTIONS SUMMARY

• CAS (compare-and-swap) on previous entity – one winner
• Bottom-up preserves skip-list for every level, allowing wait-free readers
• Help vacuum ensures lock-freedom

[Diagram: list with head H and nodes 10, 20, 30, 40; nodes 15 and 17 are being inserted.]
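The "CAS on previous entity, one winner" rule can be shown on a single sorted level. The sketch below is an illustration only: it covers insertion on level 0, leaves out deletion, vacuum and the upper index levels, and uses hypothetical names (`LockFreeSortedList`, `insertConcurrently`). When 15 and 17 race to link themselves after 10, exactly one CAS succeeds; the loser searches again and links after the winner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Sketch only: lock-free insertion into one sorted linked level.
final class LockFreeSortedList {
    private static final class Node {
        final int value;
        final AtomicReference<Node> next;

        Node(int value, Node next) {
            this.value = value;
            this.next = new AtomicReference<>(next);
        }
    }

    private final Node head = new Node(Integer.MIN_VALUE, null);

    void insert(int value) {
        while (true) {
            // Find the predecessor and successor for the new value.
            Node pred = head;
            Node succ = pred.next.get();
            while (succ != null && succ.value < value) {
                pred = succ;
                succ = succ.next.get();
            }
            Node node = new Node(value, succ);
            // CAS on the previous entity: succeeds only if nobody else
            // has linked a node after pred in the meantime.
            if (pred.next.compareAndSet(succ, node)) {
                return;
            }
            // Lost the race: search again and retry.
        }
    }

    // Readers just follow next pointers; they never wait for writers.
    List<Integer> toList() {
        List<Integer> out = new ArrayList<>();
        for (Node n = head.next.get(); n != null; n = n.next.get()) {
            out.add(n.value);
        }
        return out;
    }

    // Helper for trying the race: one thread per value.
    void insertConcurrently(int... values) {
        List<Thread> threads = new ArrayList<>();
        for (int v : values) {
            Thread t = new Thread(() -> insert(v));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }
}
```

Because a new node is fully initialised before the CAS makes it reachable, a reader walking the list sees either the old or the new state of the link, never a half-built node.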

Page 43:

VACUUM, EPOCH BASED DEFERRED RECLAMATION

Vacuum wait

Snapshot Registry
Reader        TSN  Epoch
Thread=1234   2    17
Thread=1235   3    17

Transaction Table
Tx ID         TSN
UUID=1234     3

Published TSN: 2 → 3

[Diagram: Bucket Table indexed by hash(key)%tablesize; index from head H to tail T over entries 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2).]

Page 44:

VACUUM, EPOCH BASED DEFERRED RECLAMATION

Vacuum wait

Snapshot Registry
Reader        TSN  Epoch
Thread=1234   2    17
Thread=1235   3    17

Transaction Table
Tx ID         TSN
UUID=1234     3

Published TSN: 2 → 3

[Diagram: Bucket Table indexed by hash(key)%tablesize; index from head H to tail T over entries 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2).]

Page 45:

VACUUM, EPOCH BASED DEFERRED RECLAMATION

Snapshot Registry
Reader        TSN  Epoch
Thread=1235   3    17

Transaction Table
Tx ID         TSN
UUID=1234     3

Published TSN: 2 → 3

[Diagram: Bucket Table indexed by hash(key)%tablesize; index from head H to tail T over entries 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2).]

Page 46:

VACUUM, EPOCH BASED DEFERRED RECLAMATION

Vacuum phase 1

Snapshot Registry
Reader        TSN  Epoch
Thread=1235   3    17

Transaction Table
Tx ID         TSN
UUID=1234     3

Published TSN: 2 → 3

[Diagram: Bucket Table indexed by hash(key)%tablesize; index from head H to tail T over entries 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2).]

Page 47:

VACUUM, EPOCH BASED DEFERRED RECLAMATION

Vacuum epoch wait

Snapshot Registry
Reader        TSN  Epoch
Thread=1235   3    17

Transaction Table
Tx ID         TSN
UUID=1234     3

Published TSN: 2 → 3

[Diagram: Bucket Table indexed by hash(key)%tablesize; index from head H to tail T over entries 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2).]

Page 48:

VACUUM, EPOCH BASED DEFERRED RECLAMATION

Vacuum epoch wait

Snapshot Registry
Reader        TSN  Epoch
Thread=2345   3    18
Thread=1235   3    17

Transaction Table
Tx ID         TSN
UUID=1234     3

Published TSN: 2 → 3

[Diagram: Bucket Table indexed by hash(key)%tablesize; index from head H to tail T over entries 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2).]

Page 49:

VACUUM, EPOCH BASED DEFERRED RECLAMATION

Vacuum epoch wait

Snapshot Registry
Reader        TSN  Epoch
Thread=2345   3    18
Thread=1235   3    17

Transaction Table
Tx ID         TSN
UUID=1234     3

Published TSN: 2 → 3

[Diagram: Bucket Table indexed by hash(key)%tablesize; index from head H to tail T over entries 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2).]

Page 50:

VACUUM, EPOCH BASED DEFERRED RECLAMATION

Vacuum epoch wait

Snapshot Registry
Reader        TSN  Epoch
Thread=2345   3    18

Transaction Table
Tx ID         TSN
UUID=1234     3

Published TSN: 2 → 3

[Diagram: Bucket Table indexed by hash(key)%tablesize; index from head H to tail T over entries 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2).]

Page 51:

VACUUM, EPOCH BASED DEFERRED RECLAMATION

Vacuum phase 2

Snapshot Registry
Reader        TSN  Epoch
Thread=2345   3    18

Transaction Table
Tx ID         TSN
UUID=1234     3

Published TSN: 2 → 3

[Diagram: Bucket Table indexed by hash(key)%tablesize; index from head H to tail T over entries 10 (K1, TSN 2), 15 (K1, TSN 3), 20 (K3, TSN 2), 30 (K4, TSN 2), 40 (K5, TSN 2), 50 (K6, TSN 2).]
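The vacuum slides suggest two separate waits, and the following sketch is one possible reading of them rather than the implementation from the talk. It assumes that "Vacuum wait" ends when no registered reader has a snapshot older than the superseding TSN, that phase 1 then unlinks the old version, that "Vacuum epoch wait" ends when every reader registered in the closed epoch has left, and that phase 2 then frees the memory. The class and method names are hypothetical, and registering a reader is not atomic with reading the epoch here, which a real implementation would have to handle.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: snapshot registry for epoch-based deferred reclamation.
final class SnapshotRegistrySketch {
    private static final class Snapshot {
        final long readerTsn;
        final long epoch;

        Snapshot(long readerTsn, long epoch) {
            this.readerTsn = readerTsn;
            this.epoch = epoch;
        }
    }

    private final Map<Long, Snapshot> readers = new ConcurrentHashMap<>();
    private final AtomicLong epoch;

    SnapshotRegistrySketch(long initialEpoch) {
        epoch = new AtomicLong(initialEpoch);
    }

    // A reader registers its snapshot TSN together with the current epoch.
    void enter(long threadId, long readerTsn) {
        readers.put(threadId, new Snapshot(readerTsn, epoch.get()));
    }

    void leave(long threadId) {
        readers.remove(threadId);
    }

    // "Vacuum wait": a version superseded at supersedingTsn may be unlinked
    // once no registered reader has a snapshot older than that TSN.
    boolean canUnlink(long supersedingTsn) {
        for (Snapshot s : readers.values()) {
            if (s.readerTsn < supersedingTsn) {
                return false;
            }
        }
        return true;
    }

    // After phase 1 (unlink): close the current epoch so that readers
    // arriving later are stamped with the next one. Returns the closed epoch.
    long advanceEpoch() {
        return epoch.getAndIncrement();
    }

    // "Vacuum epoch wait": memory may be freed (phase 2) once every reader
    // stamped with the closed epoch or an earlier one has left.
    boolean epochDrained(long closedEpoch) {
        for (Snapshot s : readers.values()) {
            if (s.epoch <= closedEpoch) {
                return false;
            }
        }
        return true;
    }
}
```

Replaying the slides against this sketch: with Thread=1234 (TSN 2, epoch 17) registered, the old K1 version cannot be unlinked; once it leaves, phase 1 can run. Thread=1235 (epoch 17) then blocks phase 2 until it leaves, while Thread=2345 (epoch 18) does not.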

Page 52:

MVCC SUMMARY

• Map and indexes both under MVCC
• Index scans are wait-free (and simple/fast)
• Insert/update/delete are lock-free
• Automated reclamation of storage

Page 53:

EFFICIENT AND SAFE API

    transactionManager.read((snapshot) -> {
        QueryIterator<ProductCO> products =
            snapshot.query(ProductCO._ID.range("IMC", "Stibo"));
        while (products.next()) {
            // Direct view of the current entry; no object is copied
            CacheEntry<ProductCO> entry = products.entry();
            long typeId = entry.longValue(ProductCO::getObjectType);
            // can do gets, queries etc. on the same snapshot safely for all kinds of objects
            CacheEntry<ObjectTypeCO> type = snapshot.get(typeId);
        }
    });

    public class ProductCO {
        long getObjectType(ValuePointer ptr) { … }
    }

• No object copies, no GC, efficient access
• Often JVM can inline entire query to one native method

Page 54:

DIY USEFUL LEARNING

• Memory model (Java different from C++) and CAS operations
• Assembly
• CPU memory architecture
• Wait-free and lock-free algorithms

• Enumerate all states
• Think about state transitions
• Try to formally prove it right
• Deletions are often the trickiest part
• Do not even think about “this will never happen”, because it will

Page 55:

IN-MEMORY VENDOR QUESTIONS

• Direct access to data or only access to copies of data? And direct access to individual fields in an entry?
• Index/Query engine MVCC consistent with map gets and/or additional queries?
• Will index scans/queries acquire locks?
• Will index inserts acquire locks?
• Will map get/put operations acquire locks?
• Memory overhead per entry?
• Memory overhead per index (per entry)?
• How do you avoid memory fragmentation?
• Do you lock pages in memory and use huge/large pages?
