IMC Summit 2016 Breakout - Henning Andersen - Using Lock-free and Wait-free In-memory Algorithms to...


USING LOCK-FREE AND WAIT-FREE IN-MEMORY ALGORITHMS TO TURBO-CHARGE HIGH VOLUME DATA MANAGEMENT

HENNING ANDERSEN, STIBO SYSTEMS A/S

See all the presentations from the In-Memory Computing Summit at http://imcsummit.org

BIO

• 20 years of professional career at Stibo Systems A/S
• Developed software for the last 30+ years
• Technical lead on many projects, including:
  • Migrating from C++ to Java platform (performance & scalability)
  • Establishing a component platform
  • In-Memory component

GET TO KNOW STIBO SYSTEMS

Travel/Hospitality, Distribution, Retail, Manufacturing

OUR GROWING FAMILY

2015 MQ MDM OF PRODUCT DOMAIN

COMPLETE, SEAMLESS MULTIDOMAIN MDM SOLUTION

INTEGRATING IN-MEMORY INTO STEP

[Diagram: STEP with a STEP Server (J2EE) and a DB Server, shown next to STEP with a STEP Server (J2EE), a DB Server and the In-Memory DB.]

OFF-HEAP

BENCHMARK RESULTS

[Charts: Large Retailer Data, Large Distributor Data, Scalability Test Data]

REQUIREMENTS

• Great performance
• Compact memory layout
  • Data
  • Per Entry Overhead
• Lookup by Key
• Complex Querying
  • Indexing
• “Friendly” to our existing architecture
• Fast Startup/Initialization

PERFORMANCE BY SIMPLICITY

• MVCC/Immutability
• Wait-free index scans
• Code Generation
• Custom API/Direct Access

mov (%rdi,%r11,1),%r11

BASIC HASH TABLE CLOSED ADDRESSING

[Diagram, two builds: a Bucket Table indexed by hash(key)%tablesize. Each bucket points to an entry with the fields Next, Key and Value. Entries shown: Key=K1 Value=10, Key=K3 Value=20, Key=K4 Value=30.]

BASIC HASH TABLE COLLISION

[Diagram: Key=K3 Value=20 and Key=K1 Value=10 land in the same bucket and are chained through Next.]
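The chained layout on these slides can be sketched in a few lines of Java. This is only an illustration of closed addressing, not the STEP implementation. The class name `ChainedHashTable`, the `String`/`int` key and value types and the choice to link a colliding entry at the front of the chain are assumptions made for the example.

```java
/** Minimal closed-addressing hash table: each bucket heads a singly linked chain. */
class ChainedHashTable {
    /** One entry in a chain; mirrors the Next / Key / Value boxes on the slides. */
    static final class Entry {
        final String key;
        int value;
        Entry next;

        Entry(String key, int value, Entry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry[] buckets;

    ChainedHashTable(int tableSize) {
        buckets = new Entry[tableSize];
    }

    /** hash(key) % tablesize; floorMod keeps the index non-negative. */
    private int indexFor(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, int value) {
        int i = indexFor(key);
        for (Entry e = buckets[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                e.value = value; // existing key: overwrite in place
                return;
            }
        }
        // New key (possibly a collision): link it in front of the chain.
        buckets[i] = new Entry(key, value, buckets[i]);
    }

    /** Returns the value for key, or null if the key is absent. */
    Integer get(String key) {
        for (Entry e = buckets[indexFor(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ChainedHashTable table = new ChainedHashTable(1); // one bucket forces collisions
        table.put("K1", 10);
        table.put("K3", 20);
        System.out.println(table.get("K1") + " " + table.get("K3"));
    }
}
```

Note that `put` overwrites the value in place, which is exactly what the MVCC variant on the following slides avoids.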

MVCC HASH TABLE

[Diagram: the Bucket Table at hash(key)%tablesize points to an entry with the fields Next, Prev, TSN, Key=K1 and Value=10. A Transaction Table has the columns Tx ID and TSN. Published TSN is 2.]

TSN = Transaction Sequence Number

TRANSACTION/COMMIT PHASES

Commit Phases: Prepare, Commit, Finish, Publish, Vacuum, Abort

[Diagram, four builds: four STEP nodes, each with its own In-Memory DB, and a Leader. In the last two builds the label TSN=3 appears on the nodes.]

MVCC HASH TABLE UPDATE - PUT(K1,15)

Phases shown: Prepare, Finish, Publish

Starting state: the Bucket Table at hash(key)%tablesize points to the entry (TSN=2, Key=K1, Value=10). The Transaction Table holds Tx ID UUID=1234 with no TSN yet. Published TSN is 2.

Prepare
• A new entry is created: (TSN=Infinite, Key=K1, Value=15).

Finish
1. Pull new TSN: the Transaction Table row for UUID=1234 gets TSN 3.
2. Apply TSN: the new entry becomes (TSN=3, Key=K1, Value=15).
3. Link to Prev: the new entry's Prev points to the (TSN=2, Key=K1, Value=10) entry.
4. Update Bucket Table: the bucket now points to the TSN=3 entry.

Published TSN is still 2 at the end of Finish.

MVCC HASH TABLE READER

Reader TSN=2, Lookup K1: Published TSN is still 2, so the reader passes over the (TSN=3, Key=K1, Value=15) entry and follows Prev to (TSN=2, Key=K1, Value=10).

MVCC HASH TABLE UPDATE (PUBLISH)

Publish
• Update Published TSN: 2 becomes 3.

MVCC HASH TABLE READER

Reader TSN=3, Lookup K1: Published TSN is now 3, so the reader sees (TSN=3, Key=K1, Value=15).
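The Prepare, Finish, Publish and reader slides can be condensed into a small Java sketch. It is a simplified illustration under stated assumptions, not the presented implementation. The names `MvccMap` and `Version` are hypothetical, a `ConcurrentHashMap` stands in for the bucket table, there is no transaction table, and a single writer is assumed so that steps 3 and 4 of Finish need no CAS.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of a versioned map: readers pick the newest version with tsn <= their snapshot TSN. */
class MvccMap {
    /** TSN of a version that has been prepared but not yet finished. */
    static final long INFINITE = Long.MAX_VALUE;

    static final class Version {
        final String key;
        final int value;
        volatile long tsn = INFINITE;
        volatile Version prev; // older version of the same key

        Version(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Assumption: stands in for the bucket table; maps each key to its newest version.
    private final ConcurrentHashMap<String, Version> buckets = new ConcurrentHashMap<>();
    private final AtomicLong nextTsn;
    private final AtomicLong publishedTsn;

    MvccMap(long initialTsn) {
        nextTsn = new AtomicLong(initialTsn);
        publishedTsn = new AtomicLong(initialTsn);
    }

    /** Prepare: build the new version on the side with an "infinite" TSN. */
    Version prepare(String key, int value) {
        return new Version(key, value);
    }

    /** Finish (single writer assumed): pull TSN, apply it, link to prev, update the bucket. */
    long finish(Version v) {
        long tsn = nextTsn.incrementAndGet(); // 1. pull new TSN
        v.tsn = tsn;                          // 2. apply TSN
        v.prev = buckets.get(v.key);          // 3. link to previous version
        buckets.put(v.key, v);                // 4. update bucket table
        return tsn;
    }

    /** Publish: new snapshots now see everything up to and including tsn. */
    void publish(long tsn) {
        publishedTsn.set(tsn);
    }

    long snapshotTsn() {
        return publishedTsn.get();
    }

    /** Reader: walk from the newest version back to the first one visible at readerTsn. */
    Integer get(String key, long readerTsn) {
        for (Version v = buckets.get(key); v != null; v = v.prev) {
            if (v.tsn <= readerTsn) {
                return v.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        MvccMap map = new MvccMap(1);
        map.publish(map.finish(map.prepare("K1", 10)));   // K1=10 at TSN 2, published
        long tsn = map.finish(map.prepare("K1", 15));     // K1=15 at TSN 3, not yet published
        System.out.println(map.get("K1", map.snapshotTsn())); // prints 10
        map.publish(tsn);
        System.out.println(map.get("K1", map.snapshotTsn())); // prints 15
    }
}
```

The old version is never modified, so a reader holding snapshot TSN 2 keeps getting 10 even after TSN 3 is published.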

INDEXING USING SKIP LISTS

SKIP LISTS

[Diagram: head H followed by the sorted values 10, 20, 30, 40, 50, linked at several levels.]

• 50% have height >= 2
• 25% have height >= 3
• Head height ~= log2(n)

Find Value=30? At each node the search tests 30 >= next.value: if true it moves right, otherwise it drops down one level.

SKIP LISTS - INSERTION

[Diagram, three builds: 15 is inserted into H, 10, 20, 30, 40, 50.]

• Pick random height

SKIP LISTS - INSERTION RESULT

[Diagram: H, 10, 15, 20, 30, 40, 50]
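A plain single-threaded skip list makes the "pick random height" step concrete. This sketch is not the lock-free variant from the talk. The class name `SkipList`, the fixed `MAX_HEIGHT` of 16 (instead of a head that grows to about log2(n)) and the seeded `Random` are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Single-threaded skip list of ints with coin-flip node heights. */
class SkipList {
    static final int MAX_HEIGHT = 16; // assumption: fixed head height

    static final class Node {
        final int value;
        final Node[] next; // next[i] is the successor at level i

        Node(int value, int height) {
            this.value = value;
            this.next = new Node[height];
        }
    }

    private final Node head = new Node(Integer.MIN_VALUE, MAX_HEIGHT);
    private final Random random;

    SkipList(long seed) {
        random = new Random(seed);
    }

    /** Height 1 always, each extra level with probability 1/2. */
    private int randomHeight() {
        int height = 1;
        while (height < MAX_HEIGHT && random.nextBoolean()) {
            height++;
        }
        return height;
    }

    void insert(int value) {
        // Top-down search remembers the last node before `value` on every level.
        Node[] predecessors = new Node[MAX_HEIGHT];
        Node current = head;
        for (int level = MAX_HEIGHT - 1; level >= 0; level--) {
            while (current.next[level] != null && current.next[level].value < value) {
                current = current.next[level];
            }
            predecessors[level] = current;
        }
        // Link the new node bottom-up on each of its levels.
        Node node = new Node(value, randomHeight());
        for (int level = 0; level < node.next.length; level++) {
            node.next[level] = predecessors[level].next[level];
            predecessors[level].next[level] = node;
        }
    }

    boolean contains(int value) {
        Node current = head;
        for (int level = MAX_HEIGHT - 1; level >= 0; level--) {
            while (current.next[level] != null && current.next[level].value < value) {
                current = current.next[level];
            }
        }
        Node candidate = current.next[0];
        return candidate != null && candidate.value == value;
    }

    /** Level-0 contents in order. */
    List<Integer> toList() {
        List<Integer> values = new ArrayList<>();
        for (Node n = head.next[0]; n != null; n = n.next[0]) {
            values.add(n.value);
        }
        return values;
    }

    public static void main(String[] args) {
        SkipList list = new SkipList(42L);
        for (int v : new int[] {10, 20, 30, 40, 50}) {
            list.insert(v);
        }
        list.insert(15);
        System.out.println(list.toList()); // prints [10, 15, 20, 30, 40, 50]
    }
}
```

With a fair coin, about half of the nodes reach height 2 or more and about a quarter reach height 3 or more, which matches the percentages on the slide.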

MVCC INDEXING USING SKIP LISTS

Phases shown: Prepare, Finish, Publish

Finish
5. Update Index: the new entry (TSN=3, Key=K1, Value=15) is added to the Index alongside the existing (TSN=2, Key=K1, Value=10) entry. The Transaction Table row for UUID=1234 has TSN 3.

SKIP LISTS - 5. UPDATE INDEX

[Diagram, two builds plus the insertion result: an index skip list starting at head H with levels Index L0, Index L1 and Index L2, ordered by Value. Each node holds Next, Prev, TSN, Key and Value.]

Value  Key  TSN
10     K1   2
15     K1   3
20     K3   2
30     K4   2
40     K5   2
50     K6   2

SKIP LISTS - FIND [12-25], TSN=2

[Diagram: range scan for values 12 to 25 over the index skip list (10/K1 TSN 2, 15/K1 TSN 3, 20/K3 TSN 2, 30/K4 TSN 2, 40/K5 TSN 2, 50/K6 TSN 2) by a reader at TSN=2.]

SKIP LISTS - FIND [12-25], TSN=3

[Diagram: the same range scan by a reader at TSN=3.]
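One way to read the two FIND slides is as a range scan that filters index entries by the reader's TSN. The sketch below shows that idea only. It swaps in the JDK's `ConcurrentSkipListMap` for the custom skip list, and the `IndexEntry` type with its `supersededTsn` field is a hypothetical stand-in for however the real entries record that a newer version exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

/** Range scan over a value-ordered index, filtered by reader TSN. */
class MvccIndexScan {
    /** Marks an entry that no newer version has replaced. */
    static final long CURRENT = Long.MAX_VALUE;

    /** Hypothetical index entry, valid for readers with createdTsn <= TSN < supersededTsn. */
    static final class IndexEntry {
        final int value;
        final String key;
        final long createdTsn;
        final long supersededTsn;

        IndexEntry(int value, String key, long createdTsn, long supersededTsn) {
            this.value = value;
            this.key = key;
            this.createdTsn = createdTsn;
            this.supersededTsn = supersededTsn;
        }

        boolean visibleAt(long readerTsn) {
            return createdTsn <= readerTsn && readerTsn < supersededTsn;
        }
    }

    /** Index contents as drawn on the slides after PUT(K1,15). */
    static ConcurrentSkipListMap<Integer, IndexEntry> sampleIndex() {
        ConcurrentSkipListMap<Integer, IndexEntry> index = new ConcurrentSkipListMap<>();
        index.put(10, new IndexEntry(10, "K1", 2, 3)); // replaced by the TSN 3 version
        index.put(15, new IndexEntry(15, "K1", 3, CURRENT));
        index.put(20, new IndexEntry(20, "K3", 2, CURRENT));
        index.put(30, new IndexEntry(30, "K4", 2, CURRENT));
        index.put(40, new IndexEntry(40, "K5", 2, CURRENT));
        index.put(50, new IndexEntry(50, "K6", 2, CURRENT));
        return index;
    }

    /** Returns "value/key" for every entry in [from, to] visible at readerTsn. */
    static List<String> find(ConcurrentSkipListMap<Integer, IndexEntry> index,
                             int from, int to, long readerTsn) {
        List<String> hits = new ArrayList<>();
        for (IndexEntry e : index.subMap(from, true, to, true).values()) {
            if (e.visibleAt(readerTsn)) {
                hits.add(e.value + "/" + e.key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Integer, IndexEntry> index = sampleIndex();
        System.out.println(find(index, 12, 25, 2)); // prints [20/K3]
        System.out.println(find(index, 12, 25, 3)); // prints [15/K1, 20/K3]
    }
}
```

Under these assumptions the scan never blocks on a writer: it skips entries whose TSN range does not cover the reader's snapshot.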

LOCK-FREE INSERTIONS SUMMARY

• CAS (compare-and-swap) on previous entity: one winner
• Bottom-up preserves skip-list for every level, allowing wait-free readers
• Help vacuum ensures lock-freedom

[Diagram, two builds: 15 and 17 being inserted between 10 and 20 in the list H, 10, 20, 30, 40.]
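The "CAS on previous entity, one winner" rule can be shown on a single level with a sorted linked list. This is a minimal insert-only sketch, assuming no concurrent deletions (so no vacuum or helping is needed). The class name `LockFreeSortedList` is hypothetical, and a full skip list would repeat the same CAS bottom-up on each level.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Insert-only sorted linked list; writers link nodes with compare-and-set. */
class LockFreeSortedList {
    static final class Node {
        final int value;
        final AtomicReference<Node> next;

        Node(int value, Node next) {
            this.value = value;
            this.next = new AtomicReference<>(next);
        }
    }

    private final Node head = new Node(Integer.MIN_VALUE, null);

    /** Inserts value in sorted position; returns false if it is already present. */
    boolean insert(int value) {
        while (true) {
            // Find the predecessor and successor for the new value.
            Node pred = head;
            Node succ = pred.next.get();
            while (succ != null && succ.value < value) {
                pred = succ;
                succ = succ.next.get();
            }
            if (succ != null && succ.value == value) {
                return false;
            }
            Node node = new Node(value, succ);
            // Only one thread can swing pred.next away from succ; losers rescan and retry.
            if (pred.next.compareAndSet(succ, node)) {
                return true;
            }
        }
    }

    /** Readers just follow next pointers and never wait for writers. */
    List<Integer> toList() {
        List<Integer> values = new ArrayList<>();
        for (Node n = head.next.get(); n != null; n = n.next.get()) {
            values.add(n.value);
        }
        return values;
    }

    public static void main(String[] args) throws InterruptedException {
        LockFreeSortedList list = new LockFreeSortedList();
        for (int v : new int[] {10, 20, 30, 40}) {
            list.insert(v);
        }
        Thread a = new Thread(() -> list.insert(15));
        Thread b = new Thread(() -> list.insert(17));
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(list.toList()); // prints [10, 15, 17, 20, 30, 40]
    }
}
```

If both threads read 10 as predecessor and 20 as successor, only one CAS succeeds. The loser rescans, finds the new neighbour and links itself next to it.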

VACUUM, EPOCH BASED DEFERRED RECLAMATION

[Diagram sequence, nine builds: the Bucket Table at hash(key)%tablesize, the Transaction Table (Tx ID UUID=1234, TSN 3), the Published TSN, and the index skip list from head H to tail T holding 10/K1 (TSN 2), 15/K1 (TSN 3), 20/K3 (TSN 2), 30/K4 (TSN 2), 40/K5 (TSN 2) and 50/K6 (TSN 2).]

Snapshot Registry columns: Reader, TSN, Epoch

1. Vacuum wait: the registry holds Thread=1234 (TSN 2, Epoch 17) and Thread=1235 (TSN 3, Epoch 17).
2. Thread=1234 leaves the registry; Thread=1235 (TSN 3, Epoch 17) remains.
3. Vacuum phase 1.
4. Vacuum epoch wait: Thread=2345 (TSN 3, Epoch 18) has registered; Thread=1235 (TSN 3, Epoch 17) is still present.
5. Thread=1235 leaves the registry; Thread=2345 (TSN 3, Epoch 18) remains.
6. Vacuum phase 2.
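The registry bookkeeping behind the two waits can be sketched as follows. This is a guess at the mechanism from the slide labels only: the vacuum first waits until no reader needs the old TSN, then bumps the epoch and waits until every reader that might still hold a pointer to unlinked data has left. The class and method names are hypothetical, the starting epoch 17 is taken from the slides, and the sketch ignores the race between a reader registering and the vacuum checking, which a real implementation must close.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Registry of active readers with the TSN they read at and the epoch they entered in. */
class SnapshotRegistry {
    static final class Snapshot {
        final long tsn;
        final long epoch;

        Snapshot(long tsn, long epoch) {
            this.tsn = tsn;
            this.epoch = epoch;
        }
    }

    private final Map<Long, Snapshot> readers = new ConcurrentHashMap<>(); // keyed by thread id
    private final AtomicLong epoch = new AtomicLong(17); // starting value as on the slides

    /** A reader registers before touching any data. */
    void enter(long threadId, long tsn) {
        readers.put(threadId, new Snapshot(tsn, epoch.get()));
    }

    /** A reader deregisters once it holds no more pointers into the structure. */
    void exit(long threadId) {
        readers.remove(threadId);
    }

    /** Called by the vacuum after phase 1 has unlinked the dead versions. */
    long advanceEpoch() {
        return epoch.incrementAndGet();
    }

    /** "Vacuum wait" ends when no reader still reads below the given TSN. */
    boolean noReaderBelowTsn(long tsn) {
        return readers.values().stream().noneMatch(s -> s.tsn < tsn);
    }

    /** "Vacuum epoch wait" ends when no reader from before the given epoch remains. */
    boolean noReaderBeforeEpoch(long minEpoch) {
        return readers.values().stream().noneMatch(s -> s.epoch < minEpoch);
    }

    public static void main(String[] args) {
        SnapshotRegistry registry = new SnapshotRegistry();
        registry.enter(1234, 2);
        registry.enter(1235, 3);
        System.out.println(registry.noReaderBelowTsn(3)); // prints false: vacuum waits
        registry.exit(1234);
        System.out.println(registry.noReaderBelowTsn(3)); // prints true: phase 1 may unlink
        long newEpoch = registry.advanceEpoch();
        registry.enter(2345, 3);
        System.out.println(registry.noReaderBeforeEpoch(newEpoch)); // prints false: epoch wait
        registry.exit(1235);
        System.out.println(registry.noReaderBeforeEpoch(newEpoch)); // prints true: phase 2 may free
    }
}
```

Splitting the work into unlink (phase 1) and free (phase 2) means a reader that entered before the unlink can finish its traversal over memory that is still valid.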

MVCC SUMMARY

• Map and indexes both under MVCC
• Index scans are wait-free (and simple/fast)
• Insert/update/delete are lock-free
• Automated reclamation of storage

EFFICIENT AND SAFE API

    transactionManager.read((snapshot) -> {
        QueryIterator<ProductCO> products = snapshot.query(ProductCO._ID.range("IMC", "Stibo"));
        while (products.next()) {
            CacheEntry<ProductCO> entry = products.entry();
            long typeId = entry.longValue(ProductCO::getObjectType);
            CacheEntry<ObjectTypeCO> type = snapshot.get(typeId);
            // can do gets, queries etc. on the same snapshot safely for all kinds of objects
        }
    });

    public class ProductCO {
        long getObjectType(ValuePointer ptr) { … }
    }

• No object copies, no GC, efficient access
• Often the JVM can inline the entire query into one native method

DIY USEFUL LEARNING

• Memory model (Java's differs from C++'s) and CAS operations
• Assembly
• CPU memory architecture
• Wait-free and lock-free algorithms

• Enumerate all states
• Think about state transitions
• Try to formally prove it right
• Deletions are often the trickiest part
• Do not even think “this will never happen”, because it will

IN-MEMORY VENDOR QUESTIONS

• Direct access to data or only access to copies of data? And direct access to individual fields in an entry?
• Index/Query engine MVCC consistent with map gets and/or additional queries?
• Will index scans/queries acquire locks?
• Will index inserts acquire locks?
• Will map get/put operations acquire locks?
• Memory overhead per entry?
• Memory overhead per index (per entry)?
• How do you avoid memory fragmentation?
• Do you lock pages in memory and use huge/large pages?