Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
WTM’12, Bern, April 10, 2012
In-‐Place Metainfo Support in DeuceSTM
Ricardo J. Dias, Tiago M. Vale and João M. Lourenço CITI / Universidade Nova de Lisboa
1
WTM’12, Bern, April 10, 2012
MoEvaEon
• Fair comparison of different STM algorithms
• Require a flexible framework to support different STM implementaEons – MulE-‐version, lock-‐based, …
– Same transacEonal interface
– Same benchmarking applicaEons
2
WTM’12, Bern, April 10, 2012
The DeuceSTM • J TransacEons defined by a single Java method annotaEon: @Atomic
• J Well-‐defined API for implemenEng STM algorithms
• J Efficient implementaEon of some STM algorithms, e.g., TL2, LSA, …
• J Macro-‐benchmarks available
• L Does not allow the efficient implementaEon of other STM algorithms, e.g., mulE-‐version
• L Fair comparison is not possible
3
WTM’12, Bern, April 10, 2012
TransacEonal InformaEon
Thread 1
Thread 2
Thread 3
…
1
2
3
4
TxWrite
TxRead
Tx Desc. Clock = 3
Tx Desc. Clock = 2
Tx Desc. Clock = 3
Lock, version …
Lock, version …
Lock, version …
Lock, version …
Metainfo
Memory Thread-‐local
4
Data
Data
Data
Data
WTM’12, Bern, April 10, 2012
Out-‐place: N-‐1
5
…
1
2
3
4
Lock
Lock
Lock
Memory • Efficient implementaEon
using an hash funcEon
• False sharing • Algorithms:
– TL2, SwissTM, LSA
…
External Mapping Table
Lock
Data
Data
Data
Data
WTM’12, Bern, April 10, 2012
Out-‐place: 1-‐1
6
…
1
2
3
4
Version List
Version List
Version List
Memory
Version List
• Hash table with collision list • Bad performance
• Algorithms:
– MulE-‐version algorithms
…
External Mapping Table
Data
Data
Data
Data
WTM’12, Bern, April 10, 2012
In-‐place: 1-‐1
7
1
2
3
4
Version List
Version List
Version List
Version List
• Direct access to metainfo using memory reference
• Mostly used in managed runEmes (OO languages)
• Algorithms:
– TL2, SwissTM, LSA, MulE-‐version algorithms
…
Memory
Data
Data
Data
Data
Add support for In-‐place in DeuceSTM
WTM’12, Bern, April 10, 2012
TransacEonal Interface
Out-‐place in DeuceSTM
(obj1,field) (obj2,field) (obj3,field)
[metainfo1] [metainfo2] [metainfo3]
Metainfo table
TxRead(obj, field) TxWrite(obj, field, val)
8
Object instance
Field offset
Table Key
WTM’12, Bern, April 10, 2012
class C { int x; static int x_off = Offset(x);
@Atomic foo() { int t = TxRead(this, x_off ); TxWrite(this, x_off , t+1); }}
Out-‐place InstrumentaEon
9
class C { int x;
@Atomic foo() { x = x+1; }}
WTM’12, Bern, April 10, 2012
TxRead(obj, field) TxWrite(obj, field, val)
Object A field1 field1m fields* methods()*
Our In-‐place Approach
Object A field1 fields* methods()*
Object B fields* methods()*
Object M
[metainfo]
10
TxRead(metainfo) TxWrite(metainfo, val)
TransacEonal Interface
WTM’12, Bern, April 10, 2012
Our In-‐place Approach
11
class C { int x;
@Atomic foo() { x = x+1; }}
class C { int x; TxField x_m = new TxField();
@Atomic foo() { int t = TxRead(x_m); TxWrite(x_m, t+1); }}
WTM’12, Bern, April 10, 2012
Metainfo and arrays
5 [0]
3 [1]
8 [2]
Original array: int[]
New array: TxArrField[] [0]
[1]
[2]
TxArrField
array index = 0 [metainfo]
TxArrField
array index = 1 [metainfo]
TxArrField
array index = 2 [metainfo]
12
WTM’12, Bern, April 10, 2012
Metainfo and arrays
13
class C { int[] a = new int[10]; @Atomic foo() { a[1] = a[2]+1; } void bar() {
a[2] = 3; }}
class C { TxArrInt[] a = new TxArrInt[10]; { int[] t = new int[10]; for (int i=0; i < 10; i++) { a[i] = new TxArrInt(i, t); } } TxField a_m = new TxField();
@Atomic foo() { int t = TxRead(a[2]); TxWrite(a[1], t+1); } void bar() {
a[0].array[3] = 3; }}
WTM’12, Bern, April 10, 2012
Experimental EvaluaEon Overhead
• Benchmarking algorithm: TL2 using a lock table
• Base case: – Using out-‐place strategy (original DeuceSTM)
• Comparing case: – Using in-‐place strategy
• Metainfo objects are created for each field • We use the metainfo object as the key for the external lock table
14
WTM’12, Bern, April 10, 2012
Experimental EvaluaEon Overhead
Write-‐Update: 0% Using Arrays
No arrays: o
verhead 3%
With arrays: o
verhead 20%
15
Not using Arrays
Overhead in %
0
5
10
15
20
25
30
35
1 2 4 8 16 24 32 40
Ove
rhea
d (%
)
Threads
IntSet RBTree, update=0%
0
5
10
15
20
25
30
35
1 2 4 8 16 24 32 40
Ove
rhea
d (%
)
Threads
IntSet SkipList, update=0%
WTM’12, Bern, April 10, 2012
Experimental EvaluaEon Overhead
Write-‐Update: 10%
No arrays: o
verhead 7%
With arrays: o
verhead 25%
16
Overhead in %
Using Arrays Not using Arrays
0
5
10
15
20
25
30
35
1 2 4 8 16 24 32 40
Ove
rhea
d (%
)
Threads
IntSet RBTree, update=10%
0
5
10
15
20
25
30
35
1 2 4 8 16 24 32 40
Ove
rhea
d (%
)
Threads
IntSet SkipList, update=10%
WTM’12, Bern, April 10, 2012
Experimental Results In-‐place vs. Out-‐place for MulE-‐version
• Two implementaEons of JVSTM in DeuceSTM – Out-‐place strategy
• Versions kept in external table with collision list – In-‐place strategy
• No external table • Versions kept in meta-‐info field
• How much faster is the in-‐place implementaEon?
17
WTM’12, Bern, April 10, 2012
0
5
10
15
20
25
30
35
1 2 4 8 16 24 32 40
Spee
dup
(x fa
ster
)
Threads
IntSet RBTree, update=10%
0
5
10
15
20
25
30
35
1 2 4 8 16 24 32 40
Spee
dup
(x fa
ster
)
JVSTM-In
Threads
IntSet SkipList, update=10%
JVSTM-In
Experimental EvaluaEon MulE-‐version
Average 5 M
mes faster
18
Speedup in X faster
Using Arrays Not using Arrays Write-‐Update: 10%
WTM’12, Bern, April 10, 2012
Experimental Results MulE-‐version
• JVSTM algorithm has a performance bopleneck in garbage collecEon of unused versions – Is it limiEng the speedup for in-‐place?
• JVSTM-‐noGC: an extension of JVSTM where – Version lists are fixed sized – No garbage collecEon of unused versions
19
WTM’12, Bern, April 10, 2012
0
5
10
15
20
25
30
35
1 2 4 8 16 24 32 40
Spee
dup
(x fa
ster
)
Threads
IntSet RBTree, update=10%
0
5
10
15
20
25
30
35
1 2 4 8 16 24 32 40
Spee
dup
(x fa
ster
)
JVSTM-InJVSTM-noGC
Threads
IntSet SkipList, update=10%
JVSTM-InJVSTM-noGC
Experimental EvaluaEon MulE-‐version
Average 20—
25 Mmes fas
ter
20
Speedup in X faster
Using Arrays Not using Arrays Write-‐Update: 10%
WTM’12, Bern, April 10, 2012
Experimental EvaluaEon JVSTM vs TL2
21
Using Arrays Not using Arrays Write-‐Update: 10%
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
1 2 4 8 16 32
Thro
ughp
ut (t
rans
actio
ns/s
x 1
e6)
Threads
IntSet RBTree, update=10%
TL2JVSTM-noGC
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
1 2 4 8 16 32
Thro
ughp
ut (t
rans
actio
ns/s
x 1
e6)
Threads
IntSet SkipList, update=10%
TL2JVSTM-noGC
WTM’12, Bern, April 10, 2012
Concluding Remarks
• In-‐place strategy support allows: – Efficient support of primiEve types
• Avoids boxing and unboxing – Efficient ImplementaEon of mulE-‐version algorithms
– Fair comparison of different kinds of STM algorithms
– Support for distributed STM algorithms • Ongoing work
22
WTM’12, Bern, April 10, 2012
The End
23