
WTM’12, Bern, April 10, 2012

In-Place Metainfo Support in DeuceSTM

Ricardo J. Dias, Tiago M. Vale and João M. Lourenço CITI / Universidade Nova de Lisboa



Motivation

•  Fair comparison of different STM algorithms

•  Requires a flexible framework to support different STM implementations
   – Multi-version, lock-based, …

   – Same transactional interface

   – Same benchmarking applications


The DeuceSTM

•  (+) Transactions defined by a single Java method annotation: @Atomic

•  (+) Well-defined API for implementing STM algorithms

•  (+) Efficient implementation of some STM algorithms, e.g., TL2, LSA, …

•  (+) Macro-benchmarks available

•  (-) Does not allow the efficient implementation of other STM algorithms, e.g., multi-version

•  (-) Fair comparison is not possible


Transactional Information

[Figure: Threads 1, 2, 3, … each hold a thread-local transaction descriptor (Tx Desc., with Clock = 3, 2 and 3) and access memory locations 1 to 4 through TxRead and TxWrite. Each Data cell in memory has associated Metainfo (lock, version, …).]


Out-place: N-1

•  Efficient implementation using a hash function

•  False sharing

•  Algorithms:
   –  TL2, SwissTM, LSA

[Figure: memory locations 1 to 4 (Data) map onto Lock entries of an External Mapping Table; several locations can share one Lock entry.]
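The N-1 mapping on this slide can be sketched as a fixed-size table of lock words indexed by a hash of the (object, field offset) pair. This is only an illustration under stated assumptions, not DeuceSTM's code: the class name `LockTable`, the table size, and the hash mixing are all made up for the example.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical N-1 external lock table: many memory locations share one entry. */
final class LockTable {
    private final AtomicLongArray versions; // one version word per table entry
    private final int mask;

    LockTable(int sizePowerOfTwo) {
        if (Integer.bitCount(sizePowerOfTwo) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        versions = new AtomicLongArray(sizePowerOfTwo);
        mask = sizePowerOfTwo - 1;
    }

    /** Maps an (object, field offset) pair to a table slot. */
    int slot(Object obj, long fieldOffset) {
        int h = System.identityHashCode(obj) * 31 + Long.hashCode(fieldOffset);
        h ^= (h >>> 16);          // mix high bits into the low bits
        return h & mask;          // always in [0, size)
    }

    long version(int slot) {
        return versions.get(slot);
    }

    /** Advances the version of a slot, as a committing writer would. */
    void bump(int slot) {
        versions.incrementAndGet(slot);
    }
}
```

Because the table is smaller than the number of locations, two unrelated fields can land in the same slot. A writer that bumps that slot then invalidates readers of both fields, which is the false sharing listed above.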


Out-place: 1-1

•  Hash table with collision list

•  Bad performance

•  Algorithms:
   – Multi-version algorithms

[Figure: each memory location 1 to 4 (Data) maps to its own Version List entry in an External Mapping Table.]
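A 1-1 external table can be sketched as a hash map from an (object, field offset) key to a per-location version list. The names `VersionTable`, `commit`, and `read`, and the integer timestamps, are assumptions chosen for this illustration; they are not taken from DeuceSTM or JVSTM.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical 1-1 external table: each (object, field) key owns a version list. */
final class VersionTable {
    /** Table key; the object is compared by identity, not by equals(). */
    private static final class Key {
        final Object obj;
        final long offset;
        Key(Object obj, long offset) { this.obj = obj; this.offset = offset; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).obj == obj && ((Key) o).offset == offset;
        }
        @Override public int hashCode() {
            return System.identityHashCode(obj) * 31 + Long.hashCode(offset);
        }
    }

    /** One committed value tagged with its commit timestamp. */
    private static final class Version {
        final int timestamp;
        final Object value;
        Version(int timestamp, Object value) { this.timestamp = timestamp; this.value = value; }
    }

    private final ConcurrentHashMap<Key, Deque<Version>> table = new ConcurrentHashMap<>();

    /** Installs a new version; timestamps are assumed to increase per key. */
    void commit(Object obj, long offset, int timestamp, Object value) {
        Deque<Version> list = table.computeIfAbsent(new Key(obj, offset), k -> new ArrayDeque<>());
        synchronized (list) {
            list.addFirst(new Version(timestamp, value)); // newest version first
        }
    }

    /** Returns the newest value with timestamp <= snapshot, or null if there is none. */
    Object read(Object obj, long offset, int snapshot) {
        Deque<Version> list = table.get(new Key(obj, offset));
        if (list == null) return null;
        synchronized (list) {
            for (Version v : list) {
                if (v.timestamp <= snapshot) return v.value;
            }
        }
        return null;
    }
}
```

Every access in this sketch allocates a key, pays a hash lookup, and boxes primitive values, which gives a feel for why the slide rates this strategy as slow.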


In-place: 1-1

•  Direct access to metainfo using memory reference

•  Mostly used in managed runtimes (OO languages)

•  Algorithms:
   –  TL2, SwissTM, LSA, Multi-version algorithms

[Figure: in Memory, each Data cell 1 to 4 is paired directly with its own Version List.]

Add support for In-place in DeuceSTM


Out-place in DeuceSTM

Transactional Interface: TxRead(obj, field), TxWrite(obj, field, val)

[Figure: the Metainfo table maps each Table Key (obj1,field), (obj2,field), (obj3,field) to [metainfo1], [metainfo2], [metainfo3]. The key is the pair of Object instance and Field offset.]


Out-place Instrumentation

Original:

class C {
  int x;

  @Atomic void foo() {
    x = x+1;
  }
}

Instrumented:

class C {
  int x;
  static int x_off = Offset(x);

  @Atomic void foo() {
    int t = TxRead(this, x_off);
    TxWrite(this, x_off, t+1);
  }
}


Our In-place Approach

[Figure: Object A (field1, fields*, methods()*) gains an extra field, field1m, that references a metainfo object, Object M ([metainfo]). Object B (fields*, methods()*) is also shown.]

Transactional Interface: TxRead(obj, field) and TxWrite(obj, field, val) become TxRead(metainfo) and TxWrite(metainfo, val)


Our In-place Approach

Original:

class C {
  int x;

  @Atomic void foo() {
    x = x+1;
  }
}

Instrumented:

class C {
  int x;
  TxField x_m = new TxField();

  @Atomic void foo() {
    int t = TxRead(x_m);
    TxWrite(x_m, t+1);
  }
}
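The instrumented class above can be approximated in plain Java. The sketch below is an assumption-laden illustration: `TxIntField`, `InPlaceTx`, and the getter/setter lambdas are hypothetical stand-ins for whatever the framework generates, and there is no conflict detection, locking, or rollback. It only shows that the metainfo object is reached by a direct reference instead of a table lookup.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;
import java.util.function.IntSupplier;

/** Hypothetical in-place metainfo attached to one int field. */
final class TxIntField {
    final IntSupplier getter;                          // reads the guarded field
    final IntConsumer setter;                          // writes the guarded field
    final AtomicInteger version = new AtomicInteger(); // stand-in for real metainfo

    TxIntField(IntSupplier getter, IntConsumer setter) {
        this.getter = getter;
        this.setter = setter;
    }
}

/** Hypothetical transactional interface that takes only the metainfo object. */
final class InPlaceTx {
    static int txRead(TxIntField m) {
        return m.getter.getAsInt();
    }

    static void txWrite(TxIntField m, int val) {
        m.setter.accept(val);
        m.version.incrementAndGet(); // record that the field changed
    }
}

/** Hand-written equivalent of the instrumented class C. */
final class Counter {
    int x;
    final TxIntField x_m = new TxIntField(() -> x, v -> x = v);

    void foo() {
        int t = InPlaceTx.txRead(x_m);
        InPlaceTx.txWrite(x_m, t + 1);
    }
}
```

Calling `foo()` twice on a fresh `Counter` leaves `x` at 2, and the version stored in `x_m` has advanced twice, with no hash table involved and no boxing of the `int` value.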


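Metainfo and arrays

[Figure: the original array, int[], holds 5, 3, 8 at indices [0], [1], [2]. The new array, TxArrField[], holds at each index [0], [1], [2] a reference to a TxArrField object, which stores its array index (e.g., array index = 0) and its [metainfo].]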


Metainfo and arrays

Original:

class C {
  int[] a = new int[10];

  @Atomic void foo() {
    a[1] = a[2]+1;
  }

  void bar() {
    a[2] = 3;
  }
}

Instrumented:

class C {
  TxArrInt[] a = new TxArrInt[10];
  {
    int[] t = new int[10];
    for (int i=0; i < 10; i++) {
      a[i] = new TxArrInt(i, t);
    }
  }
  TxField a_m = new TxField();

  @Atomic void foo() {
    int t = TxRead(a[2]);
    TxWrite(a[1], t+1);
  }

  void bar() {
    a[0].array[2] = 3;
  }
}
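The array transformation can be tried out with a small stand-alone sketch. The shape of `TxArrInt` (an index plus a reference to the shared backing array) follows the slide; the `version` counter, the `ArrTx` helper class, and its `wrap` method are assumptions added for illustration, and no real transactional control is performed.

```java
/** Per-element metainfo: remembers its index and the shared backing array. */
final class TxArrInt {
    final int index;
    final int[] array;   // backing array shared by all elements
    int version;         // hypothetical stand-in for real metainfo

    TxArrInt(int index, int[] array) {
        this.index = index;
        this.array = array;
    }
}

/** Hypothetical helpers mirroring TxRead/TxWrite on array elements. */
final class ArrTx {
    /** Builds one TxArrInt per element of the given backing array. */
    static TxArrInt[] wrap(int[] backing) {
        TxArrInt[] a = new TxArrInt[backing.length];
        for (int i = 0; i < backing.length; i++) {
            a[i] = new TxArrInt(i, backing);
        }
        return a;
    }

    static int txRead(TxArrInt m) {
        return m.array[m.index];
    }

    static void txWrite(TxArrInt m, int val) {
        m.array[m.index] = val;
        m.version++;     // record that this element changed
    }
}
```

Non-transactional code reaches the data through any element's `array` reference, as in `a[0].array[2] = 3`, so it skips the metainfo entirely. The price is one extra object per array element, which is consistent with the higher overhead the evaluation reports for benchmarks that use arrays.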


Experimental Evaluation: Overhead

•  Benchmarking algorithm: TL2 using a lock table

•  Base case:
   – Using out-place strategy (original DeuceSTM)

•  Comparing case:
   – Using in-place strategy
      •  Metainfo objects are created for each field
      •  We use the metainfo object as the key for the external lock table



Write-Update: 0%

Overhead in %

•  No arrays: overhead 3%

•  With arrays: overhead 20%

[Figure: two plots of Overhead (%) (axis 0 to 35) versus Threads (1, 2, 4, 8, 16, 24, 32, 40), labeled "Using Arrays" and "Not using Arrays"; panel titles: "IntSet RBTree, update=0%" and "IntSet SkipList, update=0%".]



Write-Update: 10%

Overhead in %

•  No arrays: overhead 7%

•  With arrays: overhead 25%

[Figure: two plots of Overhead (%) (axis 0 to 35) versus Threads (1, 2, 4, 8, 16, 24, 32, 40), labeled "Using Arrays" and "Not using Arrays".]



Experimental Results: In-place vs. Out-place for Multi-version

•  Two implementations of JVSTM in DeuceSTM
   – Out-place strategy
      •  Versions kept in external table with collision list
   – In-place strategy
      •  No external table
      •  Versions kept in metainfo field

•  How much faster is the in-place implementation?


Experimental Evaluation: Multi-version

Write-Update: 10%

Speedup in X faster

•  Average 5 times faster

[Figure: two plots of Speedup (x faster) (axis 0 to 35) versus Threads (1, 2, 4, 8, 16, 24, 32, 40), labeled "Using Arrays" and "Not using Arrays"; series: JVSTM-In.]


Experimental Results: Multi-version

•  JVSTM algorithm has a performance bottleneck in garbage collection of unused versions
   –  Is it limiting the speedup for in-place?

•  JVSTM-noGC: an extension of JVSTM where
   – Version lists are fixed sized
   – No garbage collection of unused versions
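One way to picture a fixed-size version list is a ring buffer that overwrites its oldest entry instead of relying on a collector. The sketch below is a guess at the general idea only: the class `FixedVersionList`, the overwrite policy, and the choice to throw when a needed version is gone are assumptions, not a description of how JVSTM-noGC actually behaves.

```java
/** Hypothetical fixed-size version list: a ring buffer that overwrites the oldest version. */
final class FixedVersionList {
    private final int[] timestamps;
    private final int[] values;
    private int count; // total number of versions ever written

    FixedVersionList(int capacity, int initialValue) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be >= 1");
        timestamps = new int[capacity];
        values = new int[capacity];
        values[0] = initialValue; // initial version carries timestamp 0
        count = 1;
    }

    /** Adds a version; commit timestamps are assumed to be increasing. */
    synchronized void commit(int timestamp, int value) {
        int i = count % timestamps.length; // slot of the oldest entry once full
        timestamps[i] = timestamp;
        values[i] = value;
        count++;
    }

    /** Returns the newest value with timestamp <= snapshot; throws if it was overwritten. */
    synchronized int read(int snapshot) {
        int live = Math.min(count, timestamps.length);
        for (int k = 1; k <= live; k++) {           // scan from newest to oldest
            int i = (count - k) % timestamps.length;
            if (timestamps[i] <= snapshot) return values[i];
        }
        throw new IllegalStateException("version for snapshot " + snapshot + " was overwritten");
    }
}
```

With capacity 2, committing two versions after the initial one evicts the initial value, so a reader holding snapshot 0 can no longer be served. In this sketch that reader would have to abort and retry with a newer snapshot.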


Experimental Evaluation: Multi-version

Speedup in X faster

•  Average 20-25 times faster

[Figure: two plots of Speedup (x faster) (axis 0 to 35) versus Threads (1, 2, 4, 8, 16, 24, 32, 40); series: JVSTM-In, JVSTM-noGC.]



Experimental Evaluation: JVSTM vs TL2

[Figure: two plots of Throughput (transactions/s x 1e6) (axis 0 to 4.5) versus Threads (1, 2, 4, 8, 16, 32); series: TL2, JVSTM-noGC.]


Concluding Remarks

•  In-place strategy support allows:
   – Efficient support of primitive types
      •  Avoids boxing and unboxing
   – Efficient implementation of multi-version algorithms
   – Fair comparison of different kinds of STM algorithms
   – Support for distributed STM algorithms
      •  Ongoing work


The End
