Nonblocking Transactions Without Indirection Using Alert-on-Update

Michael Spear, Arrvindh Shriraman, Luke Dalessandro, Sandhya Dwarkadas, Michael Scott

University of Rochester

Software Transactional Memory

• Memory transactions
  – Code regions identified by the programmer
  – Guaranteed to be atomic, consistent, and isolated
  – An alternative to locks
• Speculative parallelism
• Under the hood:
  – Rollback/retry mechanism
  – Frequent checks ensure consistency of reads

Simple 2-phase locking STM:

Attach version# to every location

To read: remember {location, version#}

To write: store in private buffer

To commit:
  1. Lock all write locations
  2. Check version#s of reads; abort/retry on conflict
  3. Replay writes from private buffer
  4. Release locks, update version#s
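The 2-phase locking algorithm above can be sketched in Python as a single-process model. This is a minimal illustration under stated assumptions, not RSTM code: the names `Location`, `Transaction`, and `Abort` are hypothetical, a `threading.Lock` stands in for each location's commit lock, and reads are validated only at commit time, so the sketch omits the frequent mid-transaction checks a real STM performs.

```python
import threading


class Abort(Exception):
    """Raised when commit-time validation detects a conflict."""


class Location:
    """Hypothetical shared word: a value, a version#, and a commit lock."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self.lock = threading.Lock()


class Transaction:
    def __init__(self):
        self.reads = {}   # Location -> version# seen at first read
        self.writes = {}  # Location -> speculative value (private buffer)

    def read(self, loc):
        if loc in self.writes:  # read-after-write is served from the buffer
            return self.writes[loc]
        # Record the version# *before* reading the value, so a commit that
        # lands between the two steps is caught by validation.
        self.reads.setdefault(loc, loc.version)
        return loc.value

    def write(self, loc, value):
        self.writes[loc] = value  # shared memory is untouched until commit

    def commit(self):
        # 1. Lock all write locations, in a fixed order to avoid deadlock.
        locked = sorted(self.writes, key=id)
        for loc in locked:
            loc.lock.acquire()
        try:
            # 2. Check version#s of reads; abort on conflict.
            for loc, seen in self.reads.items():
                held_by_other = loc.lock.locked() and loc not in self.writes
                if loc.version != seen or held_by_other:
                    raise Abort()
            # 3. Replay writes from the private buffer.
            for loc, value in self.writes.items():
                loc.value = value
            # 4. Update version#s; the locks are released in `finally`.
            for loc in locked:
                loc.version += 1
        finally:
            for loc in locked:
                loc.lock.release()
```

Acquiring the write locks in `id` order is just one simple way to keep two committers from deadlocking in this model.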

Nonblocking STM

• How can we commit speculative writes atomically without locking?

Tx1 will modify O1…O4:
  1. Tx1 generates speculative writes
  2. Tx1 acquires O1…O4
  3. A single atomic operation:
    – Changes Tx1 to Committed
    – Makes writes permanent
    – Releases O1…O4

Figure: Tx1 (status Active, then Committed) has acquired O1–O4, whose current contents are AAAAA, BBBBB, CCCCC, and DDDDD; its speculative versions O1'–O4' hold 11111, 22222, 33333, and 44444.
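The commit step can be modeled with a transaction descriptor whose status word is the single point of atomicity. In this hypothetical Python sketch, `cas_status` emulates a hardware compare-and-swap using a lock that guards only that one word (it is not a commit lock), and `acquire` assumes no *other active* transaction owns the object; resolving such conflicts concurrently is the hard part a real nonblocking STM must handle.

```python
import threading

ACTIVE, COMMITTED, ABORTED = "ACTIVE", "COMMITTED", "ABORTED"


class TxDescriptor:
    def __init__(self):
        self.status = ACTIVE
        self._word = threading.Lock()  # emulates an atomic CAS on `status`

    def cas_status(self, expected, new):
        with self._word:
            if self.status != expected:
                return False
            self.status = new
            return True


class TObject:
    """Hypothetical object: committed contents plus an owner's speculative copy."""

    def __init__(self, value):
        self.committed = value
        self.owner = None
        self.speculative = None

    def logical_value(self):
        # The owner's status decides which copy is the current value.
        if self.owner is not None and self.owner.status == COMMITTED:
            return self.speculative
        return self.committed


def acquire(tx, obj, new_value):
    """Sketch only: assumes obj has no active owner other than tx."""
    obj.committed = obj.logical_value()  # fold in any previous owner's result
    obj.owner, obj.speculative = tx, new_value
```

After the one successful `cas_status(ACTIVE, COMMITTED)`, no per-object work is needed for the writes to become visible: every reader consults the owner's status, and a later acquirer first folds the committed owner's speculative copy into the object.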

Indirection-Based Nonblocking STM

• Locator object
  – Lists last version
  – Lists next version
  – Choice depends on state of owner
• Costs of indirection:
  – Increased working set
  – More capacity/coherence misses
• Existing indirection-free solutions are complex

Figure: DSTM-style metadata [Herlihy et al., PODC 03]. A locator records the Owner (Tx1, Active), the Old Version (O1: AAAAA), and the New Version (O1': BBBBB).
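A DSTM-style locator can be sketched the same way. In this hypothetical Python model (not DSTM's actual code), each object's header points to an immutable `Locator`; a writer installs a fresh locator with one CAS, and every read follows header → locator → version, which is the extra indirection the slide charges for. `cas_start` emulates hardware CAS with a lock, and contention management is reduced to refusing to open an object held by another active transaction.

```python
import threading
from dataclasses import dataclass

ACTIVE, COMMITTED, ABORTED = "ACTIVE", "COMMITTED", "ABORTED"


class Tx:
    def __init__(self, status=ACTIVE):
        self.status = status


@dataclass(frozen=True)
class Locator:
    owner: Tx
    old_version: object
    new_version: object

    def current(self):
        # Choice depends on the state of the owner.
        if self.owner.status == COMMITTED:
            return self.new_version
        return self.old_version


class TMObject:
    def __init__(self, data):
        # Start with a locator owned by an already-committed placeholder.
        self.start = Locator(Tx(COMMITTED), None, data)
        self._word = threading.Lock()  # emulates CAS on the `start` pointer

    def cas_start(self, expected, new):
        with self._word:
            if self.start is not expected:
                return False
            self.start = new
            return True

    def read(self):
        return self.start.current()  # header -> locator -> data

    def open_write(self, tx, clone):
        """Return tx's private copy, or None if another active tx owns it."""
        while True:
            loc = self.start
            if loc.owner is tx:
                return loc.new_version
            if loc.owner.status == ACTIVE:
                return None  # a real STM would consult a contention manager
            cur = loc.current()
            new_loc = Locator(tx, cur, clone(cur))
            if self.cas_start(loc, new_loc):
                return new_loc.new_version
```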

Outline

• Background
• Alert-on-Update (AOU)
• AOU for indirection-free STM
• AOU for lightweight validation
• Evaluation
• Future work
• Conclusions

Alert-on-Update

• Claim: some cache coherence events are interesting
• Alert-on-Update (AOU)
  – Special instruction marks cache lines of interest
  – Cache controller notifies processor when a marked line is evicted
  – Processor immediately jumps to a user-mode handler
• No O/S involvement or context switching (but can be virtualized across context switches)
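The mechanism can be illustrated with a toy software model of cores sharing memory. Everything here is hypothetical (`Core`, `Bus`, and a 64-byte line size are assumptions of the sketch, not the hardware interface): a store by one core invalidates other cores' copies of the line, losing a marked line raises an alert, and the alert is taken at the alerted core's next instruction by calling its user-level handler. Capacity evictions would also raise alerts but are not modeled, and, matching the current implementation's limited precision, the handler is not told which line caused the alert.

```python
LINE = 64  # assumed cache-line size in bytes


class Bus:
    """Toy coherence fabric: memory plus the cores that snoop it."""

    def __init__(self):
        self.memory = {}
        self.cores = []

    def write(self, writer, addr, value):
        for core in self.cores:
            if core is not writer:
                core.invalidate(addr // LINE)  # other copies are invalidated
        self.memory[addr] = value


class Core:
    def __init__(self, bus):
        self.bus = bus
        self.marked = set()  # lines with the extra "alert" bit set
        self.handler = None
        self.pending = False
        bus.cores.append(self)

    def set_handler(self, fn):
        self.handler = fn

    def aload(self, addr):
        self._take_alert()
        self.marked.add(addr // LINE)  # mark the line, then load from it
        return self.bus.memory.get(addr, 0)

    def arelease(self, addr):
        self.marked.discard(addr // LINE)

    def load(self, addr):
        self._take_alert()
        return self.bus.memory.get(addr, 0)

    def store(self, addr, value):
        self._take_alert()
        self.bus.write(self, addr, value)

    def invalidate(self, line):
        # Cache controller: losing a marked line raises an alert.
        if line in self.marked:
            self.marked.discard(line)
            self.pending = True

    def _take_alert(self):
        # Alerts are taken at an instruction boundary, without entering the OS.
        if self.pending and self.handler is not None:
            self.pending = False
            self.handler()
```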

AOU Hardware Requirements

• Registers:
  – Address of handler, PC at time of alert
  – Extra status bits for cause of alert, disabling alerts
  – Extra entry in interrupt vector table
• Cache:
  – One extra bit per cache line
• Instructions:
  – Set/clear handler
  – Mark and load line (aload)
  – Un-mark line (arelease)
  – Un-mark all lines
  – Enable/disable alerts

A lightweight implementation that supports only one AOU line adds one register and removes the need for extra bits in the cache.
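The lightweight one-line variant can be modeled as a single watch register per core plus the enable/disable status bit. In this hypothetical Python sketch, `snoop` stands for the cache controller observing a remote write to, or an eviction of, a line; an `aload` of a new line simply replaces the watched one, and an alert that arrives while alerts are disabled is held until they are re-enabled. The class and method names are illustrative, not an actual ISA.

```python
LINE = 64  # assumed cache-line size in bytes


class OneLineAOU:
    """Hypothetical per-core state for single-line alert-on-update."""

    def __init__(self, handler):
        self.handler = handler  # handler address register
        self.watch = None       # the one extra register: watched line number
        self.enabled = True     # status bit: alerts enabled
        self.pending = False    # alert raised while alerts were disabled

    def aload(self, memory, addr):
        self.watch = addr // LINE  # marking a new line replaces the old one
        return memory.get(addr, 0)

    def arelease(self):
        self.watch = None

    def disable(self):
        self.enabled = False

    def enable(self):
        self.enabled = True
        if self.pending:
            self.pending = False
            self.handler()

    def snoop(self, addr):
        """Remote write to (or eviction of) the line containing addr."""
        if self.watch is None or addr // LINE != self.watch:
            return
        self.watch = None
        if self.enabled:
            self.handler()
        else:
            self.pending = True
```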

Current Implementation Limitations

• Virtualization is the responsibility of user code
  – Context switch clears all alert bits, calls handler on return
  – Handler can re-aload lines
  – Alerts are deferred on other kernel calls
• Limited by size of cache
• Limited precision
  – Alerts masked within handler
  – Location causing alert not currently provided
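Because a context switch clears every alert bit, user code has to re-establish its marks when the handler runs on return. A hypothetical Python sketch of that handler logic follows; `Thread`, the `saved` table, and the use of a per-address version number as the stored value are assumptions for illustration. The handler re-aloads each line *before* comparing its version, so a change after the re-mark would raise a fresh alert rather than slip through.

```python
class Thread:
    """Hypothetical model of user-level AOU virtualization."""

    def __init__(self, memory):
        self.memory = memory  # addr -> version number
        self.marked = {}      # addr -> version observed when aload'ed
        self.saved = {}
        self.alert_pending = False

    def aload(self, addr):
        self.marked[addr] = self.memory[addr]
        return self.marked[addr]

    def context_switch(self):
        # Kernel side: clear all alert bits, schedule the handler for return.
        self.saved = dict(self.marked)
        self.marked.clear()
        self.alert_pending = True

    def resume(self):
        """Returns False if the transaction must roll back."""
        if not self.alert_pending:
            return True
        self.alert_pending = False
        return self.handler()

    def handler(self):
        # Cause is "context switch", not necessarily a conflict: re-mark each
        # line first, then check whether it changed while descheduled.
        for addr, seen in self.saved.items():
            if self.aload(addr) != seen:
                return False
        return True
```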

Simple, Nonblocking, Indirection-Free STM

• Only one AOU line required per processor
• STM stores speculative writes in per-object buffers
• To write (after commit), use AOU revocable locks
  – Lock the object, replay stores, release lock
  – Only lock/replay one location/object at a time

Figure: object metadata layout. Labels: Version#/Owner/Lock; Data Pointer; Object Contents; Master Copy; Redo Log; Old Version#; In-Progress Modifications.
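One way to picture the per-object metadata and the post-commit write-back is the hypothetical Python model below. The field names (`version`, `log`, `locked`, `master`, `mods`) are illustrative guesses at the layout in the figure, not the paper's definitions, and the model is single-threaded: it shows that readers use the master copy in place (no indirection) and that a committed owner's redo log is replayed into one object at a time. Revocation, which is what makes the replay nonblocking, is left out.

```python
ACTIVE, COMMITTED = "ACTIVE", "COMMITTED"


class Tx:
    def __init__(self):
        self.status = ACTIVE


class RedoLog:
    def __init__(self, owner, old_version):
        self.owner = owner
        self.old_version = old_version
        self.mods = {}  # in-progress modifications: field -> value


class Obj:
    def __init__(self, **fields):
        self.version = 0            # version#
        self.log = None             # owner's redo log, if any
        self.locked = False         # held only while replaying
        self.master = dict(fields)  # master copy, read in place


def replay(obj):
    """Copy a committed owner's stores into the master copy."""
    log = obj.log
    if log is None or log.owner.status != COMMITTED:
        return
    obj.locked = True
    obj.master.update(log.mods)  # idempotent: safe to repeat
    obj.version = log.old_version + 1
    obj.log = None
    obj.locked = False


def write(tx, obj, field, value):
    replay(obj)  # finish any committed owner's write-back first
    if obj.log is None or obj.log.owner is not tx:
        obj.log = RedoLog(tx, obj.version)  # sketch: assumes no active rival
    obj.log.mods[field] = value


def read(obj, field):
    replay(obj)  # help finish the write-back before reading
    return obj.master[field]
```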

Revocable Locks with AOU

• Our lock protects an idempotent operation
  – Anyone can replay stores; none may use the object until replay is complete
• Use AOU to guard the lock
  – Revocation immediately halts replay in the current thread
  – Wait (briefly) before re-acquiring
  – Lock release is immediately visible to waiting threads

top:
try
  set_handler({throw A})
  aload(lock)
  if (version changed)
    arelease(lock)
    goto bottom
  if (lock->locked)
    wait
  overwrite lock
  replay writes
  release lock (version++)
  arelease(lock)
catch (A)
  goto top
bottom:
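The pseudocode above can be exercised in a deterministic Python model in which each replayer is a generator that yields after every store, so a test can interleave two of them. The model is hypothetical: `LockWord.watchers` stands in for the threads that have `aload`ed the lock's line, a store to that line sets `alerted` on the others, and the `Revoked` exception plays the role of `throw A`. Because replay is idempotent, a thread that steals the lock can simply redo the stores.

```python
class Revoked(Exception):
    """Plays the role of `throw A` in the alert handler."""


class LockWord:
    def __init__(self):
        self.holder = None     # current (revocable) owner of the lock
        self.version = 0       # bumped when the replay completes
        self.watchers = set()  # replayers that aload'ed the lock line

    def store(self, who, holder, bump=False):
        # Any store to the lock's line alerts the *other* watchers.
        for w in self.watchers - {who}:
            w.alerted = True
        self.holder = holder
        if bump:
            self.version += 1


class Replayer:
    def __init__(self, lock, log, master):
        self.lock, self.log, self.master = lock, log, master
        self.alerted = False

    def _check(self):
        if self.alerted:  # alert delivered: jump to the handler
            self.alerted = False
            raise Revoked()

    def steps(self, start_version):
        lock = self.lock
        while True:                                  # top
            try:
                lock.watchers.add(self)              # aload(lock)
                if lock.version != start_version:    # replay already done
                    lock.watchers.discard(self)      # arelease(lock)
                    return                           # goto bottom
                if lock.holder is not None:
                    yield                            # wait (briefly)
                    self._check()
                lock.store(self, self)               # overwrite lock
                for key, value in self.log.items():  # replay writes
                    self._check()
                    self.master[key] = value
                    yield
                self._check()
                lock.store(self, None, bump=True)    # release lock (version++)
                lock.watchers.discard(self)          # arelease(lock)
                return
            except Revoked:
                continue                             # catch (A): goto top
```

In the interleaving below, A stalls after its first store, B waits once, steals the lock, and completes the replay; A's next step takes the alert, sees the bumped version, and exits without storing again.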

AOU for Lightweight Validation

• Suppose we can aload many lines
• Recall 2PL STM algorithm
• On read, don’t store {location, version#}
  – Instead, aload(location)
• At commit, don’t validate
  – Any conflict would have caused an alert
• On alert, rollback/retry

Attach version# to every location

To read: aload(location) (rather than remember {location, version#})

To write: store in private buffer

To commit:
  1. Lock all write locations
  2. Check version#s of reads (unnecessary: any conflict would have caused an alert)
  3. Replay writes from private buffer
  4. Release locks, update version#s
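The modified algorithm can be sketched with a toy memory that remembers which transactions have `aload`ed each word. In this hypothetical, single-threaded Python model, a committing writer's store sets `alerted` on every other transaction watching that word, and the victim takes the alert (raising `Alert`, i.e., rollback/retry) at its next operation; commit performs no read-set validation at all. Write-set locking is omitted, so this illustrates the validation shortcut, not a complete concurrent STM.

```python
class Alert(Exception):
    """Models the jump to the AOU handler, which rolls the transaction back."""


class Memory:
    def __init__(self, words=None):
        self.words = dict(words or {})
        self.watchers = {}  # addr -> transactions that aload'ed it

    def aload(self, tx, addr):
        self.watchers.setdefault(addr, set()).add(tx)
        return self.words.get(addr, 0)

    def store(self, writer, addr, value):
        for tx in self.watchers.pop(addr, set()):
            if tx is not writer:
                tx.alerted = True  # remote write to a marked line
        self.words[addr] = value

    def release_all(self, tx):
        for txs in self.watchers.values():
            txs.discard(tx)


class Transaction:
    def __init__(self, mem):
        self.mem = mem
        self.writes = {}
        self.alerted = False

    def _poll(self):
        if self.alerted:
            raise Alert()

    def read(self, addr):
        self._poll()
        if addr in self.writes:
            return self.writes[addr]
        return self.mem.aload(self, addr)  # no {location, version#} record

    def write(self, addr, value):
        self._poll()
        self.writes[addr] = value

    def commit(self):
        self._poll()  # no validation: a conflict would already have alerted
        for addr, value in self.writes.items():
            self.mem.store(self, addr, value)
        self.mem.release_all(self)  # un-mark all lines
```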

AOU for Lightweight Validation

• Many TMs validate on every load of a new location
  – O(n²) overhead
• AOU eliminates this overhead for n < sizeof(cache)
  – Limited by associativity
• Fallback to validation only for additional locations
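The quadratic cost and the AOU saving can be made concrete by counting validation checks. The Python sketch below assumes the simplest incremental scheme (on the k-th new read, re-check the k − 1 locations read earlier) and an idealized AOU capacity of `capacity` lines that ignores associativity limits; only locations read after the capacity is exhausted fall back to version-number validation. Both functions are illustrative.

```python
def incremental_checks(n):
    """Checks when every new read re-validates all earlier reads (no AOU)."""
    # 0 + 1 + ... + (n - 1) = n(n - 1)/2, i.e. O(n^2)
    return sum(k - 1 for k in range(1, n + 1))


def aou_checks(n, capacity):
    """Checks when the first `capacity` reads are aload'ed (idealized)."""
    checks = 0
    for k in range(1, n + 1):
        unmarked_earlier = max(0, (k - 1) - capacity)
        checks += unmarked_earlier  # only overflowed locations are re-checked
    return checks
```

For example, `incremental_checks(10)` is 45, while `aou_checks(10, 4)` is 15 and `aou_checks(10, 64)` is 0; once n exceeds the capacity C, the fallback cost in this model is (n − C)(n − C − 1)/2.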

Evaluation

• 6 runtime systems
  – RSTM (nonblocking, indirection, software only)
  – RTM-Lite (RSTM + AOU)
  – LOCK_TM (indirection-free, no AOU)
  – AOU_1 (indirection-free, 1 AOU line)
  – AOU_N (indirection-free, many AOU lines)
  – CGL (coarse-grained locks)
• Simulator
  – Simics/GEMS
  – 16-way CMP (1.2GHz in-order, single issue)
  – Private 64KB L1 (1-cycle latency)
  – Shared 8MB L2 (20-cycle latency)

Indirection Reduction

Hash Table (256 buckets, 33% insert/lookup/remove)

Figure: normalized speedup (normalized to RSTM, 1 thread) vs. threads (1, 2, 4, 8, 16) for CGL, RSTM, LOCK_TM, and AOU_1.

• Reducing indirection has marginal impact
  – Working set is small
  – Fewer cache misses at high thread counts
• AOU adds some overhead
  – In-order cores exaggerate try/catch cost

Indirection Reduction

Red-Black Tree (4096 elements, 33% lookup/insert/remove)

Figure: normalized speedup (normalized to RSTM, 1 thread) vs. threads (1, 2, 4, 8, 16) for CGL, RSTM, LOCK_TM, and AOU_1.

• Reducing indirection can hurt
  – Additional validation required (could reduce with compiler support)
• Quadratic validation still dominates

Validation Reduction

Hash Table (256 buckets, 33% lookup/insert/remove)

Figure: normalized speedup (normalized to RSTM, 1 thread) vs. threads (1, 2, 4, 8, 16) for CGL, RSTM, RTM-Lite, AOU_1, and AOU_N.

• AOU scales and does not admit false positives
• Outperforms other validation heuristics

Validation Reduction

Red-Black Tree (4096 elements, 33% lookup/insert/remove)

Figure: normalized speedup (normalized to RSTM, 1 thread) vs. threads (1, 2, 4, 8, 16) for CGL, RSTM, RTM-Lite, AOU_1, and AOU_N.

• Indirection-free has excess validation
  – Could reduce by cloning code paths
• Still almost 2x speedup, scalable

Future Work

• Non-TM uses (may require AOU for local writes)
  – Fast user-mode thread wakeup
  – Active messages
  – Debugging, watchpoints, code security
  – Poll-free asynchronous I/O
• Additional hardware acceleration for STM
  – Programmable Data Isolation (see our paper at ISCA tomorrow)

Conclusions

• Alert-on-update is a simple, promising extension to modern ISAs
  – Enables low-overhead, indirection-free nonblocking STM
  – Effectively removes O(n²) validation overhead
  – Potential benefit to many shared-memory algorithms
• The effect of indirection on STM is complex
  – Read-only objects are no longer immutable
  – Extra validation can be reduced with compiler support
  – Effect exaggerated by small objects, in-order simulator

http://www.cs.rochester.edu/research/synchronization

Additional Performance Charts

Hash Table

Figure: normalized speedup vs. threads (1, 2, 4, 8, 16) for CGL, RSTM, RSTM+C, RTM-Lite, LOCK, AOU_1, AOU_1+C, and AOU_N.

Red-Black Tree

Figure: normalized speedup vs. threads (1, 2, 4, 8, 16) for CGL, RSTM, RSTM+C, RTM-Lite, LOCK, AOU_1, AOU_1+C, and AOU_N.

Linked List with Early Release

Figure: normalized speedup vs. threads (1, 2, 4, 8, 16) for CGL, RSTM, RSTM+C, RTM-Lite, LOCK, AOU_1, AOU_1+C, and AOU_N.

LFUCache

Figure: normalized speedup vs. threads (1, 2, 4, 8, 16) for CGL, RSTM, RSTM+C, RTM-Lite, LOCK, AOU_1, AOU_1+C, and AOU_N. One off-scale value is labeled 5.88.

Random Graph

Figure: normalized speedup vs. threads (1, 2, 4, 8, 16) for CGL, RSTM, RSTM+C, RTM-Lite, LOCK, AOU_1, AOU_1+C, and AOU_N. Off-scale values are labeled 25.08, 16.75, 12.54, and 8.51.