Hybrid Transactional Memory Nir Shavit MIT and Tel-Aviv University Joint work with Alex Matveev (and describing the work of many in this summer school)



Haswell

Transactional Memory [HerlihyMoss93]

Transactional Memory

• Memory transactions are collections of reads and writes executed atomically
• Should provide
– Disjoint Access Parallelism
• Should maintain internal and external consistency
– External (Serializability): with respect to the interleavings of other transactions.
– Internal (Opacity): the transaction itself should operate on a consistent state.

External Consistency

Application Memory: X = 0, Y = 0

Transaction A: Read y; Write x = 4; Return x+y

Transaction B: Read x; Write y = 4; Return x+y

Cannot both return 4

Canonical synchronization problem all STM/HTM implementations must solve
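The claim that A and B cannot both return 4 can be checked by enumerating the two serial orders. The sketch below is only an illustration: the function names (`txn_a`, `txn_b`, `run_serial`) and the dictionary used as memory are assumptions made for this example, not part of the slides.

```python
def txn_a(mem):
    y = mem["y"]          # Read y
    mem["x"] = 4          # Write x = 4
    return mem["x"] + y   # Return x+y

def txn_b(mem):
    x = mem["x"]          # Read x
    mem["y"] = 4          # Write y = 4
    return x + mem["y"]   # Return x+y

def run_serial(order):
    """Run the transactions one after another on fresh memory (X = 0, Y = 0)."""
    mem = {"x": 0, "y": 0}
    results = {}
    for name, txn in order:
        results[name] = txn(mem)
    return results["A"], results["B"]

# The only two serial executions: A before B, and B before A.
serial_outcomes = {
    run_serial([("A", txn_a), ("B", txn_b)]),
    run_serial([("B", txn_b), ("A", txn_a)]),
}
print(sorted(serial_outcomes))  # [(4, 8), (8, 4)]
```

Whichever transaction runs second sees the other's write and returns 8, so an execution in which both return 4 matches no serial order.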

Locking STMs

[Figure: application memory mapped to an array of versioned write-locks (V#)]

Commit Time Locking (Write Buff)

1. To Read/Write: check unlocked, add to Read/Write set
2. Acquire locks
3. Validate read/write v#'s unchanged
4. Write values
5. Release each lock with v#+1

[Figure: animation of commit-time locking over the Mem and Locks arrays for locations X and Y; phases: Read/Write, Lock, Validate, Write, Unlock; written locations are released with V#+1]
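The five commit-time locking steps can be sketched as a small single-threaded simulation in which the interleaving is scripted by hand. Everything named here (`Memory`, `Transaction`, `Abort`) is hypothetical, and the plain boolean lock bit stands in for what a real implementation would do with an atomic compare-and-swap on the lock word.

```python
class Abort(Exception):
    """Raised when a transaction must abort (the caller may retry)."""

class Memory:
    """Application memory plus one versioned write-lock per location."""
    def __init__(self, names):
        self.values = {n: 0 for n in names}
        self.version = {n: 0 for n in names}     # the V# of each lock
        self.locked = {n: False for n in names}  # the lock bit

class Transaction:
    def __init__(self, mem):
        self.mem = mem
        self.seen = {}       # read/write set: location -> V# observed
        self.write_buf = {}  # buffered writes, applied only at commit

    def _track(self, name):
        # Step 1: check unlocked, add to the read/write set.
        if self.mem.locked[name]:
            raise Abort(name)
        self.seen.setdefault(name, self.mem.version[name])

    def read(self, name):
        self._track(name)
        return self.write_buf.get(name, self.mem.values[name])

    def write(self, name, value):
        self._track(name)
        self.write_buf[name] = value

    def commit(self):
        mem, held = self.mem, []
        try:
            for name in sorted(self.write_buf):       # Step 2: acquire locks
                if mem.locked[name]:
                    raise Abort(name)
                mem.locked[name] = True
                held.append(name)
            for name, version in self.seen.items():   # Step 3: validate V#'s
                if mem.version[name] != version:
                    raise Abort(name)
                if mem.locked[name] and name not in held:
                    raise Abort(name)
            for name, value in self.write_buf.items():
                mem.values[name] = value              # Step 4: write values
                mem.version[name] += 1                # Step 5: new V# ...
        finally:
            for name in held:                         # ... and release
                mem.locked[name] = False

# Scripted interleaving of the external-consistency example.
mem = Memory(["x", "y"])
a, b = Transaction(mem), Transaction(mem)
a.read("y")          # A: Read y
b.read("x")          # B: Read x
a.write("x", 4)      # A: Write x = 4 (buffered)
b.write("y", 4)      # B: Write y = 4 (buffered)
a.commit()           # A commits first and bumps the V# of x
try:
    b.commit()
    b_committed = True
except Abort:
    b_committed = False
print(b_committed, mem.values)
```

In this run B's validation fails because the V# it recorded for x changed when A committed, so B aborts without publishing its write to y.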

Internal Inconsistency (Opacity)[GuerraouiKapalka07]

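Application Memory: X = 8 (then 4), Y = 4

Transaction B: Read x; Read y

Transaction A: Write x = 4

Transaction A: Write y = 2

Compute z = 1/(x-y): DIV by 0 ERROR!

B reads x after A's first write but reads y before A's second write, so it sees x = y = 4.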
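The inconsistent view can be replayed step by step. This is a hand-scripted interleaving, not an STM; the initial values X = 8 and Y = 4 are read off the figure and should be treated as an assumption of this sketch.

```python
mem = {"x": 8, "y": 4}   # a consistent state: x != y

mem["x"] = 4             # Transaction A: Write x = 4
x = mem["x"]             # Transaction B: Read x (already A's new value)
y = mem["y"]             # Transaction B: Read y (still the old value)
mem["y"] = 2             # Transaction A: Write y = 2

try:
    z = 1 / (x - y)      # Transaction B: Compute z = 1/(x-y)
    outcome = "ok"
except ZeroDivisionError:
    outcome = "DIV by 0"

print(x, y, outcome)     # 4 4 DIV by 0
```

Neither consistent state triggers the error: before A runs, x - y = 8 - 4, and after A finishes, x - y = 4 - 2. Only the mixed view, which no serial execution produces, divides by zero.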

TL2/TinySTM’s Global Clock

• Have a shared global version clock

• Incremented by writing transactions (as infrequently as possible)

• Read by all transactions

• Used to validate that the state viewed by a transaction is always consistent (opaque)

[DiceShalevShavit06/RiegelFelberFetzer06]

TL2 Style STM

1. Read VClock
2. Read/Write: if unlocked and v# less than clock, add to Read/Write-Set
3. Acquire locks
4. Increment clock
5. Validate each v# less than clock (the value read in step 1)
6. Write values
7. Release locks with v# = new clock

[Figure: animation of a TL2-style transaction over the Mem and Locks arrays for locations X and Y; VClock values shown: 100, 120, 121; written locations are released with v# = 121; phases: Read Clock, Read/Write, Lock, Inc, Validate, Write, Unlock]
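The seven steps can be sketched in the same hand-scripted, single-threaded style. The class names are hypothetical, and the comparison uses `<=` against the sampled clock (as in the published TL2 algorithm) where the slide says "less than"; a real implementation would also use atomic operations for the locks and the clock.

```python
class Abort(Exception):
    """Raised when a transaction must abort (the caller may retry)."""

class TL2Memory:
    def __init__(self, initial):
        self.clock = 0                             # shared global version clock
        self.values = dict(initial)
        self.version = {n: 0 for n in initial}     # v# per location
        self.locked = {n: False for n in initial}  # lock bit per location

class TL2Transaction:
    def __init__(self, mem):
        self.mem = mem
        self.rv = mem.clock          # Step 1: read the clock
        self.read_set = set()
        self.write_buf = {}

    def _check(self, name):
        # Step 2: location must be unlocked and no newer than our clock sample.
        if self.mem.locked[name] or self.mem.version[name] > self.rv:
            raise Abort(name)

    def read(self, name):
        if name in self.write_buf:
            return self.write_buf[name]
        self._check(name)
        self.read_set.add(name)
        return self.mem.values[name]

    def write(self, name, value):
        self._check(name)
        self.write_buf[name] = value

    def commit(self):
        mem, held = self.mem, []
        try:
            for name in sorted(self.write_buf):         # Step 3: acquire locks
                if mem.locked[name]:
                    raise Abort(name)
                mem.locked[name] = True
                held.append(name)
            mem.clock += 1                              # Step 4: increment clock
            new_version = mem.clock
            for name in self.read_set:                  # Step 5: validate
                if mem.version[name] > self.rv:
                    raise Abort(name)
                if mem.locked[name] and name not in held:
                    raise Abort(name)
            for name, value in self.write_buf.items():  # Step 6: write values
                mem.values[name] = value
                mem.version[name] = new_version         # Step 7: v# = new clock
        finally:
            for name in held:                           # Step 7: release locks
                mem.locked[name] = False

# The opacity scenario: B starts, reads x, then A commits new x and y.
mem = TL2Memory({"x": 8, "y": 4})
b = TL2Transaction(mem)
a = TL2Transaction(mem)
x = b.read("x")                  # B sees the old x
a.write("x", 4)
a.write("y", 2)
a.commit()                       # clock becomes 1; x and y get v# = 1
try:
    y = b.read("y")
    b_aborted = False
except Abort:
    b_aborted = True
print(x, b_aborted, mem.clock)
```

B aborts at its second read because y now carries a v# newer than the clock value B sampled, so B never computes with a mix of old and new values.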

TL2 Style STM

• Advantages
– Great Disjoint Access Parallelism
• Disadvantages
– Accessing meta-data is expensive
– Progress guarantee is only deadlock freedom

NOrec STM

• Use shared global clock as a seqlock

• Validation in every read if a seqlock change is detected

• Value-based validation: no need for meta-data (local time stamps or locks)

[DalessandroSpearScott10]

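NOrec STM

[Figure: animation of a NOrec transaction on locations X, Y and Z; seqlock values shown: 100 to 104; steps shown: Not odd? seqlock; Read/Write (with validation if seqlock changed); Lock seqlock (set odd), with validation if seqlock changed; Write; Unlock seqlock (set even); the values in the R/W set are compared (=) against memory]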
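A minimal single-threaded sketch of the seqlock and value-based validation follows. The names (`NOrecMemory`, `NOrecTx`) are hypothetical; a real implementation would spin while the seqlock is odd, acquire it with a compare-and-swap, and re-read in a loop rather than rely on a scripted interleaving.

```python
class Abort(Exception):
    """Raised when a transaction must abort (the caller may retry)."""

class NOrecMemory:
    def __init__(self, initial):
        self.seqlock = 0              # even = free, odd = a writer is committing
        self.values = dict(initial)   # no per-location meta-data at all

class NOrecTx:
    def __init__(self, mem):
        if mem.seqlock % 2 == 1:      # "Not odd?" (a real thread would wait)
            raise Abort("writer holds the seqlock")
        self.mem, self.snapshot = mem, mem.seqlock
        self.reads, self.writes = {}, {}

    def _validate(self):
        # Value-based validation: every value read so far must still match.
        snapshot = self.mem.seqlock
        if snapshot % 2 == 1:
            raise Abort("writer holds the seqlock")
        for name, value in self.reads.items():
            if self.mem.values[name] != value:
                raise Abort(name)
        self.snapshot = snapshot

    def read(self, name):
        if name in self.writes:
            return self.writes[name]
        if self.mem.seqlock != self.snapshot:   # validate only on a change
            self._validate()
        value = self.mem.values[name]
        self.reads[name] = value
        return value

    def write(self, name, value):
        self.writes[name] = value

    def commit(self):
        if not self.writes:
            return                              # read-only: nothing to publish
        if self.mem.seqlock != self.snapshot:
            self._validate()
        self.mem.seqlock = self.snapshot + 1    # lock seqlock (set odd)
        self.mem.values.update(self.writes)     # write
        self.mem.seqlock = self.snapshot + 2    # unlock seqlock (set even)

mem = NOrecMemory({"x": 8, "y": 4, "z": 0})

# A seqlock change caused by an unrelated write does not abort the reader.
c = NOrecTx(mem)
c.read("x")
d = NOrecTx(mem)
d.write("z", 1)
d.commit()
y_seen = c.read("y")             # seqlock changed, but x still has the value c read

# A conflicting write does abort the reader.
b = NOrecTx(mem)
b.read("x")
a = NOrecTx(mem)
a.write("x", 4)
a.write("y", 2)
a.commit()
try:
    b.read("y")
    b_aborted = False
except Abort:
    b_aborted = True
print(y_seen, b_aborted, mem.seqlock)
```

The first reader survives a commit to an unrelated location because its read values still match, while the second aborts because x changed underneath it. Both commits went through the same seqlock, which is the serialization the slides list as NOrec's disadvantage.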

NOrec STM

• Advantages
– No expensive meta-data
• Disadvantages
– Poor Disjoint Access Parallelism (all writes are serialized by clock)
– Progress guarantee is only starvation freedom

Hardware TM [HerlihyMoss93, IBM/Intel13]

• Advantages
– Everything in hardware, no meta-data
– Great Disjoint Access Parallelism
• Disadvantages
– No progress guarantee; transactions fail because of:
  • Unsupported instructions: system or protected instructions
  • Exceptions: page faults and similar
  • Capacity limit: too many accessed locations

Hybrid TM [Moir, Damron et al., Kumar et al.]

• Fast-Path: Execute transaction using best-effort HTM
– If it aborts because of special instructions or because the transaction is too large, then…
• Slow-Path: Execute transaction using STM

Performance of HTM with progress guarantee of STM
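The fast-path and slow-path structure can be sketched with a simulated transaction that aborts once it touches more locations than a capacity limit. Everything here is a stand-in invented for illustration (`BufferedTx`, `HTMAbort`, `run_hybrid`, the `hw_capacity` parameter); it models only capacity aborts, and real code would also retry the fast path after conflict aborts before falling back.

```python
class HTMAbort(Exception):
    """Simulated abort of a best-effort hardware transaction."""

class BufferedTx:
    """Buffers writes; aborts if more than `capacity` locations are touched."""
    def __init__(self, mem, capacity):
        self.mem, self.capacity = mem, capacity
        self.touched, self.writes = set(), {}

    def _touch(self, name):
        self.touched.add(name)
        if len(self.touched) > self.capacity:
            raise HTMAbort("capacity")

    def read(self, name):
        self._touch(name)
        return self.writes.get(name, self.mem[name])

    def write(self, name, value):
        self._touch(name)
        self.writes[name] = value

    def commit(self):
        self.mem.update(self.writes)

def run_hybrid(mem, body, hw_capacity=2):
    # Fast path: try the transaction as a (simulated) hardware transaction.
    fast = BufferedTx(mem, hw_capacity)
    try:
        body(fast)
        fast.commit()
        return "fast-path"
    except HTMAbort:
        pass  # nothing was published, so falling back is safe
    # Slow path: run the same body under a stand-in for an STM (no limit).
    slow = BufferedTx(mem, capacity=float("inf"))
    body(slow)
    slow.commit()
    return "slow-path"

def small(tx):
    tx.write("a", tx.read("a") + 1)

def large(tx):
    for name in ("a", "b", "c"):
        tx.write(name, tx.read(name) + 10)

mem = {"a": 0, "b": 0, "c": 0}
path_small = run_hybrid(mem, small)
path_large = run_hybrid(mem, large)
print(path_small, path_large, mem)
```

The small transaction commits on the fast path, while the large one exceeds the simulated capacity, discards its buffered writes, and completes on the slow path.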

Traditional Hybrid TM

Hardware Transaction: Test Versioned-Write-Lock in every Read/Write. Update in Write.

Software Transaction: Update locks

[Figure: hardware and software transactions sharing the array of Versioned-Write-Locks]

[DamronFedorovaLevLuchangcoMoirNussbaum06]

Traditional Hybrid TM

• Advantages
– Progress guarantee of STM
• Disadvantages
– HTM must access meta-data
– Fast path is actually slow because of extra load and branch on every read

Traditional Hybrid TM

Phased TM[LevMoirNussbaum07]

• Two modes: all hardware or all software

• Shared global mode indicator

• If some hardware transaction aborts switch to software mode

• Eventually mode reverts back to hardware

Phased TM

• Advantages
– Fast-path pure HTM: no meta-data accesses
• Disadvantages
– Single software transaction causes all HTM to switch to STM slow path
– Not clear how to tune to avoid frequent mode transitions…

Hybrid NOrec (1st Attempt)

Software NOrec (labels as shown): Read/Write (with validation); Not odd? seqlock; Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)

Hardware (labels as shown): Read/Write (no validation); Not odd? seqlock; Write seqlock +2

Software will fail seqlock validation!

Hybrid NOrec (1st Attempt)

Software NOrec (labels as shown): Read/Write (with validation); Not odd? seqlock; Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)

Hardware (labels as shown): Read/Write (no validation); Not odd? seqlock; Write seqlock +2

Hardware will fail seqlock validation!

Hybrid NOrec (1st Attempt)

Software NOrec (labels as shown): Read/Write (with validation); Odd? seqlock; Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)

Hardware (labels as shown): Read/Write (no validation); Not odd? seqlock; Write seqlock +2

Hardware will fail seqlock validation!

Guaranteed External Consistency

Hybrid NOrec (1st Attempt)

Software NOrec (labels as shown): Read/Write (with validation); Not odd? seqlock; Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)

Hardware (labels as shown): Read/Write (no validation); Not odd? seqlock; Write seqlock +2

Hardware will fail seqlock validation!

Problem: hardware opacity

Internal Inconsistency (Opacity) [GuerraouiKapalka07]

Application Memory: X = 8 (then 4), Y = 4

Hardware B: Read x; Read y

Software A: Lock seqlock +1; Write x = 4; Write y = 2; Unlock seqlock +1

Hardware B: Compute z = 1/(x-y) … Odd? seqlock

DIV by 0 ERROR!

Hybrid NOrec (2nd Attempt)

Software NOrec (labels as shown): Read/Write (with validation); Not odd? seqlock; Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)

Hardware (labels as shown): Read/Write (no validation); Not odd? seqlock; Write seqlock +2

Hardware will detect seqlock invalidation!

Guarantee hardware opacity

Hybrid NOrec

• Advantages
– Fast-path HTM: no meta-data accesses
• Disadvantages
– Limited Disjoint Access Parallelism
  – Seqlock is in hardware tracking set throughout HTM transaction
  – Major sequential bottleneck

Possible Solutions

• Forget Opacity, Use sandboxing [DalessandroCarougeWhiteLevMoirScottSpear2011]

• Hybrid Norec 2 [RiegelMarlierNowackFelberFetzer11]: use non-transactional operations in a hardware transaction to read and validate seqlock has not changed after every read

But sandboxing is complex… and non-transactional ops are only available in the AMD proposal, not in actual IBM or Intel hardware…

Reduced Hardware Approach to HyTM

• Use short hardware transactions in the software slow-path

• I.e. create new “mixed” software/hardware path

• Not in order to make slow-path faster
– But rather, in order to remove meta-data accesses from fast path

• Default to all software if mixed path fails

[MatveevShavit13]

Transactional Writes Imply Hardware Opacity

Application Memory: X = 8 (then 4), Y = 4 (then 2)

Hardware B: Read x; Read y

Trans A: Write x = 4; Write y = 2

Compute z = 1/(x-y): DIV by 0 ERROR!

If in a hardware transaction this cannot happen…

Reduced Hardware NOrec

• In Slow-path commit, use a small hardware transaction to:
– Write all values
– Check seqlock has not changed
– Write seqlock+1
• In Fast-path:
– Move seqlock test to end, un-instrumented read/writes

[MatveevShavit13]
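The idea can be sketched with a simulated hardware transaction whose tracking set is checked when another transaction writes back. This is a loose, single-threaded model under stated assumptions, not the published algorithm: the class names and the `doomed` flag are invented for the example, a doomed transaction is reported at its next access (real hardware rolls back immediately), and the all-software fallback is omitted.

```python
class Abort(Exception):
    """Transaction aborted; the caller may retry."""

class Memory:
    def __init__(self, initial):
        self.values = dict(initial)
        self.seqlock = 0
        self.live_hw = []   # simulated hardware transactions still running

    def hw_write_back(self, writer, writes):
        """Simulated atomic hardware commit: write all values, seqlock+1."""
        for tx in self.live_hw:
            if tx is not writer and tx.tracked & set(writes):
                tx.doomed = True            # hardware conflict detection
        self.values.update(writes)
        if writes:
            self.seqlock += 1

class FastPathTx:
    """Fast path: one hardware transaction with uninstrumented reads/writes."""
    def __init__(self, mem):
        self.mem, self.tracked, self.writes = mem, set(), {}
        self.doomed = False
        mem.live_hw.append(self)

    def _retire(self):
        if self in self.mem.live_hw:
            self.mem.live_hw.remove(self)

    def _access(self, name):
        if self.doomed:
            self._retire()
            raise Abort("hardware conflict")
        self.tracked.add(name)

    def read(self, name):
        self._access(name)
        return self.writes.get(name, self.mem.values[name])

    def write(self, name, value):
        self._access(name)
        self.writes[name] = value

    def commit(self):
        self._access("seqlock")   # seqlock is touched only here, at the end
        self._retire()
        self.mem.hw_write_back(self, self.writes)

class SlowPathTx:
    """Mixed slow path: NOrec-style software reads, short hardware commit."""
    def __init__(self, mem):
        self.mem, self.snapshot = mem, mem.seqlock
        self.reads, self.writes = {}, {}

    def _validate(self):
        snapshot = self.mem.seqlock
        for name, value in self.reads.items():
            if self.mem.values[name] != value:
                raise Abort(name)
        self.snapshot = snapshot

    def read(self, name):
        if name in self.writes:
            return self.writes[name]
        if self.mem.seqlock != self.snapshot:
            self._validate()
        value = self.reads[name] = self.mem.values[name]
        return value

    def write(self, name, value):
        self.writes[name] = value

    def commit(self):
        if not self.writes:
            return
        # The check and the write-back together model the small hardware
        # transaction; on a seqlock change, revalidate in software first.
        if self.mem.seqlock != self.snapshot:
            self._validate()
        self.mem.hw_write_back(self, self.writes)

mem = Memory({"x": 8, "y": 4, "z": 0})
b = FastPathTx(mem)
x = b.read("x")                       # fast path reads the old x
a = SlowPathTx(mem)
a.write("x", 4)
a.write("y", 2)
a.commit()                            # short hardware commit dooms b
try:
    b.read("y")
    b_aborted = False
except Abort:
    b_aborted = True

c = FastPathTx(mem)                   # a non-conflicting fast-path writer
c.write("z", 1)
c.commit()
print(x, b_aborted, mem.seqlock, mem.values)
```

Because the slow path publishes its writes inside a (simulated) hardware transaction, the conflict on x is detected by the hardware itself, and the fast path needs the seqlock only at the very end of its transaction.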

Reduced Hardware NOrec

[Figure: Reduced Hardware NOrec paths. Software NOrec path labels: Read seqlock; Read/Write (with validation); Changed? seqlock; Lock seqlock (set odd); Validate; Write; Lock seqlock (set even); In HTM Trans: Write values, Changed? seqlock, seqlock +1. Hardware path labels: Read/Write (no instrumentation); Changed? seqlock; Write seqlock +1]

Hardware will detect write conflict without seqlock!

Guarantee fast-path opacity without having seqlock in TM tracking set for long

Reduced Hardware NOrec

• Properties
– Fast-path: no meta-data; no instrumentation of reads or writes
– Slow-path:
  – short hardware transaction: size of write set
  – can repeatedly attempt short hardware transaction in commit

Reduced Hardware NOrec

• Advantages
– Hardware Disjoint Access Parallelism
– seqlock accessed only at end of HTM transaction
– Surprise: 1st HyTM that is Obstruction-free and Privatizing
• Disadvantages
– Still window of possible abort due to seqlock increment

Reduced Hardware NOrec

Reduced Hardware NOrec

Reduced Hardware TL2 Style

[Figure: Reduced Hardware TL2-style paths. Software TL2-style path labels: Read Clock; Read/Write (validate); Validate; In HTM Trans: Write values. Hardware path labels: Read Clock; Read/Write (no validation); Write values With Clock +1]

Hardware Will See Software

Hardware will detect write conflict

Reduced Hardware TL2 Style

[Figure: Reduced Hardware TL2-style paths. Software TL2-style path labels: Read Clock; Read/Write (validate); Validate; In HTM Trans: Write values. Hardware path labels: Read Clock; Read/Write (no validation); Write values With Clock +1]

Hardware will detect write conflict

Problem: if between validate and hardware write, can have inconsistency

Solution: combine validation and writes in single transaction

In HTM Trans: Validate and Write values

Reduced Hardware TL2 Style

• Advantages
– Complete Disjoint Access Parallelism
– GV6 clock incremented on aborts only
– Obstruction-free
• Disadvantages
– No privatization
– Mixed path transaction size of meta-data set

RH1: Reduced Hardware TL2 Style

RH1: Reduced Hardware TL2 Style

HyTM: Long Journey

• Combination of ideas:
– hardware transactions,
– global clocks,
– no meta-data access,
– mixed hardware/software paths

• And there is still room for improvement