Hybrid Transactional Memory
Nir Shavit, MIT and Tel-Aviv University
Joint work with Alex Matveev (and describing the work of many in this summer school)
Transactional Memory
• Memory transactions are collections of reads and writes executed atomically
• Should provide
– Disjoint Access Parallelism
• Should maintain internal and external consistency
– External (Serializability): with respect to the interleavings of other transactions.
– Internal (Opacity): the transaction itself should operate on a consistent state.
External Consistency
Application Memory: X = 0, Y = 0
Transaction A: Read y; Write x = 4; Return x+y
Transaction B: Read x; Write y = 4; Return x+y
Cannot both return 4
Canonical synchronization problem all STM/HTM implementations must solve
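To make the "cannot both return 4" claim concrete, the sketch below runs the two transactions from the slide in both serial orders and then in one unsynchronized interleaving. The function names (`txn_a`, `txn_b`, `run_serial`, `run_racy`) are illustrative only and do not come from the talk.

```python
def txn_a(mem):
    y = mem["y"]           # Read y
    mem["x"] = 4           # Write x = 4
    return mem["x"] + y    # Return x+y


def txn_b(mem):
    x = mem["x"]           # Read x
    mem["y"] = 4           # Write y = 4
    return x + mem["y"]    # Return x+y


def run_serial(first, second):
    """Run two transactions one after the other on fresh memory X = Y = 0."""
    mem = {"x": 0, "y": 0}
    return first(mem), second(mem)


def run_racy():
    """One non-serializable interleaving: both reads happen before either write."""
    mem = {"x": 0, "y": 0}
    a_y = mem["y"]         # A: Read y
    b_x = mem["x"]         # B: Read x
    mem["x"] = 4           # A: Write x = 4
    mem["y"] = 4           # B: Write y = 4
    return mem["x"] + a_y, b_x + mem["y"]


a_then_b = run_serial(txn_a, txn_b)   # (A's result, B's result)
b_then_a = run_serial(txn_b, txn_a)   # (B's result, A's result)
racy = run_racy()                     # (A's result, B's result)
```

In either serial order the first transaction returns 4 and the second returns 8. Only the unsynchronized interleaving lets both return 4, which is the outcome an STM or HTM has to rule out.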
Commit Time Locking (Write Buffer)
1. To Read/Write: check unlocked, add to Read/Write set
2. Acquire Locks
3. Validate read/write v#'s unchanged
4. Write Values
5. Release each lock with v#+1
[Figure: memory words X and Y, each with a versioned lock (v#, lock bit), animated through the phases Read/Write, Lock, Validate, Write, Unlock; written locations are released with v#+1]
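The five commit-time locking steps can be sketched as a small single-threaded simulation. This is a minimal sketch under stated assumptions, not the implementation from the talk: the class names (`Memory`, `Tx`, `AbortTx`) are hypothetical, and lock acquisition is a plain flag update where a real STM would use an atomic compare-and-swap.

```python
class AbortTx(Exception):
    """Raised when a transaction must abort and retry."""


class Memory:
    def __init__(self, **initial):
        self.values = dict(initial)
        self.version = {addr: 0 for addr in initial}     # v# per location
        self.locked = {addr: False for addr in initial}  # lock bit per location


class Tx:
    def __init__(self, mem):
        self.mem = mem
        self.seen = {}          # addr -> v# observed at first access
        self.write_buffer = {}  # addr -> value, applied only at commit

    def _track(self, addr):
        # Step 1: check unlocked, add to the read/write set.
        if self.mem.locked[addr]:
            raise AbortTx(addr)
        self.seen.setdefault(addr, self.mem.version[addr])

    def read(self, addr):
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        self._track(addr)
        return self.mem.values[addr]

    def write(self, addr, value):
        self._track(addr)
        self.write_buffer[addr] = value

    def commit(self):
        mem, acquired = self.mem, []
        try:
            # Step 2: acquire locks (sorted order avoids deadlock).
            for addr in sorted(self.write_buffer):
                if mem.locked[addr]:
                    raise AbortTx(addr)
                mem.locked[addr] = True
                acquired.append(addr)
            # Step 3: validate that every observed v# is unchanged.
            for addr, version in self.seen.items():
                foreign_lock = mem.locked[addr] and addr not in self.write_buffer
                if foreign_lock or mem.version[addr] != version:
                    raise AbortTx(addr)
            # Step 4: write values; step 5 below releases with v#+1.
            for addr, value in self.write_buffer.items():
                mem.values[addr] = value
                mem.version[addr] += 1
        finally:
            for addr in acquired:
                mem.locked[addr] = False


# Transactions A and B from the External Consistency slide, interleaved.
mem = Memory(x=0, y=0)
a, b = Tx(mem), Tx(mem)
a_y = a.read("y")
b_x = b.read("x")
a.write("x", 4)
b.write("y", 4)
a.commit()
try:
    b.commit()
    b_committed = True
except AbortTx:
    b_committed = False
```

A commits first and bumps the version of x, so B's validation of x fails and B aborts instead of also returning 4.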
Internal Inconsistency (Opacity) [GuerraouiKapalka07]
[Figure: memory locations X and Y; the values 8 and 4 are shown]
Transaction A: Write x = 4; Write y = 2
Transaction B: Read x; Read y; Compute z = 1/(x-y)
DIV by 0 ERROR!
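The hazard on this slide can be replayed deterministically. The sketch assumes, from the digits that survive in the figure, that memory starts at X = 8 and Y = 4, and that B's two reads fall between A's two writes; both are assumptions, and the function names are illustrative.

```python
def consistent_results():
    """z as computed from the state before A runs and from the state after A commits."""
    before = {"x": 8, "y": 4}   # assumed initial values
    after = {"x": 4, "y": 2}    # values once A has written both locations
    return [1 / (s["x"] - s["y"]) for s in (before, after)]


def zombie_interleaving():
    """B reads x after A's first write but reads y before A's second write."""
    mem = {"x": 8, "y": 4}
    mem["x"] = 4                # A: Write x = 4
    b_x = mem["x"]              # B: Read x  (new value)
    b_y = mem["y"]              # B: Read y  (old value)
    mem["y"] = 2                # A: Write y = 2
    try:
        return 1 / (b_x - b_y)  # B: Compute z = 1/(x-y)
    except ZeroDivisionError:
        return "DIV by 0"


serial_z = consistent_results()
zombie_z = zombie_interleaving()
```

Both consistent states satisfy x != y, so the division is safe in any serial execution. B fails only because it ran on a mix of old and new values, even though a commit-time check would have aborted it later.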
TL2/TinySTM’s Global Clock
• Have a shared global version clock
• Incremented by writing transactions (as infrequently as possible)
• Read by all transactions
• Used to validate that the state viewed by a transaction is always consistent (opacity)
[DiceShalevShavit06/RiegelFelberFetzer06]
TL2 Style STM
1. Read VClock
2. Read/Write: if unlocked and v# less than clock, add to Read/Write-Set
3. Acquire Locks
4. Increment Clock
5. Validate each v# less than clock
6. Write values
7. Release locks with v# = new clock
[Figure: memory words X and Y with versioned locks (v#, lock bit) and the global VClock advancing 100 → 120 → 121; phases shown: Read Clock, Read/Write, Lock, Inc, Validate, Write, Unlock; written locations are released with v# = 121]
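The seven TL2-style steps can be sketched as follows. This is a single-threaded illustration with hypothetical names (`TL2Memory`, `TL2Tx`, `AbortTx`), not the authors' code. It assumes the validity test is "v# not greater than the clock value read at start", and the lock bit and clock updates would be atomic operations in a real implementation.

```python
class AbortTx(Exception):
    """Raised when a transaction must abort and retry."""


class TL2Memory:
    def __init__(self, **initial):
        self.clock = 0                                   # shared global version clock
        self.values = dict(initial)
        self.version = {addr: 0 for addr in initial}
        self.locked = {addr: False for addr in initial}


class TL2Tx:
    def __init__(self, mem):
        self.mem = mem
        self.rv = mem.clock     # step 1: read the clock
        self.read_set = set()
        self.write_set = {}

    def _check(self, addr):
        # Step 2: location must be unlocked and no newer than our clock sample.
        if self.mem.locked[addr] or self.mem.version[addr] > self.rv:
            raise AbortTx(addr)

    def read(self, addr):
        if addr in self.write_set:
            return self.write_set[addr]
        self._check(addr)
        self.read_set.add(addr)
        return self.mem.values[addr]

    def write(self, addr, value):
        self._check(addr)
        self.write_set[addr] = value

    def commit(self):
        mem, acquired = self.mem, []
        if not self.write_set:
            return              # read-only: every read was already validated
        try:
            for addr in sorted(self.write_set):          # step 3: acquire locks
                if mem.locked[addr]:
                    raise AbortTx(addr)
                mem.locked[addr] = True
                acquired.append(addr)
            mem.clock += 1                               # step 4: increment clock
            wv = mem.clock
            for addr in self.read_set:                   # step 5: validate read set
                foreign_lock = mem.locked[addr] and addr not in self.write_set
                if foreign_lock or mem.version[addr] > self.rv:
                    raise AbortTx(addr)
            for addr, value in self.write_set.items():   # step 6: write values
                mem.values[addr] = value
                mem.version[addr] = wv                   # step 7: v# = new clock
        finally:
            for addr in acquired:
                mem.locked[addr] = False


mem = TL2Memory(x=8, y=4)
b = TL2Tx(mem)
b_x = b.read("x")               # B sees the old x
a = TL2Tx(mem)
a.write("x", 4)
a.write("y", 2)
a.commit()                      # clock becomes 1, x and y get v# = 1
try:
    b.read("y")
    b_aborted = False
except AbortTx:
    b_aborted = True
```

B aborts at the read of y, before it can compute on a mix of old x and new y. The clock comparison on every read is what gives opacity, at the cost of touching per-location metadata.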
TL2 Style STM
• Advantages
– Great Disjoint Access Parallelism
• Disadvantages
– Accessing Meta-Data is Expensive
– Progress guarantee is only deadlock freedom
NOrec STM
• Use shared global clock as a seqlock
• Validation in every read if a seqlock change is detected
• Value-based validation: no need for meta-data (local time stamps or locks)
[DalessandroSpearScott10]
NOrec STM
[Figure: memory locations X, Y, Z, a transaction's R/W set, and the global seqlock advancing from 100 to 104. Phases shown: Not odd? seqlock; Read/Write (with validation if seqlock changed); Lock seqlock (set odd), with validation if seqlock changed; Write; Unlock seqlock (set even)]
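A minimal single-threaded sketch of the NOrec scheme described above: one global seqlock, no per-location metadata, and value-based validation whenever the seqlock is seen to have moved. The names (`NOrecMemory`, `NOrecTx`, `AbortTx`) are hypothetical, and the seqlock acquisition is a plain increment where a real implementation would use compare-and-swap and would spin rather than abort on an odd value.

```python
class AbortTx(Exception):
    """Raised when a transaction must abort and retry."""


class NOrecMemory:
    def __init__(self, **initial):
        self.seqlock = 0            # even = free, odd = a writer is committing
        self.values = dict(initial)


class NOrecTx:
    def __init__(self, mem):
        if mem.seqlock % 2:         # "Not odd?" check at start
            raise AbortTx("writer in progress")
        self.mem = mem
        self.snapshot = mem.seqlock
        self.reads = {}             # addr -> value seen (no version numbers)
        self.writes = {}

    def _validate(self):
        # Value-based validation: every value read so far must still be in memory.
        if self.mem.seqlock % 2:
            raise AbortTx("writer in progress")
        for addr, seen in self.reads.items():
            if self.mem.values[addr] != seen:
                raise AbortTx(addr)
        self.snapshot = self.mem.seqlock

    def read(self, addr):
        if addr in self.writes:
            return self.writes[addr]
        value = self.mem.values[addr]
        while self.mem.seqlock != self.snapshot:   # seqlock changed: revalidate
            self._validate()
            value = self.mem.values[addr]
        self.reads[addr] = value
        return value

    def write(self, addr, value):
        self.writes[addr] = value

    def commit(self):
        if not self.writes:
            return
        if self.mem.seqlock != self.snapshot:
            self._validate()
        self.mem.seqlock += 1       # lock: set odd
        self.mem.values.update(self.writes)
        self.mem.seqlock += 1       # unlock: set even


mem = NOrecMemory(x=8, y=4, z=0)
b = NOrecTx(mem)
b_x = b.read("x")

d = NOrecTx(mem)                    # unrelated writer touches only z
d.write("z", 1)
d.commit()
b_y = b.read("y")                   # seqlock moved, but x is unchanged: B survives

a = NOrecTx(mem)                    # conflicting writer changes x and y
a.write("x", 4)
a.write("y", 2)
a.commit()
try:
    b.read("z")
    b_aborted = False
except AbortTx:
    b_aborted = True
```

The unrelated commit forces B to revalidate but not to abort, while the conflicting commit is caught by comparing values. Every writer still passes through the single seqlock.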
NOrec STM
• Advantages
– No Expensive Meta-Data
• Disadvantages
– Poor Disjoint Access Parallelism (all writes are serialized by clock)
– Progress guarantee is only starvation freedom
Hardware TM [HerlihyMoss93, IBM/Intel13]
• Advantages
– Everything in Hardware, No Meta Data
– Great Disjoint Access Parallelism
• Disadvantages
– No Progress Guarantee; Fail because of:
  • Unsupported instructions: system or protected instructions
  • Exceptions: page faults and similar
  • Capacity limit: too many accessed locations
Hybrid TM [Moir; Damron et al.; Kumar et al.]
• Fast-Path: Execute Trans Using Best Effort HTM
– If it Aborts because of Special Instructions or Transaction Too Large, then…
• Slow-Path: Execute Trans Using STM
Performance of HTM with progress guarantee of STM
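The fast-path/slow-path control flow can be sketched as a retry wrapper. Everything here is a stand-in: `make_fake_htm` simulates a best-effort HTM whose only failure mode is a capacity limit, `make_fake_stm` runs the body directly on memory, and the abort reasons and retry count are assumptions chosen for illustration.

```python
class HtmAbort(Exception):
    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason


class TrackingView:
    """Buffers writes and counts distinct locations, like an HTM tracking set."""

    def __init__(self, mem, capacity):
        self.mem, self.capacity = mem, capacity
        self.touched, self.buffer = set(), {}

    def _touch(self, addr):
        self.touched.add(addr)
        if len(self.touched) > self.capacity:
            raise HtmAbort("capacity")

    def __getitem__(self, addr):
        self._touch(addr)
        return self.buffer[addr] if addr in self.buffer else self.mem[addr]

    def __setitem__(self, addr, value):
        self._touch(addr)
        self.buffer[addr] = value


def make_fake_htm(mem, capacity):
    def try_htm(body):
        view = TrackingView(mem, capacity)
        result = body(view)        # an abort here discards the buffer
        mem.update(view.buffer)    # commit: publish buffered writes
        return result
    return try_htm


def make_fake_stm(mem):
    def run_stm(body):
        return body(mem)           # placeholder for a real software path
    return run_stm


def run_hybrid(body, try_htm, run_stm, max_htm_attempts=3):
    for _ in range(max_htm_attempts):
        try:
            return "htm", try_htm(body)
        except HtmAbort as abort:
            if abort.reason in ("capacity", "unsupported"):
                break              # retrying in hardware cannot help
    return "stm", run_stm(body)


def small(m):
    m[0] = m[0] + 1
    return m[0]


def large(m):
    for i in range(8):
        m[i] = m[i] + 1
    return sum(m[i] for i in range(8))


mem = {i: 0 for i in range(8)}
htm, stm = make_fake_htm(mem, capacity=4), make_fake_stm(mem)
small_result = run_hybrid(small, htm, stm)
large_result = run_hybrid(large, htm, stm)
```

The small transaction commits in the simulated hardware path. The large one exceeds the capacity limit, leaves memory untouched, and completes on the software path.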
Traditional Hybrid TM
[Figure: memory locations, each paired with a Versioned-Write-Lock]
Hardware Transaction: Test Versioned-Write-Lock in every Read/Write. Update in Write.
Software Transaction: Update locks
[DamronFedorovaLevLuchangcoMoirNussbaum06]
Traditional Hybrid TM
• Advantages
– Progress Guarantee of STM
• Disadvantages
– HTM must access meta data
– Fast path is actually slow because of extra load and branch on every read
Phased TM [LevMoirNussbaum07]
• Two modes: all hardware or all software
• Shared global mode indicator
• If some hardware transaction aborts switch to software mode
• Eventually mode reverts back to hardware
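The mode indicator can be sketched as a tiny state machine. The slide does not say when the mode reverts to hardware, so the policy below (revert after a fixed number of software commits, `revert_after`) is purely an assumption for illustration, as are the class and parameter names.

```python
class PhasedTM:
    def __init__(self, revert_after=2):
        self.mode = "HW"                  # shared global mode indicator
        self.revert_after = revert_after  # hypothetical revert policy
        self.sw_commits_left = 0

    def run(self, body, htm_would_abort):
        """Run one transaction; htm_would_abort simulates a hardware failure."""
        if self.mode == "HW":
            if not htm_would_abort:
                return "HW", body()
            # A hardware abort switches the whole system to software mode.
            self.mode = "SW"
            self.sw_commits_left = self.revert_after
        result = body()                   # executed on the software path
        self.sw_commits_left -= 1
        if self.sw_commits_left == 0:
            self.mode = "HW"              # eventually revert to hardware
        return "SW", result


tm = PhasedTM(revert_after=2)
aborts = [False, True, False, False]      # only the second transaction fails in HTM
paths = [tm.run(lambda: None, fails)[0] for fails in aborts]
```

With this policy the third transaction runs in software even though it would have succeeded in hardware, because the second transaction pushed the system into software mode.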
Phased TM
• Advantages
– Fast-path Pure HTM: No Meta Data Accesses
• Disadvantages
– Single Software Transaction Causes all HTM to switch to STM slow path
– Not clear how to tune to avoid frequent mode transitions…
Hybrid NOrec (1st Attempt)
Software NOrec: Not odd? seqlock → Read/Write (with validation) → Lock seqlock (set odd) → Validate → Write → Unlock seqlock (set even)
Hardware: Read/Write (no validation) → Not odd? seqlock → Write seqlock +2
• Software will fail seqlock validation!
• Hardware will fail seqlock validation!
• Guaranteed External Consistency
• Problem: hardware opacity
Internal Inconsistency (Opacity) [GuerraouiKapalka07]
[Figure: memory locations X and Y; the values 8 and 4 are shown]
Software A: Lock seqlock +1; Write x = 4; Write y = 2; Unlock seqlock +1
Hardware B: Read x; Read y; Compute z = 1/(x-y) … Odd? seqlock
DIV by 0 ERROR!
Hybrid NOrec (2nd Attempt)
Software NOrec: Not odd? seqlock → Read/Write (with validation) → Lock seqlock (set odd) → Validate → Write → Unlock seqlock (set even)
Hardware: Not odd? seqlock → Read/Write (no validation) → Write seqlock +2
• Hardware will detect seqlock invalidation!
• Guarantee hardware opacity
Hybrid NOrec
• Advantages
– Fast-path HTM: No Meta Data Accesses
• Disadvantages
– Limited Disjoint Access Parallelism
  – Seqlock is in hardware tracking set throughout HTM transaction
  – Major sequential bottleneck
Possible Solutions
• Forget Opacity, Use sandboxing [DalessandroCarougeWhiteLevMoirScottSpear2011]
• Hybrid Norec 2 [RiegelMarlierNowackFelberFetzer11]: use non-transactional operations in a hardware transaction to read and validate seqlock has not changed after every read
But sandboxing is complex… and non-transactional operations are available only in the AMD proposal, not in actual IBM or Intel hardware…
Reduced Hardware Approach to HyTM
• Use short hardware transactions in the software slow-path
• I.e. create new “mixed” software/hardware path
• Not in order to make slow-path faster
– But rather, in order to remove meta-data accesses from fast path
• Default to all software if mixed path fails
[MatveevShavit13]
Transactional Writes Imply Hardware Opacity
[Figure: memory locations X and Y; the values 8, 4 and 2 are shown]
Trans A: Write x = 4; Write y = 2
Hardware B: Read x; Read y; Compute z = 1/(x-y)
DIV by 0 ERROR!
If A's writes are performed in a hardware transaction, this cannot happen…
Reduced Hardware NOrec
• In Slow-path commit, use a small hardware transaction to:
– Write all values
– Check seqlock has not changed
– Write seqlock+1
• In Fast-path:
– Move seqlock test to end, un-instrumented read/writes
[MatveevShavit13]
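The division of labour between the two paths can be sketched in a single-threaded simulation. This is not the algorithm from [MatveevShavit13] in full. A `threading.Lock` stands in for hardware atomicity (real HTM would abort on conflict rather than block), the slow path aborts on any seqlock change instead of revalidating by value, the all-software fallback with its odd/even locking is omitted, and all names are hypothetical.

```python
import threading


class AbortTx(Exception):
    """The slow-path transaction must retry."""


class Memory:
    def __init__(self, **initial):
        self.values = dict(initial)
        self.seqlock = 0
        self.htm = threading.Lock()   # stand-in for a hardware transaction


def fast_path(mem, body):
    """Whole transaction in (simulated) HTM, with uninstrumented reads and writes."""
    with mem.htm:
        wrote = body(mem.values)      # body returns True if it wrote anything
        if wrote:
            mem.seqlock += 1          # seqlock is touched only at the very end


class SlowTx:
    def __init__(self, mem):
        self.mem = mem
        self.snapshot = mem.seqlock   # Read seqlock
        self.writes = {}              # buffered until commit

    def read(self, addr):
        if addr in self.writes:
            return self.writes[addr]
        value = self.mem.values[addr]
        if self.mem.seqlock != self.snapshot:   # Changed? seqlock
            raise AbortTx(addr)
        return value

    def write(self, addr, value):
        self.writes[addr] = value

    def commit(self):
        if not self.writes:
            return
        mem = self.mem
        with mem.htm:                 # short hardware transaction: size of write set
            # The simulation cannot roll back, so the check precedes the writes.
            if mem.seqlock != self.snapshot:
                raise AbortTx("seqlock changed")
            mem.values.update(self.writes)
            mem.seqlock += 1


def halve_x(values):
    values["x"] = values["x"] // 2
    return True


mem = Memory(x=8, y=4)
t = SlowTx(mem)
t.write("y", t.read("x") // 4)        # slow path plans y = 2 from the old x
fast_path(mem, halve_x)               # a fast-path writer commits in between
try:
    t.commit()
    first_aborted = False
except AbortTx:
    first_aborted = True

retry = SlowTx(mem)
retry.write("y", retry.read("x") // 2)
retry.commit()
```

Because the slow path publishes its write set inside one short atomic step, a fast-path transaction never observes only part of it. That is why the fast path can leave its reads uninstrumented and defer the seqlock access to the end.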
Reduced Hardware NOrec
[Figure: timelines of the software NOrec path and the hardware path. Software labels: Read seqlock; Read/Write (with validation); Changed? seqlock; In HTM Trans: Write values, Changed? seqlock, seqlock +1; Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even). Hardware labels: Read/Write (no instrumentation); Changed? seqlock; Write seqlock +1]
• Hardware will detect write conflict without seqlock!
• Guarantee fast-path opacity without having seqlock in TM tracking set for long
Reduced Hardware NOrec
• Properties
– Fast-path: No Meta Data; No instrumentation of reads or writes
– Slow-path:
  – short hardware transaction: size of write set
  – can repeatedly attempt short hardware transaction in commit
Reduced Hardware NOrec
• Advantages
– Hardware Disjoint Access Parallelism
  – seqlock accessed only at end of HTM transaction
– Surprise: 1st HyTM that is Obstruction-free and Privatizing
• Disadvantages
– Still window of possible abort due to seqlock increment
Reduced Hardware TL2 Style
Software TL2 style: Read Clock; Read/Write (validate); Validate; In HTM Trans: Write values
Hardware: Read Clock; Read/Write (no validation); Write values With Clock +1
• Hardware Will See Software
• Hardware will detect write conflict
Reduced Hardware TL2 Style
Software TL2 style: Read Clock; Read/Write (validate); Validate; In HTM Trans: Write values
Hardware: Read Clock; Read/Write (no validation); Write values With Clock +1
• Hardware will detect write conflict
• Problem: if a conflicting write lands between the software validation and the hardware transaction's writes, the result can be inconsistent
• Solution: combine validation and writes in a single transaction
– In HTM Trans: Validate and Write values
Reduced Hardware TL2 Style
• Advantages
– Complete Disjoint Access Parallelism
– GV6 clock incremented on aborts only
– Obstruction-free
• Disadvantages
– No privatization
– Mixed path transaction size of meta-data set