OmniOrder: Directory-Based Conflict Serialization of Transactions
Xuehai Qian, Benjamin Sahelices and Josep Torrellas
UC Berkeley, Universidad de Valladolid, University of Illinois, Urbana-Champaign
http://iacoma.cs.uiuc.edu
Xuehai Qian OmniOrder: Directory-Based Conflict Serialization of Transactions
Transaction (Atomic Block)
• Types of transactions:
  • SW-demarcated transactions: fixed boundary
    • TM or compiler-generated
  • HW-generated transactions: dynamically-built
    • Enforcing SC speculatively
• Conflicting transactions need to be serialized
• Conventional serialization mechanisms
  • Squash: one of the transactions aborts
  • Stall: one of the transactions stalls
[Figure: transactions T1 and T2 conflict on a wr/rd pair; one of them is squashed or stalled]
Conflict Serialization
• Serializes conflicting transactions without squash or stall
• Commits conflicting transactions according to the order of dependences
• Only squash on cyclic dependence
[Figure: conflicting transactions T1 and T2, with the commit order following their dependence]
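The idea on this slide can be sketched as a graph problem: dependences form edges, the commit order is any order that respects them, and a squash is needed only when the edges form a cycle. The sketch below assumes a global view of all dependences, which is a simplification for illustration and not the distributed mechanism the talk describes; the function name `commit_order` and the edge format are hypothetical.

```python
from collections import defaultdict, deque

def commit_order(transactions, dependences):
    """Return (order, unordered).

    dependences is a list of (src, dst) pairs meaning src must commit before dst.
    order is a commit order that respects every dependence; unordered holds the
    transactions that sit on, or behind, a dependence cycle.
    """
    successors = defaultdict(list)
    pending = {t: 0 for t in transactions}   # unmet predecessors per transaction
    for src, dst in dependences:
        successors[src].append(dst)
        pending[dst] += 1

    ready = deque(t for t in transactions if pending[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)                       # t commits
        for s in successors[t]:
            pending[s] -= 1
            if pending[s] == 0:
                ready.append(s)

    unordered = [t for t in transactions if t not in order]
    return order, unordered
```

With a single dependence T1 -> T2, both transactions commit in that order and nothing is squashed. If a second dependence T2 -> T1 appears, neither can be ordered, and at least one of them has to squash to break the cycle.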
Supporting Conflict Serialization
• Record dependence ordering
• Forward data (typically)
• Detect dependence cycles
• Four existing proposals
  • Dependence-Aware TM (MICRO’08): snoopy-based
  • SONTM (MICRO’10): high communication overhead
  • BulkSMT (HPCA’12): only within a single SMT
  • Wait-n-GoTM (ASPLOS’13): only SW-demarcated transactions
No existing scheme can support conflict serialization of both SW and HW transactions in a directory-based protocol
Contribution: OmniOrder
• A directory-based cache coherence protocol that supports conflict serialization of both SW and HW transactions
• Key idea:
  • Keep only non-spec data in the caches
  • Keep history of spec updates at word granularity in a buffer
  • On line transfer due to cache coherence, include history of spec updates
• Coherence protocol transitions are unmodified
Outline
• Motivation
• OmniOrder Design
• Evaluation
OmniOrder Characteristics
• Speculative state management is decoupled from coherence transitions
• No centralized HW structure or operation
• Minimum overhead if there is no conflict
• When a transaction is squashed: only squash successors with same-word RAW
• Modest complexity
Accesses within Transactions
[Figure: processors P0, P1, P2 and the directory (Dir); cache states D, S and I; the cached line keeps the non-speculative value v0 throughout]
• P0 writes v1: the speculative update history records P0: v1
• P1 writes v2: the history now holds P0: v1 and P1: v2 (multiple history entries for a word)
• P2 reads: on a future read, Dir will provide both data and update history
• Eager value propagation
• Eager conflict detection
• Record order of transactions (commit order)
• No extra messages or update history when there is no conflict
• Original coherence protocol transitions are not affected
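The access sequence on this slide can be mimicked with a small model: the line keeps its non-speculative data, speculative writes only append to a history, and a coherence reply carries both. This is a minimal sketch for a single word; `LineState`, `spec_write`, `transfer` and `value_seen` are hypothetical names, and the assumption that a reader uses the newest history entry is inferred from the later slide where P2 is said to have read P1's data.

```python
from dataclasses import dataclass, field

@dataclass
class LineState:
    data: str                                     # non-speculative value kept in the cache
    history: list = field(default_factory=list)   # [(writer, value)], oldest first

def spec_write(line, writer, value):
    """Record a speculative write; the non-speculative data stays untouched.

    Returns the earlier writers, which must commit before `writer`.
    """
    predecessors = [w for w, _ in line.history if w != writer]
    line.history.append((writer, value))
    return predecessors

def transfer(line):
    """Coherence reply for a read: the data plus a copy of the update history."""
    return {"data": line.data, "history": list(line.history)}

def value_seen(reply):
    """Eager value propagation: use the newest speculative value, if there is one."""
    return reply["history"][-1][1] if reply["history"] else reply["data"]

line = LineState("v0")
spec_write(line, "P0", "v1")          # history: P0: v1
spec_write(line, "P1", "v2")          # history: P0: v1, P1: v2
reply = transfer(line)                # what P2 receives on its read
```

In this run P2 sees v2 while `line.data` is still v0. A line with no speculative writes produces a reply with an empty history, which corresponds to the no-conflict case that adds no update history to the message.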
Transaction Commit and Squash
[Figure: P0, P1, P2 and Dir with update history P0: v1, P1: v2 and commit order P0, P1, P2; Sq = squash, Cmt = commit]
• P0’s squash does not squash P1 or P2 because they did not read P0’s data
• P1’s squash squashes P2 because P2 read P1’s data
• Squashes never affect non-spec cache lines
• Lazy value update: merge on commit
• Final cache state (P0 squashed, P1 committed): contains P1’s update (S:v2) but not P0’s update
Transaction Commit and Squash
• Each processor in a dependence chain forwards squash and commit signals to successors
• Transaction commit: merge updates wherever they are
• Transaction squash: purge updates wherever they are
• Merges are in commit order; purges are in any order
• Lazy value update is the key for efficient state recovery
  • Squashing a transaction simply involves purging update history entries
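The merge and purge rules above can be sketched for one word as follows. This is an illustrative model under stated assumptions, not the hardware design: `WordHistory` and its methods are hypothetical names, a reader is assumed to consume the newest speculative value, and the caller is assumed to invoke `commit` in commit order.

```python
class WordHistory:
    def __init__(self, committed):
        self.committed = committed   # non-speculative value
        self.entries = []            # [(writer, value)] speculative updates, oldest first
        self.readers = {}            # reader -> writer whose speculative value it consumed

    def write(self, writer, value):
        self.entries.append((writer, value))

    def read(self, reader):
        if self.entries:
            writer, value = self.entries[-1]
            self.readers[reader] = writer      # same-word RAW dependence
            return value
        return self.committed

    def commit(self, txn):
        """Lazy value update: merge txn's entries into the non-speculative value."""
        for writer, value in self.entries:
            if writer == txn:
                self.committed = value
        self.entries = [e for e in self.entries if e[0] != txn]
        self.readers = {r: w for r, w in self.readers.items() if txn not in (r, w)}

    def squash(self, txn):
        """Purge txn's entries; return every transaction squashed as a result."""
        squashed = [txn]
        self.entries = [e for e in self.entries if e[0] != txn]
        self.readers.pop(txn, None)
        # Only successors that read txn's data are squashed too.
        for reader in [r for r, w in self.readers.items() if w == txn]:
            if reader in self.readers:
                squashed += self.squash(reader)
        return squashed

h = WordHistory("v0")
h.write("P0", "v1")
h.write("P1", "v2")
h.read("P2")                 # P2 consumes P1's v2
after_p0 = h.squash("P0")    # only P0 is squashed
h.commit("P1")               # the non-speculative value becomes v2
```

Because the non-speculative value is never overwritten before commit, recovery in this model is just list filtering, which is the point the slide makes about lazy value update.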
Multi-word Cache Line
• Keep update history at WORD granularity ➙ correct merges and purges
• Record the commit order based on dependences at LINE granularity ➙ HW compatible with coherence protocol
[Figure: words A and B share one cache line. P0 writes A (history P0:v0); P1 then writes B (history P1:v1), so the commit order is P0 before P1; P0 writes A again (P0:v2), which adds the opposite commit order]
• Cycle: one of the transactions needs to squash, even if there is no conflict at word granularity
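The scenario in the figure can be reproduced with a toy model that keeps the history per word but derives commit-order edges per line. It is a sketch under simplifying assumptions: the line tracks only its last writer, `Line` and `has_two_way_cycle` are hypothetical names, and the cycle check only looks for a pair of opposite edges.

```python
class Line:
    def __init__(self):
        self.history = {}        # word -> [(writer, value)], word granularity
        self.last_owner = None   # last transaction that wrote any word of the line
        self.order = []          # (earlier, later) commit-order edges, line granularity

    def write(self, txn, word, value):
        # Taking the line from another transaction orders that transaction first.
        if self.last_owner is not None and self.last_owner != txn:
            self.order.append((self.last_owner, txn))
        self.last_owner = txn
        self.history.setdefault(word, []).append((txn, value))

def has_two_way_cycle(order):
    """True if some pair of transactions is ordered in both directions."""
    return any((b, a) in order for a, b in order)

line = Line()
line.write("P0", "A", "v0")
line.write("P1", "B", "v1")   # edge P0 -> P1
line.write("P0", "A", "v2")   # edge P1 -> P0: cycle at line granularity
```

Here each word has a single writer, so there is no word-level conflict, yet the line-level edges form a cycle and one transaction has to squash. That is the cost of recording order at line granularity in exchange for staying compatible with the coherence protocol.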
Cycle Detection
• OmniOrder uses a local, conservative cycle detection policy
• A cycle is detected by a transaction if:
  • The transaction is the src of a dependence and the dst of another dependence
  • The src dependence occurs before the dst dependence
• The transaction that detects the cycle squashes
  • It may trigger other squashes following the successors
• The policy may produce false positives, but it is simple to implement
[Figure: dependences among transactions T0, T1, T2 and T3, labeled A to H and numbered 1 to 4 in order of occurrence]
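The local rule above needs only one bit of state per transaction: whether it has already become the source of a dependence. A minimal sketch, with hypothetical names `Transaction` and `add_dependence`, and without modeling the squash propagation to successors:

```python
class Transaction:
    def __init__(self, name):
        self.name = name
        self.is_src = False      # some later transaction already depends on this one
        self.squashed = False

def add_dependence(src, dst):
    """Record src -> dst (src must commit first).

    Returns True if dst reports a cycle under the local policy: dst was
    already the source of an earlier dependence and now becomes a destination.
    """
    src.is_src = True
    if dst.is_src:
        dst.squashed = True      # the detecting transaction squashes
        return True
    return False

# A real cycle: T0 -> T1 -> T2 -> T0.
t0, t1, t2 = Transaction("T0"), Transaction("T1"), Transaction("T2")
real = [add_dependence(t0, t1), add_dependence(t1, t2), add_dependence(t2, t0)]

# A false positive: A -> B first, then C -> A. The chain C -> A -> B is acyclic,
# but A was a source before becoming a destination, so it reports a cycle.
a, b, c = Transaction("A"), Transaction("B"), Transaction("C")
false_positive = [add_dependence(a, b), add_dependence(c, a)]
```

Note that T1 becomes a destination first and a source later, which the rule accepts. The check never looks beyond the transaction's own state, which is why it is cheap and why it can flag acyclic chains.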
HW-Generated Dynamic Transaction
OmniOrder handles HW-generated transactions like SW-demarcated ones
[Figure: timeline of P0 (block B0) and P1 (block B1) with accesses rd x, rd z, wr y and wr z]
• Enter a HW transaction to enforce SC
• Start a HW transaction. It has to be committed according to dependence order
Also in the paper ...
• Detailed specification of speculative reads and writes, commits and squashes
• Hardware structures
• Implementation issues
Evaluation Setup
• Use simulations of multicore with up to 64 cores
• Private L1 cache, shared and banked L2 cache with distributed directory
• SW-demarcated transactions
  • Run STAMP benchmarks
  • Compare OmniOrder (OO) to Squash-On-Conflict (S)
• HW-generated dynamic transactions
  • Run SPLASH, PARSEC and small concurrent algorithms
  • Compare OO to:
    • InvisiFence with Commit-On-Violation (IF_COV)
    • Release Consistency (RC)
SW-Demarcated Transactions
• OO reduces execution time by over 18% on avg. over squash-on-conflict
  • Reduces squash time by ordering transactions
  • Reduces memory time by avoiding squash-induced cache misses
[Figure: normalized execution time (0 to 1.0), broken down into Useful, Memory and Squashed, for Bayes, Genome, Intruder, Kmeans, Labyrinth, Ssca2, Vacation, Yada and the average, each under S and OO]
HW-Generated Transactions (Apps)
• OO performs similarly to RC and IF_COV for most apps because they have few conflicts
[Figure: normalized execution time (0 to 1.25), broken down into Useful, Memory and Squashed+Stall, for Barnes, Blackscholes, Cholesky, FFT, Fluidanimate, Fmm, LU, Ocean, Radiosity, Radix, Raytrace, Streamcluster, Swaptions, Volrend, Water-ns and Water-sp, each under RC, IF_COV and OO]
HW-Generated Transactions (Conc. Algo.)
• OO reduces execution time by over 15% on average
• Concurrent algorithms have conflicts ➙ OO eliminates most of the squashes
[Figure: normalized execution time (0 to 1.0), broken down into Useful, Memory and Squashed+Stall, for Dekker, Peterson, Aharr, Harris, Lazylist, Moirbt, Moircas, Ms2, Msn, Mst, Snark and the average, each under IF_COV and OO]
Conclusion
• OmniOrder: first directory-based cache coherence protocol that supports conflict serialization of both SW and HW transactions
• Key idea:
  • Keep only non-spec data in the caches
  • Keep history of spec updates at word granularity in a buffer
  • On line transfer due to cache coherence, include history of spec updates
• Coherence protocol transitions are unmodified
• Reduction in execution time depends on frequency of conflicts:
  • Avg. reduction by 18% for SW transactions
  • Avg. reduction by 15% for HW transactions for apps with conflicts