8/10/2019 162 Consistency
1/22
CS 162
Memory Consistency Models
Reordering in Uniprocessors
Memory operations are reordered to improve performance
- Hardware (e.g., store buffer, reorder buffer)
- Compiler (e.g., code motion, caching a value in a register)
Behavior is unchanged as long as dependences are respected
Program order:     a1: St x;  a2: Ld y
After reordering:  a2: Ld y;  a1: St x
Reordering in Multiprocessors
Initially x = y = 0
P1              P2
a1: x = 1;      b1: Ry = y;
a2: y = 1;      b2: Rx = x;
Possible outcomes
- (Rx=1, Ry=1), e.g., a1, a2, b1, b2
- (Rx=1, Ry=0), e.g., a1, b1, b2, a2
- (Rx=0, Ry=0), e.g., b1, b2, a1, a2
- (Rx=0, Ry=1), e.g., a2, b1, b2, a1 once a1 and a2 are reordered: counter-intuitive program behavior
Intuitively, y=1 implies x=1
Reordering in Multiprocessors
Initially p = NULL, flag = false
P1                 P2
p = new A();       if (flag)
flag = true;           a = p->var;
flag is supposed to be set after p is allocated
Reordering causes counter-intuitive program behavior
Lock-free algorithms, e.g., Dekker, Peterson
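The p/flag hazard above can be sketched as a small simulation. This is an illustration under stated assumptions, not a model of any real machine: memory is a Python dict, the names `interleavings`, `run`, `outcomes`, and the operation labels are hypothetical, and "reordering" simply means P1's two stores become visible in swapped order.

```python
from itertools import combinations

def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that keeps each list's own order."""
    n = len(t1) + len(t2)
    for slots in combinations(range(n), len(t1)):
        it1, it2 = iter(t1), iter(t2)
        yield [next(it1) if i in slots else next(it2) for i in range(n)]

def run(schedule):
    """Execute one schedule and report what P2 observed."""
    mem = {"p": None, "flag": False}
    saw_flag = False
    result = "P2 skipped"
    for op in schedule:
        if op == "alloc":                  # P1: p = new A()
            mem["p"] = {"var": 42}
        elif op == "set_flag":             # P1: flag = true
            mem["flag"] = True
        elif op == "test_flag":            # P2: if (flag)
            saw_flag = mem["flag"]
        elif op == "deref" and saw_flag:   # P2: a = p->var
            result = "ok" if mem["p"] is not None else "NULL dereference"
    return result

def outcomes(p1_ops):
    """Collect P2's observations over all interleavings with P1's ops."""
    p2_ops = ["test_flag", "deref"]
    return {run(s) for s in interleavings(p1_ops, p2_ops)}

in_order = outcomes(["alloc", "set_flag"])    # P1's stores in program order
reordered = outcomes(["set_flag", "alloc"])   # P1's stores swapped
```

In this sketch the NULL dereference shows up only in `reordered`, which is the counter-intuitive behavior the slide describes.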
Reordering in Multiprocessors
Dekker Algorithm (mutual exclusion)
Initially flag1 = flag2 = 0
P1                       P2
flag1 = 1;               flag2 = 1;
if (flag2 == 0)          if (flag1 == 0)
    critical section         critical section
On P1, the store "flag1 = 1" (St flag1) and the load "flag2 == 0" (Ld flag2) access different locations, so they may be reordered
After reordering, both loads can return 0, so both processors enter the critical section
Counter-intuitive program behavior
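A minimal sketch of how a store buffer can produce this St/Ld reordering. Everything here is an assumption made for illustration: each processor gets a private buffer modeled as a dict, `drain` stands for the buffered store becoming visible in memory, and a "fence" is modeled simply as forcing the drain before the load. The names `run` and `both_can_enter` are hypothetical.

```python
from itertools import combinations, product

def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that keeps each list's own order."""
    n = len(t1) + len(t2)
    for slots in combinations(range(n), len(t1)):
        it1, it2 = iter(t1), iter(t2)
        yield [next(it1) if i in slots else next(it2) for i in range(n)]

def run(schedule):
    """Return the set of processors that enter the critical section."""
    mem = {"flag1": 0, "flag2": 0}
    buf = {1: {}, 2: {}}                     # per-processor store buffer
    entered = set()
    for proc, op in schedule:
        mine, other = f"flag{proc}", f"flag{3 - proc}"
        if op == "store":                    # flagN = 1 goes to the buffer
            buf[proc][mine] = 1
        elif op == "drain":                  # buffered store reaches memory
            mem.update(buf[proc])
            buf[proc].clear()
        elif op == "load":                   # if (other flag == 0)
            if buf[proc].get(other, mem[other]) == 0:
                entered.add(proc)
    return entered

def both_can_enter(use_fence):
    """True if some schedule lets both processors enter together."""
    orders = [["store", "drain", "load"]]    # store visible before the load
    if not use_fence:
        orders.append(["store", "load", "drain"])  # load overtakes the store
    for o1, o2 in product(orders, repeat=2):
        t1 = [(1, op) for op in o1]
        t2 = [(2, op) for op in o2]
        if any(run(s) == {1, 2} for s in interleavings(t1, t2)):
            return True
    return False
```

Under these assumptions, `both_can_enter(use_fence=False)` finds a schedule that breaks mutual exclusion, while `both_can_enter(use_fence=True)` does not.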
Memory Consistency Models
Specify the ordering of loads and stores to different memory locations
- Ld→Ld, Ld→St, St→Ld, St→St
Contract between hardware, compiler, and programmer
- hardware and compiler will not violate the ordering specified
- the programmer will not assume a stricter order than that of the model
Memory Consistency Models
Model                    Allowed Reordering   Commercial Architecture
Sequential Consistency   None                 does not exist
Total Store Ordering     St→Ld                x86, SPARC
Relaxed Memory Order     All                  ARM, PowerPC
Performance: low (Sequential Consistency) to high (Relaxed Memory Order)
Programmability: high (Sequential Consistency) to low (Relaxed Memory Order)
Stronger models
- Stronger constraints
- Fewer memory reorderings
- Easier to reason about
- Lower performance
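The table can be read as a lookup from model to permitted reorderings. The sketch below only restates the three rows above; the dictionary layout, the short model keys, and the function name `may_reorder` are hypothetical, and it covers accesses to different memory locations only.

```python
# Pairs are (earlier op, later op) in program order, to different addresses.
ALLOWED_REORDERINGS = {
    "SC": set(),                       # Sequential Consistency: none
    "TSO": {("St", "Ld")},             # Total Store Ordering: St->Ld only
    "RMO": {("Ld", "Ld"), ("Ld", "St"),
            ("St", "Ld"), ("St", "St")},   # Relaxed Memory Order: all
}

def may_reorder(model, first, second):
    """True if `model` lets `second` complete before the earlier `first`."""
    return (first, second) in ALLOWED_REORDERINGS[model]
```

For example, `may_reorder("TSO", "St", "Ld")` is true, while `may_reorder("TSO", "St", "St")` is false.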
Cache Coherence vs. Memory Model
Cache coherence ensures a consistent view of memory
- Guarantees that an update to memory by one processor will eventually be seen by other processors
But, how consistent?
- NO guarantees on when an update should be seen
- NO guarantees on the order in which updates should be seen
Cache Coherence vs. Memory Model
Initially A = B = 0
P1           P2                     P3
A = 1;       while (A != 1) ;       while (B != 1) ;
             B = 1;                 tmp = A;
tmp = 1? or tmp = 0?
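One way to explore the question is to enumerate every execution under an idealized model in which each store becomes visible to all processors at once and operations interleave one at a time. That model is an assumption of this sketch, and it is stronger than what coherence alone promises. The state encoding and the name `explore` are hypothetical.

```python
def explore():
    """Return every final value of tmp reachable under the idealized model."""
    start = (0, 0, 0, 0, 0, None)        # pc1, pc2, pc3, A, B, tmp
    seen, stack, finals = {start}, [start], set()
    while stack:
        pc1, pc2, pc3, A, B, tmp = stack.pop()
        succ = []
        if pc1 == 0:                     # P1: A = 1
            succ.append((1, pc2, pc3, 1, B, tmp))
        if pc2 == 0 and A == 1:          # P2 leaves while (A != 1)
            succ.append((pc1, 1, pc3, A, B, tmp))
        if pc2 == 1:                     # P2: B = 1
            succ.append((pc1, 2, pc3, A, 1, tmp))
        if pc3 == 0 and B == 1:          # P3 leaves while (B != 1)
            succ.append((pc1, pc2, 1, A, B, tmp))
        if pc3 == 1:                     # P3: tmp = A
            succ.append((pc1, pc2, 2, A, B, A))
        if not succ:                     # every processor has finished
            finals.add(tmp)
        for s in succ:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return finals
```

Spin-loop iterations that do not exit leave the state unchanged, so they are omitted. In this idealized model only tmp = 1 is reachable; tmp = 0 requires P3 to see the update to B before the update to A, an ordering that coherence by itself does not rule out.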
Sequential Consistency (SC)
Definition [Lamport]
(1) the result of any execution is the same as if the operations of all processors were executed in some sequential order;
(2) the operations of each individual processor appear in this sequence in the order specified by its program.
[Figure: processors P1, P2, P3, ..., Pn connected to a single MEMORY]
Behaves as if repeating:
(1) pick a processor by any method (e.g., randomly)
(2) the processor completes a load/store operation
SC Example
P1              P2
a1: x = 1;      b1: Ry = y;
a2: y = 1;      b2: Rx = x;
Interleavings allowed under SC (each keeps a1 before a2 and b1 before b2)
- a1, a2, b1, b2: (Rx=1, Ry=1)
- a1, b1, a2, b2: (Rx=1, Ry=0)
- a1, b1, b2, a2: (Rx=1, Ry=0)
- b1, a1, a2, b2: (Rx=1, Ry=0)
- b1, a1, b2, a2: (Rx=1, Ry=0)
- b1, b2, a1, a2: (Rx=0, Ry=0)
Not allowed under SC
- a2, b1, b2, a1: violates P1's program order and would give (Rx=0, Ry=1)
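The SC outcomes for this example can be enumerated mechanically. The sketch below tries every interleaving that preserves each processor's program order; the tuple encoding of operations and the name `sc_outcomes` are hypothetical choices made for illustration.

```python
from itertools import combinations

# (label, kind, variable, destination register or None)
P1 = [("a1", "store", "x", None), ("a2", "store", "y", None)]
P2 = [("b1", "load", "y", "Ry"), ("b2", "load", "x", "Rx")]

def sc_outcomes():
    """Map each (Rx, Ry) outcome to the SC interleavings that produce it."""
    results = {}
    n = len(P1) + len(P2)
    for slots in combinations(range(n), len(P1)):
        it1, it2 = iter(P1), iter(P2)
        order = [next(it1) if i in slots else next(it2) for i in range(n)]
        mem, regs = {"x": 0, "y": 0}, {}
        for label, kind, var, reg in order:
            if kind == "store":
                mem[var] = 1             # both stores in the example write 1
            else:
                regs[reg] = mem[var]
        key = (regs["Rx"], regs["Ry"])
        results.setdefault(key, []).append([op[0] for op in order])
    return results
```

The keys of `sc_outcomes()` are the outcomes reachable under SC; (Rx=0, Ry=1) is not among them.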
Sequential Consistency (SC)
Simple and intuitive
- consistent with programmers' intuition
- easy to reason about program behavior
However, the simplicity comes at the cost of performance
- prevents aggressive compiler optimizations (e.g., load reordering, store reordering, caching a value in a register)
- constrains hardware optimizations (e.g., store buffer)
SC Violation
P1              P2
a1: x = 1       b1: R1 = y
a2: y = 1       b2: R2 = x
Program order: a1 → a2, b1 → b2
Conflict relation: a2 and b1 (on y), b2 and a1 (on x)
An SC violation is a cycle formed by program orders and conflict orders [Shasha and Snir, 1988]
- e.g., (a2, b1, b2, a1, a2)
- Executing in the order (a2, b1, b2, a1) will produce R1=1, R2=0, which is not an SC outcome
Insert fences to break the cycle
- a2 cannot be executed before a1
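A simplified sketch of searching for such a cycle. It is not the full analysis of Shasha and Snir: as an assumption of this sketch, program-order edges are treated as directed, conflict edges as undirected, and a cycle is reported only if it visits at least three operations and uses at least one program-order edge. The name `find_cycle` is hypothetical.

```python
def find_cycle(program_order, conflicts):
    """Return one simple cycle over program-order and conflict edges, or None."""
    edges = {}
    for a, b in program_order:               # directed: a precedes b
        edges.setdefault(a, []).append((b, "po"))
    for a, b in conflicts:                   # undirected: same location
        edges.setdefault(a, []).append((b, "c"))
        edges.setdefault(b, []).append((a, "c"))

    def dfs(start, node, path, used_po):
        for nxt, kind in edges.get(node, ()):
            po = used_po or kind == "po"
            if nxt == start and len(path) >= 3 and po:
                return path + [start]        # closed a qualifying cycle
            if nxt not in path:
                found = dfs(start, nxt, path + [nxt], po)
                if found:
                    return found
        return None

    for start in sorted(edges):
        cycle = dfs(start, start, [start], False)
        if cycle:
            return cycle
    return None

po = [("a1", "a2"), ("b1", "b2")]
cycle = find_cycle(po, [("a2", "b1"), ("b2", "a1")])
no_cycle = find_cycle(po, [("a2", "b1")])    # b2 no longer conflicts with a1
```

For the example on this slide the search returns the cycle through a1, a2, b1, b2; dropping the conflict on x leaves no cycle, so no ordering constraint would be needed.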
Conservativeness of Fences
Fences are inserted statically and conservatively
Case 1: a1 is in a conditional branch
P1                   P2
if (cond)            b1: St y
    a1: St x         Fence2
Fence1               b2: Ld x
a2: Ld y
Case 2: p and q may point to the same memory location
P1                   P2
a1: St *p            b1: St x
Fence1               Fence2
a2: Ld x             b2: Ld *q
No cycle is formed at runtime if cond is false (case 1) or if p and q point to different locations (case 2)
Address-aware Fences
Consider the memory locations accessed around fences at runtime
Fences only take effect when a cycle is about to happen
Detect and Avoid Cycles
[Figure: Proc 1 (a1, a2, Fence1; regions A1, A2) and Proc 2 (b1, b2, Fence2; regions B1, B2), with conflict edge c1, a possible conflict edge c2, and a watchlist]
How to detect c2 efficiently?
Collect a watchlist for each fence
A completing memory operation checks the watchlist
- bypass, if its address is not in the watchlist
- stall, otherwise
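The bypass/stall decision can be sketched as a set lookup. How the watchlist is collected is not described here, so this sketch takes it as an input; the class name `AddressAwareFence`, the method `check`, and the example addresses are hypothetical.

```python
class AddressAwareFence:
    """Toy model of the per-fence watchlist check (contents assumed given)."""

    def __init__(self, watchlist):
        self.watchlist = set(watchlist)      # addresses that must wait

    def check(self, address):
        """Decide what a completing memory operation does at this fence."""
        return "stall" if address in self.watchlist else "bypass"

fence = AddressAwareFence({0x1000})
decisions = [fence.check(0x1000), fence.check(0x2000)]
```

A traditional fence would correspond to stalling every address; here only the access to the watched address stalls.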
Performance: Execution Time
Traditional fence (T) vs. Address-aware fence (A)
Fence overhead becomes negligible
Further Reading
L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput., 28(9):690-691, 1979.
S. V. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. IEEE Computer, 29:66-76, 1995.
D. Shasha and M. Snir. Efficient and correct execution of parallel programs that share memory. ACM Trans. Program. Lang. Syst., 10(2):282-312, 1988.
D. J. Sorin, M. D. Hill, and D. A. Wood. A Primer on Memory Consistency and Cache Coherence. Synthesis Lectures on Computer Architecture, 2011.
C. Lin, V. Nagarajan, and R. Gupta. Address-aware fences. ICS '13, pages 313-324, 2013.