CS 162 Memory Consistency Models


Page 1: CS 162 Memory Consistency Models

CS 162 Memory Consistency Models

Page 2: CS 162 Memory Consistency Models

Reordering in Uniprocessors

Memory operations are reordered to improve performance

- Hardware (e.g., store buffer, reorder buffer)
- Compiler (e.g., code motion, caching a value in a register)

The program behaves the same as long as dependences are respected

a1: St x         a2: Ld y
a2: Ld y    ≡    a1: St x

Page 3: CS 162 Memory Consistency Models

Reordering in Multiprocessors

Reordering can cause counter-intuitive program behavior

Initially x = y = 0

P1              P2
a1: x = 1;      b1: Ry = y;
a2: y = 1;      b2: Rx = x;

Intuitively, y = 1 implies x = 1

Possible outcomes

- (Rx=1, Ry=1), e.g., a1, a2, b1, b2
- (Rx=1, Ry=0), e.g., a1, b1, a2, b2
- (Rx=0, Ry=0), e.g., b1, b2, a1, a2
- (Rx=0, Ry=1), only if operations are reordered, e.g., a2, b1, b2, a1
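The outcomes that need no reordering can be checked mechanically. The following is a minimal C++ sketch, not part of the original slides: it enumerates every interleaving of P1 and P2 that keeps each processor's program order and collects the resulting (Rx, Ry) pairs. The names `Outcome` and `interleaved_outcomes` are illustrative assumptions.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Outcome of one run: (Rx, Ry).
using Outcome = std::pair<int, int>;

// Enumerate every interleaving of P1 = {a1, a2} and P2 = {b1, b2} that
// preserves each processor's program order, and collect (Rx, Ry).
std::set<Outcome> interleaved_outcomes() {
    std::set<Outcome> outcomes;
    // A schedule is a 4-bit mask: bit i set means step i is executed by P1.
    for (unsigned mask = 0; mask < 16; ++mask) {
        int p1_steps = 0;
        for (int i = 0; i < 4; ++i) p1_steps += (mask >> i) & 1u;
        if (p1_steps != 2) continue;  // each processor runs exactly two ops

        int x = 0, y = 0, Rx = -1, Ry = -1;
        int pc1 = 0, pc2 = 0;  // index of the next operation on P1 / P2
        for (int i = 0; i < 4; ++i) {
            if ((mask >> i) & 1u) {
                if (pc1++ == 0) x = 1; else y = 1;    // a1, then a2
            } else {
                if (pc2++ == 0) Ry = y; else Rx = x;  // b1, then b2
            }
        }
        assert(pc1 == 2 && pc2 == 2);
        outcomes.insert(Outcome{Rx, Ry});
    }
    return outcomes;
}
```

Calling `interleaved_outcomes()` yields the three intuitive outcomes; (Rx=0, Ry=1) is absent because no program-order-preserving interleaving produces it.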

Page 4: CS 162 Memory Consistency Models

Reordering in Multiprocessors

Reordering can cause counter-intuitive program behavior

Initially p = NULL, flag = false

P1                  P2
p = new A(…)        if (flag)
flag = true;            a = p->var;

flag is supposed to be set after p is allocated

Lock-free algorithms, e.g., Dekker, Peterson

Page 5: CS 162 Memory Consistency Models

Reordering in Multiprocessors

Dekker Algorithm (mutual exclusion)

Initially flag1 = flag2 = 0

P1                          P2
flag1 = 1;                  flag2 = 1;
if (flag2 == 0)             if (flag1 == 0)
    critical section            critical section

On P1, flag1 = 1 is a store (St flag1) and flag2 == 0 is a load (Ld flag2); the two access different locations, so they may be reordered.

After reordering, both loads can return 0, so both processors enter the critical section: counter-intuitive program behavior
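As an illustration outside the original slides, here is a minimal C++ sketch of the same two-processor entry protocol. It assumes the flags are `std::atomic<int>` with the default `memory_order_seq_cst`, which forbids the store-load reordering described above. The names `dekker_trial` and `max_entrants` are hypothetical, and a counter stands in for the critical section.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One trial of the entry protocol. Returns how many processors entered
// the critical section (0, 1, or 2).
int dekker_trial() {
    std::atomic<int> flag1{0}, flag2{0};
    std::atomic<int> entered{0};

    std::thread p1([&] {
        flag1.store(1);            // St flag1 (seq_cst by default)
        if (flag2.load() == 0)     // Ld flag2 (seq_cst by default)
            entered.fetch_add(1);  // stand-in for the critical section
    });
    std::thread p2([&] {
        flag2.store(1);            // St flag2
        if (flag1.load() == 0)     // Ld flag1
            entered.fetch_add(1);
    });
    p1.join();
    p2.join();

    int n = entered.load();
    assert(n >= 0 && n <= 2);
    return n;
}

// Repeat the trial and report the largest number of entrants observed.
int max_entrants(int trials) {
    int worst = 0;
    for (int i = 0; i < trials; ++i) {
        int n = dekker_trial();
        if (n > worst) worst = n;
    }
    return worst;
}
```

With sequentially consistent atomics, `max_entrants` should never return 2. If the stores and loads were changed to `std::memory_order_relaxed`, the C++ memory model would permit both loads to read 0.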

Page 6: CS 162 Memory Consistency Models

Memory Consistency Models

Specify the ordering of loads and stores to different memory locations

Ld → Ld, Ld → St, St → Ld, St → St

Contract between hardware, compiler, and programmer

hardware and compiler will not violate the ordering specified

the programmer will not assume a stricter order than that of the model

Page 7: CS 162 Memory Consistency Models

Memory Consistency Models

Model                     Allowed Reordering    Commercial Architecture
Sequential Consistency    None                  None
Total Store Ordering      St → Ld               x86, SPARC
Relaxed Memory Order      All                   ARM, PowerPC

From top to bottom of the table: performance goes from low to high, programmability from high to low.

Stronger models

- Stronger constraints
- Fewer memory reorderings
- Easier to reason about
- Lower performance

Page 8: CS 162 Memory Consistency Models

Cache Coherence vs. Memory Model

Cache coherence ensures a consistent view of memory

Guarantees that the update to memory by one processor will be seen by other processors eventually

But how consistent?

- NO guarantees on when an update should be seen
- NO guarantees on what order of updates should be seen

Page 9: CS 162 Memory Consistency Models

Cache Coherence vs. Memory Model

Initially A = B = 0

P1            P2                     P3
A = 1;        while (A != 1) ;       while (B != 1) ;
              B = 1;                 tmp = A;

tmp = 1? or tmp = 0?
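Cache coherence alone does not settle this question; the memory model does. The following C++ sketch is an illustration, not from the slides. It assumes A and B are `std::atomic<int>` with the default sequentially consistent ordering, under which P3 must read tmp = 1. The function name `run_three_processors` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the three-processor example once and returns the value P3 reads from A.
int run_three_processors() {
    std::atomic<int> A{0}, B{0};
    int tmp = -1;  // written only by P3, read only after P3 is joined

    std::thread p1([&] { A.store(1); });
    std::thread p2([&] {
        while (A.load() != 1) std::this_thread::yield();
        B.store(1);
    });
    std::thread p3([&] {
        while (B.load() != 1) std::this_thread::yield();
        tmp = A.load();
    });
    p1.join();
    p2.join();
    p3.join();

    assert(tmp != -1);  // P3 ran and stored its result
    return tmp;
}
```

Under these assumptions the store to A happens before the store to B, which happens before P3's load of A, so the function returns 1. A model that does not order these operations could allow 0.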

Page 10: CS 162 Memory Consistency Models

Sequential Consistency (SC)

Definition [Lamport]

(1) the result of any execution is the same as if the operations of all processors were executed in some sequential order;
(2) the operations of each individual processor appear in this sequence in the order specified by its program.

[Figure: processors P1, P2, P3, ..., Pn connected to a single MEMORY]

Behaves as the repetition of:

(1) pick a processor by any method (e.g., randomly)
(2) the processor completes a load/store operation
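The two-step repetition can be turned into a small simulator. This is a minimal C++ sketch under stated assumptions: memory is an array of integer locations initialized to 0, each processor's program is a list of loads and stores, and the random choice uses a seeded `std::mt19937`. The types `Kind` and `Op` and the function `run_sc` are illustrative names, not from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

enum class Kind { Load, Store };
struct Op { Kind kind; int loc; int value; };  // value is used by stores only

// Runs the programs under the operational model of SC and returns, for each
// processor, the values its loads observed (in program order).
std::vector<std::vector<int>> run_sc(
        const std::vector<std::vector<Op>>& programs,
        int num_locations, unsigned seed) {
    std::vector<int> memory(num_locations, 0);
    std::vector<std::size_t> pc(programs.size(), 0);
    std::vector<std::vector<int>> loaded(programs.size());
    std::mt19937 rng(seed);

    for (;;) {
        // Processors that still have operations left to execute.
        std::vector<std::size_t> ready;
        for (std::size_t p = 0; p < programs.size(); ++p)
            if (pc[p] < programs[p].size()) ready.push_back(p);
        if (ready.empty()) break;

        // (1) Pick a processor by any method (here: uniformly at random).
        std::uniform_int_distribution<std::size_t> pick(0, ready.size() - 1);
        std::size_t p = ready[pick(rng)];

        // (2) That processor completes its next load/store against memory.
        const Op& op = programs[p][pc[p]++];
        assert(op.loc >= 0 && op.loc < num_locations);
        if (op.kind == Kind::Store) memory[op.loc] = op.value;
        else loaded[p].push_back(memory[op.loc]);
    }
    return loaded;
}
```

Because each operation completes atomically and in program order, every run of this simulator is one legal SC execution; different seeds simply sample different interleavings.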

Page 11: CS 162 Memory Consistency Models

SC Example

P1              P2
a1: x = 1;      b1: Ry = y;
a2: y = 1;      b2: Rx = x;

The outcome (Rx=0, Ry=0) is allowed under SC: any execution that produces it is equivalent (≡) to the sequential order

b1: Ry = y;
b2: Rx = x;
a1: x = 1;
a2: y = 1;

which keeps each processor's operations in program order.

Page 12: CS 162 Memory Consistency Models

Sequential Consistency (SC)

Simple and intuitive

- consistent with programmers’ intuition
- easy to reason about program behavior

However, the simplicity comes at the cost of performance

- prevents aggressive compiler optimizations (e.g., load reordering, store reordering, caching a value in a register)
- constrains hardware utilization (e.g., store buffer)

Page 13: CS 162 Memory Consistency Models

SC Violation

a1: x = 1       b1: R1 = y
a2: y = 1       b2: R2 = x

Program order: a1 → a2 and b1 → b2. Conflict relation: a1 and b2 (both access x), a2 and b1 (both access y).

- SC violation: a cycle formed by program orders and conflict orders [Shasha and Snir, 1988], e.g., (a2, b1, b2, a1, a2)
- Executing in the order (a2, b1, b2, a1) will produce R1=1, R2=0, which is not an SC outcome
- Insert fences to break the cycle: a2 cannot be executed before a1
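The cycle condition can be checked with an ordinary graph search. Below is a minimal C++ sketch, assuming operations are named by strings and that the conflict-order edges observed in one execution are given. It is a plain depth-first cycle check for illustration, not the analysis from Shasha and Snir's paper. `Graph`, `has_cycle`, and `make_graph` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Graph = std::map<std::string, std::vector<std::string>>;

// Depth-first search; returns true if a cycle is reachable from `node`.
static bool dfs(const Graph& g, const std::string& node,
                std::set<std::string>& on_path, std::set<std::string>& done) {
    if (on_path.count(node)) return true;  // back edge: cycle found
    if (done.count(node)) return false;    // already fully explored
    on_path.insert(node);
    auto it = g.find(node);
    if (it != g.end())
        for (const auto& next : it->second)
            if (dfs(g, next, on_path, done)) return true;
    on_path.erase(node);
    done.insert(node);
    return false;
}

bool has_cycle(const Graph& g) {
    std::set<std::string> on_path, done;
    for (const auto& entry : g)
        if (dfs(g, entry.first, on_path, done)) return true;
    return false;
}

// Program-order edges plus the conflict-order edges of one execution.
Graph make_graph(bool b2_reads_x_before_a1_writes) {
    Graph g;
    g["a1"].push_back("a2");  // program order on P1
    g["b1"].push_back("b2");  // program order on P2
    g["a2"].push_back("b1");  // conflict on y: b1 reads the value a2 wrote
    if (b2_reads_x_before_a1_writes)
        g["b2"].push_back("a1");  // conflict on x: load precedes the store
    else
        g["a1"].push_back("b2");  // conflict on x: store precedes the load
    return g;
}

// Usage: the reordered execution forms a cycle, the in-order one does not.
void check_slide_example() {
    assert(has_cycle(make_graph(true)));
    assert(!has_cycle(make_graph(false)));
}
```

`make_graph(true)` encodes the execution (a2, b1, b2, a1), whose edges close the cycle (a2, b1, b2, a1, a2).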

Page 14: CS 162 Memory Consistency Models

Fence Instructions

Order memory operations before and after the fence

P1
p = new A(…)
FENCE
flag = true;

- Inevitable: needed for building concurrent implementations (e.g., mutual exclusion, queues) [Attiya et al., POPL’11]
- Expensive: Cilk-5’s THE protocol spends 50% of its time executing a memory fence [Frigo et al., PLDI’98]
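In C++ the FENCE on P1 can be sketched with `std::atomic_thread_fence`. This is a hedged illustration, not the slides' code. It assumes `flag` is a `std::atomic<bool>`, and it adds a matching acquire fence on the reader, which the C++ memory model requires and the slide does not show. The reader spins instead of testing `flag` once so that the example always terminates with a value. `A` and `publish_and_read` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct A { int var; };

// P1 allocates and publishes an object; P2 waits for the flag and reads it.
int publish_and_read(int value) {
    A* p = nullptr;                 // plain pointer, as on the slide
    std::atomic<bool> flag{false};
    int a = -1;

    std::thread p1([&] {
        p = new A{value};
        std::atomic_thread_fence(std::memory_order_release);  // FENCE
        flag.store(true, std::memory_order_relaxed);
    });
    std::thread p2([&] {
        while (!flag.load(std::memory_order_relaxed))
            std::this_thread::yield();
        // Assumed matching fence on the reader side.
        std::atomic_thread_fence(std::memory_order_acquire);
        assert(p != nullptr);       // the allocation is visible here
        a = p->var;
    });
    p1.join();
    p2.join();

    delete p;
    return a;
}
```

Without the two fences, the relaxed store to `flag` could become visible before the write to `p`, which is the reordering the fence is meant to rule out.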

Page 15: CS 162 Memory Consistency Models

Conservativeness of Fences

Inserted statically and conservatively

Proc 1          Proc 2
a1: St x        b1: St y
Fence1          Fence2
a2: Ld y        b2: Ld x

At time T, a1 and a2 have completed; b1 and b2 only execute after time T.

No cycle is formed at runtime

Page 16: CS 162 Memory Consistency Models

Conservativeness of Fences

Inserted statically and conservatively

Proc 1                  Proc 2
if (cond) a1: St x      b1: St y
Fence1                  Fence2
a2: Ld y                b2: Ld x

a1 is in a conditional branch

Proc 1          Proc 2
a1: St *p       b1: St x
Fence1          Fence2
a2: Ld x        b2: Ld *q

p and q may point to the same memory location

No cycle is formed at runtime

Page 17: CS 162 Memory Consistency Models

Processor-centric Fence

Traditional fence

Processor-centric - unaware of memory accesses in other processors

However, the purpose of fences is to

- prevent memory accesses from being reordered and observed by other processors (i.e., a cycle formed at runtime)

Page 18: CS 162 Memory Consistency Models

Address-aware Fences

Consider memory locations accessed around fences at runtime

Fences only take effect when a cycle is about to form

Page 19: CS 162 Memory Consistency Models

Detect and Avoid Cycles

[Figure: Proc 1 executes a1: …, Fence1, a2: …; Proc 2 executes b1: …, Fence2, b2: …. Diagram labels: A1, A2, B1, B2, c1, c2?]

How to detect c2 efficiently?

Page 20: CS 162 Memory Consistency Models

Detect and Avoid Cycles

[Figure: Proc 1 executes a1: …, Fence1, a2: …; Proc 2 executes b1: …, Fence2, b2: …. Diagram labels: A1, A2, B1, B2, c1, c2?, watchlist]

How to detect c2 efficiently?

Collect a watchlist for each fence

A completing memory operation checks the watchlist

- bypass, if its address is not in the watchlist
- stall, otherwise
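The bypass/stall rule can be modelled in a few lines. This is a hypothetical software sketch for illustration only, not the hardware design behind the slides. It assumes a watchlist is a set of addresses and leaves open how those addresses are collected. `Action`, `FenceWatchlist`, `watch`, and `on_complete` are invented names.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

enum class Action { Bypass, Stall };

// Hypothetical model of the watchlist attached to one active fence.
class FenceWatchlist {
public:
    // Record an address that the fence must watch (collection policy is
    // left to the caller; it is not specified here).
    void watch(std::uintptr_t addr) { addrs_.insert(addr); }

    // Decision for a memory operation that is about to complete.
    Action on_complete(std::uintptr_t addr) const {
        return addrs_.count(addr) ? Action::Stall : Action::Bypass;
    }

private:
    std::unordered_set<std::uintptr_t> addrs_;
};

// Usage sketch with made-up addresses.
void watchlist_demo() {
    FenceWatchlist fence1;
    fence1.watch(0x1000);
    assert(fence1.on_complete(0x1000) == Action::Stall);   // watched address
    assert(fence1.on_complete(0x2000) == Action::Bypass);  // unrelated address
}
```

The point of the design is that the common case, an address not on the watchlist, costs only a set lookup instead of a full fence stall.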

Page 21: CS 162 Memory Consistency Models

Performance: Execution Time

Traditional fence (T) vs. Address-aware fence (A)

Fence overhead becomes negligible

Page 22: CS 162 Memory Consistency Models

Further Reading

L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput., 28(9):690–691, 1979.

S. V. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. IEEE Computer, 29(12):66–76, 1996.

D. Shasha and M. Snir. Efficient and correct execution of parallel programs that share memory. ACM Trans. Program. Lang. Syst., 10(2):282–312, 1988.

D. J. Sorin, M. D. Hill, and D. A. Wood. A Primer on Memory Consistency and Cache Coherence. Synthesis Lectures on Computer Architecture, 2011.

C. Lin, V. Nagarajan, and R. Gupta. Address-aware fences. ICS ’13, pages 313–324, 2013.