CS 162: Memory Consistency Models

Reordering in Uniprocessors

Memory operations are reordered to improve performance

- Hardware (e.g., store buffer, reorder buffer)

- Compiler (e.g., code motion, caching a value in a register)

The program behaves the same as long as dependences are respected

a1: St x          a2: Ld y
a2: Ld y    ≡     a1: St x

Reordering in Multiprocessors

In a multiprocessor, reordering can lead to counter-intuitive program behavior

Initially x=y=0

Proc 1            Proc 2
a1: x = 1;        b1: Ry = y;
a2: y = 1;        b2: Rx = x;

Possible outcomes

- (Rx=0, Ry=0), e.g., b1, b2, a1, a2

- (Rx=1, Ry=0), e.g., a1, b1, b2, a2

- (Rx=1, Ry=1), e.g., a1, a2, b1, b2

- (Rx=0, Ry=1), only after reordering, e.g., b2, a1, a2, b1

Intuitively, y=1 ⇒ x=1, so (Rx=0, Ry=1) is the counter-intuitive outcome
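The possible outcomes can be checked by brute force. The sketch below is an illustration only, not part of the original slides: it runs every total order of the four operations against a single shared memory, once keeping each processor's program order and once allowing any order. The names `run` and `outcomes` are hypothetical.

```python
from itertools import permutations

# The two processors from the slide; each op is (label, kind, variable).
PROC1 = [("a1", "St", "x"), ("a2", "St", "y")]
PROC2 = [("b1", "Ld", "y"), ("b2", "Ld", "x")]


def run(schedule):
    """Execute one total order of operations against a single shared memory."""
    memory = {"x": 0, "y": 0}          # initially x = y = 0
    regs = {}
    for _label, kind, var in schedule:
        if kind == "St":
            memory[var] = 1            # both stores on the slide write 1
        else:
            regs["R" + var] = memory[var]
    return regs["Rx"], regs["Ry"]


def outcomes(keep_program_order):
    """Collect (Rx, Ry) over all total orders, optionally forcing program order."""
    results = set()
    for schedule in permutations(PROC1 + PROC2):
        pos = {op[0]: i for i, op in enumerate(schedule)}
        in_order = pos["a1"] < pos["a2"] and pos["b1"] < pos["b2"]
        if keep_program_order and not in_order:
            continue
        results.add(run(schedule))
    return results


in_order = outcomes(keep_program_order=True)
reordered = outcomes(keep_program_order=False)
print(sorted(in_order))               # [(0, 0), (1, 0), (1, 1)]
print(sorted(reordered - in_order))   # [(0, 1)]
```

Only the orders that break program order produce (Rx=0, Ry=1).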

Initially p=NULL, flag = false

Proc 1              Proc 2
p = new A(…)        if (flag)
flag = true;            a = p->var;

flag is supposed to be set after p is allocated

Lock-free algorithms, e.g., Dekker, Peterson

Dekker Algorithm (mutual exclusion)

Initially flag1 = flag2 = 0

Proc 1                      Proc 2
flag1 = 1;                  flag2 = 1;
if (flag2 == 0)             if (flag1 == 0)
    critical section            critical section

In Proc 1, "flag1 = 1" is St flag1 and "flag2 == 0" is Ld flag2

After the store and the load are reordered in each processor, both loads can read 0, so both processors enter the critical section
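One way the store can end up after the load is a store buffer. The sketch below is a simplified, hypothetical model (the `Processor` class and `dekker` function are not from the slides): each store waits in a private buffer, and loads read shared memory unless the same processor has a buffered store to that variable. It replays one concrete schedule of the Dekker fragment.

```python
class Processor:
    """A core whose stores wait in a private FIFO store buffer (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory
        self.store_buffer = []                   # pending (variable, value) pairs

    def store(self, var, value):
        self.store_buffer.append((var, value))   # not yet visible to other cores

    def load(self, var):
        # A core sees its own newest buffered store first, then shared memory.
        for buffered_var, value in reversed(self.store_buffer):
            if buffered_var == var:
                return value
        return self.memory[var]

    def drain(self):
        # Write all buffered stores to shared memory in FIFO order.
        for var, value in self.store_buffer:
            self.memory[var] = value
        self.store_buffer.clear()


def dekker(drain_before_load):
    """Replay one schedule; return (Proc 1 enters, Proc 2 enters)."""
    memory = {"flag1": 0, "flag2": 0}
    p1, p2 = Processor(memory), Processor(memory)
    p1.store("flag1", 1)
    p2.store("flag2", 1)
    if drain_before_load:                        # stores become visible before the loads
        p1.drain()
        p2.drain()
    p1_enters = p1.load("flag2") == 0
    p2_enters = p2.load("flag1") == 0
    p1.drain()
    p2.drain()
    return p1_enters, p2_enters


print(dekker(drain_before_load=False))   # (True, True): mutual exclusion is broken
print(dekker(drain_before_load=True))    # (False, False)
```

In this schedule, buffering lets both loads run ahead of both stores, which is exactly the St/Ld reordering described above.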

Memory Consistency Models

Specify the ordering of loads and stores to different memory locations

- Ld → Ld, Ld → St, St → Ld, St → St

Contract between hardware, compiler, and programmer

- the hardware and compiler will not violate the ordering specified

- the programmer will not assume a stricter order than that of the model

Memory Consistency Models

Model                     Allowed Reordering    Commercial Architecture
Sequential Consistency    None                  Does not exist
Total Store Ordering      St → Ld               x86, SPARC
Relaxed Memory Order      All                   ARM, PowerPC

Stronger models trade performance for programmability

- Stronger constraints

- Fewer memory reorderings

- Easier to reason about

- Lower performance
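The table can be read as a lookup from a model to the pairs of operations it lets the hardware reorder. The sketch below encodes only the three rows above; the abbreviations `SC`, `TSO`, `RMO` and the function `may_reorder` are hypothetical names, and it assumes the two accesses go to different locations with no dependence between them.

```python
ALL_PAIRS = {("Ld", "Ld"), ("Ld", "St"), ("St", "Ld"), ("St", "St")}

# (earlier op, later op) pairs that each model allows to be reordered.
ALLOWED_REORDERING = {
    "SC": set(),                 # Sequential Consistency: none
    "TSO": {("St", "Ld")},       # Total Store Ordering: a later load may pass a store
    "RMO": set(ALL_PAIRS),       # Relaxed Memory Order: all
}


def may_reorder(model, earlier, later):
    """True if `later` may be performed before `earlier` under `model`."""
    return (earlier, later) in ALLOWED_REORDERING[model]


# The Dekker fragment relies on St flag1 staying before Ld flag2.
for model in ("SC", "TSO", "RMO"):
    print(model, may_reorder(model, "St", "Ld"))
```

Under this encoding the Dekker fragment is safe only under SC, since both TSO and RMO allow St → Ld reordering.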

Cache Coherence vs. Memory Model

Cache coherence ensures a consistent view of memory

Guarantees that an update to memory by one processor will eventually be seen by other processors

But how consistent?

- NO guarantee on when an update will be seen

- NO guarantee on the order in which updates to different locations will be seen

Cache Coherence vs. Memory Model

Initially A = B = 0

P1            P2                    P3
A = 1;        while (A != 1) ;      while (B != 1) ;
              B = 1;                tmp = A;

tmp = 1? or tmp = 0?
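To see how tmp = 0 could arise, consider a scripted scenario in which every processor has its own view of memory and updates reach the other views at different times. This is a hypothetical illustration of "eventually seen, in no guaranteed order", not a model of any real coherence protocol; the function `run` and its parameter are assumptions.

```python
def run(a_reaches_p3_before_b):
    """Replay the three-processor example with per-processor views of memory."""
    view = {p: {"A": 0, "B": 0} for p in ("P1", "P2", "P3")}

    # P1: A = 1. The update reaches P2 promptly; delivery to P3 is still pending.
    view["P1"]["A"] = 1
    view["P2"]["A"] = 1

    # P2: while (A != 1) ; B = 1;
    assert view["P2"]["A"] == 1
    view["P2"]["B"] = 1

    if a_reaches_p3_before_b:
        view["P3"]["A"] = 1          # A's update arrives at P3 first
    view["P3"]["B"] = 1              # B's update arrives at P3

    # P3: while (B != 1) ; tmp = A;
    assert view["P3"]["B"] == 1
    tmp = view["P3"]["A"]

    view["P3"]["A"] = 1              # the delayed update of A arrives eventually
    return tmp


print(run(a_reaches_p3_before_b=True))    # 1
print(run(a_reaches_p3_before_b=False))   # 0
```

Both runs deliver every update eventually; only the order of arrival at P3 differs, and that is what the memory model has to pin down.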

Sequential Consistency (SC)

Definition [Lamport]

(1) the result of any execution is the same as if the operations of all processors were executed in some sequential order;

(2) the operations of each individual processor appear in this sequence in the order specified by its program.

Figure: processors P1, P2, P3, ..., Pn share a single MEMORY

Behave as the repetition:

(1) Pick a processor by any method (e.g., randomly)

(2) the processor completes a load/store operation

SC Example

Initially x=y=0. Under SC, every execution is equivalent (≡) to an interleaving that keeps each processor's program order, for example:

b1: Ry = y;  b2: Rx = x;  a1: x = 1;  a2: y = 1;    (Rx=0, Ry=0)
a1: x = 1;  b1: Ry = y;  b2: Rx = x;  a2: y = 1;    (Rx=1, Ry=0)
a1: x = 1;  a2: y = 1;  b1: Ry = y;  b2: Rx = x;    (Rx=1, Ry=1)

The order b2, a1, a2, b1 is not such an interleaving because b2 runs before b1; its outcome (Rx=0, Ry=1) cannot occur under SC

Sequential Consistency (SC)

Simple and intuitive

- consistent with programmers’ intuition

- easy to reason about program behavior

However, the simplicity comes at the cost of performance

- prevents aggressive compiler optimizations (e.g., load reordering, store reordering, caching a value in a register)

- constrains hardware optimizations (e.g., store buffer)

SC Violation

Proc 1            Proc 2
a1: x = 1         b1: R1 = y
a2: y = 1         b2: R2 = x

Edges: program order (a1 → a2, b1 → b2) and conflict relation (a2 and b1 on y, b2 and a1 on x)

SC Violation

- A cycle formed by program orders and conflict orders [Shasha and Snir, 1988], e.g., (a2, b1, b2, a1, a2)

- Executing in the order (a2, b1, b2, a1) will produce R1=1, R2=0, which is not an SC outcome

Insert fences to break the cycle

- a2 cannot be executed before a1
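The cycle condition can be checked mechanically for a given execution. The sketch below is a minimal illustration, not the Shasha and Snir analysis itself: it directs each conflict edge from the access that executed first, adds the program-order edges, and searches for a directed cycle with a depth-first search. The names `find_cycle` and `conflict_order` are hypothetical.

```python
PROGRAM_ORDER = [("a1", "a2"), ("b1", "b2")]
CONFLICTS = [("a2", "b1"), ("a1", "b2")]     # same location, at least one store


def conflict_order(execution):
    """Direct each conflict edge from the access that executed first."""
    pos = {op: i for i, op in enumerate(execution)}
    return [(u, v) if pos[u] < pos[v] else (v, u) for u, v in CONFLICTS]


def find_cycle(edges):
    """Return one directed cycle as a list of nodes, or None if there is none."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    state, path = {}, []                     # state: "active" (on stack) or "done"

    def visit(node):
        state[node] = "active"
        path.append(node)
        for nxt in graph[node]:
            if state.get(nxt) == "active":   # back edge closes a cycle
                return path[path.index(nxt):] + [nxt]
            if nxt not in state:
                found = visit(nxt)
                if found:
                    return found
        state[node] = "done"
        path.pop()
        return None

    for node in graph:
        if node not in state:
            found = visit(node)
            if found:
                return found
    return None


violating = find_cycle(PROGRAM_ORDER + conflict_order(["a2", "b1", "b2", "a1"]))
legal = find_cycle(PROGRAM_ORDER + conflict_order(["a1", "a2", "b1", "b2"]))
print(violating)   # ['a1', 'a2', 'b1', 'b2', 'a1']
print(legal)       # None
```

The reported cycle is the same one as (a2, b1, b2, a1, a2), listed from a different starting node.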

Fence Instructions

p = new A(…)

flag = true;

Fence Instructions

Order memory operations before and after the fence

Inevitable -- needed when building concurrent implementations (e.g., mutual exclusion, queues) [Attiya et al., POPL’11]

Expensive -- Cilk-5’s THE protocol spends 50% of its time executing a memory fence [Frigo et al., PLDI’98]

Proc 1        Proc 2
a1: St x      b1: St y
Fence1        Fence2
a2: Ld y      b2: Ld x
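The effect of Fence1 and Fence2 can be checked exhaustively on a small store-buffer machine. The sketch below is a simplified, hypothetical model, not a description of any real processor: stores enter a private FIFO buffer, buffered stores drain to memory at arbitrary times, and a fence completes only when the buffer of its processor is empty. The register names Rx and Ry and the functions `explore`, `program`, and `outcomes` are assumptions made for the example.

```python
def explore(programs):
    """Return every final register state reachable under the store-buffer model."""
    results, seen = set(), set()

    def replace(items, i, value):
        return items[:i] + (value,) + items[i + 1:]

    def step(pcs, buffers, memory, regs):
        state = (pcs, buffers, memory, regs)
        if state in seen:
            return
        seen.add(state)
        if not any(buffers) and all(pc == len(p) for pc, p in zip(pcs, programs)):
            results.add(regs)
            return
        for i, prog in enumerate(programs):
            if buffers[i]:
                # Drain the oldest buffered store of processor i to memory.
                var, value = buffers[i][0]
                new_memory = tuple(sorted({**dict(memory), var: value}.items()))
                step(pcs, replace(buffers, i, buffers[i][1:]), new_memory, regs)
            if pcs[i] == len(prog):
                continue
            op, next_pcs = prog[pcs[i]], replace(pcs, i, pcs[i] + 1)
            if op[0] == "St":
                # The store (value 1) is buffered and not yet visible to others.
                step(next_pcs, replace(buffers, i, buffers[i] + ((op[1], 1),)), memory, regs)
            elif op[0] == "Ld":
                # Read the newest matching buffered store, else shared memory.
                value = dict(memory)[op[1]]
                for var, buffered in buffers[i]:
                    if var == op[1]:
                        value = buffered
                new_regs = tuple(sorted({**dict(regs), op[2]: value}.items()))
                step(next_pcs, buffers, memory, new_regs)
            elif op[0] == "Fence" and not buffers[i]:
                # The fence completes only once the store buffer is empty.
                step(next_pcs, buffers, memory, regs)

    step((0, 0), ((), ()), (("x", 0), ("y", 0)), ())
    return results


def program(store_var, load_var, reg, fenced):
    fence = [("Fence",)] if fenced else []
    return [("St", store_var)] + fence + [("Ld", load_var, reg)]


def outcomes(fenced):
    return explore([program("x", "y", "Ry", fenced), program("y", "x", "Rx", fenced)])


BOTH_ZERO = (("Rx", 0), ("Ry", 0))
print(BOTH_ZERO in outcomes(fenced=False))   # True
print(BOTH_ZERO in outcomes(fenced=True))    # False
```

Without fences both loads can read 0, which corresponds to the cycle; with the fences that outcome is no longer reachable in this model.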

Conservativeness of Fences

Inserted statically and conservatively

At time T, a1 and a2 have completed; b1 and b2 only execute after time T.

No cycle is formed at runtime

Proc 1                    Proc 2
if (cond) a1: St x        b1: St y
Fence1                    Fence2
a2: Ld y                  b2: Ld x

a1 is in a conditional branch

Conservativeness of Fences

Proc 1          Proc 2
a1: St *p       b1: St x
Fence1          Fence2
a2: Ld x        b2: Ld *q

p and q may point to the same memory location

Inserted statically and conservatively

No cycle is formed at runtime

Processor-centric Fence

Traditional fence

Processor-centric - unaware of memory accesses in other processors

However, the purpose of fences is to:

Prevent memory accesses from being reordered and observed by other processors (i.e., a cycle formed at runtime)

Address-aware Fences

Consider memory locations accessed around fences at runtime

Fences only take effect when there is a cycle about to happen

Detect and Avoid Cycles

Proc 1        Proc 2
a1: …         b1: …
Fence1        Fence2
a2: …         b2: …

How to detect c2 efficiently?

Detect and Avoid Cycles

Proc 1        Proc 2
a1: …         b1: …
Fence1        Fence2
a2: …         b2: …

watchlist

How to detect c2 efficiently?

Collect a watchlist for each fence

A completing memory operation checks the watchlist

- bypass, if its address is not in the watchlist

- stall, otherwise
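The bypass/stall rule can be sketched as a set lookup. The slides do not show how a watchlist is collected, so the sketch below simply assumes it is a set of addresses; `completes_immediately`, `count_stalls`, and the addresses are hypothetical, and a traditional fence is modeled as one that delays every operation after it.

```python
def completes_immediately(address, watchlist):
    """A completing memory operation bypasses the fence unless its address is watched."""
    return address not in watchlist


def count_stalls(addresses, watchlist=None):
    """Count stalled operations; watchlist=None models a traditional fence."""
    if watchlist is None:
        return len(addresses)              # a traditional fence delays every operation
    return sum(not completes_immediately(a, watchlist) for a in addresses)


post_fence = ["y", "z", "w"]               # hypothetical addresses accessed after Fence1
print(count_stalls(post_fence))                     # 3
print(count_stalls(post_fence, watchlist={"y"}))    # 1
```

Only the access to a watched address waits, which is the sense in which the fence takes effect only when a cycle could form.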

Performance: Execution Time

Traditional fence (T) vs. Address-aware fence (A)

Fence overhead becomes negligible

Further Reading

L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput., 28(9):690–691, 1979.

S. V. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. IEEE Computer, 29(12):66–76, 1996.

D. Shasha and M. Snir. Efficient and correct execution of parallel programs that share memory. ACM Trans. Program. Lang. Syst., 10(2):282–312, 1988.

D. J. Sorin, M. D. Hill, and D. A. Wood. A Primer on Memory Consistency and Cache Coherence. Synthesis Lectures on Computer Architecture, 2011.

C. Lin, V. Nagarajan, and R. Gupta. Address-aware fences. ICS ’13, pages 313–324, 2013.
