Memory Models In Software and in Hardware Practical Considerations

Page 1: Memory Models In Software and in Hardware Practical Considerations

Memory Models

In Software and in Hardware

Practical Considerations

Page 2: Memory Models In Software and in Hardware Practical Considerations

Agenda

• Motivation

• Factors

• Levels of Memory Models
  – Models for software: Java, CLI
  – Models for hardware: IA-32, IA-64

Page 3: Memory Models In Software and in Hardware Practical Considerations

MM Motivation and Factors

http://citeseer.nj.nec.com/adve95shared.html

Page 4: Memory Models In Software and in Hardware Practical Considerations

MM Motivation

• Multithreaded programming
  – Shared memory
• An example: producer/consumer queue
• Does it work correctly?
  – The program performs the operations in the correct order!

Thread 1:
Task t = new Task();
queue.insert(t);

Thread 2:
Task t = queue.get();
t.run();

Page 5: Memory Models In Software and in Hardware Practical Considerations

Memory Model Levels

Programmer-Level Models: Java MM, CLI MM, SC, Coherence, Release Consistency, etc.

  ↓ Compiler

Implementor-Level Models (Virtual Machine): Java Memory Model (Implementor View), Microsoft CLI

  ↓ VM

Implementor-Level Models (Hardware): IA-32, IA-64, Alpha, PowerPC, TSO, PSO, etc.

Page 6: Memory Models In Software and in Hardware Practical Considerations

Factors that Affect MM

• Compiler: performs optimizations

• [Virtual Machine]: yet more optimizations

• Processor: performs operations out of order

• Memory subsystem: delivers updates out of order

Page 7: Memory Models In Software and in Hardware Practical Considerations

MM Factors: Compiler & VM

• Compilers
  – Store values in registers
  – Reorder operations
• Example

Original code:

int x = 0, answer = 0;

void f() { while (!answer) { x = x+1; } }

After compiler optimization:

int x = 0, answer = 0;

void f() {
  int tmp1 = x;        /* held in register all the time */
  int tmp2 = answer;   /* held in register all the time */
  while (!tmp2) {      /* no read from memory */
    tmp1 = tmp1+1;     /* no write to memory */
  }
  x = tmp1;
}
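The loop above can be made immune to this register caching in Java by declaring the flag volatile. The following is only a minimal sketch under that assumption; the class name StopFlagDemo and the method runAndStop are hypothetical and do not come from the slides.

```java
// Hypothetical demo class: names are illustrative, not from the slides.
class StopFlagDemo {
    // volatile: every loop iteration must re-read the flag,
    // so the read cannot be hoisted into a register.
    private static volatile boolean answer = false;
    private static long x = 0; // written only by the worker thread

    /** Starts a spinning worker, sets the flag, and reports whether the worker stopped. */
    static boolean runAndStop() {
        answer = false;
        Thread worker = new Thread(() -> {
            while (!answer) {
                x = x + 1;
            }
        });
        worker.setDaemon(true); // never keep the JVM alive because of the demo
        worker.start();
        try {
            Thread.sleep(50);     // let the worker spin for a while
            answer = true;        // volatile write: must become visible to the worker
            worker.join(5_000);   // wait up to 5 seconds for termination
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            answer = true;
        }
        return !worker.isAlive();
    }
}
```

Without the volatile modifier, a JIT compiler would be allowed to perform exactly the transformation shown on this slide, and the worker might never observe the update.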

Page 8: Memory Models In Software and in Hardware Practical Considerations

MM Factors: Processor

• Includes a lot of features that help it tolerate memory latency
  – Most of them change the order of memory operations
• Examples
  – Out-of-order execution: the most important performance-enabler of modern processors
  – Write combining: reads/writes to the same cache line
  – Read/write buffers
  – Many more

Page 9: Memory Models In Software and in Hardware Practical Considerations

MM Factors: Memory Subsystem

• Hardware
  – Cache Coherence Protocols
• Software
  – DSM Coherence Protocols

Page 10: Memory Models In Software and in Hardware Practical Considerations

The Tradeoff

The more optimizations there are in the system, the less transparent it is to the programmer.

[Diagram: a spectrum running from Sequential Consistency to Any Order, with axes labeled Transparency and Performance]

Page 11: Memory Models In Software and in Hardware Practical Considerations

Programmer View Models

Java – Original specification

Java – New specification

Microsoft’s CLI (.NET) specification

Page 12: Memory Models In Software and in Hardware Practical Considerations

Java MM – Original Spec

• Java Language Specification, Chapter 17 http://java.sun.com/docs/books/jls/

• A. Gontmakher, A. Schuster, ACM TOCS, vol. 18, No. 4, pp. 333-386 http://www.cs.technion.ac.il/~assaf/publications/java.ps

• Defines an abstract virtual machine
  – Really hard to understand
  – Non-compliant implementation by Sun (!!!)
  – Many other problems

Page 13: Memory Models In Software and in Hardware Practical Considerations

Java MM: Motivation

• Built-in synchronization
  – Modeled after monitors
  – Integrated with memory model
• Performance: Avoid synchronization
  – Immutable objects

Page 14: Memory Models In Software and in Hardware Practical Considerations

Java MM: The Abstract Model

[Diagram: Thread 1 and Thread 2 each have an execution engine and a local memory; both local memories connect to a single main memory. The execution engine and local memory exchange values through use and assign; local memory and main memory exchange values through load/store on the thread side and read/write on the main-memory side.]

Page 15: Memory Models In Software and in Hardware Practical Considerations

Java MM: The Constraints

read x,v → load x,v → use x,v
assign x,v → store x,v → write x,v

• read x,v → load x,v; store x,v → write x,v
• load x,v → use x,v
• assign x,v → store x,v: not always (Prescient Stores)
• … and more

[Diagram: Thread 1's execution engine, local memory, and main memory, linked by use/assign, load/store, and read/write]

Page 16: Memory Models In Software and in Hardware Practical Considerations

Java MM: Applying The Model

x==1    y==1
y=1     x=1

read y,1 read x,1

load y,1 load x,1

use y,1 use x,1

assign x,1 assign y,1

store x,1 store y,1

write x,1 write y,1

Page 17: Memory Models In Software and in Hardware Practical Considerations

Java MM: How To Deal With

• Determine the dependencies between use/assigns that follow from the constraints

• Then, ignore all the operations except for use/assigns

• Non-Operational Model!

Page 18: Memory Models In Software and in Hardware Practical Considerations

Java MM - Views

[Diagram: two copies of the use/assign, load/store, read/write stack, annotated with four views: Programmer View (non-operational), Implementor View (non-operational), Programmer View (operational), and Implementor View (operational)]

Page 19: Memory Models In Software and in Hardware Practical Considerations

Java MM: Characterizations

• Java is stronger than Coherence
  – Proof below
• Volatile variables: Sequential Consistency
• Locks: variant of Release Consistency
  – Semantics of locks not SC or PC (and not stated explicitly at all).

Page 20: Memory Models In Software and in Hardware Practical Considerations

Java MM – Characterizations 2

• Full definition: regular variables
  – Based on Legal Serialization. Constraints: [Diagram: orderings among r x,v, w y,w, and a pair of r/w x operations]
  – Excludes Prescient Stores
  – Proof: 5+ pages

Legend: Sees a value written by another thread; Same Variable rule; Transistor rule

Page 21: Memory Models In Software and in Hardware Practical Considerations

Java MM – Characterizations 3

• Java: full definition (regular variables only)
  – Constraints: [Diagram: ordering patterns over the sequences "r x,v, r y,1, r y,2, w y,w", "r x,v, w y,1, r y,2, w y,w", and "r x,v, w y,2, w y,w", plus a pair of r/w x operations]
  – Includes Prescient Stores
  – Proof: 20+ pages!
  – Coherence follows from the first Constraint

Legend: Writes a value seen by another thread

Page 22: Memory Models In Software and in Hardware Practical Considerations

Java MM – Coherence Proof 1: Java is not weaker than Coherence

• Take operations for variable X from all threads.

• Divide each thread into blocks:

load-block: load (use)*

store-block: assign (use|assign) store (use)*

• Each block: one load/store operation.

• Sort the blocks by their memory accesses.

• Result: legal serialization of use/assigns to X.

Page 23: Memory Models In Software and in Hardware Practical Considerations

Java MM – Coherence Proof 2: Java is stronger than Coherence

• Coherence: easily shown
• Java (without Prescient Stores):
  – Transistor Rule: 1.1 → 1.2, 2.1 → 2.2
  – Legal Serialization: 2.2 → 1.1, 2.1 ← 1.2
  – Cycle of dependencies!

Thread 1          Thread 2
1. use x,1        1. use y,1
2. assign y,1     2. assign x,1

Page 24: Memory Models In Software and in Hardware Practical Considerations

Java MM – Coherence Proof 3: Prescient Stores

• A store can move presciently up
  – Before its corresponding assign
  – But not before another load/store
• The previous execution is now valid
  – But it can still be fixed…

Thread 1: read x,1; read y,0; read y,2; write y,1
Thread 2: read y,1; read x,0; read x,2; write x,1
Thread 3: write x,2; write y,2

Annotations: "Necessarily has a load"; "The store, even prescient, now cannot move up"

Page 25: Memory Models In Software and in Hardware Practical Considerations

Java MM: Conclusions

• Programming with Locks: easy

• Programming with volatile variables: easy

• Programming with regular variables:
  – Using just Coherence: OK
  – Using full definition: hard
  – Really accounting for Prescient Stores: nightmare

Page 26: Memory Models In Software and in Hardware Practical Considerations

New Java MM

In process, by Bill Pugh et al.

http://www.javasoft.com/aboutJava/communityprocess/jsr/jsr_133.html

http://www.cs.umd.edu/~pugh/java/memoryModel/semantics.pdf

Page 27: Memory Models In Software and in Hardware Practical Considerations

New Java MM: Motivation

• Correctly synchronized programs must have SC semantics
• Incorrectly synchronized programs must have (safe) semantics
  – Safety: JVM must never fail
  – Security: Prevent attacks based on unsynchronized code

Page 28: Memory Models In Software and in Hardware Practical Considerations

New Java MM: Requirements

• Backward Compatibility
  – No new language constructs
  – No new VM instructions
  – No system-specific artifacts, e.g. garbage collection
• Clear Distinction between compiler and VM
  – No optimizations in the compiler
  – Thus, VM model is the same as the one visible to the programmer
• Implementability
  – No unrealistic requirements on software or hardware

Page 29: Memory Models In Software and in Hardware Practical Considerations

New Java MM: The Approach

• Exact semantics for all memory accesses
  – Not really relevant
  – Except that SC for Properly Labelled (no data races) programs can be shown
• Semantics for support of established idioms
  – Final fields
  – Volatile variables
  – Locks

• Quite practical

Page 30: Memory Models In Software and in Hardware Practical Considerations

New Semantics of Final: Immutable objects

• Many objects in Java are designed to be immutable
  – Rationale: avoiding synchronization
  – Best known example: java.lang.String
• The problem: String not really immutable
  – Can see writes to the buffer, but not to the length and offset!
• Security hole

Page 31: Memory Models In Software and in Hardware Practical Considerations

New Semantics of Final: Fixing immutable objects

• Solution 1: Make ALL String methods synchronized
  – Serious hit at performance
  – Not needed on single-processor machines
• Solution 2: Extending semantics of final fields
  – Access that reads a final field, sees it initialized
  – An object must not escape the constructor
• Problem: String: array elements cannot be final
  – “weak acquire semantics”: reads dependent on the final field are seen initialized too
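To make Solution 2 concrete, here is a minimal sketch of a String-like immutable class that relies on final fields. ImmutableSlice is a hypothetical class written for illustration; it is not the real java.lang.String implementation, and it assumes the final-field semantics described on this slide.

```java
// Hypothetical String-like class; not the real java.lang.String.
final class ImmutableSlice {
    private final char[] buffer; // the reference is final; the elements cannot be
    private final int offset;
    private final int length;

    ImmutableSlice(char[] source, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > source.length) {
            throw new IndexOutOfBoundsException("bad offset/length");
        }
        // Private copy: nobody else can write to our buffer later.
        this.buffer = source.clone();
        this.offset = offset;
        this.length = length;
        // 'this' does not escape the constructor, so a reader that sees the
        // object also sees the final fields (and the array they reach) initialized.
    }

    int length() { return length; }

    char charAt(int i) {
        if (i < 0 || i >= length) throw new IndexOutOfBoundsException("index " + i);
        return buffer[offset + i];
    }

    @Override public String toString() { return new String(buffer, offset, length); }
}
```

The array elements are reached only through the final field buffer, which is the case the "weak acquire semantics" bullet is meant to cover.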

Page 32: Memory Models In Software and in Hardware Practical Considerations

New Semantics for Volatile

• Previously: Sequential Consistency
  – But: no relation with the regular operations
  – Not really useful for synchronization (recall the producer/consumer example)
• Now: Acquire/Release Semantics
  – Read works as Acquire
  – Write works as Release
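A small sketch of what acquire/release buys for a producer/consumer handoff, assuming the new volatile semantics. The class VolatileHandoff and the method transfer are hypothetical names chosen for this example.

```java
// Hypothetical one-slot handoff; names are illustrative.
class VolatileHandoff {
    private static int payload;             // regular (non-volatile) field
    private static volatile boolean ready;  // volatile flag used for publication

    /** Hands 'value' from the calling thread to a consumer thread and returns what it saw. */
    static int transfer(int value) {
        ready = false;
        final int[] seen = new int[1];
        Thread consumer = new Thread(() -> {
            while (!ready) {        // volatile read works as acquire
                Thread.yield();
            }
            seen[0] = payload;      // ordered after the acquire: sees the producer's write
        });
        consumer.start();
        payload = value;            // regular write ...
        ready = true;               // ... published by the volatile write (release)
        try {
            consumer.join();        // join makes the consumer's write to seen[0] visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

Under the original semantics the regular write to payload had no ordering relation with the volatile flag, which is why the flag alone was not enough for this pattern.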

Page 33: Memory Models In Software and in Hardware Practical Considerations

New Semantics of Volatile: Double-Checked Locking

• An object s must be created the first time it is requested
  synchronized(this) { if (s==null) s = new S(); }
  – Slow! Locking on each access
• Double-Checking:
  if (s==null) { synchronized(this) { if (s==null) s = new S(); } }
• The reader can reorder access to s and to its fields
• But, if s is volatile, it works!
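The idiom can be written out as the following sketch, assuming the acquire/release semantics for volatile described in these slides. LazyHolder, S, and the constructions counter are hypothetical names added so the behaviour can be checked.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder class illustrating double-checked locking with a volatile field.
class LazyHolder {
    // Counts constructor calls so the "created once" property can be checked.
    static final AtomicInteger constructions = new AtomicInteger();

    static final class S {
        final int value;
        S() {
            constructions.incrementAndGet();
            value = 7;
        }
    }

    private static volatile S s;

    static S get() {
        S local = s;                      // first check, without the lock (volatile read)
        if (local == null) {
            synchronized (LazyHolder.class) {
                local = s;                // second check, under the lock
                if (local == null) {
                    local = new S();
                    s = local;            // volatile write publishes the fully built object
                }
            }
        }
        return local;
    }
}
```

The local variable is a common refinement that avoids a second volatile read on the fast path; the slide's shorter form has the same structure.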

Page 34: Memory Models In Software and in Hardware Practical Considerations

New Semantics of Volatile: Advanced Double-Checking

static volatile boolean initialized = false;

if (!initialized) {
  synchronized(this) {
    if (!initialized) {
      s1 = new S();
      s1.connect(…);
      initialized = true;
    }
  }
}

Final fields won’t help

Page 35: Memory Models In Software and in Hardware Practical Considerations

New Semantics of Locks

• Only locks on the same variable have acquire/release semantics
  – Simplifies implementation
  – Different locks do not synchronize anyway, so no need for acquire
• In original spec, each lock is a memory barrier
  – Even synchronized(new Object()) {}
  – Compiler cannot safely remove locks
  – In the new semantics, recursive locks are no-op
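As an illustration of the first bullet, the sketch below guards a regular field with one shared lock object, so every release of that lock is paired with a later acquire of the same lock. LockedCounter and runTwoThreads are hypothetical names for this example.

```java
// Hypothetical counter guarded by a single lock object.
class LockedCounter {
    private final Object lock = new Object();
    private int count; // regular field, always accessed while holding 'lock'

    void increment() {
        synchronized (lock) {   // acquire on entry, release on exit
            count++;
        }
    }

    int get() {
        synchronized (lock) {   // same lock, so this sees all earlier increments
            return count;
        }
    }

    /** Runs two threads that each increment 'perThread' times and returns the total. */
    static int runTwoThreads(int perThread) {
        LockedCounter c = new LockedCounter();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) c.increment();
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return c.get();
    }
}
```

If each call synchronized on a fresh object instead, as in synchronized(new Object()) {}, no two threads would ever share a lock, and under the new semantics nothing would be ordered between them.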

Page 36: Memory Models In Software and in Hardware Practical Considerations

CLI Memory Model

The VM for Microsoft’s .NET

http://www.ecma.ch/ecma1/STAND/ecma-335.htm

Standard ECMA-335, Common Language Infrastructure

Chapter 11.6, Memory Model and Optimizations

Page 37: Memory Models In Software and in Hardware Practical Considerations

CLI Memory Model

• So short!!! Just 4 pages
• The system
  – Flat shared memory
  – Threads access the same memory
• Any reordering of operations is permitted
  – Except volatile reads/writes
  – Except synchronous exceptions
• Atomic access defined for some operations
• Threading APIs define synchronization semantics

Page 38: Memory Models In Software and in Hardware Practical Considerations

CLI: Volatile Consistency

• Volatile reads and writes
  – Accesses to volatile variables
  – Explicit methods: Thread.VolatileRead, Thread.VolatileWrite
  – Thread.MemoryBarrier: same as both VolatileRead and VolatileWrite
• Volatile read: acquire semantics; volatile write: release semantics
• Different threads can see different orders of volatile writes of different threads

Page 39: Memory Models In Software and in Hardware Practical Considerations

CLI: Locks

• Usual locking semantics: obtaining and releasing locks
  – Synchronized methods
  – System.Threading.Monitor class: simulates C.A.R. Hoare’s monitor (only tries to; simulation is no more complete than in Java)
• Acquiring a lock has acquire semantics; releasing it has release semantics

Page 40: Memory Models In Software and in Hardware Practical Considerations

CLI: Atomic Memory Accesses

• Word-length accesses, aligned 4-byte accesses are atomic

• System.Threading.Interlocked: atomic read-modify-write operations
  – Increment, Decrement, Exchange, CompareExchange
• One- and two-byte reads are atomic. Byte writes may write the whole word
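For readers more familiar with Java, the four Interlocked operations have rough counterparts in java.util.concurrent.atomic.AtomicInteger. The sketch below is only an analogy in Java, not the CLI API itself; InterlockedAnalog, demo, and atomicMax are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical Java analog of the CLI Interlocked operations.
class InterlockedAnalog {
    /** Walks through the four operations on a single counter and returns its final value. */
    static int demo() {
        AtomicInteger v = new AtomicInteger(10);
        v.incrementAndGet();                             // like Increment: 10 -> 11
        v.decrementAndGet();                             // like Decrement: 11 -> 10
        int old = v.getAndSet(20);                       // like Exchange: returns 10, stores 20
        boolean swapped = v.compareAndSet(20, old + 5);  // like CompareExchange: 20 -> 15
        return swapped ? v.get() : -1;
    }

    /** Typical compare-and-swap retry loop: atomically raise 'target' to at least 'candidate'. */
    static int atomicMax(AtomicInteger target, int candidate) {
        while (true) {
            int current = target.get();
            if (candidate <= current) return current;
            if (target.compareAndSet(current, candidate)) return candidate;
            // Another thread changed the value in between: retry with the fresh value.
        }
    }
}
```

One difference worth noting: compareAndSet reports success as a boolean, whereas the CLI's CompareExchange returns the value that was originally stored.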

Page 41: Memory Models In Software and in Hardware Practical Considerations

Conclusions: Using CLI

• All concurrent accesses can be synchronized using synchronized methods or the Monitor class
• Volatile variables: no common order. Probably usable in the simplest cases
  – Designed for accessing hardware registers. There it fits
• Atomic memory access: no memory barrier semantics
  – Probably just forgotten
  – Useful in some simple cases

Page 42: Memory Models In Software and in Hardware Practical Considerations

Conclusions: Implementing CLI

• Lots of disclaimers in the spec: no unimplementable requirements. Thus, implementation is straightforward
  – For instance, Alpha has no instruction to write a byte, so implementing an atomic byte write would be problematic. Java has this problem
• On the other hand, all low-level mechanisms are present (Interlocked)

Page 43: Memory Models In Software and in Hardware Practical Considerations

Conclusions: JVM vs. CLI

• Similar semantics for locks
  – Except that in Java, nested locks are no-op, thus locks can be eliminated by the compiler
  – In Java, acquire/release happens only if synchronizing on the same lock object. In CLI: full acquire/release.
• Similar semantics for volatiles
  – Except that CLI’s volatile consistency is weaker. It is unclear if the Double-Checked Locking idiom should work
• Similarly unusable semantics for regular variables
  – Except for Java’s provisions for object construction (semantics of final fields)
• CLI adds low-level interlocked accesses

Page 44: Memory Models In Software and in Hardware Practical Considerations

Hardware Memory Models

IA-64 and IA-32

Page 45: Memory Models In Software and in Hardware Practical Considerations

IA-32

• Memory reads: acquire semantics
  – Except that reads can see local writes early; see below
• Memory writes: release semantics
  – Except that there is no global order of writes; see below
• Interlocked memory accesses: using processor lock prefix

Page 46: Memory Models In Software and in Hardware Practical Considerations

IA-64: Memory Accesses

• Regular memory accesses – unordered

• Attributes to memory accesses: release or acquire
  – Acquire: ld.acq instruction
  – Release: st.rel instruction
• Memory Fence (mf)
  – AKA Memory Barrier, is both acquire and release.

Page 47: Memory Models In Software and in Hardware Practical Considerations

IA-64: Atomic Accesses

• CMPXCHG (Compare and Exchange)
  – Compare memory with a given value. Exchange if equal
  – Can have either acquire (cmpxchg.acq) or release (cmpxchg.rel) semantics
• FAA (fetch and add)
  – Also acquire or release semantics
• XCHG (Exchange)
  – Only acquire semantics
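The pairing of an acquiring compare-and-exchange with a releasing store is the classic recipe for a spin lock. The sketch below expresses it in Java rather than IA-64 assembly, using the acquire/release access modes that AtomicBoolean offers from Java 9 onward; AcqRelSpinLock and runTwoThreads are hypothetical names, and how a given JVM maps these calls onto cmpxchg.acq and st.rel is an assumption, not something the slides state.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical spin lock: acquire on lock(), release on unlock(). Requires Java 9+.
class AcqRelSpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // compareAndExchangeAcquire returns the value it found;
        // finding 'false' means this thread changed it to 'true' and owns the lock.
        while (locked.compareAndExchangeAcquire(false, true)) {
            Thread.onSpinWait();
        }
    }

    void unlock() {
        locked.setRelease(false); // release: earlier writes are visible to the next owner
    }

    /** Two threads increment a plain int under the lock; returns the final count. */
    static int runTwoThreads(int perThread) {
        AcqRelSpinLock l = new AcqRelSpinLock();
        int[] counter = new int[1]; // regular memory, protected only by the spin lock
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                l.lock();
                try {
                    counter[0]++;
                } finally {
                    l.unlock();
                }
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }
}
```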

Page 48: Memory Models In Software and in Hardware Practical Considerations

IA-64: Semantics of ld.acq, st.rel

• Constraints (">>" is program order, "→" is execution order):
  – Acquire >> X ⇒ Acquire → X
  – X >> Release ⇒ X → Release
  – Fence >> X ⇒ Fence → X
  – X >> Fence ⇒ X → Fence
• Global order of all the strong write operations

T1              T2               T3              T4
st.rel [x]=1    ld.acq r1=[x]    st.rel [y]=1    ld.acq r3=[y]
                ld r2=[y]                        ld r4=[x]

Forbidden: r1=1, r3=1, r2=0, r4=0
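The four-thread test on this slide can be approximated in Java with volatile fields, which the Java memory model (as finalized in JSR-133) orders in a single global order. This is a sketch of the litmus test in Java, not IA-64 code: both loads in each reader are volatile here, whereas the slide uses ld.acq followed by a plain ld. IriwLitmus and countForbidden are hypothetical names.

```java
// Hypothetical litmus test: two writers and two readers that read in opposite orders.
class IriwLitmus {
    static volatile int x, y;

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Runs the test 'rounds' times and counts how often the forbidden outcome appears. */
    static int countForbidden(int rounds) {
        int forbidden = 0;
        for (int round = 0; round < rounds; round++) {
            x = 0;
            y = 0;
            int[] r = new int[5]; // r[1]..r[4] mirror r1..r4 on the slide
            Thread[] threads = {
                new Thread(() -> { x = 1; }),               // T1
                new Thread(() -> { r[1] = x; r[2] = y; }),  // T2
                new Thread(() -> { y = 1; }),               // T3
                new Thread(() -> { r[3] = y; r[4] = x; })   // T4
            };
            for (Thread t : threads) t.start();
            for (Thread t : threads) join(t);
            // The two readers would disagree on the order of the two writes.
            if (r[1] == 1 && r[2] == 0 && r[3] == 1 && r[4] == 0) forbidden++;
        }
        return forbidden;
    }
}
```

A count of zero is what the global order of strong writes requires; observing it in a test run is of course only consistent with the rule, not a proof of it.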

Page 49: Memory Models In Software and in Hardware Practical Considerations

IA-64 Semantics: Exceptions

• Load may see value from store buffer

• Inserting mf between st.rel and ld.acq solves the problem

• But: in Java semantics, this execution is OK!

T1 T2

st.rel [x]=1 st.rel [y]=1

ld.acq r1=[x] ld.acq r3=[y]

ld r2=[y] ld r4=[x]

Permitted: r1=1, r3=1, r2=0, r4=0

Page 50: Memory Models In Software and in Hardware Practical Considerations

IA-64 Semantics: Conclusion

• Simple. Clean

• Very usable: direct mapping to both Java and CLI memory models
  – Especially fits the new Java Memory Model (or more reasonably, the new Java Memory Model especially fits IA-64 ;)
• IA-32: Obviously developed before MP systems became common (for Intel processors)
  – Cannot change architecture now