
A Behavioral Memory Model for the

UPC Language

Kathy Yelick

Joint work with: Dan Bonachea, Jason Duell, Chuck Wallace


UPC Meeting, SC03: Consistency Models, Oct 17, 2003

Proposal for UPC Spec

• Replace wording in body of spec with prose that:
  • Defines a data race as:
    • Two concurrent memory operations from two different threads to the same memory location, at least one of which is a write.
  • Defines a race-free program as one in which:
    • All executions of the program are free of data races (it would be nice if the user only had to worry about naïve implementations)
  • States that programs will behave as if all operations from each thread execute in order if one of the following holds:
    • The program is race-free
    • The program contains no relaxed operations
  • Refers readers to an appendix for programs with races
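The data race definition above can be written as a small predicate. This is only a sketch in Python: the `MemOp` record and the `concurrent` flag are hypothetical names introduced for illustration, and deciding whether two operations really are concurrent is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    """Hypothetical record for one memory operation (not part of the proposal)."""
    thread: int
    location: str
    is_write: bool


def is_data_race(a: MemOp, b: MemOp, concurrent: bool = True) -> bool:
    """Two concurrent ops from different threads to one location, at least one a write."""
    return (
        concurrent
        and a.thread != b.thread
        and a.location == b.location
        and (a.is_write or b.is_write)
    )
```

Two reads of the same location never race under this predicate, and neither do two operations issued by the same thread.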


Formalism

• The appendix (or a later section) of the language reference would contain something akin to the following formalism
• This can be done in a page or two
• In addition, it would refer to an extended report on:
  • An operational (“state machine”) model of the semantics
  • A study of various optimization techniques and whether or not they are correct
    • Caching (when to flush, problems with not flushing)
    • Reordering by the compiler (should be allowed on relaxed operations as long as there are no dependencies)
    • Use of non-blocking operations or weak hardware models (+ fences)


Behavioral Approach

• Problems with operational specifications
  • Implicit assumptions about implementation strategy (e.g., caches)
  • May unnecessarily restrict implementations
  • Intuitive in principle, but complicated in practice
• A behavioral approach
  • Based on partial and total orders
  • Using the Sequential Consistency definition as a model
    • Processor order defines a total order on each thread
    • Their union defines a partial order
    • ∃ a consistent total order that is correct as a serial execution

[Figure: operation sequences for threads P0 and P1]
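The sequential consistency definition used as the model here can be checked by brute force on tiny examples: enumerate every total order that respects each thread's processor order and ask whether any of them is a correct serial execution. The sketch below assumes a hypothetical encoding of operations as `(kind, location, value)` tuples and an initial value of 0 for every location; it is exponential and meant only as an illustration.

```python
def interleavings(threads):
    """Yield every merge of the per-thread sequences that keeps each thread's order."""
    if all(len(t) == 0 for t in threads):
        yield []
        return
    for i, t in enumerate(threads):
        if t:
            rest = threads[:i] + [t[1:]] + threads[i + 1:]
            for tail in interleavings(rest):
                yield [t[0]] + tail


def is_serial_correct(order):
    """Each read must return the most recent write to its location (initially 0)."""
    memory = {}
    for kind, loc, value in order:
        if kind == "W":
            memory[loc] = value
        elif memory.get(loc, 0) != value:
            return False
    return True


def sequentially_consistent(threads):
    """True iff some consistent total order is correct as a serial execution."""
    return any(is_serial_correct(order) for order in interleavings(threads))
```

For instance, if P0 writes x = 1 and then reads y as 0, while P1 writes y = 1 and then reads x as 0, no interleaving explains both reads, so the function returns False.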


Some Basic Notation

• The set of operations is O
  • Ot = the set of operations issued by thread t
• The set of memory operations is:
  • M = {m0, m1, …}
  • Mt = the set of memory operations from thread t
• Each memory operation has properties
  • Thread(mi) is the thread that executed the operation
  • Location(mi) is the memory location involved
• Memory operations are partitioned into 6 sets, given by
  • S = Strict, R = Relaxed, P = Private (in the 1st position)
  • W = Write, R = Read (in the 2nd position)
  • Some useful groups:
    Strict(M) = SW(M) ∪ SR(M)
    W(M) = SW(M) ∪ RW(M) ∪ PW(M)
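One way to make this notation concrete is a small data model. The sketch below is an assumption-laden illustration in Python: the class and function names (`Mode`, `Kind`, `MemOp`, `strict`, `writes`, `ops_of_thread`) are hypothetical and simply mirror Strict(M), W(M), and Mt.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    STRICT = "S"
    RELAXED = "R"
    PRIVATE = "P"


class Kind(Enum):
    WRITE = "W"
    READ = "R"


@dataclass(frozen=True)
class MemOp:
    thread: int      # Thread(m)
    location: str    # Location(m)
    mode: Mode       # 1st position of the two-letter label
    kind: Kind       # 2nd position of the two-letter label

    @property
    def label(self) -> str:
        """Two-letter set name: SW, SR, RW, RR, PW or PR."""
        return self.mode.value + self.kind.value


def strict(M):
    """Strict(M) = SW(M) ∪ SR(M)."""
    return {m for m in M if m.mode is Mode.STRICT}


def writes(M):
    """W(M) = SW(M) ∪ RW(M) ∪ PW(M)."""
    return {m for m in M if m.kind is Kind.WRITE}


def ops_of_thread(M, t):
    """Mt: the memory operations issued by thread t."""
    return {m for m in M if m.thread == t}
```

Because the records are compared by value, two identical operations would collapse into one set element; a real model would add a unique identifier per operation.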


Compiler Assumption

• For specification purposes, assume the code is compiled by a naïve compiler into an ISO C machine
• Real compilers may do optimizations
  • E.g., reorder, remove, insert memory operations
  • Even strict operations may be reordered with sufficient analysis (cycle detection)
  • These must produce an execution whose input/output and volatile behavior is identical to that of an unoptimized program (ISO C)


Orderings on Strict Operations

Threads must agree on an ordering of:

• For pairs of strict accesses, it will be total:

• For a strict/relaxed pair on the same thread, they will all see the program order


Orderings on Local Operations

• Conflicting accesses have the usual definition

• Given a serial execution S = [o1, …, on] defining <S, let St be the subsequence of operations issued by t
• S conforms to program order for thread t iff:
  • St is consistent with the program text for t (follows control flow)
• S conforms to program dependence order for t iff ∃ a permutation P(S) such that:
  • P(S) conforms to program order for t
  • ∀ (m1, m2) ∈ Conflicting(M): m1 <S m2 ⇔ m1 <P(S) m2
• This is a bit too strong on anti-dependencies
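The condition on conflicting pairs can be checked mechanically for a given S and candidate permutation P(S). The sketch below assumes the usual definition of a conflict (same location, at least one write) and a hypothetical encoding of operations as unique `(kind, location, id)` tuples; it does not check that P(S) itself follows the program text.

```python
def conflicting(a, b):
    """Usual definition (assumed here): same location and at least one write."""
    return a[1] == b[1] and "W" in (a[0], b[0])


def preserves_conflict_order(S, P):
    """True iff every conflicting pair has the same relative order in S and in P.

    S and P must be lists holding the same, pairwise distinct operations.
    """
    pos = {op: i for i, op in enumerate(P)}
    for i, a in enumerate(S):
        for b in S[i + 1:]:
            # a precedes b in S, so it must also precede b in P.
            if conflicting(a, b) and pos[a] > pos[b]:
                return False
    return True
```

With S = [W x, R y, R x], moving the read of y to the front is allowed, while moving the read of x ahead of the write of x is not.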


UPC Consistency

An execution on T threads with memory ops M is UPC consistent iff:

• ∃ a partial order <strict that orients all pairs in allStrict(M)
• And for each thread t ∃ a total order <t on Ot ∪ W(M) ∪ SR(M) such that:
  • <t is consistent with <strict
    • All threads agree on the ordering of strict operations
  • <t conforms to program dependence order
    • Local dependencies are observed
  • <t is a correct execution
    • Reads return most recent write values


Intuition on Strict Orderings

• Each thread may “build” its own total order to explain behavior

• They all agree on the strict ordering shown above in black, but
  • Different threads may see relaxed writes in different orders
  • Allows non-blocking writes to be used in implementations
• Each thread sees its own dependencies, but not those of other threads
  • Weak, but anything stronger would place consistency requirements on some relaxed operations (e.g., local cache control would be insufficient)

• Preserving dependencies requires usual compiler/hw analysis

[Figure: operation sequences for threads P0 and P1, with the strict ordering drawn in black]
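A tiny example of threads building different total orders: two relaxed writes to x, issued by two other threads, are observed in opposite orders by threads 2 and 3. The encoding below, `(kind, location, value)` tuples with initial value 0, is a hypothetical one chosen for this sketch. Each observer's order contains its own reads plus all writes, and each is a correct execution on its own.

```python
def reads_see_latest_write(order):
    """Check that every read in this total order returns the latest preceding write."""
    memory = {}
    for kind, loc, value in order:
        if kind == "W":
            memory[loc] = value
        elif memory.get(loc, 0) != value:
            return False
    return True


# Two relaxed writes to x, issued by two different threads.
w_a = ("W", "x", 1)
w_b = ("W", "x", 2)

# Thread 2 explains its reads with w_a before w_b ...
order_t2 = [w_a, ("R", "x", 1), w_b, ("R", "x", 2)]
# ... while thread 3 explains its reads with w_b before w_a.
order_t3 = [w_b, ("R", "x", 2), w_a, ("R", "x", 1)]

same_view = (order_t2.index(w_a) < order_t2.index(w_b)) == (
    order_t3.index(w_a) < order_t3.index(w_b)
)
```

Both orders pass `reads_see_latest_write`, yet `same_view` is False. The model permits this because neither write is strict, so `<strict` places no constraint on the pair.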


Synchronization Operations

• UPC has both global and pairwise synchronization

• In addition to the synchronization properties, they also have memory model implications:
  • Locks
    • upc_lock is a strict read
    • upc_unlock is a strict write
  • Barriers (which may be split-phase)
    • upc_notify (begin barrier) is a strict write
    • upc_wait (end of barrier) is a strict read
    • upc_barrier = upc_notify; upc_wait


Alternative Models

• As specified, two relaxed writes to the same location may be viewed differently by different processors
  • Nothing to force eventual consistency (likely in implementations)
  • May add this to barrier points, at least
  • So far it looks ad hoc
• Adding directionality to reads/writes seems reasonable
  • Strict reads “fence” things that follow
  • Strict writes “fence” things that precede
  • Simply replace the StrictOnThreads definition
• Support user-defined synchronization primitives built from strict operations


Some Bizarre Behavior

• The following “out of thin air” behavior:
  • Given shared variables x and y, both initially 0

t0: r1 = x; y = r1

t1: r2 = y; x = r2

• x and y end with 42 (or any other arbitrary value)

• How does this happen?

• t0 speculates that x is 42 and writes that value to y

• t1 sees 42 in y and writes it into x

• this validates t0’s speculative read
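For contrast, a brute-force enumeration shows what a sequentially consistent machine could produce for this program. The sketch below uses a hypothetical instruction encoding of `(op, destination, source)` tuples. It runs every interleaving of t0 and t1 and collects the final values of x and y.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's internal order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for tail in interleavings(a[1:], b):
        yield [a[0]] + tail
    for tail in interleavings(a, b[1:]):
        yield [b[0]] + tail


def run(schedule):
    """Execute one interleaving and return the final (x, y)."""
    shared = {"x": 0, "y": 0}
    regs = {"r1": 0, "r2": 0}
    for op, dst, src in schedule:
        if op == "load":
            regs[dst] = shared[src]
        else:  # "store"
            shared[dst] = regs[src]
    return shared["x"], shared["y"]


t0 = [("load", "r1", "x"), ("store", "y", "r1")]
t1 = [("load", "r2", "y"), ("store", "x", "r2")]
outcomes = {run(s) for s in interleavings(t0, t1)}
```

Every one of the six interleavings only copies zeros around, so `outcomes` contains just (0, 0). The value 42 can appear only if speculation of the kind described above is permitted.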


Atomicity Issues

• Atomicity: Is there a word size (or type) such that
  • A write of anything larger is defined as a set of word-sized operations (so a user might see a partial update)?
  • E.g., is writing a struct the same as writing each field (or some maximum size)?
• Tearing: Is there a word size (or type) such that
  • Two writes to the same location can result in a merged value?
• Clobbering: Is there a word size (or type) such that
  • If something smaller is written, it might clobber writes to a neighboring value?
  • E.g., two processors write to two consecutive bytes in an array; if the processor does a read-modify-write for each, one write can be lost
• Conflicts: on what size are these defined?
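The clobbering scenario can be replayed deterministically. The sketch below is a simulation, not real hardware behavior: it assumes a two-byte "word" and a hypothetical function name, and it fixes one unlucky interleaving in which both processors read the word before either writes it back.

```python
def lost_byte_update():
    """Replay one unlucky interleaving of two byte writes done as word read-modify-writes."""
    word = bytearray([0x00, 0x00])  # two neighboring bytes sharing one machine word

    p0_copy = bytearray(word)  # P0 reads the whole word
    p1_copy = bytearray(word)  # P1 reads the whole word before P0 writes back
    p0_copy[0] = 0xAA          # P0 modifies only byte 0 of its copy
    p1_copy[1] = 0xBB          # P1 modifies only byte 1 of its copy
    word[:] = p0_copy          # P0 writes the word back
    word[:] = p1_copy          # P1 writes back a stale byte 0, clobbering P0's update
    return bytes(word)
```

Although the two processors wrote to different bytes, P0's write to byte 0 is lost because P1 wrote back its stale copy of that byte.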


UPC Bulk Operation Semantics

• Are upc_memput, upc_memget, upc_memcpy relaxed or strict?
  • If relaxed, then the user can get strict behavior by putting a strict operation (or operations in the nonsymmetric case) before and after
  • Will this be surprising to users?
  • What do current implementations do?


UPC Fence Operations

• Should UPC have separate functions for:
  • read fence: prevents memory operations from moving before it
  • write fence: prevents memory operations from moving after it
• Or let the programmer build these by doing a strict read/write to some otherwise unused variable?


Future Plans

• Show that various implementations satisfy this spec
  • Use of non-blocking writes for relaxed writes with write fence/sync at strict points
  • Compiler-inserted prefetching of relaxed reads
  • Compiler-inserted “message vectorization” to aggregate a set of small operations into one larger one
  • A software caching implementation with cache flushes at strict points
• Develop an operational model and show equivalence (or at least that it implements the spec)
• Define the data unit of atomicity
  • Fundamental unit of interleaving, data tearing, conflicts


Properties of UPC Consistency

• A program containing only strict operations is sequentially consistent

• A program that produces only race-free executions is sequentially consistent
  • A UPC consistent execution of a program is race-free if for all threads t and all enabling orderings <t
    • For all potential races:
      • If m1 <t m2 then ∃ synchronization operations o1, o2 such that m1 <t o1 <t o2 <t m2 and Thread(o1) = Thread(m1) and Thread(o2) = Thread(m2) and either
        • o1 is upc_notify and o2 is upc_wait, or
        • o1 is upc_unlock and o2 is upc_lock on the same lock variable
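The synchronization condition for a single potential race can be checked against one ordering <t. In the sketch below, the `Op` record and the function name are hypothetical, the ordering is a plain list of pairwise distinct operations, and only this one condition is tested, not UPC consistency as a whole.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    thread: int
    name: str                     # e.g. "read", "write", "upc_notify", "upc_lock"
    target: Optional[str] = None  # memory location or lock variable, if any


def ordered_by_sync(order, m1, m2):
    """True iff sync ops o1, o2 exist with m1 < o1 < o2 < m2 as on the slide."""
    i, j = order.index(m1), order.index(m2)
    between = order[i + 1:j]
    for k, o1 in enumerate(between):
        if o1.thread != m1.thread:
            continue
        for o2 in between[k + 1:]:
            if o2.thread != m2.thread:
                continue
            if o1.name == "upc_notify" and o2.name == "upc_wait":
                return True
            if (o1.name == "upc_unlock" and o2.name == "upc_lock"
                    and o1.target == o2.target):
                return True
    return False
```

A write by thread 0 followed by upc_unlock on lock L, then upc_lock on L and a read by thread 1, satisfies the condition. The same sequence with two different lock variables does not.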