Memory Consistency Arbob Ahmad, Henry DeYoung, Rakesh Iyer 15-740/18-740: Recent Research in Architecture October 14, 2009

  • Slide 1
  • Memory Consistency Arbob Ahmad, Henry DeYoung, Rakesh Iyer 15-740/18-740: Recent Research in Architecture October 14, 2009
  • Slide 2
  • Memory Model = Instruction Reordering + Store Atomicity (Arvind and Jan-Willem Maessen). Memory consistency models exist to describe and constrain the behavior of [memory systems]. The paper gives a unifying framework for SC and relaxed models with an atomic memory.
  • Slide 3
  • Instruction Reordering vs. Store Atomicity. Instruction reordering rules: consistency within a thread, e.g.: [example not preserved]. Store atomicity rules: ordering which must exist in every serialization; consistency across threads.
  • Slide 4
  • Store Atomicity: 1. Predecessor Stores of a Load are ordered before its source. [Figure: operations x 2, x 2, x 1]
  • Slide 5
  • Store Atomicity: 1. Predecessor Stores of a Load are ordered before its source. 2. Successor Stores of a Store are ordered after its observers. [Figure: operations x 2, x 2, x 1]
  • Slide 6
  • Store Atomicity: 1. Predecessor Stores of a Load are ordered before its source. 2. Successor Stores of a Store are ordered after its observers. 3. Mutual ancestors of Loads are ordered before the mutual successors of the distinct Stores they observe. [Figure: ?]
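
Rules 1 and 2 above can be read as a procedure that adds edges to an ordering graph until nothing changes. The following is a minimal sketch under stated assumptions, not the paper's formulation: the names `apply_rules`, `reachable`, `stores`, `loads`, and `source` are hypothetical, an observation is modeled as an edge from the source Store to the Load, and rule 3 is left out.

```python
def reachable(adj, src, dst):
    """Return True if dst can be reached from src by following edges in adj."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def apply_rules(stores, loads, source, edges):
    """Close an ordering graph under rules 1 and 2 (rule 3 is not modeled).

    stores, loads: dicts mapping operation name -> address (hypothetical encoding)
    source:        dict mapping each load -> the store it observes
    edges:         set of (before, after) pairs already known, e.g. local ordering
    """
    # Assumption: a load is ordered after the store it observes.
    edges = set(edges) | {(st, ld) for ld, st in source.items()}
    changed = True
    while changed:
        changed = False
        adj = {}
        for a, b in edges:
            adj.setdefault(a, set()).add(b)
        new_edges = set()
        for ld, src in source.items():
            for st, addr in stores.items():
                if st == src or addr != loads[ld]:
                    continue
                # Rule 1: a store ordered before the load precedes the load's source.
                if reachable(adj, st, ld):
                    new_edges.add((st, src))
                # Rule 2: a store ordered after the source follows the source's observer.
                if reachable(adj, src, st):
                    new_edges.add((ld, st))
        if not new_edges <= edges:
            edges |= new_edges
            changed = True
    return edges


# Two stores to x; load L observes S2 and is already ordered after S1.
rule1 = apply_rules({"S1": "x", "S2": "x"}, {"L": "x"}, {"L": "S2"}, {("S1", "L")})
# Two stores to x; load L observes S1, and S1 is already ordered before S2.
rule2 = apply_rules({"S1": "x", "S2": "x"}, {"L": "x"}, {"L": "S1"}, {("S1", "S2")})
```

In the first case the sketch derives the edge S1 before S2; in the second it derives L before S2.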
  • Slide 7
  • [Figure: Threads A, B, and C executing x 1, Fence, y 2, y 4, y 2, Fence, z 6, y 4, Fence, z 6, Fence, x 8, x ?] Local ordering constraints.
  • Slide 8
  • Same example, annotated with observation constraints.
  • Slide 9
  • Same example. Question: Are there any ordering constraints not represented?
  • Slide 10
  • Same example. Question: Are there any ordering constraints not represented? The order is either y 2 : y 2 : y 4 : y 4 or y 4 : y 4 : y 2 : y 2.
  • Slide 11
  • Same example. The order is either y 2 : y 2 : y 4 : y 4 or y 4 : y 4 : y 2 : y 2. x 1 must precede both y 2 and y 4; z 6 must follow both y 2 and y 4.
  • Slide 12
  • Same example, annotated with the store atomicity constraint.
  • Slide 13
  • Sequential Consistency: the programmer's gold standard. Question: How can we have the clarity of SC without sacrificing performance?
  • Slide 14
  • Improving the Performance of SC. Key idea: Rather than turning the switch at individual memory access boundaries, do it only at chunk boundaries.
  • Slide 15
  • This is the topic of: "BulkSC: Bulk Enforcement of Sequential Consistency" (Luis Ceze, James Tuck, Pablo Montesinos, and Josep Torrellas) and "Mechanisms for Store-wait-free Multiprocessors" (Thomas Wenisch, Anastasia Ailamaki, Babak Falsafi, and Andreas Moshovos).
  • Slide 16
  • Coarse-Grain Enforcement of SC. Chunks are similar to tasks in TLS and transactions in TM, but chunks are created dynamically by hardware, whereas tasks and transactions are specified statically in code.
  • Slide 17
  • Common Ground. Dynamically divide the program into chunks or atomic sequences: ASO begins an atomic sequence when an ordering constraint would stall instruction retirement; BulkSC assumes chunks are around 1000 instructions. Re-ordering is allowed within chunks/atomic sequences. Updates are not visible until the commit. Evaluated on a full-system simulator (Simics/Flexus).
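
The "updates not visible until the commit" property can be illustrated with a small software model. This is only a sketch under stated assumptions: shared memory is modeled as a Python dict, and the class `SpeculativeChunk` and its methods are hypothetical names, not structures from either paper.

```python
class SpeculativeChunk:
    """Software model of a chunk whose stores stay private until commit."""

    def __init__(self, memory):
        self.memory = memory   # shared memory, modeled as a dict (assumption)
        self.pending = {}      # buffered stores, invisible to other threads

    def store(self, addr, value):
        self.pending[addr] = value

    def load(self, addr):
        # A chunk sees its own buffered stores first, then shared memory.
        if addr in self.pending:
            return self.pending[addr]
        return self.memory.get(addr, 0)

    def commit(self):
        # All buffered stores become visible together.
        self.memory.update(self.pending)
        self.pending.clear()

    def rollback(self):
        # Discard speculative state; shared memory is untouched.
        self.pending.clear()


shared = {}
chunk = SpeculativeChunk(shared)
chunk.store("x", 1)
chunk.store("y", 2)
before_commit = dict(shared)   # snapshot taken while the stores are still buffered
chunk.commit()
after_commit = dict(shared)    # snapshot taken after the commit
```

Because other threads only ever read `shared`, any reordering of the stores inside the chunk is unobservable from outside, which is why reordering within a chunk is safe in this model.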
  • Slide 18
  • BulkSC: Bulk Enforcement of Sequential Consistency. Chunk executes and updates the L1. Commit is made; R and W signatures are broadcast. Bulk Disambiguator computes the intersection and restarts computation if it is non-empty. Computes minimum serialization requirement. Enables BulkSC on machines without broadcast capabilities.
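
The intersection test on this slide can be sketched in software. The encoding below is an assumption made for illustration: a signature is modeled as a 64-bit mask indexed by cache-line number, and `SIG_BITS`, `LINE_BYTES`, `signature`, `ChunkSignatures`, and `must_restart` are hypothetical names. The hardware signatures in BulkSC are not specified in these slides.

```python
from dataclasses import dataclass

SIG_BITS = 64     # assumed signature width
LINE_BYTES = 64   # assumed cache-line size


def signature(addresses):
    """Fold a set of byte addresses into a fixed-width bit mask (lossy)."""
    sig = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        sig |= 1 << (line % SIG_BITS)
    return sig


@dataclass
class ChunkSignatures:
    reads: set
    writes: set

    @property
    def r_sig(self):
        return signature(self.reads)

    @property
    def w_sig(self):
        return signature(self.writes)


def must_restart(committing, running):
    """True if the committing chunk's W signature intersects the
    running chunk's R or W signature."""
    return (committing.w_sig & (running.r_sig | running.w_sig)) != 0


committer = ChunkSignatures(reads=set(), writes={0x1000})
conflicting = ChunkSignatures(reads={0x1000}, writes=set())
independent = ChunkSignatures(reads={0x2040}, writes=set())
aliased = ChunkSignatures(reads={0x0}, writes=set())  # different line, same bit
```

Because the mask is lossy, `must_restart(committer, aliased)` reports a conflict even though the addresses differ. In this model, aliasing can cause unnecessary restarts but never hides a real conflict.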
  • Slide 19
  • Atomic Store Ordering. Scalable Store Buffer: eliminates store buffer capacity related stalls; no associative lookup required. ASO Implementation: eliminates ordering related stalls; atomic sequence tracking; detecting atomicity violations; rollback on violation; commit atomic sequences.
  • Slide 20
  • Performance Results. BulkSC. ASO: more realistic workloads.
  • Slide 21
  • Open Research Questions in Memory Consistency. The memory model framework was descriptive. What are the prescriptive consequences? Can the big-step semantics of transactions be explained with the small-step framework? Can the same hardware in a single system be used for all of coarse-grain SC, TLS, and TM? ...
  • Slide 22
  • Thank you!
  • Slide 23
  • Slide 24
  • Extra Slides
  • Slide 25
  • [Figure: Thread A and Thread B executing x 1, Fence, y 2, y 3, y 3, Fence, x 4, x ?] Local ordering constraints.
  • Slide 26
  • Same example, annotated with the observation constraint.
  • Slide 27
  • Same example. Question: We need one more edge to capture the ordering. Where should it go?
  • Slide 28
  • Moral: When a store is observed to have been overwritten, the stores must be ordered. [Figure: Thread A and Thread B, store atomicity constraint]
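
The moral can be checked by brute force on a small litmus test. The program below is hypothetical: it is one possible two-thread program in the spirit of the example, not a reconstruction of the original figure. Fences are omitted because the sketch enumerates only sequentially consistent interleavings, and the names `THREAD_A`, `THREAD_B`, `r1`, and `r2` are illustrative.

```python
from itertools import combinations

# Each operation is (kind, address, argument):
#   ("S", addr, value)    stores value to addr
#   ("L", addr, register) loads addr into the named register
THREAD_A = [("S", "x", 1), ("S", "y", 2), ("L", "y", "r1")]
THREAD_B = [("S", "y", 3), ("S", "x", 4), ("L", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's program order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        slots = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]


def run(schedule):
    """Execute one interleaving against a single atomic memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for kind, addr, arg in schedule:
        if kind == "S":
            mem[addr] = arg
        else:
            regs[arg] = mem[addr]
    return regs["r1"], regs["r2"]


def sc_outcomes():
    """Set of (r1, r2) results over all sequentially consistent executions."""
    return {run(s) for s in interleavings(THREAD_A, THREAD_B)}


outcomes = sc_outcomes()
```

In this program, `r1 == 3` means Thread A observed that its own store of 2 to y was overwritten by Thread B's store of 3, so the two stores to y are ordered. That ordering forces Thread A's store to x before Thread B's store to x, which is why the pair `(3, 1)` never appears in `outcomes`.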