22
Copyright © 2006, CS 612 Transactional Memory Architectural Support for a Lock-Free Data Structure Some material borrowed from : Some material borrowed from : Konrad Lai, Microprocessor Technology Konrad Lai, Microprocessor Technology Labs, Intel Labs, Intel Intel Multicore University Research Intel Multicore University Research Conference Conference Dec 8, 2005 Dec 8, 2005 By: Major Bhadauria By: Major Bhadauria

Transactional Memory Architectural Support for a Lock-Free Data Structure

  • Upload
    verity

  • View
    33

  • Download
    0

Embed Size (px)

DESCRIPTION

Transactional Memory Architectural Support for a Lock-Free Data Structure. By: Major Bhadauria. Some material borrowed from : Konrad Lai, Microprocessor Technology Labs, Intel Intel Multicore University Research Conference Dec 8, 2005. Motivation. - PowerPoint PPT Presentation

Citation preview

Page 1: Transactional Memory Architectural Support for a Lock-Free Data Structure

Copyright © 2006, CS 612

Transactional MemoryArchitectural Support for a Lock-Free Data Structure

Transactional MemoryArchitectural Support for a Lock-Free Data Structure

Some material borrowed from :Some material borrowed from :

Konrad Lai, Microprocessor Technology Labs, IntelKonrad Lai, Microprocessor Technology Labs, Intel

Intel Multicore University Research ConferenceIntel Multicore University Research Conference

Dec 8, 2005Dec 8, 2005

By: Major BhadauriaBy: Major Bhadauria

Page 2: Transactional Memory Architectural Support for a Lock-Free Data Structure

2Copyright © 2006, CS 612

MotivationMotivation

Multiple cores face a serious programmability problem Multiple cores face a serious programmability problem Writing correct parallel programs is very difficultWriting correct parallel programs is very difficult

Transactional MemoryTransactional Memory addresses addresses key part of the problem key part of the problem Makes parallel programming easier by simplifying coordinationMakes parallel programming easier by simplifying coordination Requires hardware support for performanceRequires hardware support for performance Implements Implements lock-freelock-free data structures data structures

Page 3: Transactional Memory Architectural Support for a Lock-Free Data Structure

3Copyright © 2006, CS 612

Benefits of Going Lock-FreeBenefits of Going Lock-Free

Priority inversion: lower-priority process is preempted while Priority inversion: lower-priority process is preempted while holding lock needed by a higher-priority processholding lock needed by a higher-priority process

Convoying: A process holding a lock is de-scheduled, Convoying: A process holding a lock is de-scheduled, possibly stopping others programs from progressing possibly stopping others programs from progressing needlessly.needlessly.

Deadlock: Common problem of process A and B both Deadlock: Common problem of process A and B both needing to lock C and D. needing to lock C and D.

Process B has lock for D.Process B has lock for D.

Process A has lock for CProcess A has lock for C

DEADLOCKDEADLOCK

ALLOWS US TO AVOID:ALLOWS US TO AVOID:

Page 4: Transactional Memory Architectural Support for a Lock-Free Data Structure

4Copyright © 2006, CS 612

Problem: Lock-Based SynchronizationProblem: Lock-Based Synchronization

Software engineering problemsSoftware engineering problems Lock-based programs do not composeLock-based programs do not compose Performance and correctness tightly coupledPerformance and correctness tightly coupled Timing dependent errors are difficult to find and debugTiming dependent errors are difficult to find and debug

Performance problemsPerformance problems High performance requires finer grain lockingHigh performance requires finer grain locking More and more locks add more overheadMore and more locks add more overhead

Need a better concurrency model for multi-core softwareNeed a better concurrency model for multi-core software

Lock-based synchronization of shared data access Lock-based synchronization of shared data access Is fundamentally problematicIs fundamentally problematic

Page 5: Transactional Memory Architectural Support for a Lock-Free Data Structure

5Copyright © 2006, CS 612

What is Transactional Memory (TM)?What is Transactional Memory (TM)?

Basic mechanisms:Basic mechanisms: Isolation: Track read and writes, detect when conflicts occurIsolation: Track read and writes, detect when conflicts occur Version management: Record new/old valuesVersion management: Record new/old values Atomicity: Commit new values or abort back to old valuesAtomicity: Commit new values or abort back to old values

Transactional Memory (TM) allows arbitrary multiple memory locations to Transactional Memory (TM) allows arbitrary multiple memory locations to be updated atomically and serially.be updated atomically and serially.

begin_xactionbegin_xaction

A = A – 20A = A – 20

B = B + 20B = B + 20

A = A – BA = A – B

C = C + 20C = C + 20

end_xactionend_xaction

Thread 1Thread 1begin_xactionbegin_xaction

C = C - 30C = C - 30

A = A + 30A = A + 30

end_xactionend_xaction

Thread 2Thread 2Thread 1’s accesses Thread 1’s accesses and updates to A, B, C and updates to A, B, C are atomicare atomic

Thread 2 sees either Thread 2 sees either “all” or “none” of “all” or “none” of Thread 1’s updatesThread 1’s updates

Page 6: Transactional Memory Architectural Support for a Lock-Free Data Structure

6Copyright © 2006, CS 612

Transactional Memory benefitsTransactional Memory benefits

Focus: Multithreaded programmability crisisFocus: Multithreaded programmability crisis Programmability & performanceProgrammability & performance

Allows conservative synchronizationAllows conservative synchronization

Programmer focuses on parallelism & correctness, HW extracts Programmer focuses on parallelism & correctness, HW extracts performanceperformance

Software engineering and composabilitySoftware engineering and composabilityAllows library re-use and composition (locks make this very difficult)Allows library re-use and composition (locks make this very difficult)

Critical for wide multi-core demandCritical for wide multi-core demand

Makes high performance MT programming easierMakes high performance MT programming easier Captures a fundamental, well-known, intuitive “atomic” constructCaptures a fundamental, well-known, intuitive “atomic” construct

Been around for decadesBeen around for decades

Similar to a “critical section” but without its problemsSimilar to a “critical section” but without its problemsNo deadlocks, priority inversion, data races, unnecessary serializationNo deadlocks, priority inversion, data races, unnecessary serialization

Page 7: Transactional Memory Architectural Support for a Lock-Free Data Structure

7Copyright © 2006, CS 612

Software Transactional MemorySoftware Transactional Memory

Software Transactional Memory (1995 until now)Software Transactional Memory (1995 until now) Significant work from Sun, Brown, Cambridge, MicrosoftSignificant work from Sun, Brown, Cambridge, Microsoft

Serious performance limitationsSerious performance limitationsDegrades “common” case of no conflicts/contentionDegrades “common” case of no conflicts/contention

>90% of transactions are no conflicts>90% of transactions are no conflicts

– 90% of critical sections are uncontended: what if all these slowed down by 5X?

Serious deployability limitationsSerious deployability limitationsRelies on special runtime supportRelies on special runtime support

Invasive to applications and librariesInvasive to applications and libraries

Is STM is too slow and too invasive to deploy?Is STM is too slow and too invasive to deploy?Could there be better implementation?Could there be better implementation?

But great to understand complex usage models of the futureBut great to understand complex usage models of the future

Page 8: Transactional Memory Architectural Support for a Lock-Free Data Structure

8Copyright © 2006, CS 612

Herlihy & Moss’ ImplementationHerlihy & Moss’ Implementation

Minimum implementation builds on previous LL/SC structure:Minimum implementation builds on previous LL/SC structure: Loading a shared value to readLoading a shared value to read

Writing to a shared valueWriting to a shared value

Commit/Abort Functions to commit or flush new valuesCommit/Abort Functions to commit or flush new values

Extensions for performance:Extensions for performance: Store copy of old value in cache – reduce bus traffic, latencyStore copy of old value in cache – reduce bus traffic, latency

Store committed data in cache – reduce bus trafficStore committed data in cache – reduce bus traffic

Have a read value that you can later change. – reduce bus trafficHave a read value that you can later change. – reduce bus traffic

Have Validate function to check current status – reduce number of Have Validate function to check current status – reduce number of orphan transactionsorphan transactions

Busy bus signal for transactions - reduce number of abortsBusy bus signal for transactions - reduce number of aborts

Page 9: Transactional Memory Architectural Support for a Lock-Free Data Structure

9Copyright © 2006, CS 612

Herlihy & Moss’ ImplementationHerlihy & Moss’ Implementation

Transactional Memory primitives:Transactional Memory primitives: Load-transactional (LT): reads value of a shared memory location into Load-transactional (LT): reads value of a shared memory location into

a private register.a private register.

Load-transactional-exclusive (LTX): similar to LT, but indicates that Load-transactional-exclusive (LTX): similar to LT, but indicates that likely to be updated.likely to be updated.

Store-transactional (ST): tentatively writes a value to a shared Store-transactional (ST): tentatively writes a value to a shared memory location, but not visible until a successful committal.memory location, but not visible until a successful committal.

TM instructions- commit, abort, validate.TM instructions- commit, abort, validate. Commit makes tentative changes permanent if the transaction’s read Commit makes tentative changes permanent if the transaction’s read

set (locations referenced by LT) has not changed, and no other set (locations referenced by LT) has not changed, and no other process had read any location within the write set (locations process had read any location within the write set (locations referenced by LTX and ST)referenced by LTX and ST)

Abort discards updates to write set.Abort discards updates to write set.

Validate –tests the current transaction status, T-continue, F-AbortValidate –tests the current transaction status, T-continue, F-Abort

Page 10: Transactional Memory Architectural Support for a Lock-Free Data Structure

10Copyright © 2006, CS 612

Example Example

Page 11: Transactional Memory Architectural Support for a Lock-Free Data Structure

11Copyright © 2006, CS 612

Herlihy & Moss’ ImplementationHerlihy & Moss’ Implementation

Hardware Modifications (somewhat)Hardware Modifications (somewhat)

OLD

NEW

Page 12: Transactional Memory Architectural Support for a Lock-Free Data Structure

12Copyright © 2006, CS 612

Hardware Support for PerformanceHardware Support for Performance

A.sum = 100

B.sum = 200

Core 1Core 1begin_xaction

A.withdraw(20)

B.deposit(20)

end_xaction

Architectural Memory stateArchitectural Memory state

A.sum = 100B.sum = 200

1

1

A.sum = 80

B.sum = 220

A.sum = 80

B.sum = 220

1.1. Record recovery stateRecord recovery state2.2. Buffer updates/track accessesBuffer updates/track accesses3.3. Commit if no external access (discard Commit if no external access (discard

all updates if conflict)all updates if conflict)

1

Core 2Core 2

1

begin_xaction

Sum =

A.sum + B.sum

end_xaction

Coherence protocol Coherence protocol for conflictsfor conflicts

Core 2 sees $300 – never $280 or $320Core 2 sees $300 – never $280 or $320

A.sum = 100

B.sum = 200

1

1

Abort & restart

Abort & restart

Page 13: Transactional Memory Architectural Support for a Lock-Free Data Structure

13Copyright © 2006, CS 612

Herlihy & Moss’ BenchmarksHerlihy & Moss’ Benchmarks

Compared against:Compared against: Test-and-test-and-set (TTS) w/ exponential backoffTest-and-test-and-set (TTS) w/ exponential backoff

Software queuing (QOSB mechanism)Software queuing (QOSB mechanism)

LOAD_LINKED/STORE_COND (LL/SC)LOAD_LINKED/STORE_COND (LL/SC)

Page 14: Transactional Memory Architectural Support for a Lock-Free Data Structure

14Copyright © 2006, CS 612

Herlihy & Moss’ Test 1 ResultsHerlihy & Moss’ Test 1 Results

Page 15: Transactional Memory Architectural Support for a Lock-Free Data Structure

15Copyright © 2006, CS 612

Herlihy & Moss’ ImplementationHerlihy & Moss’ Implementation

LimitationsLimitations Transaction size implementation-dependentTransaction size implementation-dependent

Must finish in a single scheduling quantumMust finish in a single scheduling quantum

Locations accessed cannot exceed architecturally specified limitLocations accessed cannot exceed architecturally specified limit

Short duration, small data setsShort duration, small data sets

Requires cache coherence model that’s sequentially consistent Requires cache coherence model that’s sequentially consistent (otherwise may need fences)(otherwise may need fences)

Nested transactions problematicNested transactions problematic

Hence Programmer Must Be Aware of HardwareHence Programmer Must Be Aware of Hardware

Page 16: Transactional Memory Architectural Support for a Lock-Free Data Structure

16Copyright © 2006, CS 612

What if HW is not sufficient?What if HW is not sufficient?

This is the deployability challengeThis is the deployability challenge The missing piece of the puzzle for all prior work…The missing piece of the puzzle for all prior work…

Resource limitations are fundamentalResource limitations are fundamental Space: cachesSpace: caches

More HW delays the inevitable: will always be an n+1 caseMore HW delays the inevitable: will always be an n+1 case

Time: scheduling quantaTime: scheduling quantaProgrammers have no control over timeProgrammers have no control over time

Affects functionality, not just performanceAffects functionality, not just performanceSome transactions may never completeSome transactions may never complete

Making HW limit explicit is difficultMaking HW limit explicit is difficult Limited usage onlyLimited usage only

Unreasonable for high level languagesUnreasonable for high level languages

How do you architect it in an evolvable manner?How do you architect it in an evolvable manner?

Page 17: Transactional Memory Architectural Support for a Lock-Free Data Structure

17Copyright © 2006, CS 612

Virtualizing Transactional MemoryVirtualizing Transactional Memory

Hides hardware limitations like virtual memory hides physical Hides hardware limitations like virtual memory hides physical memory limitationsmemory limitations

2 modes2 modes 1 for common case which is built into HW1 for common case which is built into HW

22ndnd for buffer overflow, page faults, context switches or thread for buffer overflow, page faults, context switches or thread migration which is built using SW and HW data structuresmigration which is built using SW and HW data structures

Virtualization allows transactions to: be suspended, migrate, Virtualization allows transactions to: be suspended, migrate, or overflow state from local buffers, and allows nested or overflow state from local buffers, and allows nested transactionstransactions

Extra challenge: Unlike normal TM, can’t only use cache Extra challenge: Unlike normal TM, can’t only use cache coherence mechanisms, since need to detect conflicts b/w coherence mechanisms, since need to detect conflicts b/w active transactions and transactions whose state partially or active transactions and transactions whose state partially or completely overflowed to virtual memorycompletely overflowed to virtual memory

Page 18: Transactional Memory Architectural Support for a Lock-Free Data Structure

18Copyright © 2006, CS 612

Virtualize for CompletenessVirtualize for Completeness

Timer interrupts,Timer interrupts,Context switches,Context switches,

Exceptions,…Exceptions,…

Limited buffersLimited buffers

Core 1Core 1

111

Virtual TM

virtual address spacevirtual address space

Log/buffer space

Overflow management Overflow management Using virtual memoryUsing virtual memory

Software libs. and microcodeSoftware libs. and microcode

Out-of-band concurrency controlOut-of-band concurrency control

Programmer transparentProgrammer transparent

Performance isolationPerformance isolation

Suspendable/swappableSuspendable/swappable

Page 19: Transactional Memory Architectural Support for a Lock-Free Data Structure

19Copyright © 2006, CS 612

Recent TM ResearchRecent TM Research

Recently, focus on solving the harder problem of TMRecently, focus on solving the harder problem of TM Making the model immune to cache buffer size limitations, Making the model immune to cache buffer size limitations,

scheduling limitations, etc.scheduling limitations, etc.

TCC (Stanford) (2004)TCC (Stanford) (2004) Same limitation as Herlihy/Moss for TM (size limited to local caches)Same limitation as Herlihy/Moss for TM (size limited to local caches)

LTM (MIT), VTM (Intel), LogTM (Wisconsin) (2005)LTM (MIT), VTM (Intel), LogTM (Wisconsin) (2005) Assume hardware TM supportAssume hardware TM support

Add support to allow transactions to be immune to resource Add support to allow transactions to be immune to resource limitationslimitations

Goals of each similar, approaches very differentGoals of each similar, approaches very differentLTM: only resource overflowLTM: only resource overflow

VTM: complete virtualizationVTM: complete virtualization

LogTM: only resource overflow, AND optimize for COMMITsLogTM: only resource overflow, AND optimize for COMMITs

Page 20: Transactional Memory Architectural Support for a Lock-Free Data Structure

20Copyright © 2006, CS 612

Some Research ChallengesSome Research Challenges

Large transactionsLarge transactions

Language extensionsLanguage extensions

IO, loophole, escape hatches, …IO, loophole, escape hatches, …

Interaction and co-existence withInteraction and co-existence with Other synchronization schemes: locks, flags, …Other synchronization schemes: locks, flags, …

Other transactionsOther transactions Database transactionDatabase transaction

System transaction (Microsoft)System transaction (Microsoft)

Other libraries, system software, operating system, …Other libraries, system software, operating system, …

Performance monitor, tuning, debugging, …Performance monitor, tuning, debugging, …

Open vs Closed NestingOpen vs Closed Nesting

Interaction between transaction & non-transactionInteraction between transaction & non-transaction

Usage & WorkloadUsage & Workload PLDI workshopPLDI workshop

Page 21: Transactional Memory Architectural Support for a Lock-Free Data Structure

21Copyright © 2006, CS 612

TM – First DecadeTM – First Decade

IBM 801 Database Storage (1980s)IBM 801 Database Storage (1980s) Lock attribute bits on Lock attribute bits on virtual memoryvirtual memory (via TLBs, PTEs) at 128 byte (via TLBs, PTEs) at 128 byte

granularitygranularity

Load-Linked/Store Conditional (Jensen et al. 1987)Load-Linked/Store Conditional (Jensen et al. 1987) Optimistic update of single cache line (Alpha, MIPS, PowerPC)Optimistic update of single cache line (Alpha, MIPS, PowerPC)

Transactional Memory (Herlihy&Moss 1993)Transactional Memory (Herlihy&Moss 1993) Coined term; TM generalization of LL/SCCoined term; TM generalization of LL/SC Instructions explicitly identify transactional loads and storesInstructions explicitly identify transactional loads and stores Used dedicated transaction cacheUsed dedicated transaction cache

Size of transactions limited to transaction cacheSize of transactions limited to transaction cache

Oklahoma Update (Stone et al./IBM 1993)Oklahoma Update (Stone et al./IBM 1993) Similar to TM, concurrently proposedSimilar to TM, concurrently proposed Didn’t use cache but Didn’t use cache but dedicated monitored registersdedicated monitored registers to operate upon to operate upon

Page 22: Transactional Memory Architectural Support for a Lock-Free Data Structure

22Copyright © 2006, CS 612

ReferencesReferences

1.1. ““Transactional Memory: Architectural Support for Lock-Free Transactional Memory: Architectural Support for Lock-Free Data Structures”, Moss et. al, ISCA 1993Data Structures”, Moss et. al, ISCA 1993

2.2. ““Virtualizing Transactional Memory” K. Lai et. al, ISCA 2005Virtualizing Transactional Memory” K. Lai et. al, ISCA 2005

3.3. ““LogTM: Log-based Transactional Memory”, Wood et. al, LogTM: Log-based Transactional Memory”, Wood et. al, HPCA-12HPCA-12