1 Hardware Transactional Memory Royi Maimon Merav Havuv 27/5/2007

Hardware Transactional Memory

Royi Maimon

Merav Havuv

27/5/2007

References

M. Herlihy and J. Moss, “Transactional Memory: Architectural Support for Lock-Free Data Structures”

C. Scott Ananian, Krste Asanovic, Bradley C. Kuszmaul, Charles E. Leiserson, Sean Lie, “Unbounded Transactional Memory”

Hammond, Wong, Chen, Carlstrom, Davis, “Transactional Memory Coherence and Consistency” (Jun 2004)

Today

What are transactions?

What is Hardware Transactional Memory?

Various implementations of HTM

Outline

Lock-Free

Hardware Transactional Memory (HTM)

– Transactions

– Cache coherence protocol

– General Implementation

– Simulation

UTM

LTM

TCC (briefly)

Conclusions

Lock-free

A shared data structure is lock-free if its operations do not require mutual exclusion.

If one process is interrupted in the middle of an operation, other processes will not be prevented from operating on that object.
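As a point of comparison with the definition above, here is a minimal sketch of a lock-free operation on conventional hardware, written with C11 atomics. The `lf_counter` type and function name are hypothetical and not taken from the slides; the point is only that a thread stalled between the load and the compare-and-swap holds no lock, so it cannot block anyone else.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared counter; name and layout are illustrative only. */
typedef struct {
    atomic_int value;
} lf_counter;

/* Lock-free increment: retry a compare-and-swap until it succeeds.
 * No mutual exclusion is involved, so an interrupted thread never
 * prevents other threads from completing their own increments. */
int lf_counter_increment(lf_counter *c) {
    assert(c != NULL);
    int old = atomic_load(&c->value);
    /* On failure the call stores the current value into `old`. */
    while (!atomic_compare_exchange_weak(&c->value, &old, old + 1)) {
        /* another thread changed the counter first; retry */
    }
    return old + 1;
}
```

Calling `lf_counter_increment` from several threads needs no lock around it; each call returns the value it installed.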

Lock-free (cont)

Lock-free data structures avoid common problems associated with conventional locking techniques in highly concurrent systems:

– Priority inversion

– Convoying: occurs when a process holding a lock is descheduled, so that other processes capable of running may be unable to progress

– Deadlock

Priority inversion

Priority inversion occurs when a lower-priority process is preempted while holding a lock needed by higher-priority processes.

Deadlock

Deadlock: two or more processes are waiting indefinitely for an event that can be caused only by one of the waiting processes.

Let S and Q be two resources:

P0          P1
Lock(S)     Lock(Q)
Lock(Q)     Lock(S)

What is a transaction?

A transaction is a sequence of memory loads and stores executed by a single process that either commits or aborts

If a transaction commits, all the loads and stores appear to have executed atomically

If a transaction aborts, none of its stores take effect

A transaction's operations aren't visible to other processes until it commits

Transaction properties:

A transaction satisfies the following properties:

– Serializability

– Atomicity

This is a simplified version of the traditional database ACID properties (Atomicity, Consistency, Isolation, and Durability)

Transactional Memory

A new multiprocessor architecture

The goal: implementing lock-free synchronization that is

– efficient

– easy to use

compared to conventional techniques based on mutual exclusion

Implemented by straightforward extensions to multiprocessor cache-coherence protocols.

An Example

Locks:

    if (i < j) {
        a = i; b = j;
    } else {
        a = j; b = i;
    }
    Lock(L[a]);
    Lock(L[b]);
    Flow[i] = Flow[i] - X;
    Flow[j] = Flow[j] + X;
    Unlock(L[b]);
    Unlock(L[a]);

Transactional Memory:

    StartTransaction;
    Flow[i] = Flow[i] - X;
    Flow[j] = Flow[j] + X;
    EndTransaction;

Transactional Memory

Transactions execute in commit order

[Figure: timeline of three transactions. Transaction A (ld 0xdddd ... st 0xbeef) commits. Transaction B (ld 0xdddd ... ld 0xbbbb) commits. Transaction C has loaded 0xbeef, so A's commit of 0xbeef causes a violation and C re-executes with the new data.]

Cache-Coherence Protocol

A protocol for managing the caches of a multiprocessor system:

– No data is lost

– No data is overwritten before it is transferred from a cache to the target memory

When multiprocessing, each processor may have its own memory cache that is separate from the shared memory

The Problem (Cache-Coherence)

The problem can be solved in either of two ways:

– directory-based

– snooping system

Snoopy Cache

All caches watch (snoop) the activity on a global bus to determine whether they have a copy of the block of data that is requested on the bus.

Directory-based

The data being shared is placed in a common directory that maintains the coherence between caches.

The directory acts as a filter through which the processor must ask permission to load an entry from the primary memory to its cache.

When an entry is changed the directory either updates or invalidates the other caches with that entry.

How Does It Work?

The following primitive instructions for accessing memory are provided:

Load-transactional (LT): reads the value of a shared memory location into a private register.

Load-transactional-exclusive (LTX): like LT, but “hinting” that the location is likely to be modified.

Store-transactional (ST): tentatively writes a value from a private register to a shared memory location.

Commit (COMMIT): attempts to make the transaction's tentative changes permanent.

Abort (ABORT): discards the transaction's tentative changes.

Validate (VALIDATE): tests the current transaction status.

Some definitions

Read set: the set of locations a transaction reads with LT

Write set: the set of locations a transaction accesses with LTX or ST

Data set (footprint): the union of the read and write sets.

A set of values in memory is inconsistent if it couldn’t have been produced by any serial execution of transactions

Intended Use

Instead of acquiring a lock, executing the critical section, and releasing the lock, a process would:

1. use LT or LTX to read from a set of locations

2. use VALIDATE to check that the values read are consistent

3. use ST to modify a set of locations

4. use COMMIT to make the changes permanent

If either the VALIDATE or the COMMIT fails, the process returns to Step (1).
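The four steps and the retry rule above can be sketched in plain C. This is a single-threaded software stand-in, not the hardware mechanism: `LT`, `ST`, `VALIDATE` and `COMMIT` take their names from the slides, but their bodies, the `shared_mem` array, the `inject_conflict` hook and the `transfer` function are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MEM_WORDS 8

int shared_mem[MEM_WORDS];         /* stands in for shared memory */
static int  tentative[MEM_WORDS];  /* values buffered by ST */
static bool written[MEM_WORDS];    /* which words have a buffered ST */
static bool tstatus = true;        /* false once the transaction is doomed */

/* Hypothetical hook: models a conflicting access by another processor. */
void inject_conflict(void) { tstatus = false; }

static void reset_transaction(void) {
    for (int i = 0; i < MEM_WORDS; i++) written[i] = false;
    tstatus = true;
}

int LT(int addr) {
    assert(0 <= addr && addr < MEM_WORDS);
    return written[addr] ? tentative[addr] : shared_mem[addr];
}

void ST(int addr, int value) {
    assert(0 <= addr && addr < MEM_WORDS);
    tentative[addr] = value;       /* tentative: not yet in shared_mem */
    written[addr] = true;
}

bool VALIDATE(void) {
    if (tstatus) return true;
    reset_transaction();           /* doomed: drop tentative writes */
    return false;
}

bool COMMIT(void) {
    bool ok = tstatus;
    for (int i = 0; ok && i < MEM_WORDS; i++)
        if (written[i]) shared_mem[i] = tentative[i];
    reset_transaction();
    return ok;
}

/* Moves `amount` between two words; returns the number of attempts. */
int transfer(int from, int to, int amount) {
    for (int attempts = 1; ; attempts++) {
        int a = LT(from);               /* step 1: read */
        int b = LT(to);
        if (!VALIDATE()) continue;      /* step 2: back to step 1 on failure */
        ST(from, a - amount);           /* step 3: tentative writes */
        ST(to, b + amount);
        if (COMMIT()) return attempts;  /* step 4: make permanent */
    }
}
```

With no conflict, `transfer` succeeds on the first attempt; after `inject_conflict()` the first VALIDATE fails and the loop runs a second time.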

Implementation

Transactional memory is implemented by modifying standard multiprocessor cache coherence protocols

We describe here how to extend a “snoopy” cache protocol for a shared bus to support transactional memory

Our transactions are short-lived activities with relatively small data sets.

The basic idea

Any protocol capable of detecting accessibility conflicts can also detect transaction conflict at no extra cost

Once a transaction conflict is detected, it can be resolved in a variety of ways

Implementation

Each processor maintains two caches:

– Regular cache for non-transactional operations

– Transactional cache for transactional operations

The transactional cache holds all the tentative writes, without propagating them to other processors or to main memory (until commit)

Why use two caches?

Cache line states

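Each cache line (regular or transactional) has one of the following states:

[Table: regular cache line states]

The transactional cache extends these states with the following tags:

– EMPTY: contains no data

– NORMAL: contains committed data

– XCOMMIT: discard on commit

– XABORT: discard on abort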

Cleanup

When the transactional cache needs space for a new entry, it searches for:

– an EMPTY entry

– if none is found, a NORMAL entry

– and finally an XCOMMIT entry
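The search order above can be sketched as a small victim-selection routine. The tag names come from the slides; the `find_replacement` function and the array representation are assumptions, and writing back a dirty victim before reuse is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EMPTY, NORMAL, XCOMMIT, XABORT } tx_tag;

/* Returns the index of the entry to reuse, or -1 when every entry is
 * XABORT. Priority follows the slide: EMPTY, then NORMAL, then XCOMMIT. */
int find_replacement(const tx_tag *tags, size_t n) {
    static const tx_tag priority[] = { EMPTY, NORMAL, XCOMMIT };
    assert(tags != NULL);
    for (size_t p = 0; p < sizeof priority / sizeof priority[0]; p++)
        for (size_t i = 0; i < n; i++)
            if (tags[i] == priority[p])
                return (int)i;
    return -1;
}
```

A return value of -1 models the case where the transactional cache has no entry it can safely reuse.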

Processor actions

Each processor maintains two flags:

– The transaction active (TACTIVE) flag: indicates whether a transaction is in progress

– The transaction status (TSTATUS) flag: indicates whether that transaction is active (True) or aborted (False)

Non-transactional operations behave exactly as in original cache-coherence protocol

Example – LT operation:

1. Look for an XABORT entry. Found: return its value.

2. Not found: look for a NORMAL entry. Found: change it to XABORT and allocate another XCOMMIT entry.

3. Not found (cache miss): ask to read this block from the shared memory.

– Successful read: create two entries, XABORT and XCOMMIT.

– Unsuccessful read: abort the transaction: set TSTATUS=FALSE, drop the XABORT entries, and set all XCOMMIT entries to NORMAL.
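The LT flow on this slide can be sketched over a tiny fully associative transactional cache. The tag names and TSTATUS come from the slides; the `tcache` array, the `bus_busy` flag that forces a failed bus read, and the helper names are assumptions made for illustration, and the sketch assumes the cache never fills up.

```c
#include <assert.h>
#include <stdbool.h>

#define TC_ENTRIES 8
#define MEM_WORDS  16

typedef enum { EMPTY, NORMAL, XCOMMIT, XABORT } tx_tag;
typedef struct { tx_tag tag; int addr; int value; } tc_entry;

static tc_entry tcache[TC_ENTRIES]; /* zero-initialised: all EMPTY */
int  shared_mem[MEM_WORDS];
bool bus_busy = false;  /* hypothetical: makes the bus read fail */
bool tstatus  = true;

static int find(tx_tag tag, int addr) {
    for (int i = 0; i < TC_ENTRIES; i++)
        if (tcache[i].tag == tag && (tag == EMPTY || tcache[i].addr == addr))
            return i;
    return -1;
}

static void alloc(tx_tag tag, int addr, int value) {
    int i = find(EMPTY, 0);
    assert(i >= 0);     /* sketch assumes a free entry exists */
    tcache[i] = (tc_entry){ tag, addr, value };
}

int count_tag(tx_tag tag) {
    int n = 0;
    for (int i = 0; i < TC_ENTRIES; i++) n += (tcache[i].tag == tag);
    return n;
}

static void abort_transaction(void) {
    tstatus = false;
    for (int i = 0; i < TC_ENTRIES; i++) {
        if (tcache[i].tag == XABORT)       tcache[i].tag = EMPTY;  /* drop */
        else if (tcache[i].tag == XCOMMIT) tcache[i].tag = NORMAL;
    }
}

bool LT(int addr, int *out) {
    assert(0 <= addr && addr < MEM_WORDS);
    int i = find(XABORT, addr);          /* 1. XABORT entry? */
    if (i >= 0) { *out = tcache[i].value; return true; }

    i = find(NORMAL, addr);              /* 2. NORMAL entry? */
    if (i >= 0) {
        tcache[i].tag = XABORT;
        alloc(XCOMMIT, addr, tcache[i].value);
        *out = tcache[i].value;
        return true;
    }

    if (!bus_busy) {                     /* 3. miss: read shared memory */
        alloc(XABORT, addr, shared_mem[addr]);
        alloc(XCOMMIT, addr, shared_mem[addr]);
        *out = shared_mem[addr];
        return true;
    }
    abort_transaction();                 /* unsuccessful read */
    return false;
}
```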

Snoopy cache actions:

Both the regular cache and the transactional cache snoop on the bus.

A cache ignores any bus cycles for lines not in that cache.

The transactional cache’s behavior:

– If TSTATUS=False, or if the operation isn’t transactional, the cache acts just like the regular cache, but ignores entries with a state other than NORMAL

– On an LT from another CPU, if the state is VALID, the cache returns the value; for all other transactional operations it returns BUSY

Simulation

We’ll see example code for the producer/consumer algorithm using the transactional memory architecture.

The simulation runs on both cache coherence protocols: snoopy and directory-based.

The simulation uses 32 processors

The simulation finishes when 2^16 operations have completed.

Part Of Producer/Consumer Code

typedef struct {
    Word deqs;               // Holds the head's index
    Word enqs;               // Holds the tail's index
    Word items[QUEUE_SIZE];
} queue;

unsigned queue_deq(queue *q) {
    unsigned head, tail, result;
    unsigned backoff = BACKOFF_MIN;
    unsigned wait;
    while (1) {
        result = QUEUE_EMPTY;
        tail = LTX(&q->enqs);
        head = LTX(&q->deqs);
        if (head != tail) {          /* queue not empty? */
            result = LT(&q->items[head % QUEUE_SIZE]);
            ST(&q->deqs, head + 1);  /* advance counter */
        }
        if (COMMIT()) break;
        /* abort => backoff */
        wait = random() % (1 << backoff);
        while (wait--);
        if (backoff < BACKOFF_MAX) backoff++;
    }
    return result;
}

The results:

So Far:

In both HTM and STM, transactions shouldn’t touch many memory locations

There is a (small) bound on a transaction's footprint

In addition, there is a duration limit.

Unbounded Transactional Memory (UTM)

UTM – new thesis: supports transactions of arbitrary footprint and duration.

The UTM architecture allows:

– transactions as large as virtual memory

– transactions of unlimited duration

– transactions which can migrate between processors

UTM supports a semantics for nested transactions

In contrast to previous HTM implementations, UTM is optimized for transactions below a certain size but still operates correctly for larger transactions

The Goal of UTM

The primary goal:

– make concurrent programming easier

– reduce implementation overhead

Why do we want unbounded TM?

Neither programmers nor compilers can easily cope with an imposed hard limit on transaction size.

UTM architecture

The transaction log: a data structure that maintains bookkeeping information for a transaction

Why is it needed?

– Enables transactions to survive time slice interrupts

– Enables process migration from one processor to another

Two new instructions

All the programmer must specify is where a transaction begins and ends

XBEGIN pc

– Begin a new transaction. The entry point to an abort handler is specified by pc.

– If the transaction must fail, roll back processor and memory state to what it was when XBEGIN was executed, and jump to pc.

– We can think of an XBEGIN instruction as a conditional branch to the abort handler.

XEND

– End the current transaction. If XEND completes, the transaction is committed and appears atomic.

– Nested transactions are subsumed into the outer transaction.
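The "conditional branch to the abort handler" view of XBEGIN has a loose software analogy in C's `setjmp`/`longjmp`, sketched below. This is only an analogy under stated assumptions: the `mem` array, the `checkpoint` copy and the `fail_first_n` parameter are hypothetical, and real UTM hardware restores register state through rename-table snapshots rather than copying memory.

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>

#define MEM_WORDS 4

int mem[MEM_WORDS];                /* stands in for memory state */
static int checkpoint[MEM_WORDS];  /* state as of "XBEGIN" */
static jmp_buf abort_handler;      /* plays the role of pc */

/* Roll memory back to the XBEGIN state and jump to the handler. */
static void abort_tx(void) {
    memcpy(mem, checkpoint, sizeof mem);
    longjmp(abort_handler, 1);
}

/* Runs one transaction that is forced to abort `fail_first_n` times.
 * Returns the number of aborts that occurred before it committed. */
int run_transaction(int fail_first_n) {
    volatile int aborts = 0;       /* volatile: survives longjmp */
    assert(fail_first_n >= 0);

    memcpy(checkpoint, mem, sizeof mem);  /* "XBEGIN": take snapshot */
    if (setjmp(abort_handler) != 0) {
        aborts++;                  /* abort handler: simplistic retry */
    }

    mem[0] += 1;                   /* transaction body */
    mem[1] += 1;
    if (aborts < fail_first_n)
        abort_tx();                /* the transaction must fail */

    return aborts;                 /* "XEND": commit */
}
```

However many times the body is retried, its effects appear exactly once, because every abort restores the snapshot before the retry.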

Transaction Semantics

L1: XBEGIN L1
    ADD R1, R1, R1
    ST 1000, R1
    XEND

L2: XBEGIN L2
    ADD R1, R1, R1
    ST 2000, R1
    XEND

Two transactions:

– “A” (the first block) has an abort handler at L1

– “B” (the second block) has an abort handler at L2

Here, a very simplistic retry: each abort handler points back at its own XBEGIN.

Register renaming

A name dependence occurs when two instructions Inst1 and Inst2 use the same register (or memory location), but there is no data transmitted between Inst1 and Inst2.

If the register is renamed so that Inst1 and Inst2 do not conflict, the two instructions can execute simultaneously or be reordered.

This technique, which eliminates name dependences in registers, is called register renaming.

Register renaming can be done statically (by the compiler) or dynamically (by the hardware).

Rolling back processor state

After an XBEGIN instruction, we take a snapshot of the rename table

To keep track of busy registers, we maintain an S (saved) bit for each physical register to indicate which registers are part of the active transaction, and we include the S bits with every rename-table snapshot

An active transaction’s abort handler address, nesting depth, and snapshot are part of its transactional state.

Memory State

UTM represents the set of active transactions with a single data structure held in system memory, the x-state (short for “transaction state”).

X-state Implementation

The x-state contains a transaction log for each active transaction in the system.

Each log consists of:

– A commit record: maintains the transaction’s status (pending, committed, or aborted)

– A vector of log entries: each entry corresponds to a memory block that the transaction has read or written. The entry provides:

   a pointer to the block
   the block’s old value (for rollback)
   a pointer to the commit record
   pointers that form a linked list of all entries in all transaction logs that refer to the same block (the reader list)

X-state Implementation (Cont)

The final part of the x-state consists of, for each memory block:

– a log pointer

– a read-write (RW) bit
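The x-state layout described on these two slides can be written down as C structures. This is a sketch of the shape only: field names, the one-word "block", the fixed `MAX_ENTRIES` bound and the `log_access` helper are assumptions, not the UTM hardware format.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PENDING, COMMITTED, ABORTED } tx_status;
typedef enum { RW_READ, RW_WRITE } rw_bit;

typedef struct { tx_status status; } commit_record;

typedef struct log_entry {
    int *block;                     /* pointer to the block */
    int old_value;                  /* the block's old value, for rollback */
    commit_record *commit;          /* pointer to the commit record */
    struct log_entry *next_reader;  /* reader list: next entry for this block */
} log_entry;

#define MAX_ENTRIES 16              /* assumed bound, for the sketch only */
typedef struct {
    commit_record commit;
    log_entry entries[MAX_ENTRIES];
    size_t n;
} transaction_log;

/* Per-block part of the x-state: a log pointer and an RW bit. */
typedef struct {
    log_entry *log_pointer;
    rw_bit rw;
} block_state;

/* Records that the transaction owning `log` touched `block`. */
log_entry *log_access(transaction_log *log, int *block,
                      block_state *bs, rw_bit rw) {
    assert(log != NULL && log->n < MAX_ENTRIES);
    log_entry *e = &log->entries[log->n++];
    e->block = block;
    e->old_value = *block;
    e->commit = &log->commit;
    e->next_reader = bs->log_pointer;  /* chain onto earlier entries */
    bs->log_pointer = e;
    bs->rw = rw;
    return e;
}
```

Following `bs.log_pointer->commit->status` from a block answers the question "is the transaction that touched this block still pending?".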

X-state Data Structure

[Figure: the x-state and application memory. Transaction logs 1 and 2 are both PENDING. Each log holds a commit record and transaction log entries; each entry has an old value, a block pointer, a reader list, and a commit record pointer. Each block in application memory carries a log pointer and an RW bit (R or W).]

More on x-state

When a processor references a block that is already part of a pending transaction, the system checks the RW bit and log pointer to determine the correct action:

– use the old value

– use the new value

– abort the transaction

Commit action

[Figure: commit action on the x-state. The commit record of transaction log 1 changes from PENDING to COMMITTED; transaction log 2 remains PENDING.]

Cleanup action

[Figure: cleanup action on the x-state after transaction log 1 is COMMITTED; transaction log 2 is still PENDING.]

Abort action

[Figure: abort action on the x-state. The commit record of transaction log 1 changes from PENDING to ABORTED, and the block's old value is restored from the log entry; transaction log 2 remains PENDING.]
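The commit and abort actions can be sketched in a few lines of C. The sketch assumes, as the "old value (for rollback)" field suggests, that new values are written to application memory while the log keeps the old ones. The `tx_write`, `tx_commit` and `tx_abort` names, the one-word blocks and the fixed log size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PENDING, COMMITTED, ABORTED } tx_status;
typedef struct { int *block; int old_value; } log_entry;
typedef struct {
    tx_status status;        /* the commit record */
    log_entry entries[8];    /* assumed fixed-size log */
    size_t n;
} transaction_log;

/* Transactional write: save the old value, then update memory in place. */
void tx_write(transaction_log *log, int *block, int new_value) {
    assert(log->status == PENDING && log->n < 8);
    log->entries[log->n].block = block;
    log->entries[log->n].old_value = *block;
    log->n++;
    *block = new_value;
}

/* Commit: a single status change, then cleanup of the log entries. */
void tx_commit(transaction_log *log) {
    log->status = COMMITTED;
    log->n = 0;
}

/* Abort: mark the log ABORTED and restore old values, newest first. */
void tx_abort(transaction_log *log) {
    log->status = ABORTED;
    while (log->n > 0) {
        log_entry *e = &log->entries[--log->n];
        *e->block = e->old_value;
    }
}
```

Commit is cheap because memory already holds the new values; abort pays for walking the log.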

Transaction Conflicts

A conflict occurs when two or more pending transactions have accessed a block and at least one of the accesses is for writing.

Performing a transaction load:

– check that the log pointer refers to an entry in the current transaction log or that the RW bit is R.

Performing a transaction store:

– check that the log pointer references no other transaction.

In case of a conflict, some of the conflicting transactions are aborted.

– Which transaction should be aborted?
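The load and store checks above can be sketched as two predicates. As a simplification, the log pointer is reduced to a single owning transaction id (`owner`), so the reader list with several readers is not modelled; `block_state`, `NO_OWNER` and the function names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { RW_READ, RW_WRITE } rw_bit;

#define NO_OWNER (-1)

typedef struct {
    int owner;   /* transaction the log pointer refers to, or NO_OWNER */
    rw_bit rw;   /* the block's RW bit */
} block_state;

/* A load is fine if the block belongs to the current transaction's
 * log or has only been read (RW bit is R). */
bool load_conflicts(const block_state *b, int tx) {
    assert(tx != NO_OWNER);
    if (b->owner == NO_OWNER) return false;
    return !(b->owner == tx || b->rw == RW_READ);
}

/* A store is fine only if no other transaction references the block. */
bool store_conflicts(const block_state *b, int tx) {
    assert(tx != NO_OWNER);
    return b->owner != NO_OWNER && b->owner != tx;
}
```

Neither predicate answers the slide's closing question; which of the conflicting transactions to abort is a separate policy decision.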

Caching

For a small transaction that fits in the cache, UTM, like earlier proposed HTM systems, uses the cache coherence protocol to identify conflicts

For transactions too big to fit in the cache, the x-state for the transaction overflows into the ordinary memory hierarchy

– Most log entries don't need to be created

– Only create a transaction log when the transaction runs out of physical memory.

UTM’s Goal

Support transactions that:

– run for an indefinite length of time

– migrate from one processor to another

– have footprints bigger than the physical memory

The main technique we propose is to treat the x-state as a systemwide data structure that uses global virtual addresses

Benefits and Limits of UTM

Limits:

– Complicated implementation

Benefits:

– Unlimited footprint

– Unlimited duration

– Migration possible

– Good performance in the common case (small transactions)

LTM: Visible, Large, Frequent, Scalable

“Large Transactional Memory”

– Not truly unbounded, but simple and cheap

Minimal architectural changes, high performance

– Small modifications to cache and processor core

– No changes to main memory, cache coherence protocol

– Can be pin-compatible with conventional processors

LTM’s Restrictions:

A transaction’s footprint is limited to (nearly) the size of physical memory.

Duration must be less than a time slice

Transactions cannot migrate between processors.

With these restrictions, we can implement LTM by modifying only the cache and processor core

LTM vs UTM

Like UTM, LTM maintains data about pending transactions in the cache and detects conflicts using the cache coherency protocol

Unlike UTM, LTM does not treat the transaction as a data structure. Instead, it binds a transaction to a particular cache.

– Transactional data overflows from the cache into a hash table in main memory

LTM and UTM have similar semantics: XBEGIN and XEND instructions are the same

In LTM, the cache plays a major part…

Addition to Cache

LTM adds a bit (T) per cache line to indicate that the data has been accessed as part of a pending transaction.

An additional bit (O) is added per cache set to indicate that it has overflowed.

Cache overflow mechanism

Running example (the same instruction sequence is shown at every step):

    ST 1000, 55
    XBEGIN L1
    LD R1, 1000
    ST 2000, 66
    ST 3000, 77
    LD R1, 1000
    XEND

The cache set has columns O, T, Tag, Data; the overflow hashtable has columns Key, Data.

1. ST 1000, 55 (before the transaction): the cache holds tag 1000, data 55, with T clear.

2. Recording reads (LD R1, 1000): T is set on line 1000.

3. Recording writes (ST 2000, 66): a second line with tag 2000, data 66 is added with T set.

4. Spilling (ST 3000, 77): the set is full of transactional lines, so 1000/55 moves to the overflow hashtable, the set's O bit is set, and 3000/77 takes its place with T set.

5. Miss handling (LD R1, 1000): the load misses in the cache, finds 1000/55 in the overflow hashtable, and swaps it back into the cache; 3000/77 moves to the overflow hashtable.
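The overflow walk-through can be replayed with a small C model of one two-way cache set. Everything here is a simplification under stated assumptions: all three addresses are taken to map to the same set, the overflow hashtable is a flat array, main-memory fetches and write-backs are omitted, `xend` only clears the T bits, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WAYS      2
#define OVF_SLOTS 8

typedef struct { bool valid, t; int tag, data; } line;
typedef struct { bool o; line way[WAYS]; } cache_set;
typedef struct { bool used; int key, data; } ovf_entry;

cache_set set0;                        /* the set all addresses map to */
static ovf_entry overflow[OVF_SLOTS];  /* overflow table, flat array */
static bool in_tx;

static line *lookup(int addr) {
    for (int w = 0; w < WAYS; w++)
        if (set0.way[w].valid && set0.way[w].tag == addr)
            return &set0.way[w];
    return NULL;
}

bool overflow_has(int key) {
    for (int i = 0; i < OVF_SLOTS; i++)
        if (overflow[i].used && overflow[i].key == key) return true;
    return false;
}

static void spill(const line *victim) {
    for (int i = 0; i < OVF_SLOTS; i++) {
        if (!overflow[i].used) {
            overflow[i] = (ovf_entry){ true, victim->tag, victim->data };
            set0.o = true;             /* mark the set as overflowed */
            return;
        }
    }
    assert(0 && "overflow table full");
}

static void unspill(int key, int *data) {
    for (int i = 0; i < OVF_SLOTS; i++)
        if (overflow[i].used && overflow[i].key == key) {
            *data = overflow[i].data;
            overflow[i].used = false;
            return;
        }
}

static line *install(int addr, int data) {
    line *victim = NULL;
    for (int w = 0; w < WAYS && !victim; w++)  /* prefer an invalid way */
        if (!set0.way[w].valid) victim = &set0.way[w];
    for (int w = 0; w < WAYS && !victim; w++)  /* then a non-T line */
        if (!set0.way[w].t) victim = &set0.way[w];
    if (!victim) {                             /* all transactional */
        victim = &set0.way[0];
        spill(victim);
    }
    *victim = (line){ true, in_tx, addr, data };
    return victim;
}

static line *access_line(int addr) {
    line *l = lookup(addr);
    if (!l) {
        int data = 0;                  /* memory fetch omitted */
        if (set0.o) unspill(addr, &data);
        l = install(addr, data);
    }
    if (in_tx) l->t = true;            /* record the access */
    return l;
}

int  load(int addr)            { return access_line(addr)->data; }
void store(int addr, int data) { access_line(addr)->data = data; }
void xbegin(void)              { in_tx = true; }
void xend(void) {
    in_tx = false;
    for (int w = 0; w < WAYS; w++) set0.way[w].t = false;
}
```

Replaying the slide's sequence against this model spills 1000/55 on `store(3000, 77)` and swaps it back, displacing 3000/77, on the final `load(1000)`.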

LTM - Summary

Transactions as large as physical memory

Scalable overflow and commit

Easy to implement!

Low overhead

Transactional Memory Coherence and Consistency (TCC)

Hammond, Wong, Chen, Carlstrom, Davis (Jun 2004).“Transactional Memory Coherence and Consistency”

All transactions, all the time!

Code is partitioned into transactions by the programmer or by tools

– Possibly at run-time, for legacy code!

All writes are buffered in caches; CPUs arbitrate system-wide for which one gets to commit

Updates are broadcast to all CPUs. CPUs detect conflicts with their own transactions and abort
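The commit-time conflict detection described above can be sketched with bitmask read and write sets. This is a toy model: arbitration is assumed to have already picked the committer, cache lines are numbered 0 to 31, and the `tcc_cpu` structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPU 3

typedef struct {
    uint32_t read_set;   /* bit i: line i was read speculatively */
    uint32_t write_set;  /* bit i: line i has a buffered write */
    bool violated;       /* set when the transaction must re-execute */
} tcc_cpu;

tcc_cpu cpus[NCPU];

void tcc_read(int cpu, int line)  { cpus[cpu].read_set  |= 1u << line; }
void tcc_write(int cpu, int line) { cpus[cpu].write_set |= 1u << line; }

/* `committer` has won arbitration: broadcast its write set and let every
 * other CPU check it against its own read set. Returns the number of
 * transactions that were violated and must re-execute. */
int tcc_commit(int committer) {
    assert(0 <= committer && committer < NCPU);
    uint32_t broadcast = cpus[committer].write_set;
    int violations = 0;
    for (int c = 0; c < NCPU; c++) {
        if (c == committer) continue;
        if (cpus[c].read_set & broadcast) {
            cpus[c].violated = true;
            cpus[c].read_set = cpus[c].write_set = 0;  /* discard */
            violations++;
        }
    }
    cpus[committer].read_set = cpus[committer].write_set = 0;
    return violations;
}
```

In the usage below, CPU 0 writes a line that CPU 2 has read, so CPU 0's commit violates CPU 2 but leaves CPU 1 untouched.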

TCC Implementation

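[Figure: a TCC node. The CPU core sends loads and stores to the local cache hierarchy, whose lines carry r, m, and V bits plus tag and data; stores only also go to a write buffer. Commit control connects the node to the broadcast bus or network, where it also snoops the commits of other nodes.]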

Conclusions

Unbounded, scalable, and efficient Transactional Memory systems can be built.

– Support large, frequent, and concurrent transactions

– Allow programmers to (finally!) use our parallel systems!

Three architectures:

– LTM: easy to realize, almost unbounded

– UTM: truly unbounded

– TCC: high performance

THE END…

Royi Maimon

Merav Havuv