EazyHTM: Eager-Lazy Hardware Transactional Memory Saša Tomić, Cristian Perfumo, Chinmay Kulkarni, Adrià Armejach, Adrián Cristal, Osman Unsal, Tim Harris, Mateo Valero Barcelona Supercomputing Center, UPC BITS Pilani Microsoft Research Cambridge


Page 1: EazyHTM : Eager-Lazy Hardware Transactional Memory

EazyHTM: Eager-Lazy Hardware Transactional Memory

Saša Tomić, Cristian Perfumo, Chinmay Kulkarni, Adrià Armejach, Adrián Cristal, Osman Unsal, Tim Harris, Mateo Valero

Barcelona Supercomputing Center, UPC
BITS Pilani
Microsoft Research Cambridge

Page 2: EazyHTM : Eager-Lazy Hardware Transactional Memory

Why Transactional Memory?

• Lock-based parallel programming has problems
  – Deadlocks, races, complexity, performance, …
• Transactional Memory (TM) to the rescue
  – Optimistic concurrency control mechanism
  – Easy to use
  – Deadlock free
  – Supports composability
  – Protects data in critical sections
• Hardware-TM (HTM), Software-TM (STM) and hybrid

Page 3: EazyHTM : Eager-Lazy Hardware Transactional Memory

HTM terminology

• Atomic section/transaction: group of instructions that appear to take effect instantaneously
• Where are speculative values stored (version management):
  – in-place, and log the original value, or
  – buffered in private storage, publish on commit
• Conflict: one TX writes where other TXs read
  – Detection: an action in which we check for conflicts
  – Resolution: an action performed to resolve the conflict
    • Can be abort, stalling the execution, …
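The two version-management options above can be sketched in software as follows. This is only an illustrative model, not hardware from the talk: the class names `UndoLogTx` and `WriteBufferTx` are hypothetical, memory is modeled as a plain dict, and every address is assumed to be initialized before a transaction touches it.

```python
class UndoLogTx:
    """In-place versioning: write to memory directly, log the original value."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []  # (address, original value), in write order

    def write(self, addr, value):
        self.undo_log.append((addr, self.memory[addr]))
        self.memory[addr] = value

    def read(self, addr):
        return self.memory[addr]

    def commit(self):
        # New values are already in place; just drop the log.
        self.undo_log.clear()

    def abort(self):
        # Restore original values, newest entry first.
        for addr, old in reversed(self.undo_log):
            self.memory[addr] = old
        self.undo_log.clear()


class WriteBufferTx:
    """Buffered versioning: keep writes private, publish them on commit."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = {}  # address -> speculative value

    def write(self, addr, value):
        self.buffer[addr] = value

    def read(self, addr):
        # A transaction must see its own speculative writes.
        if addr in self.buffer:
            return self.buffer[addr]
        return self.memory[addr]

    def commit(self):
        self.memory.update(self.buffer)
        self.buffer.clear()

    def abort(self):
        # Nothing reached memory, so discarding the buffer is enough.
        self.buffer.clear()
```

In this model the undo-log variant has nothing to do on commit but must walk the log on abort, while the buffered variant discards its buffer on abort but must write back on commit.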

Page 4: EazyHTM : Eager-Lazy Hardware Transactional Memory

Eager HTM

• A.k.a. pessimistic
• Writes in-place, detects & resolves conflicts on every access
• LogTM [Moore, HPCA06], LogTM-SE [Yen, HPCA07]

[Figure: timelines of TX 1, TX 2 and TX 3 with one write (W), two reads (R), a stall, and a fast commit.]

• Fast commit
• Slow abort
• Limited concurrency

Page 5: EazyHTM : Eager-Lazy Hardware Transactional Memory

Lazy HTM

• A.k.a. optimistic
• Writes buffered, detect & resolve conflicts on commit
• TCC [Hammond, ISCA04], Scalable-TCC [Chafi, HPCA07]

[Figure: timelines of TX 1, TX 2 and TX 3 with one write (W), two reads (R), and a complex commit (validate + write).]

• Fast abort
• Complex commit
• Good concurrency

Page 6: EazyHTM : Eager-Lazy Hardware Transactional Memory

The Motivation: Splitting conflict management

• Eager-Lazy hardware-software TM exists (FlexTM [Shriraman, ISCA08]):
  – Software begin, commit and abort
  – Probabilistic (signature based) conflict detection
• EazyHTM is the first pure-hardware eager-lazy TM

Conflict detection / conflict resolution:
• Eager / Eager: LogTM
• Eager / Lazy: EazyHTM (fast commit, good concurrency)
• Lazy / Eager: impossible
• Lazy / Lazy: TCC, S-TCC

Page 7: EazyHTM : Eager-Lazy Hardware Transactional Memory

Outline

• Motivation
• Contributions
• Hardware changes
• The Protocol
• Evaluation
• Conclusions

Page 8: EazyHTM : Eager-Lazy Hardware Transactional Memory

EazyHTM Contributions

• The best of two worlds
  – Eager conflict detection: simple commit, exact list of conflicts known in advance
  – Lazy conflict resolution: good concurrency
• Parallel commits of non-conflicting TXs
• Designed for CMPs (Chip-Multiprocessors)
  – Exploits core proximity
  – MESI/MOESI protocol upgrade (easier verification)

Page 9: EazyHTM : Eager-Lazy Hardware Transactional Memory

Hardware changes

• CPU
  – Register file checkpoint
  – Racers list – 1 bit per core
  – Killers list – 1 bit per core
    • tracks conflicts
    • bit-vector
    • 32 bits for 32 cores
• Private cache(s), next to the existing cache logic
  – SR – 1 bit per line
  – SM – 1 bit per line
    • holds read/write set
• Directory, next to the existing directory logic
  – TD – 1 bit per line
    • read-only optimization bit (details in the paper)

[Figure: block diagram of the cores, each with a CPU and private cache(s), and the directory.]
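The added state can be pictured with a small software model. This is a sketch under assumptions, not the hardware design: the names `LineBits`, `CoreTxState`, `with_core` and `cores_in` are hypothetical, and reading SR and SM as "line is in the read set" and "line is in the write set" is an interpretation of the slide's "holds read/write set" note.

```python
from dataclasses import dataclass, field

NUM_CORES = 32  # matches the slide's example: 32 bits for 32 cores


@dataclass
class LineBits:
    sr: bool = False  # SR bit (assumed: line is in the read set)
    sm: bool = False  # SM bit (assumed: line is in the write set)


@dataclass
class CoreTxState:
    racers: int = 0   # bit i set => core i is in the racers list
    killers: int = 0  # bit i set => core i is in the killers list
    lines: dict = field(default_factory=dict)  # address -> LineBits


def with_core(vector: int, core: int) -> int:
    """Return the bit-vector with the bit for `core` set."""
    if not 0 <= core < NUM_CORES:
        raise ValueError(f"core id out of range: {core}")
    return vector | (1 << core)


def cores_in(vector: int) -> list:
    """List the core ids whose bits are set in the bit-vector."""
    return [core for core in range(NUM_CORES) if (vector >> core) & 1]
```

With one bit per core, each list fits in a single 32-bit word on a 32-core chip, which is why the lists can be kept as plain bit-vectors.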

Page 10: EazyHTM : Eager-Lazy Hardware Transactional Memory

Racers and killers list

• If line is shared between two TXs:
  – Read-Read
    • No conflict
  – Write-Read, Read-Write, Write-Write
    • Writer adds reader TX into “racers” list
      – “TXs that I have to abort” list, if I commit first
    • Reader adds writer TX into “killers” list
      – “TXs that can abort me” list, if they commit first
• We illustrate only the Write-after-Read (WAR) conflict
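The bookkeeping rules for racers and killers can be written down as a short function. This is a minimal sketch: the `Tx` class and `record_sharing` are hypothetical names, Python sets stand in for the hardware bit-vectors, and treating Write-Write symmetrically (each writer lists the other as both racer and killer) is an assumption, since the slide only spells out the writer/reader case.

```python
class Tx:
    def __init__(self, tx_id):
        self.tx_id = tx_id
        self.racers = set()   # TXs that I have to abort if I commit first
        self.killers = set()  # TXs that can abort me if they commit first


def record_sharing(tx_a, access_a, tx_b, access_b):
    """Update both TXs' lists for one shared line.

    Accesses are "R" or "W". Returns True if the sharing is a conflict.
    """
    for access in (access_a, access_b):
        if access not in ("R", "W"):
            raise ValueError(f"unknown access type: {access}")
    if access_a == "R" and access_b == "R":
        return False  # Read-Read: no conflict
    if access_a == "W":
        tx_a.racers.add(tx_b.tx_id)
        tx_b.killers.add(tx_a.tx_id)
    if access_b == "W":
        tx_b.racers.add(tx_a.tx_id)
        tx_a.killers.add(tx_b.tx_id)
    return True
```

For the Write-after-Read case illustrated in the talk, `record_sharing(tx0, "R", tx2, "W")` leaves TX 0 in TX 2's racers list and TX 2 in TX 0's killers list.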

Page 11: EazyHTM : Eager-Lazy Hardware Transactional Memory

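EazyHTM Protocol: Conflict Detection (1/2)

Scenario: TX 0 executes BTX, RD A, CTX; TX 2 executes BTX, WR A, CTX.

[Figure: TX 0 and TX 2, each with racers and killers lists, and the directory with its sharers list for @A.]

Messages in the figure (steps 1 and 2):
• txMark @A (replaces GETS/GETX)
• ACK @A, 0 (no other sharers)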

Page 12: EazyHTM : Eager-Lazy Hardware Transactional Memory

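EazyHTM Protocol: Conflict Detection (2/2)

Scenario: TX 0 executes BTX, RD A, CTX; TX 2 executes BTX, WR A, CTX.

[Figure: TX 0 and TX 2, each with racers and killers lists, and the directory with its sharers list for @A; five numbered protocol steps.]

Messages and annotations in the figure:
• txMark @A
• ACK @A, 1 (1 other sharer: potential conflict)
• txAccessor #2, @A
• Reader #0, @A
• Writer #2, @A
• TX 2, racers list: remember to abort TX#0 on commit
• TX 0, killers list: remember that TX#2 can abort me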
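The directory's part in conflict detection can be approximated in a few lines. This is a sketch of one reading of the two detection slides, not the protocol specification: the `Directory` class, the tuple layout of the messages, and the assumption that the directory forwards txAccessor to every existing sharer are all illustrative choices.

```python
class Directory:
    """Toy directory that tracks transactional sharers per line address."""

    def __init__(self):
        self.sharers = {}  # address -> set of core ids

    def tx_mark(self, core, addr):
        """Handle txMark from `core` for line `addr`.

        Returns (ack, forwards):
          ack      = ("ACK", addr, number of other sharers)
          forwards = one ("txAccessor", core, addr, destination) per other sharer
        """
        current = self.sharers.setdefault(addr, set())
        others = sorted(current - {core})
        current.add(core)
        ack = ("ACK", addr, len(others))
        forwards = [("txAccessor", core, addr, dest) for dest in others]
        return ack, forwards
```

Replaying the slides' scenario, the first txMark for @A (from core 0) is acknowledged with 0 other sharers, and the second (from core 2) with 1 other sharer plus a txAccessor message addressed to core 0.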

Page 13: EazyHTM : Eager-Lazy Hardware Transactional Memory

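EazyHTM Protocol: Conflict Resolution

Scenario: TX 0 executes BTX, RD A, CTX; TX 2 executes BTX, WR A, CTX.

TX#2 first came to the commit point, abort TX#0!

[Figure: TX 0 and TX 2, each with racers and killers lists, and the directory with its sharers list for @A; three numbered protocol steps.]

Messages in the figure:
• Abort from TX#2
• Abort Ack from TX#0
• WR @A (commit)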
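Lazy resolution at the commit point can be sketched as follows. The `Tx` class, its `status` field and the `commit` function are hypothetical, and the abort/acknowledge exchange is collapsed into a single status change; the only behavior taken from the slides is that the TX reaching its commit point first aborts the TXs in its racers list.

```python
class Tx:
    def __init__(self, tx_id):
        self.tx_id = tx_id
        self.racers = set()   # TXs to abort when this TX commits
        self.killers = set()  # TXs that can abort this TX
        self.status = "active"


def commit(tx, all_txs):
    """Commit `tx` and abort every still-active TX in its racers list.

    `all_txs` maps TX id -> Tx. Returns the ids that were aborted.
    """
    if tx.status != "active":
        raise RuntimeError(f"TX#{tx.tx_id} is not active")
    aborted = []
    for racer_id in sorted(tx.racers):
        victim = all_txs[racer_id]
        if victim.status == "active":
            # Stands in for "Abort from TX#n" followed by "Abort Ack".
            victim.status = "aborted"
            aborted.append(racer_id)
    tx.status = "committed"
    return aborted
```

Because the racers list was filled in during execution, the committing TX already knows exactly whom to abort and needs no validation phase at commit time.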

Page 14: EazyHTM : Eager-Lazy Hardware Transactional Memory

EazyHTM Protocol: Disjoint data => parallel commit

Scenario: TX 0 executes BTX, WR A, CTX; TX 2 executes BTX, WR B, CTX.
TX#0 works with line @A, TX#2 works with line @B.

[Figure: TX 0 and TX 2, each with racers and killers lists, and the directory with sharers lists for @A and @B; steps 1 to 3 shown for each TX.]

Messages and annotations in the figure:
• txMark @A and txMark @B
• ACK @A, 0 and ACK @B, 0 (0 other sharers for each line)
• WR @A (commit) and WR @B (commit)
• NO SERIALIZATION
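One way to express the "no serialization" condition is as a check on the racers lists. This is an illustrative predicate, not a mechanism described in the talk: the function name and the use of Python sets for the lists are assumptions.

```python
def can_commit_in_parallel(id_a, racers_a, id_b, racers_b):
    """True if neither TX has recorded the other as a racer.

    In that case neither commit has to abort the other TX, so in this
    model the two commits need no ordering between them.
    """
    return id_b not in racers_a and id_a not in racers_b
```

In the disjoint-data scenario both txMark requests come back with 0 other sharers, so no conflict is recorded for either TX and the check succeeds.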

Page 15: EazyHTM : Eager-Lazy Hardware Transactional Memory

Implementation

• Implemented in M5, full-system simulator (Alpha)
• Private L1 (32KB, 4-way, 64B CL, 2 cycles)
• Private L2 (512KB, 8-way, 64B CL, 10 cycles)
• Memory (with directory, 100 cycles)
• ICN (2D Mesh, 10 cycles per hop)
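To get a feel for the interconnect numbers, the cost of one message can be estimated from the hop count. This is a back-of-the-envelope sketch, not the simulator's model: it assumes row-major node numbering, a hypothetical mesh `width`, and minimal (Manhattan-distance) routing; only the 10 cycles per hop comes from the slide.

```python
ICN_CYCLES_PER_HOP = 10  # from the slide: 2D mesh, 10 cycles per hop


def mesh_hops(src, dst, width):
    """Manhattan distance between two nodes of a 2D mesh.

    Nodes are numbered row by row; `width` is the number of columns.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    src_x, src_y = src % width, src // width
    dst_x, dst_y = dst % width, dst // width
    return abs(src_x - dst_x) + abs(src_y - dst_y)


def icn_latency(src, dst, width):
    """Estimated one-way message latency in cycles."""
    return mesh_hops(src, dst, width) * ICN_CYCLES_PER_HOP
```

For example, on a mesh assumed to be 4 nodes wide, a message from node 0 to node 7 crosses 4 hops, so `icn_latency(0, 7, 4)` gives 40 cycles.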

Page 16: EazyHTM : Eager-Lazy Hardware Transactional Memory

Evaluation

• Evaluated STAMP benchmarks
• Compared with Scalable-TCC-like HTM
  – Same base simulator
  – Implemented specialized directory protocol
• Compared with ideal lazy HTM (MESI based)
  – magical conflict detection
  – instant conflict resolution
  – parallel write-back commit

Page 17: EazyHTM : Eager-Lazy Hardware Transactional Memory

Kmeans Low

• Small TXs (RS 15 CL; WS 5 CL)
• Low contention (10% aborts)
• Similar profile to “replacing locks with atomic”
• Near ideal performance
• K-means: groups N-dimensional space into K clusters
• Most of the SPLASH-2 suite has similar profile

[Figure: Kmeans-Low. Speedup vs. number of processors for Ideal, EazyHTM and STCC.]

Page 18: EazyHTM : Eager-Lazy Hardware Transactional Memory

SSCA2

• Small TXs (RS 50 CL, WS 10 CL)
• Low contention (1.2% aborts)
• Near ideal performance
• Scalability affected by barriers, not by contention
• SSCA2: large directed graph operations

[Figure: SSCA2. Speedup vs. number of processors for Ideal, EazyHTM and STCC.]

Page 19: EazyHTM : Eager-Lazy Hardware Transactional Memory

Yada

• Large TXs (260 CL RS, 140 CL WS)
• Moderate contention (35% aborts)
• Good performance for large TXs as well!
• Yada: Delaunay mesh refinement

[Figure: Yada. Speedup vs. number of processors for Ideal, EazyHTM and STCC.]

Page 20: EazyHTM : Eager-Lazy Hardware Transactional Memory

Intruder

• Medium TXs (53 CL RS, 20 CL WS)
• High contention (85% aborts)
• Very bad scalability for all HTMs
• Every transaction detects conflicts over and over again: the many conflict detection messages slow down the execution
• Intruder: signature based network intrusion detection system

[Figure: Intruder. Speedup vs. number of processors for Ideal, EazyHTM and STCC.]

Page 21: EazyHTM : Eager-Lazy Hardware Transactional Memory

Only high-conflict STAMP

• >50% abort rate only
• The high-contention, high-core-count case should be optimized
• Averages:
  – Labyrinth
  – Intruder
  – Kmeans-Hi
• Results highly affected by Intruder

[Figure: High-conflict STAMP. Speedup vs. number of processors for Ideal, EazyHTM and STCC.]

Page 22: EazyHTM : Eager-Lazy Hardware Transactional Memory

Only low-conflict STAMP

• <50% abort rate only
• Low abort rate necessary for scaling
• Excludes:
  – Labyrinth 8-32
  – Intruder 16-32
  – Kmeans-Hi 32

[Figure: Scaling STAMP. Speedup vs. number of processors for Ideal, EazyHTM and STCC.]
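The split into high-conflict and low-conflict runs by a 50% abort rate can be expressed as a small helper. This is a sketch under assumptions: the talk does not define the abort rate, so aborts / (commits + aborts) is used here, and a run at exactly 50% (which the slides do not cover) is arbitrarily placed in the low-conflict group. The function names are hypothetical.

```python
def abort_rate(commits, aborts):
    """Fraction of transaction attempts that aborted (assumed definition)."""
    total = commits + aborts
    if total <= 0:
        raise ValueError("need at least one transaction attempt")
    return aborts / total


def conflict_class(rate, threshold=0.5):
    """Classify a run by abort rate; above the threshold is high-conflict."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return "high-conflict" if rate > threshold else "low-conflict"
```

With this definition, a run with 85 aborts for every 15 commits has an abort rate of 0.85 and lands in the high-conflict group, while one with 10 aborts for every 90 commits is low-conflict.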

Page 23: EazyHTM : Eager-Lazy Hardware Transactional Memory

Conclusions

• Introduced EazyHTM, a new HTM implementation
  – Eager conflict detection, lazy conflict resolution
  – Fast: performs well for low-conflict parallel applications
  – Minimal changes to directory protocols (easier verification)
  – As scalable as a standard directory protocol
• EazyHTM mechanism could allow (future work):
  – Simpler transaction prioritization
  – Less wasted work
  – Better performance optimization
  – Power-efficient TM mechanisms

Page 24: EazyHTM : Eager-Lazy Hardware Transactional Memory

Thank you!

Questions? [email protected]
