110
spcl.inf.ethz.ch @spcl_eth MACIEJ BESTA, HERMANN SCHWEIZER, TORSTEN HOEFLER Evaluating the Cost of Atomic Operations on Modern Architectures

@spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

MACIEJ BESTA, HERMANN SCHWEIZER, TORSTEN HOEFLER

Evaluating the Cost of Atomic Operations

on Modern Architectures

Page 2: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

LARGE-SCALE IRREGULAR GRAPH PROCESSING

Page 3: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

LARGE-SCALE IRREGULAR GRAPH PROCESSING

Page 4: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

LARGE-SCALE IRREGULAR GRAPH PROCESSING

Page 5: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

LARGE-SCALE IRREGULAR GRAPH PROCESSING

Page 6: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

LARGE-SCALE IRREGULAR GRAPH PROCESSING

[1] A. Lumsdaine et al. Challenges in Parallel Graph Processing. Parallel Processing Letters. 2007.

Page 7: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

A BRIEF SUMMARY OF RESEARCH I DO IN HPC

Page 8: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Com

puting a

bstr

actions

A BRIEF SUMMARY OF RESEARCH I DO IN HPC

Page 9: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Com

puting a

bstr

actions

Hardware

PACT’15

HPDC’15

HPDC’14

HPDC’16

A BRIEF SUMMARY OF RESEARCH I DO IN HPC

Page 10: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Com

puting a

bstr

actions

Hardware

Topologies

PACT’15

SC14

HPDC’15

HPDC’14

HPDC’16

A BRIEF SUMMARY OF RESEARCH I DO IN HPC

Page 11: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Com

puting a

bstr

actions

Hardware

Topologies

OS, middleware

PACT’15

SC14

ICS’15 HPDC’15

HPDC’14

HPDC’16

A BRIEF SUMMARY OF RESEARCH I DO IN HPC

Page 12: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Com

puting a

bstr

actions

Hardware

Topologies

OS, middleware

Algorithms

PACT’15

SC14

ICS’15 HPDC’15

HPDC’14

SC13

HPDC’16

A BRIEF SUMMARY OF RESEARCH I DO IN HPC

Page 13: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Com

puting a

bstr

actions

Hardware

Topologies

OS, middleware

Algorithms

Programming models

PACT’15

SC14

ICS’15 HPDC’15

HPDC’14

SC13

HPDC’16

A BRIEF SUMMARY OF RESEARCH I DO IN HPC

Page 14: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Com

puting a

bstr

actions

Hardware

Topologies

OS, middleware

Algorithms

Programming models

PACT’15

SC14

ICS’15 HPDC’15

HPDC’14

SC13

HPDC’16

A BRIEF SUMMARY OF RESEARCH I DO IN HPC

Page 15: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Com

puting a

bstr

actions

Hardware

Topologies

OS, middleware

Algorithms

Programming models

PACT’15

SC14

ICS’15 HPDC’15

HPDC’14

SC13

HPDC’16

A BRIEF SUMMARY OF RESEARCH I DO IN HPC

Most of these layers require

efficient synchronization

Page 16: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

SYNCHRONIZATION MECHANISMS

LOCKS

Page 17: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

SYNCHRONIZATION MECHANISMS

LOCKS

Page 18: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

SYNCHRONIZATION MECHANISMS

LOCKS

An example

graph

Page 19: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

SYNCHRONIZATION MECHANISMS

LOCKS

An example

graph

Page 20: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

SYNCHRONIZATION MECHANISMS

LOCKS

An example

graph

Page 21: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

SYNCHRONIZATION MECHANISMS

LOCKS

An example

graph

Page 22: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

SYNCHRONIZATION MECHANISMS

LOCKS

An example

graph

Page 23: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

Intuitive

semantics

SYNCHRONIZATION MECHANISMS

LOCKS

An example

graph

Page 24: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

Intuitive

semantics

SYNCHRONIZATION MECHANISMS

LOCKS

An example

graph

Page 25: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

Intuitive

semantics

SYNCHRONIZATION MECHANISMS

LOCKS

Serialization

An example

graph

Page 26: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

Possibly complex

protocols

Intuitive

semantics

SYNCHRONIZATION MECHANISMS

LOCKS

Serialization

An example

graph

Page 27: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

Possibly complex

protocols

Intuitive

semantics

SYNCHRONIZATION MECHANISMS

LOCKS

Serialization

An example

graph

High performance

distributed locks?

Page 28: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

SYNCHRONIZATION MECHANISMS

ATOMIC OPERATIONS

Page 29: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

SYNCHRONIZATION MECHANISMS

ATOMIC OPERATIONS

Complex access

patterns

Page 30: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

High

performance

SYNCHRONIZATION MECHANISMS

ATOMIC OPERATIONS

Complex access

patterns

Page 31: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

High

performance

SYNCHRONIZATION MECHANISMS

ATOMIC OPERATIONS

Complex access

patterns

Very common, truly

hardware mechanizm

Page 32: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

High

performance

SYNCHRONIZATION MECHANISMS

ATOMIC OPERATIONS

Complex access

patterns

Very common, truly

hardware mechanizm

Page 33: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

Complex

protocols

High

performance

SYNCHRONIZATION MECHANISMS

ATOMIC OPERATIONS

Complex access

patterns

Very common, truly

hardware mechanizm

Page 34: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

Complex

protocols

High

performance

SYNCHRONIZATION MECHANISMS

ATOMIC OPERATIONS

Subtle

issues (ABA

problem, ...)Complex access

patterns

Very common, truly

hardware mechanizm

Page 35: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Proc qProc p

Complex

protocols

High

performance

SYNCHRONIZATION MECHANISMS

ATOMIC OPERATIONS

Subtle

issues (ABA

problem, ...)Complex access

patterns

Do we really

understand their

performance?

Very common, truly

hardware mechanizm

Page 36: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: POPULARITY

Page 37: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: POPULARITY

Page 38: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: POPULARITY

[PPoPP’14] Fast

Concurrent Lock-Free

Binary Search Trees

[PPoPP’14] A General

Technique for Non-

blocking Trees

[PPoPP’14] Practical

Concurrent Binary Search

Trees via Logical Ordering

[PPoPP’14] A Practical

Wait-Free Simulation for

Lock-Free Data Structures

[PPoPP’15]

[PPoPP’15]

[SPAA’15]

[SPAA’16]

Page 39: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: POPULARITY

[PPoPP’14] Fast

Concurrent Lock-Free

Binary Search Trees

[PPoPP’14] A General

Technique for Non-

blocking Trees

[PPoPP’14] Practical

Concurrent Binary Search

Trees via Logical Ordering

[PPoPP’14] A Practical

Wait-Free Simulation for

Lock-Free Data Structures

[PPoPP’15]

[PPoPP’15]

[SPAA’15]

[SPAA’16]Used in so many designs...

But do we really know their

raw performance?

Page 40: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: POPULARITY

[PPoPP’14] Fast

Concurrent Lock-Free

Binary Search Trees

[PPoPP’14] A General

Technique for Non-

blocking Trees

[PPoPP’14] Practical

Concurrent Binary Search

Trees via Logical Ordering

[PPoPP’14] A Practical

Wait-Free Simulation for

Lock-Free Data Structures

[PPoPP’15]

[PPoPP’15]

[SPAA’15]

[SPAA’16]Used in so many designs...

But do we really know their

raw performance?

Raw performance... Of what?

Page 41: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Page 42: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Page 43: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Cache level?

Page 44: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Cache level?

Locality?

Page 45: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Page 46: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Atomic?

Page 47: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Atomic?

Performance

metrics?

Page 48: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Atomic?

Performance

metrics?

Architecture

Page 49: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Atomic?

Contention?

Performance

metrics?

Architecture

Page 50: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Cache

coherence

state?

Atomic?

Contention?

Performance

metrics?

Architecture

Page 51: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Cache

coherence

state?

Atomic?

Contention?

Operand

size?

Performance

metrics?

Architecture

Page 52: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Cache

coherence

state?

Atomic?

Contention?

Operand

size?

Performance

metrics?

Alignment?

Architecture

Page 53: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Cache

coherence

state?

Atomic?

Contention?

Operand

size?

Performance

metrics?

Alignment?

Mechanisms?

Architecture

Page 54: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Atomic?

Page 55: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Atomic?

Compare-and-

Swap (CAS)

Page 56: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Atomic?

Compare-and-

Swap (CAS)

Fetch-and-Add

(FAA)

Page 57: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Atomic?

Compare-and-

Swap (CAS)

Fetch-and-Add

(FAA)

Swap (SWP)

Page 58: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Performance

metrics?

Page 59: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Performance

metrics?

Latency

Page 60: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Performance

metrics?

Bandwidth

Latency

Page 61: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Cache

coherence

state?

Page 62: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Cache

coherence

state?

Modified: in one

cache and dirty

Page 63: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Cache

coherence

state?

Modified: in one

cache and dirty

Exclusive: in one

cache and clean

Page 64: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Cache

coherence

state?

Modified: in one

cache and dirty

Exclusive: in one

cache and clean

Shared: in >1

cache and clean

Page 65: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Cache

coherence

state?

Modified: in one

cache and dirty

Exclusive: in one

cache and clean

Shared: in >1

cache and clean

Invalid: garbage

data

Page 66: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Architecture

Page 67: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Architecture

Page 68: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Architecture

Page 69: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Architecture

Page 70: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

ATOMICS: PERFORMANCE DIMENSIONS

Architecture

Page 71: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

RESEARCH QUESTIONS

Page 72: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

How do we model the

performance of

atomics?

RESEARCH QUESTIONS

Page 73: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

What is the

performance

difference between

various atomics?

How do we model the

performance of

atomics?

RESEARCH QUESTIONS

Page 74: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

What is the

performance

difference between

various atomics?

What is the influence

of various parameters

and mechanisms?

How do we model the

performance of

atomics?

RESEARCH QUESTIONS

Page 75: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Core

Cache Cache Cache

…Cache line

LATENCY MODEL

Page 76: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Core

Cache Cache Cache

…Cache line

LATENCY MODEL

Page 77: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Core

Cache Cache Cache

…Cache line

Read for ownership

LATENCY MODEL

Page 78: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Core

Cache Cache Cache

…Cache line

Read for ownership

LATENCY MODEL

Cache

coherence state

Page 79: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Core

Cache Cache Cache

…Cache line

Read for ownership

= max(read latency, invalidation latency)

LATENCY MODEL

Cache

coherence state

Page 80: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Core

Cache Cache Cache

…Cache line

Cache line Read for ownership

= max(read latency, invalidation latency)

LATENCY MODEL

Cache

coherence state

Page 81: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Core

Cache Cache Cache

…Cache line

Cache line Read for ownership

= max(read latency, invalidation latency)

LATENCY MODEL

Cache

coherence state

Page 82: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Core

Cache Cache Cache

…Cache line

Cache line Read for ownership

Execute

= max(read latency, invalidation latency)

= constant

LATENCY MODEL

Cache

coherence state

Page 83: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Core

Cache Cache Cache

…Cache line

Cache line Read for ownership

Execute

= max(read latency, invalidation latency)

= constant

LATENCY MODEL

Cache

coherence state

Atomic

Page 84: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Core

Cache Cache Cache

…Cache line

Cache line Read for ownership

Execute

= max(read latency, invalidation latency)

= constant

LATENCY MODEL

Cache

coherence state

Atomic

Page 85: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Core

Cache Cache Cache

…Cache line

Cache line Read for ownership

Execute

Atomic

= max(read latency, invalidation latency)

= constant

LATENCY MODEL

Cache

coherence state

Atomic

Page 86: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Core

Cache Cache Cache

…Cache line

Cache line Read for ownership

Execute

AtomicCache

coherence state

= max(read latency, invalidation latency)

= constant

LATENCY MODEL

Cache

coherence state

Atomic

Page 87: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Core

Cache Cache Cache

…Cache line

Cache line Read for ownership

Execute

AtomicCache

coherence state

= constant

LATENCY MODEL

EXCLUSIVE OR MODIFIED STATE

= max(read latency, invalidation latency)

Page 88: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Core

Cache Cache Cache

…Cache line

Cache line Read for ownership

Execute

AtomicCache

coherence state

= read latency

= constant

LATENCY MODEL

EXCLUSIVE OR MODIFIED STATE

Page 89: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Core

Cache Cache Cache

…Cache line

Cache line Read for ownership

Execute

AtomicCache

coherence state

= read latency

= constant

LATENCY MODEL

EXCLUSIVE OR MODIFIED STATE

predictions observed

data

mean of

observed data

Page 90: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

LATENCY

HASWELL, EXCLUSIVE

Page 91: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

CAS FAA

LATENCY

BULLDOZER, EXCLUSIVE

Page 92: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

LATENCY

HASWELL, EXCLUSIVE Alignment?

Page 93: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

64 bit 128 bit

LATENCY

BULLDOZER, EXCLUSIVE Operand

size?

Page 94: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

BANDWIDTH

HASWELL, ATOMICS

Page 95: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

CONCLUSIONS

PERFORMANCE INSIGHTS

Page 96: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

CONCLUSIONS

PERFORMANCE INSIGHTS

The same latency of

different atomics in most

scenarios

Page 97: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

CONCLUSIONS

PERFORMANCE INSIGHTS

The same latency of

different atomics in most

scenarios

CAS is the fastest for

some cases

Page 98: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

CONCLUSIONS

PERFORMANCE INSIGHTS

The same latency of

different atomics in most

scenarios

CAS is the fastest for

some cases

Unaligned atomics

should be avoided at all

costs

Page 99: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

CONCLUSIONS

PERFORMANCE INSIGHTS

The same latency of

different atomics in most

scenarios

CAS is the fastest for

some cases

Unaligned atomics

should be avoided at all

costs

No parallel execution

(low bandwidth) even if

there are no data deps

Page 100: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

CONCLUSIONS

PERFORMANCE INSIGHTS

The same latency of

different atomics in most

scenarios

CAS is the fastest for

some cases

Small operand sizes

give best performance

Unaligned atomics

should be avoided at all

costs

No parallel execution

(low bandwidth) even if

there are no data deps

Page 101: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Atomics Read

LATENCY

HASWELL, EXCLUSIVE

Page 102: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Atomics Read

LATENCY

HASWELL, MODIFIED

Page 103: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

LATENCY

IVY BRIDGE, EXCLUSIVE

CAS

Page 104: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

LATENCY

IVY BRIDGE, EXCLUSIVE

CAS

Page 105: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

LATENCY

XEON PHI, MODIFIED / EXCLUSIVE

CAS

Page 106: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Core

Cache Cache Cache

…Cache line

Cache line Read for ownership

Execute

AtomicCache

coherence state

= constant

LATENCY MODEL

EXCLUSIVE OR MODIFIED STATE

= max(read latency, invalidation latency)

Page 107: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Core

Cache Cache Cache

…Cache line

Cache line Read for ownership

Execute

AtomicCache

coherence state

= constant

LATENCY MODEL

SHARED STATE

= max(read latency, invalidation latency)

Page 108: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Core

Cache Cache Cache

…Cache line

Cache line Read for ownership

Execute

AtomicCache

coherence state

= invalidation latency

= constant

LATENCY MODEL

SHARED STATE

Page 109: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

Atomics Read

LATENCY

HASWELL, SHARED

Page 110: @spcl eth Evaluating the Cost of Atomic Operations on ...spcl.inf.ethz.ch @spcl_eth ATOMICS: POPULARITY [PPoPP’14] Fast Concurrent Lock-Free Binary Search Trees [PPoPP’14] A General

spcl.inf.ethz.ch

@spcl_eth

How to force cache coherence state

F(M): Write cache line (invalidates all copies)

F(E): F(M) flush read

F(S): F(E) read by some other core

D. Hackenberg, D. Molka, W. Nagel. Comparing Cache Architectures and Coherency Protocols on x86-64

Multicore SMP Systems. MICRO’09