55
Multithreading dos and don'ts Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 ACCU 2015, Bristol – June 2015 Hubert Matthews Hubert Matthews [email protected] [email protected]

Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews [email protected]. ... –In Java and C# it means “atomic

  • Upload
    lynhi

  • View
    217

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Multithreading dos and don'tsMultithreading dos and don'ts

ACCU 2015, Bristol – June 2015ACCU 2015, Bristol – June 2015

Hubert MatthewsHubert [email protected]@oxyware.com

Page 2: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 2/55

Why this talk?Why this talk?

Page 3: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 3/55

Don't – why notDon't – why not

• Avoid multithreaded programming if you canAvoid multithreaded programming if you can– It's harder to write, to read, to understand and to It's harder to write, to read, to understand and to 

test than single­threaded codetest than single­threaded code– It may appear to work but may just not have failed It may appear to work but may just not have failed 

sufficiently visibly yetsufficiently visibly yet– Often distracts from the underlying application Often distracts from the underlying application 

problem and focuses developers on technical issuesproblem and focuses developers on technical issues– A great consumer of developer time and generator A great consumer of developer time and generator 

of frustrationof frustration– Often avoidableOften avoidable– May not deliver the performance benefits you May not deliver the performance benefits you 

expectexpect

Page 4: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 4/55

Do – whyDo – why

• You need it to access the full power of the You need it to access the full power of the machine (measure, don't guess!)machine (measure, don't guess!)

• You need to scale your application and your You need to scale your application and your application is CPU­boundapplication is CPU­bound

• You are running in a threaded environmentYou are running in a threaded environment• There is an obvious parallel decomposition of There is an obvious parallel decomposition of 

the problem or algorithmthe problem or algorithm• Other approaches to covering latency and I/O Other approaches to covering latency and I/O 

are worse or not availableare worse or not available• You are brave or a masochist (or have an ego)You are brave or a masochist (or have an ego)

Page 5: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 5/55

AlternativesAlternatives

Single-threaded approaches• event-driven code• asynchronous I/O

• asio, libaio (linux)• overlapped I/O

(Windows)• non-blocking TCP• UDP

• coroutines or fibers• separate processes

Multi-threaded approaches• concurrent library (tasks)

• TBB, PPL• concurrent library (data)

• OpenMP• message passing

• MPI

Page 6: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 6/55

Page 7: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 7/55

A whole new set of problemsA whole new set of problems

• Getting single­threaded code working well and Getting single­threaded code working well and with good performance can be a challengewith good performance can be a challenge

• Multithreading provides a whole new set of Multithreading provides a whole new set of ways of getting it wrong or going slowerways of getting it wrong or going slower

– The problems may not show up except under load The problems may not show up except under load or at the most inconvenient timeor at the most inconvenient time

– They may not be reproducibleThey may not be reproducible– They will be hard to debug or to measureThey will be hard to debug or to measure– Knowing which of these problems you have is hardKnowing which of these problems you have is hard

focus on getting good single-threaded performance first before going multithreaded

Page 8: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 8/55

Problem decompositionProblem decomposition

• Before considering how to implement a parallel Before considering how to implement a parallel solution you have to split the problem up into solution you have to split the problem up into pieces and find an algorithm for processing and pieces and find an algorithm for processing and recombining these piecesrecombining these pieces

– This split may be trivial for “embarrassingly parallel This split may be trivial for “embarrassingly parallel problems” or hard (travelling salesman problem)problems” or hard (travelling salesman problem)

• Classic approachesClassic approaches– Data parallel (sections of an array)Data parallel (sections of an array)– Task parallel (web requests)Task parallel (web requests)

• Interaction between sub­items is keyInteraction between sub­items is key

Page 9: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 9/55

GoldilocksGoldilocks

• Parallel approaches have to find the “sweet Parallel approaches have to find the “sweet spot” between two extremesspot” between two extremes

• Too fine­grainedToo fine­grained– Data – computation dominated by overheadData – computation dominated by overhead– Threads – context switching overheadThreads – context switching overhead

• Too coarse­grainedToo coarse­grained– Data – load balancing problemsData – load balancing problems– Threads – insufficient items toThreads – insufficient items to

keep threads busykeep threads busy

Page 10: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 10/55

TestingTesting

• Single­threaded code can be unit testedSingle­threaded code can be unit tested– Repeatable results from isolated codeRepeatable results from isolated code

• Multi­threaded code cannot be unit tested Multi­threaded code cannot be unit tested easily or reliablyeasily or reliably

– Non­deterministic outputsNon­deterministic outputs– Making them deterministic may be possibleMaking them deterministic may be possible– Errors are transient (data races)Errors are transient (data races)– Problems are often performance­related and show Problems are often performance­related and show 

up only at scale or under loadup only at scale or under load

allow for scaling down to a single thread for test before scaling up for production

Page 11: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 11/55

Avoid sharing mutable dataAvoid sharing mutable data

• Shared mutable Shared mutable data is the evil of all data is the evil of all computing!computing!

• Read­only data can Read­only data can be shared safely be shared safely without lockswithout locks

• Const is your friendConst is your friend• Pure message­Pure message­

passing approach passing approach avoids thisavoids this

mutable

sharednot shared

immutable

Page 12: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 12/55

Shared writes don't scaleShared writes don't scale

(graphic by Dmitry Vykov, http://www.1024cores.net, CC BY-NC-SA 3.0) single writer principle for speed/scale

Page 13: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 13/55

Why shared writes don't scaleWhy shared writes don't scale

Core1

L1

L2

Core2

L1

L2

L3

RAM

32+32 KB 1-3 cycles

256 KB 5-20 cycles

4 MB 30-50 cycles

16 GB 100-300 cycles

MESI

• Caches have to Caches have to communicate communicate to ensure to ensure coherent view coherent view 

• MESI protocol MESI protocol passes passes messages messages between cachesbetween caches

• Shared writes Shared writes limited by limited by MESI commsMESI comms

MESI

Page 14: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 14/55

Shared writes – cache ping­pongShared writes – cache ping­pong• Cache line Cache line 

passed between passed between cachescaches

• Hardware Hardware serialises writes serialises writes to the same lineto the same line

• Therefore zero Therefore zero scalability!!!scalability!!!

• For speed, don't For speed, don't pass ownership: pass ownership: Single Writer Single Writer PrinciplePrinciple

Page 15: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 15/55

Example – contention costsExample – contention costsstd::atomic<int> counter(0);

void count(){ for (auto i = 0; i != numLoops; ++i) counter++;}

// run with 4 threads on a 4-core machine// -O3 -march=native

$ time ./a.outreal 0m1.675suser 0m6.384ssys 0m0.007s

$ time taskset -c 1 ./a.outreal 0m0.476suser 0m0.470ssys 0m0.006s

3.5 times faster13.6 times less CPU

when run on one core

Page 16: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 16/55

Example – contention costs (cont'd)Example – contention costs (cont'd)$ perf stat -D 100 -e cycles,instructions,stalled-cycles-backend,cs ./a.out

Performance counter stats for './a.out':

16,635,079,957 cycles 247,668,647 instructions # 0.01 insns per cycle # 65.91 stalled cycles per insn 16,324,609,637 stalled-cycles-backend # 98.13% backend cycles idle 1,912 cs

1.569828247 seconds time elapsed

$ perf stat -D 100 -e cycles,instructions,stalled-cycles-backend,cs taskset -c 1 ./a.out

Performance counter stats for 'taskset -c 1 ./a.out':

1,135,801,575 cycles 197,626,046 instructions # 0.17 insns per cycle # 5.19 stalled cycles per insn 1,026,469,027 stalled-cycles-backend # 90.37% backend cycles idle 115 cs

0.480011707 seconds time elapsed

Page 17: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 17/55

Don't undersynchroniseDon't undersynchronise

• Shared variables need to be synchronised Shared variables need to be synchronised correctlycorrectly

– Do not rely on guessworkDo not rely on guesswork– Do not try and cheatDo not try and cheat– Do not rely on unspecified ordering or visibilityDo not rely on unspecified ordering or visibility

• Undersychronised variables are subject to data Undersychronised variables are subject to data races (at least one reader and one writer)races (at least one reader and one writer)

• Causes transient and unreproducible errorsCauses transient and unreproducible errors

use locks on shared mutable data structures or use single atomics as synchronisation points

Page 18: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 18/55

Don't oversynchroniseDon't oversynchronise

• Shared variables need to be synchronised Shared variables need to be synchronised correctlycorrectly

– Too much locking will make the code serialised Too much locking will make the code serialised – Locks are there to slow your program down until Locks are there to slow your program down until 

it is (hopefully) correctit is (hopefully) correct– Watch out for deadlock and livelockWatch out for deadlock and livelock– Performance reduces back to slower than a single Performance reduces back to slower than a single 

thread in the worst case because of locking thread in the worst case because of locking overhead (locks are shared writes)overhead (locks are shared writes)

– Amdahl's Law kicks inAmdahl's Law kicks in

don't keep adding locks – have a clear plan

Page 19: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 19/55

Amdahl's LawAmdahl's Law

• Serial code limits scale, regardless of the Serial code limits scale, regardless of the number of threads or cores availablenumber of threads or cores available

Serial portion of code Maximum speedup

1% 100x

5% 20x

10% 10x

20% 5x

25% 4x

avoid non-read-only data sharing to allow for maximum parallelism

Page 20: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 20/55

Deadlock and livelockDeadlock and livelock

• If you have more than one lock in your If you have more than one lock in your program you may end up in a deadlock (deadly program you may end up in a deadlock (deadly embrace)embrace)

– Have only one lock (may limit performance)Have only one lock (may limit performance)– Increases lock hold timeIncreases lock hold time

• Locking order is importantLocking order is important– C++11's std::lockC++11's std::lock– Use addresses of locks to guarantee orderingUse addresses of locks to guarantee ordering– Release order is not importantRelease order is not important– May require exposing internal locks to callersMay require exposing internal locks to callers

Page 21: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 21/55

Time­based synchronisationTime­based synchronisation

locks don't sequence start and finishuse futures to synchronise in time (A comes after B)

horizontal sync (locks)

time-based sync (futures)

fork/join model

(Amdahl)

Page 22: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 22/55

Hardware v. software threadsHardware v. software threads

• There are a limited number of hardware There are a limited number of hardware threads availablethreads available

– 1 per core1 per core– 2 per core for Intel hyperthreading2 per core for Intel hyperthreading

• If there are more s/w than h/w threads then If there are more s/w than h/w threads then they will have to take turns (oversubscription)they will have to take turns (oversubscription)

– Leads to context switchingLeads to context switching– Slow; 1000s of cycles to switchSlow; 1000s of cycles to switch

• Call to operating systemCall to operating system• SchedulingScheduling• Cold cache and TLBCold cache and TLB

one software thread per hardware thread

Page 23: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 23/55

Queue­based systemsQueue­based systems

• Systems that are based on queues can have Systems that are based on queues can have performance issues caused by:performance issues caused by:

– Context switching when queues are empty or fullContext switching when queues are empty or full– Voluntary context switching Voluntary context switching – Shared writes to queue (insert and remove)Shared writes to queue (insert and remove)– Processing per item is too smallProcessing per item is too small– Can be difficult to run in single­threaded modeCan be difficult to run in single­threaded mode– c.f. Disruptor patternc.f. Disruptor pattern

be careful with queues if performance is important

Page 24: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 24/55

Lock hold time and scopeLock hold time and scope

• The time that a lock is held for determines the The time that a lock is held for determines the amount of parallelismamount of parallelism

– Shorter hold times are betterShorter hold times are better– Shorter times may also indicate less shared stateShorter times may also indicate less shared state

• However, small lock scopes may not protect the However, small lock scopes may not protect the data across lock scopes adequatelydata across lock scopes adequately

– Need to consider business­level transactions and Need to consider business­level transactions and logical unit of workslogical unit of works

– Can lead to application­level errors because of Can lead to application­level errors because of concurrently changing dataconcurrently changing data

Page 25: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 25/55

TOCTOU and application errorsTOCTOU and application errors

• Time­of­check to time­of­use errors (TOCTOU) Time­of­check to time­of­use errors (TOCTOU) can lead to application errorscan lead to application errors

– Note:Note: these are not data races caused by  these are not data races caused by synchronisation errors (i.e. locking errors)synchronisation errors (i.e. locking errors)

– These are caused by concurrent modifications at the These are caused by concurrent modifications at the application levelapplication level

– Usually caused by inappropriate APIsUsually caused by inappropriate APIs

Sample conversationMe: Is there any ice cream left, please?Waiter: I'll check... yes there isMe: I'll have some pleaseWaiter: Oops, we've just run out

Page 26: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 26/55

TOCTOU and application errors (2)TOCTOU and application errors (2)

• Locking doesn't helpLocking doesn't help• Need a different APINeed a different API

–e.g. putIfAbsent()e.g. putIfAbsent()–popIfNotEmpty()popIfNotEmpty()

• Single batched Single batched operation with operation with expection andexpection andfailure notificationfailure notification

• Compare­and­set Compare­and­set (CAS) is a classic (CAS) is a classic approach (retries)approach (retries)

read

read

write

write

modify

modify

Page 27: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 27/55

VolatileVolatile

• Volatile in C and C++ is of no use for Volatile in C and C++ is of no use for multithreadingmultithreading

– In Java and C# it means “atomic”In Java and C# it means “atomic”• It disables caching in registersIt disables caching in registers• It forces memory accessesIt forces memory accesses• It doesn't ensure cross­thread visibiltyIt doesn't ensure cross­thread visibilty• It doesn't affect compiler or hardware It doesn't affect compiler or hardware 

reordering of operationsreordering of operations

do not use volatile variables except for memory-mapped device I/O

Page 28: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 28/55

Spin loops and pollingSpin loops and polling

• Spin loops are disastrous on single­threaded Spin loops are disastrous on single­threaded machinesmachines

–They just burn CPU cycles until the OS They just burn CPU cycles until the OS reschedules the threadreschedules the thread

• On a multi­thread machine they are of use On a multi­thread machine they are of use when they use less CPU than the overhead of a when they use less CPU than the overhead of a context switch (1000s of cycles)context switch (1000s of cycles)

• Polling with a sleep() uses little CPU but has Polling with a sleep() uses little CPU but has longer wakeup latency (avg 1/2 polling time)longer wakeup latency (avg 1/2 polling time)

avoid spin loops unless you have measured the latency-CPU tradeoff

Page 29: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 29/55

Blocking I/OBlocking I/O

• Programs can be CPU, memory, network or I/O Programs can be CPU, memory, network or I/O limitedlimited

• I/O limited programs that use blocking I/O I/O limited programs that use blocking I/O will often use too many threads to handle will often use too many threads to handle blocking calls for I/Oblocking calls for I/O

• This causes lots of context switchingThis causes lots of context switching• Investigate asynchronous approachesInvestigate asynchronous approaches

– libaio, non­blocking sockets, overlapped I/Olibaio, non­blocking sockets, overlapped I/O• Damage­limitation approaches such as I/O Damage­limitation approaches such as I/O 

thread poolsthread pools• Watch out for copying of data (zero copy)Watch out for copying of data (zero copy)

Page 30: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 30/55

Interrupting threads and shutdownInterrupting threads and shutdown

• Don't even think about trying to interrupt Don't even think about trying to interrupt another threadanother thread

• Plan a clean shutdown mechanism for your Plan a clean shutdown mechanism for your programprogram

• Often will involve a cooperative approachOften will involve a cooperative approach–Shared stop/start/state flagShared stop/start/state flag–Shutdown message in message­passing Shutdown message in message­passing 

applicationsapplications

Page 31: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 31/55

Thread priorities and schedulingThread priorities and scheduling

• If your application requires the use of thread If your application requires the use of thread priorities to operate correctly then it's almost priorities to operate correctly then it's almost certainly brokencertainly broken

• Beware of priority inversion and locking issuesBeware of priority inversion and locking issues• Thread scheduling is rarely the correct solutionThread scheduling is rarely the correct solution

–Probably implies locking issues and too Probably implies locking issues and too much contention; fix that firstmuch contention; fix that first

–Can be useful in limited circumstances to Can be useful in limited circumstances to provide run­to­completion semanticsprovide run­to­completion semantics

Page 32: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 32/55

DeletionDeletion• Be careful about deleting data in concurrent Be careful about deleting data in concurrent 

systems; another thread may still have a systems; another thread may still have a reference reference 

• Reference counting can help but counters must Reference counting can help but counters must be thread­safe (std::shared_ptr is, mostly...)be thread­safe (std::shared_ptr is, mostly...)

• Avoid concurrent deletions: Avoid concurrent deletions: tbb::concurrent_vector is append­onlytbb::concurrent_vector is append­only

• Separate the program into phases so that Separate the program into phases so that deletion is in a safe serial partdeletion is in a safe serial part

• Garbage collection is a big win hereGarbage collection is a big win here

Page 33: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 33/55

Multiple atomicsMultiple atomics• Can be used successfully individuallyCan be used successfully individually• The problem becomes more about transactional The problem becomes more about transactional 

correctnesscorrectness–Do the atomics make sense together?Do the atomics make sense together?–Race window between modifying bothRace window between modifying both–Initialisation orderInitialisation order–Visibility of updatesVisibility of updates

1) use atomic<Data *> instead of multiple atomics(even better, use atomic<const Data *>)

2) use std::call_once for initialisation

Page 34: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 34/55

Immutable data and safe publishingImmutable data and safe publishing

• Immutable data can be shared without lockingImmutable data can be shared without locking• Fewer errors and easy to understandFewer errors and easy to understand• Be careful about deletion; are there still Be careful about deletion; are there still 

references to the object?!references to the object?!• std::shared_ptr<> has atomic counters but not std::shared_ptr<> has atomic counters but not 

its body so it must be lockedits body so it must be locked• Helps with exception safety, transactions and Helps with exception safety, transactions and 

copy­on­write optimisationcopy­on­write optimisation

publish safely using std::atomic<const Data *>

Page 35: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 35/55

Error handlingError handling

• Propagating errors from one thread to another Propagating errors from one thread to another is trickyis tricky

• std::future<> catches exceptions thrown in the std::future<> catches exceptions thrown in the called thread and rethrows them in the calling called thread and rethrows them in the calling thread when f.get() is calledthread when f.get() is called

• Make sure you have a try/catch at the top­level Make sure you have a try/catch at the top­level of every thread you startof every thread you start

use std::future<> for time sequencing and easier error handling

Page 36: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 36/55

False sharingFalse sharing

cache line

thread 1 thread 2

• Separate threads can access separate variables on the Separate threads can access separate variables on the same cache line (often 64 bytes long)same cache line (often 64 bytes long)

• Writes by one thread invalidate the cache line for the Writes by one thread invalidate the cache line for the other thread, leading to “cache ping­pong”other thread, leading to “cache ping­pong”

• Major performance killer – effectively shared writesMajor performance killer – effectively shared writes

watch out for false sharing use padding to length of cache line

Page 37: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 37/55

Parallel algorithmsParallel algorithms

• Some libraries can run in parallel mode without Some libraries can run in parallel mode without you having to start any threads or do any you having to start any threads or do any synchronisationsynchronisation

• Gcc does this for STL algorithms if compiled Gcc does this for STL algorithms if compiled with ­D_GLIBCXX_PARALLEL and ­fopenmpwith ­D_GLIBCXX_PARALLEL and ­fopenmp

use “free” parallelism if available

Page 38: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 38/55

Read/write ratioRead/write ratio

• Different approaches are appropriate for read­Different approaches are appropriate for read­mostly or write­mostly access patternsmostly or write­mostly access patterns

• Also depends on lock hold timeAlso depends on lock hold time• Short hold, lots of writes => CAS, spin lock et alShort hold, lots of writes => CAS, spin lock et al• Short hold, mixed R/W => distributed mutexShort hold, mixed R/W => distributed mutex• Short (zero) hold => RCU­style lock freeShort (zero) hold => RCU­style lock free• Beware of reader/writer lock scalingBeware of reader/writer lock scaling

select an approach based on data-access patterns

Page 39: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 39/55

Fast/slow pathsFast/slow paths

• Know what operations need to be fastKnow what operations need to be fast– Frequent operationsFrequent operations

• Avoid locks on the fast pathAvoid locks on the fast path– Mutexes, I/O, memory allocation, etcMutexes, I/O, memory allocation, etc

• Push work to the slow pathPush work to the slow path– Maybe use a queue or a background threadMaybe use a queue or a background thread– Block slow path until there are no fast path usersBlock slow path until there are no fast path users– RCU, garbage collection, distributed read/write RCU, garbage collection, distributed read/write 

mutexmutex

know what needs to be fast

Page 40: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 40/55

read thread

read thread

read thread

read thread

Example – distributed R/W mutexExample – distributed R/W mutex

• Per-core mutex can be locked by only one read thread at a time so the mutex is uncontended and therefore fast; read mutex cache line is not shared across cores

• Write thread locks all mutexes to block readers; slow operation

mutex mutex mutex mutexper core

write thread

read thread

read thread

read thread

read thread

• Windows: GetCurrentProcessorNumber()• Linux: sched_getcpu()• See http://1024cores.net for more details

Page 41: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 41/55

Distributed R/W mutex read performanceDistributed R/W mutex read performancestruct alignas(64) PerCoreLock { std::mutex lock;};

PerCoreLock locks[numThreads];

void lockLoop(){ for (auto i = 0; i != numLoops; ++i) { auto core = sched_getcpu(); std::lock_guard<std::mutex> guard(locks[core].lock); }}

$ time ./a.outreal 0m0.537suser 0m2.019ssys 0m0.003s

$ time taskset -c 1 ./a.outreal 0m1.992suser 0m1.985ssys 0m0.005s

$ time ./a.outreal 0m4.204suser 0m10.514ssys 0m0.005s

$ time taskset -c 1 ./a.outreal 0m2.091suser 0m2.077ssys 0m0.004s

$ time ./a.outreal 0m10.097suser 0m11.862ssys 0m24.078s

$ time taskset -c 1 ./a.outreal 0m2.057suser 0m2.045ssys 0m0.003s

code as

shownone shared

lock – context switching

no alignas(64) – false sharing

Page 42: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 42/55

Memory model and orderingMemory model and ordering• We want our programs to run quicklyWe want our programs to run quickly• Modern hardware reorders instructions and Modern hardware reorders instructions and 

can run multiple instruction at oncecan run multiple instruction at once• Compilers can reorder instructions too (e.g. to Compilers can reorder instructions too (e.g. to 

cover possible cache misses, delay slots, etc)cover possible cache misses, delay slots, etc)• Some languages (Java, C++11) have defined a Some languages (Java, C++11) have defined a 

memory model to say what reordering means memory model to say what reordering means at the language levelat the language level

• Don't go there unless you can prove through Don't go there unless you can prove through measurement it's necessarymeasurement it's necessary

Page 43: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 43/55

Need for a memory modelNeed for a memory model

• Correctness now based on memory, not just codeCorrectness now based on memory, not just code• Need to control caching (register, L1, L2, etc)Need to control caching (register, L1, L2, etc)

// C++03 code // single thread

// C++03 code // single thread

// C++11 code// thread 1

// C++11 code// thread 1

compiler reordering

compiler reordering

compiler reordering

compiler reordering

hardware reordering

and caching

hardware reordering

and caching

// C++11 code// thread 2

// C++11 code// thread 2

memory

compiler reordering

compiler reordering

hardware reordering

and caching

hardware reordering

and caching

sequence points

memory

volatile reads and writes

“happens before”

“synchronizes with” and visibility

Page 44: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 44/55

Instruction interleavingInstruction interleaving

• ““Sequential consistency” means operations in Sequential consistency” means operations in separate threads are interleaved and that all threads separate threads are interleaved and that all threads see the same interleavingsee the same interleaving– Sequence is preserved within and across threadsSequence is preserved within and across threads

• This is the “natural” mental model for programmers This is the “natural” mental model for programmers to think of thread execution order and memoryto think of thread execution order and memory– It is also the C++11 default memory orderingIt is also the C++11 default memory ordering

// thread 1x = 1; // 1r1 = y; // 2

// thread 1x = 1; // 1r1 = y; // 2

// thread 2y = 1; // 3r2 = x; // 4

// thread 2y = 1; // 3r2 = x; // 4

// 6 possible sequentially consistent// execution orders

1234// x=1; r1=y; y=1; r2=x;1324// x=1; y=1; r1=y; r2=x;1342// x=1; y=1; r2=x; r1=y; 3412// y=1; r2=x; x=1; r1=y;3142// y=1; x=1; r2=x; r1=y;3124// y=1; x=1; r1=y; r2=x;

// 6 possible sequentially consistent// execution orders

1234// x=1; r1=y; y=1; r2=x;1324// x=1; y=1; r1=y; r2=x;1342// x=1; y=1; r2=x; r1=y; 3412// y=1; r2=x; x=1; r1=y;3142// y=1; x=1; r2=x; r1=y;3124// y=1; x=1; r1=y; r2=x;

interleave

Page 45: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 45/55

Instruction reorderingInstruction reordering

• In order to gain performance both the compiler and the In order to gain performance both the compiler and the hardware may reorder instructions hardware may reorder instructions – compiler may move loads earlier (to allow for cache misses)compiler may move loads earlier (to allow for cache misses)– hardware may not write back to memory immediately (store buffers)hardware may not write back to memory immediately (store buffers)

• x, y, r1 and r2 are all independent so code can be reorderedx, y, r1 and r2 are all independent so code can be reordered• Even worse, changes in one thread may not be visible in Even worse, changes in one thread may not be visible in 

another thread so results are not defined – another thread so results are not defined – data racedata race

// thread 1x = 1; // 1r1 = y; // 2

// thread 1x = 1; // 1r1 = y; // 2

// thread 2y = 1; // 3r2 = x; // 4

// thread 2y = 1; // 3r2 = x; // 4

// 4 factorial (== 24)// possible execution orders

1234// x=1; r1=y; y=1; r2=x;4321// r2=x; y=1; r1=y; x=1;3124// y=1; x=1; r1=y; r2=x;// etc...

// 4 factorial (== 24)// possible execution orders

1234// x=1; r1=y; y=1; r2=x;4321// r2=x; y=1; r1=y; x=1;3124// y=1; x=1; r1=y; r2=x;// etc...

could be executed in reverse order

reorder

Page 46: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd

Hardware memory reorderingHardware memory reordering

Alp

ha

Alp

ha

AR

Mv7

A

RM

v7

PA

-RIS

C

PA

-RIS

C

PO

WE

R

PO

WE

R

SP

AR

C R

MO

S

PA

RC

RM

O

SP

AR

C P

SO

S

PA

RC

PS

O

SP

AR

C T

SO

S

PA

RC

TS

O

x86

x86

x86

oost

ore

x86

oost

ore

AM

D64

A

MD

64

IA-6

4 IA

-64

zSer

ies

zSer

ies

Loads reordered after loads Loads reordered after loads Y Y Y Y Y Y Y Y Y Y Y Y Y Y Loads reordered after stores Loads reordered after stores Y Y Y Y Y Y Y Y Y Y Y Y Y Y Stores reordered after stores Stores reordered after stores Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Stores reordered after loads Stores reordered after loads Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Atomic reordered with loads Atomic reordered with loads Y Y Y Y Y Y Y Y Y Y Atomic reordered with stores Atomic reordered with stores Y Y Y Y Y Y Y Y Y Y Y Y Dependent loads reordered Dependent loads reordered Y Y Incoherent Instruction cache pipeline Incoherent Instruction cache pipeline Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y

• Hardware can reorder memory operations in different waysHardware can reorder memory operations in different ways– Can also depend on operating systemCan also depend on operating system

• Solaris on SPARC uses Total Store Order (TSO)Solaris on SPARC uses Total Store Order (TSO)• Linux on SPARC uses Relaxed Memory Order (RMO)Linux on SPARC uses Relaxed Memory Order (RMO)

http://en.wikipedia.org/wiki/Memory_ordering

Page 47: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 47/55

Synchronisation with seq. consistencySynchronisation with seq. consistency

• This code is correct when sequentially consistentThis code is correct when sequentially consistent– thread 2 doesn’t access x until it has been set by thread 1thread 2 doesn’t access x until it has been set by thread 1

• But in the presence of reordering it can failBut in the presence of reordering it can fail• The problem is that we haven’t specified that cross­The problem is that we haven’t specified that cross­

thread order or visibility is importantthread order or visibility is important• We need to use synchronisation variables – atomicsWe need to use synchronisation variables – atomics• Making everything atomic is slow – 30­60 cyclesMaking everything atomic is slow – 30­60 cycles

– cache synchronisation is slow and has limited bandwidthcache synchronisation is slow and has limited bandwidth

// thread 1x = 42;x_init = true;

// thread 1x = 42;x_init = true;

// thread 2while (! x_init) {}y = x;

// thread 2while (! x_init) {}y = x;

Page 48: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 48/55

Synchronisation with atomicsSynchronisation with atomics

• This now works without relying on having This now works without relying on having sequential consistency everywhere (just atomics)sequential consistency everywhere (just atomics)– atomics prevent the compiler moving code across atomics prevent the compiler moving code across 

accesses accesses – atomics also cause memory updates to be visibleatomics also cause memory updates to be visible

• Load and store uses sequential consistencyLoad and store uses sequential consistency– uses default parameter of std::memory_order_seq_cstuses default parameter of std::memory_order_seq_cst

// thread 1std::atomic<bool> x_init;int x;

x = 42;x_init.store(true);

// or x_init = true;

// thread 1std::atomic<bool> x_init;int x;

x = 42;x_init.store(true);

// or x_init = true;

// thread 2extern std::atomic<bool> x_init;extern int x;

while (! x_init.load());y = x;

// or while (! x_init);

// thread 2extern std::atomic<bool> x_init;extern int x;

while (! x_init.load());y = x;

// or while (! x_init);

Page 49: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 49/55

Low­level synchronisation detailLow­level synchronisation detail

• BlueBlue  fencesfences prevent the compiler reordering code prevent the compiler reordering code– they don’t generate any run­time codethey don’t generate any run­time code

• Red fencesRed fences force memory to make changes visible force memory to make changes visible– they do generate code: fence, lock prefix, CAS opcodesthey do generate code: fence, lock prefix, CAS opcodes– depends heavily on underlying hardware (c.f. depends heavily on underlying hardware (c.f. 

reordering)reordering)– only need one of the two red fences, usually on storeonly need one of the two red fences, usually on store

• One reason that threads can’t just be a libraryOne reason that threads can’t just be a library

// thread 1std::atomic<bool> x_init;

x = 42;// blue fencex_init.store(true);// red fence

// thread 1std::atomic<bool> x_init;

x = 42;// blue fencex_init.store(true);// red fence

// thread 2extern std::atomic<bool> x_init;

while (/*rfence*/! x_init.load()); // blue fencey = x;

// thread 2extern std::atomic<bool> x_init;

while (/*rfence*/! x_init.load()); // blue fencey = x;

rfence here in loop forces load of latest value

of x_init

prevents x and x_init being reordered

makes store to x_init visible

prevents reordering

Page 50: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 50/55

Generated assembler codeGenerated assembler code// thread 1x = 42;// blue fencex_init.store(true);// red fence

// thread 1x = 42;// blue fencex_init.store(true);// red fence

// thread 2

while (/*rfence*/ ! x_init.load());// blue fencey = x;

// thread 2

while (/*rfence*/ ! x_init.load());// blue fencey = x;

// with atomic bool x_init mov DWORD PTR x, 42 mov BYTE PTR x_init, 1 mfence

// with bool x_init mov DWORD PTR x, 42 mov BYTE PTR x_init, 1

// with atomic bool x_init mov DWORD PTR x, 42 mov BYTE PTR x_init, 1 mfence

// with bool x_init mov DWORD PTR x, 42 mov BYTE PTR x_init, 1

// with atomic bool x_initL25: movzx eax, BYTE PTR x_init test al, al je .L25 mov eax, DWORD PTR x mov DWORD PTR y, eax

// with bool x_init cmp BYTE PTR x_init, 0 jne .L3.L5: jmp .L5.L3: mov eax, DWORD PTR x mov DWORD PTR y, eax

// with atomic bool x_initL25: movzx eax, BYTE PTR x_init test al, al je .L25 mov eax, DWORD PTR x mov DWORD PTR y, eax

// with bool x_init cmp BYTE PTR x_init, 0 jne .L3.L5: jmp .L5.L3: mov eax, DWORD PTR x mov DWORD PTR y, eax

infinite loop because

visibility not specified

no fence needed on load

for X86

fence needed on store for

X86

sequential consistent by default

g++ 4.7output, x86

Page 51: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 51/55

Using memory order flagsUsing memory order flags// thread 1x = 42;// blue fencex_init.store(true, std::memory_order_release);

// thread 1x = 42;// blue fencex_init.store(true, std::memory_order_release);

// thread 2while (! x_init.load(std::memory_order_acquire));// blue fencey = x;

// thread 2while (! x_init.load(std::memory_order_acquire));// blue fencey = x;

// with bool x_init seq_cst mov DWORD PTR x, 42 mov BYTE PTR x_init, 1 mfence

// with bool x_init release mov DWORD PTR x, 42 mov BYTE PTR x_init, 1

// with bool x_init seq_cst mov DWORD PTR x, 42 mov BYTE PTR x_init, 1 mfence

// with bool x_init release mov DWORD PTR x, 42 mov BYTE PTR x_init, 1

// with bool atomic x_initL25: movzx eax, BYTE PTR x_init test al, al je .L25 mov eax, DWORD PTR x mov DWORD PTR y, eax

// same code for x_init acquire

// with bool atomic x_initL25: movzx eax, BYTE PTR x_init test al, al je .L25 mov eax, DWORD PTR x mov DWORD PTR y, eax

// same code for x_init acquireno fence for release store

no fence needed on load

for X86

needed on seq_cst store

• Release provides only the blue fence (no writes Release provides only the blue fence (no writes in this thread reordered after the store)in this thread reordered after the store)

• Controls compiler reordering but not hardwareControls compiler reordering but not hardware

acquire means no reads in this thread

reordered before here

Page 52: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 52/55

Memory model adviceMemory model advice

• This is a complex and subtle area and you This is a complex and subtle area and you should avoid using it unless you can prove should avoid using it unless you can prove that you can’t get adequate performance that you can’t get adequate performance without itwithout it– yes, really, I mean it....yes, really, I mean it....

• Even experts get confused by this stuff!Even experts get confused by this stuff!– did I mention you should avoid it....did I mention you should avoid it....

• If you do use it, use acquire on load and If you do use it, use acquire on load and release on storerelease on store– Anything else will be a source of subtle bugsAnything else will be a source of subtle bugs

Page 53: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 53/55

std::atomic<int> counter(0);

void count(){ for (auto i = 0; i != numLoops; ++i) counter.store(5); //counter.store(5, std::memory_order_seq_cst); //counter.store(5, std::memory_order_release);}

$ time ./a.outreal 0m2.039suser 0m5.932ssys 0m0.002s

$ time taskset -c 1 ./a.outreal 0m1.004suser 0m1.002ssys 0m0.001s

$ time ./a.outreal 0m2.118suser 0m6.496ssys 0m0.009s

$ time taskset -c 1 ./a.outreal 0m0.999suser 0m0.994ssys 0m0.004s

$ time ./a.outreal 0m0.097suser 0m0.225ssys 0m0.002s

$ time taskset -c 1 ./a.outreal 0m0.057suser 0m0.055ssys 0m0.002s

code as

shown m_o_release (no mfence)

m_o_seq_cst (default)

Memory model ­ exampleMemory model ­ example

Page 54: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 54/55

Concurrency spectrumConcurrency spectrum

• Invest your time in splitting up the problemInvest your time in splitting up the problem• You know your domainYou know your domain• Leave concurrency parts to othersLeave concurrency parts to others

processesmessaging

TBB/PPLatomics,futures

threadsmemoryordering

keep as far to the left as possible

shared memory

Page 55: Multithread dos and don'ts - ACCU Dos... · Multithreading dos and don'ts ACCU 2015, Bristol – June 2015 Hubert Matthews hubert@oxyware.com. ... –In Java and C# it means “atomic

Copyright © 2015 Oxyware Ltd 55/55

SummarySummary

• Multithreaded programming is trickyMultithreaded programming is tricky–   New skills and ideas and ways to get it wrongNew skills and ideas and ways to get it wrong

• Focus on partitioning the problemFocus on partitioning the problem– Determines data sharing, locking, work breakdown Determines data sharing, locking, work breakdown 

and schedulingand scheduling• Avoid shared mutable data where possibleAvoid shared mutable data where possible• Know your access patternsKnow your access patterns• Scale down as well as upScale down as well as up• Balance extremes of grain size, lock extent, etcBalance extremes of grain size, lock extent, etc• Don't try and be cleverDon't try and be clever