The Disruptor - A Beginner's Guide to Hardcore Concurrency

Trisha Gee & Michael Barker / LMAX

Sunday, 13 November 11

Why is Concurrency So Difficult?

Program Order:

int w = 10;
int x = 20;
int y = 30;
int z = 40;

int a = w + z;
int b = x * y;

Execution Order (maybe):

int x = 20;
int y = 30;
int b = x * y;

int w = 10;
int z = 40;
int a = w + z;

Why Should We Care About the Details?

static long foo = 0;

private static void increment() {
    for (long l = 0; l < 500000000L; l++) {
        foo++;
    }
}

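import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public static long foo = 0;
// Lock is an interface and cannot be instantiated; ReentrantLock is one implementation.
public static Lock lock = new ReentrantLock();

private static void increment() {
    for (long l = 0; l < 500000000L; l++) {
        lock.lock();
        try {
            foo++;
        } finally {
            lock.unlock();
        }
    }
}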

static AtomicLong foo = new AtomicLong(0);

private static void increment() {
    for (long l = 0; l < 500000000L; l++) {
        foo.getAndIncrement();
    }
}
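The three increment variants above differ in correctness as soon as a second thread joins in. The following is a minimal sketch, not the authors' benchmark: the class name IncrementDemo, the helper runOnTwoThreads, and the reduced iteration count are assumptions made for illustration. It runs each variant on two threads and reports the final totals.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; names are illustrative, not from the talk.
final class IncrementDemo {
    static long plain = 0;                               // unsynchronised counter
    static long locked = 0;                              // guarded by LOCK
    static final Lock LOCK = new ReentrantLock();
    static final AtomicLong atomic = new AtomicLong(0);

    /** Runs the same task on two threads and waits for both to finish. */
    static void runOnTwoThreads(Runnable task) {
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Returns {plain, locked, atomic} totals after two threads each add `iterations`. */
    static long[] run(long iterations) {
        plain = 0;
        locked = 0;
        atomic.set(0);
        // foo++ is a read-modify-write, so concurrent updates can be lost.
        runOnTwoThreads(() -> { for (long l = 0; l < iterations; l++) { plain++; } });
        runOnTwoThreads(() -> {
            for (long l = 0; l < iterations; l++) {
                LOCK.lock();
                try { locked++; } finally { LOCK.unlock(); }
            }
        });
        runOnTwoThreads(() -> { for (long l = 0; l < iterations; l++) { atomic.getAndIncrement(); } });
        return new long[] { plain, locked, atomic.get() };
    }

    public static void main(String[] args) {
        long[] totals = run(1_000_000L);
        System.out.println("plain  = " + totals[0] + " (can fall short of 2000000)");
        System.out.println("locked = " + totals[1]);
        System.out.println("atomic = " + totals[2]);
    }
}
```

The locked and atomic totals always equal twice the iteration count. The plain total usually does not, which is why the cheaper single-threaded code cannot simply be reused once there is contention.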

Cost of Contention

Increment a counter 500 000 000 times.

• One Thread : 300 ms
• One Thread (volatile) : 4 700 ms (15x)
• One Thread (Atomic) : 5 700 ms (19x)
• One Thread (Lock) : 10 000 ms (33x)
• Two Threads (Atomic) : 30 000 ms (100x)
• Two Threads (Lock) : 224 000 ms (746x), roughly 4 minutes!
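A rough way to reproduce the single-threaded rows is sketched below. It is a naive timing loop under stated assumptions, not the harness used for the numbers above: the class ContentionTiming and its method names are hypothetical, the iteration count is reduced, and there is no JIT warm-up, so absolute figures will differ by machine and JVM. A proper benchmark harness would give more trustworthy results.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical timing sketch; names are illustrative, not from the talk.
final class ContentionTiming {
    static long plain;
    static volatile long vol;
    static long guarded;                                 // guarded by lock
    static final AtomicLong atomic = new AtomicLong();
    static final Lock lock = new ReentrantLock();

    /** Runs body once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable body) {
        long start = System.nanoTime();
        body.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /** Returns elapsed millis for {plain, volatile, atomic, lock}, all on the calling thread. */
    static long[] singleThreaded(long n) {
        plain = 0;
        vol = 0;
        guarded = 0;
        atomic.set(0);
        long tPlain = timeMillis(() -> { for (long l = 0; l < n; l++) { plain++; } });
        long tVolatile = timeMillis(() -> { for (long l = 0; l < n; l++) { vol++; } });
        long tAtomic = timeMillis(() -> { for (long l = 0; l < n; l++) { atomic.getAndIncrement(); } });
        long tLock = timeMillis(() -> {
            for (long l = 0; l < n; l++) {
                lock.lock();
                try { guarded++; } finally { lock.unlock(); }
            }
        });
        return new long[] { tPlain, tVolatile, tAtomic, tLock };
    }

    public static void main(String[] args) {
        long[] t = singleThreaded(50_000_000L);
        System.out.printf("plain=%d ms, volatile=%d ms, atomic=%d ms, lock=%d ms%n",
                t[0], t[1], t[2], t[3]);
    }
}
```

The ordering of the four timings is the interesting part; the ratios quoted in the list above depend on the hardware the authors measured on.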

Parallel v. Serial - String Split

Guy Steele @ Strange Loop:

http://www.infoq.com/presentations/Thinking-Parallel-Programming

Scala Implementation and Brute Force version in Java:

https://github.com/mikeb01/folklore/

[Chart: String Split (ops/sec), higher is better. Axis 0 to 2000. Series: Parallel (Scala), Serial (Java).]

CPUs Are Getting Faster

Ya Rly!

[Chart: String Split. Axis 0 to 3000. Series: P8600 (Core 2 Duo), E5620 (Nehalem EP), i7 2667M (Sandy Bridge ULV), i7 2720QM (Sandy Bridge).]

What Problem Were We Trying To Solve?

Why Queues Suck - Array Backed

Why Queues Suck - Linked List

Contention Free Design

How Fast Is It - Throughput

[Chart: throughput. Axis 0 to 30,000,000. Configurations: Unicast, Diamond. Series: ABQ, Disruptor.]

How Fast Is It - Latency

                   ABQ          Disruptor
Min                145          29
Mean               32,757       52
99 Percentile      2,097,152    128
99.99 Percentile   4,194,304    8,192
Max                5,069,086    175,567

How Does It Work?

Ordering and Visibility

private static final int SIZE = 32;
private final Object[] data = new Object[SIZE];
private volatile long sequence = -1;
private long nextValue = -1;

public void publish(Object value) {
    long index = ++nextValue;
    data[(int)(index % SIZE)] = value;
    sequence = index;
}

public Object get(long index) {
    if (index <= sequence) {
        return data[(int)(index % SIZE)];
    }
    return null;
}
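To see the ordering and visibility guarantee at work, the fragment above can be wrapped in a class and driven by one producer thread and one consumer. This is a sketch under assumptions: the class name RingBufferSketch and the consumeAll helper are hypothetical, and the demo publishes at most SIZE items because the fragment has no protection against the producer wrapping around and overwriting unread slots.

```java
// Hypothetical wrapper around the slide's fragment; names are illustrative.
final class RingBufferSketch {
    private static final int SIZE = 32;
    private final Object[] data = new Object[SIZE];
    private volatile long sequence = -1;   // last published index
    private long nextValue = -1;           // used by the single producer only

    public void publish(Object value) {
        long index = ++nextValue;
        data[(int) (index % SIZE)] = value; // 1. plain store of the payload
        sequence = index;                   // 2. volatile store makes it visible
    }

    public Object get(long index) {
        if (index <= sequence) {            // volatile load pairs with the store above
            return data[(int) (index % SIZE)];
        }
        return null;
    }

    /** One producer publishes 0..count-1; the caller consumes them and returns their sum. */
    static long consumeAll(int count) {
        if (count > SIZE) {
            throw new IllegalArgumentException("no wrap protection: count must be <= " + SIZE);
        }
        RingBufferSketch buffer = new RingBufferSketch();
        Thread producer = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                buffer.publish(i);
            }
        });
        producer.start();
        long sum = 0;
        for (long index = 0; index < count; index++) {
            Object value;
            while ((value = buffer.get(index)) == null) {
                Thread.yield();             // spin until the slot has been published
            }
            sum += (Integer) value;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(consumeAll(16)); // prints 120
    }
}
```

Because the payload store comes before the volatile store to sequence, a consumer that observes the new sequence value is guaranteed to observe the payload as well.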

Ordering and Visibility - Store

mov    $0x1,%ecx
add    0x18(%rsi),%rcx        ;*ladd
                              ;...
lea    (%r12,%r8,8),%r11      ;*getfield data
                              ;...
mov    %r12b,(%r11,%r10,1)
mov    %rcx,0x10(%rsi)
lock addl $0x0,(%rsp)         ;*ladd

Ordering and Visibility - Load

mov    %eax,-0x6000(%rsp)
push   %rbp
sub    $0x20,%rsp             ;*synchronization entry
                              ; - RingBuffer::get@-1
mov    0x10(%rsi),%r10        ;*getfield sequence
                              ; - RingBuffer::get@2
cmp    %r10,%rdx
jl     0x00007ff92505f22d     ;*iflt
                              ; - RingBuffer::get@6
mov    %edx,%r11d             ;*l2i
                              ; - RingBuffer::get@14

Look Ma, No Memory Barrier

AtomicLong sequence = new AtomicLong(-1);

public void publish(Object value) {
    long index = ++nextValue;
    data[(int)(index % SIZE)] = value;
    sequence.lazySet(index);
}
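A self-contained version of the lazySet variant might look like the following. It is a sketch, assuming a single producer: the class name LazySetRingBuffer and the inOrder check are hypothetical, and as before the demo stays within SIZE items because nothing stops the producer from wrapping. AtomicLong.lazySet performs an ordered store, so the payload write cannot be reordered after it, but it omits the full fence that the volatile write in the earlier version pays for.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper around the slide's fragment; names are illustrative.
final class LazySetRingBuffer {
    private static final int SIZE = 32;
    private final Object[] data = new Object[SIZE];
    private final AtomicLong sequence = new AtomicLong(-1);
    private long nextValue = -1;             // used by the single producer only

    public void publish(Object value) {
        long index = ++nextValue;
        data[(int) (index % SIZE)] = value;
        sequence.lazySet(index);             // ordered store: stays after the payload write
    }

    public Object get(long index) {
        return index <= sequence.get() ? data[(int) (index % SIZE)] : null;
    }

    /** Returns true if the consumer saw every published value, in publication order. */
    static boolean inOrder(int count) {
        if (count > SIZE) {
            throw new IllegalArgumentException("no wrap protection: count must be <= " + SIZE);
        }
        LazySetRingBuffer buffer = new LazySetRingBuffer();
        new Thread(() -> {
            for (int i = 0; i < count; i++) {
                buffer.publish("event-" + i);
            }
        }).start();
        for (int i = 0; i < count; i++) {
            Object value;
            while ((value = buffer.get(i)) == null) {
                Thread.yield();              // the new sequence becomes visible shortly after lazySet
            }
            if (!value.equals("event-" + i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(inOrder(32));
    }
}
```

The trade-off is that the consumer may see the new sequence slightly later than with a volatile write, which is acceptable when only one thread ever writes the sequence.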

False Sharing - Hidden Contention

Cache Line Padding

public class PaddedAtomicLong extends AtomicLong {

    public volatile long p1, p2, p3, p4, p5, p6 = 7L;

    //... lines omitted

    public long sumPaddingToPreventOptimisation() {
        return p1 + p2 + p3 + p4 + p5 + p6;
    }
}
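One way to observe the effect of the padding is to let two threads hammer two independent counters, first unpadded and then padded. The sketch below makes several assumptions: the class FalseSharingSketch and its run method are hypothetical, the cache line is taken to be 64 bytes, and whether the two unpadded counters actually land on the same line depends on how the JVM allocates and lays out objects, so the timing gap may or may not show up on a given machine.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo; names are illustrative, not from the talk.
final class FalseSharingSketch {
    /** Mirrors the slide: extra long fields pad the counter out toward a cache line. */
    static class PaddedAtomicLong extends AtomicLong {
        public volatile long p1, p2, p3, p4, p5, p6 = 7L;

        public long sumPaddingToPreventOptimisation() {
            return p1 + p2 + p3 + p4 + p5 + p6;
        }
    }

    /** Two threads each increment their own counter; returns {elapsedMillis, counterA, counterB}. */
    static long[] run(boolean padded, long iterations) {
        AtomicLong a = padded ? new PaddedAtomicLong() : new AtomicLong();
        AtomicLong b = padded ? new PaddedAtomicLong() : new AtomicLong();
        Thread t1 = new Thread(() -> { for (long i = 0; i < iterations; i++) { a.incrementAndGet(); } });
        Thread t2 = new Thread(() -> { for (long i = 0; i < iterations; i++) { b.incrementAndGet(); } });
        long start = System.nanoTime();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        return new long[] { elapsedMillis, a.get(), b.get() };
    }

    public static void main(String[] args) {
        System.out.println("unpadded: " + run(false, 50_000_000L)[0] + " ms");
        System.out.println("padded:   " + run(true, 50_000_000L)[0] + " ms");
    }
}
```

Neither thread touches the other's counter, so any slowdown in the unpadded run comes purely from the two counters sharing a cache line, which is the hidden contention named above.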

Summary

• Concurrency is a tool
• Ordering and visibility are the key challenges
• For performance the details matter
• Don't believe everything you read
  o Come up with your own theories and test them!

Q & A

recruitment@lmax.com
