Trisha Gee
Presented to the London Java Community on the 11th October 2011.
Understanding the Disruptor
A Beginner's Guide to Hardcore Concurrency
Why is concurrency so difficult?
Ordering
Program Order:
int w = 10;
int x = 20;
int y = 30;
int z = 40;

int a = w + z;
int b = x * y;
Execution Order (maybe):
int x = 20;
int y = 30;
int b = x * y;

int w = 10;
int z = 40;
int a = w + z;
Visibility
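The slides name visibility but do not show what a visibility failure looks like. A minimal sketch (the class and field names here are illustrative, not from the talk): without `volatile`, a change made by one thread may never become visible to another, because the stale value can sit in a register or core-local cache.

```java
import java.util.concurrent.TimeUnit;

// Illustrative visibility sketch. `running` must be volatile so the write
// made by the main thread is guaranteed to become visible to the worker.
// With a plain boolean field the JIT may hoist the read out of the loop
// and the worker can spin forever.
public class VisibilityDemo {
    static volatile boolean running = true;
    static long iterations = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (running) {   // volatile read: always sees the latest write
                iterations++;
            }
        });
        worker.start();
        TimeUnit.MILLISECONDS.sleep(50);
        running = false;        // volatile write: published to the worker
        worker.join();          // terminates only because `running` is volatile
        System.out.println("worker stopped after " + iterations + " iterations");
    }
}
```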
Why should we care about the details?
Increment a Counter
static long foo = 0;

private static void increment() {
    for (long l = 0; l < 500000000L; l++) {
        foo++;
    }
}
Using a Lock
public static long foo = 0;
public static Lock lock = new ReentrantLock();

private static void increment() {
    for (long l = 0; l < 500000000L; l++) {
        lock.lock();
        try {
            foo++;
        } finally {
            lock.unlock();
        }
    }
}
Using an AtomicLong
static AtomicLong foo = new AtomicLong(0);

private static void increment() {
    for (long l = 0; l < 500000000L; l++) {
        foo.getAndIncrement();
    }
}
The Cost of Contention

Increment a counter 500 000 000 times.

● One Thread            :     300 ms
● One Thread (volatile) :   4 700 ms  (15x)
● One Thread (Atomic)   :   5 700 ms  (19x)
● One Thread (Lock)     :  10 000 ms  (33x)
● Two Threads (Atomic)  :  30 000 ms  (100x)
● Two Threads (Lock)    : 224 000 ms  (746x)
                          ^^^^^^^^^^ ~4 minutes!
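The talk does not show the harness behind these numbers; a hedged sketch of that kind of measurement (the class name, the reduced `ITERATIONS` count, and the choice to compare only the plain and volatile cases are all assumptions made here so the sketch runs quickly):

```java
// Illustrative micro-benchmark harness in the spirit of the talk's numbers.
// The real measurements used 500 000 000 increments; a smaller count is used
// here so the sketch finishes in milliseconds. Naive timing like this is
// sensitive to JIT warm-up, so treat the printed numbers as a rough guide only.
public class CounterBenchmark {
    static final long ITERATIONS = 5_000_000L;
    static long plain = 0;
    static volatile long vol = 0;   // volatile write on every increment

    private static void incrementPlain() {
        for (long l = 0; l < ITERATIONS; l++) { plain++; }
    }

    private static void incrementVolatile() {
        // Single-threaded here, so the non-atomic vol++ is still correct.
        for (long l = 0; l < ITERATIONS; l++) { vol++; }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        incrementPlain();
        long plainNs = System.nanoTime() - start;

        start = System.nanoTime();
        incrementVolatile();
        long volatileNs = System.nanoTime() - start;

        System.out.println("plain   : " + plainNs / 1_000_000 + " ms");
        System.out.println("volatile: " + volatileNs / 1_000_000 + " ms");
    }
}
```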
Parallel v. Serial - String Splitting
Guy Steele @ Strange Loop:
http://www.infoq.com/presentations/Thinking-Parallel-Programming
Scala Implementation and Brute Force version in Java:
https://github.com/mikeb01/folklore/
Performance Test
Parallel (Scala): 440 ops/sec
Serial (Java)   : 1768 ops/sec
CPUs Are Getting Faster
Single threaded string split on different CPUs
What problem were we trying to solve?
Classic Approach to the Problem
The Problems We Found
Why Queues Suck
Why Queues Suck - Linked List
Contention Free Design
Now our Pipeline Looks Like...
How Fast Is It - Throughput
How Fast Is It - Latency
                                      ABQ   Disruptor
Min Latency (ns)                      145          29
Mean Latency (ns)                  32 757          52
99 Percentile Latency (ns)      2 097 152         128
99.99 Percentile Latency (ns)   4 194 304       8 192
Max Latency (ns)                5 069 086     175 567
How does it all work?
Ordering and Visibility
private static final int SIZE = 32;

private final Object[] data = new Object[SIZE];
private volatile long sequence = -1;
private long nextValue = -1;

public void publish(Object value) {
    long index = ++nextValue;
    data[(int) (index % SIZE)] = value;
    sequence = index;
}

public Object get(long index) {
    if (index <= sequence) {
        return data[(int) (index % SIZE)];
    }
    return null;
}
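A minimal single-writer driver for the snippet above (the wrapping class name `RingBuffer` is taken from the disassembly comments on a later slide; the `main` method is added here for illustration): `publish` makes a slot visible via the volatile store to `sequence`, and `get` returns `null` for any slot not yet published.

```java
// The slide's ring buffer, wrapped in a class so it can be exercised.
// Single writer only: `nextValue` is deliberately not thread-safe; the
// volatile write to `sequence` is what publishes each slot to readers.
public class RingBuffer {
    private static final int SIZE = 32;

    private final Object[] data = new Object[SIZE];
    private volatile long sequence = -1;
    private long nextValue = -1;

    public void publish(Object value) {
        long index = ++nextValue;
        data[(int) (index % SIZE)] = value;
        sequence = index;           // volatile store: everything before it is visible
    }

    public Object get(long index) {
        if (index <= sequence) {    // volatile load: acquires the published slots
            return data[(int) (index % SIZE)];
        }
        return null;                // not published yet
    }

    public static void main(String[] args) {
        RingBuffer buffer = new RingBuffer();
        buffer.publish("a");
        buffer.publish("b");
        System.out.println(buffer.get(0));  // a
        System.out.println(buffer.get(1));  // b
        System.out.println(buffer.get(2));  // null - index 2 not yet published
    }
}
```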
Ordering and Visibility - Store
mov  $0x1,%ecx
add  0x18(%rsi),%rcx      ;*ladd
;...
lea  (%r12,%r8,8),%r11    ;*getfield data
;...
mov  %r12b,(%r11,%r10,1)
mov  %rcx,0x10(%rsi)
lock addl $0x0,(%rsp)     ;*ladd
Ordering and Visibility - Load
mov  %eax,-0x6000(%rsp)
push %rbp
sub  $0x20,%rsp           ;*synchronization entry
                          ; - RingBuffer::get@-1 (line 17)
mov  0x10(%rsi),%r10      ;*getfield sequence
                          ; - RingBuffer::get@2 (line 17)
cmp  %r10,%rdx
jl   0x00007ff92505f22d   ;*iflt
                          ; - RingBuffer::get@6 (line 17)
mov  %edx,%r11d           ;*l2i
                          ; - RingBuffer::get@14 (line 19)
Look Ma, No Memory Barrier
AtomicLong sequence = new AtomicLong(-1);

public void publish(Object value) {
    long index = ++nextValue;
    data[(int) (index % SIZE)] = value;
    sequence.lazySet(index);
}
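`AtomicLong.lazySet` (available since Java 6) performs an ordered store without the full fence a volatile write needs: prior stores cannot be reordered past it, but the write only becomes visible to other threads "eventually". The writing thread still sees its own store immediately, which a quick sketch can show:

```java
import java.util.concurrent.atomic.AtomicLong;

// lazySet gives store-store ordering without the expensive StoreLoad barrier
// of a plain volatile write - a good fit for a single-writer sequence counter.
public class LazySetDemo {
    public static void main(String[] args) {
        AtomicLong sequence = new AtomicLong(-1);
        sequence.lazySet(5);
        // Program order still holds within the writing thread:
        System.out.println(sequence.get());  // 5
    }
}
```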
False Sharing - Hidden Contention
Cache Line Padding
public class PaddedAtomicLong extends AtomicLong {

    public volatile long p1, p2, p3, p4, p5, p6 = 7L;

    //... lines omitted

    public long sumPaddingToPreventOptimisation() {
        return p1 + p2 + p3 + p4 + p5 + p6;
    }
}
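A hedged sketch of the padded counter in use (the constructor and the driver class are assumptions filled in here; the slide omits them): two threads each hammer their own counter, and the six extra longs push the neighbouring counter onto a different cache line, so the writes stop invalidating each other.

```java
import java.util.concurrent.atomic.AtomicLong;

// The slide's padded counter. The padding fields widen each instance past a
// typical 64-byte cache line so two adjacent counters cannot falsely share one.
class PaddedAtomicLong extends AtomicLong {
    public volatile long p1, p2, p3, p4, p5, p6 = 7L;

    public PaddedAtomicLong(long initialValue) {
        super(initialValue);
    }

    public long sumPaddingToPreventOptimisation() {
        return p1 + p2 + p3 + p4 + p5 + p6;
    }
}

public class FalseSharingDemo {
    static final long ITERATIONS = 1_000_000L;
    static final PaddedAtomicLong a = new PaddedAtomicLong(0);
    static final PaddedAtomicLong b = new PaddedAtomicLong(0);

    public static void main(String[] args) throws InterruptedException {
        // Each thread increments only its own counter; padding keeps the two
        // counters from contending on a shared cache line.
        Thread t1 = new Thread(() -> {
            for (long i = 0; i < ITERATIONS; i++) { a.incrementAndGet(); }
        });
        Thread t2 = new Thread(() -> {
            for (long i = 0; i < ITERATIONS; i++) { b.incrementAndGet(); }
        });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(a.get() + " " + b.get());  // 1000000 1000000
    }
}
```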
In Summary
● Concurrency is a tool
● Ordering and visibility are the key challenges
● For performance, the details matter
● Don't believe everything you read
  ○ Come up with your own theories and test them!
Q & A