SignalFx
Getting to 20x Performance Improvement on our Data Routing Layer
Rajiv Kurian, Software [email protected]
Agenda
1. Introduction
2. Properties of modern memory systems
3. Evolution of our data router
4. Results
5. Q&A (hopefully)
What does SignalFx do?
SignalFx is an advanced monitoring platform for modern applications.
• High resolution: any mix of resolutions, down to 1 second
• Streaming analytics:
  • Custom analytics pipelines at any scale
  • Streaming dashboards that update within seconds
• Multidimensional metrics:
  • Add dimensions to model metrics however you like
  • Use them to aggregate and filter interactively on streaming data (e.g. 99th-percentile-of-latency-by-service-by-customer)
What is the data routing layer?
[Figure: SignalFx data router. Raw data flows in from PUBLISHER0, PUBLISHER1 and PUBLISHER2; processed data flows out to SUBSCRIBER 0, SUBSCRIBER 1 and SUBSCRIBER 2. Each message carries a Time Series ID (e.g. 1212450) and a payload (e.g. 0b1000100010).]
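SignalFx data router - subscribers
[Figure: subscribers send subscriptions to the router, each pairing a Subscriber ID (e.g. 1224525566) with a Time Series ID (e.g. 1212450). The router keeps this routing data in a routing table that maps each key (e.g. 128759) to a Set<Subscriber>, and uses it to decide which subscribers receive the data coming from PUBLISHER0, PUBLISHER1 and PUBLISHER2.]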
Properties of modern memory systems
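[Figure: memory hierarchy. CORE 1 and CORE 2 each have a private L1 D (data) cache, L1 I (instruction) cache and L2 cache, and share an L3 cache in front of main memory. Data is copied between the levels in fixed-size cache lines.]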
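• The memory subsystem makes a few bets to help us:
  • Temporal locality: data used recently is likely to be used again soon, so it is kept in cache
  • Spatial locality: data next to recently used data is likely to be used soon, so memory is moved in whole cache lines
  • Prefetching: when the access pattern is predictable (e.g. sequential), the hardware loads upcoming cache lines before they are requested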
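A minimal Java sketch of what spatial locality buys, assuming 64-byte cache lines (8 longs per line, typical on x86-64 but an assumption here). Both methods sum exactly the same elements of a long[]; the sequential walk uses every long of a line once it is loaded, while the strided walk touches one long per line per pass. The class and method names are hypothetical and not from the talk; timing them on a large array (for example with JMH) is how to see the difference on a given machine, and no numbers are claimed here.

```java
// Illustrative sketch: two ways to sum the same long[] elements.
final class LocalityDemo {
    // Assumption: 64-byte cache lines, i.e. 8 longs per line.
    static final int LONGS_PER_LINE = 8;

    // Front-to-back walk: every long in a loaded cache line is used before
    // moving on, and the pattern is easy for the prefetcher to follow.
    static long sumSequential(long[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Same elements, visited in LONGS_PER_LINE passes that each touch one long
    // per cache line. For arrays larger than the caches, a line is usually
    // evicted before the next pass needs it, so it may be fetched up to 8 times.
    static long sumStrided(long[] data) {
        long sum = 0;
        for (int offset = 0; offset < LONGS_PER_LINE; offset++) {
            for (int i = offset; i < data.length; i += LONGS_PER_LINE) {
                sum += data[i];
            }
        }
        return sum;
    }
}
```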
[Figure: cache lines 1 and 2 travelling from main memory through L3 and L2 into the L1 caches of CORE 1 and CORE 2; a follow-up figure shows main-memory lines 1 to 8 read in order, compared with a scattered access order (1, 4, 3, 6, 8, 7, 5).]
[Figures: a core reading from its L1 cache, from its L2 cache, and from main memory.]
The evolution of our data routing layer
Routing table v1
HashMap<Long, HashSet<Subscriber>>

Data Key     Set<Subscriber>
1212450      {1228, 4412}
3989         {12244}
8921224      {3244}
245819       {3244, 12244, 1228}

Subscriber objects:
Subscriber ID   Host   Port
1228            ….     ….
12244           ….     ….
4412            ….     ….
3244            ….     ….
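A minimal Java sketch of routing table v1, assuming a Subscriber value type with the ID, host and port fields from the slide. The record, the class name and the method names are hypothetical, not SignalFx's code; the comments point at where the boxing and pointer chasing criticised on the next slides come from.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of routing table v1: HashMap<Long, HashSet<Subscriber>>.
final class RoutingTableV1 {
    // Hypothetical stand-in for the subscriber objects on the slide.
    record Subscriber(long id, String host, int port) {}

    private final Map<Long, HashSet<Subscriber>> table = new HashMap<>();

    // The primitive key is autoboxed into a Long (values outside the small
    // Long cache allocate a new object), and every key gets its own HashSet,
    // which is itself backed by another HashMap.
    void subscribe(long timeSeriesId, Subscriber subscriber) {
        table.computeIfAbsent(timeSeriesId, k -> new HashSet<>()).add(subscriber);
    }

    void unsubscribe(long timeSeriesId, Subscriber subscriber) {
        HashSet<Subscriber> subs = table.get(timeSeriesId);
        if (subs != null && subs.remove(subscriber) && subs.isEmpty()) {
            table.remove(timeSeriesId);
        }
    }

    // O(1) on average, but a lookup follows several pointers: table array ->
    // entry -> boxed Long key -> HashSet -> its table -> entries -> Subscriber.
    Set<Subscriber> getSubscribers(long timeSeriesId) {
        HashSet<Subscriber> subs = table.get(timeSeriesId);
        return subs == null ? Collections.emptySet() : Collections.unmodifiableSet(subs);
    }
}
```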
But …
We want to be able to support millions of subscriptions per publisher, while doing more than 2 million queries per second
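[Figure: memory layout of HashMap<Long, HashSet<Subscriber>>. The table array points to entries holding key* and value* pointers, with collision Lists; each key* points to a boxed long and each value* to a separate Set<Subscriber>. The steps numbered 1 to 4, followed by "????", trace the pointers a single lookup has to follow.]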
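So why did we need a better data router?
• Lookups are O(1), but…
• Cache misses: each lookup follows a chain of pointers to scattered objects
• High memory overhead: boxed keys, entry objects and a separate HashSet for every key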
Routing table v2 - bloom filters
A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set.
False positive matches are possible, but false negatives are not; thus a Bloom filter has a 100% recall rate.
Routing table v2 - write
Subscriber bloom filter, initially all zeros:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Key 127829 → Hash 1 = 3, Hash 2 = 9, Hash 3 = 12, so bits 3, 9 and 12 are set:
0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0
Routing table v2 - read hit
Subscriber bloom filter:
0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0
Key 127829 → Hash 1 = 3, Hash 2 = 9, Hash 3 = 12: all three bits are set, so the key is reported as present.
Routing table v2 - read miss
Subscriber bloom filter:
0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0
Key 120422 → Hash 1 = 3, Hash 2 = 9, Hash 3 = 14: bit 14 is 0, so the key is definitely not in the set.
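A minimal Java sketch of the subscriber Bloom filter from the write, read-hit and read-miss slides, with the bit array packed into a long[] and three hash functions. The slides do not say which hashes were used: this sketch derives the three bit positions by double hashing (h1 + i·h2) over the MurmurHash3 fmix64 mixer, which is a common technique and an assumption here. The class and method names are hypothetical.

```java
// Sketch of a Bloom filter over 64-bit keys with bits packed into a long[].
final class BloomFilter {
    private static final int NUM_HASHES = 3; // three hashes, as on the slides
    private final long[] bits;
    private final int numBits;

    BloomFilter(int numBits) {
        if (numBits <= 0) throw new IllegalArgumentException("numBits must be positive");
        this.numBits = numBits;
        this.bits = new long[(numBits + 63) >>> 6]; // 64 bits per long
    }

    // Assumed hash: the 64-bit finalizer (fmix64) from MurmurHash3.
    private static long mix(long x) {
        x ^= x >>> 33;
        x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33;
        x *= 0xc4ceb9fe1a85ec53L;
        x ^= x >>> 33;
        return x;
    }

    // i-th bit position by double hashing: (h1 + i * h2) mod numBits.
    private int bitIndex(long h, int i) {
        int h1 = (int) h;
        int h2 = (int) (h >>> 32) | 1; // odd step so probes differ when numBits is a power of two
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Write: set the bit selected by each hash.
    void add(long key) {
        long h = mix(key);
        for (int i = 0; i < NUM_HASHES; i++) {
            int bit = bitIndex(h, i);
            bits[bit >>> 6] |= 1L << bit; // Java masks long shift counts to 6 bits
        }
    }

    // Read: any clear bit means "definitely absent"; all set means "probably present".
    boolean mightContain(long key) {
        long h = mix(key);
        for (int i = 0; i < NUM_HASHES; i++) {
            int bit = bitIndex(h, i);
            if ((bits[bit >>> 6] & (1L << bit)) == 0) return false;
        }
        return true;
    }
}
```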
[Figure: a Bloom filter stored as an array of longs (long 0 to long 39). In a typical Bloom filter get, the key's three hashes (43, 168, 312) select bits in three different longs; the figure counts one cache miss per hash (1, 2, 3).]
[Figure: routing table v2 keeps one Bloom filter per subscriber (Bloom Filter 1 to 6 in the figure). Every lookup probes the same three bit positions in each subscriber's filter: num_sub * 3 cache misses.]
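A minimal Java sketch of routing table v2 as the figure describes it: one Bloom filter per subscriber, probed for every lookup. Subscriber indices 0 to n-1, the filter size and the hashing scheme (fmix64 plus double hashing) are assumptions, and the class name is hypothetical; the point is the loop in getSubscribers, which makes the cost grow with the number of subscribers.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of routing table v2: one Bloom filter per subscriber, 3 hashes each.
final class RoutingTableV2 {
    private static final int NUM_HASHES = 3;
    private final int bitsPerFilter;
    private final long[][] filters; // filters[s] holds the bits of subscriber s

    RoutingTableV2(int numSubscribers, int bitsPerFilter) {
        this.bitsPerFilter = bitsPerFilter;
        this.filters = new long[numSubscribers][(bitsPerFilter + 63) >>> 6];
    }

    // Assumed hash: MurmurHash3 fmix64.
    private static long mix(long x) {
        x ^= x >>> 33;
        x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33;
        x *= 0xc4ceb9fe1a85ec53L;
        return x ^ (x >>> 33);
    }

    private int bitIndex(long h, int i) {
        return Math.floorMod((int) h + i * ((int) (h >>> 32) | 1), bitsPerFilter);
    }

    // Subscribe = add the key to that subscriber's filter. A plain Bloom
    // filter cannot delete keys, so there is no unsubscribe here.
    void subscribe(int subscriber, long key) {
        long h = mix(key);
        long[] f = filters[subscriber];
        for (int i = 0; i < NUM_HASHES; i++) {
            int bit = bitIndex(h, i);
            f[bit >>> 6] |= 1L << bit;
        }
    }

    // O(num_subscribers): every filter is probed, up to 3 scattered reads each.
    // The result may contain false positives but never misses a subscriber.
    List<Integer> getSubscribers(long key) {
        long h = mix(key);
        List<Integer> result = new ArrayList<>();
        for (int s = 0; s < filters.length; s++) {
            long[] f = filters[s];
            boolean allSet = true;
            for (int i = 0; i < NUM_HASHES && allSet; i++) {
                int bit = bitIndex(h, i);
                allSet = (f[bit >>> 6] & (1L << bit)) != 0;
            }
            if (allSet) result.add(s);
        }
        return result;
    }
}
```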
Progress so far
Approach           GetSubscribers()      Memory
Naive hash map     O(1)                  high
Bloom filter       O(num_subscribers)    low
So why did we need a better data router?
• CPU intensive: what did the profiler say? The data router accounted for 32%.
• Scaled poorly: CPU cost grew with the number of subscribers.
So how can we do better?
Specialize: we have a limited number of subscribers present at any time, fewer than 128.
ID transformation
[Figure: sparse subscriber IDs (1228, 4412, …, 12244, 3244) are transformed into dense subscriber IDs 0 to 127 through subscriber coordination and publisher assignment.]
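A minimal Java sketch of the ID transformation, mapping sparse subscriber IDs to dense slots 0 to 127. The slides attribute the mapping to subscriber coordination and publisher assignment without detailing either; this single-process allocator that hands out the lowest free slot is an assumption and does not model that coordination. The class and method names are hypothetical.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: maps sparse subscriber IDs to dense IDs 0..127.
final class SubscriberIdMapper {
    static final int MAX_SUBSCRIBERS = 128;
    private final Map<Long, Integer> denseBySparse = new HashMap<>();
    private final BitSet used = new BitSet(MAX_SUBSCRIBERS); // occupied dense slots

    // Returns the subscriber's dense ID, assigning the lowest free slot if new.
    int assign(long sparseId) {
        Integer existing = denseBySparse.get(sparseId);
        if (existing != null) return existing;
        int slot = used.nextClearBit(0);
        if (slot >= MAX_SUBSCRIBERS) {
            throw new IllegalStateException("more than " + MAX_SUBSCRIBERS + " subscribers");
        }
        used.set(slot);
        denseBySparse.put(sparseId, slot);
        return slot;
    }

    // Frees the dense slot so a later subscriber can reuse it.
    void release(long sparseId) {
        Integer slot = denseBySparse.remove(sparseId);
        if (slot != null) used.clear(slot);
    }
}
```

Keeping the dense range at 0 to 127 is what lets every routing-table value below fit in a fixed 128-bit set.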
Producer routing table: Data Key (8 bytes) → Set<Subscriber>
Subscribe message: Subscriber ID (0 - 127) = 0, Key (64 bit) = 3890
Routing table V3: key 3890 → 0000000000…0001, a 16-byte bit set with one bit per dense subscriber ID (here bit 0, for subscriber 0)
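A minimal Java sketch of routing table V3, assuming the 16-byte bit set is stored as a long[2] value in an ordinary HashMap (the representation, class and method names are assumptions). Unlike a Bloom filter it is exact and supports unsubscribe, but it still boxes every key, which is the problem the next slide shows.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of routing table V3: key -> 128-bit subscriber set stored as two longs.
final class RoutingTableV3 {
    private final Map<Long, long[]> table = new HashMap<>(); // keys are still boxed

    // Handles a subscribe message (dense subscriber ID 0-127, 64-bit key).
    void subscribe(int subscriberId, long key) {
        checkId(subscriberId);
        long[] bits = table.computeIfAbsent(key, k -> new long[2]);
        bits[subscriberId >>> 6] |= 1L << subscriberId; // word 0: IDs 0-63, word 1: IDs 64-127
    }

    void unsubscribe(int subscriberId, long key) {
        checkId(subscriberId);
        long[] bits = table.get(key);
        if (bits == null) return;
        bits[subscriberId >>> 6] &= ~(1L << subscriberId);
        if (bits[0] == 0 && bits[1] == 0) table.remove(key); // drop keys with no subscribers
    }

    boolean isSubscribed(int subscriberId, long key) {
        checkId(subscriberId);
        long[] bits = table.get(key);
        return bits != null && (bits[subscriberId >>> 6] & (1L << subscriberId)) != 0;
    }

    private static void checkId(int id) {
        if (id < 0 || id > 127) throw new IllegalArgumentException("subscriber ID must be 0-127: " + id);
    }
}
```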
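Routing table V3 - regular hash map
[Figure: the same layout as v1: entries with key* and value* pointers and collision Lists, a boxed long for each key, and a value holding the two longs (long 1, long 2) of the bit set; steps numbered 1 to 4 trace the pointer chain of a lookup.]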
Routing table V4 - single array of longs
[Figure: the routing table is one array of longs divided into slots of three consecutive longs: Key, Value 0-63 and Value 64-127 (the two halves of the 128-bit subscriber set). All slots start out Empty.]
• Insert Key 0 (hash 0): slot 0 is empty, so Key 0 and its two value words go into slot 0.
• Insert Key 1 (hash 0): slot 0 already holds Key 0, so Key 1 moves on to the next slot, slot 1.
• Insert Key 2 (hash 3): slot 3 is empty, so Key 2 goes straight into slot 3.
• Look up Key 1 (hash 0): slot 0 holds Key 0, so the lookup checks the next slot and finds Key 1 in slot 1.
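A minimal Java sketch of V4 as the walkthrough describes it: one flat long[] in which each slot is three consecutive longs (key, value 0-63, value 64-127), with collisions moving to the next slot (linear probing). Several details are assumptions not on the slides: Long.MIN_VALUE is reserved as the empty-slot marker, the hash is an fmix64-style mixer, the capacity is a fixed power of two with no resizing, and there is no deletion (removing keys from a linear-probing table needs tombstones or backward-shift deletion). The class and method names are hypothetical.

```java
// Sketch of routing table V4: one long[] holding 3 longs per slot
// (key, subscribers 0-63, subscribers 64-127), linear probing on collision.
final class RoutingTableV4 {
    static final long EMPTY = Long.MIN_VALUE; // assumption: never used as a real key
    private static final int LONGS_PER_SLOT = 3;
    private final long[] table;
    private final int mask; // number of slots - 1
    private int size;

    RoutingTableV4(int numSlots) {
        if (numSlots < 2 || Integer.bitCount(numSlots) != 1) {
            throw new IllegalArgumentException("numSlots must be a power of two >= 2");
        }
        mask = numSlots - 1;
        table = new long[numSlots * LONGS_PER_SLOT];
        for (int slot = 0; slot < numSlots; slot++) {
            table[slot * LONGS_PER_SLOT] = EMPTY; // value words start at 0
        }
    }

    // Assumed hash: fmix64-style mixing, truncated to an int.
    private static int hash(long key) {
        key ^= key >>> 33;
        key *= 0xff51afd7ed558ccdL;
        key ^= key >>> 33;
        return (int) key;
    }

    // Array offset of the slot holding key, or of the empty slot where it belongs.
    private int findSlot(long key) {
        int slot = hash(key) & mask;
        while (true) {
            int base = slot * LONGS_PER_SLOT;
            long k = table[base];
            if (k == key || k == EMPTY) return base;
            slot = (slot + 1) & mask; // collision: try the next slot
        }
    }

    void subscribe(int subscriberId, long key) {
        if (key == EMPTY) throw new IllegalArgumentException("key is reserved");
        if (subscriberId < 0 || subscriberId > 127) throw new IllegalArgumentException("subscriber ID must be 0-127");
        int base = findSlot(key);
        if (table[base] == EMPTY) {
            // Keep at least one slot empty so that findSlot always terminates.
            if (size == mask) throw new IllegalStateException("table full");
            table[base] = key;
            size++;
        }
        table[base + 1 + (subscriberId >>> 6)] |= 1L << subscriberId;
    }

    // Copies the key's 128-bit subscriber set into out[0] (IDs 0-63) and
    // out[1] (IDs 64-127); returns false if the key is absent.
    boolean getSubscribers(long key, long[] out) {
        int base = findSlot(key);
        if (table[base] == EMPTY) return false;
        out[0] = table[base + 1];
        out[1] = table[base + 2];
        return true;
    }
}
```

Because a key and both words of its bit set sit next to each other in the array, a hit usually costs one or two cache lines, and a collision usually probes an adjacent slot in the same or the next line, instead of following pointers to separate objects.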
[Figure: looking up Key 1 (hash 0) returns its two value words (Value 0-63, Value 64-127). Together they form a 128-bit BitSet whose set bits (0, 2, 4, …, 127 in the figure) index into a Subscribers Array (Subscriber 0 to Subscriber 127), giving the subscribers to route to.]
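A minimal Java sketch of turning the looked-up bit set into subscribers by indexing a dense array of 128 entries. It walks each 64-bit word with Long.numberOfTrailingZeros and clears the lowest set bit each step; this is a standard idiom and not necessarily what SignalFx used. The class and method names are hypothetical, and the type parameter S stands in for the Subscriber class.

```java
import java.util.function.Consumer;

// Sketch: map a 128-bit subscriber set (two longs) to entries of a dense
// subscribers array, in ascending subscriber-ID order.
final class SubscriberDispatch {
    static <S> int forEachSubscriber(long bits0to63, long bits64to127,
                                     S[] subscribers, Consumer<? super S> action) {
        int count = 0;
        long[] words = {bits0to63, bits64to127};
        for (int w = 0; w < words.length; w++) {
            long word = words[w];
            while (word != 0) {
                int bit = Long.numberOfTrailingZeros(word); // lowest set bit
                action.accept(subscribers[(w << 6) + bit]); // dense ID = 64 * word + bit
                word &= word - 1; // clear the lowest set bit
                count++;
            }
        }
        return count;
    }
}
```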
Progress so far
Approach             GetSubscribers()      Memory
Naive hash map       O(1)                  high
Bloom filter         O(num_subscribers)    low
Optimized hash map   O(1)                  medium
Results (library)
Microbenchmark method:
• Heap: 3 GB
• Number of subscribers: 128
• Number of time series: 1048576
• Each time series has a random number of subscribers in [1, 128]
• 2 million random queries
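The talk does not show its benchmark code. As a hypothetical sketch of a workload matching the stated parameters, the generator below gives each time series a uniformly random number of distinct subscribers in [1, 128] (chosen with Floyd's sampling algorithm) and draws random query keys. The seeds, the uniform distributions, SplittableRandom and all names are assumptions; the generated sets can be loaded into a routing-table variant and the queries replayed against it, ideally under a harness such as JMH.

```java
import java.util.SplittableRandom;

// Hypothetical sketch of the microbenchmark workload described above.
final class BenchmarkWorkload {
    static final int NUM_SUBSCRIBERS = 128;
    static final int NUM_TIME_SERIES = 1 << 20; // 1048576, as on the slide
    static final int NUM_QUERIES = 2_000_000;

    // For each time series, its 128-bit subscriber set as two longs:
    // sets[2 * ts] holds IDs 0-63 and sets[2 * ts + 1] holds IDs 64-127.
    static long[] randomSubscriberSets(int numTimeSeries, long seed) {
        SplittableRandom rnd = new SplittableRandom(seed);
        long[] sets = new long[2 * numTimeSeries];
        for (int ts = 0; ts < numTimeSeries; ts++) {
            int k = 1 + rnd.nextInt(NUM_SUBSCRIBERS); // uniform in [1, 128]
            // Floyd's algorithm: picks k distinct IDs from 0..127.
            for (int j = NUM_SUBSCRIBERS - k; j < NUM_SUBSCRIBERS; j++) {
                int t = rnd.nextInt(j + 1);
                int pick = isSet(sets, ts, t) ? j : t;
                sets[2 * ts + (pick >>> 6)] |= 1L << pick;
            }
        }
        return sets;
    }

    static boolean isSet(long[] sets, int ts, int id) {
        return (sets[2 * ts + (id >>> 6)] & (1L << id)) != 0;
    }

    // Query keys: time series indices drawn uniformly at random.
    static int[] randomQueries(int numQueries, int numTimeSeries, long seed) {
        SplittableRandom rnd = new SplittableRandom(seed);
        int[] queries = new int[numQueries];
        for (int i = 0; i < queries.length; i++) {
            queries[i] = rnd.nextInt(numTimeSeries);
        }
        return queries;
    }
}
```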
Approach             Writes            Reads             Memory
Naive hash map       34469 ms (42x)    11900 ms (21x)    2.6 GB (27x)
Bloom filter         31710 ms (39x)    54995 ms (97x)    80 MB (0.83x)
Optimized hash map   805 ms (1x)       565 ms (1x)       96 MB (1x)
(Multiples are relative to the optimized hash map.)
Results (application)
[Figure: CPU % chart, annotated "6 subscribers 45 %".]
[Figure: Garbage collection chart, annotated "6 subscribers 63 %".]
Closing remarks / rant
• “Write code first, optimize later”…
• Analyze your data:
  • Metrics
  • Logging
Thank You!
Rajiv Kurian
[email protected], @rzidane360
WE’RE [email protected]
@SignalFx - signalfx.com
Q & A