ACM DEBS Grand Challenge: Continuous Analytics on Geospatial Data Streams with WSO2 Complex Event Processor

Sachini Jayasekara, Srinath Perera, Miyuru Dayarathna, Sriskandarajah Suhothayan, WSO2 Inc.

Problem

o Dataset: taxi rides collected in New York in 2013 [1]

o Each line has timestamps, start and end locations, fare details, etc.

o 13K taxis, 173 million events

o 2 queries, based on 0.5 km and 0.25 km cells over New York

[1] Chris Whong. http://chriswhong.com/open-data/foil_nyc_taxi/
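Both queries first map each pickup and drop-off coordinate to a grid cell. A minimal Java sketch of that mapping follows. The class name `GridMapper`, the use of the north-west corner as origin, and the degree-per-cell steps are assumptions for illustration; the actual grid is defined by the challenge specification. Integer cell ids keep later lookups cheap.

```java
/**
 * Hypothetical helper: maps a (latitude, longitude) point to an integer grid-cell id.
 * The grid origin and the degree-per-cell steps are assumed inputs, not official values.
 */
final class GridMapper {
    private final double originLat;  // latitude of the grid's north-west corner
    private final double originLon;  // longitude of the grid's north-west corner
    private final double latStep;    // degrees of latitude per cell (moving south)
    private final double lonStep;    // degrees of longitude per cell (moving east)
    private final int cellsPerSide;  // the grid is cellsPerSide x cellsPerSide cells

    GridMapper(double originLat, double originLon, double latStep, double lonStep, int cellsPerSide) {
        this.originLat = originLat;
        this.originLon = originLon;
        this.latStep = latStep;
        this.lonStep = lonStep;
        this.cellsPerSide = cellsPerSide;
    }

    /** Returns row * cellsPerSide + col, or -1 if the point lies outside the grid. */
    int cellId(double lat, double lon) {
        int row = (int) Math.floor((originLat - lat) / latStep);
        int col = (int) Math.floor((lon - originLon) / lonStep);
        if (row < 0 || col < 0 || row >= cellsPerSide || col >= cellsPerSide) {
            return -1;
        }
        return row * cellsPerSide + col;
    }

    /** Same area, cells half as wide in each direction (e.g. 0.5 km cells -> 0.25 km cells). */
    GridMapper halved() {
        return new GridMapper(originLat, originLon, latStep / 2, lonStep / 2, cellsPerSide * 2);
    }
}
```

For example, with an assumed origin of (41.0, -74.0), steps of 0.01 degrees and a 10 x 10 grid, `cellId(40.955, -73.935)` falls in row 4, column 6, i.e. cell 46.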

CEP Operators

1. Filters or transformations (process a single event): from Ball[v>10] select .. insert into ..

2. Windows + aggregation (track a window of events by time or length): from Ball#window.time(30s) select avg(v) ..

3. Joins (join two event streams into one): from Ball#window.time(30s) as b join Players as p on p.v < b.v

4. Patterns (state machine implementation): from Ball[v>10], Ball[v<10]*, Ball[v>10] select ..

5. Event tables (map a database table as an event stream): define table HitV (v double) using .. db info ..
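To show what a window-plus-aggregation query such as `from Ball#window.time(30s) select avg(v)` computes, here is a minimal Java sketch of a sliding time window with a running average. It is not Siddhi's implementation; the class `TimeWindowAvg` is hypothetical, and it assumes events arrive in timestamp order and that time is taken from the event rather than the wall clock.

```java
import java.util.ArrayDeque;

/**
 * Sliding time window with a running average, driven by event timestamps.
 * Assumes events arrive in non-decreasing timestamp order.
 */
final class TimeWindowAvg {
    private static final class Event {
        final long ts;
        final double v;
        Event(long ts, double v) { this.ts = ts; this.v = v; }
    }

    private final long windowMillis;
    private final ArrayDeque<Event> window = new ArrayDeque<>();
    private double sum = 0.0;

    TimeWindowAvg(long windowMillis) { this.windowMillis = windowMillis; }

    /** Adds an event and returns avg(v) over events with timestamps in (ts - windowMillis, ts]. */
    double onEvent(long ts, double v) {
        // Expire old events from the head; the deque is ordered by timestamp.
        while (!window.isEmpty() && window.peekFirst().ts <= ts - windowMillis) {
            sum -= window.pollFirst().v;
        }
        window.addLast(new Event(ts, v));
        sum += v;
        return sum / window.size();
    }
}
```

Each event is added and expired exactly once, so the amortized cost per event is O(1) regardless of window length.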

Complex Event Processing

See http://goo.gl/BaPFYA for more information.

Query 1: Frequent Routes

o Output the 10 most frequent routes in the last 30 minutes

o Produce output only when the top-10 list has changed (the current time is derived from each event's timestamp attribute)
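A minimal Java sketch of Query 1's logic: count routes in a 30-minute sliding window driven by event time, keep the routes ranked in a sorted set, and emit the top k only when it changes. The class name, the packing of a route into a `long`, and the tie-break by most recent update are assumptions; the FrequentK structure used in WSO2 CEP is not shown in the slides and may work differently.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class FrequentRoutes {
    /** Packs (start cell, end cell) into one long so routes can be hashed without strings. */
    static long routeId(int startCell, int endCell) {
        return ((long) startCell << 32) | (endCell & 0xFFFFFFFFL);
    }

    private static final class Trip {
        final long dropoffTs;
        final long route;
        Trip(long dropoffTs, long route) { this.dropoffTs = dropoffTs; this.route = route; }
    }

    private static final class Entry {
        final long route;
        int count;
        long freshness;  // larger = updated more recently
        Entry(long route) { this.route = route; }
    }

    // Highest count first; ties broken by most recent update (assumed rule), then by route id.
    private static final Comparator<Entry> RANK = (a, b) -> {
        if (a.count != b.count) return Integer.compare(b.count, a.count);
        if (a.freshness != b.freshness) return Long.compare(b.freshness, a.freshness);
        return Long.compare(a.route, b.route);
    };

    private final long windowMillis;
    private final int k;
    private final ArrayDeque<Trip> window = new ArrayDeque<>();
    private final Map<Long, Entry> entries = new HashMap<>();
    private final TreeSet<Entry> ranking = new TreeSet<>(RANK);
    private List<Long> lastTopK = Collections.emptyList();
    private long clock = 0;

    FrequentRoutes(long windowMillis, int k) { this.windowMillis = windowMillis; this.k = k; }

    /** Processes one trip; returns the new top-k list if it changed, otherwise null. */
    List<Long> onTrip(long dropoffTs, long route) {
        // "Now" is the event's own timestamp, not the wall clock.
        while (!window.isEmpty() && window.peekFirst().dropoffTs <= dropoffTs - windowMillis) {
            adjust(window.pollFirst().route, -1);
        }
        window.addLast(new Trip(dropoffTs, route));
        adjust(route, +1);

        List<Long> top = new ArrayList<>(k);
        for (Entry e : ranking) {
            if (top.size() == k) break;
            top.add(e.route);
        }
        if (top.equals(lastTopK)) return null;  // unchanged: emit nothing
        lastTopK = Collections.unmodifiableList(top);
        return lastTopK;
    }

    private void adjust(long route, int delta) {
        Entry e = entries.get(route);
        if (e == null) {
            e = new Entry(route);
            entries.put(route, e);
        } else {
            ranking.remove(e);  // remove before mutating the sort keys
        }
        e.count += delta;
        if (delta > 0) e.freshness = ++clock;
        if (e.count == 0) entries.remove(route);
        else ranking.add(e);
    }
}
```

For Query 1 one would construct it as `new FrequentRoutes(30 * 60_000L, 10)` and call `onTrip` with each trip's drop-off time and route id; a `null` result means nothing needs to be emitted.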

Query 2: Profitable Areas

o Find the cells that are most profitable for taxi drivers at the given moment.

o Profitability of a cell = median(fare + tip) over the last 15 minutes, divided by the number of empty taxis in that cell, i.e. taxis that dropped off there in the last 30 minutes and have not started a new trip since.
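Written as a formula (the notation is ours), where T_15(c) is the set of trips counted for cell c that ended in the last 15 minutes and E_30(c) is the number of empty taxis in c as defined above. Which trips are counted for a cell follows the challenge specification and is not spelled out on the slide.

```latex
\mathrm{profitability}(c) \;=\;
\frac{\operatorname{median}\{\, \mathit{fare}_t + \mathit{tip}_t \;:\; t \in T_{15}(c) \,\}}{E_{30}(c)}
```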

Optimizations

o WSO2 CEP: object pooling; only keep required attributes (e.g., in windows)

o Algorithmic: string lookup; reusing windows; avoiding joins; FrequentK; counting pattern; median (bucket)

o Fully use the computer
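The "median (bucket)" item refers to a bucket-based median. One way to realize it, sketched in Java under the assumption that fare + tip is stored in whole cents and bounded by a chosen cap, is a histogram with one counter per cent: adding or removing a value as the window slides is O(1), and a median query scans the buckets, independent of the window size. The class name and the cap are hypothetical; the slides do not describe the bucket layout used in WSO2 CEP.

```java
/**
 * Bucket-based median over a sliding window of fare + tip values stored in whole cents.
 * One counter per cent up to a cap; larger values are clamped into the top bucket.
 */
final class BucketMedian {
    private final int[] counts;  // counts[c] = number of values in the window equal to c cents
    private int size = 0;

    BucketMedian(int capCents) { counts = new int[capCents + 1]; }

    private int bucket(int cents) { return Math.min(Math.max(cents, 0), counts.length - 1); }

    /** O(1): a value enters the window. */
    void add(int cents) { counts[bucket(cents)]++; size++; }

    /** O(1): a value leaves the window (it must have been added before). */
    void remove(int cents) { counts[bucket(cents)]--; size--; }

    /** Value at 0-based position rank in sorted order, found by scanning the buckets. */
    private int valueAtRank(int rank) {
        int seen = 0;
        for (int c = 0; c < counts.length; c++) {
            seen += counts[c];
            if (seen > rank) return c;
        }
        throw new IllegalStateException("rank out of range");
    }

    /** Median in cents; the mean of the two middle values when the window size is even. */
    double median() {
        if (size == 0) throw new IllegalStateException("empty window");
        if (size % 2 == 1) return valueAtRank(size / 2);
        return (valueAtRank(size / 2 - 1) + valueAtRank(size / 2)) / 2.0;
    }
}
```

Values above the cap collapse into the top bucket, so the result stays exact as long as the middle value(s) are below the cap.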

Avoid Joins

o Q2 computes the median and the empty-taxi count in parallel

o But joining the two results is expensive, because the join must keep events in order

o Instead, calculate the median, enrich the event with the result, use the enriched event to count empty taxis, and then divide the median by the empty-taxi count, with no join.
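A minimal Java sketch of the join-free design: the median stage writes its result onto the event, and the empty-taxi stage reads it from the same event and divides. The two stages are passed in as functions so the data flow stays visible; the class and field names are hypothetical, and returning NaN when a cell has no empty taxis is an assumed convention, not something the slides specify.

```java
import java.util.function.ToDoubleFunction;
import java.util.function.ToIntFunction;

/** Join-free sketch: stage 1 enriches the event, stage 2 reads the enrichment from the same event. */
final class JoinFreeProfitability {
    /** Hypothetical mutable event passed through both stages. */
    static final class TripEvent {
        final int cell;
        final double fareAndTip;
        double median;         // written by stage 1
        double profitability;  // written by stage 2
        TripEvent(int cell, double fareAndTip) { this.cell = cell; this.fareAndTip = fareAndTip; }
    }

    private final ToDoubleFunction<TripEvent> medianStage;  // e.g. 15-minute windowed median for the cell
    private final ToIntFunction<TripEvent> emptyTaxiStage;  // e.g. empty-taxi count for the cell

    JoinFreeProfitability(ToDoubleFunction<TripEvent> medianStage, ToIntFunction<TripEvent> emptyTaxiStage) {
        this.medianStage = medianStage;
        this.emptyTaxiStage = emptyTaxiStage;
    }

    TripEvent process(TripEvent e) {
        // Stage 1: compute the median and attach it to the event instead of emitting a separate stream.
        e.median = medianStage.applyAsDouble(e);
        // Stage 2: the event already carries the median, so no join (and no reordering) is needed.
        int empty = emptyTaxiStage.applyAsInt(e);
        e.profitability = empty > 0 ? e.median / empty : Double.NaN;  // NaN: assumed "no empty taxi" marker
        return e;
    }
}
```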

Taxi Counting Pattern Optimizations

o The query creates a state machine to track each taxi's state and updates the counts accordingly

o This is slow with a CEP pattern, because it searches all states to check for expiration

o Fixed by keeping the states sorted by start time, so expiration only checks the oldest states (2X improvement)
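The ordering fix can be sketched in Java with a `LinkedHashMap`: because events arrive in drop-off order and a taxi's old state is removed before its new one is inserted, the map's iteration order is also sorted by the time the taxi became empty, so expiration only inspects the oldest entries and stops at the first one still inside the 30-minute window. This is an illustration under those assumptions, not the pattern implementation in WSO2 CEP; `EmptyTaxiCounter` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Tracks taxis that dropped off in a cell and have not started a new trip, within a time window. */
final class EmptyTaxiCounter {
    private static final class EmptyState {
        final int cell;    // cell where the taxi dropped off
        final long since;  // drop-off time, i.e. when the taxi became empty
        EmptyState(int cell, long since) { this.cell = cell; this.since = since; }
    }

    private final long windowMillis;
    // Iteration order == insertion order == drop-off time order, given drop-off-ordered input.
    private final LinkedHashMap<String, EmptyState> states = new LinkedHashMap<>();
    private final Map<Integer, Integer> emptyPerCell = new HashMap<>();

    EmptyTaxiCounter(long windowMillis) { this.windowMillis = windowMillis; }

    /** Processes one completed trip; its drop-off time is also the current event time. */
    void onTrip(String taxiId, int dropoffCell, long dropoffTs) {
        // Expire from the oldest end and stop at the first state still inside the window,
        // instead of scanning every state.
        Iterator<EmptyState> it = states.values().iterator();
        while (it.hasNext()) {
            EmptyState s = it.next();
            if (s.since > dropoffTs - windowMillis) break;
            it.remove();
            decrement(s.cell);
        }
        // This trip means the taxi left its previous "empty" state.
        EmptyState previous = states.remove(taxiId);
        if (previous != null) decrement(previous.cell);
        // Re-inserting appends at the tail, which keeps the map sorted by drop-off time.
        states.put(taxiId, new EmptyState(dropoffCell, dropoffTs));
        emptyPerCell.merge(dropoffCell, 1, Integer::sum);
    }

    int emptyTaxis(int cell) { return emptyPerCell.getOrDefault(cell, 0); }

    private void decrement(int cell) {
        emptyPerCell.computeIfPresent(cell, (c, n) -> n > 1 ? Integer.valueOf(n - 1) : null);
    }
}
```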

Fully use the Computer

o So far, we removed unnecessary operations!

o Now we have to use all 4 cores of the VM. How?

o Options: data partitioning, a pipeline, or a pipeline with a single buffer

Data Partition : Issues

o Need to reorder events and send timing updates across partitions

o But the savings from partitioning are small (e.g., frequentK is O(log(n)), while execution in a partition takes O(log(n/p)))

o All savings are lost when reordering
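To make the complexity argument concrete, assume an illustrative window of n = 10^6 entries split into p = 4 partitions (both numbers are hypothetical, not measurements from the talk):

```latex
\log_2 \frac{n}{p} = \log_2 n - \log_2 p,
\qquad
n = 10^{6},\ p = 4:\quad
\log_2 10^{6} \approx 19.93,\quad
\log_2 \left(2.5 \times 10^{5}\right) \approx 17.93
```

Partitioning removes only log2 p = 2 of roughly 20 steps per event, about 10%, which the cost of reordering events across partitions can easily outweigh.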

Execution Pipeline

o Break the different stages into a pipeline

o Now we can use 6 threads (stages 1 and 6 do IO, so that is OK on 4 cores)

o 125K events/sec now, but 50 ms latency

o The bottleneck is moving events between queues
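A minimal Java sketch of a queue-based pipeline, with one thread per stage and a bounded `ArrayBlockingQueue` between consecutive stages. Each event is handed from queue to queue with a blocking `put`/`take` at every hop, which is the overhead the slide identifies as the bottleneck. The class, the `Long` event type, and the end-of-stream sentinel are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.UnaryOperator;

/** Queue-based pipeline sketch: one thread per stage, a bounded queue between consecutive stages. */
final class QueuePipeline {
    private static final long END = Long.MIN_VALUE;  // end-of-stream sentinel (assumed never a real event)

    static List<Long> run(List<Long> input, List<UnaryOperator<Long>> stages, int capacity)
            throws InterruptedException {
        List<BlockingQueue<Long>> queues = new ArrayList<>();
        for (int i = 0; i <= stages.size(); i++) queues.add(new ArrayBlockingQueue<>(capacity));

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < stages.size(); i++) {
            BlockingQueue<Long> in = queues.get(i);
            BlockingQueue<Long> out = queues.get(i + 1);
            UnaryOperator<Long> f = stages.get(i);
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        Long e = in.take();                    // hand-off: dequeue from upstream
                        if (e == END) { out.put(e); return; }  // forward the sentinel and stop
                        out.put(f.apply(e));                   // hand-off: enqueue downstream
                    }
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }

        // A producer thread feeds the first queue; the calling thread drains the last one.
        BlockingQueue<Long> first = queues.get(0);
        BlockingQueue<Long> last = queues.get(stages.size());
        Thread producer = new Thread(() -> {
            try {
                for (Long e : input) first.put(e);
                first.put(END);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<Long> results = new ArrayList<>();
        for (Long e = last.take(); e != END; e = last.take()) results.add(e);
        producer.join();
        for (Thread t : threads) t.join();
        return results;
    }
}
```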

Circular Buffer based Pipeline

o One circular buffer with sequence barriers, using the LMAX Disruptor

o Avoids the cost of moving events, reduces GC, and works well with the CPU cache

o 2X more throughput and 0 ms latency
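The slides use the LMAX Disruptor; the sketch below does not use its API. It is a hand-rolled, simplified single-producer ring buffer in Java that shows the underlying idea: events live in one preallocated array and are updated in place, each stage publishes its progress through its own sequence counter, and a stage reads a slot only after the stage before it has published that slot (a sequence barrier). The producer may not wrap around onto a slot the last stage has not finished. The class name and the busy-spin waiting strategy are assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

/**
 * Simplified single-producer ring buffer shared by a chain of stages (not the LMAX Disruptor API).
 * Single use: call run() once per instance.
 */
final class RingPipeline {
    private final long[] slots;          // preallocated event slots, reused as the sequence wraps
    private final int mask;              // slots.length - 1; valid because the size is a power of two
    private final LongUnaryOperator[] ops;
    private final AtomicLong[] cursors;  // cursors[0]: producer; cursors[i]: stage i (1-based)

    RingPipeline(int size, LongUnaryOperator... ops) {
        if (size <= 0 || Integer.bitCount(size) != 1) throw new IllegalArgumentException("size must be a power of two");
        if (ops.length == 0) throw new IllegalArgumentException("need at least one stage");
        this.slots = new long[size];
        this.mask = size - 1;
        this.ops = ops;
        this.cursors = new AtomicLong[ops.length + 1];
        for (int i = 0; i < cursors.length; i++) cursors[i] = new AtomicLong(-1);
    }

    /** Pushes every input value through all stages and returns the final values in input order. */
    long[] run(long[] input) throws InterruptedException {
        final int n = input.length;
        final long[] out = new long[n];
        Thread[] workers = new Thread[ops.length];
        for (int i = 0; i < ops.length; i++) {
            final int stage = i + 1;
            final boolean last = stage == ops.length;
            workers[i] = new Thread(() -> {
                for (long seq = 0; seq < n; seq++) {
                    // Sequence barrier: wait until the upstream stage has published this slot.
                    while (cursors[stage - 1].get() < seq) Thread.onSpinWait();
                    int idx = (int) (seq & mask);
                    slots[idx] = ops[stage - 1].applyAsLong(slots[idx]);  // update in place, no copy
                    if (last) out[(int) seq] = slots[idx];
                    cursors[stage].set(seq);                              // publish progress downstream
                }
            });
            workers[i].start();
        }
        // Producer (calling thread): never overwrite a slot the last stage has not finished with.
        AtomicLong lastStage = cursors[ops.length];
        for (long seq = 0; seq < n; seq++) {
            while (seq - lastStage.get() > slots.length) Thread.onSpinWait();
            slots[(int) (seq & mask)] = input[(int) seq];
            cursors[0].set(seq);
        }
        for (Thread w : workers) w.join();
        return out;
    }
}
```

Nothing is copied between queues and the slots are reused, which is where the reduced GC pressure and cache friendliness come from; the trade-off in this sketch is busy-spinning threads.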

Results

o Performs well on real hardware (8 cores) and AWS (4 cores), but not as well on VirtualBox (4 cores)

o Can run with a 512 MB heap with only a 10% slowdown

Results: Speedup vs. Concurrency

o Compared against the single-node version

o Real hardware scaled well, AWS less so, and the VM showed very little speedup

Results: Latency vs. Throughput

o Each point represents one configuration (environment, thread count, buffer size)

Conclusion

o All changes except the final circular buffer are part of WSO2 CEP 4.0 (released in 2015 Q3)

o WSO2 CEP is free and available under the Apache open source license

o It is fast and flexible, and already used in many critical use cases