Data to Insight in a Flash: Introduction to Real-Time Analytics with WSO2 Complex Event Processor S. Suhothayan (Suho) Technical Lead WSO2 Inc.

CEP Is & Is NOT!

• Is NOT!

– Simple filters

• Simple Event Processing

• E.g. Is this a gold or platinum customer?

– Joining multiple event streams

• Event Stream Processing

• Is!

– Processing multiple event streams

– Identifying meaningful patterns among streams

– Using temporal windows

• E.g. Notify if there is a 10% increase in overall trading activity AND the average price of commodities has fallen 2% in the last 4 hours

What is WSO2 CEP?

Architecture

Event Streams

• Event stream is a sequence of events

• Event streams are defined by Stream Definitions

• Event streams have inflows and outflows

– Inflows can be from

• Event builders

Convert incoming XML, JSON, etc. events to an event stream

• Execution plans

– Outflows are to

• Event formatters

Convert an event stream to XML, JSON, etc. events

• Execution plans

Stream Definition

{
  'name':'soft.drink.coop.sales', 'version':'1.0.0',
  'nickName': 'Soft_Drink_Sales', 'description': 'Soft drink sales',
  'metaData':[
    {'name':'region','type':'STRING'}
  ],
  'correlationData':[
    {'name':'transactionID','type':'STRING'}
  ],
  'payloadData':[
    {'name':'brand','type':'STRING'}, {'name':'quantity','type':'INT'},
    {'name':'total','type':'INT'}, {'name':'user','type':'STRING'}
  ]
}
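As a rough illustration of what a stream definition buys you, the sketch below checks an incoming event against the definition above. It is plain Python, not the WSO2 API: the `validate` helper, the `PY_TYPES` mapping, and the nested-dict event layout are assumptions made for this example, and the definition is rewritten with double quotes so the standard `json` module can read it.

```python
import json

# Stream definition from the slide, with double quotes so json can parse it.
DEFINITION = json.loads("""
{
  "name": "soft.drink.coop.sales", "version": "1.0.0",
  "metaData": [{"name": "region", "type": "STRING"}],
  "correlationData": [{"name": "transactionID", "type": "STRING"}],
  "payloadData": [
    {"name": "brand", "type": "STRING"}, {"name": "quantity", "type": "INT"},
    {"name": "total", "type": "INT"}, {"name": "user", "type": "STRING"}
  ]
}
""")

# Assumed mapping from definition type names to Python types.
PY_TYPES = {"STRING": str, "INT": int}


def validate(event, definition=DEFINITION):
    """Return a list of problems; an empty list means the event conforms."""
    problems = []
    for section in ("metaData", "correlationData", "payloadData"):
        values = event.get(section, {})
        for attr in definition.get(section, []):
            name, expected = attr["name"], PY_TYPES[attr["type"]]
            if name not in values:
                problems.append(f"{section}.{name} is missing")
            elif isinstance(values[name], bool) or not isinstance(values[name], expected):
                # bool is rejected explicitly because it is a subclass of int
                problems.append(f"{section}.{name} is not {attr['type']}")
    return problems
```

An event that passes this check has exactly the attributes a query on the stream can rely on.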

Receiving and Publishing Events

Event Adaptors

● For receiving and publishing events

● Has the configurations to connect to external endpoints

● Has many-to-one relationship with Event Streams

Event Adaptors

Support for several transports (network access)

● SOAP

● HTTP

● JMS

● SMTP

● SMS

● Thrift

● Kafka

● File

● Websocket

Supports publishing data to databases

● Cassandra

● MySQL

● H2

● MSSQL

● Oracle

Supports custom event adaptors via its pluggable architecture!

Event Format

• Standard event formats are available for receiving and publishing

events

– XML

– JSON

– Text

– Map

– WSO2 Event

• If events adhere to the standard format, they do not need data mapping.

• If events do not adhere, custom event mapping should be configured in the Event Builder & Event Formatter appropriately.

Event Format

Standard XML event format

<events>
  <event>
    <metaData>
      <tenant_id>2</tenant_id>
    </metaData>
    <correlationData>
      <activity_id>ID5</activity_id>
    </correlationData>
    <payloadData>
      <clientPhoneNo>0771117673</clientPhoneNo>
      <clientName>Mohanadarshan</clientName>
      <clientResidenceAddress>15, Alexendra road, California</clientResidenceAddress>
      <clientAccountNo>ACT5673</clientAccountNo>
    </payloadData>
  </event>
</events>
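To make the three-part layout of the standard XML format concrete, here is a small sketch that reads such a document with Python's standard `xml.etree.ElementTree`. The `parse_events` helper is hypothetical and only illustrates the structure; it is not how the CEP event builder is implemented.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<events>
  <event>
    <metaData><tenant_id>2</tenant_id></metaData>
    <correlationData><activity_id>ID5</activity_id></correlationData>
    <payloadData>
      <clientPhoneNo>0771117673</clientPhoneNo>
      <clientName>Mohanadarshan</clientName>
      <clientResidenceAddress>15, Alexendra road, California</clientResidenceAddress>
      <clientAccountNo>ACT5673</clientAccountNo>
    </payloadData>
  </event>
</events>
"""


def parse_events(xml_text):
    """Turn each <event> into {section name: {attribute name: text}}."""
    root = ET.fromstring(xml_text.strip())
    events = []
    for event in root.findall("event"):
        record = {}
        for section in event:  # metaData, correlationData, payloadData
            record[section.tag] = {
                child.tag: (child.text or "").strip() for child in section
            }
        events.append(record)
    return events
```

All values come back as strings here; converting them to the types named in the stream definition would be a separate step.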

Processing Events

Execution Plan

● Is an isolated logical execution unit

● Each execution plan imports some of the event streams available in CEP, defines the execution logic using queries, and exports the results as output event streams.

● Has one-to-one relationship with CEP Backend Runtime (Siddhi).

https://github.com/wso2/siddhi

● Has many-to-many relationship with Event Streams.

● Each execution plan spawns a Siddhi Engine Instance.

CEP Solution patterns

1. Transformation - project, translate, enrich, split

2. Filter

3. Composition / Aggregation / Analytics

● basic stats, group by, moving averages

4. Join multiple streams

5. Detect patterns

● Coordinating events over time

● Trends - increasing, decreasing, stable, non-increasing, non-decreasing, mixed

6. Blacklisting

7. Building a profile

Siddhi Query Structure

define stream <event stream>
    (<attribute> <type>, <attribute> <type>, ...);

from <event stream>
select <attribute>, <attribute>, ...
insert into <event stream> ;

Siddhi Query

define stream SoftDrinkSales
    (region string, brand string, quantity int,
     price double);

from SoftDrinkSales
select brand, quantity * price as totalCost
insert into TotalCostStream ;

from TotalCostStream
select brand, toUSD(totalCost) as totalCostInUSD,
    'USD' as currency
insert into OutputStream ;
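The two queries above are chained through the intermediate TotalCostStream. A minimal Python sketch of the same data flow, using generators as stand-ins for streams, might look as follows. The function names are invented for this example, and `to_usd` is passed in because the slide does not say how `toUSD` converts values.

```python
def total_cost_stream(sales):
    """First query: project brand and compute quantity * price."""
    for sale in sales:
        yield {"brand": sale["brand"],
               "totalCost": sale["quantity"] * sale["price"]}


def output_stream(total_costs, to_usd):
    """Second query: convert the cost and attach a constant currency."""
    for row in total_costs:
        yield {"brand": row["brand"],
               "totalCostInUSD": to_usd(row["totalCost"]),
               "currency": "USD"}
```

For example, `list(output_stream(total_cost_stream(sales), lambda cost: cost * 0.5))` runs both stages over a list of sale dictionaries with an assumed conversion rate of 0.5.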

Siddhi Query: Filter and window

define stream SoftDrinkSales
    (region string, brand string, quantity int,
     price double);

from SoftDrinkSales
    [quantity > 99]#window.time(1 hour)
select region, brand, avg(quantity) as avgQuantity
group by region, brand
insert into AvgWholeSales ;
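The filter, sliding time window, and group-by in this query can be pictured with the following Python sketch. It is a simplification under stated assumptions: timestamps are in seconds and arrive in non-decreasing order, expiry is only checked when a new event arrives (a real engine also expires events on a timer), and the class name `TimeWindowAverage` is made up for illustration.

```python
from collections import deque


class TimeWindowAverage:
    """Filter quantity > min_quantity, keep the last hour, average per group."""

    def __init__(self, window_seconds=3600, min_quantity=99):
        self.window_seconds = window_seconds
        self.min_quantity = min_quantity
        self.events = deque()  # (timestamp, region, brand, quantity), oldest first

    def on_event(self, timestamp, region, brand, quantity):
        """Return (region, brand, avgQuantity), or None if filtered out."""
        if quantity <= self.min_quantity:
            return None
        # Drop events that have fallen out of the time window.
        while self.events and self.events[0][0] <= timestamp - self.window_seconds:
            self.events.popleft()
        self.events.append((timestamp, region, brand, quantity))
        # group by region, brand: average only over the matching group.
        group = [q for _, r, b, q in self.events if (r, b) == (region, brand)]
        return region, brand, sum(group) / len(group)
```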

Siddhi Query: Partition

define stream SoftDrinkSales
    (region string, brand string, quantity int,
     price double);

partition with (region of SoftDrinkSales)
begin
    from SoftDrinkSales
        [quantity > 99]#window.length(100)
    select region, brand,
        avg(quantity) as avgQuantity
    insert into AvgWholeSales ;
end;
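Partitioning gives every region its own independent window. A minimal Python sketch of that idea, assuming one bounded `deque` per partition key, is shown below. The factory name `make_partitioned_average` is hypothetical.

```python
from collections import defaultdict, deque


def make_partitioned_average(length=100, min_quantity=99):
    """Build an event handler that keeps one length window per region."""
    windows = defaultdict(lambda: deque(maxlen=length))  # region -> quantities

    def on_event(region, brand, quantity):
        if quantity <= min_quantity:
            return None  # filtered out before reaching the window
        window = windows[region]
        window.append(quantity)  # the oldest entry is evicted when full
        return region, brand, sum(window) / len(window)

    return on_event
```

Because the windows are keyed by region, a burst of sales in one region never pushes another region's events out of its window.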

Siddhi Query: Pattern

define stream Purchase (price double, cardNo long, place string);

from every (a1 = Purchase[price < 10]) ->
    a2 = Purchase[price > 10000 and a1.cardNo == a2.cardNo]
    within 1 day
select a1.cardNo as cardNo, a2.price as price, a2.place as place
insert into PotentialFraud;

● A pattern matches events arriving in order.

● A sequence is used to match the immediately following events arriving in order.
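The fraud pattern above can be read as a small state machine: every small purchase opens a partial match, and a later large purchase on the same card within a day completes it. The Python sketch below follows that reading. It assumes timestamps in seconds, and `FraudPattern` is an illustrative name, not part of Siddhi.

```python
DAY = 24 * 60 * 60  # seconds


class FraudPattern:
    """every a1 = Purchase[price < 10] -> a2 = Purchase[price > 10000, same card]."""

    def __init__(self, within=DAY):
        self.within = within
        self.pending = []  # (timestamp, cardNo) of small purchases awaiting a2

    def on_purchase(self, timestamp, price, card_no, place):
        """Return the list of PotentialFraud events produced by this purchase."""
        # Forget partial matches that are older than the 'within' limit.
        self.pending = [(ts, c) for ts, c in self.pending
                        if timestamp - ts <= self.within]
        matches = []
        if price > 10000:
            still_pending = []
            for ts, c in self.pending:
                if c == card_no:  # completes the partial match for this card
                    matches.append({"cardNo": c, "price": price, "place": place})
                else:
                    still_pending.append((ts, c))
            self.pending = still_pending
        elif price < 10:
            self.pending.append((timestamp, card_no))  # 'every': start a new match
        return matches
```

Unrelated purchases may arrive between a1 and a2 without breaking the match, which is the difference from a sequence.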

Siddhi Query: Event Tables

define stream Purchase (price double, cardNo long, place string);
define stream NewUser (userName string, cardNo long, time long) ;
define table CardUserTable (name string, cardNum long) ;

from NewUser
select userName as name, cardNo as cardNum
insert into CardUserTable ;

from Purchase#window.length(1) join CardUserTable
    on Purchase.cardNo == CardUserTable.cardNum
select Purchase.cardNo as cardNo,
    CardUserTable.name as name,
    Purchase.price as price
insert into PurchaseUserStream ;

● Similarly, update and delete can be done

● Event tables can be backed by an RDBMS database
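An event table behaves like a lookup store that one stream fills and another stream joins against. The sketch below mimics the two queries with an in-memory dictionary. It assumes at most one user per card number, which the table definition on the slide does not enforce, and `CardUserJoin` is a made-up name.

```python
class CardUserJoin:
    """NewUser events fill the table; Purchase events are joined against it."""

    def __init__(self):
        self.card_user_table = {}  # cardNum -> name

    def on_new_user(self, user_name, card_no, time):
        # from NewUser select ... insert into CardUserTable
        self.card_user_table[card_no] = user_name

    def on_purchase(self, price, card_no, place):
        # Inner join: a purchase with no matching row produces no output.
        name = self.card_user_table.get(card_no)
        if name is None:
            return None
        return {"cardNo": card_no, "name": name, "price": price}
```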

Siddhi Query Extensions

● Function extension

● Aggregator extension

● Window extension

● Transform extension

from SoftDrinkSales#window.time(30 min)
select brand,
    custom:stdev(quantity) as stdevQuantity
insert into OutputStream ;
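What an aggregator extension such as `custom:stdev` has to compute can be sketched in Python with the standard `statistics` module. The slide does not say whether the extension uses the population or the sample formula, so the population standard deviation (`pstdev`) is an assumption here, as are the class name and the use of timestamps in seconds.

```python
from collections import deque
from statistics import pstdev


class WindowStdev:
    """Standard deviation of quantity over a sliding 30-minute window."""

    def __init__(self, window_seconds=30 * 60):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, quantity), oldest first

    def on_event(self, timestamp, brand, quantity):
        # Expire events older than the window (checked on arrival only).
        while self.events and self.events[0][0] <= timestamp - self.window_seconds:
            self.events.popleft()
        self.events.append((timestamp, quantity))
        # No group by in the query, so aggregate over the whole window.
        return brand, pstdev([q for _, q in self.events])
```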

Monitoring & Debugging

Event Flow

● Visualization of the Event Stream flow in CEP

● Helps to get the big picture

● Good for debugging

Event Tracer

• Dump message traces in a textual format

• Before and after processing each stage of event flow

Event Statistics

• Real-time statistics

• via visual illustrations & JMX

• Time based request & response counts

• Stats on all components of CEP server

Performance Results

• Same JVM performance (Siddhi vs. Esper; M means a million), 4 core machine

– Filters: 8M events/sec vs. Esper 2M

– Window: 2.5M events/sec vs. Esper 1M

– Patterns: 1.4M events/sec, about 10X faster than Esper

• Over-the-network performance (using the Thrift-based WSO2 event format), 8 core machine

– Filter: 0.25M (or 250K) events/sec

Lambda Architecture

High Availability

• Option 1: Side by side

– Recommended

– Takes 2X hardware

– Gives zero down time

• Option 2: Snapshot and restore

– Uses less HW

– Will lose events between snapshots

– Downtime during recovery

– ** In some scenarios you can use event tables to keep intermediate state
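The trade-off of Option 2 can be shown with a toy processor that keeps per-key counts. This is only a sketch of the idea, assuming the whole state fits in one dictionary; `CountingProcessor` is hypothetical and says nothing about how WSO2 CEP persists its state.

```python
import copy


class CountingProcessor:
    """Toy stateful processor: counts events per key."""

    def __init__(self):
        self.counts = {}

    def on_event(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1

    def snapshot(self):
        # Copy the state so later events do not alter the snapshot.
        return copy.deepcopy(self.counts)

    def restore(self, snapshot):
        self.counts = copy.deepcopy(snapshot)
```

If a node fails after `snapshot()` and a replacement calls `restore()`, every event processed after the snapshot is missing from the restored counts, which is the loss the slide warns about.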

WSO2 CEP 4.0

• Apache Storm integration (to make WSO2 CEP highly scalable)

• Rewrite of Siddhi

– Single language for scalable and single node deployment

– Achieve maximum parallelism

• Geofencing support

– With management dashboard

• Time series and regression support

• Natural language & sentiment analysis support

• Integration with machine learning models (PMML models)

WSO2 CEP 4.0 - Milestone 1 released

Pack: http://svn.wso2.org/repos/wso2/people/mohan/CEP4.0.0-M1/wso2cep-4.0.0-M1.zip

Docs: https://docs.wso2.com/display/CEP400

Scalable WSO2 CEP Deployment

from CEP 4.0…

https://docs.wso2.com/display/CEP400/Clustered+Deployment

Geo Dashboard

With configurable alerting & monitoring capabilities.

http://wso2.com/library/articles/2015/01/article-geo-spatial-data-analysis-using-wso2-complex-event-processor-0/

Natural Language Processing

Understanding sentences & analyzing sentiments.

● Uses Stanford NLP.

● Adaptors for UIMA are also available.

https://github.com/wso2-gpl/siddhi/tree/master/siddhi-extensions/nlp

NLP Extensions

● findNameEntityType(entityType:string, groupSuccessiveEntities:boolean, text:string)

Extracts nouns in the text that match a predefined entity type such as PERSON, LOCATION, DATE, etc.

● findNameEntityTypeViaDictionary(entityType:string, dictionaryFilePath:string, text:string)

Extracts all matches in the text for entries defined in the dictionary XML file under the given entity type.

● findRelationshipByRegex(regex:string, text:string)

Extracts (subject, object, verb) relationships from the text that match the given regular expression.

● findRelationshipByVerb(verb:string, text:string)

Extracts (subject, object, verb) relationships from the text that match any form of the verb.

● findTokensRegexPattern(regex, text)

Extracts phrases that match the given NLP regular expression pattern.

● findSemgrexPattern(regex, text)

Extracts words that match the given grammatical relationship regular expression pattern.

Machine Learning

Using R, PMML Models for real-time predictive analysis

http://wso2.com/library/tutorials/2014/08/tutorial-implementing-a-wso2-cep-extension-to-run-machine-learning-models-written-in-pmml-format/

http://wso2.com/library/articles/2014/11/article-real-time-intruder-detection-with-r-pmml-and-wso2-cep/

Case study: Smart Energy

• DEBS (Distributed Event Based Systems) academic conference 2014, yearly event processing challenge

• Smart home electricity data: 2000 sensors, 40 houses, 4 billion events

• The WSO2 CEP based solution was one of the four finalists (others: Dresden University of Technology and Fraunhofer Institute (Germany), and Imperial College London)

• We posted the fastest single-node solution measured (400K events/sec) and close to one million in distributed throughput.

Case study: Realtime Soccer Analytics

From DEBS 2013 …

http://www.slideshare.net/hemapani/analyzing-a-soccer-game-with-wso2-cep

Siddhi Query: Pattern

● Filters or transformations (process a single event)

from Ball[v>10] select .. insert into ..

● Windows + aggregation (track a window of events: time, length)

from Ball#window.time(30s) select avg(v) ..

● Joins (join two event streams into one)

from Ball#window.time(30s) as b join Players as p on p.v < b.v

● Patterns (state machine implementation)

from Ball[v>10], Ball[v<10]*, Ball[v>10] select ..

● Event tables (map a database as an event stream)

define table HitV (v double) using .. db info ..

Running Stats

partition with (id of Players)
begin
    from s = Players[v <= 1 or v > 11],
         t = Players[v > 1 and v <= 11]+,
         e = Players[v <= 1 or v > 11]
    select s.ts as tsStart, e.ts as tsStop,
        s.id as playerId, "trot" as intensity,
        t[0].v as instantSpeed,
        (e.ts - s.ts)/1000000000 as unitPeriod
    insert into RunningStats ;
end;
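The sequence above looks for a run of one or more "trot" speed readings bracketed by non-trot readings, separately for each player. The Python sketch below tracks that per player. Two things are assumptions rather than something the slide states: the closing event of one run is reused as the possible start of the next, and matching continues indefinitely instead of stopping after the first match. `TrotDetector` is an illustrative name.

```python
def is_trot(v):
    return 1 < v <= 11


class TrotDetector:
    """Per-player sequence: non-trot event, one or more trot events, non-trot event."""

    def __init__(self):
        self.state = {}  # player id -> (start event or None, list of trot events)

    def on_event(self, player_id, ts, v):
        start, trots = self.state.get(player_id, (None, []))
        result = None
        if is_trot(v):
            if start is not None:
                trots.append((ts, v))  # extend the current run (t+)
            self.state[player_id] = (start, trots)
        else:
            if start is not None and trots:
                result = {
                    "tsStart": start[0],
                    "tsStop": ts,
                    "playerId": player_id,
                    "intensity": "trot",
                    "instantSpeed": trots[0][1],  # t[0].v
                    "unitPeriod": (ts - start[0]) / 1000000000,
                }
            # Assumption: this boundary event may start the next run.
            self.state[player_id] = ((ts, v), [])
        return result
```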

Detect kicks & Shot on Goals

Detect kicks on the ball, calculate the direction after 1 m, and keep giving updates as long as the ball is heading in the right direction

Questions?

Contact us!