
Fast Data Mining: Real Time Knowledge Discovery for Predictive Decision Making


Fast Data is a different approach to Big Data: it manages large quantities of “in-flight” data to help organizations get a jump on business-critical decisions. The difference between Big Data and Fast Data is comparable to the difference between waiting to download a movie from an online store and playing a DVD instantly. Data Mining is the process of extracting information from a data set and transforming it into an understandable structure in order to deliver predictive, advanced analytics to enterprises and operational environments. The combination of Fast Data and Data Mining is changing the “rules”.


Page 1: Fast Data Mining: Real Time Knowledge Discovery for Predictive Decision Making

Copyright © 2013, Oracle and/or its affiliates. All rights reserved.

Fast Data Mining: Real Time Knowledge Discovery for Predictive Decision Making

Nino Guarnacci

Page 2

Data Explosion

Web & social networks experienced it first…

Infographic by Go-gulf.com

Page 3

… but enterprises are now facing it too

Utilities deploying smart meters? 200x the information flowing to the data center!

• Services and web transaction data (to refine recommendations, detect trends etc.)

• “Sensor” data: GPS in mobile phones, RFIDs, NFC, smart meters, etc.
• Log file monitoring and analysis
• Security monitoring

Page 4

93% believe their organization is losing revenue as a result of not being able to fully leverage information

67% / 89%

executives who say drawing intelligence from data is top priority

executives who would grade themselves C or lower in preparedness

Source: Oracle Research Study - From Overload to Impact: An Industry Scorecard on Big Data Business Challenges, July 2012

Page 5

Obstacles to Managing Data Faster: the Latency Gap
While Ensuring Accuracy, Efficiency, and Scale

[Figure: curve of Business Value over time. Labels: Business event, Data captured, Analysis completed, Action taken, Action Time, Fragmented event entities, The Gap]

Source: Richard Hackathorn’s Components of Action Time


Page 7

What is Fast Data?
Turning High Velocity Data into Value

▪ It’s about getting more from in-flight data
▪ It’s about faster action, faster insights
▪ It’s about running your business in real-time

Page 8

Oracle Fast Data Approach
Filter, Move, Transform, Analyze, and Act at High Velocity

[Diagram labels: FILTER & CORRELATE, MOVE & TRANSFORM, ANALYZE, ACT]

Page 9

Oracle Fast Data Approach
Filter, Move, Transform, Analyze, and Act at High Velocity

FILTER & CORRELATE

[Diagram labels: In-Memory Data Grid, Network Status, Real Time Streams, Information]

• Multiple parallel streams: JMS, files, Coherence, DB, ...
• Different object types: text, Java objects, ...
• High throughput for data aggregation and event querying

The Coherence Data Grid holds the data and computes in parallel

Page 10

Oracle Fast Data Approach
Filter, Move, Transform, Analyze, and Act at High Velocity

EPN (Event Processing Network) Elements

[Diagram labels: HTTP Pub/Sub, JSON, Event Streams, Adapter, Channel, Processor, Cache, POJO, Event-type]

Page 11

Oracle Event Processing
SLA Detection: Pattern Matching

<TRACE>
  <ID_TRACED_ENTITY>HH310665064IT</ID_TRACED_ENTITY>
  <TRACED_ENTITY>PACCO</TRACED_ENTITY>
  <WHAT_HAPPENED>ESI_SDA</WHAT_HAPPENED>
  <WHEN_HAPPENED>2013-09-12</WHEN_HAPPENED>
  <WHERE_HAPPENED_DETAIL>
    <OFFICE>
      <WHERE_DESCRIPTION>MONZA</WHERE_DESCRIPTION>
      <WHERE_ID>MZ</WHERE_ID>
    </OFFICE>
  </WHERE_HAPPENED_DETAIL>
</TRACE>

SELECT M.SLA_VIOLATED
FROM TRACE IN CHANNEL, ENTITIES, SPATIAL CONTEXT
MATCH_RECOGNIZE (
  MEASURES SLA_VIOLATED
  PATTERN (A B)
  DEFINE
    A (DELIVERY TIME - NOW) < 2 DAYS
    B DISTANCE BETWEEN (LOCATION, DESTINATION) > 600 KM
) as M

[Diagram labels: STREAMS, DATABASE, SPATIAL, TIME WINDOW, Match Pattern]
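The SLA condition in the query above can be sketched in plain Python. This is only an illustration under stated assumptions, not Oracle Event Processing code: the `Trace` fields, the `sla_at_risk` name, and the use of the haversine formula for distance are all hypothetical choices. It also simplifies the slide's `PATTERN (A B)`, which matches two consecutive events, by checking both conditions on a single event.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class Trace:
    """One tracking event; field names are illustrative, not the slide's schema."""
    parcel_id: str
    location: tuple       # (latitude, longitude) of the last scan
    destination: tuple    # (latitude, longitude) of the delivery address
    delivery_due: datetime


def distance_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def sla_at_risk(trace, now, max_days=2, max_km=600.0):
    """True when fewer than max_days remain and the parcel is still more than max_km away."""
    time_left = trace.delivery_due - now
    too_little_time = time_left < timedelta(days=max_days)
    too_far = distance_km(trace.location, trace.destination) > max_km
    return too_little_time and too_far
```

A parcel scanned in Monza that is due within a day at a destination roughly 900 km away would be flagged, while the same parcel with a week left would not.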

Page 12

Oracle Event Processing
SLA Detection: Filtering & Correlation

ISTREAM(
  SELECT COUNT(*), START_OFFICE, WHERE_HAPPENED, LATITUDE, LONGITUDE
  FROM SPATIAL_CONTEXT SLA_VIOLATED_OUT_CHANNEL
  PARTITION BY START_OFFICE, WHERE_HAPPENED WITHIN 1 HOUR
  GROUP BY START_OFFICE
  HAVING COUNT(*) > 5
)

▪ Aggregate and correlate the received filter events
▪ Partition probable SLA violations by trip path
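The windowed correlation above can be sketched as a sliding-window counter. This is a minimal illustration, not the engine's implementation: the `ViolationCorrelator` class and its method names are hypothetical, it assumes events arrive in timestamp order, and it keys the count on both start office and current location (the trip path), whereas the slide's query groups by start office only.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class ViolationCorrelator:
    """Counts SLA-violation events per trip path over a sliding time window."""

    def __init__(self, window=timedelta(hours=1), threshold=5):
        self.window = window
        self.threshold = threshold
        # (start_office, where_happened) -> timestamps still inside the window
        self._events = defaultdict(deque)

    def add(self, start_office, where_happened, timestamp):
        """Record one violation; return the path's count if it exceeds the threshold, else None."""
        times = self._events[(start_office, where_happened)]
        times.append(timestamp)
        # Drop timestamps that have slid out of the window.
        while timestamp - times[0] >= self.window:
            times.popleft()
        return len(times) if len(times) > self.threshold else None
```

With the default settings, the sixth violation on the same path within one hour is the first to produce an output, mirroring `HAVING COUNT(*) > 5`.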


Page 13


Oracle Fast Data Approach
Filter, Move, Transform, Analyze, and Act at High Velocity

Real-time stream analysis: correlate events from different sources, then manage and use them as relational data over sliding windows.

What is Oracle Data Mining?

Automatically sifting through large amounts of data to find previously hidden patterns, discover valuable new insights, and make predictions

• Identify the most important factors (Attribute Importance)
• Predict customer behavior (Classification)
• Predict or estimate a value (Regression)
• Find profiles of targeted people or items (Decision Trees)
• Segment a population (Clustering)
• Find fraudulent or “rare events” (Anomaly Detection)
• Determine co-occurring items in a “basket” (Associations)

Page 14

Data Mining Provides Better Information, Valuable Insights and Predictions

[Chart: Cell Phone Churners vs. Loyal Customers, plotted by Income and Customer Months]

Insight & Prediction

Segment #1:

IF CUST_MO > 14 AND INCOME < $90K, THEN Prediction = Cell Phone Churner, Confidence = 100%, Support = 8/39

Segment #3:

IF CUST_MO > 7 AND INCOME < $175K, THEN Prediction = Cell Phone Churner, Confidence = 83%, Support = 6/39
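The two segment rules above can be written as a small function. This is a sketch only: the `predict_churn` name is hypothetical, and applying the rules in the order listed (first match wins) is an assumption, since the slide does not say how overlapping segments are resolved. Support is read here as the share of the 39 sample customers that fall in the segment.

```python
def predict_churn(cust_months, income):
    """Return (prediction, confidence) from the first matching segment rule, else None."""
    # Segment #1: CUST_MO > 14 AND INCOME < $90K -> churner, confidence 100%, support 8/39
    if cust_months > 14 and income < 90_000:
        return ("Cell Phone Churner", 1.00)
    # Segment #3: CUST_MO > 7 AND INCOME < $175K -> churner, confidence 83%, support 6/39
    if cust_months > 7 and income < 175_000:
        return ("Cell Phone Churner", 0.83)
    # No listed segment applies; the slide shows no rule for these customers.
    return None
```

For example, `predict_churn(20, 50_000)` falls in Segment #1, while `predict_churn(10, 120_000)` falls in Segment #3.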

Page 15

A Real Fraud Example
My credit card statement: can you see the fraud?

Date     Time      Category  Merchant     Amount
May 22   1:14 PM   FOOD      Monaco Café  $127.38
May 22   7:32 PM   WINE      Wine Bistro  $28.00
…
June 14  2:05 PM   MISC      Mobil Mart   $75.00
June 14  2:06 PM   MISC      Mobil Mart   $75.00
June 15  11:48 AM  MISC      Mobil Mart   $75.00
June 15  11:49 AM  MISC      Mobil Mart   $75.00
May 28   6:31 PM   WINE      Acton Shop   $31.00
May 29   8:39 PM   FOOD      Crossroads   $128.14
June 16  11:48 AM  MISC      Mobil Mart   $75.00
June 16  11:49 AM  MISC      Mobil Mart   $75.00

• Monaco?
• Gas station?
• All same $75 amount?
• Pairs of $75?
• Total purchases exceed time period average
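Two of the questions raised about this statement (“All same $75 amount?” and “Pairs of $75?”) can be sketched as simple checks. This is a hand-written illustration, not how Oracle Data Mining detects anomalies: the function names and thresholds are hypothetical, the year is an assumption because the slide gives none, and only the rows visible on the slide are included.

```python
from collections import Counter, namedtuple
from datetime import datetime, timedelta

Txn = namedtuple("Txn", "when category merchant amount")

YEAR = 2013  # assumption: the statement shows no year
STATEMENT = [
    Txn(datetime(YEAR, 5, 22, 13, 14), "FOOD", "Monaco Café", 127.38),
    Txn(datetime(YEAR, 5, 22, 19, 32), "WINE", "Wine Bistro", 28.00),
    Txn(datetime(YEAR, 5, 28, 18, 31), "WINE", "Acton Shop", 31.00),
    Txn(datetime(YEAR, 5, 29, 20, 39), "FOOD", "Crossroads", 128.14),
    Txn(datetime(YEAR, 6, 14, 14, 5), "MISC", "Mobil Mart", 75.00),
    Txn(datetime(YEAR, 6, 14, 14, 6), "MISC", "Mobil Mart", 75.00),
    Txn(datetime(YEAR, 6, 15, 11, 48), "MISC", "Mobil Mart", 75.00),
    Txn(datetime(YEAR, 6, 15, 11, 49), "MISC", "Mobil Mart", 75.00),
    Txn(datetime(YEAR, 6, 16, 11, 48), "MISC", "Mobil Mart", 75.00),
    Txn(datetime(YEAR, 6, 16, 11, 49), "MISC", "Mobil Mart", 75.00),
]


def repeated_charges(txns, min_count=3):
    """Map (merchant, amount) -> count for charges that recur at least min_count times."""
    counts = Counter((t.merchant, t.amount) for t in txns)
    return {key: n for key, n in counts.items() if n >= min_count}


def rapid_pairs(txns, max_gap=timedelta(minutes=2)):
    """Consecutive identical charges (same merchant and amount) made within max_gap."""
    ordered = sorted(txns, key=lambda t: t.when)
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if (a.merchant, a.amount) == (b.merchant, b.amount) and b.when - a.when <= max_gap
    ]
```

Run on the rows above, both checks single out the Mobil Mart charges: one recurring $75.00 amount, made in pairs one minute apart on three consecutive days.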

Page 16

“Essentially, all models are wrong, but some are useful.”

- George Box (One of the most influential statisticians of the 20th century and a pioneer in the areas of quality control, time series analysis, design of experiments and Bayesian inference.)

Page 17

You Can Think of It Like This…

Traditional SQL
• “Human-driven” queries
• Domain expertise
• Any “rules” must be defined and managed
• SQL queries: SELECT, DISTINCT, AGGREGATE, WHERE, AND, OR, GROUP BY, ORDER BY, RANK

+

Oracle Data Mining
• Automated knowledge discovery, model building and deployment
• Domain expertise to assemble the “right” data to mine
• ODM “verbs”: PREDICT, DETECT, CLUSTER, CLASSIFY, REGRESS, PROFILE, IDENTIFY FACTORS, ASSOCIATE

Page 18

Real-time Prediction for a Customer

• On-the-fly, single-record apply with new data (e.g. from a call center)

SELECT prediction_probability(CLAS_DT_5_2, 'Yes'
         USING 7800 AS bank_funds,
               125 AS checking_amount,
               20 AS credit_balance,
               55 AS age,
               'Married' AS marital_status,
               250 AS MONEY_MONTLY_OVERDRAWN,
               1 AS house_ownership)
FROM dual;

[Diagram labels: Web, Branch, CRM, Call, Email, Social, Mobile, Get, ECM, BI]

Page 19

Predictive and Recommendation Analytics

• Combine Real Time Event Streaming Data Technologies with the industry-leading Oracle Historical Data Mining:
  – Oracle Data Mining
    • Rich set of Algorithms for Data Mining
    • Predict Customer Behavior
    • Find Profiles of Targeted People or Items, and determine important relationships
    • Immediately Predict Trends and Themes for Data in motion
    • Respond to Prevent Business Threats and take Advantage of Opportunities

Real Time Data Mining Modeling with Streaming Events

Page 20

Source: http://www.sail-world.com/USA/Americas-Cup:-Oracle-Data-Mining-supports-crew-and-BMW-ORACLE-Racing/68834

Acting Oracle Data Mining: Technology Behind the America’s Cup Win

• “The USA holds 250 sensors to collect raw data: pressure sensors on the wing; angle sensors on the adjustable trailing edge of the wing sail to monitor the effectiveness of each adjustment, allowing the crew to ascertain the amount of lift it’s generating; and fiber-optic strain sensors on the mast and wing to allow maximum thrust without over bending them.

• But collecting data was only the beginning. ORACLE Racing also had to manage that data, analyze it, and present useful results…”

Page 21

Fast Data Mining Demo: Fraud Prediction in action…

▪ Extract knowledge starting from a CSV file
▪ Execute anomaly detection mining on stored data
▪ Put in place a real-time event processing flow
▪ Consume events from the in-memory data grid
▪ Instantly obtain fraud predictions from streaming data
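The demo flow can be approximated end to end in a few lines. This is a sketch with deliberate stand-ins, not the demo itself: a z-score threshold replaces Oracle Data Mining's anomaly detection, a `deque` replaces the in-memory data grid, and the CSV contents, field names, and function names are all hypothetical.

```python
import csv
import io
import statistics
from collections import deque

# Hypothetical historical data; in the demo this would come from a CSV file.
HISTORY_CSV = """amount
20.00
25.00
30.00
35.00
40.00
"""


def fit(csv_text):
    """'Mine' the stored data: learn the mean and standard deviation of past amounts."""
    amounts = [float(row["amount"]) for row in csv.DictReader(io.StringIO(csv_text))]
    return statistics.mean(amounts), statistics.stdev(amounts)


def is_anomaly(model, amount, z_threshold=3.0):
    """Flag an amount that lies more than z_threshold standard deviations from the mean."""
    mean, stdev = model
    return abs(amount - mean) / stdev > z_threshold


def score_stream(model, events):
    """Consume events from the queue (the data-grid stand-in) and score each as it arrives."""
    while events:
        event = events.popleft()
        yield event, is_anomaly(model, event["amount"])
```

The point of the split is the one the deck makes: the model is built once from historical data, then applied to each in-flight event as it arrives, so the prediction is available at event time rather than after a batch run.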

Page 22

Q & A

Page 23

Thanks!

Nino Guarnacci