ANOMALY DETECTION ON STREAMING DATA: WHEN BIG DATA CHALLENGES MEET MACHINE LEARNING (PARIS BIG DATA, MARCH 12TH, 2019)


ANOMALY DETECTION ON STREAMING DATA

WHEN BIG DATA CHALLENGES MEET MACHINE LEARNING

PARIS BIG DATA MARCH 12TH, 2019


SPORT, CINEMA, SERIES & MORE

16M SUBSCRIBERS

€5.2B REVENUE


40 data experts: within a group of 500 digital experts (Ekino Group)

Founded in 2009: more than 100 projects delivered

International: Paris - London - Singapore

Mathematics DNA: founded by 2 mathematicians, incl. a Fields Medal holder

« Since 2009, we have helped our clients to model both their strategic and operational challenges, and to solve them with tailored solutions using data and AI. »

EXPERTISES: Strategy, Solutions, Foundations, Scale

VALUES: Excellence, Integrity, Enthusiasm, Agility


MFG Labs / CANAL+


STREAM PROCESSING


EVENTS

- event type: start, stop, ...
- service: SVoD, live TV, ...
- timestamp
- app version
- platform: iOS, Android, PC, ...

140M EVENTS / DAY
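
To make the event structure concrete, here is a minimal sketch of how such events could be bucketed into hourly time series. The field names follow the slide, but the `Event` class and the `hourly_counts` function are hypothetical and are not taken from the actual CANAL+ pipeline.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; fields follow the slide, names are assumed."""
    event_type: str       # e.g. "start", "stop"
    service: str          # e.g. "SVoD", "live TV"
    timestamp: datetime
    app_version: str
    platform: str         # e.g. "iOS", "Android", "PC"


def hourly_counts(events):
    """Count events per (platform, service, event_type, hour) bucket."""
    counts = Counter()
    for e in events:
        # Truncate the timestamp to the start of its hour.
        hour = e.timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(e.platform, e.service, e.event_type, hour)] += 1
    return counts


events = [
    Event("start", "SVoD", datetime(2019, 3, 12, 20, 5), "2.3.5", "iOS"),
    Event("start", "SVoD", datetime(2019, 3, 12, 20, 40), "2.3.5", "iOS"),
    Event("stop", "SVoD", datetime(2019, 3, 12, 21, 10), "2.3.5", "iOS"),
]
series = hourly_counts(events)
```

Adding `app_version` to the bucket key would give one series per version, at the cost of many more, sparser series.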


ANOMALY DETECTION IN A HOSTILE ENVIRONMENT

- POORLY DEFINED LOG SEMANTICS
- STREAMING ENVIRONMENT
- WHAT IS NORMALITY?


DATA ISSUES

- Regressions
  - Missing events
  - Repeated events
  - Changed event data

- Incidents
  - Partial data loss
  - Total data loss

PROGRESSIVE DEPLOYMENT DOESN'T REVEAL ISSUES

MVP & ITERATE

EXPERIMENT

?


EXPERIMENT

1. Random Cut Forest - AWS Kinesis

2. DeepAR - AWS SageMaker

3. Prophet - Facebook


METHODOLOGY

KPI DEFINITION & MODELLING

PROTOTYPING

DRY RUN

DATA COLLECTION & ANALYSIS

BUSINESS UNDERSTANDING


INSIGHT #1 – FREQUENT RELEASES & MULTIPLE VERSIONS


INSIGHT #2 – VERSION-SPECIFIC ANOMALIES

[Figure: two panels (log scale and linear scale), one curve per app version. iOS, live TV, forwardButtonPressed, hour resolution]


INSIGHT #3 – #RECORDS / #DEVICES

[Figure: iOS, playerError, version 2.3.5, hour resolution]
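
The slide gives only the title of this indicator. Reading it as a ratio (number of records divided by number of distinct devices in each hour) is an assumption, and the sketch below illustrates that reading. The function name `records_per_device` and the `(hour, device_id)` input shape are hypothetical.

```python
from collections import defaultdict


def records_per_device(records):
    """Return {hour: number of records / number of distinct devices}.

    `records` is an iterable of (hour, device_id) pairs; both names are
    assumptions made for this sketch.
    """
    n_records = defaultdict(int)
    devices = defaultdict(set)
    for hour, device_id in records:
        n_records[hour] += 1
        devices[hour].add(device_id)
    return {h: n_records[h] / len(devices[h]) for h in n_records}


# Hour 1: three devices send one record each.
# Hour 2: one device repeats the same event, which inflates the ratio.
records = [(1, "a"), (1, "b"), (1, "c"),
           (2, "a"), (2, "a"), (2, "a"), (2, "a"), (2, "b")]
ratio = records_per_device(records)
```

A ratio of this kind is less sensitive to audience size than a raw event count, so a repeated-event regression stands out even when traffic is low.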


ANALYSIS – KEY LEARNINGS

- Compare time series with similarity tests
- Create a complete data dictionary
- Explore the data and learn patterns
- Define an indicator to identify anomalies
- Try different aggregation levels


ANOMALY SCORING – ROBUST RANDOM CUT FOREST

1. Build a forest of binary trees, each on a subset of time series
2. Use the forest to compute an anomaly score for a new point

BUILDING A TREE

• Pick a dimension and do a random cut
• Repeat until all points are isolated

SCORING A POINT

• Inject a new point in each tree
• Measure how much the forest changes, i.e. how shallow the new cuts are

[Figure: a random cut separating S1 and S2, with point A]

http://proceedings.mlr.press/v48/guha16.pdf
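
The two steps on this slide can be sketched in a few lines. This is a deliberately simplified illustration, not the algorithm from the paper linked above and not the AWS Kinesis implementation. It uses the depth at which a new point is isolated as a proxy for "how shallow the new cuts are", whereas the paper scores a point by how much it displaces the others. The function names, the parameters (`n_trees`, `sample_size`, `seed`) and the synthetic data are all assumptions.

```python
import random


def isolation_depth(points, new_point, rng):
    """Number of random cuts needed to separate `new_point` from `points`."""
    depth = 0
    remaining = list(points)
    dims = list(range(len(new_point)))
    while remaining:
        box = remaining + [new_point]
        lows = [min(p[d] for p in box) for d in dims]
        spans = [max(p[d] for p in box) - lows[d] for d in dims]
        if sum(spans) == 0:
            break  # only exact duplicates of new_point remain
        # Pick a dimension with probability proportional to its range,
        # then cut uniformly inside that range.
        d = rng.choices(dims, weights=spans)[0]
        cut = rng.uniform(lows[d], lows[d] + spans[d])
        side = new_point[d] <= cut
        # Keep only the points that fall on the same side as new_point.
        remaining = [p for p in remaining if (p[d] <= cut) == side]
        depth += 1
    return depth


def anomaly_score(history, new_point, n_trees=100, sample_size=32, seed=0):
    """Higher score means the point is isolated by shallower cuts."""
    rng = random.Random(seed)
    depths = []
    for _ in range(n_trees):
        # Each "tree" sees its own random subset of the history.
        sample = rng.sample(history, min(sample_size, len(history)))
        depths.append(isolation_depth(sample, new_point, rng))
    mean_depth = sum(depths) / n_trees
    return 1.0 / (1.0 + mean_depth)


# Synthetic history: a 10 x 10 grid of points in [0, 9] x [0, 9].
history = [(float(i % 10), float(i // 10)) for i in range(100)]
normal_score = anomaly_score(history, (4.5, 4.5))
outlier_score = anomaly_score(history, (100.0, 100.0))
```

A point far from the grid is usually cut off by the very first cut, so its mean depth is small and its score is higher than that of a point in the middle of the data.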


RESULTS

[Figure: Android, VoD, hour resolution]

• Peak is correctly detected as an anomaly
• Drops are also detected


FORECASTING – DEEP AR, PROPHET

1. Train a model to predict the time series accurately
2. Compare the real value with the prediction to decide if it is an anomaly

MULTIVARIATE APPROACH: DeepAR (AWS)

• Use all time series as inputs
• Add exogenous variables
• Train an LSTM network

UNIVARIATE APPROACH: Prophet (Facebook)

• Build a model for each time series
• Predict with an additive regression
• Handles seasonality and trends
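
The forecast-then-compare idea can be illustrated without either library. The sketch below swaps in a much simpler model, a per-hour-of-day average, in place of DeepAR or Prophet. It only shows the two-step logic (fit a predictor, then flag values that fall too far from the prediction). The function names, the `n_sigmas` and `min_sigma` thresholds, and the synthetic data are assumptions.

```python
from statistics import mean, pstdev


def fit_hourly_profile(history):
    """Step 1: fit a seasonal baseline (mean and spread per hour of day).

    `history` is a list of (hour_of_day, value) pairs.
    """
    by_hour = {}
    for hour, value in history:
        by_hour.setdefault(hour, []).append(value)
    return {h: (mean(v), pstdev(v)) for h, v in by_hour.items()}


def is_anomaly(profile, hour, actual, n_sigmas=3.0, min_sigma=1.0):
    """Step 2: compare the real value with the forecast for that hour."""
    forecast, sigma = profile[hour]
    # min_sigma avoids a zero-width band when the history is very regular.
    return abs(actual - forecast) > n_sigmas * max(sigma, min_sigma)


# Three days of synthetic hourly counts: about 100 off-peak and
# about 500 during the evening peak (18:00 to 22:59).
history = [(h, (500 if 18 <= h < 23 else 100) + day)
           for day in range(3) for h in range(24)]
profile = fit_hourly_profile(history)

evening_ok = is_anomaly(profile, 20, 502)    # close to the usual peak
evening_drop = is_anomaly(profile, 20, 50)   # traffic collapsed at peak time
night_spike = is_anomaly(profile, 3, 500)    # peak-level traffic at 3 a.m.
```

Because the baseline is indexed by hour of day, the same value of 500 is normal in the evening and anomalous at night, which a single global threshold could not express.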


RESULTS

[Figure: Android, VoD, hour resolution]

• Correct prediction of daily patterns
• Better accuracy and smoothness with Prophet
• More difficult to implement than RCF on AWS


DEPLOYMENT & COLD START

• Just a SQL function!
• Can be called from Java (Flink)
• Limited memory – sliding window
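
To show what a bounded sliding window and a cold start mean in practice, here is a minimal sketch. It does not reproduce the SQL function mentioned on the slide: it swaps in a plain z-score over the window. The class name `SlidingWindowDetector` and its parameters are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class SlidingWindowDetector:
    """Hypothetical detector that keeps only the last `window` values."""

    def __init__(self, window=24, n_sigmas=3.0):
        # Bounded memory: the oldest value drops out when a new one arrives.
        self.values = deque(maxlen=window)
        self.n_sigmas = n_sigmas

    def update(self, value):
        """Return True or False once warmed up, None during the cold start."""
        if len(self.values) < self.values.maxlen:
            self.values.append(value)
            return None  # cold start: not enough history to judge
        mu, sigma = mean(self.values), pstdev(self.values)
        flagged = abs(value - mu) > self.n_sigmas * max(sigma, 1e-9)
        self.values.append(value)
        return flagged


detector = SlidingWindowDetector(window=4)
results = [detector.update(v) for v in [10, 11, 9, 10, 10, 50, 10]]
```

The first four calls return `None` because the window is still filling. In production, that warm-up period is the cold start to plan for, for example by preloading the window from historical data. Note that in this sketch a flagged value still enters the window and widens the band for the following points.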


KEY TAKEAWAYS

1. ENGINEER A FEATURE TO IDENTIFY ANOMALIES
2. ANTICIPATE COLD START IN PRODUCTION
3. ORGANIZE ANOMALY MANAGEMENT


THANK YOU!

QUESTIONS?