ANOMALY DETECTION ON STREAMING DATA
WHEN BIG DATA CHALLENGES MEET MACHINE LEARNING
PARIS BIG DATA MARCH 12TH, 2019
SPORT, CINEMA, SERIES & MORE - 16M SUBSCRIBERS - €5.2B REVENUE
40 data experts: within a group of 500 digital experts (Ekino Group)
Founded in 2009: more than 100 projects delivered
International: Paris - London - Singapore
Mathematics DNA: founded by 2 mathematicians incl. a Fields Medal holder
« Since 2009, we have helped our clients to model both their strategic and operational challenges, and to solve them with tailored solutions using data and AI. »
EXPERTISES
Strategy - Solutions - Foundations - Scale
VALUES
Excellence - Integrity - Enthusiasm - Agility
MFG Labs / CANAL+
STREAM PROCESSING
EVENTS
- event type: start, stop, ...
- service: SVoD, live TV, ...
- timestamp
- app version
- platform: iOS, Android, PC, ...
140M EVENTS / DAY
ANOMALY DETECTION IN HOSTILE ENVIRONMENT
- Poorly defined logs semantics
- Streaming environment
- What is normality?
DATA ISSUES
- Regressions
  - Missing events
  - Repeated events
  - Changed event data
- Incidents
  - Partial data loss
  - Total data loss
PROGRESSIVE DEPLOYMENT DOESN'T REVEAL ISSUES
MVP & ITERATE
EXPERIMENT
1. Random Cut Forest - AWS Kinesis
2. DeepAR - AWS SageMaker
3. Prophet - Facebook
METHODOLOGY
- KPI definition & modelling
- Prototyping
- Dry run
- Data collection & analysis
- Business understanding
INSIGHT #1 – FREQUENT RELEASES & MULTIPLE VERSIONS
Figure: panels in log scale and linear scale; axis: App version
INSIGHT #2 – VERSION-SPECIFIC ANOMALIES
iOS, live TV, forwardButtonPressed, hour resolution
iOS, playerError, 2.3.5
INSIGHT #3 – #RECORDS / #DEVICES
Hour resolution
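The indicator named on this slide, number of records divided by number of devices at hour resolution, can be sketched as follows. This is a minimal illustration, not the team's implementation: the `device_id` field and the dictionary layout of an event are assumptions, since the slides only list event type, service, timestamp, app version and platform.

```python
from collections import defaultdict
from datetime import datetime


def records_per_device(events):
    """Return {hour: number of records / number of distinct devices}.

    Each event is assumed to be a dict with a `timestamp` (datetime) and a
    hypothetical `device_id` field.
    """
    records = defaultdict(int)
    devices = defaultdict(set)
    for event in events:
        # Truncate the timestamp to the hour to get hour resolution.
        hour = event["timestamp"].replace(minute=0, second=0, microsecond=0)
        records[hour] += 1
        devices[hour].add(event["device_id"])
    return {hour: records[hour] / len(devices[hour]) for hour in records}


# Made-up sample: three records from two devices at 10h, one record at 11h.
sample_events = [
    {"timestamp": datetime(2019, 3, 12, 10, 5), "device_id": "a"},
    {"timestamp": datetime(2019, 3, 12, 10, 20), "device_id": "a"},
    {"timestamp": datetime(2019, 3, 12, 10, 40), "device_id": "b"},
    {"timestamp": datetime(2019, 3, 12, 11, 15), "device_id": "b"},
]
ratios = records_per_device(sample_events)
```

A ratio of this kind is less sensitive to audience size than a raw record count, so a repeated-event regression would raise it even when the number of active devices is stable.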
ANALYSIS – KEY LEARNINGS
• Compare time series with similarity tests
• Create a complete data dictionary
• Explore the data and learn patterns
• Define an indicator to identify anomalies
• Try different aggregation levels
ANOMALY SCORING – ROBUST RANDOM CUT FOREST
1. Build a forest of binary trees, each on a subset of time series
2. Use the forest to compute an anomaly score for a new point
BUILDING A TREE
• Pick a dimension and do a random cut
• Repeat until all points are isolated
SCORING A POINT
• Inject a new point in each tree
• Measure how much the forest changes, i.e. how shallow the new cuts are
Figure labels: Random cut, S1, S2, A
http://proceedings.mlr.press/v48/guha16.pdf
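The idea of random cuts and shallow isolation can be illustrated with a small pure-Python sketch. It is a simplification under stated assumptions, not the algorithm of the cited paper nor the AWS implementation: instead of measuring how much each tree changes, it only records how many random cuts are needed to isolate the new point, averaged over several subsamples. The function names, the number of trees and the sample size are all hypothetical.

```python
import random


def isolation_depth(points, target, rng):
    """Number of random cuts needed to isolate `target` among `points`."""
    depth = 0
    while len(points) > 1:
        dims = range(len(target))
        spans = [max(p[d] for p in points) - min(p[d] for p in points) for d in dims]
        if sum(spans) == 0:  # only duplicates of the target remain
            break
        # Pick a dimension with probability proportional to its range.
        d = rng.choices(list(dims), weights=spans)[0]
        low = min(p[d] for p in points)
        cut = rng.uniform(low, low + spans[d])
        # Keep only the points on the same side of the cut as the target.
        points = [p for p in points if (p[d] <= cut) == (target[d] <= cut)]
        depth += 1
    return depth


def mean_isolation_depth(history, point, n_trees=50, sample_size=64, seed=0):
    """Average isolation depth of `point`; a lower value is more anomalous."""
    rng = random.Random(seed)
    size = min(sample_size, len(history))
    depths = []
    for _ in range(n_trees):
        # Each "tree" sees a random subset of the history plus the new point.
        sample = rng.sample(history, size) + [point]
        depths.append(isolation_depth(sample, point, rng))
    return sum(depths) / len(depths)


# Made-up data: a cluster around (50, 50), one typical and one outlying point.
data_rng = random.Random(42)
history = [(data_rng.gauss(50, 5), data_rng.gauss(50, 5)) for _ in range(200)]
normal_depth = mean_isolation_depth(history, (50.0, 50.0))
outlier_depth = mean_isolation_depth(history, (120.0, 120.0))
```

In this sketch an outlying point tends to be separated by the first cut or two, while a point inside the cluster needs many more, which is the intuition behind "how shallow the new cuts are".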
RESULTS
Android, VoD, hour resolution
• Peak is correctly detected as an anomaly
• Drops are also detected
FORECASTING – DEEP AR, PROPHET
1. Train a model to predict the time series accurately
2. Compare the real value with the prediction to decide if it is an anomaly
MULTIVARIATE APPROACH: DeepAR (AWS)
• Use all time series as inputs
• Add exogenous variables
• Train an LSTM network
UNIVARIATE APPROACH: Prophet (Facebook)
• Build a model for each time series
• Predict with an additive regression
• Handles seasonality and trends
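The two-step principle (predict, then compare the real value with the prediction) can be sketched without either library. The forecaster below is a plain seasonal average, swapped in as a stand-in for DeepAR or Prophet; the 24-hour period, the threshold of three standard deviations and all function names are assumptions made for illustration.

```python
import statistics


def seasonal_forecast(history, period=24):
    """Forecast the next value as the mean of past values at the same phase."""
    phase = len(history) % period
    return statistics.fmean(history[phase::period])


def is_anomaly(history, actual, period=24, threshold=3.0):
    """Flag `actual` when it is far from the forecast, relative to past errors."""
    if len(history) < 2 * period:
        raise ValueError("need at least two full periods of history")
    forecast = seasonal_forecast(history, period)
    # Errors the same forecaster would have made on the known history.
    residuals = [
        history[t] - seasonal_forecast(history[:t], period)
        for t in range(period, len(history))
    ]
    scale = statistics.pstdev(residuals) or 1e-9  # avoid a zero threshold
    return abs(actual - forecast) > threshold * scale


# Made-up hourly series: one week with an evening bump and small variations.
week = [100 + 50 * (h % 24 >= 18) + (h % 5) for h in range(24 * 7)]
expected_value_flagged = is_anomaly(week, 103)
drop_flagged = is_anomaly(week, 10)
```

A more accurate forecaster narrows the residuals, which makes the same threshold more sensitive; that is one way to read the interest in comparing several models.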
RESULTS
Android, VoD, hour resolution
• Correct prediction of daily patterns
• Better accuracy and smoothness with Prophet
• More difficult to implement than RCF on AWS
DEPLOYMENT & COLD START
• Just a SQL function!
• Can be called from Java (Flink)
• Limited memory – sliding window
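The interaction between a bounded sliding window and cold start can be sketched as below. This is an assumption-laden illustration, not the deployed job: the scorer is a simple z-score standing in for the Random Cut Forest function, and the class name, window size and warm-up length are hypothetical.

```python
import statistics
from collections import deque


class WindowedScorer:
    """Score values against a bounded sliding window, silent during warm-up."""

    def __init__(self, window=48, warmup=24):
        self.values = deque(maxlen=window)  # bounded memory: old values drop out
        self.warmup = warmup

    def update(self, value):
        """Return an anomaly score, or None while the window is still cold."""
        score = None
        if len(self.values) >= self.warmup:
            mean = statistics.fmean(self.values)
            spread = statistics.pstdev(self.values) or 1e-9
            score = abs(value - mean) / spread
        self.values.append(value)
        return score


# Made-up stream: three warm-up values, one ordinary value, one spike.
scorer = WindowedScorer(window=5, warmup=3)
scores = [scorer.update(v) for v in [10, 11, 10, 11, 50]]
```

Returning no score during warm-up avoids raising alerts right after a restart, when the window does not yet describe normal behaviour.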
KEY TAKEAWAYS
1. ENGINEER A FEATURE TO IDENTIFY ANOMALIES
2. ANTICIPATE COLD START IN PRODUCTION
3. ORGANIZE ANOMALY MANAGEMENT
THANK YOU!
QUESTIONS?