Talk IEEE RCIS 2013.
LIP6 · Université de Paris 1 · CRI
Monitoring User-System
Interactions through Graph-Based
Intrinsic Dynamics Analysis
Sebastien Heymann, Benedicte Le Grand
Emails: [email protected], [email protected]
May 30, 2013
Monitoring user-system interactions
What type of user-system interactions?
• user-invoked services in information systems
• social networks
• ...
What kind of monitoring?
• discovery
• conformance
• model improvement
Our ultimate goal: automatic and real-time anomaly detection.
Sebastien Heymann, Benedicte Le Grand — Monitoring User-System Interactions — May 30, 2013
Studied social network
GitHub interaction: code commit
GitHub interaction: bug report
Collected Dataset
[Figure: users (top row) linked to repositories (bottom row)]
Interactions examples
• commit code / merge repositories.
• open / close bug reports.
• comment on bug reports.
• edit the repository wiki.
"who contributes to which source code repository"
• 336,000 users and repositories monitored during 4 months.
• 2.2 million interactions recorded sequentially with timestamps.
Log trace sample
User, user, repository, event, timestamp
lukearmstrong, fuel, core, IssuesEvent, 1341420003
Try-Git, clarkeash, try git, CreateEvent, 1341420006
uGoMobi, jquery, jquery-mobile, IssuesEvent, 1341420009
jexp, neo4j, java-rest-binding, IssueCommentEvent, 1341420011
HosipLan, nette, nette, PullRequestEvent, 1341420152
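A minimal sketch of how such a trace could be turned into timestamped user-repository links. It assumes one comma-separated record per line with the five fields in the order shown above, and it assumes that the second and third fields together identify the repository. The names `Link` and `parse_trace` are hypothetical, not taken from the talk.

```python
from typing import NamedTuple


class Link(NamedTuple):
    timestamp: int    # Unix time in seconds, as in the sample
    user: str         # the acting user (first field)
    repository: str   # assumed to be "<second field>/<third field>"
    event: str        # event type, e.g. "IssuesEvent"


def parse_trace(lines):
    """Parse trace lines into Link records ordered by timestamp."""
    links = []
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 5:
            continue  # skip rows that do not have exactly five fields
        user, owner, name, event, timestamp = fields
        links.append(Link(int(timestamp), user, f"{owner}/{name}", event))
    links.sort(key=lambda link: link.timestamp)
    return links
```

Sorting by timestamp at parse time means later steps can treat the position of a link in the list as its rank in the sequence of interactions.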
Bipartite graph
[Figure: bipartite graph linking users to repositories]
⊤: users
⊥: repositories
Links appear over time
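[Figure: animation in which users, repositories and the links between them appear one after another]
• Detection of statistically abnormal link dynamics?
• Model of link dynamics?
• Link prediction?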
Methodology
1 Order links by timestamp.
2 Define a sliding window of width w (time unit?).
3 Extract the bipartite graph from each window at interval i.
4 Compute an appropriate property on each graph.
5 Analyze the time series.
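The five steps can be sketched as follows, here with the window measured in seconds and the number of nodes as the property. This is an illustration under stated assumptions, not the authors' implementation: links are `(timestamp, user, repository)` tuples, windows are half-open intervals `[start, start + w)`, and `node_count_series` is a hypothetical name.

```python
from bisect import bisect_left


def node_count_series(links, w, i):
    """Return [(window_start, number_of_nodes), ...] for windows of
    width w seconds, shifted by i seconds at each step."""
    links = sorted(links)                       # step 1: order by timestamp
    times = [t for t, _, _ in links]
    series = []
    if not times:
        return series
    start = times[0]
    # step 2: only windows that fit entirely inside the observed period
    while start + w <= times[-1] + 1:
        lo = bisect_left(times, start)
        hi = bisect_left(times, start + w)
        window = links[lo:hi]                   # step 3: links of this window
        users = {u for _, u, _ in window}
        repos = {r for _, _, r in window}
        # step 4: users and repositories are counted separately,
        # so a user and a repository with the same name stay distinct
        series.append((start, len(users) + len(repos)))
        start += i
    return series                               # step 5: analyze this series
```

With w = 3600 and i = 300 this would correspond to a one-hour window moved every five minutes.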
Example
[Figure: number of nodes over time, 11 March to 18 July, showing a weekly pattern]
[Figure: zoom on 15 April to 22 April, showing a day-night pattern]
w = 1 hour, i = 5 minutes.
Question: don’t temporal patterns hide information?
Notions of time
Extrinsic time (real time)
Time measured in units such as seconds.
Good at revealing exogenous phenomena, e.g. day-night patterns.
Intrinsic time (related to graph dynamics)
Time measured in units such as the transition between two states of the graph.
Better at revealing endogenous phenomena independently from the graph dynamics?
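A small sketch of a window expressed in intrinsic time, assuming the unit is one link appearance, so that w and i are numbers of links rather than durations. Timestamps are only needed to order the links beforehand. The function name and the `(user, repository)` tuple layout are hypothetical.

```python
def intrinsic_node_count_series(links, w, i):
    """links: (user, repository) pairs already ordered by timestamp.
    Return [(index_of_first_link, number_of_nodes), ...] for windows of
    w consecutive links, shifted by i links at each step."""
    series = []
    for start in range(0, len(links) - w + 1, i):
        window = links[start:start + w]
        users = {u for u, _ in window}
        repos = {r for _, r in window}
        series.append((start, len(users) + len(repos)))
    return series
```

Because every window holds exactly w links, a quiet night and a busy afternoon contribute the same amount of data to each point of the series.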
Window width: high resolution
[Figure: number of nodes versus time in number of links (500,000 to 2,000,000)]
w = 1000 links, i = 100 links.
:) Additional observation
Window width: lower resolution
[Figure: number of nodes versus time in number of links (500,000 to 2,000,000)]
w = 50,000 links, i = 1000 links.
:) No need for high resolution
Event validation
Visualization of the sub-graph: connected nodes are closer,
disconnected nodes are more distant.
In the sub-graph of 8,370 nodes and 10,000 links at the time of the event, one node has a high number of links:
Try-Git interacts with 4,127 users (out of 5,000).
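One way such a check could be automated is to look for the node with the most distinct neighbours in the window where the event occurs. This is only a sketch: the talk validates the event visually, and `highest_degree_node` and the `(user, repository)` tuple layout are assumptions.

```python
from collections import defaultdict


def highest_degree_node(links):
    """links: (user, repository) pairs of one window.
    Return ((kind, name), degree) for the node with the most distinct
    neighbours, or None when the window is empty."""
    neighbours = defaultdict(set)
    for user, repo in links:
        # tag each node with its side so that a user and a repository
        # sharing the same name are not merged
        neighbours[("user", user)].add(repo)
        neighbours[("repository", repo)].add(user)
    if not neighbours:
        return None
    node, adjacent = max(neighbours.items(), key=lambda item: len(item[1]))
    return node, len(adjacent)
```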
http://try.github.io
Towards automatic anomaly detection
Need for more elaborate properties, like:
Internal links
Their removal does not change the projection of the graph onto a given set of nodes, either ⊤ or ⊥.
[Figure: a bipartite graph G; the graph G' = G - (red link); the two projections are equal: G'T = GT]
Results
Ratio of ⊤-internal links
[Figure: ratio of ⊤-internal links (0.5 to 1.0) versus time in number of links (0 to 2,300,000); points are coloured as not outlier, potential outlier, outlier or unknown; events are labelled A to K]
w = 10,000 links, i = 1000 links.
Color = outlier class using the automatic Outskewer method*.
* S. Heymann, M. Latapy and C. Magnien. Outskewer: Using Skewness to Spot Outliers in Samples and Time Series, IEEE ASONAM 2012
Conclusion
Contributions
• Graph-based methodology to monitor user-system interactions
• An intrinsic time unit avoids the impact of exogenous patterns
• Smaller windows not necessarily optimal
• Checked relevance of detected events
Applicable in other contexts
• Client-server architectures
• Processes-messages graphs
• File-provider graphs
• User-invoked services in information systems
Future work
• Which property for anomaly detection?
• Models of interaction dynamics
• Link prediction
Questions?
Monitoring User-System Interactions through Graph-Based Intrinsic Dynamics Analysis
<[email protected]>
Thank You!
Backup Slides
Statistically significant anomalies
General definition
Values which deviate remarkably from the remainder of values (Grubbs, 1969)
Outskewer method*:
Our definition
Extremal value which skews a distribution of values.
* Heymann, Latapy and Magnien. Outskewer: Using Skewness to Spot Outliers in Samples and Time Series, IEEE ASONAM 2012
Skewness coefficient
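$$\gamma = \frac{n}{(n-1)(n-2)} \sum_{x \in X} \left( \frac{x - \text{mean}}{\text{standard deviation}} \right)^{3}$$
[Figure: density plots of two skewed distributions, one with γ < 0 and one with γ > 0]
Example of skewed distributions.
It is sensitive to extremal values (min/max) far from the mean!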
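The coefficient can be computed directly from this formula. The sketch below assumes that "standard deviation" means the sample standard deviation (denominator n - 1), which is what `statistics.stdev` returns; the slide does not say which variant is meant.

```python
from statistics import mean, stdev


def skewness(values):
    """Skewness coefficient: n / ((n-1)(n-2)) * sum(((x - mean) / sd)^3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    m = mean(values)
    sd = stdev(values)  # assumption: sample standard deviation
    if sd == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / sd) ** 3 for x in values)
```

A single value far above the mean, as in `[1, 2, 3, 4, 100]`, makes the coefficient positive, and one far below the mean makes it negative, which illustrates the sensitivity to extremal values noted above.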
Automatic anomaly detection
Outskewer classifies each value as:
• not outlier
• potential outlier
• outlier
or 'unknown' for heterogeneous distributions of values.
[Figure: yearly change in population (Δ population) from 1900 to 2000, each point marked with its Outskewer status]
Event detection in time series
On a sliding window of size w, each value of X is classified w times.
The final class of a value is the one that appears the most.
[Figure: a window sliding along the time axis]
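The voting scheme can be sketched as below. The per-window classifier is passed in as a function, because Outskewer itself is not reproduced here: `toy_classify` is a deliberately simple stand-in based on distance to the median, with an arbitrary threshold, and is not the Outskewer method. All names are hypothetical.

```python
from collections import Counter
from statistics import median


def vote_over_windows(values, w, classify):
    """Classify every window of w consecutive values and give each value
    the label it received most often. Values near both ends of the series
    belong to fewer than w windows and therefore get fewer votes."""
    votes = [Counter() for _ in values]
    for start in range(len(values) - w + 1):
        labels = classify(values[start:start + w])
        for offset, label in enumerate(labels):
            votes[start + offset][label] += 1
    # a value that was never classified (series shorter than w) is "unknown"
    return [v.most_common(1)[0][0] if v else "unknown" for v in votes]


def toy_classify(window, threshold=10):
    """Stand-in classifier, NOT Outskewer: flag values far from the median."""
    centre = median(window)
    return ["outlier" if abs(x - centre) > threshold else "not outlier"
            for x in window]
```

For example, `vote_over_windows([1, 1, 1, 1, 50, 1, 1, 1, 1], 3, toy_classify)` labels only the value 50 as an outlier, since it is flagged in each of the three windows that contain it.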
Why Outskewer?
• makes no strong assumption about the data
• 1 parameter: the time window width
• ignores regime changes (shifts in normality)
• can be implemented on-line.