Multiple Queries with Conditional Attributes (QCATs) for Anomaly Detection in Visualization
Simon Walton, Eamonn Maguire, Min Chen
Motivation
• Anomaly detection is often hard, and context-sensitive
• We usually don’t have enough annotated training data, and annotation itself is uncertain
• Many different techniques exist
• The human ideally should be in the loop
• The visual analytics loop!
Aims
• To develop an anomaly detection method that
  • Is context-sensitive
  • Does not rely on supervised learning
  • Can be expanded and refined easily by the user when needed
  • Is not cost-prohibitive to run, and is linearly scalable
Information is Additive
• Notion: the number of all possible answers is the amount of information
• Roll a fair die: 6 outcomes, equiprobable
• What if I roll it n times?
• We can make information additive:
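One standard way to make the count of outcomes additive, assumed here to be what the slide intends, is to take its logarithm. With n rolls of a fair die there are 6^n equally likely outcomes; the base-2 logarithm (an assumption, giving units of bits) turns that product into a sum:

```latex
\[
I(n) = \log_2\!\left(6^{n}\right) = n \log_2 6 \approx 2.585\,n \ \text{bits}
\]
```

Each extra roll then adds the same fixed amount of information, rather than multiplying the number of possible answers by 6.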
But… few things are equiprobable!
• Most dice are biased
• Most coins, too
Let’s Play a Game
• 1/3 chance of getting the ball
• What is the amount of information then in the answer?
Defining the Total Information
• Average of all outcomes - i.e. weight according to their probabilities:
• More generally,
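The probability-weighted average described above is usually written as Shannon entropy, H = -Σ p_i log2 p_i. The following is a minimal Python sketch under that assumption; the function names `surprisal` and `entropy`, the base-2 logarithm, and the biased-coin probabilities are illustrative choices and do not come from the talk.

```python
import math


def surprisal(p: float) -> float:
    """Information, in bits, carried by an outcome of probability p."""
    return -math.log2(p)


def entropy(probs: list[float]) -> float:
    """Probability-weighted average surprisal: H = -sum(p * log2(p)).

    Outcomes with zero probability contribute nothing to the sum.
    """
    return sum(p * surprisal(p) for p in probs if p > 0)


fair_die = [1 / 6] * 6
ball_game = [1 / 3, 2 / 3]   # 1/3 chance of getting the ball, 2/3 of missing it
biased_coin = [0.9, 0.1]     # hypothetical bias, for illustration only

print(surprisal(1 / 3))      # information in "you found the ball": log2(3) bits
print(entropy(ball_game))    # average information per play of the game
print(entropy(fair_die))     # equiprobable case: reduces to log2(6)
print(entropy(biased_coin))  # a biased coin carries less than 1 bit
```

For equiprobable outcomes the weighted average collapses to the logarithm of the number of outcomes, so the fair-die case agrees with the simple counting view.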
QCATs are just like a spam filter
[Figure: spam-filter analogy. Example emails (Work 1, Viagra, Single Girls in your Area, Job Opportunity, Work 2, From Mum) shown with scores 0.3, 0.99, 0.97, 0.82, 0.33, 0.1; filter labels: Bayesian, DNS, checksum, Blacklists.]
QCATs: Query with Conditional Attributes
• Dataset A = {a1,a2,...,an} with n attributes
[Figure: attributes 1 to 10 of the dataset, with some marked as conditional.]
QCATs: Executing a QCAT
[Figure: executing a QCAT in three stages.]
• QCAT specification: attributes month, machine, user
• Take QCAT, bind conditionals: instantiated QCAT with month = 11
• Pseudo-SQL: SELECT uniq(machine, user) WHERE month = 11
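The execution steps can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: the `QCAT` class, the `execute` function and the sample events are hypothetical, and `uniq(machine, user)` is assumed to mean "count the distinct (machine, user) pairs".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QCAT:
    """Hypothetical QCAT: conditionals are bound to values, query attributes are counted."""
    conditionals: tuple[str, ...]
    query: tuple[str, ...]


def execute(qcat: QCAT, bindings: dict, events: list[dict]) -> int:
    """Count distinct query-attribute combinations among events matching the bindings."""
    missing = set(qcat.conditionals) - set(bindings)
    if missing:
        raise ValueError(f"unbound conditionals: {sorted(missing)}")
    # Keep only events whose conditional attributes equal the bound values.
    matches = (
        e for e in events
        if all(e[a] == bindings[a] for a in qcat.conditionals)
    )
    # A set of tuples removes duplicate combinations (the assumed meaning of uniq).
    return len({tuple(e[a] for a in qcat.query) for e in matches})


# Made-up events, for illustration only.
events = [
    {"month": 11, "machine": "PC-1", "user": "alice"},
    {"month": 11, "machine": "PC-1", "user": "alice"},  # repeated pair, counted once
    {"month": 11, "machine": "PC-2", "user": "alice"},
    {"month": 12, "machine": "PC-1", "user": "bob"},
]

qcat = QCAT(conditionals=("month",), query=("machine", "user"))
result = execute(qcat, {"month": 11}, events)
print(result)
```

Binding a different value to `month` reuses the same specification, which is what lets one QCAT be evaluated across every bin of its conditional attribute.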
Goals
• An effective UI for designing QCATs
• The visual analytics loop (right) is ideal for this
• Primarily this system would be used by the model designer
• A modified version for the analyst, with additional tool support
• A simplified visualisation (e.g. time-series) for the observers
[Figure: the visual analytics loop linking Data, Models (QCATs), Visualisation and Knowledge.]
CMU-CERT Dataset
• http://www.cert.org/insider-threat/tools/index.cfm
• Contains known ground-truth for insider threat scenarios
• Each event linked to a user
• Email (20m): time, user, machine, to (inc. CC, BCC), from, size, number of attachments, content
• Web (3.5m): time, user, machine, url
• Device (1.24m): time, user, machine id, [insert/remove]
• Logon/off (2.6m): time, user, machine, [logon/logoff]
Multiple QCATs
• Real power comes from multiple QCATs
  • To compare performance
  • To combine results for analysis
• So let’s add another!
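As a rough sketch of what running several QCATs side by side might look like, the Python below evaluates two hypothetical QCATs over every month and tabulates the results per bin. The helper `run_qcat`, the QCAT names and the sample events are assumptions made for illustration; the talk does not specify how results are compared or combined.

```python
from collections import defaultdict


def run_qcat(events, conditional, query):
    """For each value of the conditional attribute, count distinct query-attribute combinations."""
    seen = defaultdict(set)
    for e in events:
        seen[e[conditional]].add(tuple(e[a] for a in query))
    return {value: len(combos) for value, combos in seen.items()}


# Made-up events, for illustration only.
events = [
    {"month": 10, "machine": "PC-1", "user": "alice"},
    {"month": 10, "machine": "PC-2", "user": "bob"},
    {"month": 11, "machine": "PC-1", "user": "alice"},
    {"month": 11, "machine": "PC-2", "user": "alice"},
    {"month": 11, "machine": "PC-3", "user": "alice"},
    {"month": 11, "machine": "PC-2", "user": "bob"},
]

# Two hypothetical QCATs sharing the same conditional attribute (month).
qcats = {
    "machine-user pairs": ("month", ("machine", "user")),
    "users": ("month", ("user",)),
}

results = {
    name: run_qcat(events, conditional, query)
    for name, (conditional, query) in qcats.items()
}

# One row per month, one column per QCAT, ready for comparison or plotting.
for month in sorted({e["month"] for e in events}):
    print(month, {name: per_bin[month] for name, per_bin in results.items()})
```

In this toy data the number of users is the same in both months while the number of machine-user pairs doubles, which is the kind of difference that only shows up when more than one QCAT is viewed together.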
Discussion and Future Work
• Future work
  • Understanding how mutual information can be represented
  • Choice of information-theoretic measure still an issue
  • Binning strategies and assisted bin design
But wait! There’s more!
Analyzing High-dimensional Multivariate Network Links with Integrated Anomaly Detection, Highlighting and Exploration, Sungahn Ko, Shehzad Afzal, Simon Walton, Yang Yang, Junghoon Chae, Abish Malik, Yun Jang, Min Chen, David Ebert
Weds 16:15, Scene C
Questions?
• CC-licensed photo acknowledgements:
  • Coins - https://www.flickr.com/photos/wwarby
  • Cups and Balls - https://www.flickr.com/photos/eschipul
  • Dice - https://www.flickr.com/photos/darwinbell