Multiple Queries with Conditional Attributes (QCATs) for Anomaly Detection in Visualization
Simon Walton, Eamonn Maguire, Min Chen

Vissec2014



Motivation and Theory

Motivation

• Anomaly detection is often hard and context-sensitive
• We usually don't have enough annotated training data, and annotation itself is uncertain
• Many different techniques exist
• The human should ideally be in the loop
  • The visual analytics loop!

Aims

• To develop an anomaly detection method that:
  • Is context-sensitive
  • Does not rely on supervised learning
  • Can be expanded and refined easily by the user when needed
  • Is not cost-prohibitive to run, and is linearly scalable

Information Theory

Information is Additive

• Notion: the number of possible answers measures the amount of information

• Roll a fair die: 6 equiprobable outcomes
• What if I roll it n times? There are $6^n$ outcomes

• We can make information additive by taking the logarithm: $\log 6^n = n \log 6$
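A quick numeric check of the additivity idea: measuring information as the base-2 logarithm of the number of equiprobable outcomes (Hartley information, in bits) makes n rolls worth exactly n times one roll. The function name `hartley_information` and the choice of base 2 are our own assumptions for illustration.

```python
import math


def hartley_information(num_outcomes: int) -> float:
    """Information, in bits, of one answer drawn from num_outcomes equiprobable possibilities."""
    return math.log2(num_outcomes)


one_roll = hartley_information(6)       # a single fair die
n = 3
n_rolls = hartley_information(6 ** n)   # 6^n equiprobable sequences of n rolls

# The logarithm turns the product 6^n into a sum, so n rolls carry n times the information.
print(f"{one_roll:.3f} bits per roll, {n_rolls:.3f} bits for {n} rolls")
```

Any other base gives the same additivity; base 2 simply reports the result in bits.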

But… few things are equiprobable!

• Most dice are biased
• Most coins, too

Few things are equiprobable!

Let’s Play a Game

• 1/3 chance of getting the ball

• How much information, then, is in the answer?
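One way to answer the question, assuming the game has a binary answer (you either get the ball, with probability 1/3, or you don't): score each answer by its self-information, $-\log_2 p$. The rarer answer carries more information. The function name `self_information` is illustrative only.

```python
import math


def self_information(p: float) -> float:
    """Surprisal -log2(p), in bits, of an outcome with probability p."""
    return -math.log2(p)


# Assumed reading of the game: a guess finds the ball with probability 1/3.
p_ball = 1 / 3
info_hit = self_information(p_ball)        # rare answer: more information
info_miss = self_information(1 - p_ball)   # common answer: less information
print(f"hit: {info_hit:.3f} bits, miss: {info_miss:.3f} bits")
```

With equiprobable outcomes this reduces to the logarithm of the number of outcomes, which is the earlier notion of information.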

Defining the Total Information

• Average the information over all outcomes, i.e. weight each outcome by its probability

• More generally, for outcomes with probabilities $p_i$: $H = -\sum_i p_i \log p_i$
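The probability-weighted average above is Shannon entropy. A minimal sketch in bits (base 2 is our assumption) shows that it recovers $\log_2 6$ for the fair die and gives the average information of the ball game, again assuming a binary answer with probabilities 1/3 and 2/3.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy H = -sum(p_i * log2 p_i), in bits; zero-probability outcomes contribute nothing."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)


fair_die = entropy([1 / 6] * 6)       # equiprobable case: equals log2(6)
ball_game = entropy([1 / 3, 2 / 3])   # weighted average of the two answers' information
print(f"fair die: {fair_die:.3f} bits, ball game: {ball_game:.3f} bits")
```

A biased die always scores below the fair one, which is why "few things are equiprobable" matters: skewed distributions carry less information on average.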

Our method: QCATs

QCATs are just like a spam filter

[Figure: an inbox of example emails (Work 1, Viagra, Single Girls in your Area, Job Opportunity, Work 2, From Mum) with spam scores (0.3, 0.99, 0.97, 0.82, 0.33, 0.1), scored by several filters: Bayesian, DNS checksum, Blacklists]

QCATs: Query with Conditional Attributes

• Dataset $A = \{a_1, a_2, \ldots, a_n\}$ with $n$ attributes

[Figure: a QCAT selects a subset of the dataset's attributes (numbered 1 to 10), some of which are marked as conditional]

QCATs: Executing a QCAT

[Figure: executing a QCAT]
• QCAT specification: attributes month, machine and user
• Instantiated QCAT: take the QCAT and bind its conditionals (here month = 11)
• Pseudo-SQL: SELECT uniq(machine, user) WHERE month = 11
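A minimal sketch of the three stages, under stated assumptions. The slides show the specification, the binding of the conditional (month = 11) and the resulting pseudo-SQL, but not how the query results are scored. Here we assume each unique (machine, user) combination is scored by its surprisal $-\log_2 p$ within the bound subset, tying back to the information-theory slides. The `QCAT` class, its `to_pseudo_sql` and `execute` methods, and the sample records are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QCAT:
    """Hypothetical QCAT: conditionals are bound at execution time;
    the attributes are the columns whose unique combinations are counted."""
    conditionals: tuple[str, ...]
    attributes: tuple[str, ...]

    def to_pseudo_sql(self, binding: dict) -> str:
        where = " AND ".join(f"{k} = {binding[k]}" for k in self.conditionals)
        return f"SELECT uniq({', '.join(self.attributes)}) WHERE {where}"

    def execute(self, records: list[dict], binding: dict) -> dict:
        """Keep records matching the binding, then score each unique attribute
        combination by -log2 of its frequency in that subset (assumed scoring rule)."""
        subset = [r for r in records if all(r[k] == v for k, v in binding.items())]
        counts = Counter(tuple(r[a] for a in self.attributes) for r in subset)
        total = sum(counts.values())
        return {combo: -math.log2(c / total) for combo, c in counts.items()}


# Made-up records for illustration only.
records = [
    {"month": 11, "machine": "PC-1", "user": "alice"},
    {"month": 11, "machine": "PC-1", "user": "alice"},
    {"month": 11, "machine": "PC-1", "user": "alice"},
    {"month": 11, "machine": "PC-9", "user": "alice"},
    {"month": 12, "machine": "PC-9", "user": "bob"},
]
q = QCAT(conditionals=("month",), attributes=("machine", "user"))
print(q.to_pseudo_sql({"month": 11}))  # SELECT uniq(machine, user) WHERE month = 11
scores = q.execute(records, {"month": 11})
print(scores)
```

In this toy data, alice's single November logon on PC-9 gets the highest score (2 bits), while her habitual PC-1 combination scores about 0.415 bits.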

Combining QCATs

[Figure: combining the results of multiple QCATs (xth percentile)]
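The slide names an xth-percentile cut-off but not the exact combination rule. As a hypothetical sketch, each QCAT's per-event scores can be thresholded at that QCAT's own xth percentile (nearest-rank method), and each event then counts how many QCATs flag it. The QCAT names `logon_by_machine` and `device_by_hour`, the scores, and the voting rule are all illustrative assumptions.

```python
import math


def percentile(values: list[float], x: float) -> float:
    """Nearest-rank x-th percentile of values (0 <= x <= 100)."""
    ordered = sorted(values)
    rank = math.ceil(x * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def combine_qcats(scores_by_qcat: dict[str, list[float]], x: float) -> list[int]:
    """For each event, count the QCATs that score it at or above that QCAT's
    x-th percentile. Every list must hold one score per event, in the same order."""
    n_events = len(next(iter(scores_by_qcat.values())))
    votes = [0] * n_events
    for scores in scores_by_qcat.values():
        threshold = percentile(scores, x)
        for i, s in enumerate(scores):
            if s >= threshold:
                votes[i] += 1
    return votes


# Hypothetical scores for five events from two hypothetical QCATs.
scores = {
    "logon_by_machine": [0.1, 0.2, 0.1, 3.0, 0.3],
    "device_by_hour": [0.2, 0.1, 0.4, 2.5, 0.2],
}
votes = combine_qcats(scores, 80)
print(votes)
```

Thresholding each QCAT separately keeps queries with different score ranges comparable; event 3 stands out because both QCATs flag it.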

A Visual Analytical Workflow for QCATs

Goals

• An effective UI for designing QCATs
• The visual analytics loop (right) is ideal for this
• This system would primarily be used by the model designer
• A modified version for the analyst, with additional tool support
• A simplified visualisation (e.g. time-series) for the observers

[Figure: the visual analytics loop linking Data, Models (QCATs), Visualisation and Knowledge]

CMU-CERT Dataset

• http://www.cert.org/insider-threat/tools/index.cfm

• Contains known ground truth for insider-threat scenarios

• Each event is linked to a user

• Email (20m events): time, user, machine, to (incl. CC, BCC), from, size, number of attachments, content
• Web (3.5m events): time, user, machine, url
• Device (1.24m events): time, user, machine id, [insert/remove]
• Logon/off (2.6m events): time, user, machine, [logon/logoff]

QCAT Workflow

• Let’s add a QCAT!

Multiple QCATs

• Real power comes from multiple QCATs
  • To compare performance
  • To combine results for analysis

• So let’s add another!

Implementation: Scalability

• Linear in the number of columns
• Linear in the number of rows
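Why linear scaling is plausible: with hash-based counting, every binding of the conditionals can be scored in two passes over the data, each touching each selected column of each row once, for expected O(rows × columns) work. This is a sketch under the same assumed surprisal scoring used for illustration elsewhere; the function `score_all_bindings` and the records are hypothetical, not the authors' implementation.

```python
import math
from collections import Counter


def score_all_bindings(records: list[dict], conditionals: tuple, attributes: tuple) -> list[float]:
    """Score each record by -log2 P(attribute combination | its conditional values).
    Pass 1 counts (binding, combination) pairs; pass 2 looks each record up.
    Hash-map updates are expected O(1), so total work is expected O(rows x columns)."""
    pair_counts: Counter = Counter()
    binding_totals: Counter = Counter()
    for r in records:  # pass 1: count
        binding = tuple(r[c] for c in conditionals)
        combo = tuple(r[a] for a in attributes)
        pair_counts[(binding, combo)] += 1
        binding_totals[binding] += 1
    scores = []
    for r in records:  # pass 2: score
        binding = tuple(r[c] for c in conditionals)
        combo = tuple(r[a] for a in attributes)
        p = pair_counts[(binding, combo)] / binding_totals[binding]
        scores.append(-math.log2(p))
    return scores


records = [
    {"month": 11, "machine": "PC-1", "user": "alice"},
    {"month": 11, "machine": "PC-1", "user": "alice"},
    {"month": 11, "machine": "PC-1", "user": "alice"},
    {"month": 11, "machine": "PC-9", "user": "alice"},
    {"month": 12, "machine": "PC-9", "user": "bob"},
]
scores = score_all_bindings(records, ("month",), ("machine", "user"))
print(scores)
```

Adding rows or columns only lengthens the loops; no step compares records pairwise.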

Discussion and Future Work

• Future work
  • Understanding how mutual information can be represented
  • Choice of information-theoretic measure is still an issue
  • Binning strategies and assisted bin design

But wait! There’s more!

Analyzing High-dimensional Multivariate Network Links with Integrated Anomaly Detection, Highlighting and Exploration
Sungahn Ko, Shehzad Afzal, Simon Walton, Yang Yang, Junghoon Chae, Abish Malik, Yun Jang, Min Chen, David Ebert

Weds 16:15, Scene C