9
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 7, NO. 1, FEBRUARY 2013 67 Anomaly Detection in Time Series of Graphs using Fusion of Graph Invariants Youngser Park, Carey E. Priebe, Senior Member, IEEE, and Abdou Youssef, Senior Member, IEEE Abstract—Given a time series of graphs , , where the xed vertex set represents “actors” and an edge between vertex and vertex at time represents the existence of a communications event between actors and during the time period, we wish to detect anomalies and/or change points. We consider a collection of graph features, or invariants, and demonstrate that adaptive fusion provides su- perior inferential efcacy compared to naive equal weighting for a certain class of anomaly detection problems. Simulation results using a latent process model for time series of graphs, as well as illustrative experimental results for a time series of graphs derived from the Enron email data, show that a fusion statistic can provide superior inference compared to individual invariants alone. These results also demonstrate that an adaptive weighting scheme for fu- sion of invariants performs better than naive equal weighting. Index Terms—Change point detection, fusion, graph invariants, hypothesis testing, random graphs, statistical inference on graphs, time series analysis. I. INTRODUCTION G IVEN a time series of graphs , , where the vertex set is xed throughout and the edge sets are time-de- pendent, we wish to detect anomalies and/or change points. Let us consider vertices to represent “actors,” and an edge between vertex and vertex at time represents the exis- tence of a communications event between actors and during the time period. Thus represents the collection of (un- ordered) pairs of vertices which communicate during . We will not consider directed edges or hyper-graphs (hyper- edges consisting of more than two vertices) or multi-graphs (more than one edge between any two vertices at any time ) or self-loops (an edge from a vertex to itself) or weighted edges, although all of these generalizations of simple graphs may be relevant for specic applications. The specic anomaly we will consider is the “chatter” alter- native—a small (unspecied) subset of vertices with excessive communication amongst themselves during some time period in Manuscript received July 27, 2012; revised November 09, 2012; accepted December 01, 2012. Date of publication December 12, 2012; date of current version January 22, 2013. This work was supported in part by the Johns Hopkins University Human Language Technology Center of Excellence. The associate editor coordinating the review of this manuscript and approving it for publica- tion was Dr. Alfred Hero. Y. Park and C. E. Priebe are with the Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, 21211 USA (e-mail: [email protected]; [email protected]). A. Youssef is with the Department of Computer Science, George Washington University, Washington, D.C. 20052 USA (e-mail: [email protected]). Color versions of one or more of the gures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identier 10.1109/JSTSP.2012.2233712 Fig. 1. Notional depiction of a time series of graphs in which the entire vertex set behaves in some null state for and then, at time ,a subset of vertices exhibits a change in connectivity behavior. an otherwise stationary setting, as depicted in Fig. 1. This gure notionally depicts the entire vertex set behaving in some null state for ; then, at time , a collection of ver- tices exhibit probabilistically higher connectivity. (The remaining interconnection probabilities remain in their null state at time .) Our statis- tical inference task is then to determine whether or not there has emerged a “chatter” group at some time , as shown in Fig. 1. The latent process model for time series of graphs presented in [1] provides for precisely this temporal structure. Each vertex is governed by a continuous time, nite state stochastic process , with the state-space given by . The probability of edge at time is determined by the inner product of the sub-probability vectors specied by , , for . For the scenario depicted in Fig. 1, the vertex processes are stationary until time and then undergo a change point, while the processes remain stationary throughout all time. In [1], the model produces a dependent time series of graphs , each of which is itself a latent position model with condi- tionally independent edges given . The model allows two simplifying approximations; a second-order (cen- tral limit theorem) approximation with temporally independent random graphs each of which is itself a random dot product ([2], [3], and Section 16.4 in [4]) latent position model [5], and a rst-order (law of large numbers) approximation with tempo- rally independent random graphs each of which is itself an in- dependent edge random graph model [6]. The simplicity of the rst-order approximation, depicted in Fig. 2 for the special case of homogeneity vs. kidney-egg, provides a useful framework for description. If the vertex processes are independent and identical, with stationary probability vector , then the rst-order approximation produces a temporally indepen- dent series of homogeneous independent edge Erdös-Rényi random graph (denoted by ) with , where . The vertex processes 1932-4553/$31.00 © 2012 IEEE

Anomaly Detection in Time Series of Graphs using Fusion of Graph Invariants

  • Upload
    abdou

  • View
    213

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Anomaly Detection in Time Series of Graphs using Fusion of Graph Invariants

IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 7, NO. 1, FEBRUARY 2013 67

Anomaly Detection in Time Series of Graphsusing Fusion of Graph Invariants

Youngser Park, Carey E. Priebe, Senior Member, IEEE, and Abdou Youssef, Senior Member, IEEE

Abstract—Given a time series of graphs ,, where the fixed vertex set represents “actors”

and an edge between vertex and vertex at timerepresents the existence of a communications event between actorsand during the time period, we wish to detect anomalies

and/or change points. We consider a collection of graph features,or invariants, and demonstrate that adaptive fusion provides su-perior inferential efficacy compared to naive equal weighting fora certain class of anomaly detection problems. Simulation resultsusing a latent process model for time series of graphs, as well asillustrative experimental results for a time series of graphs derivedfrom the Enron email data, show that a fusion statistic can providesuperior inference compared to individual invariants alone. Theseresults also demonstrate that an adaptive weighting scheme for fu-sion of invariants performs better than naive equal weighting.

Index Terms—Change point detection, fusion, graph invariants,hypothesis testing, random graphs, statistical inference on graphs,time series analysis.

I. INTRODUCTION

G IVEN a time series of graphs ,, where the vertex set is

fixed throughout and the edge sets are time-de-pendent, we wish to detect anomalies and/or change points. Letus consider vertices to represent “actors,” and an edge betweenvertex and vertex at time represents the exis-tence of a communications event between actors and duringthe time period. Thus represents the collection of (un-ordered) pairs of vertices which communicate during .We will not consider directed edges or hyper-graphs (hyper-edges consisting of more than two vertices) or multi-graphs(more than one edge between any two vertices at any time ) orself-loops (an edge from a vertex to itself) or weighted edges,although all of these generalizations of simple graphs may berelevant for specific applications.The specific anomaly we will consider is the “chatter” alter-

native—a small (unspecified) subset of vertices with excessivecommunication amongst themselves during some time period in

Manuscript received July 27, 2012; revised November 09, 2012; acceptedDecember 01, 2012. Date of publication December 12, 2012; date of currentversion January 22, 2013. This work was supported in part by the Johns HopkinsUniversity Human Language Technology Center of Excellence. The associateeditor coordinating the review of this manuscript and approving it for publica-tion was Dr. Alfred Hero.Y. Park and C. E. Priebe are with the Department of Applied Mathematics

and Statistics, Johns Hopkins University, Baltimore, MD, 21211 USA (e-mail:[email protected]; [email protected]).A. Youssef is with the Department of Computer Science, George Washington

University, Washington, D.C. 20052 USA (e-mail: [email protected]).Color versions of one or more of the figures in this paper are available online

at http://ieeexplore.ieee.org.Digital Object Identifier 10.1109/JSTSP.2012.2233712

Fig. 1. Notional depiction of a time series of graphs in which the entire vertexset behaves in some null state for and then, at time , asubset of vertices exhibits a change in connectivity behavior.

an otherwise stationary setting, as depicted in Fig. 1. This figurenotionally depicts the entire vertex set behaving in some nullstate for ; then, at time , a collection of ver-tices exhibit probabilisticallyhigher connectivity. (The remaining interconnectionprobabilities remain in their null state at time .) Our statis-tical inference task is then to determine whether or not there hasemerged a “chatter” group at some time , as shown inFig. 1.The latent process model for time series of graphs presented

in [1] provides for precisely this temporal structure. Each vertexis governed by a continuous time, finite state stochastic process

, with the state-space given by .The probability of edge at time is determined by theinner product of the sub-probability vectors specified by

, , for . For thescenario depicted in Fig. 1, the vertex processesare stationary until time and then undergo a changepoint, while the processes remain stationarythroughout all time.In [1], the model produces a dependent time series of graphs, each of which is itself a latent position model with condi-

tionally independent edges given . The modelallows two simplifying approximations; a second-order (cen-tral limit theorem) approximation with temporally independentrandom graphs each of which is itself a random dot product ([2],[3], and Section 16.4 in [4]) latent position model [5], and afirst-order (law of large numbers) approximation with tempo-rally independent random graphs each of which is itself an in-dependent edge random graph model [6].The simplicity of the first-order approximation, depicted

in Fig. 2 for the special case of homogeneity vs. kidney-egg,provides a useful framework for description. If the vertexprocesses are independent and identical, withstationary probability vector , thenthe first-order approximation produces a temporally indepen-dent series of homogeneous independent edge Erdös-Rényirandom graph (denoted by ) with , where

. The vertex processes

1932-4553/$31.00 © 2012 IEEE

Page 2: Anomaly Detection in Time Series of Graphs using Fusion of Graph Invariants

68 IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 7, NO. 1, FEBRUARY 2013

Fig. 2. The “kidney-egg” random graph model, denoted . Thesmall “egg” represents the vertices that exhibit chatter (each edge oc-curring with probability ). The “kidney” is the population of verticeswhich are not exhibiting chatter (each edge occurring with probability ).Edges between a vertex in the kidney and a vertex in the egg occur with proba-bility . When or , this model degenerates to .

change at time , taking on stationary probability vector ,so that is a kidney-egg independent edgerandom graph with . The idea that the changepoint consists of a small collection of vertices exhibiting ex-cessive interconnection probability results in the restrictionof this model to the case . (Here we have assumed, forsimplicity, that the geometry provides .)1

In [7], the scan statistic graph invariants are introduced andapplied to the problem of detecting “chatter” anomalies in timeseries of Enron graphs. In [8], various graph invariants (size,maximum degree, etc.) are considered for their power as teststatistics in testing vs. . It isdemonstrated that no single invariant is uniformly most pow-erful. See also [9].In [10] the principal eigenvector of a matrix based on the

graph is tracked over time, and an anomaly is declared to bepresent if its direction changes bymore than some threshold. Re-searchers in [11] have addressed problems in dynamic networkanalysis such as detection of anomalies or distinct subgraphsin large, noisy background in signal processing fields. Recently,[12] proposed a methodology of detecting anomalous graphs byexamining distributions of vertex invariants instead of using asingle graph invariant. They used a simple non-time series ofsimulated random graph models. In [13], a locality statisticusing a generalized likelihood ratio test statistic (they call this ascan statistic) has been applied for an online network intrusiondetection. Other notable recent efforts in this direction include[14]–[16].In this paper, we consider the problem of detecting “chatter”

anomalies in time series of graphs using combinations of invari-ants. We present experimental results for anomaly detection ontime series of simulated data from the model in [1], as well as aninvestigation of a time series of graphs extracted from the Enron

1If , then we havefor a , and for a .The difference between these two expected degrees is then

. If is of order , we see that the above expressionis minimized over when , which indicates that the most difficultscenario is when .

email corpus, to demonstrate that a statistic which combinesmultiple invariants can provide superior inference compared toindividual invariants alone. We further demonstrate an adaptiveweighting scheme for fusion of invariants that performs betterthan naive equal weighting.Section II presents the graph features (invariants, used as sta-

tistics) considered herein, Section III introduces our adaptive fu-sion, and Section IV presents results with simulated data as wellas Enron email data. We conclude with discussion in Section V.

II. GRAPH FEATURES

We investigate a collection of nine graph features similar tothat considered in [8]: size,maximumdegree,maximumaveragedegree (eigenvalue approximation), scan statistic (scale 1,2,3),number of triangles, clustering coefficient, and (negative) av-erage path length. In all cases, a large value of the feature is anevidence in favor of excessive interconnection probability.

A. Invariants

1) Size: The size of a graph is the number of edges in thegraph, given by

This is the simplest global graph statistic.2) Maximum Degree: The maximum degree of a

graph is given by

where is the degree of vertex . This is the simplestlocalized graph feature.3) Maximum Average Degree: The maximum average de-

gree of a graph is the maximum over all subgraphs of ofthe average degree of . If is the degree of vertex ,then the average degree of a graph is given by

where , the number of vertices. Thus the max-imum average degree is given by

where the maximum is over all (induced) subgraphs of .Since is difficult to compute exactly [17], we resort

to an eigenvalue approximation. is bounded above bythe largest eigenvalue of the adjacency matrix of , denoted

, and we use

As demonstrated in [8], the eigenvalue method appears to bestrictly better at detecting increased local activity than thegreedy approximation method of [17] (Problem 5.7.2, page 90).4) Scan Statistic: Scan statistics [7] are graph features based

on local neighborhoods of the graph. We will consider the scanstatistic to be the maximum number of edges over all

order neighborhoods, where the order neighborhood ofa vertex , , is the set of vertices whose graph shortest

Page 3: Anomaly Detection in Time Series of Graphs using Fusion of Graph Invariants

PARK et al.: ANOMALY DETECTION IN TIME SERIES OF GRAPHS USING FUSION OF GRAPH INVARIANTS 69

path distance from is less than equal to . We will consider, where is given by

where denotes the induced subgraph.5) Number of Triangles: We consider the total number of

triangles in . If is the adjacency matrix for the graph ,then the number of triangles is given by

The trace is zero if and only if the graph is triangle-free.6) Clustering Coefficient: We consider the global clustering

coefficient (CC) in , given by

where is the number of closed triplets (a subgraph with threevertices and three edges) and is the number of open triplets(a subgraph with three vertices and at least two edges). Thismeasures the probability that the adjacent vertices of a vertex areconnected. This is sometimes called the transitivity of a graph.7) Average Path Length: The average path length (APL) is

given by

where is the shortest path between vertices and . Thismeasures how many steps are required to access every othervertex from a given vertex, on average. Unlike our other invari-ants, a small value of the average path length is an evidencein favor of excessive interconnection probability, so we use thenegated value

in this work. (If no path exists between and , we use, where the maximum is taken over all pairs of

vertices that have an existing path between them.)2

B. Temporal Normalization

The purpose of our inference is to detect a local (temporal)behavior change in the time series of graphs. In particular,we wish to consider as our alternative hypothesis that a small(unspecified) collection of vertices (the “egg”) increases theirwithin-group activity at some time as compared to recentpast while the majority of vertices (the “kidney”) continue withtheir normal behavior. The null hypothesis, then, is a form oftemporal homogeneity—no probabilistic behavior changes interms of graph features. See Fig. 3.As mentioned in [7], the raw features are standard-

ized using a quantity computed from the recent past:

where and are the running mean and standarddeviation estimates of based on the most recent time steps;that is,

2In fact, the average path length (APL) is inappropriate for sparse (highlydisconnected) graphs.

Fig. 3. at and at . The state compares previousmany (10 in this case) null graphs to a null graph, and the statecompares many null graphs to an alternative graph, .

and

Then, a detection at time is obtained when is large.(Note that for the localized statistics (maximum degree, max-imum average degree, and the scan statistics) we must firstperform vertex standardization, as in [7] Section 6, so that,for an inhomogeneous collection of stationary null vertexprocesses, the most active vertices do not dominate thesestatistics.)

C. Simulation

Our general algorithm for implementing the time series ofrandom dot product graphs is presented in Algorithm II.1. Theonly difference among our three models in [1] occurs in line 3,where the probability vectors for vertices are obtained; the firstapproximation uses fixed (non-random or deterministic) proba-bility vectors and so that and

while the second approximation and the exactmodels use random probability vectors [1].

Algorithm II.1 Time Series of Random Dot Product Graph

Require

1: for all time such that do

2: initialize the adjacency matrix with zeros

3: calculate probability vectors for all vertices using

4: for all vertex such that do

5: for all vertex such that do

6: if then

7: {vector dot product}

8: {draw an edge}

9: end if

10: end for

11: end for

12:

13: end for

14: return

Page 4: Anomaly Detection in Time Series of Graphs using Fusion of Graph Invariants

70 IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 7, NO. 1, FEBRUARY 2013

Fig. 4. Density estimates for Monte Carlo replicates of in the first approximation model. forand . For each invariant, black denotes and red denotes .

Density estimates of for all nine features are presentedin Fig. 4 (using ). Black denotes and reddenotes . As we can see from this figure, all featureshave mean zero and variance one (approximately) for . It isour goal to measure the performance of each individual graphfeature, and then compare these results with the effectiveness ofcombining features, on our statistical inference task.Comparative power results for the individual features are de-

picted in Fig. 5, with a cumulative color bar for each feature.For the most subtle case (when is small, in blue) the power foreach feature is relatively low, while higher power is achievedas increases. These results agree qualitatively with the resultspresented in [8].

III. FUSION OF GRAPH FEATURES

We will consider two weighting methods for fusion of ourgraph features introduced in Section II. Our fusion test statisticis given by

where is the number of graph features ( , for our inves-tigations).

A. Weighting

The naive equal weighting scheme is given by

for all , and .

Fig. 5. Statistical power for our nine graph features in the first approximationmodel. for and

, for and allowableType I error rate , based on Monte Carlo replicates.The error bars represent 1.96 standard error for the sample means.

Our adaptive weighting scheme uses

where and are the mean and the standard devia-tion of over Monte Carlo replicates. (Due toour temporal normalization, all features have mean zero andvariance one (approximately) when “recent past” consists of

Page 5: Anomaly Detection in Time Series of Graphs using Fusion of Graph Invariants

PARK et al.: ANOMALY DETECTION IN TIME SERIES OF GRAPHS USING FUSION OF GRAPH INVARIANTS 71

Fig. 6. Scatter plots for size versus maximum degree for each fusion technique. Each point represents a Monte Carlo replicate. The black points (circles) are, and the color points are ; the points above the detection boundaries (critical values) are colored in green (“ ” symbols). The ratio of the number

of green points over the total of green and red points represents the power of the test: for the equal weighting and for the adaptiveweighting. Blue lines represent detection boundaries, which provide quantitative rejection regions. (a) Equal Weighting; (b) Adaptive Weighting.

stationarity, which is the assumption when testing for changeat time .) A detailed algorithm of this approach is shown inAlgorithm III.1.

Algorithm III.1 Hypothesis Test using Adaptive WeightingFusion

Require: normalized feature matrix,

1: matrix for null at time from, and matrix for alternative at time

from

2: mean vector of , andstandard deviation vector of over Monte Carloreplicated

3:

4: for all replicate such that do

5: {single replicate of }

6: { weight vector}

7: { fused null vector}

8: quantile {critical value: 95% quantile}

9: {fused scalar of }

10: if then

11:

12: end if

13: end for

14: return , power of the test

Notice that the adaptive weights are a function of the graphbeing tested (line 6 of the algorithm). This implies that the

features with larger deviations from the norm get higher weightsand contribute more to the inference.

B. Examples

A graphical example is illustrated in Fig. 6 and 7. InFig. 6, each point represents a Monte Carlo replicate of timeseries of graph in two-dimensional Euclidean space using thefirst two features (size and maximum degree). The blackpoints (circles) are , and the color points

are ; the points above the detection boundaries(critical values in Algorithm III.1, line 8) are colored in green(“ ” symbols) and represent the power of the test. Noticethat this boundary is linear for the equal weighting whileit is not for the adaptive weighting. The former is becausethe boundary is calculated based on equal weighting for all

points; the slope of the line is always 1 and theintercept can be calculated with a given significance level ofthe test (i.e., ,where ). For the adaptive weighting case, meanwhile,the color of the points are determined by the distancefrom each point to , the mean vector of ; thepoints whose fused values are bigger than the critical valuewill get the green colors. This means that every pointgets a different weight and therefore the detection boundaryis not linear. Fig. 7 shows the adaptive weighting case forvarious values of . As increases, there are more greenpoints, which implies higher power as expected.

IV. FUSION EXPERIMENTS

A. Simulations

The simulation setup of this experiment is the same as theone in Section II-C except that fusion of graph features isapplied. The performance of fusion with all nine features isdepicted as horizontal lines in Fig. 8. In all cases, the fusionlines are above the corresponding individual bars, and theadaptive weighting fusion lines are above the equal weightingfusion lines.Fig. 9 depicts power as a function of fusion dimension for

the different weighting schemes for the three models in [1].Given a fusion dimension , all possible combinations offeatures are considered for the fusion and the best performanceis plotted. The difference in performance among the threemodels in [1] is minimal (“qualitatively similar”), while thesuperiority of the adaptive weighting scheme (with symbol)is apparent. Table I depicts the actual weightings obtained viathe adaptive weighting scheme for . We see that, for the

Page 6: Anomaly Detection in Time Series of Graphs using Fusion of Graph Invariants

72 IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 7, NO. 1, FEBRUARY 2013

Fig. 7. Scatter plots for size vs. maximum degree for adaptive weighting for . Each point represents a Monte Carlo replicate. The blackpoints (circles) are , and the color points are ; the points above the detection boundaries (critical values) are colored in green (“ ” symbols).The actual powers of the test are 0.332, 0.564, 0.775, and 0.917, respectively. As increases, there are more green points (“ ” symbols), which implies higherpower. Blue lines represent detection boundaries, which provide quantitative rejection regions. (a) ; (b) ; (c) ; (d) .

Fig. 8. Statistical power for our nine graph features and two fusion schemesin the first approximation model. for

and , forand allowable Type I error rate , based on

Monte Carlo replicates. The horizontal lines indicate the power usingfusion statistics with . The error bars represent 1.96 standarderror for the sample means. The superiority of adaptive weighting (solid lines)over equal weighting (dashed lines) is apparent.

most part, the same features are selected for all three modelsin [1].

Fig. 9. Statistical power plots for fusion statistics for the three models in [1] asa function of fusion dimension when , , and .The error bars represent 1.96 standard error for the sample means. The fusiondimensions are chosen from the best possible combinations. The differencein performance among the three models is minimal. The adaptive weightingscheme (with symbol) is superior to equal weighting.

In Fig. 10 we present a statistical power plot of fusion usingall nine features with and as a

Page 7: Anomaly Detection in Time Series of Graphs using Fusion of Graph Invariants

PARK et al.: ANOMALY DETECTION IN TIME SERIES OF GRAPHS USING FUSION OF GRAPH INVARIANTS 73

TABLE ITHE ESTIMATED WEIGHTINGS OBTAINED VIA THE ADAPTIVE WEIGHTINGSCHEME FOR FROM FIG. 9. WE SEE THAT, FOR THE MOST PART, THE

SAME FEATURES ARE SELECTED FOR ALL THREE MODELS IN [1]

Fig. 10. Statistical power as a function of rate parameter for models in [1] andboth weighting schemes based on Monte Carlo replicates, with

, , and . The horizontal lines represent resultsfor the first approximation three standard deviations for adaptiveweighting (upper line, at power approximately 0.56) and equal weighting (lowerline, at power approximately 0.45).

function of the rate parameter for the vertex processes.3 Theseresults demonstrate that (1) adaptive weighting is superior toequal weighting, (2) the second approximation is more faithfulto the exact model than is the first approximation, and (3) bothapproximations are accurate for large .

B. Enron Email Data

We use the Enron email data used in [7] for this experiment.The nine features, for , are calculated forgraphs derived from email messages among executivesduring one week periods. Fig. 11 depicts histograms of foreach .Our interest is the “alias” detection identified at week 132 in

[7], when an employee changes his/her email address. There-fore, we choose , the third week of May 2001. Fig. 12depicts scatter plots of for for variouspairs of invariants, where is shown in red. Unlike thesimulation in Fig. 7, Monte Carlo replicates of graph are not

3The parameter controls the variability of the latent stochastic processesfor the vertices. In particular, a large value of corresponds to small

variability in (the second-order approximation), and as theprocesses converge to the stationary probability vectors and(the first-order approximation). See [1] for detail. We have used forall other results presented herein.

Fig. 11. Enron email data histograms of for 189 weeks.

TABLE IITHE PERFORMANCE OF EQUAL AND ADAPTIVE WEIGHTING FUSION METHODSON ENRON EMAIL GRAPHS. FOR EXAMPLE, WHEN THE FUSION DIMENSION

, THE POSSIBLE NUMBER OF COMBINATION OF FEATURE DIMENSIONS IS36, AND BOTH EQUAL AND ADAPTIVE WEIGHTING METHODS CAN DETECT 24CASES, BUT ONLY ADAPTIVE WEIGHTING CAN DETECT 5 ADDITIONAL CASES

available for real data; therefore the 131 previous graphs (shownas black points in the figure) are used to determine detectionboundaries. This investigation reveals that the combination ofsize and maximum degree allows detection based on forboth weighting schemes (the red point is above both criticallines, in panel a), while only the adaptive weighting schemedetects the anomaly for the other three feature pairs depicted(panels b,c,d).The performance of equal and adaptive weighting fusion

methods with all possible combinations of features atare summarized in Table II. For example, when the fusiondimension , the possible number of combination of fea-ture dimensions is 36, and both equal and adaptive weightingmethods can detect 24 cases, but only adaptive weightingscheme can detect 5 additional cases. Note that there is no casethat only equal weighting scheme can detect while adaptiveweighting scheme cannot.

V. DISCUSSION

We have demonstrated, via simulation results using a latentprocess model for time series of graphs as well as illustrative ex-perimental results for a time series of graphs derived from theEnron email data, that an adaptive weighting methodology for

Page 8: Anomaly Detection in Time Series of Graphs using Fusion of Graph Invariants

74 IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 7, NO. 1, FEBRUARY 2013

Fig. 12. Enron email data scatter plots of for for various pairs of invariants. is shown in red. The red point is above both criticallines in panel a, indicating that the combination of size and maximum degree allows detection based on for both weighting schemes. In panels b,c,d, it isapparent that only the adaptive weighting scheme detects the anomaly. Unlike Fig. 7, the detection boundaries for the adaptive weighting is linear, and it is becausethere is only one graph. (a) size vs. maxd; (b) MAD vs. cc; (c) scan1 vs. numtri; (d) scan3 vs numtri.

fusing information from graph features provides superior infer-ential efficacy for a certain class of anomaly detection problems.One notable implication of this work is that inferential

performance in the mathematically tractable approximationmodels in [1] does indeed provide guidance for methodolog-ical choices applicable to the exact (realistic but intractable)model. Furthermore, to the extent possible, we may tentativelyconclude that model investigations have some bearing on realdata applications.An important extension of this work will be to time series of

weighted and/or attributed graphs, where message count and/orcontent is used to augment edges with (categorical) “topic”attributes. See, for instance, [18]–[20], wherein authors demon-strated for a number of inferential tasks that using contentand context together can provide superior inferential capabilitywhen compared to either alone. Along with the fusion techniqueintroduced in this paper, changes in communication content,in addition to excessive communication probability, can aiddetection.

ACKNOWLEDGMENT

The authors also would like to thank the editors and theanonymous referees for their valuable comments and critiquesthat greatly improved this work.

REFERENCES

[1] N. H. Lee and C. E. Priebe, “A latent process model for time seriesof attributed random graphs,” Statistical Inference for Stochastic Pro-cesses, vol. 14, no. 3, pp. 231–253, 2011.

[2] E. R. Scheinerman and K. Tucker, “Modeling graphs using dot productrepresentations,”Comput. Statist. vol. 25, pp. 1–16, Jan. 2010 [Online].Available: http://dx.doi.org/10.1007/s00180-009-0158-8

[3] S. J. Young and E. R. Scheinerman, “Random dot product graphmodelsfor social networks,” in Proc. 5th Int. Conf. Algorithms and Models forthe Web-Graph, 2007, pp. 138–149 [Online]. Available: http://portal.acm.org/citation.cfm?id=1777879.1777890

[4] B. Bollobás, S. Janson, and O. Riordan, “The phase transition in inho-mogeneous random graphs,” Random Structures and Algorithm, vol.31, pp. 3–122, 2007.

[5] P. Hoff, A. E. Raftery, and M. S. Handcock, “Latent space approachesto social network analysis,” J. Amer. Statist. Assoc., vol. 97, pp.1090–1098, 2002.

[6] B. Bollobás, Random Graphs, 2nd ed. Cambridge, U.K.: CambridgeUniv. Press, 2001.

[7] C. E. Priebe, J. M. Conroy, D. J. Marchette, and Y. Park, “Scan sta-tistics on enron graphs,” Comput. Math. Organiz. Theory, vol. 11, pp.229–247, Oct. 2005.

[8] H. Pao, G. A. Coppersmith, and C. E. Priebe, “Statistical inferenceon random graphs: Comparative power analyses via Monte Carlo,” J.Comput. Graph. Statist., vol. 20, no. 2, pp. 395–416, 2011.

[9] C. E. Priebe, G. A. Coppersmith, and A. Rukhin, “You say graph in-variant, I say test statistic,” ASA Sec. Statist. Comput. Statist. Graph.SCGN Newsletter, vol. 21, no. 2, Dec. 2010.

[10] T. Ide and H. Kashima, “Eigenspace-based anomaly detection in com-puter systems,” in Proc. Tenth ACM SIGDD Int. Conf. Knowl. Discov.Data Mining, 2005, pp. 440–449.

Page 9: Anomaly Detection in Time Series of Graphs using Fusion of Graph Invariants

PARK et al.: ANOMALY DETECTION IN TIME SERIES OF GRAPHS USING FUSION OF GRAPH INVARIANTS 75

[11] B. A. Miller, M. S. Beard, and N. T. Bliss, “Matched filtering formatchedfiltering for subgraph detection in dynamic networks,” inProc.IEEE Statist. Signal Process. Workshop (SSP), 2011, pp. 509–512.

[12] N. Borges, G. A. Coppersmith, G. G. L. Meyer, and C. E. Priebe,“Anomaly detection for random graphs using distributions of vertexinvariants,” in Proc. 2011 45th Annu. Conf. Inf. Sci. Syst. (CISS), Mar.2011, pp. 1–6.

[13] J. Neil, C. Storlie, C. Hash, A. Brugh, and M. Fisk, “Scan statistics forthe online detection of locally anomalous subgraphs,” Technometrics(in review), 2012, submitted for publication.

[14] C. Horn and R. Willett, “Online anomaly detection with expert systemfeedback in social networks,” in Proc. 2011 IEEE Int. Conf. Acoust.,Speech, Signal Process. (ICASSP), May 2011, pp. 1936–1939.

[15] J. Sharpnack, A. Rinaldo, and A. Singh, Changepoint Detection OverGraphs With the Spectral Scan Statistic arXiv/1206.0773, 2012.

[16] M. Valko, “Adaptive Graph-based Algorithms for ConditionalAnomaly Detection and Semi-Supervised Learning,” Ph.D. disserta-tion, University of Pittsburgh, , 2011.

[17] D. Ullman and E. R. Scheinerman, Fractional Graph Theory. NewYork: Wiley, 1997.

[18] J. Grothendieck, C. E. Priebe, and A. L. Gorin, “Statistical inferenceon attributed random graphs: Fusion of graph features and content,”Comput. Statist. Data Anal., vol. 54, pp. 1777–1790, 2010.

[19] C. E. Priebe, Y. Park, D. J. Marchette, J. M. Conroy, J. Grothendieck,andA.Gorin, “Statistical inference on attributed randomgraphs: Fusionof graph features and content: An experiment on time series of enrongraphs,”Comput. Statist. Data Anal., vol. 54, pp. 1766–1776, 2010.

[20] M. Tang, Y. Park, N. H. Lee, and C. E. Priebe, “Attribute fusion ina latent process model for time-series of graphs,” IEEE Trans. SignalProcess., 2012, accepted for publication.

Youngser Park received the B.E. degree in electricalengineering from Inha University in Seoul, Koreain 1985, the M.S. and Ph.D. degrees in computerscience from The George Washington University in1991 and 2011 respectively. From 1998 to 2000 heworked at the Johns Hopkins Medical Institutes asa senior research engineer. From 2003 until 2011he worked as a senior research analyst, and hasbeen an associated research scientist since 2011in the Center for Imaging Science at the JohnsHopkins University. At Johns Hopkins, he holds

joint appointments in the Department of Applied Mathematics and Statisticsand the Human Language Technology Center of Excellence.He has reviewed papers for ACM Transactions on Knowledge Discovery

in Data, Statistical Analysis and Data Mining, and WIREs ComputationalStatistics.His current research interests are clustering algorithm, pattern classification,

and data mining for high-dimensional and graph data.

Carey E. Priebe (SM’08) received the B.S. degreein mathematics from Purdue University in 1984, theM.S. degree in computer science from San DiegoState University in 1988, and the Ph.D. degree ininformation technology (computational statistics)from George Mason University in 1993. From 1985to 1994 he worked as a mathematician and scientistin the US Navy research and development laboratorysystem. Since 1994 he has been a professor in theDepartment of Applied Mathematics and Statistics,Whiting School of Engineering, Johns Hopkins Uni-

versity, Baltimore, Maryland. At Johns Hopkins, he holds joint appointmentsin the Department of Computer Science and the Center for Imaging Science.He is a past President of the Interface Foundation of North America—Com-

puting Science & Statistics, a past Chair of the Section on Statistical Computingof the American Statistical Association, and on the editorial boards of Journalof Computational and Graphical Statistics, Computational Statistics and DataAnalysis, and Computational Statistics.His research interests are in computational statistics, kernel and mixture es-

timates, statistical pattern recognition, statistical image analysis, and statisticalinference for high-dimensional and graph data. He was elected Fellow of theAmerican Statistical Association in 2002.

Abdou Youssef (SM’12) received his M.A. andPh.D degrees in Computer Science from PrincetonUniversity, Princeton, NJ, in 1985 and 1988, re-spectively, and his BS in Mathematics from theLebanese University in 1981. He joined the Schoolof Engineering and Applied Science at the GeorgeWashington University (GWU) in Fall 1987 asAssistant Professor, and is now a tenured fullProfessor and, since 2008, Chair of the Depart-ment of Computer Science at GWU. His researchinterests are search & retrieval, audio-visual data

processing, pattern recognition, data error recovery, theory and algorithms.He has published over a hundred papers in those areas, and co-edited a book.His research has been funded by NSF, NSA, and NIST. Dr. Youssef and hisstudents developed for NSA a system that recovers from fax errors withoutretransmission. Recently, he has created for NIST a new math-search engine,which is first of its kind and is deployed online at http://dlmf.nist.gov/. Forthat work, he received a Gold Medal from the US Department of Commercein December 2011, and co-received the 2011 Government Computer NewsAward. Dr. Youssef is a six-time recipient of the Teacher of the Year Awardfrom his Department and School. He is also listed in the Who is Who AmongAmerica’s Teachers. Dr. Youssef has been invited to lecture throughout theworld, including Germany, United Kingdom, Canada, China, and Brazil.Professor Youssef is a senior member of IEEE.