
Computer Networks 56 (2012) 2049–2067


Structural analysis of network traffic matrix via relaxed principal component pursuit

Zhe Wang a,b,*, Kai Hu a, Ke Xu a,b, Baolin Yin a,b, Xiaowen Dong c

a School of Computer Science and Engineering, Beihang University, Beijing 100191, China
b State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China
c Signal Processing Laboratories (LTS4/LTS2), Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland

Article info

Article history:
Received 23 February 2011
Received in revised form 24 February 2012
Accepted 27 February 2012
Available online 6 March 2012

Keywords:
Network measurement
Traffic matrix structural analysis
Robust principal component analysis
Relaxed principal component pursuit
Accelerated proximal gradient algorithm

1389-1286/$ - see front matter © 2012 Elsevier B.V. All rights reserved.
doi:10.1016/j.comnet.2012.02.017

This work is supported by the National Natural Science Foundation of China (Grant No. 61073013) and the Aviation Science Foundation of China (Grant No. 2010ZA04001).
* Corresponding author at: School of Computer Science and Engineering, Beihang University, Beijing 100191, China.
E-mail addresses: [email protected] (Z. Wang), hukai@buaa.edu.cn (K. Hu), [email protected] (K. Xu), [email protected] (B. Yin), [email protected] (X. Dong).

Abstract

The network traffic matrix is widely used in network operation and management. It is therefore of crucial importance to analyze the components and the structure of the network traffic matrix, for which several mathematical approaches such as Principal Component Analysis (PCA) have been proposed. In this paper, we first argue that PCA performs poorly when analyzing a traffic matrix that is polluted by large volume anomalies, and then propose a new decomposition model for the network traffic matrix. According to this model, we carry out the structural analysis by decomposing the network traffic matrix into three sub-matrices, namely the deterministic traffic matrix, the anomaly traffic matrix and the noise traffic matrix, which is similar to the Robust Principal Component Analysis (RPCA) problem previously studied in [13]. Based on the Relaxed Principal Component Pursuit (Relaxed PCP) method and the Accelerated Proximal Gradient (APG) algorithm, we present an iterative approach for decomposing a traffic matrix, and demonstrate its efficiency and flexibility through experimental results. Finally, we further discuss several features of the deterministic and noise traffic. Our study develops a novel method for the structural analysis of the traffic matrix that is robust against pollution by large volume anomalies.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

1.1. The Internet traffic data

Internet traffic data is a significant input for network operation and management. Monitoring and analyzing traffic data efficiently is one of the most important problems in the field of network measurement. In general, there are two levels of traffic data: the packet-level data and the flow-level data. With the rapid growth of the Internet's scale and link transmission capacity, it is usually infeasible to collect and process the complete packet-level data. On the other hand, the coarser flow-level data obtained by packet sampling often contains enough network information and has become increasingly popular in recent studies. As an example, one widely used type of flow-level traffic data is the IP flows collected in each router by the Netflow protocol. Roughly speaking, each IP flow is a sequence of packets sharing the same source/destination IP addresses, source/destination port numbers and transport protocol during a certain time interval. However, in large-scale networks, the volume of IP flow data is still too large for storage and processing. For instance, one month of IP flow data from the GEANT backbone network amounts to about 150 GB [1], which makes important applications such as anomaly detection impractical. Therefore, it is necessary to further compress the IP flow data, mainly by means of flow sampling and aggregation.

The network traffic matrix is computed by IP flow aggregation and records how much data is transmitted between each Origin-Destination (OD) pair during different time intervals. For each OD pair $(k_1, k_2)$, the origin point $k_1$ and the destination point $k_2$ are both Points of Presence (PoPs) in the network, and we aggregate all the IP flows which enter the network at $k_1$ and exit at $k_2$ to represent the OD flow corresponding to $(k_1, k_2)$. The research topics in traffic matrix analysis mainly include: (1) estimating a traffic matrix accurately; (2) generating synthetic traffic matrices; (3) utilizing a traffic matrix effectively for measurement applications, such as anomaly detection and routing optimization. These topics require a deep understanding of the components and structure of the traffic matrix. In this paper, we carry out structural analysis of the traffic matrix by studying the different traffic components that constitute it.

1.2. Traffic matrix and its structural analysis

Considering the network traffic with $p$ OD flows during $t$ time intervals, the corresponding traffic matrix $X$ is a $t \times p$ non-negative matrix. For each integer $1 \le j \le p$, the $j$th column $X_j$ of $X$ is the traffic time series of the $j$th OD flow; for each integer $1 \le i \le t$, the $i$th row $X(i,:)$ of $X$ is the traffic snapshot of all the OD flows during the $i$th time interval. In line with the datasets adopted in this paper, we suppose $t > p$. In real network measurements, the Netflow protocol consumes considerable CPU resources, some PoP routers might not support Netflow, and IP flow data might get lost during transmission. These limitations make the collection of a complete traffic matrix a challenging task; therefore, many estimation algorithms based on indirect measurements (such as link traffic data collected by the SNMP protocol) have been proposed in the literature. In recent studies, the error of the third-generation estimation algorithms has been reduced to below 10%. In this paper, however, we do not concentrate on the estimation problem. Instead, we perform our experiments on real-world traffic matrix datasets. These datasets are collected from the Abilene network (in the US) and the GEANT network (in Europe), and are available from [2,3], respectively.
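To make the row and column conventions concrete, the following minimal sketch aggregates flow records into a $t \times p$ traffic matrix. The record layout `(timestamp, origin PoP, destination PoP, bytes)`, the function name `build_traffic_matrix` and the toy values are assumptions made for illustration only; they are not the format of the Abilene or GEANT datasets.

```python
import numpy as np

def build_traffic_matrix(records, pops, t, interval_seconds):
    """Aggregate hypothetical flow records into a t x p traffic matrix.

    Each record is (timestamp_in_seconds, origin_pop, destination_pop, bytes).
    With n PoPs there are p = n * n OD pairs, including pairs with k1 == k2.
    """
    n = len(pops)
    pop_index = {name: k for k, name in enumerate(pops)}
    X = np.zeros((t, n * n))
    for timestamp, origin, dest, nbytes in records:
        i = int(timestamp // interval_seconds)        # row: time interval index
        if not 0 <= i < t:
            continue                                  # outside the measured window
        j = pop_index[origin] * n + pop_index[dest]   # column: OD pair (k1, k2)
        X[i, j] += nbytes
    return X

# Toy example with two PoPs and two 5-minute intervals (made-up numbers).
records = [
    (10.0, "A", "B", 500.0),
    (20.0, "A", "B", 300.0),
    (310.0, "B", "A", 700.0),
    (320.0, "A", "A", 100.0),
]
X = build_traffic_matrix(records, pops=["A", "B"], t=2, interval_seconds=300)
print(X)
```

Each column of `X` is then one OD flow time series and each row one traffic snapshot, matching the definitions above.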

In general, a traffic matrix is a combination of different classes of network traffic. In network operation and management, people usually need information on all classes of traffic, such as the deterministic traffic and the anomaly traffic. In this paper, we study the structural analysis problem of a traffic matrix, which is to accurately decompose the traffic matrix into sub-matrices that correspond to different classes of traffic, and hence to explore in detail the various features of the network traffic.

The most widely used approach for traffic matrix analysis is Principal Component Analysis (PCA) and its variants. For example, Lakhina et al. [4] first introduced the PCA method in the study of traffic matrices, and they found that traffic matrices can be well approximated by a few principal components that correspond to the largest singular values of the matrices. Therefore, they claimed that traffic matrices usually have low effective dimensions. They further introduced the concept of eigenflow and the eigenflow classification method, discussed the distribution pattern of different eigenflow classes, and proposed a method to decompose each OD flow time series according to the classification results. These ideas were further developed in their later work [5], in which they presented the PCA-subspace method for volume anomaly detection, and decomposed the link traffic matrix into two sub-matrices that correspond to a normal subspace and an anomaly subspace, respectively. During each time interval, the norm of the traffic volume that corresponds to the anomaly subspace was compared with the Q-statistic threshold, and the result was used to infer the existence of anomalies in the network.

After Lakhina's studies, many researchers have enriched the PCA-based methods for traffic matrix analysis. Huang et al. [6] proposed a distributed PCA method for volume anomaly detection, which considered the trade-off between the detection accuracy and the data communication overhead. Zhang et al. [7] extended the classical PCA method and argued that large volume anomalies should be extracted via both spatial and temporal approaches, which were later named Network Anomography. Based on the fact that traffic matrices often have low effective dimensions, Soule et al. [8] proposed a new principal component method for traffic matrix estimation, and the experiments demonstrated that their approach has a lower estimation error than most of the previous methods, such as the tomogravity method and the route change method.

However, recent studies have shown some limitations of the PCA-based methods. Ringberg et al. [10] experimented with the PCA-based anomaly detection method, suggesting that its efficiency is very sensitive to the choice of parameters, such as the number of principal components in the normal subspace, the value of the detection threshold, and the level of traffic aggregation. In addition, they found that large volume anomalies might pollute the normal subspace, which could lead to a high false positive rate in anomaly detection. Ohsita et al. [11] argued that the traffic matrix estimated by network tomography is not a proper input for the PCA-based anomaly detectors. Since most estimation methods are designed for an anomaly-free traffic matrix, the estimation error might increase when the network traffic contains large volume anomalies. Instead, they suggested estimating the increased traffic matrix and obtained a high attack-detection rate. Their research also indicated the strict requirements that the PCA-based methods place on the input traffic matrix. More recently, Rubinstein et al. [12] proved that attackers could significantly degrade the performance of the PCA-based anomaly detectors simply by adding chaff volumes before real attacks, and designed an anomaly detector that is more robust against these attacks.

1.3. Main contributions of this paper

As mentioned above, although they have been extensively studied, PCA-based methods still have limitations for traffic matrix analysis and related applications. One important drawback is that, when the traffic matrix is corrupted by large volume anomalies, the resulting principal components will be significantly skewed from those in the anomaly-free case. This prevents the subspace-based methods from accurately decomposing the total traffic into normal traffic and anomaly traffic, and decreases the efficiency of PCA-based anomaly detectors. However, to our knowledge there are only a few existing methods for analyzing a traffic matrix with large volume anomalies. This is the focus of this paper, where we address the problem of structural analysis of polluted traffic matrices. The main contribution of our paper is twofold:

(1) As the basic assumption behind the subspace-based methods is that each eigenflow can be exactly classified, it is an interesting question whether those classification methods still perform well for a polluted traffic matrix. Specifically, (i) is there an eigenflow that satisfies more than one classification criterion of the eigenflow classes? (ii) is there an eigenflow that satisfies no classification criterion of the eigenflow classes? (iii) does the distribution pattern of eigenflows still hold for a polluted traffic matrix? We discuss these problems in Section 2, where we use PCA for the structural analysis of real-world traffic matrices, which usually contain large volume anomalies.

(2) As the PCA-based structural analysis performs poorly when the traffic matrix contains large volume anomalies, it is necessary to provide a new analysis tool that is suitable for polluted traffic matrices. In Sections 3 and 4, we propose a new decomposition model for the traffic matrix based on empirical network traffic data, and formalize the mathematical definition of the structural analysis problem. This motivates us to discover the equivalence between structural analysis and the Robust Principal Component Analysis (RPCA) problem. We then design a decomposition algorithm based on the Relaxed Principal Component Pursuit (Relaxed PCP) method, which is suitable for solving the RPCA problem. Using this algorithm, we are able to obtain a proper traffic decomposition for the polluted traffic matrices in our experiments. Finally, in Section 5 we analyze in detail several properties of the sub-matrices obtained from the decomposition of the traffic matrix.

2. PCA for the structural analysis of polluted traffic matrix

2.1. The classical PCA method

PCA is widely used in high dimensional data analysis, where redundant high dimensional data can be approximated by a low dimensional representation. In our study, we consider each row vector of the traffic matrix $X \in \mathbb{R}^{t \times p}$ as a data point in $\mathbb{R}^{p}$, thus $X$ contains $t$ data points. Following the common approach in [4,5], we normalize each OD flow vector (column of $X$) to have zero mean before performing PCA:

$$X_j = X_j - \mathrm{mean}(X_j), \qquad j = 1, 2, \ldots, p. \tag{1}$$

PCA can be viewed as a coordinate transformation process, where the data points are transformed from the original coordinate system to a new coordinate system. The unit vectors of the new coordinate system are denoted by $\{v_i\}_{i=1}^{p}$, and $v_i$ is called the $i$th principal component vector. The first principal component vector $v_1$ captures the maximum variance (energy) of the original traffic matrix $X$:

$$v_1 = \arg\max_{\|v\| = 1} \|Xv\|. \tag{2}$$

For each integer $k \ge 2$, suppose we have obtained the first $k-1$ principal component vectors. The $k$th principal component vector $v_k$ then captures the maximum variance of the residual traffic matrix, which is the difference between the original traffic matrix $X$ and its mappings onto the first $k-1$ principal component vectors:

$$v_k = \arg\max_{\|v\| = 1} \left\| \left( X - \sum_{i=1}^{k-1} X v_i v_i^T \right) v \right\|. \tag{3}$$

Following this process, all the principal component vectors are defined iteratively. It is easy to show that $\{v_i\}_{i=1}^{p}$ form an orthonormal basis of $\mathbb{R}^{p}$. Thus the traffic matrix can be decomposed as:

$$X = X [v_1\ v_2\ \ldots\ v_p][v_1\ v_2\ \ldots\ v_p]^T = \sum_{i=1}^{p} X v_i v_i^T = \sum_{i=1}^{p} \|X v_i\| \frac{X v_i}{\|X v_i\|} v_i^T = \sum_{i=1}^{p} \|X v_i\| u_i v_i^T, \tag{4}$$

where

$$u_i = \frac{X v_i}{\|X v_i\|}, \qquad i = 1, 2, \ldots, p \tag{5}$$

is a unit vector in $\mathbb{R}^{t}$, which is called the $i$th eigenflow corresponding to $v_i$ [4].

Following basic matrix theory, we can show that the principal component vectors $\{v_i\}_{i=1}^{p}$ are the eigenvectors of the matrix $X^T X$, sorted by the corresponding eigenvalues $\{\lambda_i\}_{i=1}^{p}$ in descending order:

$$X^T X v_i = \lambda_i v_i, \qquad i = 1, 2, \ldots, p, \tag{6}$$

where $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_p \ge 0$. Furthermore, $\sigma_i = \sqrt{\lambda_i}$ is called the $i$th singular value of $X$. Therefore, $\{v_i\}_{i=1}^{p}$ can be found by computing the eigenvectors of $X^T X$.

Since $\|X v_i\| = \sqrt{v_i^T X^T X v_i} = \sqrt{\lambda_i v_i^T v_i} = \sigma_i$, Eq. (4) can be rewritten as:

$$X = \sum_{i=1}^{p} \sigma_i u_i v_i^T. \tag{7}$$

Eq. (7) is called the Singular Value Decomposition (SVD) of $X$. By the Eckart–Young theorem [19], for each integer $1 \le r \le p$, $A_r = \sum_{i=1}^{r} \sigma_i u_i v_i^T$ is the best rank-$r$ approximation of $X$:

$$A_r = \arg\min_{\mathrm{rank}(A) \le r} \|X - A\|_F, \tag{8}$$

where $\|\cdot\|_F$ denotes the Frobenius matrix norm. Each data point $X(s,:)^T \in \mathbb{R}^{p}$ $(1 \le s \le t)$ can then be approximated as

$$X(s,:)^T \approx \sum_{i=1}^{r} \sigma_i u_i(s) v_i, \tag{9}$$

where $\{\sigma_i u_i(s)\}_{i=1}^{r}$ are the first $r$ coefficients of $X(s,:)^T$ under the new coordinate system. Therefore, PCA can be viewed as a technique for dimensionality reduction.
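The chain from Eq. (1) to Eq. (8) can be traced numerically. The sketch below is only an illustration on synthetic data (the random matrix, its size and the choice $r = 2$ are assumptions, not values from the paper); it computes the principal component vectors from the eigenvectors of $X^T X$, derives the eigenflows from Eq. (5), and checks the decomposition of Eq. (7).

```python
import numpy as np

rng = np.random.default_rng(0)
t, p = 200, 6
# Synthetic stand-in for a traffic matrix (not real traffic data).
X = rng.normal(size=(t, p)) * np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.5])

Xc = X - X.mean(axis=0)                        # Eq. (1): zero-mean columns

# Eq. (6): principal component vectors are eigenvectors of X^T X.
eigvals, V = np.linalg.eigh(Xc.T @ Xc)         # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]              # sort in descending order
eigvals, V = eigvals[order], V[:, order]
sigma = np.sqrt(np.clip(eigvals, 0.0, None))   # singular values sigma_i

U = (Xc @ V) / sigma                           # Eq. (5): eigenflows u_i as columns

r = 2
A_r = (U[:, :r] * sigma[:r]) @ V[:, :r].T      # rank-r approximation from Eq. (8)

print("Eq. (7) holds:", np.allclose((U * sigma) @ V.T, Xc))
print("residual energy:", np.linalg.norm(Xc - A_r) ** 2)
print("sum of discarded eigenvalues:", np.sum(sigma[r:] ** 2))
```

In practice one would usually call `np.linalg.svd` directly; the eigen-decomposition route is used here only because it mirrors the derivation in the text.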

2.2. Eigenflows of the polluted traffic matrix

In the following experiments, we adopt two widely used traffic matrix datasets, one from the Abilene network and the other from the GEANT network. Abilene is an Internet2 backbone network with 12 PoPs (144 OD flows), and GEANT is a pan-European research network with 23 PoPs (529 OD flows). In this paper, the minimal time interval in the Abilene dataset is 5 min; the minimal time interval in the GEANT dataset is 15 min, since the flow files are written every 15 min. Flow data with a finer time scale also exists. For example, another GEANT traffic dataset used in [28] has a minimal time interval of 1 s. The Abilene dataset contains 24 weeks of traffic records from March 1, 2004 to September 10, 2004, and each week's data is represented by a traffic matrix. Here we select the traffic matrices that correspond to the first 8 weeks (denoted as X01–X08). For the GEANT dataset, traffic records are not complete for certain days. Since most researchers study the weekly traffic matrix, we choose a subset of the GEANT dataset: the traffic matrices of the four consecutive weeks from March 28, 2005 to April 24, 2005 (denoted as Y01–Y04). Each traffic matrix in the Abilene dataset consists of 2016 rows (time steps), while each GEANT traffic matrix consists of 672 rows.

Notice that for each of these traffic matrices, there are a small number of OD flow time series that contain a large percentage of zero entries. This usually means that these OD flows are not stable; therefore, we delete them in the experiments. For each traffic matrix in the Abilene dataset, we delete the 23 OD flows whose source or destination PoP is "ATLA-M5" (thus the number of OD flows actually used is 121). For each traffic matrix in the GEANT dataset, we delete those OD flows that have more than 50% zero entries (the number of OD flows actually used is between 457 and 483). Table 1 summarizes the datasets used in our experiments, and Table 2 (the first column) shows the number of OD flows actually used for each traffic matrix.
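The GEANT filtering rule (dropping OD flows with more than 50% zero entries) can be sketched as follows. The helper name `drop_sparse_od_flows` and the toy matrix are hypothetical and serve only to illustrate the rule.

```python
import numpy as np

def drop_sparse_od_flows(X, max_zero_fraction=0.5):
    """Remove columns (OD flows) whose fraction of zero entries exceeds the limit."""
    zero_fraction = np.mean(X == 0, axis=0)    # per-column share of zero entries
    keep = zero_fraction <= max_zero_fraction
    return X[:, keep], keep

# Toy matrix: the columns have 0%, 75% and 50% zero entries, respectively.
X = np.array([[1.0, 0.0, 3.0],
              [2.0, 0.0, 0.0],
              [4.0, 5.0, 0.0],
              [3.0, 0.0, 6.0]])
X_used, keep = drop_sparse_od_flows(X)
print(keep, X_used.shape)
```

Only the second column is removed here, since the rule deletes flows with strictly more than 50% zeros.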

Table 1. Datasets used in the experiments.

| Name | #OD flows/actually used | Time interval | Time steps | Period |
|---|---|---|---|---|
| Abilene | 144/121 | 5 min | 2016 | 8 weeks |
| GEANT | 529/457–483 | 15 min | 672 | 4 weeks |

In Lakhina's original study [4], all the eigenflows of a traffic matrix can be classified into three types. First, the eigenflows that exhibit distinct periodical patterns are called d-eigenflows (for "deterministic"), since they reflect the diurnal activity in the network traffic, as well as the difference between weekday and weekend activities. Second, the eigenflows that represent strong, short-lived spikes are called s-eigenflows (for "spike"), as they capture the occasional traffic bursts and dips which are usually reported. Third, the eigenflows that roughly have a stationary and Gaussian behavior are called n-eigenflows (for "noise"), since they capture the remaining random variation that arises due to multiplexing of many individual traffic sources.

In this paper, we follow the classification criteria in [4] for both d-eigenflow and s-eigenflow. The original classification criterion for n-eigenflow in [4] is to compare the Q-Q plot of the eigenflow's distribution with the normal distribution; however, this is not a quantitative method. Therefore, we use a classification criterion based on the Kolmogorov–Smirnov (K–S) test instead. Suppose $u_j$ is an eigenflow of the traffic matrix $X$; we classify it according to the following three criteria:

(1) d-eigenflow: Let $H$ denote the set of period parameters measured in hours. For each element $h \in H$, we compute the Fourier power spectrum $\tilde{u}_j(h)$ of $u_j$:

$$\tilde{u}_j(h) = \frac{1}{t} \left| \sum_{k=0}^{t-1} u_j(k+1) \exp(-\omega k i) \right|^2, \qquad \omega = \frac{2\pi}{T}, \qquad T = \frac{60h}{t_0},$$

where $i$ is the imaginary unit, $T$ is the period of the Fourier transform (measured in time steps) and $t_0$ is the length of the time interval (measured in minutes). In this paper, $H = \{k\}_{k=1}^{10} \cup \{2k\}_{k=6}^{25}$. If $\{12, 24\} \cap \arg\max_{k \in H} \{\tilde{u}_j(k)\} \neq \emptyset$, $u_j$ satisfies the criterion of d-eigenflow and we classify it as d-eigenflow;

(2) s-eigenflow: Let $\sigma$ denote the standard deviation of $u_j$. If $u_j$ has at least one entry outside the interval $[\mathrm{mean}(u_j) - 5\sigma, \mathrm{mean}(u_j) + 5\sigma]$, $u_j$ satisfies the criterion of s-eigenflow and we classify it as s-eigenflow;

(3) n-eigenflow: We use the K–S test to verify the normal distribution of $u_j$. If the null hypothesis (normal distribution) is not rejected at the 5% significance level, $u_j$ satisfies the criterion of n-eigenflow and we classify it as n-eigenflow.

In order to evaluate the completeness (each eigenflow has to be classified into at least one class) and orthogonality (the same eigenflow must not be classified into more than one class at the same time) of eigenflow classification, we further define the following two concepts:

- indeterminate eigenflow: eigenflows that satisfy more than one classification criterion;
- non-determinate eigenflow: eigenflows that satisfy no classification criterion.
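The three criteria and the two derived concepts can be sketched in a few functions. This is an illustrative reading of the criteria, not the authors' implementation: the function names are hypothetical, and the K–S test is approximated with the asymptotic 5% critical value $1.358/\sqrt{n}$ applied to the standardized series, which is an assumption about how the test is carried out.

```python
import math
import numpy as np

# The set H of period parameters in hours: {1..10} and {12, 14, ..., 50}.
PERIODS_H = list(range(1, 11)) + [2 * k for k in range(6, 26)]

def fourier_power(u, h, t0):
    """Fourier power of u at a period of h hours, for time intervals of t0 minutes."""
    t = len(u)
    omega = 2.0 * math.pi / (60.0 * h / t0)    # omega = 2*pi/T with T = 60h/t0
    k = np.arange(t)
    return abs(np.sum(u * np.exp(-1j * omega * k))) ** 2 / t

def is_d_eigenflow(u, t0):
    powers = [fourier_power(u, h, t0) for h in PERIODS_H]
    return PERIODS_H[int(np.argmax(powers))] in (12, 24)

def is_s_eigenflow(u):
    return bool(np.any(np.abs(u - u.mean()) > 5.0 * u.std()))

def is_n_eigenflow(u):
    """K-S statistic against a normal CDF, asymptotic 5% critical value (assumed)."""
    n = len(u)
    z = np.sort((u - u.mean()) / u.std())
    cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(0, n) / n)
    return max(d_plus, d_minus) <= 1.358 / math.sqrt(n)

def classify(u, t0):
    checks = (("d", is_d_eigenflow(u, t0)), ("s", is_s_eigenflow(u)), ("n", is_n_eigenflow(u)))
    labels = [name for name, ok in checks if ok]
    if len(labels) == 1:
        return labels[0]
    return "indeterminate" if labels else "non-determinate"

# Synthetic one-week series at 5-minute resolution (2016 steps, 288 per day).
steps = np.arange(2016)
diurnal = np.sin(2 * np.pi * steps / 288)
spike = np.zeros(2016)
spike[100] = 1.0
noise = np.random.default_rng(0).normal(size=2016)

print(classify(diurnal, t0=5), is_s_eigenflow(spike), is_n_eigenflow(noise))
```

A series may pass several checks at once, or none, which is exactly what the indeterminate and non-determinate categories record.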

Next, we apply PCA to compute the principal component vectors and eigenflows of each traffic matrix. Following the classification criteria proposed above, we summarize the classification results of eigenflows in Table 2.

Table 2. Eigenflow classification using PCA.

| Traffic matrix name (#OD flows used) | # Satisfy d-eigenflow | # Satisfy s-eigenflow | # Satisfy n-eigenflow | # Non-determinate eigenflow | # Indeterminate eigenflow | # Classified eigenflow | Unclassified energy rate (%) |
|---|---|---|---|---|---|---|---|
| X01 (121) | 10 | 42 | 87 | 10 | 26 | 85 | 83.08 |
| X02 (121) | 9 | 46 | 80 | 10 | 22 | 89 | 82.59 |
| X03 (121) | 14 | 78 | 47 | 12 | 29 | 80 | 11.08 |
| X04 (121) | 10 | 62 | 74 | 5 | 28 | 88 | 94.67 |
| X05 (121) | 8 | 73 | 62 | 9 | 30 | 82 | 46.09 |
| X06 (121) | 15 | 52 | 83 | 9 | 33 | 79 | 0.32 |
| X07 (121) | 6 | 63 | 75 | 9 | 30 | 82 | 9.39 |
| X08 (121) | 10 | 53 | 71 | 11 | 24 | 86 | 98.49 |
| Y01 (483) | 17 | 54 | 469 | 5 | 59 | 419 | 76.67 |
| Y02 (465) | 5 | 66 | 453 | 6 | 65 | 394 | 77.70 |
| Y03 (465) | 7 | 47 | 454 | 5 | 48 | 412 | 43.19 |
| Y04 (457) | 12 | 62 | 444 | 5 | 66 | 386 | 82.42 |

Here we define the unclassified energy rate as the percentage of energy captured by the principal component vectors that correspond to either indeterminate or non-determinate eigenflows. Since the energy captured by one principal component vector is proportional to the square of the corresponding singular value, we have:

$$\text{unclassified energy rate} = \frac{\sum_{k \in \mathrm{UEID}} \lambda_k}{\sum_{i=1}^{p} \lambda_i}, \tag{10}$$

where the set of unclassified eigenflow IDs (UEID) is

$$\mathrm{UEID} = \{k \mid u_k \text{ is indeterminate or non-determinate}\}. \tag{11}$$
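As a small worked instance of Eqs. (10) and (11), the sketch below computes the unclassified energy rate from a list of singular values and per-eigenflow labels. The function name, the label strings and the three-element toy input are assumptions for illustration.

```python
import numpy as np

def unclassified_energy_rate(singular_values, labels):
    """Percentage of energy (sum of eigenvalues) held by unclassified eigenflows."""
    eigvals = np.asarray(singular_values, dtype=float) ** 2   # lambda_i = sigma_i^2
    in_ueid = np.array([lab in ("indeterminate", "non-determinate") for lab in labels])
    return 100.0 * eigvals[in_ueid].sum() / eigvals.sum()

# Toy input: eigenvalues 9, 4, 1; only the first eigenflow is unclassified.
rate = unclassified_energy_rate([3.0, 2.0, 1.0], ["indeterminate", "d", "n"])
print(round(rate, 2))   # 9 / 14 of the energy, i.e. 64.29
```

Because the eigenvalues are squared singular values, a single unclassified eigenflow with a large singular value can dominate the rate.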

Based on the classification results of the twelve weekly traffic matrices described above, we make the following observations on traffic matrices that are possibly polluted by large volume anomalies:

(1) On the one hand, only a small number of eigenflows satisfy the classification criterion of d-eigenflow (usually fewer than 20), and most of them correspond to large singular values. On the other hand, a considerable number of eigenflows satisfy the classification criteria of s-eigenflow and n-eigenflow. The proportion of each eigenflow class varies from one traffic matrix to another. These observations are similar to Lakhina's experimental results presented in [4].

(2) The PCA-based eigenflow classification method shows serious limitations in terms of classification completeness and orthogonality. Specifically, a large number of eigenflows cannot be exactly classified into one eigenflow class. Furthermore, some unclassified eigenflows correspond to large singular values, and the unclassified energy rate is larger than 70% for seven of the twelve traffic matrices. These results are not consistent with those in Lakhina's study [4], in which non-determinate eigenflows do not exist and indeterminate eigenflows contribute only little energy. Although the authors in [4] argued that their classification method could be enhanced by heuristic mechanisms, our experiments show that some unclassified eigenflows are essentially different from all the eigenflow classes, hence the classification results cannot be clearly improved only by changing parameters or adopting heuristic algorithms.

(3) For each traffic matrix in the Abilene dataset (X01–X08), the first six eigenflows (eigenflows corresponding to the six largest singular values) often contain some instances satisfying the classification criterion of s-eigenflow. This does not happen in Lakhina's study [4], where the first six eigenflows are exactly classified as d-eigenflows. For the GEANT traffic matrices, the first six eigenflows do not satisfy the criterion of s-eigenflow in general, which can be explained by the fact that the anomaly volumes in the GEANT network are not as large as those in the Abilene network.

Here we present the experimental result for the traffic matrix X01 as a case study. Fig. 1 shows the classification result of the 121 eigenflows (sorted from large to small by the corresponding singular values), where each indeterminate eigenflow appears simultaneously in every class whose classification criterion it satisfies. The first six eigenflows and their Fourier power spectra are shown in blue in Figs. 2 and 3, respectively. Each pair of red lines in Fig. 2 is the $+5\sigma$/$-5\sigma$ boundary for the corresponding eigenflow, which is used for the inference of s-eigenflow. It is clear that five out of the six eigenflows in Fig. 2 satisfy the classification criterion of s-eigenflow. However, for the first four eigenflows, their Fourier power spectra all achieve the maximum values when the period parameter is equal to 24 h, suggesting that they satisfy the criterion of d-eigenflow. Therefore, we can view each of the first four eigenflows as a hybrid time series that mixes the deterministic diurnal pattern with the short-lived anomaly pattern, which is quite different from the three pre-defined eigenflow classes. Changing parameters or using heuristic algorithms in classification would not help much in this case. If we classify these four eigenflows as d-eigenflows, the energy of anomaly traffic will be significantly underestimated, which might increase the false negative rate of anomaly detection algorithms. Conversely, classifying them as s-eigenflows will lead to underestimation of normal network traffic, which could prevent us from correctly decomposing the deterministic traffic component from a polluted traffic matrix.

Fig. 1. Eigenflow classification result for the traffic matrix X01. (Horizontal axis: eigenflow ID, from large to small; rows: d-eigenflow, s-eigenflow, n-eigenflow.)

From the discussions above, we can see that the PCA-based method has limitations in eigenflow classification: (i) for both the Abilene dataset and the GEANT dataset, a large proportion of eigenflows cannot be exactly classified into one eigenflow class; (ii) for the Abilene dataset, some eigenflows corresponding to the six largest singular values satisfy the classification criterion of s-eigenflow, which makes it problematic to isolate the deterministic traffic trend or to detect anomaly traffic events. This motivates us to propose a new model for the structural analysis of the traffic matrix in the next section.

3. Relaxed principal component pursuit for the structural analysis of polluted traffic matrix

3.1. The decomposition model of traffic matrix

In this section we propose a new decomposition model for the traffic matrix data, and discuss the mathematical nature of the structural analysis problem. Next, based on the optimization process corresponding to PCA, we explain intuitively the limitations of the PCA-based method for analyzing a polluted traffic matrix. This helps to interpret more deeply the experimental results presented in Section 2.2.

According to the empirical measurement data, we suppose that there exist three classes of network traffic: the deterministic traffic, which shows a diurnal pattern; the anomaly traffic, which appears rarely but involves large peak-like or block-like volumes; and the noise traffic, which has small magnitude but appears in every OD flow during all the time intervals. Formally speaking, we propose to decompose each traffic matrix into three sub-matrices:

$$X = A + E + N, \tag{12}$$

where $A$, $E$ and $N$ represent the deterministic traffic matrix, the anomaly traffic matrix and the noise traffic matrix, respectively. Each class of traffic has its own features, based on which we make the following hypotheses:

(1) The deterministic traffic is mainly contributed by periodical traffic. This means that the periodical traffic time series of different OD flows have similar periods and phases, and they mainly differ in magnitude. This implies that $A$ should be a low-rank matrix.

(2) The anomaly traffic does not show up frequently, which implies that $E$ should be a sparse matrix. However, its nonzero entries may have large magnitudes.

(3) We assume that $N$ is a random matrix consisting of independent zero-mean Gaussian random variables. Each column of $N$ (the noise traffic time series of an OD flow) can be viewed as Gaussian random variables which have the same variance. Since the noise variance is usually proportional to the scale of the corresponding OD flow, and different OD flows may have very different scales, the Gaussian random variables corresponding to different columns may have different variances.

Eq. (12) and hypotheses (1)–(3) constitute our decomposition model for the traffic matrix. In practical measurements, only the traffic matrix $X$ can be observed, thus the mathematical nature of the structural analysis problem is to exactly decompose $X$ into three components, namely $A$, $E$ and $N$ in our decomposition model.
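A synthetic instance of the model in Eq. (12) may help to fix ideas. Everything in the sketch below is an assumption chosen for illustration (matrix size, a single shared diurnal profile, the number and magnitude of anomalies, the noise level); it is not derived from the Abilene or GEANT data.

```python
import numpy as np

rng = np.random.default_rng(1)
t, p, t0 = 2016, 20, 5                 # one week at 5-minute resolution, 20 OD flows
steps_per_day = 24 * 60 // t0          # 288 time steps per day

# Hypothesis (1): same period and phase, different magnitudes -> low-rank A.
diurnal = 1.0 + 0.5 * np.sin(2 * np.pi * np.arange(t) / steps_per_day)
scale = rng.uniform(1.0, 10.0, size=p)           # per-flow magnitude
A = np.outer(diurnal, scale)                     # rank-1 deterministic traffic

# Hypothesis (2): few, but large, anomaly entries -> sparse E.
E = np.zeros((t, p))
positions = rng.choice(t * p, size=20, replace=False)
rows, cols = np.unravel_index(positions, E.shape)
E[rows, cols] = rng.uniform(20.0, 50.0, size=20)

# Hypothesis (3): zero-mean Gaussian noise, variance proportional to the flow scale.
N = rng.normal(size=(t, p)) * np.sqrt(0.01 * scale)

X = A + E + N                                    # Eq. (12): only X is observed
print(np.linalg.matrix_rank(A), np.count_nonzero(E), X.shape)
```

Only `X` would be available in a real measurement; recovering `A`, `E` and `N` from it is the structural analysis problem stated above.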

Consider a simple case where we neglect the noise traffic matrix $N$ in (12). Suppose that $\mathrm{rank}(A) = r_0$. According to Section 2.1, $A_{r_0} = \sum_{i=1}^{r_0} \sigma_i u_i v_i^T$ is the best rank-$r_0$ approximation of $X$. One natural question is then whether $A = A_{r_0}$. If this is true, the PCA method achieves a decomposition of $X$ that is consistent with our decomposition model ($A_{r_0} = A$, hence $X - A_{r_0} = E$). However, it has not been shown that $A$ is the solution to the optimization problem (8). Considering the magnitude of the nonzero entries in $E$, our discussion is further divided into the following two cases:

(1) If the anomaly traffic only has entries with small magnitude, the Frobenius matrix norm $\|E\|_F = \|X - A\|_F$ is small, thus the deterministic traffic matrix $A$ is most likely to be the solution to problem (8). In this case $A = A_{r_0}$, and these two matrices have the same eigenflows. As the deterministic traffic is mainly contributed by diurnal traffic, the first $r_0$ eigenflows of $A$ are usually d-eigenflows (since $\mathrm{rank}(A) = r_0$). Following the definition of $A_{r_0}$, these eigenflows are also the first $r_0$ eigenflows of $X$. Therefore, when the anomaly traffic has entries with small magnitude, the PCA method performs well in eigenflow classification.

(2) If the anomaly traffic has entries with large magnitude, even though $E$ is a sparse matrix, its Frobenius matrix norm $\|E\|_F = \|X - A\|_F$ is large. As a result, $A$ is usually not the solution to problem (8), and $A_{r_0}$ contains a large amount of anomaly traffic. Therefore, some of the first $r_0$ eigenflows of $X$ may satisfy the classification criterion of s-eigenflow.

Fig. 2. The first six eigenflows of the traffic matrix X01. (Six panels, #1 to #6 eigenflow; horizontal axis: time step.)

Fig. 3. Fourier power spectra of the first six eigenflows of X01. (Six panels, #1 to #6 eigenflow; horizontal axis: hours.)


The discussions above explain intuitively the experimental results presented in Section 2.2. The PCA-based eigenflow classification can be considered as a special method for traffic matrix decomposition. However, when the traffic matrix is polluted by large volume anomalies, PCA cannot achieve a complete and orthogonal eigenflow classification, and the PCA-based traffic matrix decomposition is inconsistent with the proposed decomposition model.
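The two cases above can be reproduced on synthetic data. The following is a minimal NumPy sketch, not taken from the paper's experiments: the matrix sizes, the rank, the anomaly density and the two anomaly magnitudes are assumptions chosen for illustration, and `best_rank_r` and `rank_r_error` are hypothetical helpers. It measures how far the best rank-$r_0$ approximation of X = A + E is from the low-rank matrix A.

```python
import numpy as np

def best_rank_r(X, r):
    """Best rank-r approximation of X in the Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def rank_r_error(A, E, r):
    """Relative error ||A_r - A||_F / ||A||_F, where A_r approximates X = A + E."""
    return np.linalg.norm(best_rank_r(A + E, r) - A) / np.linalg.norm(A)

rng = np.random.default_rng(0)
t, p, r0 = 200, 30, 3                       # illustrative sizes (assumed)
A = rng.standard_normal((t, r0)) @ rng.standard_normal((r0, p))

# Sparse anomaly pattern: 2% of the entries, with random signs.
pattern = np.zeros(t * p)
idx = rng.choice(t * p, size=(t * p) // 50, replace=False)
pattern[idx] = rng.choice([-1.0, 1.0], size=idx.size)
pattern = pattern.reshape(t, p)

err_small = rank_r_error(A, 0.01 * pattern, r0)   # case (1): small anomalies
err_large = rank_r_error(A, 200.0 * pattern, r0)  # case (2): large anomalies
print(f"small anomalies: {err_small:.4f}, large anomalies: {err_large:.4f}")
```

In the small-magnitude case the truncated SVD stays close to A, whereas in the large-magnitude case the leading singular directions are dominated by the anomalies.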

3.2. Robust principal component analysis and principal component pursuit

Following the decomposition model proposed in Section 3.1, the structural analysis problem of the traffic matrix is to accurately decompose the original traffic matrix X into a deterministic traffic matrix A, an anomaly traffic matrix E, and a noise traffic matrix N. This is similar to the Low-rank Matrix Recovery problem, which is also known as Robust Principal Component Analysis (RPCA) [13].

Recently, developments in the theory of Compressive Sensing [24,20] have attracted wide attention in the field of information science. Compressive Sensing theory states that if a signal has a sparse representation under some orthonormal basis or dictionary, it can be recovered from far fewer samples or measurements than traditional methods require. Partially motivated by this claim, Ma et al. proposed the Principal Component Pursuit (PCP) method for the RPCA problem. They studied approximate algorithms for PCP, and applied it to different real-world applications such as video background modeling [13], face alignment [15], and web document corpus analysis [16]. Following the definitions in [13,14,17], we briefly describe the RPCA problem and the PCP method as follows. We assume that X, A, E, N are real matrices in $\mathbb{R}^{t \times p}$; $\Lambda(\cdot)$ denotes the support set of a matrix, that is, the set of positions of its non-zero entries.

Problem 1 (standard RPCA problem). Suppose that X = A + E, where A and E are two unknown matrices. Assuming that A is a low-rank matrix and E is a sparse matrix, the standard RPCA problem is to recover A and E from X.

The authors in [17] suggested that the standard RPCA problem can be formulated as the following optimization problem:

$$\min_{A,E}\ \operatorname{rank}(A) + \gamma \|E\|_0 \quad \text{s.t.} \quad A + E = X, \tag{13}$$

where $\|\cdot\|_0 = |\Lambda(\cdot)|$ is the cardinality of the support set, which is also called the $l_0$-norm of the matrix; $\gamma$ is a positive parameter that balances the two competing terms. Problem (13) consists of two sub-problems, namely the low-rank matrix completion problem and the $l_0$-norm minimization problem. Unfortunately, both of them are NP-hard, which makes problem (13) intractable in polynomial time.

On the one hand, thanks to the developments in Compressive Sensing theory, a series of previous works suggested the equivalence between the $l_0$-norm minimization and the $l_1$-norm minimization problems. On the other hand, recent research on the matrix completion problem [21] studied the matrix nuclear norm $\|\cdot\|_*$ (for a matrix $A \in \mathbb{R}^{t \times p}$, the nuclear norm $\|A\|_* = \sum_{k=1}^{p} \sigma_k(A)$ is defined as the sum of its singular values $\{\sigma_k(A)\}_{k=1}^{p}$), and indicated that the two optimization problems, namely, the matrix rank minimization problem and the problem of minimizing the matrix nuclear norm, usually produce similar results. More importantly, both the $l_1$-norm minimization and the nuclear norm minimization problems are convex optimization problems and hence can be solved efficiently.
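To make the relaxation concrete, the sketch below evaluates the four quantities involved (rank, $l_0$-norm, nuclear norm and $l_1$-norm) on a small hand-made example. The two matrices are assumptions chosen for illustration, and the $l_1$-norm is taken entrywise.

```python
import numpy as np

A = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, -1.0])   # rank-1 matrix
E = np.zeros((3, 3))
E[0, 2] = 5.0                                     # sparse matrix with two
E[2, 0] = -4.0                                    # non-zero entries

rank_A = np.linalg.matrix_rank(A)                 # non-convex rank term
l0_E = np.count_nonzero(E)                        # size of the support of E
nuc_A = np.linalg.norm(A, ord="nuc")              # sum of the singular values of A
l1_E = np.abs(E).sum()                            # entrywise l1-norm of E

print(rank_A, l0_E, nuc_A, l1_E)
```

The rank and the $l_0$-norm only count non-zero singular values and entries, while their convex surrogates sum the corresponding magnitudes.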

The PCP method is used for solving a hybrid optimization problem that consists of $l_1$-norm minimization and nuclear norm minimization: to relax the objective function in (13), we replace the $l_0$-norm with the $l_1$-norm, and the rank with the matrix nuclear norm $\|\cdot\|_*$, respectively. Candes et al. [13] proved that, under surprisingly broad conditions, “almost all” matrices of the form X = A + E, namely, matrices that are the sums of a low-rank matrix A and a sparse matrix E, can be exactly decomposed into these two components by solving the following convex optimization problem:

$$\min_{A,E}\ \|A\|_* + \lambda \|E\|_1 \quad \text{s.t.} \quad A + E = X, \tag{14}$$

where $\lambda > 0$ is a regularization parameter. In addition, they showed that $\lambda = 1/\sqrt{\max(t, p)}$ is a proper choice for the parameter that is independent of A and E, and we follow this choice throughout the rest of the paper.

The standard RPCA problem assumes that X is strictly equal to the sum of a low-rank matrix and a sparse matrix. However, in many real-world applications, observational data often contain a certain level of noise, which usually pollutes almost all the entries of the matrix. One of the most common perturbations is Gaussian white noise, which leads to the generalized RPCA problem with Gaussian noise:

Problem 2 (generalized RPCA problem with Gaussian noise). Suppose that X = A + E + N, where A, E and N are unknown matrices. Assume that A is a low-rank matrix, E is a sparse matrix, and N is a random matrix whose entries follow i.i.d. zero-mean Gaussian distributions with $\|N\|_F < \delta$ for some positive $\delta$. The generalized RPCA problem is to recover A and E from X under the perturbation of N.

For Problem 2, Zhou et al. [14] generalized the PCP method and proposed the Relaxed PCP method, which solves the following optimization problem:

$$\min_{A,E}\ \|A\|_* + \lambda \|E\|_1 \quad \text{s.t.} \quad \|X - A - E\|_F \le \delta. \tag{15}$$

They proved that, under the same conditions that PCP requires, for any realization of the Gaussian noise satisfying $\|N\|_F < \delta$, the solution to the optimization problem (15) gives a stable estimate of A and E with high probability.

The assumptions in the generalized RPCA problem are similar to the hypotheses on the traffic matrix in our decomposition model (12). In fact, we can multiply the columns of the traffic matrix by some constants to make sure that the Gaussian random variables in the noise traffic matrix have the same variance. Meanwhile, this multiplication preserves the rank of the deterministic traffic matrix and the sparsity of the anomaly traffic matrix. Therefore, the Relaxed PCP method can be used for solving our structural analysis problem.
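The column rescaling argument can be checked numerically. In the sketch below, the matrices and the per-column noise levels `sigma` are synthetic assumptions, and dividing column j by `sigma[j]` is one possible choice of the constants mentioned above.

```python
import numpy as np

rng = np.random.default_rng(1)
t, p, r = 50, 8, 2
A = rng.standard_normal((t, r)) @ rng.standard_normal((r, p))       # low-rank part
E = np.zeros((t, p))
E[rng.integers(0, t, size=10), rng.integers(0, p, size=10)] = 20.0  # sparse part
sigma = rng.uniform(0.5, 5.0, size=p)       # assumed noise level of each column
N = rng.standard_normal((t, p)) * sigma     # column j has standard deviation sigma[j]

scale = 1.0 / sigma                         # one constant per column
A_s, E_s, N_s = A * scale, E * scale, N * scale

print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A_s))   # rank before and after
print(np.count_nonzero(E), np.count_nonzero(E_s))             # support size before and after
print(N_s.std(axis=0).round(2))                               # per-column noise level after scaling
```

Because every column is multiplied by a non-zero constant, the rank of A and the support of E are unchanged, while the noise columns end up with a common scale.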


3.3. The accelerated proximal gradient algorithm

The Relaxed PCP method used for solving the constrained optimization problem (15) is usually computationally expensive. A more efficient way is to solve an equivalent unconstrained optimization problem instead, using algorithms such as Iterative Thresholding (IT) [13], Augmented Lagrange Multiplier (ALM) and Accelerated Proximal Gradient (APG) [14]. In this paper, we adopt the APG algorithm, which solves the following unconstrained minimization problem:

$$\min_{A,E}\ \mu \|A\|_* + \mu \lambda \|E\|_1 + \frac{1}{2} \|X - A - E\|_F^2, \tag{16}$$

where $\frac{1}{2}\|X - A - E\|_F^2$ is the penalty function, and $\mu > 0$ is a Lagrangian parameter. It has been proved in [14] that with some proper choice of $\mu = \mu(\delta)$, the solution to (16) is equivalent to the solution to (15).

As mentioned above, the choice of the regularization parameter $\lambda$ follows that in [13,14]:

$$\lambda = \frac{1}{\sqrt{\max(t, p)}}. \tag{17}$$

For the Lagrangian parameter $\mu$, it is chosen as $\sqrt{2\max(t, p)}\,\sigma$ and $(\sqrt{t} + \sqrt{p})\,\sigma$ in [14,22], respectively, where $\sigma$ is the variance of the Gaussian noise matrix N. These choices are motivated by neglecting the effect of the sparse matrix E: if we set E = 0 in problem (16), the APG algorithm which solves this problem boils down to the Singular Value Thresholding algorithm with total sampling [14,23]. In our case, since the anomaly traffic matrix might contribute a large proportion of energy, these choices are not suitable. Therefore we present a new choice of $\mu$ in this paper. Considering the case when A = 0, problem (16) boils down to:

$$\min_{A,E}\ \mu \lambda \|E\|_1 + \frac{1}{2} \|X - E\|_F^2. \tag{18}$$

If we consider X and E as two column vectors of dimension $t \times p$, problem (18) becomes the Basis Pursuit Denoising problem first introduced in [25,26]. As E is a sparse vector, we follow [25] to choose $\mu\lambda = \sigma\sqrt{2\log(tp)}$. Since $\lambda$ is chosen as in (17), we compute $\mu$ accordingly as:

$$\mu = \sigma \sqrt{2 \log(tp)\, \max(t, p)}. \tag{19}$$
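As a worked example of (17) and (19), the sketch below evaluates $\lambda$ and $\mu$ for a matrix with t = 2016 rows and p = 121 columns (so that t × p = 243936), taking $\sigma = 1$. The function name `pcp_parameters` is hypothetical, and the natural logarithm is assumed in (19).

```python
import math

def pcp_parameters(t, p, sigma=1.0):
    """Return (lambda, mu) following Eqs. (17) and (19); natural log assumed."""
    lam = 1.0 / math.sqrt(max(t, p))
    mu = sigma * math.sqrt(2.0 * math.log(t * p) * max(t, p))
    return lam, mu

lam, mu = pcp_parameters(2016, 121)
print(f"lambda = {lam:.5f}, mu = {mu:.2f}")
# The product mu * lambda is the entrywise threshold sigma * sqrt(2 log(tp)).
print(f"mu * lambda = {mu * lam:.4f}")
```

Both parameters depend only on the matrix dimensions and the noise level, so they can be computed before the decomposition starts.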

For each OD flow time series $X_j \in \mathbb{R}^t$ ($1 \le j \le p$), we need to estimate the variance $\sigma_j$ of the Gaussian noise traffic. This is a well-studied signal processing problem. We adopt the estimation method proposed in [27]: given an orthonormal wavelet basis, let $W_j = \{\alpha_k^j\}_{k=1}^{t/2}$ denote $X_j$'s wavelet coefficients at the finest scale; $\sigma_j$ is then estimated as the median absolute deviation of $W_j$ divided by 0.6745:

$$\sigma_j = \frac{1}{0.6745}\, \operatorname{median}\left\{ \left| \alpha_k^j - \operatorname{median}(W_j) \right| \right\}, \tag{20}$$

where median(·) denotes the median value of a vector. This estimation method is motivated by the empirical fact that wavelet coefficients at the finest scale are, with few exceptions, essentially pure noise. In this paper, we adopt the Daubechies-5 wavelet basis.
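The estimator in (20) can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the paper's implementation: it uses the Haar wavelet, whose finest-scale coefficients can be computed with plain NumPy, in place of the Daubechies-5 basis, and it is applied to a synthetic series (a smooth daily pattern plus Gaussian noise). The function names are hypothetical.

```python
import numpy as np

def haar_finest_details(x):
    """Finest-scale Haar wavelet (detail) coefficients of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    n = x.size - (x.size % 2)               # drop the last sample if the length is odd
    return (x[1:n:2] - x[0:n:2]) / np.sqrt(2.0)

def mad_noise_level(x):
    """Eq. (20): median absolute deviation of the finest-scale coefficients / 0.6745."""
    w = haar_finest_details(x)
    return np.median(np.abs(w - np.median(w))) / 0.6745

rng = np.random.default_rng(2)
steps = np.arange(2016)                                     # one week of five-minute bins
diurnal = 10.0 * np.sin(2 * np.pi * steps / 288)            # smooth daily pattern (assumed)
series = diurnal + rng.normal(scale=2.0, size=steps.size)   # noise level 2.0
print(mad_noise_level(series))
```

The smooth component contributes very little to the finest-scale coefficients, so the estimate is driven almost entirely by the noise.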

Now we are ready to present the proposed Algorithm 1 (see Appendix) for traffic matrix decomposition, which is partially based on the noise-free version of the APG algorithm in [18].
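For orientation, a minimal sketch of an accelerated proximal gradient iteration for problem (16) is given below. It is not the paper's Algorithm 1: it assumes a FISTA-style update with step size 1/2 (the gradient of the penalty term with respect to the pair (A, E) has Lipschitz constant 2), a simple geometric continuation on $\mu$ starting from 0.99 times the spectral norm of X, a fixed number of iterations instead of a tolerance-based stopping rule, and a synthetic test matrix. The names `soft_threshold`, `svd_threshold` and `apg_relaxed_pcp` are hypothetical.

```python
import numpy as np

def soft_threshold(M, tau):
    """Entrywise shrinkage: proximal operator of tau * ||.||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svd_threshold(M, tau):
    """Singular value shrinkage: proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def apg_relaxed_pcp(X, lam, mu_final, n_iter=500, eta=0.9):
    """Approximately minimise mu*||A||_* + mu*lam*||E||_1 + 0.5*||X - A - E||_F^2."""
    A = np.zeros_like(X)
    E = np.zeros_like(X)
    A_prev, E_prev = A, E
    t_prev, t_cur = 1.0, 1.0
    mu = max(0.99 * np.linalg.norm(X, 2), mu_final)   # continuation start (assumed)
    for _ in range(n_iter):
        # Extrapolated (momentum) points.
        w = (t_prev - 1.0) / t_cur
        Y_A = A + w * (A - A_prev)
        Y_E = E + w * (E - E_prev)
        # Gradient step of size 1/2 on the penalty, then the two proximal maps.
        R = Y_A + Y_E - X
        A_prev, E_prev = A, E
        A = svd_threshold(Y_A - 0.5 * R, mu / 2.0)
        E = soft_threshold(Y_E - 0.5 * R, lam * mu / 2.0)
        t_prev, t_cur = t_cur, (1.0 + np.sqrt(1.0 + 4.0 * t_cur ** 2)) / 2.0
        mu = max(eta * mu, mu_final)                  # decrease mu towards its target
    return A, E

# Synthetic test matrix (assumed): rank-2 part, 5% large anomalies, small noise.
rng = np.random.default_rng(3)
t, p, r, sigma = 100, 50, 2, 0.01
A_true = rng.standard_normal((t, r)) @ rng.standard_normal((r, p))
E_true = np.zeros((t, p))
idx = rng.choice(t * p, size=(t * p) // 20, replace=False)
E_true.flat[idx] = rng.choice([-10.0, 10.0], size=idx.size)
X = A_true + E_true + sigma * rng.standard_normal((t, p))

lam = 1.0 / np.sqrt(max(t, p))                          # Eq. (17)
mu = sigma * np.sqrt(2.0 * np.log(t * p) * max(t, p))   # Eq. (19)
A_hat, E_hat = apg_relaxed_pcp(X, lam, mu)
rel_err = np.linalg.norm(A_hat - A_true) / np.linalg.norm(A_true)
print(f"relative error of the low-rank part: {rel_err:.3e}")
```

The noise part is then obtained as the remainder, X minus the two estimated components. Starting from a large $\mu$ keeps the early iterates low-rank and sparse, which is a common way to speed up this kind of iteration.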

4. Experiments

We decompose the twelve traffic matrices described in Section 2 using Algorithm 1 (X01–X08 are from the Abilene dataset and Y01–Y04 are from the GEANT dataset). The detailed experimental results are summarized in Table 3, where each row corresponds to the decomposition result for one traffic matrix. From left to right, the columns of Table 3 represent:

(1) Name of the traffic matrix.
(2) The rank of the deterministic traffic matrix.
(3) The rank of the original traffic matrix.
(4) The ratio of (2) to (3), to evaluate the relative low-rank degree of the deterministic traffic matrix.
(5) The $l_0$-norm of the anomaly traffic matrix.
(6) t × p, where t and p are the number of rows and columns of the traffic matrix, respectively.
(7) The ratio of (5) to (6), to evaluate the relative sparsity level of the anomaly traffic matrix.
(8) The ratio of the Frobenius norm of the noise traffic matrix to that of the corresponding original traffic matrix, to evaluate how much energy is contained in the noise traffic matrix.
(9) Number of iterations performed by the APG algorithm for each traffic matrix. In this paper, the tolerance parameter for the stopping criterion is set to $10^{-6}$.
(10) Computational time of the APG algorithm for each traffic matrix (in seconds). In all the experiments, we use a commercial PC with a 2.0 GHz Intel Core2 CPU and 2.0 GB RAM.

All the traffic matrices in our experiments are decomposed into three sub-matrices. According to the results in Table 3, these sub-matrices indeed satisfy the hypotheses of the decomposition model (12):

1. The ranks of the deterministic traffic matrices in the Abilene dataset are at most 13; for the GEANT dataset, the ranks are at most 31. The rank of each deterministic traffic matrix is less than 11% of the rank of the corresponding original traffic matrix. Therefore, all the decomposed deterministic traffic matrices are typical low-rank matrices.

2. In both datasets, the $l_0$-norm of each anomaly traffic matrix does not exceed 40000. For each anomaly traffic matrix, less than 17% of the entries are non-zero. Therefore, all the anomaly traffic matrices are typical sparse matrices.

3. For all the twelve traffic matrices used in our experiments, the ratio of the Frobenius norm of the noise traffic matrix to that of the original traffic matrix is less than 0.18. Hence, noise traffic matrices usually contribute only a small proportion of the total energy.

Table 3
Traffic matrix decomposition results using Relaxed PCP.

| Traffic matrix | Rank(A) | Rank(X) | Rank(A)/Rank(X) | ‖E‖_0 | t × p | ‖E‖_0/(t × p) | ‖N‖_F/‖X‖_F | # Iterations | Computation time (s) |
|---|---|---|---|---|---|---|---|---|---|
| X01 | 10 | 121 | 0.0826 | 32766 | 243936 | 0.1343 | 0.1718 | 86 | 24 |
| X02 | 11 | 121 | 0.0909 | 34713 | 243936 | 0.1423 | 0.1483 | 209 | 56 |
| X03 | 12 | 121 | 0.0992 | 37280 | 243936 | 0.1528 | 0.0824 | 254 | 70 |
| X04 | 11 | 121 | 0.0909 | 30519 | 243936 | 0.1251 | 0.0395 | 663 | 168 |
| X05 | 10 | 121 | 0.0826 | 30878 | 243936 | 0.1266 | 0.1173 | 86 | 31 |
| X06 | 10 | 121 | 0.0826 | 31133 | 243936 | 0.1276 | 0.0562 | 118 | 44 |
| X07 | 13 | 121 | 0.1074 | 37463 | 243936 | 0.1536 | 0.0887 | 170 | 64 |
| X08 | 12 | 121 | 0.0992 | 39287 | 243936 | 0.1611 | 0.0564 | 155 | 60 |
| Y01 | 31 | 483 | 0.0642 | 31505 | 324576 | 0.0971 | 0.1114 | 168 | 350 |
| Y02 | 28 | 465 | 0.0602 | 28575 | 312480 | 0.0914 | 0.1330 | 171 | 390 |
| Y03 | 30 | 465 | 0.0645 | 28651 | 312480 | 0.0917 | 0.1229 | 146 | 312 |
| Y04 | 30 | 457 | 0.0656 | 29119 | 307104 | 0.0948 | 0.0752 | 219 | 489 |
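The derived columns of Table 3 follow directly from the raw counts. As a quick consistency check, the sketch below recomputes the two ratio columns for four rows copied from the table; the variable names are hypothetical.

```python
# (name, rank(A), rank(X), ||E||_0, t * p) copied from four rows of Table 3
rows = [
    ("X01", 10, 121, 32766, 243936),
    ("X07", 13, 121, 37463, 243936),
    ("X08", 12, 121, 39287, 243936),
    ("Y01", 31, 483, 31505, 324576),
]

# Relative low-rank degree and relative sparsity level, rounded to four decimals.
ratios = {
    name: (round(rank_a / rank_x, 4), round(l0_e / size, 4))
    for name, rank_a, rank_x, l0_e, size in rows
}
for name, (rank_ratio, sparsity) in ratios.items():
    print(f"{name}: rank(A)/rank(X) = {rank_ratio:.4f}, ||E||_0/(t*p) = {sparsity:.4f}")
```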


In addition, the number of iterations and the computational time needed in the implementation of the APG algorithm are quite acceptable. Specifically, for all the traffic matrices in the Abilene dataset, the computational time is less than three minutes; the average computational time for the GEANT traffic matrices is longer, but still less than nine minutes.

To further understand the experimental results, we focus on the decomposition results of three traffic matrices: X01, X04 and Y01. Fig. 4 shows the traffic matrix X01 (upper left) in the Abilene dataset, and the three sub-matrices decomposed by the Relaxed PCP method. We can see that the deterministic traffic matrix (upper right) contributes most of the energy of the total network traffic, and the columns (corresponding to the deterministic traffic time series) show a clear diurnal pattern, especially for OD flows with large magnitude. Most of the large volume anomalies in X01 are short-lived and well isolated in the anomaly traffic matrix (bottom left). This matrix is indeed quite sparse, without distinct periodical traffic or noise traffic. Furthermore, most of the entries in the noise traffic matrix (bottom right) have small magnitudes, therefore the noise traffic indeed contributes little energy to the total network traffic. For most of the OD flows, the variances of the noise traffic are proportional to their mean volume, that is, an OD flow with large magnitude usually has large noise traffic. However, there also exist some OD flows of moderate magnitude which contain very large noise traffic, which will be discussed in Section 5.2. In summary, using the Relaxed PCP method described in Section 3.2, we achieve a proper decomposition for the traffic matrix X01, where all the sub-matrices satisfy the corresponding hypotheses in the decomposition model (12).

Fig. 4. The decomposition result of X01 by Relaxed PCP.

For comparison, we then decompose X01 using the PCA-based subspace method. Recall that in Figs. 2 and 3, the first four eigenflows satisfy the criteria of both d-eigenflow and s-eigenflow, and they contribute most of the energy of X01. Therefore, the key is the classification results for these four eigenflows. Suppose that for i ∈ {1, 2, 4}, we classify the first i eigenflow(s) as d-eigenflow, and the rest (if any) of the first four eigenflows as s-eigenflow. The normal traffic matrix (projection onto the normal subspace) is then generated by the first i d-eigenflow(s), and the residual traffic matrix is the difference between the original traffic matrix and the normal traffic matrix. In other words, the normal traffic matrix and the residual traffic matrix constitute a decomposition of X01. For each choice of i, the resulting normal traffic matrix and residual traffic matrix are shown in each row of Fig. 5. We can see that for i = 1, most of the large anomaly traffic is isolated in the residual traffic matrix, while the normal traffic matrix only partially captures the deterministic traffic in X01. In other words, the residual traffic matrix also contains a large proportion of the diurnal traffic. For i = 2, although more diurnal traffic is present in the normal traffic matrix, this matrix still contains large anomaly traffic. Thus we cannot efficiently identify large volume anomalies from the residual traffic matrix. For i = 4, since the fifth eigenflow is not an s-eigenflow and the sixth is one, the normal traffic matrix is generated based on the first five eigenflows of X01. In this case, the decomposition is even worse, since most of the anomaly traffic is contained in the normal traffic matrix. This again demonstrates that, for traffic matrices with large volume anomalies, PCA is not a suitable method for the structural analysis problem. Furthermore, we believe that its performance cannot be significantly improved only by changing parameters or using heuristic mechanisms.

Fig. 5. The decomposition result of X01 by PCA.

Fig. 6 shows the traffic matrix X04 in the Abilene dataset, as well as the three sub-matrices decomposed by the Relaxed PCP method (they are arranged in the same way as in Fig. 4). A significant difference between X04 and X01 is that the former contains long-lived large volume anomalies. Therefore, the anomaly traffic in X04 contributes a larger proportion of the total energy than that in X01. Using the same method to compute the parameters $\lambda$ and $\mu$ in Algorithm 1, our experimental result in Fig. 6 shows that the Relaxed PCP method can also exactly decompose a traffic matrix with long-lived large volume anomalies.

Fig. 6. The decomposition result of X04 by Relaxed PCP.

Fig. 7 displays the decomposition result of the traffic matrix Y01 in the GEANT dataset. In general, the result is similar to that of X01 and X04. However, since the GEANT traffic matrices usually contain more unstable OD flows than the Abilene traffic matrices, the periodical traffic pattern shown in the resulting deterministic traffic matrix is less obvious compared to the results of X01 and X04.
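The PCA-based subspace split used for the comparison above can be sketched as follows. This is an illustration under assumptions: eigenflows are taken to be the left singular vectors of X without mean-centering, the input is a random placeholder rather than X01, and `pca_subspace_split` is a hypothetical name.

```python
import numpy as np

def pca_subspace_split(X, i):
    """Split X into a normal part (spanned by the first i eigenflows) and a residual."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    eigenflows = U[:, :i]                        # each column is a time series of length t
    normal = eigenflows @ (eigenflows.T @ X)     # projection onto the normal subspace
    return normal, X - normal

rng = np.random.default_rng(4)
X = rng.standard_normal((120, 10))               # placeholder for a t-by-p traffic matrix
for i in (1, 2, 4):
    normal, residual = pca_subspace_split(X, i)
    share = np.linalg.norm(normal) ** 2 / np.linalg.norm(X) ** 2
    print(f"i = {i}: the normal part holds {share:.1%} of the energy")
```

Whatever the choice of i, the split is fixed by the energy ordering of the eigenflows, which is why a large anomaly that dominates a leading eigenflow ends up in the normal part.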

5. Discussions

According to the traffic matrix decomposition model (12), we decompose the traffic matrix into three sub-matrices, which correspond to three classes of network traffic. Based on the experimental results obtained, we now further discuss the deterministic traffic matrix and the noise traffic matrix in this section (we do not discuss the anomaly traffic matrix since it may vary significantly for different input traffic matrices).

5.1. Non-periodical traffic in the deterministic traffic matrix

As shown in Section 4, for each traffic matrix in our experiments (X01–X08 and Y01–Y04), the deterministic traffic matrix decomposed by Algorithm 1 has a low rank compared to the corresponding OD flow number. In most cases, columns of the deterministic traffic matrix (deterministic traffic time series) display a significant diurnal pattern. However, there also exist several columns that contain traffic changes which are quite different from the periodical traffic. This observation is quite obvious for the Abilene traffic matrices X03 and X07.

As an example, Fig. 8 displays X03 and the decomposed deterministic traffic matrix. Clearly, we observe a few long-lived traffic changes in the deterministic traffic matrix. These traffic changes affect a few columns (traffic time series) with long-lived growth or decline in terms of traffic volume, and such growths and declines usually happen during the same time intervals. In particular, we illustrate in Fig. 9 eleven affected time series, which have the same source router “WASH”. As we can see, one traffic growth and ten traffic declines all happen during the time interval [1150, 1450], sharing the same starting and ending times. In fact, more than 20 time series in the deterministic traffic matrix are significantly affected, but their source and destination routers do not show a clear distribution pattern. These traffic changes have not been reported in previous studies. Therefore, it seems that the deterministic traffic matrix may contain non-periodical traffic changes, which are usually combinations of long-lived traffic growths and declines during the same time intervals. These changes can hardly be judged as any of the well-known volume anomalies such as DoS/DDoS, flash crowd, alpha, outages and ingress/egress shift [9]. Since the Abilene traffic dataset only records OD flows' coarse-grained byte counts during every five-minute time interval, and we do not have more detailed information about the network when these traffic changes happen, it is difficult to explain the reason for these long-lived traffic changes. We leave this for future work.

In addition, we illustrate in Fig. 10 the sum of the aforementioned eleven OD flow time series in X03, as well as the sum of the corresponding deterministic traffic series (eleven columns of the deterministic traffic matrix of X03), both with the same source router “WASH”. We can see that the sum of the OD flows contains some short-lived large traffic growths during the time interval [1150, 1450], while these needle-like traffic growths cannot be observed in the sum of the deterministic traffic series. In fact, the latter sum presents a typical periodical pattern during the whole week. This shows that, although individual deterministic traffic series with the same source router may contain significant traffic changes, the sum of them tends to show expected patterns. As a result, if we consider the total network traffic with the source router “WASH” (which is the sum of eleven OD flows), the anomaly traffic component can be well decomposed by the Relaxed PCP method.

Fig. 7. The decomposition result of Y01 by Relaxed PCP.

Fig. 8. The traffic matrix X03 (left) and the associated deterministic traffic matrix (right).

Fig. 9. Eleven columns of X03 (deterministic traffic time series) with the same source router “WASH”. (Axes: Time Step; Volume / kbytes.)

Fig. 10. The sum of eleven OD flow time series (blue) and the sum of the corresponding deterministic traffic series (red), both with the same source router “WASH”. (Axes: Time Step; Volume / kbytes. For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 11. The decomposition of OD flow time series No. 50 in X01. Blue: original traffic; Red: deterministic traffic; Black: anomaly traffic; Green: noise traffic. (Two panels: original and deterministic traffic; noise and anomaly traffic. Axes: Time Step; Volume / kbytes.)

Fig. 12. The decomposition of OD flow time series No. 51 in X01. Colors indicate the same classes of traffic as in Fig. 11. (Axes: Time Step; Volume / kbytes.)

Fig. 13. The relationship between the variance of the noise traffic and the mean volume of the corresponding OD flow. Left: the Abilene network; Right: the GEANT network. (Log–log axes: Mean Volumes of OD Flows / kbytes; Variances of Noise Traffic / kbytes. Bounding lines: b1 = 4, c1 = 0.6 and b2 = 4, c2 = 0.9 for Abilene; b1 = 4, c1 = 0.5 and b2 = 4, c2 = 0.9 for GEANT.)


5.2. Some features of the noise traffic matrix

5.2.1. The proportion of noise traffic in different OD flows

As can be seen from Table 3 in Section 4, noise traffic matrices contribute a small proportion of the total network traffic. However, we observe that the ratios of the noise traffic to the total traffic vary in different OD flows. For instance, Figs. 11 and 12 illustrate decompositions of two OD flows in the Abilene traffic matrix X01, namely OD flow No. 50 and No. 51, respectively. More specifically, for OD flow No. 50, the total¹ traffic time series (blue) is mainly contributed by the deterministic traffic time series (red) and the anomaly traffic time series (black), and the noise traffic time series (green) has much smaller average magnitude. Therefore, we conclude that the noise traffic is not an important component in OD flow No. 50. In fact, this is the case for most of the OD flows.

However, a small number of OD flows have different decomposition results. For example, for OD flow No. 51, the noise traffic time series has quite large average magnitude compared to the original traffic time series, therefore the noise traffic becomes a significant component for this OD flow and should not be neglected in the analysis. Actually, this OD flow contains large-amplitude oscillations, which are not a common feature of all the OD flows, and they should be classified as noise traffic in the network.

To summarize, although noise traffic time series are usually small in magnitude, they cannot be neglected in the analysis of a few OD flows which contain large oscillations.

5.2.2. The variance of noise traffic time series

The energy (variance) of the noise traffic may vary significantly in different OD flows. Suppose that X is a traffic matrix, and we decompose it as X = A + E + N by Algorithm 1. For each OD flow time series $X_j$ (one column of X), we are interested in the relationship between the variance $\operatorname{Var}(N_j)$ of the noise traffic (which is estimated by the standard deviation of the noise traffic time series $N_j$) and statistics of $X_j$. For example, Fig. 13 illustrates the relationship between $\operatorname{Var}(N_j)$ and $\operatorname{mean}(X_j)$. Specifically, for each data point in the figure, the horizontal axis represents the mean value of an OD flow time series, and the vertical axis represents the variance of the noise traffic of the same OD flow. Here we have analyzed all the OD flows in our datasets, where time series of the same OD flow in different weeks are considered as different data points. Therefore, we have 121 × 8 = 968 data points for the Abilene dataset, and 1870 for the GEANT dataset.

¹ For interpretation of color in Figs. 1–14, the reader is referred to the web version of this article.

It is clear from Fig. 13 that there is a strong positive correlation between the mean volume of OD flows and the variance of the corresponding noise traffic. In the log–log plot, the distribution of the data points follows a weak linear relationship, and this relationship is more noticeable for the Abilene dataset. Therefore, it is reasonable to assume that in most cases the variance of the noise traffic of an OD flow can be approximated by a power function of the mean volume of the OD flow, which can be written as

$$\operatorname{Var}(N_j) \approx b\, \operatorname{mean}(X_j)^{c}, \tag{21}$$

where b and c are some positive parameters. Notice that there exist many mathematical methods for the estimation of the parameters b and c; however, this is beyond the scope of the current study and we leave it for future work. Instead, we propose empirical bounds for the variance of the noise traffic for the two datasets, which are two parameter pairs $(b_1, c_1)$ and $(b_2, c_2)$ satisfying

$$b_1\, \operatorname{mean}(X_j)^{c_1} \le \operatorname{Var}(N_j) \le b_2\, \operatorname{mean}(X_j)^{c_2}. \tag{22}$$

As labeled in Fig. 13, for the Abilene dataset, the choices $b_1 = b_2 = 4$, $c_1 = 0.6$ and $c_2 = 0.9$ seem to work well for most of the data points except a few outliers; for the GEANT dataset, $b_1 = b_2 = 4$, $c_1 = 0.5$ and $c_2 = 0.9$ are reasonable choices.
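The paper leaves the estimation of b and c for future work. Purely as an illustration, the sketch below shows one common option, an ordinary least-squares fit in log–log coordinates, together with a check of the two-sided bound (22). The data are synthetic and follow an exact power law by construction; `fit_power_law` and `within_bounds` are hypothetical names.

```python
import numpy as np

def fit_power_law(mean_volume, noise_var):
    """Least-squares fit of log(var) = log(b) + c * log(mean); returns (b, c)."""
    c, log_b = np.polyfit(np.log(mean_volume), np.log(noise_var), deg=1)
    return np.exp(log_b), c

def within_bounds(mean_volume, noise_var, b1, c1, b2, c2):
    """Fraction of data points that satisfy the two-sided bound of Eq. (22)."""
    lower = b1 * mean_volume ** c1
    upper = b2 * mean_volume ** c2
    return np.mean((lower <= noise_var) & (noise_var <= upper))

# Synthetic data (assumed): an exact power law with b = 4 and c = 0.75.
mean_volume = np.logspace(1, 8, 50)
noise_var = 4.0 * mean_volume ** 0.75
b, c = fit_power_law(mean_volume, noise_var)
inside = within_bounds(mean_volume, noise_var, 4, 0.6, 4, 0.9)
print(b, c, inside)
```

On real data the points scatter around the fitted line, so the fraction inside the bounds gives a simple measure of how many outliers a given pair of bounds leaves out.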

In addition, we have also analyzed the relationships between the variance of the noise traffic and several other statistics of the corresponding OD flow, such as the $l_2$-norm, the median value, and the variance of the OD flow. For all of them, we have observed a positive correlation as well, but not as significant as the correlation between the noise variance and the mean volume of the flow. In these cases, it is harder to find an explicit mathematical model for the correlation such as Eq. (21).

Fig. 14. The pair of variances of the noise traffic for two consecutive weeks. (Three panels: X01 and X02; X03 and X04; X07 and X08. Axes: OD Flows ID; Noise Traffic Variance.)

Finally, it is interesting to study the temporal stability of the variances of the noise traffic during different weeks. Suppose we have two traffic matrices that record the traffic volume of the same network for two consecutive weeks. By decomposing them using the APG algorithm independently, we first obtain two noise traffic matrices, one for each week. Recall that each column vector of a noise traffic matrix represents the noise traffic time series of an OD flow. For each OD flow, we then compare the pair of variances of the corresponding noise traffic for the two consecutive weeks. Specifically, we choose traffic matrices for three pairs of consecutive weeks in the Abilene dataset:

• X01 (from March 1, 2004) and X02 (from March 8, 2004).
• X03 (from April 2, 2004) and X04 (from April 9, 2004).
• X07 (from May 8, 2004) and X08 (from May 15, 2004).

Since each Abilene traffic matrix contains 121 OD flows (columns), the variances of the corresponding noise traffic include 121 data points. Fig. 14 shows the variances of the noise traffic for the six chosen traffic matrices (121 × 6 = 726 data points in total).

We observe the following features in Fig. 14: (1) for each traffic matrix, the variances of the noise traffic of different OD flows may vary significantly; (2) for traffic matrices of two consecutive weeks, the variances of the noise traffic of the same OD flow are similar in most cases; (3) for two traffic matrices that do not represent two consecutive weeks, the variances of the noise traffic of the same OD flow may vary significantly (take the same OD flows in X01 and X03 as examples).

However, our observations are not sufficient to conclude that the variance of the noise traffic is strictly stable, for two reasons: (1) the analysis above is not comprehensive enough, as our datasets do not contain traffic matrices for many consecutive weeks; (2) there also exist a few variance pairs in which one is obviously different from the other, although they correspond to the same OD flow for the two consecutive weeks. We plan to investigate the unstable variances of the noise traffic in future work.
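One simple way to quantify the week-to-week comparison discussed above is the logarithm of the ratio between the noise levels of the same OD flow in the two weeks. The sketch below is an assumption-laden illustration: the noise matrices are synthetic, both weeks share the same per-flow noise level by construction, the factor-of-two criterion is arbitrary, and the function names are hypothetical.

```python
import numpy as np

def noise_levels(N):
    """Per-OD-flow noise level: standard deviation of each column of N."""
    return np.std(N, axis=0)

def log_ratio(N_week1, N_week2):
    """log10 of the ratio of the noise levels of the same OD flow in two weeks."""
    return np.log10(noise_levels(N_week2) / noise_levels(N_week1))

rng = np.random.default_rng(5)
t, p = 2016, 121
level = 10 ** rng.uniform(2, 4, size=p)       # assumed per-flow noise levels
N1 = rng.standard_normal((t, p)) * level      # noise matrix of week 1
N2 = rng.standard_normal((t, p)) * level      # noise matrix of week 2, same levels
ratio = log_ratio(N1, N2)
stable = np.mean(np.abs(ratio) < np.log10(2))
print(f"flows whose noise level changes by less than a factor of 2: {stable:.0%}")
```

OD flows with a log-ratio far from zero correspond to the unstable variance pairs mentioned in point (2).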

6. Conclusions

In this paper, we focus on the structural analysis of the traffic matrix that has been polluted by large volume anomalies. We first demonstrate that the PCA-based analysis method performs poorly for polluted traffic matrices. Next, we propose a new decomposition model for the traffic matrix that is more practical in the analysis of empirical network traffic data, and study the decomposition problem using the relaxed principal component pursuit method. Finally, we discuss the experimental results in more detail for the deterministic and noise traffic matrices. The major findings in this paper are:

1. We experiment the classical PCA method for trafficmatrix analysis. Different from the previous works[4,5], the traffic matrices that we analyze contain somelarge volume anomalies. In this case, our results showthat the eigenflow classification is neither completenor orthogonal, which suggests that PCA is unable todecompose accurately the traffic matrix into the normaltraffic matrix and the large anomaly traffic matrix.

2. Based on the empirical network traffic data, a newdecomposition model for the traffic matrix is proposedin Section 3.1. To the best of our knowledge, it is a novelway of formalizing the structure of the traffic matrix,which also provides a simple view of the traffic matrixanalysis problem. Moreover, this model helps explainintuitively some of the limitations of the classical PCAmethod in our experiments.

3. According to the decomposition model of the trafficmatrix, we show that the problem of traffic matrixdecomposition is equivalent to the robust PCA problem,which has been extensively studied recently. Based onthe relaxed principal component pursuit method andthe accelerated proximal gradient algorithm, wedevelop an algorithm for the decomposition of trafficmatrices that may contain large volume anomalies.The experimental results demonstrate the efficiencyand flexibility of the proposed algorithm.

4. We discuss some detailed features of the deterministic traffic matrix and the noise traffic matrix. Firstly, we observe that the deterministic traffic matrix may contain non-periodical traffic changes, which are usually combinations of long-lived traffic growths and declines during the same time intervals. Secondly, although the noise traffic matrix generally contributes a small proportion of the total network traffic, the ratio of the noise traffic to the total traffic may vary significantly across OD flows. Thirdly, we find a significant positive correlation between the mean volume of an OD flow and the variance of its noise traffic time series, and we further test the temporal stability of the variance of the noise traffic.
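The per-flow statistics summarized in the fourth finding can be sketched as follows. This is only an illustration under stated assumptions: the function name `noise_statistics` is hypothetical, the noise ratio is taken here as the ratio of summed absolute noise to summed absolute traffic per OD flow (the exact definition used in the experiments may differ), and the correlation is a plain Pearson coefficient computed with NumPy.

```python
import numpy as np


def noise_statistics(X, N):
    """Per-OD-flow summary of a traffic matrix X and its noise part N.

    Both arguments are t-by-p arrays (rows: time, columns: OD flows).
    The ratio definition below is an assumption made for illustration.
    """
    X = np.asarray(X, dtype=float)
    N = np.asarray(N, dtype=float)
    mean_volume = X.mean(axis=0)          # mean volume of each OD flow
    noise_var = N.var(axis=0)             # variance of each noise time series
    noise_ratio = np.abs(N).sum(axis=0) / np.abs(X).sum(axis=0)
    # Pearson correlation between mean volume and noise variance.
    corr = np.corrcoef(mean_volume, noise_var)[0, 1]
    return mean_volume, noise_var, noise_ratio, corr
```

A clearly positive `corr` on real data would be consistent with the third observation above; comparing `noise_var` computed on two consecutive weeks gives a simple view of its temporal stability.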

To summarize, this paper is a preliminary study on applying the Relaxed PCP method to network traffic analysis, whose efficiency and flexibility have been demonstrated in the experimental results. For future work, we plan to further optimize the Relaxed PCP method to make it adaptable to the network traffic data, and explore its applications in volume anomaly detection and data cleaning for the polluted traffic matrix.

Acknowledgments

We thank Professor Jinping Sun at Beihang University for his advice on the earlier draft of this paper. We also thank Professor Pascal Frossard at Ecole Polytechnique Fédérale de Lausanne (EPFL) for his help with the revised version of the paper. Finally, we are grateful to the anonymous reviewers for their constructive suggestions on the paper. X. Dong is supported by Nokia Research Center (NRC), Lausanne, Switzerland.

Appendix A

In this appendix, we present the APG algorithm for traffic matrix decomposition.

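Algorithm 1. APG for traffic matrix decomposition

Input: the traffic matrix $X \in \mathbb{R}^{t \times p}$.

1. Compute the regularization parameter $\lambda$ using (17).
2. Compute the Lagrangian parameter $\mu$ using (19) with $\sigma = 1$.
3. For each OD flow time series $X_j$, estimate the variance $\sigma_j$ of its Gaussian noise component using (20).
4. Let $X = X / \mathrm{diag}\{\sigma_j\}$.
5. Let $A_0 = A_{-1} = 0$, $E_0 = E_{-1} = 0$, $t_0 = t_{-1} = 1$, $S^A_1 = S^E_1 = 1$ and $k = 0$.
6. while not converged do

$$
\begin{aligned}
Y^A_k &= A_k + \frac{t_{k-1} - 1}{t_k}\,(A_k - A_{k-1}); \\
Y^E_k &= E_k + \frac{t_{k-1} - 1}{t_k}\,(E_k - E_{k-1}); \\
G^A_k &= Y^A_k - \tfrac{1}{2}\left(Y^A_k + Y^E_k - X\right); \\
G^E_k &= Y^E_k - \tfrac{1}{2}\left(Y^A_k + Y^E_k - X\right); \\
(U, S, V) &= \mathrm{SVD}\left(G^A_k\right); \\
A_{k+1} &= U\, S_{\mu/2}[S]\, V^T; \\
E_{k+1} &= S_{\lambda\mu/2}\left[G^E_k\right]; \\
t_{k+1} &= \frac{1 + \sqrt{4 t_k^2 + 1}}{2}; \\
S^A_{k+1} &= 2\left(Y^A_k - A_k\right) + \left(A_{k+1} + E_{k+1} - Y^A_k - Y^E_k\right); \\
S^E_{k+1} &= 2\left(Y^E_k - E_k\right) + \left(A_{k+1} + E_{k+1} - Y^A_k - Y^E_k\right); \\
k &= k + 1;
\end{aligned}
$$

   end while
7. Let $X = X \cdot \mathrm{diag}\{\sigma_j\}$.

Output: $A = A_k \cdot \mathrm{diag}\{\sigma_j\}$; $E = E_k \cdot \mathrm{diag}\{\sigma_j\}$; $N = X - A - E$.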
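The iteration in Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the names `apg_decompose`, `soft_threshold`, `tol` and `max_iter` are hypothetical, and the parameters `lam`, `mu` and `sigma` are assumed to be supplied by the caller, since Eqs. (17), (19) and (20) that define them are not reproduced here. The residual terms follow the form printed in Algorithm 1, and the loop stops when the sum of their squared Frobenius norms falls below `tol`.

```python
import numpy as np


def soft_threshold(M, eps):
    """Entrywise soft-thresholding: shrink every entry toward zero by eps."""
    return np.sign(M) * np.maximum(np.abs(M) - eps, 0.0)


def apg_decompose(X, lam, mu, sigma, tol=1e-7, max_iter=1000):
    """Sketch of Algorithm 1; returns (A, E, N) with X = A + E + N.

    lam, mu: regularization and Lagrangian parameters (caller-supplied).
    sigma:   length-p array of per-flow noise scales (caller-supplied).
    tol, max_iter: hypothetical stopping controls for this sketch.
    """
    X = np.asarray(X, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    Xn = X / sigma                       # step 4: scale column j by 1/sigma_j
    A = A_prev = np.zeros_like(Xn)       # step 5: zero initialization
    E = E_prev = np.zeros_like(Xn)
    t = t_prev = 1.0
    for _ in range(max_iter):            # step 6
        YA = A + (t_prev - 1.0) / t * (A - A_prev)
        YE = E + (t_prev - 1.0) / t * (E - E_prev)
        R = YA + YE - Xn
        GA = YA - 0.5 * R
        GE = YE - 0.5 * R
        # Singular value thresholding gives the low-rank update.
        U, s, Vt = np.linalg.svd(GA, full_matrices=False)
        A_new = (U * soft_threshold(s, mu / 2.0)) @ Vt
        # Entrywise thresholding gives the sparse update.
        E_new = soft_threshold(GE, lam * mu / 2.0)
        t_new = (1.0 + np.sqrt(4.0 * t * t + 1.0)) / 2.0
        common = A_new + E_new - YA - YE
        SA = 2.0 * (YA - A) + common     # A is still A_k at this point
        SE = 2.0 * (YE - E) + common     # E is still E_k at this point
        A_prev, A = A, A_new
        E_prev, E = E, E_new
        t_prev, t = t, t_new
        if np.sum(SA ** 2) + np.sum(SE ** 2) < tol:
            break
    A_out = A * sigma                    # undo the column scaling
    E_out = E * sigma
    return A_out, E_out, X - A_out - E_out
```

A call such as `apg_decompose(X, lam, mu, sigma)` returns the three sub-matrices in the original units. The sketch keeps `mu` fixed throughout, as in Algorithm 1, so convergence on real traffic matrices may require a larger `max_iter`.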

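In Algorithm 1, $S_\varepsilon[\cdot]: \mathbb{R}^{t \times p} \to \mathbb{R}^{t \times p}$ represents the soft-thresholding operator with parameter $\varepsilon > 0$. For all $X \in \mathbb{R}^{t \times p}$, $S_\varepsilon[X] \in \mathbb{R}^{t \times p}$ and it satisfies

$$
S_\varepsilon[X](i,j) =
\begin{cases}
X(i,j) - \varepsilon & \text{if } X(i,j) > \varepsilon \\
X(i,j) + \varepsilon & \text{if } X(i,j) < -\varepsilon \\
0 & \text{otherwise}
\end{cases}
\qquad (23)
$$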
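As a small worked instance of the soft-thresholding operator, take $\varepsilon = 1$ and a $2 \times 2$ matrix chosen purely for illustration (it is not taken from the experiments). Entries with magnitude at most $\varepsilon$ are set to zero, and the remaining entries are shrunk toward zero by $\varepsilon$:

```latex
X = \begin{pmatrix} 3 & -0.5 \\ -2 & 1 \end{pmatrix},
\qquad
S_{1}[X] = \begin{pmatrix} 3 - 1 & 0 \\ -2 + 1 & 0 \end{pmatrix}
         = \begin{pmatrix} 2 & 0 \\ -1 & 0 \end{pmatrix}.
```

In Algorithm 1 the same operator is applied in two places: entrywise to $G^E_k$ to obtain the sparse anomaly estimate, and to the singular values of $G^A_k$ to obtain the low-rank estimate.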


We choose the stopping criterion of Algorithm 1 as the one defined in [18], which terminates the iterations when the quantity $\|S^A_{k+1}\|_F^2 + \|S^E_{k+1}\|_F^2$ is less than a pre-defined tolerance parameter.

References

[1] S. Uhlig, B. Quoitin, J. Lepropre, S. Balon, Providing public intradomain traffic matrices to the research community, SIGCOMM Comput. Commun. Rev. 36 (1) (2006) 83–86.

[2] Abilene data, <http://www.cs.utexas.edu/yzhang/research/AbileneTM/>.

[3] Geant data, <http://totem.info.ucl.ac.be/dataset.html>.

[4] A. Lakhina, K. Papagiannaki, M. Crovella, C. Diot, E.D. Kolaczyk, N. Taft, Structural analysis of network traffic flows, SIGMETRICS Perform. Eval. Rev. 32 (1) (2004) 61–72.

[5] A. Lakhina, M. Crovella, C. Diot, Diagnosing network-wide traffic anomalies, SIGCOMM Comput. Commun. Rev. 34 (4) (2004) 219–230.

[6] L. Huang, X. Nguyen, M. Garofalakis, M. Jordan, A. Joseph, N. Taft, In-network PCA and anomaly detection, in: Proceedings of Neural Information Processing Systems (NIPS) 2006, December 2006.

[7] Y. Zhang, Z. Ge, A. Greenberg, M. Roughan, Network anomography, in: Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement (IMC '05), USENIX Association, Berkeley, CA, USA, 2005, pp. 317–330.

[8] A. Soule, A. Lakhina, N. Taft, K. Papagiannaki, K. Salamatian, A. Nucci, M. Crovella, C. Diot, Traffic matrices: balancing measurements, inference and modeling, SIGMETRICS Perform. Eval. Rev. 33 (1) (2005) 362–373.

[9] A. Soule, K. Salamatian, N. Taft, Combining filtering and statistical methods for anomaly detection, in: Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement (IMC '05), USENIX Association, Berkeley, CA, USA, 2005, pp. 331–344.

[10] H. Ringberg, A. Soule, J. Rexford, C. Diot, Sensitivity of PCA for traffic anomaly detection, SIGMETRICS Perform. Eval. Rev. 35 (1) (2007) 109–120.

[11] Y. Ohsita, S. Ata, M. Murata, Identification of attack nodes from traffic matrix estimation, IEICE Trans. Commun. E90-B (10) (2007) 2854–2864.

[12] B. Rubinstein, B. Nelson, L. Huang, A. Joseph, S. Lau, S. Rao, N. Taft, J. Tygar, ANTIDOTE: understanding and defending against poisoning of anomaly detectors, in: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement (IMC '09), ACM, New York, NY, USA, 2009, pp. 1–14.

[13] E. Candes, X. Li, Y. Ma, J. Wright, Robust principal component analysis?, Arxiv preprint, arXiv:0912.3599, 2009.

[14] Z. Zhou, X. Li, J. Wright, E. Candes, Y. Ma, Stable principal component pursuit, in: Proceedings of IEEE International Symposium on Information Theory (ISIT) 2010, June 2010.

[15] Y. Peng, A. Ganesh, J. Wright, W. Xu, Y. Ma, RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images, in: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2010, June 2010.

[16] K. Min, Z. Zhang, J. Wright, Y. Ma, Decomposing background topics from keywords using principal component pursuit, in: Proceedings of ACM International Conference on Information and Knowledge Management (CIKM) 2010, October 2010.

[17] J. Wright, A. Ganesh, S. Rao, Y. Peng, Y. Ma, Robust principal component analysis: exact recovery of corrupted low-rank matrices by convex optimization, in: Proceedings of Neural Information Processing Systems (NIPS) 2009, December 2009.

[18] Z. Lin, A. Ganesh, J. Wright, L. Wu, M. Chen, Y. Ma, Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix, in: Proceedings of IEEE 3rd International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), December 2009.

[19] G. Stewart, On the early history of the singular value decomposition, SIAM Rev. 35 (4) (1993) 551–566.

[20] E. Candes, M. Wakin, An introduction to compressive sampling, IEEE Signal Process. Mag. 25 (2) (2008) 21–30.

[21] E. Candes, B. Recht, Exact matrix completion via convex optimization, Found. Comput. Math. 9 (6) (2009) 717–772.

[22] E. Candes, Y. Plan, Matrix completion with noise, Proc. IEEE 98 (6) (2010) 925–936.

[23] J. Cai, E. Candes, Z. Shen, A singular value thresholding algorithm for matrix completion, SIAM J. Optimiz. 20 (4) (2010) 1956–1982.

[24] D. Donoho, Compressed sensing, IEEE Trans. Inform. Theory 52 (4) (2006) 1289–1306.

[25] S. Chen, D. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM Rev. 43 (1) (2001) 129–159.

[26] D. Donoho, De-noising by soft-thresholding, IEEE Trans. Inform. Theory 41 (3) (1995) 613–627.

[27] D. Donoho, I. Johnstone, Ideal spatial adaptation by wavelet shrinkage, Biometrika 81 (3) (1994) 425–455.

[28] I. Cunha, F. Silveira, R. Oliveira, R. Teixeira, C. Diot, Uncovering artifacts of flow measurement tools, in: Proceedings of Passive and Active Measurement Conference, Seoul, Korea, April 2009.

Zhe Wang is currently a Ph.D. candidate in the School of Computer Science and Engineering, Beihang University. He received his B.S. degree in Mathematics from the School of Mathematics and System Science, Beihang University, in 2007. His research interests include network measurement and analysis, and parallel computing.

Kai Hu is currently an Associate Professor in the School of Computer Science and Engineering of Beihang University. He received his Ph.D. degree in 2001 from the School of Computer Science and Engineering, Beihang University. From 2001 to 2003, he was a Research Fellow in the School of Computer Engineering, Nanyang Technological University, Singapore. Dr. Kai Hu's research interests include high performance parallel computing, network technology, and embedded systems.

Ke Xu is currently a Professor in the School of Computer Science and Engineering of Beihang University. He received a Ph.D. degree in computer science from Beihang University in 2000. His main research interests include algorithms and computational complexity, complex networks, network measurement and data mining.


Baolin Yin is currently a Professor in the School of Computer Science and Engineering of Beihang University. He graduated from the Department of Electronic Engineering, Beihang University, in 1973, and received his Ph.D. degree from the Department of Artificial Intelligence, the University of Edinburgh, in 1984. His research interests include distributed applications and image processing.

Xiaowen Dong has been working as a PhD student in the Signal Processing Laboratories (LTS4 / LTS2) at Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland since October 2009. His research interests include wavelets, spectral graph theory, and their applications to mobile and online social network analysis. Before joining EPFL, Mr. Dong received his B.Eng. degree in Information Engineering from Zhejiang University, Hangzhou, China and his M.Sc. degree in Signal Processing from Institute for Digital Communications, The University of Edinburgh, Edinburgh, UK.
