Anomaly prediction in network traffic using adaptive Wiener filtering and ARMA modeling

Mehmet Celenk, Thomas Conley, James Graham, and John Willis School of Electrical Engineering and Computer Science Ohio University, Athens, OH 45701 USA

{celenk, conleyt, jg193404, jw174304}@ohio.edu

Abstract— Fast and efficient detection of anomalies is essential for maintaining a robust and secure network. This research presents a method of anomaly detection based on adaptive Wiener filtering of noise followed by ARMA modeling of network flow data. We dynamically calculate noise and traffic signal statistics using network-monitoring metrics for traffic features such as average port, high port, server ports, and peered ports. The underlying approach is tested on near-real-time Internet traffic in the wide-area network (WAN) of Ohio University. The average port feature is determined to be the most informative measure in the estimation process. High port, server ports, and peered ports are used for confirmation of the anomaly detection result. We empirically determine that most of the network features obey Gaussian-like distributions. Experiments reveal that the method is highly effective in predicting anomalies in network traffic flow and preventing any hazard that they may cause.

Keywords—Network anomalies, network security, Wiener filtering, ARMA modeling, adaptive digital anomaly predictor, majority voting

I. INTRODUCTION

Firewalls and intrusion detection devices are the primary way of protecting modern enterprise networks from a host of network anomalies such as viruses, worms, scanners, and denial of service from botnets. These defenses rely on detecting attacks after they have begun affecting the targeted network. Existing methods can identify specific packets that match a known pattern or originate from a known location, but these signature-based systems fail to detect unknown anomalies. An anomaly might be an old attack that has been changed in some way to avoid detection, or it could be a completely new form of attack. Significant research has been devoted to the task of identifying network anomalies using methods from statistical signal analysis and pattern recognition theory. Relevant work includes papers by Kwitt and Hofmann [1] and Shen et al. [2], which deal with anomaly detection using robust PCA (principal component analysis) and metrics of aggregated network behavior. The approach undertaken by Karasaridis et al. [11] deals primarily with the detection of botnets. The work described in [10] considers the detection of network intrusions in covariance space using pattern recognition methods. Sang and Li [12] describe how far into the future one can predict network traffic by employing ARMA (auto-regressive moving average) as a model. Similarly, Cho et al. [6] describe a method in which near-real-time network traffic data can be measured and filtered using the Patricia tree and an LRU (least recently used) replacement policy. Pang et al. [9] examine known anomalies and possible ways of detecting them by filtering data to reduce load on the system. Feldman et al. [3] proposed a cascade-based approach that deals with multi-fractal behavior in network traffic; their paper also describes a way of detecting network problems using their system. Yurcik and Li [4] and Plonka [5] demonstrate how NetFlows and FlowScan are becoming a more efficient way to monitor network traffic. Gong [8] suggests methods by which NetFlows can be used to detect worms and other types of intrusion into a network.

Although network flow data is easily obtained and contains a useful set of features, there is often too much information to retain for a long time, so it is essential to analyze this data "on-the-fly" or in near-real-time mode. None of the studies described above provides a method of predicting an attack before it occurs. In this work, we aim to predict network anomalies before they are detectable by existing methods. To this end, we statistically analyze network flow data and apply Wiener filtering to isolate normal traffic. This, in turn, helps us identify the signal corresponding to network anomalies in the selected feature measurement, which characterizes the network flow in that dimension. By estimating the auto-correlation function of normal traffic, the ARMA predictor [13] is devised using the well-known Yule-Walker regression [14].

In the following sections we describe the approach and the results achieved in our attempt to predict network traffic anomalies. Section II is devoted to the methods undertaken and describes the necessary background and structure of the proposed algorithm. The experimental test bed and computer results are described in detail in Section III. The remaining discussion is devoted to conclusions, future work, and potential applications.

II. DESCRIPTION OF THE OVERALL APPROACH

In this work, we propose a method of defense based on the prediction of anomalies before they can have their adverse effect. The mechanism is intended to function out-of-band on a network connection carrying massive amounts of traffic and a large number of connections. Research was conducted using network data captured on the Internet connection of a heavily populated /16 and /18 network using the network monitoring tool Argus [7], which provides real-time measurements of flow, connectivity, capacity, demand, loss, delay, and jitter on a per-transaction basis [5]. This traffic information can be simply

used as measured, or it can be enhanced with specific knowledge about the network. For instance, when measuring port usage an engineer may weight some ports higher than others. This heuristic information is factored into the weighted parameters of the system.

In general, a network anomaly tracking metric is a p-dimensional vector, $x = [x_1, x_2, \ldots, x_p]^T$, where $T$ denotes matrix transposition and $p$ is set at the discretion of the network engineers. We illustrate this in the following sections by describing adaptive anomaly detection on a single dimension $x_i$ using Wiener filtering and ARMA modeling, and extend this to the set $X$ using weighted majority voting. It is preferable to predict an anomaly and prevent the attack before its onset rather than to wait for the attack to have an adverse effect. We show that the network flow features mentioned earlier can be used to predict network anomalies at the reconnaissance or preparatory phase, or very early at the onset of the attack. Our research focuses on a limited set of features (see Table I) pertaining to overall port usage and throughput. Port usage is considered a primary indicator of the type of activity on the network [9], and throughput statistics are used to measure the magnitude of any network event. These characteristics were purposefully chosen because they are not highly specific and do not target specific addresses or ranges. We theorize that by concentrating on discriminating, but general, features we will be able to predict an anomaly without a priori knowledge of any specific activity.

A. Adaptive digital anomaly predictor (ADAP)

First, we consider a single component, $x_i$, of the p-dimensional traffic feature set as defined above. This measured input is modeled as a linear combination of the normal traffic signal, $s_i(n)$, and the anomalous traffic noise, $\eta(n)$, expressed mathematically by

$$x_i(n) = s_i(n) + \eta(n) \tag{1}$$

The approach undertaken herein is to extract the normal network flow ($s_i$) and use it to predict anomalies, as shown in Figure 1.

Figure 1. Block diagram of adaptive digital anomaly predictor (ADAP)

The Wiener filter removes the noise from the signal and outputs the estimate of the normal traffic flow. To achieve a balanced estimate, we adjust the window size and associated coefficients in the Wiener filter and the ARMA predictor, and thereby achieve better results. The feedback control channel shown by a dashed line in Figure 1 lets the algorithm adapt to a changing network signal waveform. This is done in accordance with the adaptive Wiener filter implementation method proposed in [16] as

$$\hat{s}_i(n) = m_{\hat{s}_i}(n) + \frac{\sigma_{\hat{s}_i}^2(n)}{\sigma_{\hat{s}_i}^2(n) + \sigma_{\eta}^2(n)} \cdot \left( x_i(n) - m_{\hat{s}_i}(n) \right) \tag{2}$$

where $\hat{s}_i(n)$ is the Wiener filter output in the discrete time domain, $m_{\hat{s}_i}(n)$ is the mean value of the normal flow, and $\sigma_{\hat{s}_i}^2$ and $\sigma_{\eta}^2$ are the respective variances of the measured traffic and the anomaly, computed in a window of size $M$. They are calculated as

$$m_{\hat{s}_i}(n) = \frac{1}{2M+1} \sum_{k=n-M}^{n+M} x_i(k) \tag{3}$$

$$\hat{\sigma}_{\hat{s}_i}^2(n) = \begin{cases} \hat{\sigma}_{x_i}^2(n) - \hat{\sigma}_{\eta}^2(n), & \text{if } \left( \hat{\sigma}_{x_i}^2(n) - \hat{\sigma}_{\eta}^2(n) \right) > 0 \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

$$\hat{\sigma}_{x_i}^2(n) = \frac{1}{2M+1} \sum_{k=n-M}^{n+M} \left( x_i(k) - m_{\hat{s}_i}(k) \right)^2 \tag{5}$$
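The filtering step of equations (2) through (5) can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' MATLAB implementation: the function name `adaptive_wiener`, the constant noise variance `noise_var`, and the clipping of the window at the signal borders are all assumptions made here.

```python
def adaptive_wiener(x, noise_var, M):
    """Sketch of equations (2)-(5): Wiener filter driven by local statistics.

    x         : list of measured feature values x_i(n)
    noise_var : assumed anomaly (noise) variance, held constant in this sketch
    M         : window half-width; each window spans up to 2*M + 1 samples
    """
    N = len(x)

    def window(n):
        # Window clipped at the signal borders (an assumption of this sketch).
        return range(max(0, n - M), min(N, n + M + 1))

    # Equation (3): local mean of the measured signal.
    mean = [sum(x[k] for k in window(n)) / len(window(n)) for n in range(N)]

    out = []
    for n in range(N):
        w = window(n)
        # Equation (5): local variance, using the local mean at each k as written.
        var_x = sum((x[k] - mean[k]) ** 2 for k in w) / len(w)
        # Equation (4): signal variance, clipped at zero.
        var_s = max(var_x - noise_var, 0.0)
        # Equation (2): the gain is 0 when the window looks like pure noise
        # and approaches 1 when the signal variance dominates.
        denom = var_s + noise_var
        gain = var_s / denom if denom > 0 else 0.0
        out.append(mean[n] + gain * (x[n] - mean[n]))
    return out
```

A call such as `adaptive_wiener(samples, noise_var=4.0, M=5)` returns the estimate of the normal flow for a hypothetical list `samples`. Where the local variance does not exceed the noise variance, the output falls back to the local mean.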

The estimated signal $\hat{s}_i(n)$ is then applied to the ARMA unit, which estimates the next value $\hat{s}_i(n+1)$ of $\hat{s}_i(n)$. This is similar to what has been done in [12], with the exception that they predict only network traffic and ignore any anomaly that may exist at the time of measurement. Their prediction is based on the assumptions of stationary Gaussian white noise with unit variance and the Gaussian nature of the network traffic, without any empirical justification. In this research, on the other hand, we place no restriction on the network flow or on the noise. Noise is considered to be the combination of network anomalies and the traditional white noise described in [12].

ARMA starts the process of estimation by calculating the auto-correlation function for $\hat{s}_i(n)$ as in equation (6). The auto-correlation function is then used in the 3rd order predictor as in equation (7).

$$\hat{s}_i(n+1) = \alpha_1 \cdot \hat{s}_i(n) + \alpha_2 \cdot \hat{s}_i(n-1) + \alpha_3 \cdot \hat{s}_i(n-2) \tag{7}$$

Here, $\alpha_i$ represents the predictor coefficient and $R_{\hat{s}_i}(i)$ denotes the auto-correlation function value at $i$.
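The coefficient fit and one-step prediction of equations (6) and (7) can be illustrated as follows. Equation (7) uses only past values of the filtered signal, so this sketch fits the three coefficients as an autoregressive model from a biased sample auto-correlation. The helper names `autocorr`, `solve3`, and `predict_next` are hypothetical, and `R[lag]` plays the role of $R_{\hat{s}_i}(n + \text{lag})$ in the paper's notation.

```python
def autocorr(s, lag):
    """Biased sample auto-correlation R(lag) of the sequence s."""
    M = len(s)
    return sum(s[k] * s[k - lag] for k in range(lag, M)) / M


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def predict_next(s):
    """Return (prediction of s(n+1), [alpha1, alpha2, alpha3])."""
    R = [autocorr(s, lag) for lag in range(4)]  # R(0) .. R(3)
    # Toeplitz system of equation (6); R is even, so R(-i) = R(i).
    A = [[R[abs(i - j)] for j in range(3)] for i in range(3)]
    alpha = solve3(A, R[1:4])
    # Equation (7): weighted sum of the three most recent samples.
    prediction = alpha[0] * s[-1] + alpha[1] * s[-2] + alpha[2] * s[-3]
    return prediction, alpha
```

In use, `predict_next(window)` would be called on the most recent Wiener-filtered samples, so the coefficients are re-estimated as the window slides.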

$P_{\eta}(\omega)$ is the power spectrum (i.e., the Fourier transform of the auto-correlation) of the network anomalous signal $\eta(n)$, and $P_{\hat{s}_i}(\omega)$ is the power spectrum (i.e., the Fourier transform of the auto-correlation) of the normal network flow $\hat{s}_i(n)$. These stochastic signal signatures are measured using

$$R_{\eta}(n) = \frac{1}{M} \sum_{k=-\infty}^{\infty} \eta(k) \cdot \eta^{*}(k-n) \tag{8}$$

$$R_{\hat{s}_i}(n) = \frac{1}{M} \sum_{k=-\infty}^{\infty} \hat{s}_i(k) \cdot \hat{s}_i^{*}(k-n) \tag{9}$$

$$P_{\eta}(\omega) = \sum_{n=-\infty}^{\infty} R_{\eta}(n) \cdot e^{-j\omega n} \tag{10}$$

$$P_{\hat{s}_i}(\omega) = \sum_{n=-\infty}^{\infty} R_{\hat{s}_i}(n) \cdot e^{-j\omega n} \tag{11}$$

where equations (8) and (9) represent the estimated auto-correlation functions of the noise signal $\eta(n)$ and the normal traffic signal $\hat{s}_i(n)$, respectively, in a window of size $M$, and equations (10) and (11) represent the power spectra of the noise signal $\eta(n)$ and the normal traffic signal $\hat{s}_i(n)$, respectively.

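Since direct measurement of the power spectrum is complex and costly, it is desirable to estimate it using the periodogram approach described in [16] and summarized by

$$P_{\eta}(\omega) = \frac{1}{M} \left| N(\omega) \right|^2 \tag{12}$$

$$P_{\hat{s}_i}(\omega) = \frac{1}{M} \left| \hat{S}_i(\omega) \right|^2 \tag{13}$$

where equations (12) and (13) represent the estimated power spectra, using periodograms, of the noise $\eta(n)$ and the normal traffic signal $\hat{s}_i(n)$, respectively, $N(\omega)$ and $\hat{S}_i(\omega)$ are the Fourier transforms of the corresponding signals over the window, and $M$ is the number of measurements.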
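As an illustration of equations (12) and (13), the periodogram of a window of $M$ samples can be computed directly from its discrete Fourier transform. The sketch below is an assumption-laden example, not the authors' code: it evaluates the spectrum only at the frequencies $\omega_k = 2\pi k / M$ and uses a direct $O(M^2)$ transform to stay dependency-free.

```python
import cmath


def periodogram(x):
    """Equations (12)-(13): P(w_k) = |X(w_k)|^2 / M at w_k = 2*pi*k/M."""
    M = len(x)
    spectrum = []
    for k in range(M):
        # Direct DFT of the window; adequate for short windows.
        X_k = sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / M) for n in range(M))
        spectrum.append(abs(X_k) ** 2 / M)
    return spectrum
```

By Parseval's relation the periodogram values sum to the energy of the window, so their average equals the zero-lag value of the biased auto-correlation estimate. That makes a quick consistency check between equations (8) to (11) and (12) to (13).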

Notice that the auto-correlation function is even, i.e., $R_{\hat{s}_i}(n+i) = R_{\hat{s}_i}(n-i)$, and has its maximum at the origin, that is, $R_{\hat{s}_i}(0) \ge R_{\hat{s}_i}(n)$ for all values of $n$. Using equations (6) and (7), the value of $\hat{s}_i(n+1)$ is predicted by the ARMA unit and compared to the measured signal $x_i(n+1)$ to form the output, $q(n+1)$, of the ADAP function. Hence, we have equations (14) through (17), where $\varepsilon = s_i(n+1) - \hat{s}_i(n+1)$ represents the prediction or estimation error associated with the autoregressive moving average unit. Equations (14) through (17) enable us to come up with a decision predicate for identifying any anomaly, expressed as

• If $\hat{s}_i(n+1) = x_i(n+1)$, then $q(n+1) = 0$; hence, no anomaly

• If $\hat{s}_i(n+1) = s_i(n+1)$, then $q(n+1) = \eta(n+1)$; hence, anomaly predicted

• If $\hat{s}_i(n+1) \ne s_i(n+1)$, then $q(n+1) = \eta(n+1) + \varepsilon$; hence, anomaly plus error predicted

This structured predicate extends the work of [12] by directly predicting near-real-time network attacks.

For reference, the Yule-Walker system of equation (6) and the ADAP output relations of equations (14) through (17), used in the discussion above, are

$$\begin{bmatrix} R_{\hat{s}_i}(n+1) \\ R_{\hat{s}_i}(n+2) \\ R_{\hat{s}_i}(n+3) \end{bmatrix} = \begin{bmatrix} R_{\hat{s}_i}(n+0) & R_{\hat{s}_i}(n+1) & R_{\hat{s}_i}(n+2) \\ R_{\hat{s}_i}(n-1) & R_{\hat{s}_i}(n+0) & R_{\hat{s}_i}(n+1) \\ R_{\hat{s}_i}(n-2) & R_{\hat{s}_i}(n-1) & R_{\hat{s}_i}(n+0) \end{bmatrix} \cdot \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix} \tag{6}$$

$$q(n+1) = x_i(n+1) - \hat{s}_i(n+1) \tag{14}$$

$$q(n+1) = s_i(n+1) + \eta(n+1) - \hat{s}_i(n+1) \tag{15}$$

$$q(n+1) = \left[ s_i(n+1) - \hat{s}_i(n+1) \right] + \eta(n+1) \tag{16}$$

$$q(n+1) = \varepsilon + \eta(n+1) \tag{17}$$

B. Adaptive digital anomaly predictor using multiple features

We now extend the single ADAP functionality, described above, to incorporate information from multiple features. It is important to consider a diverse feature set in order to detect signatures of complex anomalies. Malicious activities are specifically designed to spread out their effect over multiple features in order to avoid detection by narrowly focused detectors. A cyber-attack, for instance, may try to reduce its visibility by using a common, well-known port, causing only a slight anomalous increase and no alarm. However, even the slightest increase in a single feature, when combined with anomalous activity in other features, can have a synergistic effect which makes the prediction of that anomaly possible. By using a multiple feature space, this research is able to tap the additive power of the rich network traffic feature set.

As a way of dealing with complex anomalies without losing generality, we use the feature set $X = [x_1, x_2, x_3, x_4]^T$, where $x_1$ is 'average port', $x_2$ is 'high ports', $x_3$ is 'server ports', and $x_4$ is 'peered ports', as a representative experiment. Since these individual characteristics may be correlated, this feature space can lead to a wrong conclusion unless the underlying features are first decorrelated and the resultant feature space has a Euclidean metric. This is achieved using well-known transformation methods such as the Karhunen-Loeve transform (PCA or discrete Hotelling transform) [13]. Let $C_{XX}$ be the cross-correlation matrix for $X$, given by

$$C_{XX} = E\left\{ X \cdot X^T \right\} = \begin{bmatrix} C_{x_1 x_1} & C_{x_1 x_2} & C_{x_1 x_3} & C_{x_1 x_4} \\ C_{x_2 x_1} & C_{x_2 x_2} & C_{x_2 x_3} & C_{x_2 x_4} \\ C_{x_3 x_1} & C_{x_3 x_2} & C_{x_3 x_3} & C_{x_3 x_4} \\ C_{x_4 x_1} & C_{x_4 x_2} & C_{x_4 x_3} & C_{x_4 x_4} \end{bmatrix} \tag{18}$$

where the diagonal elements are the auto-correlation functions of the features and the off-diagonal elements are the cross-correlations of the respective feature pairs. Notice that $C_{XX}$ is a symmetric square matrix, which leads to a 4th order characteristic equation and 4 respective eigenvectors. By normalizing these eigenvectors we generate a 4-dimensional space in which the selected features are uncorrelated. Hence, the ADAP function can filter features individually by adaptively computing the mean and variances as the predictor output starts to deviate significantly from actual measured values. Here we refer to the uncorrelated $x_i$'s as $y_i$, and the probability density function (pdf) of the uncorrelated version of $X$, represented by $Y$, is given by

$$p_Y = p_{y_1} \cdot p_{y_2} \cdot p_{y_3} \cdot p_{y_4} \tag{19}$$
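The decorrelation step can be sketched as follows: estimate $C_{XX}$ from a set of feature vectors, find its orthonormal eigenvectors, and project each vector onto them. The paper does not say how the eigenvectors are computed. The cyclic Jacobi rotation used here is one standard choice for small symmetric matrices, picked to keep the example dependency-free, and all function names are hypothetical.

```python
import math


def correlation_matrix(samples):
    """Estimate C_XX = E{X X^T} of equation (18) from a list of feature vectors."""
    p, N = len(samples[0]), len(samples)
    return [[sum(x[i] * x[j] for x in samples) / N for j in range(p)] for i in range(p)]


def jacobi_eigen(C, sweeps=50, tol=1e-20):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V); the columns of V are orthonormal eigenvectors.
    """
    n = len(C)
    a = [row[:] for row in C]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * sum(a[i][i] ** 2 for i in range(n)):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # columns p, q: A <- A J
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # rows p, q: A <- J^T A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    return [a[i][i] for i in range(n)], v


def decorrelate(samples, V):
    """Project each feature vector onto the eigenvectors: y = V^T x."""
    p = len(V)
    return [[sum(V[k][j] * x[k] for k in range(p)) for j in range(p)] for x in samples]
```

After `Y = decorrelate(samples, V)`, the correlation matrix of `Y` is diagonal up to rounding, with the eigenvalues on its diagonal. This is the property the text relies on when it treats the $y_i$ separately.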

By empirical data analysis, we have verified that each feature in $X$ is normally distributed. Hence, we write

$$p_Y = \prod_{i=1}^{4} \frac{1}{\sqrt{2\pi}\,\sigma_i} \cdot e^{-\frac{(y_i - m_i)^2}{2\sigma_i^2}} \tag{20}$$

where $m_i$ and $\sigma_i^2$ are the mean and variance of $y_i$. For ease of implementation, we only consider the auto-correlations, i.e., the diagonal elements of the correlation matrix of (18). As a result, the anomaly prediction decision is made in the $[R_{x_1 x_1}, R_{x_2 x_2}, R_{x_3 x_3}, R_{x_4 x_4}]$ space using a weighted majority-voting scheme as shown in Figure 2. The ADAP function processes each single feature in parallel and provides a prediction output. We present the overall result, $A(n+1)$, as a linear combination of the individual channels with weights corresponding to the maximum value of each feature's auto-correlation function. Thus

$$A(n+1) = q_1(n+1) \cdot R_{x_1}(0) + q_2(n+1) \cdot R_{x_2}(0) + q_3(n+1) \cdot R_{x_3}(0) + q_4(n+1) \cdot R_{x_4}(0) \tag{21}$$

If $A(n+1)$ exceeds a predetermined empirical anomaly threshold, then the activity represented by the feature set $X$ is deemed an anomaly. The system can then be directed to respond at the time instant $n+1$.
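The weighted combination of equation (21) and the threshold test can be sketched as below. The function names and the way the zero-lag auto-correlation is estimated from a window are assumptions of this example. The threshold itself is empirical in the paper, and no value is given.

```python
def zero_lag_autocorr(window):
    """R_x(0): mean squared value of one feature over the correlation window."""
    return sum(v * v for v in window) / len(window)


def combined_score(q, r0):
    """Equation (21): A(n+1) = sum over channels of q_i(n+1) * R_xi(0)."""
    return sum(q_i * r_i for q_i, r_i in zip(q, r0))


def is_anomaly(q, r0, threshold):
    """Deem the feature set X anomalous when A(n+1) exceeds the threshold."""
    return combined_score(q, r0) > threshold
```

Here `q` would hold the four ADAP outputs $q_1(n+1), \ldots, q_4(n+1)$ and `r0` the matching weights $R_{x_1}(0), \ldots, R_{x_4}(0)$.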

Figure 2. Detecting network anomalies with ADAP and weighted majority voting.

The anomaly detector’s effectiveness is driven by a set of parameters chosen by the researchers. By fine-tuning the set of features, the Wiener filter and auto-correlation window sizes and the majority voting thresholds, we intend to find an optimal feature space in which characteristics are uncorrelated and yet still contain all the information associated with network traffic and attacks.

C. Normal density approximation for network traffic

There has been considerable research in the statistical analysis of network traffic data. However, previous work has assumed the data to be normally distributed without supporting this assumption [10]. In our observations, the Gaussian nature of the traffic is indicated by the bell-shaped frequency histogram of the features used in this study. Additionally, we compare the periodogram graph with graphs of generated Gaussian data with a matching mean and variance. The periodogram is also an indicator of correlation [17,18]. In addition to visual verification, the mean square error (MSE) between the measured and generated Gaussian-shaped density is computed using

$$\%\mathrm{MSE} = \frac{\left( \mathrm{data}(i) - \mathrm{norm}(i) \right)^2}{\left( \mathrm{mean}(\mathrm{data}) \right)^2} \tag{22}$$

where data(i) denotes the normalized histogram of the feature $x_i$, norm(i) is the value of the respective normal density, and mean(data) is the mean value of the histograms of all the features used. We have experimentally shown that the random Gaussian nature of these features does not adversely affect their ability to discriminate network anomalies. In fact, the feature that shows the lowest MSE in Table I is "average port", which turns out to be the most discerning data feature. The average port is an average of all the port numbers seen on the network and should not change drastically under normal conditions. On the other hand, "peer factor" is the feature with the highest MSE in terms of Gaussian density. This is a measurement of connections to the same port on both the source and destination side, which is very unlikely to happen at random and therefore indicates a specific a priori agreement between the two peer computers. The probability of two computers both picking one particular port at random is $1/(65535)^2 \approx 2.33 \times 10^{-10}$.
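One possible reading of equation (22) for a single feature is sketched below. The paper does not spell out the details, so several choices here are assumptions: the squared differences are averaged over histogram bins, the normal density is evaluated at bin centres using the sample mean and standard deviation, and the normalization uses the mean of that feature's own histogram. The input is assumed to contain at least two distinct values.

```python
import math


def normal_pdf(v, mean, std):
    """Gaussian density with the given mean and standard deviation."""
    return math.exp(-((v - mean) ** 2) / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)


def percent_mse(values, bins=20):
    """Sketch of equation (22): histogram-versus-normal error for one feature."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    # Normalized histogram: bar heights integrate to 1, like a density.
    data = [c / (n * width) for c in counts]
    norm = [normal_pdf(lo + (i + 0.5) * width, mean, std) for i in range(bins)]
    mean_data = sum(data) / bins
    return sum((d - g) ** 2 for d, g in zip(data, norm)) / bins / mean_data ** 2
```

Under these assumptions a bell-shaped sample scores close to zero, while a sample split into two separate clusters scores much higher. That is the ordering Table I relies on.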

TABLE I. MSE SORTED FEATURE SET USED FOR NORMAL DENSITY APPROXIMATION AND ANOMALY DETECTION.

| % MSE | Measured Feature | Description |
|---|---|---|
| 0.014403 | Average port | Average port number as indicator of usage |
| 0.055449 | High-ports | Percentage of port numbers > 10000 |
| 0.105316 | Total ports | Number of ports seen |
| 0.105888 | Flow records | Count of flow records |
| 0.119697 | Total bits | Bits per second load on network |
| 0.137958 | Destination bits | Destination bits per second load |
| 0.148073 | Source bits | Source bits per second load |
| 0.183783 | Packets per second | Total packets per second |
| 0.194217 | Destination packets | Destination packets per second |
| 0.229815 | Server factor | Measure of typical port usage |
| 0.257600 | Mid-range ports | Percentage of ports > 1024 and < 10000 |
| 0.284241 | Low-range ports | Percentage of ports < 1025 |
| 0.301142 | Source packets | Source packets per second |
| 2.109031 | Total bytes | Total bytes per second |
| 2.244572 | Destination bytes | Destination bytes per second |
| 4.569049 | Source bytes | Source bytes per second |
| 39.264246 | Peer factor | Measure of same port usage |

III. RESULTS AND DISCUSSION

A. Experimental Test Bed

Raw network flow data is captured at the Internet border of Ohio University and analyzed using the network monitoring tool Argus [7], which produces a stream of network flow connection records. A connection is loosely defined as a bi-directional series of packets identified by a protocol type, source IP, source port, destination IP, and destination port. In order to process the data as a time series, we collect all the records for the same time period into a single record representing one second in time. Cumulative statistics are gathered using a C++ program, and the data are analyzed with MATLAB functions based on equations 9.44 - 9.46 in reference [16]. Multiple experiments are run on various feature sets and parameter settings in order to identify the system configuration with the most discriminating power.
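The per-second aggregation described above can be illustrated with a short Python sketch. The original statistics were gathered by a C++ program that is not shown in the paper. The record layout `(timestamp_seconds, src_port, dst_port)` and the function name are hypothetical, and only two of the Table I features (average port and high-ports percentage) are computed.

```python
from collections import defaultdict


def per_second_features(flows, high_port=10000):
    """Aggregate flow records into one feature record per second.

    flows : iterable of (timestamp_seconds, src_port, dst_port) tuples; this
            layout is a stand-in, not the Argus record format.
    """
    ports_by_second = defaultdict(list)
    for ts, sport, dport in flows:
        ports_by_second[int(ts)].extend((sport, dport))

    series = {}
    for second, ports in sorted(ports_by_second.items()):
        series[second] = {
            # Average of all port numbers seen during this second.
            "average_port": sum(ports) / len(ports),
            # Percentage of port numbers above the high-port boundary.
            "high_ports": 100.0 * sum(p > high_port for p in ports) / len(ports),
        }
    return series
```

Each value of the returned dictionary corresponds to one sample $x_i(n)$ of the respective feature, so the sequence of records forms the time series that the ADAP stage consumes.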

B. Results

This section describes the results of the normal density approximation study and of the prediction algorithm. Figure 3 shows the periodogram, correlation results, and histogram of a selected feature set. The similarity between the measured feature values and generated normal Gaussian data is clearly visible for all features except the attribute $x_4$. The significant variation in $P_{x_4}$ may be caused by the highly stochastic nature of the traffic.

Figure 3. Periodograms ($P_{x_1}, P_{x_2}, P_{x_3}, P_{x_4}$), auto-correlations ($R_{x_1}, R_{x_2}, R_{x_3}, R_{x_4}$), and histograms for the measured feature set and the normal PDF approximation. Panel groups: Average Port, High Ports, Server Factor, Peered Ports; each group shows periodogram (A, B), auto-correlation (C, D), and histogram (E, F) panels.

Figure 4 includes the plots of the auto-correlation and cross-correlation functions for the selected feature set. While (x1,x2) and (x1,x3) possess high correlation, the remaining pairs do not.

Figure 4. Cross-correlation plots ($C_{x_i x_j}$, $i, j = 1, \ldots, 4$) for features x1, x2, x3, and x4 selected in this particular implementation.

Figure 5 depicts 100 seconds of collected real-time data and illustrates the prediction results at different stages in the ADAP algorithm. In subplots (A) and (B), a dashed line shows the boundary of the Wiener and auto-correlation windows, respectively. Part (C) represents the ARMA predictor signal $\hat{s}_i(n+1)$, and (D) shows the difference between measured values and ARMA output. The solid vertical line indicates a predicted anomaly at time $n+1$ (263 seconds). Notice that the peak in predictor measurements (E) and (F) occurs just before the maximum value of the actual anomaly in (A). This supports our conclusion that the algorithm predicts network anomalies.

In Figure 6 we demonstrate the effect of varying parameters such as window size (shaded area). The solid vertical lines indicate the locations in time of predicted anomalies. Part (A) corresponds to smaller windows, while part (B) represents a larger window size. By changing parameters, the system predicts anomalies of various magnitudes. This robust performance is due to the fact that the ADAP unit has a feedback control, which allows it to adjust to the changing signal waveform.

Figure 5. Snapshots of (A) measured signal $x_i(n)$, (B) Wiener output $\hat{s}_i(n)$, (C) ARMA prediction $\hat{s}_i(n+1)$, (D) difference between Wiener input and output $x_i(n) - \hat{s}_i(n)$, (E) difference between measured signal and ARMA output $x_i(n) - \hat{s}_i(n+1)$, and (F) difference between Wiener and ARMA outputs $\hat{s}_i(n) - \hat{s}_i(n+1)$.

Figure 6. Experimental results of a sample run on x1, x2, x3, and x4, respectively, showing the effect of changing parameters (i.e., window sizes, thresholds, and weights).

IV. CONCLUSIONS

This research presents a method of anomaly detection based on Wiener filtering of noise and ARMA modeling of network flow data. Noise and traffic signal statistics are dynamically calculated using network-monitoring metrics for traffic features such as average port, high port, server ports, and peered ports. The underlying approach has been tested on near-real-time Internet traffic in the wide-area network (WAN) of Ohio University. Port usage has been determined to be the most useful measure in the estimation process. Other port measurements are used to confirm the anomaly prediction as part of the majority-voting scheme. Experiments reveal that our method is highly effective and robust in predicting anomalies in network traffic flow. The proposed system enables a network engineer to develop a defense mechanism that filters hazards and other cyber attacks. In this particular study, broader features are targeted rather than more specific ones because they are independent of specific IP addresses or ranges. In turn, this allows us to predict an anomaly without a priori knowledge of any specific activity. Future work includes the use of an inter-feature cross-correlation matrix, expanding the algorithm to handle more attributes, adding further prediction mechanisms, and/or exploiting the optimal feature space in which a cyber attack can be estimated more easily than in the original feature space.

REFERENCES

[1] R. Kwitt and U. Hofmann, "Unsupervised anomaly detection in network traffic by means of robust PCA," Computing in the Global Information Technology, ICCGI 2007, March 2007, pp. 37-37.

[2] G. Shen, et al. “Anomaly detection based on aggregated network behavior metrics,” Proc. of Networks and Mobile Computing, Shanghai, China, Sept. 21-25, 2007, pp. 2210-2213.

[3] A. Feldman, et al., “Data networks as cascades: Investigating the multifractal nature of Internet WAN traffic,” Proc. of ACM/SIGCOMM 98, vol. 28, pp. 42–55, 1998.

[4] W. Yurcik and Y. Li, "Internet security visualization case study: Instrumenting a network for NetFlow security visualization tools, " in Proc. of ACSAC 05, 2005.

[5] D. Plonka, “A network traffic flow reporting and visualization tool,” in Proc. of the 14th USENIX Conference on System Administration, New Orleans, 2000, pp. 305-318.

[6] K. Cho, et al., “An aggregation technique for traffic monitoring,” in Proc. of SAINT, 2002, pp. 74-81.

[7] C. Bullard, “Argus record format,” June 2005 <http://www.qosient.com/argus/argus.5.htm/>

[8] Y. Gong, “Detecting worms and abnormal activities with NetFlows,” August 2004; <http://www.securityfocus.com/infocus/1796>

[9] R. Pang, et al., “Characteristics of internet background radiation,” in Proc. IMC’04, Oct. 25-27, 2004, Taormina, Sicily, Italy.

[10] S. Jin, et al., “Network intrusion detection in covariance feature space,” Pattern Recognition 40 (2007), 2185-2197.

[11] A. Karasaridis, et al., “Wide-scale botnet detection and characterization,” Proc. of the first conference on First Workshop on Hot Topics in Understanding Botnet, Cambridge, MA, 2007, pp. 7 – 7.

[12] A. Sang and S. Li, “A predictability analysis of network traffic,” in Proc. of INFOCOM 2000, vol. 1, pp. 342-351.

[13] S. Theodoridis and K. Koutroumbas, Pattern Recognition, 3rd ed., Academic Press, 2006.

[14] B. Porat, Digital Processing of Random Signals Theory & Methods, Prentice Hall, 1994.

[15] P. Scalart and J.V. Filho, “Speech enhancement based on a priori signal to noise estimation,” Proc. 1996 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP-96), Vol. 2, 7-10 May, 1996, pp. 629–632.

[16] J. S. Lim, Two-Dimensional Signal and Image Processing, Prentice Hall, 1990.

[17] W. L. Crum, "Tests for Serial Correlation in Regression Analysis Based on the Periodogram of Least-Squares Residuals," Journal of the American Statistical Association, Vol. 18, No. 143. (Sep., 1923), pp. 889-899.

[18] J. Durbin, “The Resemblance Between the Ordinate of the Periodogram and the Correlation Coefficient,” Biometrika, Vol. 56, No. 1. (Mar., 1969), pp. 1-15.
