Effective Media Traffic Classification Using Deep Learningsecurity.riit.tsinghua.edu.cn/share/QLV-ICCDA19-traffic classification.… · Eective Media Traic Classification Using Deep

E�ective Media Tra�ic Classification Using Deep LearningQing Lyu

Tsinghua UniversityBeijing

[email protected]

Xingjian LuBeijing University of Posts and Telecommunications

[email protected]

ABSTRACTTra�c classi�cation (TC) is very important as it can provide usefulinformation which can be used in the �exible management of thenetwork. However, TC has become more and more complicatedbecause of the emergence of various network applications andtechniques. In this paper, we apply deep learning based methodto the classi�cation of four di�erent kinds of media tra�c, i.e., au-dio, picture, text and video. We collect tra�c data from the realnetwork environment. Multilayer Perceptron (MLP) and Convolu-tional Neural Network (CNN) based tra�c classi�cation methodare designed to accurately classify the target tra�c into di�erentcategories. We found that MLP has very good performance in mostscenarios. Moreover, speci�c architecture can reduce the trainingtime of the neural network in the classi�cation.

CCS CONCEPTS•Networks→Network algorithms; Network services; • Theoryof computation → Design and analysis of algorithms;

KEYWORDSTra�c Classi�cation; Deep Learning

ACM Reference Format:Qing Lyu and Xingjian Lu. 2019. E�ective Media Tra�c Classi�cation UsingDeep Learning. In Proceedings of International Conference on Compute andData Analysis (ICCDA’ 2019). ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTIONThe internet has become more and more complex due to the vari-ous appearing new applications and the development of techniquessuch as Software-De�ned Networking (SDN) [10, 27] and NetworkFunction Virtualization (NFV) [11]. Under the situation that nu-merous and complicated tra�c are often protected by the moderninternet techniques, a crucial task is to correctly classify them intodi�erent categories, which is known as tra�c classi�cation (TC).Accurate classi�cation of internet tra�c can bring the InternetService Provider (ISP) with more �exible arrangements of internetservice such as pricing and bandwidth allocation. Operators canalso learn from the classi�cation results and promptly adjust themanagement of internet devices or the forwarding rules to make

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor pro�t or commercial advantage and that copies bear this notice and the full citationon the �rst page. To copy otherwise, or republish, to post on servers or to redistributeto lists, requires prior speci�c permission and /or a fee.ICCDA’ 2019, March 14-17, 2019, Kahului, Hawaii, USA© 2019 Copyright held by the owner/author(s).ACM ISBN 978-1-4503-6634-2.https://doi.org/10.1145/nnnnnnn.nnnnnnn

the internet more e�cient [3, 23, 44]. The key problem is to provideprecisely tra�c classi�cation.

TC needs to take care of ever-changing tra�c in the internet andmake sure that the classi�cation results can meet the requirementsof network users. For example, �ow priorities of the incoming �owsand the bandwidth allocation [19] should be suitably arranged ac-cording to the classi�cation results. How precisely and can theproblem of TC be solved is responsible for the e�ciency of band-width management [16, 19, 22, 23, 40], On the other hand, di�erentkinds of tra�c from di�erent customers should be properly man-aged to meet the customers’ Quality of Experience (QoE) Therefore,it is of signi�cant importance to improve the performance of TC.

A lot of work has been proceeded by researchers under thetopic of TC. Some of them focus on the classi�cation of internetapplications such as whether it is HTTP tra�c or FTP tra�c [4, 25,28, 29, 36]. The research emphasis is put on the identi�cation ofdi�erent protocols. Some priori knowledge is often required there.Many other researchers are interested in the anomaly detection[31] and malware detection [33]. These works often play a vitalrole in the Intrusion Detection System (IDS) which is designed toprotect the internet from various malicious attacks. To protect theinternet users from network security problems automatically, lesshuman intervention is desired. Some others devote themselves tothe study of the classi�cation of encrypted tra�c [24].

In this paper, we consider the tra�c classi�cation problem fromanother perspective where we do not classify the tra�c with respectto the protocols, but try to study the media types of the tra�c.Generally, we are interested in �nding out whether the currenttra�c is a video tra�c or a audio tra�c or whether it is a picturetra�c or a text tra�c. Because di�erent media type of tra�c requiredi�erent amount of network bandwidth and transmission latency.For example, video streaming tra�c is often huge and need a greatdeal of bandwidth, while audio tra�c cannot tolerate large latencybecause some audio tra�c can be VoIP tra�c which needs to betransmitted in real time. The classi�cation of tra�c types ratherthan speci�c protocols can bene�t the network management in amore �exible level and ISP can provide di�erent kinds of serviceaccording to the tra�c’s bandwidth demands and latency demands.Therefore, the accurate classi�cation of the media types of tra�c isextremely important. We collect the tra�c dataset which containsthe four classes of labeled tra�c from the real network environment.The dataset includemore than 10000 records of �ows that are almostevenly distributed among the four classes of tra�c. In considerationof the de�ciency of the traditional tra�c classi�cation method,deep learning method is adopted to manage the classi�cation ofthe four kinds of tra�c [5, 28]. As deep learning has achieved greatsuccesses in many classi�cation problems [2, 39], we apply deeplearning to the problem of media tra�c classi�cation. Speci�cally,two di�erent deep learning modules, Multilayer Perceptron (MLP)

https://doi.org/10.1145/nnnnnnn.nnnnnnn



ICCDA’ 2019, March 14-17, 2019, Kahului, Hawaii, USA Qing Lyu and Xingjian Lu

[12] and Convolutional Neural Network (CNN) [20] is used in ourtra�c classi�cation system. Both of which apply back propagation[34] to optimize the model parameters so that the practical outputcan approximate the ideal value as much as possible. A combinationof packet level features and �ow level features are employed toimprove the learning performance. The performance of the twodeep learning methods in tra�c classi�cation are compared in theevaluation. Evaluation results show that a very high classi�cationaccuracy can be achieved by using the deep learning method withthe combined features. The precision and recall are also satisfactory.We �nd that MLP is very good at the classi�cation of picture, textand video tra�c while CNN has a preference for audio tra�c. Therelationship between the training time of the neural network fortra�c classi�cation and the neural network’s architecture is alsorevealed at the end of the paper.

2 RELATEDWORKTra�c classi�cation has been a hot topic since a long time ago.There has been a plenty of work on the problem. Previous work canbeen generally divided into three di�erent classes, i.e., port basedmethod [5, 17, 26, 28], payload based method [8, 9, 13, 18, 46] andstatistic or machine learning based method [1, 24, 30, 32, 37, 38, 41–43, 45].

Port based method is e�ective in the early time because theport number is assigned by Internet Assigned Numbers Authority(IANA) [14] and the number is corresponding to packet headerinformation, people can classify internet tra�c by the assignedport number from the packet header. This method can achievevery high classi�cation accuracy. However, with the increase ofinternet complexity, port number is no longer invariable and is oftenassigned randomly. Moreover, the port number is even disguised toavoid the attacks from internet in many cases. Port number basedmethod cannot provide e�ective tra�c classi�cation nowadays[28].

Payload based method is also known as Deep Inspection (DI).It compares the payload of the internet tra�c to a certain patternwhich is corresponding to a certain kind of tra�c [9]. If payloadmatches pattern, then the tra�c can be classi�ed to the certainclass. This method can obtain quite high classi�cation accuracyfor the unencrypted tra�c. Nonetheless, more and more internettra�c tend to be encrypted for the consideration of con�dential-ity. Payload based method cannot work for the classi�cation ofencrypted tra�c. Moreover, some internet tra�c has the choiceof protocol encapsulation, which also leads to the ine�ectivenessof payload based tra�c classi�cation. On the other side, payloadbased method needs to look into the packet content which alsointroduces the problem of privacy. The computation complexityis also very high since the number of tra�c patterns is very large,therefore, it cannot handle the high speed tra�c and large number�ows [5].

Another widely use method for tra�c classi�cation is machinelearning, or statistics based method. Machine learning has provenits e�ectiveness in many areas such as data mining, computer vi-sion, bioinformatics and pattern recognition [2, 7, 21, 39]. A numberof researchers bring machine learning to the classi�cation of inter-net tra�c [24, 32]. Machine learning generally includes two stages:

training stage and test stage. It �rst extracts useful features from thesource data and feed these features to the learning module in thetraining stage. After su�cient times of training, a mature learningmodule is formulated. In the testing stage, input the testing data tothe trained module and one can get the classi�cation result. A pros-perous research direction here is to multiple layer arti�cial neuralnetwork as the learning module, this method is widely known asdeep learning.

[32] conducts the tra�c classi�cation over SDN using machinelearning and try to provide internet operators with useful forecastsand monitoring. [45] designs a QoS (Quality of Service) aware ar-chitecture to classify the tra�c in SDN, the proposed architecturecombines machine learning and DI. DI is used to maintain a dy-namic tra�c database and periodically re-training can be done atthe learning stage so that the authors can obtain a good classi�-cation accuracy. [24] presents deep packet to identify encryptedtra�c and it can classify internet tra�c to major classes as wellas end-user applications. In addition, deep packet can distinguishVPN and non-VPN tra�c. [1] introduces Support Vector Machine(SVM) and Naive Bayesian method to classify the network tra�c.Statistical features and the information of �ow correlation are usedto improve the performance of tra�c classi�cation. [30] uses thesub-�ows information to train the machine learning model, thesub-�ows are all extracted from the original full �ows. In this way,the authors are not possible to miss packets from the start of �ows.The authors apply Decision Tree and Naive Bayes method to clas-sify the IP Tra�c. Besides, this method do not need to know thedirection of �ow. [42] adopts Stacked AutoâĂŘEncoder (SAE) todeal with feature learning and feature selection automatically, SAEcan manage unlabeled data in the training thus is very popularin the unsupervised learning. The authors perform this methodon 25 common protocols and achieve good classi�cation results.The paper also considers the identi�cation of unknown protocols.Above all the works, traditional machine learning method suchas SVM or Bayesian method are easy to implement and widelyused in the classi�cation of di�erent protocols. On the other side,deep learning method like CNN or SAE adopt the arti�cial neuralnetwork as the learning module and achieve good performance inthe classi�cation of speci�c kind of tra�c [24].

3 METHODOLOGYIn recent years, researchers has witnessed the the strong abilityof deep learning in many areas. Plenty of excellent work has beenproduced with the help of deep learning [2, 39]. Indicated by thesuccess of deep learning, in this work, we apply the deep learningmethod to the classi�cation of four di�erent kinds of tra�c, i.e.,audio tra�c, picture tra�c, text tra�c, video tra�c. We design atra�c classi�cation system that uses both the packet level and �owlevel features to improve the classi�cation performance. Firstly, wecollect the four kinds of tra�c data from the real network environ-ment. Then, we excavate useful features in the raw data, we note itas the preprocessing of the dataset. Thirdly, after the preprocessing,the extracted features are fed to train the learning module. Twodi�erent deep learning modules are adopted, MLP and CNN. We donot apply traditional machine learning method such as SVM herebecause it has di�culties in dealing with high dimension data and

E�ective Media Tra�ic Classification Using Deep Learning ICCDA’ 2019, March 14-17, 2019, Kahului, Hawaii, USA

Table 1: Packet Level Features.

Features Descriptionminfps Minimum forward packet sizeminbps Minimum backward packet sizemaxfps Maximum forward packet sizemaxbps Maximum backward packet sizemeanfps Mean forward packet sizemeanbps Mean backward packet sizemedianfps Median forward packet sizemedianbps Median backward packet sizestdfps Standard deviation of forward packet sizestdbps Standard deviation of backward packet sizeminfpt Minimum forward packet inter-arriving timeminbpt Minimum backward packet inter arriving timemaxfpt Maximum forward packet inter arriving timemaxbpt Maximum backward packet inter arriving timemeanfpt Mean forward packet inter arriving timemeanbpt Mean backward packet inter arriving timemedianfpt Median forward packet inter arriving timemedianbpt Median backward packet inter arriving timestdfpt Std deviation of forward inter arriving timestdbpt Std deviation of backward inter arriving timefprt forward packet arriving timebprt backward packet arriving timefpft forward packet �nishing timebpft backward packet �nishing time

often turns out poor performance [1]. Fourthly, the trained learningmodules are used to classify the tra�c to di�erent classes.

3.1 Data CollectionWe collect the four kinds tra�c data from the real network envi-ronment. 5-tuple (source IP, destination IP, source port, destinationport, protocol) information is used to specify a tra�c �ow. As theconcerned �ows have two directions, we separate the �ow with theforward �ow and backward �ow from each other. The forward andbackward �ow make a pair tra�c. The two-directional �ows areboth used to extract useful features.

As described in the formal section, we obtain the raw tra�c fromthe real network environment and use the re�ned data to trainand test the designed neural network based tra�c classi�cationsystem. As the extracted features are various in very large scaleby the numerical measurement, we need to preprocess the featuredata before feeding it to the neural network. We deal with thefeature data under normalization. Normalization is applied here tomake sure that the feature data be processed under the same scale.Max-Min normalization [15] is adopted in our method. Then thenormalized data is used to train and test the neural network.

Once the normalization of the data is completed, we separate thewhole data into two sets, i.e., the training set and the testing set. Tomake the classi�cation results more convincing, we apply 10-foldcross validation to deal with the dataset. Speci�cally, the datasetis partitioned into 10 parts. Then 9 parts of the data are used fortraining and the last 1 part is used for testing, in this way we can

Table 2: Flow Level Features.

Features Descriptionnumpf Number of packets in forward �ownumpb Number of packets in backward �ownumbf Number of bytes in forward �ownumbb Number of bytes in backward �ownumppsf Number of packets per second in forward �ownumppsb Number of packets per second in backward �ownumbpsf Number of bytes per second in forward �ownumbpsb Number of bytes per second in backward �owsrcip Source IP addressdstip Destination IP addresssrcport Source portdstport Destination portprot Protocols

obtain a classi�cation result. Repeat this progress until every part ofthe dataset has been used for testing. Lastly, we compute the averagethe 10 times’ testing results and obtain the �nal classi�cation result.

3.2 Feature SelectionAs the collected tra�c raw data cannot be directly fed into the deeplearning modules like images, we need to extract useful featuresfrom the raw data.

Unlike other works that target on tra�c classi�cation use onlypacket level features such as packet number, packet length, we adoptboth the packet level features as well as the �ow level features toimprove the classi�cation performance. Flow level features such as�ow size can describe the tra�c more directly in the classi�cationof whether the tra�c is video or picture. We have also observedthat the tra�c classi�cation accuracy increased largely since weadd the �ow (sub-�ow) level features.

Since we have obtained the �ows of two directions (forward andbackward) in the data preprocessing stage, we can extract the �ow(sub-�ow) features in both the two directions. For example, in thepacket level, for the forward direction �ows, we calculate the sizeof the packets as an important feature because it can re�ect thetra�c type to a certain degree. We also calculate the same featuresfor the backward �ows. Besides the packet size, we �nd that packetarriving time and packet �nishing time are also important featuresin the classi�cation. For di�erent types of tra�c, the packet arrivingtime and packet �nishing time could be various, and both of whichare counted in the forward direction and the backward direction.Apart from the packet arriving time and �nishing time, we alsoemploy the packet inter arriving time as the classi�cation feature.Packet inter arriving timemeans the time gap between two adjacentpackets. For di�erent kinds of tra�c, the time gap between packetcould be quite di�erent, which makes the packet inter arriving timeplays an important role in the tra�c classi�cation of our design.The packet inter arriving time is also calculated in both the forwarddirection and backward direction.

On the other hand, at the �ow level, we have also found out anumber of important features in the classi�cation. For example, thenumber of packets in a �ow. As we have observed that the number


Figure 1: The architecture inside a neuron.

of packets is diverse in a certain �ow, therefore, the number ofpackets is chosen as an important feature. Moreover, we also noticethat the number of bytes of a �ow also improves the classi�cationaccuracy as a feature. Furthermore, the number of packets persecond and number of bytes per second are also used as the featuresin our design. The number of packets and number of bytes are bothcalculated in the forward �ow direction and the backward �owdirection, and so are the packets per second and number of bytesper second.

For the features such as packet size and packet inter arrivingtime in the packet level, we obtain the minimum, maximum, mean,median, standard deviation value of it. For instance, the minimumpacket inter arriving time and maximum packet inter arriving timeas well as the mean packet inter arriving time and median packetinter arriving time, and the standard deviation of the packet interarriving time are calculated as the classi�cation features. We alsoadopt the 5-tuple information as the input features in the �ow level.We summarize the packet level features and �ow level features usedin the classi�cation in Table 1 and Table 2 respectively.

3.3 Learning ModulesSince MLP and CNN have shown great classi�cation performancein many other areas [2, 39], we develop two tra�c classi�cationmethods that based on MLP and CNN respectively by using boththe packet level features and the �ow level features [12, 20]. MLP isa feedforward network that made up of multiple layers of neurons.Generally, MLP has one input layer, one output layer and at leastone hidden layer or middle layer. Every neuron in one layer is fullyconnected to the next layer’s neurons. The architecture inside aneuron is shown in Figure 1. Assume that the values of the currentlayer’s neurons are denoted as x = (x1, . . . ,xn )T , the weightsvector and o�sets vector between the current layer and the nextlayer’s ith neuron arewi = (wi1, . . . ,win ) andbi respectively. Then

Figure 2: CNN based tra�c classi�cation module.

the next layer’s ith neuron’s value can be represented ashi (x) = f (wix + bi ). (1)

Where f () is the activation function, it could be Logistic Function� (x) = 1

1+exp(�x ) , or Tanh Function tanh(x) = exp(x )�exp(�x )exp(x )+exp(�x ) , or

other activation functions such as Recti�ed Linear Unit (ReLU)ReLU (x) = max(0,x).

At the output layer we compare the predicted value of the neuralnetwork and the practical value, and use the Cross Entropy [6] as thecost function. Back propagation [34] is applied to train the neuralnetwork and obtain the optimal parameters that minimize the costfunction.

Another learning module we apply is CNN. CNN is a kind neuralnetwork that contains the layer of convolutions. Suppose the valueof current layer is denoted as X i , the kernel of the convolutionallayer is denoted asW i , the activation function is ReLU, then outputvalue of the convolutional layer can be represented as

Y i = ReLU (X i ⇤W i ). (2)

Behind the convolutional layer, a pooling layer is there to enhancethe neural network. Pooling is important to increase the stability ofthe neural network in case that a tiny change of the input may causea signi�cant di�erence in the output. The introduce of pooling canalso reduce the computation complexity. Moreover, pooling canavoid over�tting in the learning progress. Here, we use max poolingwhich can be represented as

maxpoolin�

✓x1 x2x3 x4

◆= max(x1,x2,x3,x4). (3)

The framework of CNN for tra�c classi�cation is shown inFigure 2 in our design. After the two layers of convolution and twolayers of pooling, a full connection layer is "fully connected" to theoutput layer. Like MLP, Cross Entropy is chosen as the cost function,also back propagation is applied to train the neural network so thatthe value of cost function is minimized.

3.4 System DesignThe designed tra�c classi�cation system collects tra�c data fromthe real network environment and gives out the classi�cation resultsat the output end. Then, meaningful management of the networkcould be done according to the tra�c classi�cation results, which


is beyond the scope of this paper and will be left for future work. Itshould be paid attention that, after the collection of the networktra�c, the raw data cannot be directly fed into the neural networkfor classi�cation because the tra�c data is not suitable to be formedin a matrix format. Therefore, the raw tra�c data need to be propre-cessed in advance. We re�ne the �ow information which is speci�edby the 5-tuple. Then useful features as described in Subsection 3.2are extracted from the �ow information. We train the neural net-work with the extracted features and obtain the tra�c classi�cationresults by feeding the trained neural network with the testing data.

4 EVALUATIONSIn this section, we present the experiments’ results in detail. Inorder to measure the classi�cation performance, we need to employseveral metrics [35] to evaluate the algorithm. Basically, we havefour types of internet tra�c, i.e., audio, picture, text and video, takethe identi�cation of the video tra�c for the example. Here we have4 di�erent decision scenarios, True Positive (TP), True Negative(TN), False Positive (FP) and False Negative (FN). TP means thevideo tra�c is correctly identi�ed as the video kind, TN means thethe non-video tra�c is not identi�ed as the video kind, FP meansthe non-video tra�c is identi�ed as the video kind, FN means thevideo tra�c is identi�ed as the non-video kind. Accordingly, themeasuring metrics are given as follows

Accuracy =TP +TN

TP +TN + FP + FN. (4)

Precision =TP

TP + FP. (5)

Recall =TP

TP + FN. (6)

Accuracy re�ects the overall e�ectiveness of the classi�er, Preci-sion speci�es the rate that the correctly identi�ed samples (TP) inthe samples that are practically identi�ed (TP+FP), Recall speci�esthe rate that the the correctly identi�ed samples (TP) in the samplesthat should be identi�ed (TP+FN).

Furthermore, we represent the harmomic mean of Precision andRecall as F1, which is given as follows

2F1=

1P

+1R

,

F1 =2P ⇤ RP + R

=2TP

2TP + FP + FN.

(7)

Where P is the Precision value, and R is the Recall value.We have four di�erent types of tra�c in the dataset, audio, pic-

ture, text and video. The designed MLP based and CNN based tra�cclassi�cation method are performed on the collected dataset. Accu-racy, Precision, Recall and F1 value are plotted out to demonstratethe e�ectiveness of the proposed tra�c classi�cation method. Toreveal the intrinsic mechanism of the neural network, we furtherexhibit the relationship that how is the training time varying withthe number of neurons of MLP.

We have found in the experiments that both the MLP based traf-�c classi�cation and CNN based tra�c classi�cation achieve verygood performance. It should be noticed that MLP has two hiddenlayers with 128 neurons and 512 neurons respectively, CNN has two

Figure 3: The tra�c classi�cation accuracy of theMLP basedmethod and CNN based method.

Figure 4: The tra�c classi�cation precision of theMLPbasedmethod and CNN based method.

Figure 5: The tra�c classi�cation recall of the MLP basedmethod and CNN based method.


Figure 6: The tra�c classi�cation F1 of the MLP basedmethod and CNN based method.

convolutional layers with 8 and 16 kernels and a fully connectedlayer with 128 neurons. MLP based tra�c classi�cation methodobtains the classi�cation accuracy of as high as more than 99%, theaccuracy of CNN based method is about 95%. The classi�cationaccuracy is presented in Figure 3. The training time for MLP andCNN is 314s and 611s respectively. At the testing stage, the classi-�cation results are obtained within 0.019s and 0.022s by the MLPbased method and CNN based method respectively.

The Precision and Recall as well as F1 value are also satisfactory.The Precision and Recall result can be found in Figure 4 and Figure 5.It can be observed that the Recall of MLP based tra�c classi�cationmethod has outperformed that of CNN based method for all kindsof the concerned tra�c. As for Precision, MLP based method is alsobetter than CNN based method except for the audio tra�c. BecauseMLP based method does not have a convolutional layer so that it iseasier to train a MLP to obtain a good classi�cation performance.However, as CNN based method adopts max pooling to improvestability so it is more suitable in the classi�cation of tra�c that hascorrelation information, which is more likely to appear in audiotra�c. Therefore, CNN based method has a preference for the classi-�cation of audio tra�c. With regard to picture and video tra�c, thePrecision of CNN based method is about 91% and 88% respectively,much less than that of MLP based method. CNN is not suitable tobe used in the classi�cation of picture and video tra�c but it canmake quite a di�erence in the classi�cation of audio tra�c.

Furthermore, F1 value is presented in Figure 6. As F1 harmonizesthe values of Precision and Recall and can re�ects the performance ofthe classi�cation methods more directly. It can be found that the F1value of MLP based method exceeds that of CNN in all tra�c typesexcept for audio. This is because CNN based method is especiallygood at the classi�cation of audio tra�c as described above.

The training time of MLP is revealed in Figure 7. We count thetraining time of MLP in the condition that the neural network canachieve 95% of classi�cation accuracy at the testing stage. We con-duct the experiments under a number of di�erent neural networkarchitectures. Speci�cally, we vary the number of neurons of MLP,all the tested MLP architectures have two hidden layers. We denotethe number pair (a,b) as the number of neurons of the �rst hiddenlayer and the second hidden layer. For example, (64, 128)means the

Figure 7: The training time of di�erentMLP architectures toachieve the accuracy of 95%.

�rst hidden layer has 64 neurons and the second hidden layer has128 neurons. Basically, it could be observed that the training timeincreases with the number of neurons in large-scale. However, wenotice the training time is in trough of wave when the two hiddenlayers has the same number of neurons and is in wave crest if thenumber of the second hidden layer is about four times that of the�rst layer’s. For the scenario that the number of neurons of thesecond hidden layer is about two times that of the �rst layer’s, thetraining time is closely distributed around the trend curve (dottedline). It is meaningful that if we want to reduce the training timein the tra�c classi�cation when applying MLP based method, wecan try to make the hidden layers have about the same number ofneurons in the condition that the learning architecture can achievethe satisfactory performance.

5 CONCLUSIONSIn this paper, we apply deep learning method to the classi�cation offour di�erent types of media tra�c (audio, picture, text and video)and provide precise classi�cation accuracy to the four kinds oftra�c. We use both the packet level features and �ow level fea-tures to enhance the classi�cation results. We collect the tra�c datafrom the real network environment and design two deep learningbased methods (MLP based method and CNN based method) toclassify the target tra�c. The designed learning architectures canachieve satisfactory classi�cation performance. MLP has outper-formed CNN in the classi�cation of picture, text and video tra�c,CNN is very good at the classi�cation of audio tra�c. Moreover,we have found that the training time can be reduced if the numberof neurons of the hidden layers are close. As a matter of fact, wehave not applied neural network with more that two layers in theconsideration of computation complexity. More complicated neu-ral network architecture for tra�c classi�cation and computationmethod to improve the processing time are left for future work.

ACKNOWLEDGMENTSThe authors would like to thank the colleagues in NSLab TsinghuaUniversity for productive discussions. The authors would also liketo thank the anonymous referees for their valuable comments andhelpful suggestions. This work is supported by National Natural


Science Foundation (No. 61872212) and National Key Research andDevelopment Program (No.2016YFB1000101).

REFERENCES[1] R. Aggarwal and Nanhay Singh. 2017. A New Hybrid Approach for Network

Tra�c Classi�cation Using Svm and Naïve Bayes Algorithm.[2] Babak Alipanahi, Andrew Delong, Matthew T Weirauch, and Brendan J Frey.

2015. Predicting the sequence speci�cities of DNA-and RNA-binding proteins bydeep learning. Nature biotechnology 33, 8 (2015), 831.

[3] Mina Tahmasbi Arashloo, Yaron Koral, Michael Greenberg, Jennifer Rexford,and David Walker. 2016. SNAP: Stateful network-wide abstractions for packetprocessing. In Proceedings of the 2016 ACM SIGCOMM Conference. ACM, 29–43.

[4] TomAuld, AndrewWMoore, and Stephen F Gull. 2007. Bayesian neural networksfor internet tra�c classi�cation. IEEE Transactions on neural networks 18, 1 (2007),223–239.

[5] Alberto Dainotti, Antonio Pescape, and Kimberly C Cla�y. 2012. Issues andfuture directions in tra�c classi�cation. IEEE network 26, 1 (2012).

[6] Pieter-Tjerk De Boer, Dirk P Kroese, Shie Mannor, and Reuven Y Rubinstein.2005. A tutorial on the cross-entropy method. Annals of operations research 134,1 (2005), 19–67.

[7] Cicero dos Santos and Maira Gatti. 2014. Deep convolutional neural networksfor sentiment analysis of short texts. In Proceedings of COLING 2014, the 25thInternational Conference on Computational Linguistics: Technical Papers. 69–78.

[8] Alessandro Finamore, Marco Mellia, Michela Meo, and Dario Rossi. 2010. Kiss:Stochastic packet inspection classi�er for udp tra�c. IEEE/ACM Transactions onNetworking (TON) 18, 5 (2010), 1505–1515.

[9] Michael Finsterbusch, Chris Richter, Eduardo Rocha, Jean-Alexander Muller, andKlaus Hanssgen. 2014. A survey of payload-based tra�c classi�cation approaches.IEEE Communications Surveys & Tutorials 16, 2 (2014), 1135–1156.

[10] Open Networking Fundation. 2012. Software-de�ned networking: The new normfor networks. ONF White Paper 2 (2012), 2–6.

[11] Bo Han, Vijay Gopalakrishnan, Lusheng Ji, and Seungjoon Lee. 2015. Networkfunction virtualization: Challenges and opportunities for innovations. IEEECommunications Magazine 53, 2 (2015), 90–97.

[12] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. 1989. Multilayer feed-forward networks are universal approximators. Neural networks 2, 5 (1989),359–366.

[13] Nan Hua, Haoyu Song, and TV Lakshman. 2009. Variable-stride multi-patternmatching for scalable deep packet inspection. In INFOCOM 2009, IEEE. Citeseer,415–423.

[14] IANA. [n. d.]. https://www.iana.org/. ([n. d.]).[15] T Jayalakshmi and A Santhakumaran. 2011. Statistical normalization and back

propagation for classi�cation. International Journal of Computer Theory andEngineering 3, 1 (2011), 1793–8201.

[16] Chengjun Jia, Zhe Fu, Xiaohe Hu, Shui Cao, Liang Wang, and Jun Li. 2018.Multi-core HTB for bandwidth sharing. In Proceedings of the 2018 Symposium onArchitectures for Networking and Communications Systems. ACM, 169–171.

[17] Thomas Karagiannis, Andre Broido, Nevil Brownlee, Kimberly Cla�y, andMichalis Faloutsos. 2003. File-sharing in the Internet: A characterization ofP2P tra�c in the backbone. University of California, Riverside, USA, Tech. Rep(2003).

[18] Hyang-Ah Kim and Brad Karp. 2004. Autograph: Toward Automated, DistributedWorm Signature Detection.. In USENIX security symposium, Vol. 286. San Diego,CA.

[19] Alok Kumar, Sushant Jain, Uday Naik, Anand Raghuraman, Nikhil Kasinad-huni, Enrique Cauich Zermeno, C Stephen Gunn, Jing Ai, Björn Carlin, MihaiAmarandei-Stavila, et al. 2015. BwE: Flexible, hierarchical bandwidth allocationfor WAN distributed computing. In ACM SIGCOMM Computer CommunicationReview, Vol. 45. ACM, 1–14.

[20] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Ha�ner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86, 11 (1998), 2278–2324.

[21] Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y Ng. 2009. Convo-lutional deep belief networks for scalable unsupervised learning of hierarchicalrepresentations. In Proceedings of the 26th annual international conference onmachine learning. ACM, 609–616.

[22] Zhi Liu, Shijie Sun, Ju Xing, Zhe Fu, Xiaohe Hu, Jianwen Pi, Xiaofeng Yang,Yunsong Lu, and Jun Li. 2018. MN-SLA: a modular networking SLA frameworkfor cloud management system. Tsinghua Science and Technology 23, 6 (2018),635–644.

[23] Zhi Liu, XiangWang,Weishen Pan, Baohua Yang, Xiaohe Hu, and Jun Li. 2015. To-wards e�cient load distribution in big data cloud. In 2015 International Conferenceon Computing, Networking and Communications (ICNC). IEEE, 117–122.

[24] Mohammad Lotfollahi, Ramin Shirali, Mahdi Jafari Siavoshani, and Mohammd-sadegh Saberian. 2017. Deep Packet: A Novel Approach For Encrypted Tra�cClassi�cation Using Deep Learning. arXiv preprint arXiv:1709.02656 (2017).

[25] Alok Madhukar and Carey Williamson. 2006. A longitudinal study of P2P tra�cclassi�cation. In Modeling, Analysis, and Simulation of Computer and Telecom-munication Systems, 2006. MASCOTS 2006. 14th IEEE International Symposium on.IEEE, 179–188.

[26] Alok Madhukar and Carey Williamson. 2006. A longitudinal study of P2P tra�cclassi�cation. In Modeling, Analysis, and Simulation of Computer and Telecom-munication Systems, 2006. MASCOTS 2006. 14th IEEE International Symposium on.IEEE, 179–188.

[27] Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Pe-terson, Jennifer Rexford, Scott Shenker, and Jonathan Turner. 2008. OpenFlow:enabling innovation in campus networks. ACM SIGCOMM Computer Communi-cation Review 38, 2 (2008), 69–74.

[28] Andrew W Moore and Konstantina Papagiannaki. 2005. Toward the accurateidenti�cation of network applications. In International Workshop on Passive andActive Network Measurement. Springer, 41–54.

[29] Andrew W Moore and Denis Zuev. 2005. Internet tra�c classi�cation usingbayesian analysis techniques. In ACM SIGMETRICS Performance Evaluation Re-view, Vol. 33. ACM, 50–60.

[30] T Nguyen and Grenville Armitage. 2006. Synthetic sub-�ow pairs for timely andstable IP tra�c identi�cation. In Proc. Australian Telecommunication Networksand Application Conference.

[31] Salima Omar, Asri Ngadi, and Hamid H Jebur. 2013. Machine learning tech-niques for anomaly detection: an overview. International Journal of ComputerApplications 79, 2 (2013).

[32] Mohammad Reza Parsaei, Mohammad Javad Sobouti, Seyed Raouf Khayami,and Reza Javidan. 2017. Network tra�c classi�cation using machine learningtechniques over software de�ned networks. International Journal of AdvancedComputer Science and Applications 8, 7 (2017), 220–225.

[33] RK Rahul, T Anjali, Vijay Krishna Menon, and KP Soman. 2017. Deep learningfor network �ow analysis and malware classi�cation. In International Symposiumon Security in Computing and Communication. Springer, 226–235.

[34] David E Rumelhart, Geo�rey E Hinton, and Ronald J Williams. 1986. Learningrepresentations by back-propagating errors. nature 323, 6088 (1986), 533.

[35] Muhammad Sha�q and Xiangzhan Yu. 2017. E�ective packet number for 5GIM WeChat application at early stage tra�c classi�cation. Mobile InformationSystems 2017 (2017).

[36] Yiyang Shao, Yibo Xue, and Jun Li. 2014. PPP: Towards parallel protocol parsing.China Communications 11, 10 (2014), 106–116.

[37] Yiyang Shao, Baohua Yang, Jingjie Jiang, Yibo Xue, and Jun Li. 2014. Emilie:Enhance the power of tra�c identi�cation. In 2014 International Conference onComputing, Networking and Communications (ICNC). IEEE, 31–35.

[38] Yiyang Shao, Luoshi Zhang, Xiaoxian Chen, and Yibo Xue. 2014. Towardstime-varying classi�cation based on tra�c pattern. In 2014 IEEE Conferenceon Communications and Network Security. IEEE, 512–513.

[39] Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networksfor large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).

[40] Pu Wang, Shih-Chun Lin, and Min Luo. 2016. A framework for QoS-aware tra�cclassi�cation using semi-supervised machine learning in SDNs. In 2016 IEEEInternational Conference on Services Computing (SCC). IEEE, 760–765.

[41] Pan Wang, Feng Ye, Xuejiao Chen, and Yi Qian. 2018. Datanet: Deep learningbased encrypted network tra�c classi�cation in sdn home gateway. IEEE Access6 (2018), 55380–55391.

[42] Zhanyi Wang. 2015. The applications of deep learning on tra�c identi�cation.BlackHat USA (2015).

[43] Baohua Yang, Guangdong Hou, Lingyun Ruan, Yibo Xue, and Jun Li. 2011. Smiler:Towards practical online tra�c classi�cation. In 2011 ACM/IEEE Seventh Sympo-sium onArchitectures for Networking and Communications Systems. IEEE, 178–188.

[44] Baohua Yang, Guodong Li, Yaxuan Qi, Yibo Xue, and Jun Li. 2010. DFC: TowardsE�ective Feedback FlowManagement for Datacenters. In 2010 Ninth InternationalConference on Grid and Cloud Computing. IEEE, 98–103.

[45] Changhe Yu, Julong Lan, JiChao Xie, and Yuxiang Hu. 2018. QoS-aware Tra�cClassi�cation Architecture Using Machine Learning and Deep Packet Inspectionin SDNs. Procedia computer science 131 (2018), 1209–1216.

[46] Zhenlong Yuan, Yibo Xue, and Yingfei Dong. 2013. Harvesting unique characteris-tics in packet sequences for e�ective application classi�cation. InCommunicationsand Network Security (CNS), 2013 IEEE Conference on. IEEE, 341–349.

https://www.iana.org/

Documents

Effective Media Traffic Classification Using Deep Learningsecurity.riit.tsinghua.edu.cn/share/QLV-ICCDA19-traffic classification.… · Eective Media Traic Classification Using Deep