Classification of Applications in HTTP Tunnels
By
Gajen Piraisoody, Changcheng Huang, Biswajit Nandy, Nabil Seddigh
Electrical and Computer Engineering, Carleton University, Ottawa, ON, Canada
12 November 2013
Slide 2
Outline
• Overview
• Motivation
• Problem Statement
• Contribution
• Approach to classification
• Evaluation
• Conclusion
Slide 3
Overview – HTTP Tunnel
What is HTTP Tunnelled Traffic?
• HTTP port used to carry web traffic
• Non-HTTP applications are wrapped in HTTP protocols
• HTTP port now tunnels email, chat, video, image, audio, file-transfer and
peer to peer traffic
Why HTTP Tunnel non-HTTP applications?
• HTTP clients (browser) are readily available and deployable
• Tunneling permits applications to bypass restricted network connectivity
that exists in the form of firewalls, proxies and NAT
Slide 4
Motivation
HTTP Traffic Classification
• HTTP accounts for about 80% of the traffic in an entire network
• HTTP tunneled traffic is not identifiable by ports alone
• Tunneled traffic like YouTube and Netflix is increasing in cloud networks
• Info on tunneled traffic helps cloud-centre management with planning,
provisioning and ensuring quality of service
Why flow-based classification rather than DPI?
• Provides a scalable software solution (less CPU consumption)
• Can classify encrypted data
Slide 5
Problem Statement
Given network traffic measured with NetFlow
Find a way to classify HTTP tunnelled traffic
• Audio (Radio & Music), Video and File-transfer
No training dataset needed for the proposed algorithm
Use information available from NetFlow only
Slide 6
Contribution
Proposed scheme classifies HTTP tunneled traffic: audio (radio
& music), video and file-transfer
Proposed scheme helps audio classification by using the
‘occupancy’ feature
Proposed scheme enhances classification performance by
including a flow-group found using flows from content
servers (subnet-masked IP of the long-flow)
Slide 7
Approach in detail
• Identify long-flow HTTP traffic. Parameter: BPF
• Classify radio traffic. Parameters: BPF, BPP, BPS, Occupancy
• Classify music traffic. Parameters: BPF, BPP, BPS, Occupancy
• Classify video traffic. Parameters: BPF, BPP, BPS, Flow-group
• Classify file-transfer traffic. Parameters: BPF, BPP, BPS, Flow-group
Legend: bytes-per-second (BPS), bytes-per-flow (BPF), bytes-per-packet (BPP)
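The three flow attributes can be derived from the byte count, packet count and timestamps that a NetFlow record carries. The following is a minimal sketch, not the authors' implementation; the function name `flow_features` and its argument names are hypothetical, and returning 0.0 for an empty or zero-length flow is an assumption made only to avoid division by zero.

```python
def flow_features(total_bytes, total_packets, start_s, end_s):
    """Compute BPF, BPP and BPS for one flow record.

    Argument names are illustrative; they are not NetFlow field names.
    """
    duration = end_s - start_s
    return {
        # Bytes-per-flow is simply the byte count of the record.
        "BPF": total_bytes,
        # Bytes-per-packet: average payload carried by each packet.
        "BPP": total_bytes / total_packets if total_packets else 0.0,
        # Bytes-per-second: average rate over the flow's lifetime.
        "BPS": total_bytes / duration if duration > 0 else 0.0,
    }


features = flow_features(1_000_000, 800, start_s=0.0, end_s=50.0)
print(features)
```

A rate in Kbps, as used on the audio slides, follows from BPS * 8 / 1000.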
Slide 8
Approach to Classification
Identify Long-flow HTTP Traffic
Classify Audio Traffic
Classify Video & File-transfer Traffic
Slide 9
Identify Long-flow HTTP Traffic
Identifying HTTP Traffic
• A long-flow has a byte size larger than a threshold
• Audio, video and file-transfer are generally long-flows
HTTP_PORTS: 80, 443, 1935, 8008, 8080, 8088, 8090
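A sketch of this first step, under stated assumptions: the slide does not give the long-flow byte threshold, so `LONG_FLOW_MIN_BYTES` below is a placeholder (0.5 MB, borrowed from the later video/file-transfer step). Matching the HTTP port on either side of the flow is also an assumption, since the slide does not say which port field is checked.

```python
HTTP_PORTS = {80, 443, 1935, 8008, 8080, 8088, 8090}

# Assumed placeholder: the slide only says "larger than a threshold".
LONG_FLOW_MIN_BYTES = 500_000


def is_long_http_flow(src_port, dst_port, total_bytes,
                      min_bytes=LONG_FLOW_MIN_BYTES):
    """Return True when a flow uses an HTTP port and exceeds the byte threshold."""
    on_http_port = src_port in HTTP_PORTS or dst_port in HTTP_PORTS
    return on_http_port and total_bytes > min_bytes


print(is_long_http_flow(80, 51000, 2_000_000))   # HTTP port, large flow
print(is_long_http_flow(22, 51000, 2_000_000))   # not an HTTP port
print(is_long_http_flow(443, 51000, 1_000))      # HTTP port, short flow
```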
Slide 10
Identify Long-flow HTTP Traffic
Classify Audio Traffic
Classify Video & File-transfer Traffic
Approach
Slide 11
Classify Audio Traffic
99.4% of radio rates are between 20 and 320 Kbps (statistics from 3683 online radio web sites)
98% of online music rates are between 64 and 320 Kbps (statistics from more than 20 online music sites)
The 95% confidence interval of radio bytes-per-packet is 900 to 1470 (Samruay et al. [1])
The 95% confidence interval of music bytes-per-packet is 1260 to 1500 (Samruay et al. [1])
Slide 12
Classify Audio Traffic
Behavioral analysis: An online audio listener typically listens to
audio for more than 5 minutes
There are two distinct audio types: Radio & Music (songs)
New concept: Occupancy helps classify audio. Occupancy is the ratio of the
flow duration to the entire duration of a chunk of time.
[Figure: average download rate (Mbps) for music (Grooveshark), radio (Hdradio) and video (CTV)]
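Occupancy as defined on this slide can be sketched as the fraction of an observation window during which at least one flow is active. This is an illustrative reading, not the authors' code: the function name, the `(start, end)` tuple representation and the decision to merge overlapping flows before summing are all assumptions.

```python
def occupancy(flows, window_start, window_end):
    """Fraction of [window_start, window_end] covered by the given flows.

    `flows` is an iterable of (start, end) times in seconds.
    Overlapping flows are merged so no second is counted twice (assumption).
    """
    window = window_end - window_start
    if window <= 0:
        raise ValueError("window must have positive length")

    # Clip every flow to the window and drop flows entirely outside it.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in flows
        if e > window_start and s < window_end
    )

    covered = 0.0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # A gap: close the previous interval and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlap or touch: extend the current interval.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start

    return covered / window


# Back-to-back flows fill the window; two short bursts cover little of it.
print(occupancy([(0, 120), (120, 240), (240, 300)], 0, 300))
print(occupancy([(0, 30), (200, 230)], 0, 300))
```

Continuous downloads give an occupancy near 1, while short bursts separated by idle time give a low value.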
Slide 13
Classify Audio Traffic
Difference between Radio & Music
Continuous - Radio content appears to download every second of the flow
Dirac - Songs in a playlist are downloaded & played one at a time
The max/min size of a radio flow depends on the maximum flow-period configuration and the offered radio rates
The max/min size of a music flow depends on the max/min song duration and the offered online music rates
The 95% confidence interval of radio occupancy from DS-1, DS-2, SME-6, SME-7 and SME-8 is [82%, 100%]
The 95% confidence interval of music occupancy from DS-1, DS-2, SME-6, SME-7 and SME-8 is [0%, 55%]
Assumption: The minimum number of radio-flows is two (5 minutes at least)
Assumption: The minimum number of music-flows is two (5 minutes at least)
Assumption: The maximum radio-phase timeout is based on a flow-period (120 seconds)
The maximum music-phase timeout is based on the maximum song duration (382 seconds)
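The audio statistics can be combined into a simple rule-based check. The sketch below is hypothetical: it treats the reported rate ranges and confidence-interval bounds as hard thresholds, which the slides do not state, and it ignores the flow-count and timeout assumptions listed above. Since the rate and bytes-per-packet ranges of radio and music overlap, occupancy is what separates the two classes here.

```python
def in_range(value, low, high):
    return low <= value <= high


def classify_audio(rate_kbps, bpp, occ):
    """Label a group of long-flows as 'radio', 'music' or None.

    rate_kbps: download rate in Kbps, bpp: bytes-per-packet,
    occ: occupancy as a fraction between 0 and 1.
    Bounds come from the slide statistics, used as hard cut-offs (assumption).
    """
    if (in_range(rate_kbps, 20, 320) and in_range(bpp, 900, 1470)
            and in_range(occ, 0.82, 1.00)):
        return "radio"
    if (in_range(rate_kbps, 64, 320) and in_range(bpp, 1260, 1500)
            and in_range(occ, 0.00, 0.55)):
        return "music"
    return None


print(classify_audio(128, 1300, 0.95))   # continuous download
print(classify_audio(128, 1400, 0.30))   # bursty, song-by-song download
print(classify_audio(2000, 1400, 0.90))  # rate too high for audio
```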
Slide 14
Approach
Identify Long-flow HTTP Traffic
Classify Audio Traffic
Classify Video & File-transfer Traffic
Slide 15
Background
• Multimedia Distribution (3 types)
[Diagram: multimedia delivery through a CDN. Labels: Client (Web Browser, Media Player), Server, Listening, HTTP Server, CDN’s Authoritative DNS Server, CDN_1 ... CDN_n]
1) Client clicks on audio/video hyperlink
2) Metafile sent to client
3) Metafile
4) Request multimedia content
5) Responds with CDN site
6) From DNS lookup, request sent to CDN admin
7) Responds with address of all contents on all CDNs
8) Request multimedia content 1
9) Request multimedia content 2
10) Content1
11) Content2
Slide 16
Classify Video & File-transfer Traffic
Video flow-attributes (bytes-per-packet, bytes-per-flow, download rates)
& flow-group technique (FG) are used to classify video & file-transfers
Flow-group (FG)
• Video flow is associated with meta-data, style sheet, advertisements
• Kei et al. [3] defined FG as the number of flows that occur within a few
seconds of the video-flow with the same destination-IP address
• Our expanded flow-group also includes flows that occur within a
longer duration that have the same subnet-masked source-IP
address and the same destination-IP address
An Example
Slide 17
[Figure: Flow Size, Log10(Bytes) versus flow-index (1 to 36)]
[Figure: Flow Duration, Time (Seconds) versus flow-index (1 to 36)]
Example cont'd
Slide 18
[Figure: Bytes-per-packet versus Flow Index (1 to 36)]
[Figure: Type of Flow (video-flow, flow-group, signal-flow) versus flow-index (1 to 36)]
Slide 19
Classify Video & File-transfer Traffic
Flow-group range (seconds) around the video-flow: -60, -4, 0, 1, 10
• Kei et al.'s flow-group: 98% of the flow-group is within 4 seconds before the video-flow and 97.8% is within 1 second after the video-flow
• Improved flow-group: 94.4% of the flow-group is within 60 seconds before the video-flow and 94.1% is within 10 seconds after the video-flow
• 92.6% of flow-group bytes-per-flow is above 1000 and below 500000
• Almost 100% of flow-group bytes-per-packet is above 200
All flow-group statistics are estimated from datasets DS-4 and DS-5
Slide 20
Classify Video & File-transfer Traffic
Start
Gather potential V/F (video/file-transfer) flows:
• flow > 0.5 MB
• & > 1260 bytes-per-pkt
• & > 128 Kbps
• & order by destination-IP and flow start time
For every potential V/F flow, gather potential flow-group (FG) flows when:
• FG flow > V/F start-time - 4
• & FG flow < V/F start-time + 1
• & FG flow and V/F have the same dest-IP
• & FG flow between 1000 B and 0.5 MB
• & FG flow between 200 and 1500 BPP
If FG == true: inc FG counter
For V/F-phase gather potential FG flows:
• Same source IP address-subnet
• Same destination IP address
• & FG flow > V/F start-time - 60
• & FG flow < V/F start-time + 10
• & FG flow between 1000 B and 0.5 MB
• & FG flow between 200 and 1500 BPP
If FG == true: inc FG counter
If FG > 0: label video; else: label file-transfer
End
Green is the original flow-group (FG); yellow is the improved flow-group. Both FGs are run.
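The flowchart can be sketched in code as follows. This is a hedged illustration rather than the authors' implementation: the `Flow` class and all function names are hypothetical, 0.5 MB is taken as 500000 bytes, and the subnet mask length is not given on the slides, so a /24 prefix is assumed. A flow matching either flow-group rule is counted once, which yields the same label as running both counters since only "FG > 0" matters.

```python
from dataclasses import dataclass
import ipaddress


@dataclass
class Flow:
    src_ip: str      # content server side
    dst_ip: str      # client side
    start: float     # flow start time, seconds
    duration: float  # seconds
    packets: int
    nbytes: int

    @property
    def bpp(self):
        return self.nbytes / self.packets if self.packets else 0.0

    @property
    def kbps(self):
        return self.nbytes * 8 / 1000 / self.duration if self.duration > 0 else 0.0


def same_subnet(ip_a, ip_b, prefix=24):
    # The /24 prefix is an assumption; the slides only say "subnet masked".
    net = ipaddress.ip_network(f"{ip_a}/{prefix}", strict=False)
    return ipaddress.ip_address(ip_b) in net


def is_vf_candidate(f):
    # Potential video/file-transfer flow: > 0.5 MB, > 1260 BPP, > 128 Kbps.
    return f.nbytes > 500_000 and f.bpp > 1260 and f.kbps > 128


def count_flow_group(vf, flows):
    count = 0
    for f in flows:
        if f is vf or f.dst_ip != vf.dst_ip:
            continue
        if not (1000 <= f.nbytes <= 500_000 and 200 <= f.bpp <= 1500):
            continue
        offset = f.start - vf.start
        original = -4 < offset < 1                      # original flow-group window
        improved = (same_subnet(vf.src_ip, f.src_ip)    # improved flow-group window
                    and -60 < offset < 10)
        if original or improved:
            count += 1
    return count


def label(vf, flows):
    return "video" if count_flow_group(vf, flows) > 0 else "file-transfer"


video = Flow("203.0.113.10", "192.0.2.5", 100.0, 60.0, 4000, 5_600_000)
helper = Flow("203.0.113.77", "192.0.2.5", 70.0, 1.0, 20, 10_000)
download = Flow("198.51.100.9", "192.0.2.5", 500.0, 60.0, 4000, 5_600_000)
flows = [video, helper, download]
for f in (video, download):
    if is_vf_candidate(f):
        print(f.src_ip, label(f, flows))
```

In the example, the small helper flow starts 30 seconds before the first large flow from the same /24 subnet, so only the improved window catches it.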
Slide 21
Evaluation
Datasets used to test algorithms
Accuracy measurement assessment
• Precision is the ratio of the system's correct positive predictions to all of its positive predictions. That is, precision = TP / (TP + FP)
• Recall is the ratio of the system's correct positive predictions to all actual positives. That is, recall = TP / (TP + FN)
• F-Measure is the harmonic mean of recall and precision. That is, F-measure = 2 * Precision * Recall / (Precision + Recall)
• Accuracy is the share of true results. That is, accuracy = (TP + TN) / (TP + FP + FN + TN)
Compare against other algorithms
• NaïveBayes
• SVM (Support Vector Machine)
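The four measures can be computed directly from confusion-matrix counts. A minimal sketch follows; the function name and the counts in the usage example are made up for illustration and are not results from the datasets, and returning 0.0 when a denominator is zero is an assumption.

```python
def evaluation_metrics(tp, fp, fn, tn):
    """Precision, recall, F-measure and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts, for illustration only.
print(evaluation_metrics(tp=8, fp=2, fn=2, tn=8))
```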
Slide 22
Evaluation – Datasets
                    SME-6        SME-7        SME-8
Date                1/7/2013     1/22/2013    1/23/2013
Duration (s)        24723        28207        13628
Start-time (GMT-5)  10:18:04     10:29:04     10:56:20
Flows               249822       287616       198409
Packets             13376109     15351639     10170693
Bytes               11158181285  13589511746  8728052938
HTTP Flows          75485        87181        63951
HTTP Packets        7346663      8814438      5628558
HTTP Bytes          10456335955  12545720613  7982629610
Slide 23
Evaluation – Results
[Figure: F-Measure bar chart for NaiveBayes, SVM and Proposed Algorithm over SME6-Audio, SME6-File, SME6-Video, SME7-Audio, SME7-File, SME7-Video, SME8-Audio, SME8-File, SME8-Video]
Data labels as extracted: 27.5%, 59.5%, 39.4%, 56.1%, 79.7%, 70.8%, 66.5%, 64.0%, 86.6%, 16.8%, 23.2%, 42.6%, 21.6%, 12.5%, 40.4%, 60.4%, 49.1%, 43.1%, 84.9%, 60.8%, 72.9%, 93.0%, 93.6%, 82.5%, 85.1%, 89.7%, 94.2%
Slide 24
Evaluation – Results
Accuracy
                    SME-6  SME-7  SME-8
NaiveBayes          39.1%  73.5%  71.4%
SVM                 17.8%  16.3%  42.0%
Proposed Algorithm  70.5%  89.9%  90.9%
Slide 25
Conclusion
• Proposed algorithm uses a flow-based approach and classifies a high percentage of tunneled traffic: audio, video and file-transfer
• Proposed audio algorithm:
  • Used a concept called occupancy to classify radio & music traffic
• Proposed video & file-transfer algorithm:
  • Used an improved flow-group method to help increase classification accuracy of video and file-transfer traffic
• Proposed scheme’s F-measure is at least 10% more than NaiveBayes and SVM
Slide 26
Reference
[1] Samruay Kaoprakhon, Vasaka Visoottiviseth, "Classification of Audio and Video Traffic over HTTP Protocol," in Communications and Information Technology, 2009. ISCIT 2009. 9th International Symposium on, Sept 2009.
[2] M. Twardos, "The Information Diet," 2011. [Online]. Available: http://theinformationdiet.blogspot.ca/2011/11/probability-distribution-of-song-length.html. [Accessed 2013].
[3] K. Takeshita, T. Kurosawa, M. Tsujino and M. Iwashita, "Evaluation of HTTP Video Classification Method Using Flow Group Information," in Telecommunications Network Strategy and Planning Symposium (NETWORKS), 2010 14th International, Sept 2010.
[4] H. Kim, K. Claffy, M. Fomenkov, D. Barman, M. Faloutsos, K. Lee, "Internet Traffic Classification Demystified: Myths, Caveats, and the Best Practices," in ACM CoNEXT, 2008.
[5] D. M. W. Powers, "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation," in Journal of Machine Learning Technologies, Volume 2, Issue 1, 2011, pp. 37-63.