Classification of Applications in HTTP Tunnels By Gajen Piraisoody, Changcheng Huang,Biswajit Nandy,...

Classification of Applications in HTTP Tunnels

Gajen Piraisoody, Changcheng Huang, Biswajit Nandy, Nabil Seddigh

Electrical and Computer Engineering, Carleton University, Ottawa, ON, Canada

12 November 2013

Outline
• Overview
• Motivation
• Problem Statement
• Contribution
• Approach to classification
• Evaluation
• Conclusion

Overview – HTTP Tunnel

What is HTTP Tunnelled Traffic?

• The HTTP port was intended to carry web traffic

• Non-HTTP applications are wrapped in the HTTP protocol

• The HTTP port now tunnels email, chat, video, image, audio, file-transfer and peer-to-peer traffic

Why tunnel non-HTTP applications over HTTP?

• HTTP clients (browsers) are readily available and deployable

• Tunneling lets applications bypass the connectivity restrictions imposed by firewalls, proxies and NAT

Motivation

HTTP Traffic Classification

• HTTP makes up about 80% of the traffic in an entire network

• HTTP-tunneled traffic cannot be identified by port numbers alone

• Tunneled traffic such as YouTube and Netflix is increasing in cloud networks

• Information on tunneled traffic helps cloud-centre management with planning, provisioning and quality-of-service assurance

Why flow-based rather than DPI (deep packet inspection) classification?

• Provides a scalable software solution (lower CPU consumption)

• Can classify encrypted data

Problem Statement

Given network traffic measured with NetFlow

Find a way to classify HTTP tunnelled traffic

• Audio (Radio & Music), Video and File-transfer

No training dataset is needed for the proposed algorithm

Only information available from NetFlow is used

Contribution

The proposed scheme classifies HTTP-tunneled traffic: audio (radio and music), video and file-transfer

The proposed scheme aids audio classification with a new ‘occupancy’ feature

The proposed scheme improves classification performance by including an expanded flow-group built from flows sent by the same content servers (identified by the subnet-masked IP address of the long-flow)

Approach in detail

• Identify long-flow HTTP traffic (parameter: BPF)

• Classify radio traffic (parameters: BPF, BPP, BPS, occupancy)

• Classify music traffic (parameters: BPF, BPP, BPS, occupancy)

• Classify video traffic (parameters: BPF, BPP, BPS, flow-group)

• Classify file-transfer traffic (parameters: BPF, BPP, BPS, flow-group)

BPS: bytes-per-second; BPF: bytes-per-flow; BPP: bytes-per-packet
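The three flow attributes can be derived from the byte count, packet count and timestamps of a NetFlow record. The sketch below is a minimal illustration only; the `FlowRecord` type and its field names are hypothetical stand-ins, not NetFlow's actual export field names.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """Hypothetical NetFlow-style record; field names are illustrative."""
    src_port: int
    dst_port: int
    start: float   # flow start time in seconds
    end: float     # flow end time in seconds
    packets: int   # packet count
    octets: int    # byte count


def bytes_per_flow(f: FlowRecord) -> int:
    """BPF: total bytes carried by the flow."""
    return f.octets


def bytes_per_packet(f: FlowRecord) -> float:
    """BPP: mean packet size in bytes."""
    return f.octets / f.packets if f.packets else 0.0


def bytes_per_second(f: FlowRecord) -> float:
    """BPS: mean byte rate over the flow's lifetime (0.0 for zero-length flows)."""
    duration = f.end - f.start
    return f.octets / duration if duration > 0 else 0.0


if __name__ == "__main__":
    rec = FlowRecord(src_port=80, dst_port=51234, start=0.0, end=10.0,
                     packets=1000, octets=1_400_000)
    print(bytes_per_flow(rec), bytes_per_packet(rec), bytes_per_second(rec))
```

Since the rate thresholds elsewhere in the slides are given in Kbps, BPS values need multiplying by 8/1000 before comparison (assuming 1 Kbps = 1000 bit/s).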

Approach to Classification

Identify Long-flow HTTP Traffic

Classify Audio Traffic

Classify Video & File-transfer Traffic

Identifying HTTP Traffic

A long-flow is a flow whose byte size exceeds a threshold. Audio, video and file-transfer flows are generally long-flows.

HTTP_PORTS: 80, 443, 1935, 8008, 8080, 8088, 8090
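The first stage, keeping only long HTTP flows, can be sketched as a port-and-size filter. The slides do not state the long-flow threshold here, so `LONG_FLOW_THRESHOLD` is a hypothetical value (0.5 MB, borrowed from the video/file-transfer stage); the `Flow` type is likewise an illustrative stand-in for a NetFlow record.

```python
from dataclasses import dataclass

HTTP_PORTS = {80, 443, 1935, 8008, 8080, 8088, 8090}

# Hypothetical threshold: the slides only say "larger than a threshold".
LONG_FLOW_THRESHOLD = 500_000  # bytes


@dataclass
class Flow:
    src_port: int
    dst_port: int
    octets: int  # bytes in the flow


def is_http(flow: Flow) -> bool:
    # The server may appear as either endpoint, so check both ports.
    return flow.src_port in HTTP_PORTS or flow.dst_port in HTTP_PORTS


def long_http_flows(flows, threshold=LONG_FLOW_THRESHOLD):
    """Keep flows on an HTTP port whose byte size exceeds the threshold."""
    return [f for f in flows if is_http(f) and f.octets > threshold]
```

A port match alone is only a first cut; the later stages are what separate audio, video and file-transfer among the surviving flows.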

Approach

99.4% of online radio rates are between 20 and 320 Kbps (statistics from 3683 online radio websites)

98% of online music rates are between 64 and 320 Kbps (statistics from more than 20 online music sites)

The 95% confidence interval of radio bytes-per-packet is 900 to 1470 (Kaoprakhon et al. [1])

The 95% confidence interval of music bytes-per-packet is 1260 to 1500 (Kaoprakhon et al. [1])

Behavioral analysis: an online audio listener typically listens for more than 5 minutes

There are two distinct audio types: radio and music (songs)

New concept: occupancy helps classify audio. Occupancy is the ratio of the flow duration to the entire duration of a chunk of time.
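Occupancy as defined above can be computed from flow start and end times. This is a minimal sketch under one assumption the slides leave open: overlapping flows are merged, so time covered by several flows counts once and the result never exceeds 1.0. The function name and window arguments are hypothetical.

```python
def occupancy(intervals, window_start, window_end):
    """Fraction of [window_start, window_end) covered by at least one flow.

    intervals: iterable of (start, end) flow times in seconds.
    Overlapping flows are merged (an assumption) before summing coverage.
    """
    window = window_end - window_start
    if window <= 0:
        raise ValueError("window must have positive length")
    # Clip flows to the window and drop those lying entirely outside it.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in intervals
        if e > window_start and s < window_end
    )
    covered = 0.0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this flow: close the previous merged run.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or touching flow: extend the current run.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / window
```

Back-to-back radio flows give an occupancy near 1.0, while short song downloads spread across a long window give a small value, which is the contrast the occupancy feature relies on.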

[Figure: flow activity over time for music (Grooveshark), radio (Hdradio) and video (CTV)]

Difference between radio and music:

• Continuous: radio content appears to be downloaded during every second of the flow

• Dirac: songs in a playlist are downloaded and played one at a time

The maximum/minimum size of a radio flow depends on the maximum flow-period configuration and the offered radio rates

The maximum/minimum size of a music flow depends on the maximum/minimum song duration and the offered online music rates

The 95% confidence interval of radio occupancy from DS-1, DS-2, SME-6, SME-7 and SME-8 is [82%, 100%]

The 95% confidence interval of music occupancy from DS-1, DS-2, SME-6, SME-7 and SME-8 is [0%, 55%]

Assumption: a radio phase contains at least two radio flows (at least 5 minutes)

Assumption: a music phase contains at least two music flows (at least 5 minutes)

Assumption: the maximum radio-phase timeout is based on the flow-period (120 seconds)

Assumption: the maximum music-phase timeout is based on the maximum song duration (382 seconds)
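The audio statistics and assumptions above can be combined into a phase classifier. This is a hypothetical sketch, not the authors' exact decision logic. Assumptions: 1 Kbps = 1000 bit/s; the confidence-interval bounds are used directly as hard thresholds; occupancy is simplified to summed flow duration over the phase span (flows in a phase assumed non-overlapping); the largest music flow is taken as a 382 s song at 320 Kbps; and the input is the long-flows of a single client/server pair.

```python
from dataclasses import dataclass

# Thresholds from the slides; "Kbps" is assumed to mean 1000 bit/s.
RADIO_BPS = (20_000 / 8, 320_000 / 8)    # 20-320 Kbps in bytes/s
RADIO_BPP = (900, 1470)                  # radio bytes-per-packet CI
MUSIC_BPP = (1260, 1500)                 # music bytes-per-packet CI
RADIO_MIN_OCC = 0.82                     # lower end of radio occupancy CI
MUSIC_MAX_OCC = 0.55                     # upper end of music occupancy CI
RADIO_TIMEOUT = 120                      # flow-period, seconds
MUSIC_TIMEOUT = 382                      # longest song, seconds
MUSIC_MAX_BPF = MUSIC_TIMEOUT * 320_000 / 8  # assumed: longest song at 320 Kbps
MIN_FLOWS, MIN_PHASE_S = 2, 300          # at least two flows, at least 5 minutes


@dataclass
class Flow:
    start: float   # seconds
    end: float     # seconds
    packets: int
    octets: int


def group_phases(flows, timeout):
    """Split flows into phases wherever the idle gap exceeds `timeout` seconds."""
    phases = []
    for f in sorted(flows, key=lambda f: f.start):
        if phases and f.start - max(g.end for g in phases[-1]) <= timeout:
            phases[-1].append(f)
        else:
            phases.append([f])
    return phases


def in_range(x, bounds):
    lo, hi = bounds
    return lo <= x <= hi


def classify_phase(phase):
    """Return 'radio', 'music' or None for one phase (hypothetical rule set)."""
    span = max(f.end for f in phase) - min(f.start for f in phase)
    if len(phase) < MIN_FLOWS or span < MIN_PHASE_S:
        return None
    # Simplified occupancy: assumes flows inside a phase do not overlap.
    occ = min(1.0, sum(f.end - f.start for f in phase) / span)
    bpp = [f.octets / f.packets for f in phase]
    bps = [f.octets / (f.end - f.start) if f.end > f.start else float("inf")
           for f in phase]
    if (occ >= RADIO_MIN_OCC and all(in_range(b, RADIO_BPP) for b in bpp)
            and all(in_range(r, RADIO_BPS) for r in bps)):
        return "radio"
    if (occ <= MUSIC_MAX_OCC and all(in_range(b, MUSIC_BPP) for b in bpp)
            and all(f.octets <= MUSIC_MAX_BPF for f in phase)):
        return "music"
    return None


def classify_audio(flows):
    """Look for radio phases (120 s timeout), then music phases (382 s timeout)."""
    labels = []
    for timeout, kind in ((RADIO_TIMEOUT, "radio"), (MUSIC_TIMEOUT, "music")):
        for phase in group_phases(flows, timeout):
            if classify_phase(phase) == kind:
                labels.append((kind, phase))
    return labels
```

The two timeouts matter: music flows separated by several minutes of silence only form a phase under the longer 382 s timeout, while continuous radio flows already form one under 120 s.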

Approach

[Figure: multimedia delivery through a CDN, involving the client (web browser and media player), the HTTP server and the CDN’s authoritative DNS server]

1) Client clicks on an audio/video hyperlink

2) Metafile is sent to the client

4) Request multimedia content

5) Responds with the CDN site

6) From the DNS lookup, a request is sent to the CDN admin

7) Responds with the addresses of all contents on all CDNs

8) Request multimedia content 1

9) Request multimedia content 2

10) Content 1

11) Content 2

Background

• Multimedia distribution (3 types)

Video flow attributes (bytes-per-packet, bytes-per-flow, download rates) and the flow-group (FG) technique are used to classify video and file-transfers

Flow-group (FG)

• A video flow is accompanied by related flows carrying metadata, style sheets and advertisements

• Takeshita et al. [3] defined FG as the number of flows that occur within a few seconds of a video flow and have the same destination-IP address

• Our expanded flow-group also includes flows that occur within a longer window and have the same subnet-masked source-IP address and the same destination-IP address

An Example

[Figure: flow size and flow duration for each of 36 example flows (x-axis: flow index)]

Example cont'd

[Figure: bytes-per-packet (axis up to 1600) and type of flow for the same 36 flows (x-axis: flow index)]

[Figure: flow-group ranges in seconds relative to the start of the video flow at 0: original window -4 to +1, improved window -60 to +10]

Takeshita et al.'s flow-group: 98% of flow-group flows occur within 4 seconds before the video flow and 97.8% within 1 second after it

Improved flow-group: 94.4% of flow-group flows occur within 60 seconds before the video flow and 94.1% within 10 seconds after it

All flow-group statistics are estimated from dataset DS-4 and DS-5

• 92.6% of flow-group flows have bytes-per-flow above 1000 and below 500000

• Almost 100% of flow-group flows have bytes-per-packet above 200

Gather potential V/F (video/file-transfer) flows:

• flow > 0.5 MB

• & > 1260 bytes-per-pkt

• & > 128 Kbps

• & order by destination-IP and flow start time

Original flow-group: for every potential V/F flow, gather potential flow-group (FG) flows when:

• FG start-time > V/F start-time - 4 s

• & FG start-time < V/F start-time + 1 s

• & FG flow and V/F flow have the same destination-IP

• & FG flow is between 1000 B and 0.5 MB

• & FG flow is between 200 and 1500 BPP

If FG == true: increment FG counter

Improved flow-group: for each V/F phase, gather potential FG flows with:

• Same source IP address subnet

• & same destination IP address

• & FG start-time > V/F start-time - 60 s

• & FG start-time < V/F start-time + 10 s

• & FG flow is between 1000 B and 0.5 MB

• & FG flow is between 200 and 1500 BPP

If FG == true: increment FG counter

If FG > 0: label video; else: label file-transfer

Green is the original flow-group (FG) and yellow is the improved flow-group; both FG methods are run.
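The V/F flowchart can be sketched end to end as follows. This is an illustrative reading, not the authors' code. Assumptions: 0.5 MB = 500,000 bytes; 128 Kbps = 16,000 bytes/s; a /24 mask for the "subnet masked" source IP (the slides do not give the mask length); the source IP is the content-server side of a download; and the improved flow-group is evaluated per candidate flow rather than per V/F phase. All names are hypothetical; only the standard `ipaddress` module is used for subnet matching.

```python
import ipaddress
from dataclasses import dataclass

# Assumptions, not from the slides: units and the /24 subnet mask.
HALF_MB = 500_000                 # bytes
MIN_VF_BYTES_PER_S = 128_000 / 8  # 128 Kbps in bytes/s
SUBNET_PREFIX = 24


@dataclass
class Flow:
    src_ip: str      # content-server side of a download
    dst_ip: str      # client side
    start: float     # seconds
    end: float       # seconds
    packets: int
    octets: int

    @property
    def bpp(self) -> float:
        return self.octets / self.packets

    @property
    def bps(self) -> float:
        duration = self.end - self.start
        return self.octets / duration if duration > 0 else float("inf")


def is_vf_candidate(f: Flow) -> bool:
    # Potential V/F flow: > 0.5 MB, > 1260 bytes/packet, > 128 Kbps.
    return f.octets > HALF_MB and f.bpp > 1260 and f.bps > MIN_VF_BYTES_PER_S


def fits_fg_profile(f: Flow) -> bool:
    # FG size limits: 1000 B to 0.5 MB and 200 to 1500 bytes/packet.
    return 1000 <= f.octets <= HALF_MB and 200 <= f.bpp <= 1500


def same_subnet(a: str, b: str, prefix: int = SUBNET_PREFIX) -> bool:
    net = ipaddress.ip_network(f"{a}/{prefix}", strict=False)
    return ipaddress.ip_address(b) in net


def count_fg(vf: Flow, flows: list[Flow]) -> tuple[int, int]:
    """Return (original FG count, improved FG count) for one candidate."""
    original = improved = 0
    for f in flows:
        if f.dst_ip != vf.dst_ip or not fits_fg_profile(f):
            continue
        dt = f.start - vf.start  # FG start relative to V/F start, seconds
        if -4 < dt < 1:  # original window
            original += 1
        if -60 < dt < 10 and same_subnet(vf.src_ip, f.src_ip):  # improved window
            improved += 1
    return original, improved


def classify_vf(flows: list[Flow]) -> list[tuple[Flow, str]]:
    """Label each candidate 'video' if any FG flow is found, else 'file-transfer'."""
    candidates = sorted((f for f in flows if is_vf_candidate(f)),
                        key=lambda f: (f.dst_ip, f.start))
    labelled = []
    for vf in candidates:
        original, improved = count_fg(vf, flows)
        label = "video" if original + improved > 0 else "file-transfer"
        labelled.append((vf, label))
    return labelled
```

Because a candidate must exceed 0.5 MB while an FG flow must not, a candidate never counts itself as part of its own flow-group.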

Evaluation

Datasets used to test the algorithms

Accuracy measurement assessment

• Precision is the fraction of the system's positive predictions that are correct: precision = TP / (TP + FP)

• Recall is the fraction of actual positives that the system predicts correctly: recall = TP / (TP + FN)

• F-measure is the harmonic mean of precision and recall: F-measure = 2 * Precision * Recall / (Precision + Recall)

• Accuracy is the fraction of true results: accuracy = (TP + TN) / (TP + FP + FN + TN)

Compare against other algorithms: NaiveBayes and SVM (Support Vector Machine)
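The four measures can be computed from per-class confusion counts. In a multi-class setting such as audio/video/file-transfer, one common approach, assumed here rather than stated in the slides, is to treat each class in turn as "positive" (one-vs-rest). The function names and the example labels are hypothetical.

```python
def confusion(actual, predicted, positive):
    """One-vs-rest TP, FP, FN, TN counts for the class `positive`."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Return (precision, recall, F-measure, accuracy); 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_measure, accuracy


if __name__ == "__main__":
    actual = ["video", "video", "audio", "file-transfer"]
    predicted = ["video", "file-transfer", "video", "file-transfer"]
    print(metrics(*confusion(actual, predicted, "video")))
```

Note the parentheses around TP + TN in the accuracy formula; without them, operator precedence would divide only TN by the total.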

Evaluation – Datasets

                      SME-6          SME-7          SME-8
Date                  1/7/2013       1/22/2013      1/23/2013
Duration (s)          24723          28207          13628
Start time (GMT-5)    10:18:04       10:29:04       10:56:20
Flows                 249822         287616         198409
Packets               13376109       15351639       10170693
Bytes                 11158181285    13589511746    8728052938
HTTP flows            75485          87181          63951
HTTP packets          7346663        8814438        5628558
HTTP bytes            10456335955    12545720613    7982629610
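The HTTP share of each capture follows directly from the dataset table by dividing each HTTP count by the corresponding total. A small sketch; the dictionary layout and function name are illustrative, and the numbers are copied from the table.

```python
# (total, http) pairs copied from the SME-6/7/8 dataset table.
DATASETS = {
    "SME-6": {"flows": (249822, 75485),
              "packets": (13376109, 7346663),
              "bytes": (11158181285, 10456335955)},
    "SME-7": {"flows": (287616, 87181),
              "packets": (15351639, 8814438),
              "bytes": (13589511746, 12545720613)},
    "SME-8": {"flows": (198409, 63951),
              "packets": (10170693, 5628558),
              "bytes": (8728052938, 7982629610)},
}


def http_shares(name: str) -> dict[str, float]:
    """Fraction of flows, packets and bytes in one capture that are HTTP."""
    return {metric: http / total
            for metric, (total, http) in DATASETS[name].items()}


if __name__ == "__main__":
    for name in DATASETS:
        shares = http_shares(name)
        print(name, ", ".join(f"{m}: {s:.1%}" for m, s in shares.items()))
```

Shares measured by bytes and by flows can differ considerably, so it is worth stating which unit a traffic percentage refers to.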

Evaluation – Results

[Figure: F-measure of NaiveBayes, SVM and the proposed algorithm for audio, file-transfer and video traffic in SME-6, SME-7 and SME-8]

Evaluation – Results

Accuracy

                      SME-6    SME-7    SME-8
NaiveBayes            39.1%    73.5%    71.4%
SVM                   17.8%    16.3%    42.0%
Proposed Algorithm    70.5%    89.9%    90.9%

Conclusion

• The proposed algorithm uses a flow-based approach and classifies a high percentage of tunneled traffic: audio, video and file-transfer

• Proposed audio algorithm:
  • Uses a concept called occupancy to classify radio and music traffic

• Proposed video and file-transfer algorithm:
  • Uses an improved flow-group method to help increase the classification accuracy of video and file-transfer traffic

• The proposed scheme's F-measure is at least 10% higher than that of NaiveBayes and SVM

References

[1] S. Kaoprakhon and V. Visoottiviseth, "Classification of Audio and Video Traffic over HTTP Protocol," in 9th International Symposium on Communications and Information Technology (ISCIT 2009), Sept. 2009.

[2] M. Twardos, "The Information Diet," 2011. [Online]. Available: http://theinformationdiet.blogspot.ca/2011/11/probability-distribution-of-song-length.html. [Accessed 2013].

[3] K. Takeshita, T. Kurosawa, M. Tsujino and M. Iwashita, "Evaluation of HTTP Video Classification Method Using Flow Group Information," in 14th International Telecommunications Network Strategy and Planning Symposium (NETWORKS), Sept. 2010.

[4] H. Kim, K. Claffy, M. Fomenkov, D. Barman, M. Faloutsos and K. Lee, "Internet Traffic Classification Demystified: Myths, Caveats, and the Best Practices," in ACM CoNEXT, 2008.

[5] D. M. W. Powers, "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation," Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37-63, 2011.
