Traffic Clusters in Networks of Convenience
Ron McLeod, PhD (Candidate)
Director, Research and Corporate Development
Telecom Applications Research Alliance (TARA)
FloCon 2009
Who is TARA
• Private consortium of 35 member companies and research institutions all working in IT/Telecom.
• Most active investor in early stage IT companies in Atlantic Canada.
• Senior Partners include:
– Bell Aliant
– Cisco Systems Canada
– Nortel Networks
• We are actively seeking Research collaborations
The Project
TARA has partnered with a group of companies in a multi-year project to analyze the outbound and inbound traffic in Networks of Convenience.
The specific companies and specific objectives of The Project remain confidential at this point.
However, from an analysis perspective we are first interested in understanding the nature of this traffic.
Data sources are real traffic captures from hotels, airports and general hotspots from around the world.
The Project
Networks of convenience are a relatively new and rapidly growing sector of the ISP community.
These are networks that serve a transient population.
The provider is compensated either by fees charged to end users, or by the hosting organization which absorbs the cost as overhead.
The networks may be wired, typically using Ethernet, or wireless (802.11).
Relatively little is known about the ways in which these networks are used.
The Project
We believe that Networks of Convenience may be used by criminals and / or terrorists in attempts to conceal their activities, identities, or both.
Networks of convenience are the “payphones” of the twenty-first century. Users of these networks take advantage of the implicit anonymity that comes with their use.
We do not know how common other forms of malicious activity may be in these networks.
The Project
Network traffic characterization approaches in the past have relied on the availability of stable data in an environment of perfect information.
An analyst could have access to static IP and MAC address databases or DHCP lease logs that could be used to collate traffic to specific origins such as identifiable workstation/user combinations, servers or other network-attached devices.
In this environment, normal-versus-anomalous behaviour models could be used to profile network and user behaviour to detect misuse or anomalous behaviour such as masquerade attacks or worm propagation.
Data Gathering
Since the sources tend to be NAT’ed, we use network taps on the interfaces inside of the edge router. We are currently capturing inbound and outbound data separately.
Prior to analysis, full packet captures are first converted to primitive flows.
Our research is focused on flow-level analysis, but this conversion also helps to allay providers’ concerns for their customers’ privacy (i.e. we don’t look at your data, only the packet header).
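The conversion from packet captures to primitive flows can be sketched roughly as below. This is a minimal illustration, not the tooling used in The Project: the `Packet` record, the unidirectional 5-tuple flow key and the absence of any flow timeout are all assumptions made for the example. Only header fields are ever read, which is the privacy point made above.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical header-only record; no payload is kept.
    ts: float     # capture timestamp in seconds
    sip: str      # source IP
    dip: str      # destination IP
    proto: int    # IP protocol number (6 = TCP, 17 = UDP, ...)
    sport: int    # source port (0 for portless protocols)
    dport: int    # destination port (0 for portless protocols)
    length: int   # packet length in bytes


def packets_to_flows(packets):
    """Aggregate packet headers into unidirectional flows keyed by 5-tuple."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0, "start": None, "end": None})
    for p in packets:
        f = flows[(p.sip, p.dip, p.proto, p.sport, p.dport)]
        f["packets"] += 1
        f["bytes"] += p.length
        f["start"] = p.ts if f["start"] is None else min(f["start"], p.ts)
        f["end"] = p.ts if f["end"] is None else max(f["end"], p.ts)
    return dict(flows)
```

A production flow generator would also split flows on idle and active timeouts; that step is left out here to keep the sketch short.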
Observations During Conversion
100 Internal IPs Monitored for 1 month.
Of all Packets Read:
• Not IPV4: 1.7%
• Fragmented: 0.06%
• Too Short: 0.0%
• Incomplete (No Ports and/or Flags): 0.0%
Overall, traffic is characterised by its non-uniformity.
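One way the rejection categories above might be assigned during conversion is sketched here. The category names and the exact rules are assumptions for illustration (the slides do not define them); the function expects the bytes of a packet starting at the IP header and relies only on the standard IPv4 header layout.

```python
from collections import Counter


def classify_ip_packet(data: bytes) -> str:
    """Return a conversion category for one packet (bytes start at the IP header)."""
    if not data:
        return "too_short"
    if data[0] >> 4 != 4:                      # version nibble
        return "not_ipv4"
    ihl = (data[0] & 0x0F) * 4                 # header length in bytes
    if ihl < 20 or len(data) < ihl:
        return "too_short"
    flags_frag = int.from_bytes(data[6:8], "big")
    if flags_frag & 0x2000 or flags_frag & 0x1FFF:   # MF bit set or non-zero offset
        return "fragmented"
    proto, l4 = data[9], data[ihl:]
    # TCP needs 14 bytes to reach the flags field; UDP needs 4 bytes for both ports.
    if (proto == 6 and len(l4) < 14) or (proto == 17 and len(l4) < 4):
        return "incomplete"
    return "ok"


def conversion_stats(packets):
    """Percentage of packets falling in each category."""
    counts = Counter(classify_ip_packet(p) for p in packets)
    total = sum(counts.values())
    return {k: 100.0 * n / total for k, n in counts.items()} if total else {}
```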
[Figure: Flows by Protocol. Bar chart, log scale from 1 to 10,000,000 flows, for IP protocols 1, 2, 6, 17, 41 and 47.]
Protocol Flows were a Little Unusual
• TCP = 65%, UDP = 34%
• Multicast Host Management at 0.18%
• VPNs smaller than I expected at 0.09%
• IPv6 Encapsulation at 0.00003%
Outbound Bytes by Host Show Large Variations
[Figure: Outbound Bytes by Internal Host. Log scale, 1 to 10,000,000,000 bytes.]
[Figure: Outbound Bytes by Internal Host. Linear scale, 0 to 2,000,000,000 bytes.]
Obvious in a Linear Scale
Let's take a closer look at this guy
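On the slide the outlier was picked out by eye from the linear-scale chart. A rough programmatic equivalent might look like the following; the input shape (pairs of internal IP and byte count) and the "100 times the median" threshold are assumptions chosen for the example, not values from The Project.

```python
from collections import defaultdict
from statistics import median


def bytes_by_host(flows):
    """Sum outbound bytes per internal host; flows is an iterable of (sip, bytes)."""
    totals = defaultdict(int)
    for sip, nbytes in flows:
        totals[sip] += nbytes
    return dict(totals)


def heavy_hosts(totals, factor=100):
    """Hosts whose total exceeds `factor` times the median host total."""
    if not totals:
        return []
    threshold = factor * median(totals.values())
    return sorted(h for h, b in totals.items() if b > threshold)
```

A median-based threshold is used because a single very large host would drag a mean-based threshold upward and could hide itself.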
[Figure: Flows and Bytes to DIPs for Suspicious Host. Series: Flows by DIP, Bytes by DIP. Log scale, 1 to 10,000,000,000.]
[Figure: Flows and Bytes by Dport to DIPs for Suspicious Host. Series: Flows by Dport, Bytes by Dport. Log scale, 1 to 10,000,000,000. Annotation: VRML Multi User 4204.]
We expected DPorts 80 and 443 to represent most traffic….
Together they accounted for 41%
Note that the DPort 0 data point is ICMP Traffic
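The share of flows going to a set of destination ports can be computed as in this sketch. The input shape (pairs of protocol number and destination port) is an assumption, and mapping every non-TCP/UDP flow to DPort 0 is a simplification that mirrors the note about ICMP above.

```python
def dport_share(flows, ports=(80, 443)):
    """Percentage of flows whose destination port is in `ports`.

    flows is an iterable of (protocol, dport) pairs.
    """
    total = hits = 0
    for proto, dport in flows:
        if proto not in (6, 17):
            dport = 0          # portless protocols such as ICMP are counted at DPort 0
        total += 1
        hits += dport in ports
    return 100.0 * hits / total if total else 0.0
```

For example, with made-up data in which 41 of 100 flows go to ports 80 or 443, `dport_share` returns 41.0.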
[Figures: two Flows by Dport charts, log-scale flow counts plotted against Dport.]
4204 lists as VRML Multi-User, almost all from 1 host
One host, only 25 flows on BitTorrent
Only minute traces of Half-Life gaming
[Figure: Sport Flows. Log-scale flow counts plotted against source port.]
Mac Skype 36459 SSH 10% of all flows
13991 and 44849 52523
Fasttrack 50 hosts 1700 flows No 6667 listening?
Number of Destinations by Host Shows Substantial Spikes
[Figure: Total Destination IPs by Host. Log scale, 1 to 100,000 destination IPs; x-axis ticks run from 1 to 3041.]
Suspicious Host accessed sequential ranges through multiple /16s
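A sketch of how "sequential ranges through multiple /16s" might be quantified follows. The input shape (pairs of source and destination IP strings) and the sequentiality measure (the fraction of adjacent sorted addresses that differ by exactly 1) are assumptions for illustration, not the method used on the slide.

```python
import ipaddress
from collections import defaultdict


def destinations_by_host(flows):
    """Map each source IP to its set of distinct destination IPs."""
    dests = defaultdict(set)
    for sip, dip in flows:
        dests[sip].add(dip)
    return dict(dests)


def slash16_profile(dips):
    """Per /16: (distinct addresses, fraction of adjacent sorted addresses differing by 1)."""
    groups = defaultdict(list)
    for dip in set(dips):
        addr = int(ipaddress.IPv4Address(dip))
        groups[addr >> 16].append(addr)
    profile = {}
    for prefix, addrs in groups.items():
        addrs.sort()
        steps = [b - a for a, b in zip(addrs, addrs[1:])]
        sequential = sum(1 for s in steps if s == 1) / len(steps) if steps else 0.0
        net = str(ipaddress.IPv4Address(prefix << 16)) + "/16"
        profile[net] = (len(addrs), sequential)
    return profile
```

A host walking through address space would show many /16 entries, each with a high address count and a sequential fraction near 1.0.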
Linear Scale
Let's look a little closer at his activity
[Figure: Total Destination IPs by Host. Linear scale, 0 to 14,000 destination IPs; x-axis ticks run from 1 to 3025.]
36459 Dominates the Sports…
[Figure: Flows by Sport. Log scale, 1 to 100,000 flows, plotted against source port.]
Removing 36459 in a linear scale we get
Sport use is near sequential above 49153
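The "near sequential" observation could be checked with something like the following, which finds the longest run of consecutive source port numbers that were all observed at or above a floor. The floor of 49153 is taken from the slide; the function name and the input shape (a plain list of source ports, one per flow) are assumptions.

```python
def longest_consecutive_run(ports, floor=49153):
    """Length of the longest run of consecutive observed port numbers >= floor."""
    seen = sorted({p for p in ports if p >= floor})
    best = run = 0
    prev = None
    for p in seen:
        # Extend the run if this port directly follows the previous one.
        run = run + 1 if prev is not None and p == prev + 1 else 1
        best = max(best, run)
        prev = p
    return best
```

Ports below the floor, such as the dominant 36459, are ignored by construction, so they do not need to be removed first.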
[Figure: SPort Flows with 36459 removed. Linear scale, 0 to 2,500 flows, plotted against source port.]
His DPort values in Log Scale
Average is less than 10 flows (packets) per Dport
Not quite sequential but most ports above 1024 are accessed.
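Both observations above (few flows per port, most ports above 1024 touched) can be expressed as two numbers, as in this sketch. The input is assumed to be one destination port per flow, and "above 1024" is read as the range 1025 to 65535.

```python
from collections import Counter


def dport_summary(dports, low=1025, high=65535):
    """Return (mean flows per distinct Dport, fraction of ports in [low, high] seen)."""
    counts = Counter(dports)
    mean_flows = sum(counts.values()) / len(counts) if counts else 0.0
    seen_in_range = sum(1 for p in counts if low <= p <= high)
    coverage = seen_in_range / (high - low + 1)
    return mean_flows, coverage
```

A low mean combined with coverage close to 1.0 would match the pattern described on the slide.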
[Figure: Flows by Dport. Log scale, 1 to 100,000 flows, plotted against destination port.]
Linear version Dports with Port 80 removed
[Figure: Flows by Dport with 80 removed. Linear scale, 0 to 2,500 flows, plotted against destination port.]
  count  pro  dPort  flags  packets  bytes
  21005    6     80  A            1     40
   5502    6     80  A            1     52
   1373    6    443  A            1     40
   1280    6     80  RA           1     40
   1261   17   1900               1     61
   1173    6     80  S            1     52
    914    6     80  S            1     48
    866    6  65209  R            1     40
    673    6     80  FA           1     40
    430    6     80  A            1     60
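A count table of this kind can be produced by grouping flows on their (pro, dPort, flags, packets, bytes) signature and listing the most common signatures. The sketch below assumes flows arrive as tuples in that field order; the function names are hypothetical.

```python
from collections import Counter


def top_signatures(flows, n=10):
    """Most common (pro, dport, flags, packets, bytes) signatures with their counts."""
    return Counter(flows).most_common(n)


def format_table(rows):
    """Render the output of top_signatures as aligned plain text."""
    lines = ["%7s %4s %6s %6s %8s %6s" % ("count", "pro", "dPort", "flags", "packets", "bytes")]
    for (pro, dport, flags, packets, nbytes), count in rows:
        lines.append("%7d %4d %6d %6s %8d %6d" % (count, pro, dport, flags, packets, nbytes))
    return "\n".join(lines)
```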
[Figure: Protocol Flows. Log scale, 1 to 100,000 flows, for IP protocols 1, 2, 6 and 17.]
Protocol Distribution for Suspicious Host
65% TCP, 34% UDP
Would expect these to be more equal for a peer and vastly skewed for a scanner.
Multicast Host Management: 0.18%
ICMP ratio (0.08%) is double the aggregate value (0.04%)
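The comparison between one host's protocol mix and the aggregate can be sketched as follows. Inputs are assumed to be lists of IP protocol numbers, one per flow; protocol 1 is ICMP. Any data fed to it here would be illustrative, not taken from the capture.

```python
def protocol_percent(protocols, proto):
    """Percentage of flows that use IP protocol number `proto`."""
    protocols = list(protocols)
    return 100.0 * protocols.count(proto) / len(protocols) if protocols else 0.0


def ratio_vs_aggregate(host_protocols, all_protocols, proto=1):
    """How many times larger the host's share of `proto` is than the aggregate share.

    Returns None when the aggregate share is zero.
    """
    overall = protocol_percent(all_protocols, proto)
    if overall == 0:
        return None
    return protocol_percent(host_protocols, proto) / overall
```

With a host at 0.08% ICMP and an aggregate at 0.04%, `ratio_vs_aggregate` gives a factor of about 2, matching the "double" noted above.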
• Massive Destination IPs (sequential /16s)
• Massive Source Ports (near sequential)
• Massive Destination Ports (near sequential)
• Multicast Host Management Protocol
• Larger than expected ICMP Ratio
• Standard TCP/UDP Ratio
WHO AM I ?
Tradition Demands that I ask this Question
However, unlike previous years…..
I HAVE NO IDEA…….
Summary
Some obvious challenges
- How to tell when a host changes?
- Will test the user-host profiler presented at FloCon 2006.
- Until this is nailed down, assumptions are more like “let’s pretend”.
Some intriguing opportunities
- Oops, I’m not allowed to talk about those yet.
Thank You
I am seeking help and would welcome any
private feedback, discussions or ideas you
might have.
If you had access to this data – what would
you do?