Internet Observation with ISDAS: How long does a worm
perform scanning?
Tomohiro Kobori, Hiroaki Kikuchi (Tokai Univ., Japan)
Masato Terada (Hitachi, Ltd., Japan)
Background
• The Witty worm infected 12,000 hosts in 75 minutes. [6]
• The Nimda worm infected 150,000 hosts in 6 hours. http://www2.nsknet.or.jp/~azuma/menu.htm
• Port scans are performed every 18 minutes on average. [1]
Can you guess how long worms infect hosts?
Port Scans and Infection Period
[Figure: port scans plotted as sensor ID (0 to 12) against time [day], 11/01/04 to 09/01/05]
[Figure: sensor ID against time [day], showing round 1, a silent period, and round 2; dates marked: 12/01/04, 02/01/05, 07/01/05, 09/01/05]
Our Objective
• To identify how long a host has been infected by worms.
[Figure: sensor ID (0 to 12) against time [day]]
Difficulties
1. Uncertainty in identifying boundaries between rounds.
– Scanning behavior varies by worm
– Segmentation is subjective
2. Too many packets for a human specialist to deal with.
[Figure: sensor ID (0 to 12) against time [day]; dates marked: 2004/10/24, 2004/12/3, 2004/12/14, 2004/1/14, 2004/2/14, 2004/12/30, 2005/3/2, 2005/8/18, 2005/4/27, 2005/9/24]
Segmentation: Pros and Cons
Method                                     | 1. Uncertainty in boundaries between rounds | 2. Too many packets
(1) Random sampling & manual segmentation  | good                                        | poor
(2) Constant threshold                     | poor (no common threshold for all hosts)    | good
(3) Adaptive threshold                     | good                                        | good
Fundamental Definition
• Number of rounds: r = 2
• Total count: c = 7
• Unique sensors (visit): k = 3
• Infection duration per round: d1 = 9, d2 = 7 [days/round]
[Figure: counts c observed at sensors s1, s2, and s3 over time, grouped into round 1 (infection period d1, 9 days) and round 2 (infection period d2, 7 days) separated by a silent period t; visit k counts the sensors reached; dates marked: 1st, 9th, 20th, 27th]
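The definitions above can be illustrated with a minimal sketch. The observation dates, the sensor names, and the function name `round_metrics` are made up for illustration, and the duration of a round is assumed here to be the difference between its last and first observation day; the slides do not state the exact counting convention.

```python
from datetime import date

def round_metrics(rounds):
    """Compute (r, c, k, d) for one source address.

    rounds: list of rounds, each a list of (date, sensor_id) observations.
    Assumption: duration = last observation day minus first observation day.
    """
    r = len(rounds)                                    # number of rounds
    c = sum(len(obs) for obs in rounds)                # total count of packets
    k = len({s for obs in rounds for _, s in obs})     # unique sensors visited
    d = [(max(t for t, _ in obs) - min(t for t, _ in obs)).days
         for obs in rounds]                            # duration per round
    return r, c, k, d

# Hypothetical observations for one source address.
example = [
    [(date(2005, 1, 1), "s1"), (date(2005, 1, 3), "s2"),
     (date(2005, 1, 6), "s1"), (date(2005, 1, 10), "s3")],
    [(date(2005, 1, 20), "s2"), (date(2005, 1, 24), "s3"),
     (date(2005, 1, 27), "s1")],
]
print(round_metrics(example))
```

With these made-up dates the sketch reproduces the values on the slide: r = 2, c = 7, k = 3, and durations of 9 and 7 days.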
1. Random sampling
• Sample data
– The averages of d and r were evaluated.
• Steps
1. 100 source addresses were chosen by random sampling from the subset K6, in which the number of visits is k = 6.
2. The sample data was analyzed manually.
[Figure: flow from the whole data to the subset K6, then random sampling, then analysis by a human operator]
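The sampling step could be sketched as follows. The input layout (a dict from source address to the set of sensors that saw it), the function name `sample_sources`, and the synthetic addresses are assumptions for illustration only; the slides do not describe how the data was stored.

```python
import random

def sample_sources(visits_by_source, k=6, n=100, seed=0):
    """Pick n source addresses at random from the subset with exactly k visits.

    visits_by_source: dict mapping source address -> set of sensor IDs
    (hypothetical layout). A fixed seed keeps the sample reproducible.
    """
    subset = sorted(src for src, sensors in visits_by_source.items()
                    if len(sensors) == k)
    rng = random.Random(seed)
    return rng.sample(subset, min(n, len(subset)))

# Synthetic data: every second address was seen by exactly 6 sensors.
visits = {
    f"10.0.{i // 256}.{i % 256}": set(range(6)) if i % 2 == 0 else set(range(3))
    for i in range(300)
}
chosen = sample_sources(visits)
print(len(chosen))
```

The chosen addresses would then be handed to a human operator for manual segmentation, which is the part that does not scale.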
Relationship Between k and c
[Figure: unique IP addresses (axis 0 to 100,000) and counts (axis 30 to 120) against number of visits k (0 to 12); annotations: "A lot of malicious hosts." and "A lot of counts"]
1. Statistics for K6 (100 samples)
                   | rounds r [round/host] | count c [packets/round] | visit k [sensors/round] | duration d [days/round]
average            | 1.49                  | 8.72                    | 4.36                    | 24.6
standard deviation | 0.81                  | 11.57                   | 1.99                    | 40.8
2. Constant Threshold Segmentation
• Partition the activity period by a common threshold T
[Figure: timeline of host A; each silent period t is compared with T, giving rounds d1 and d2]
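A minimal sketch of constant threshold segmentation, under the assumption that a new round starts whenever the silent period between two consecutive packets exceeds T. The function names and the observation times are hypothetical.

```python
def segment_rounds(times, threshold):
    """Split observation times (in days) into rounds.

    Assumption: a gap strictly greater than `threshold` closes the current
    round and opens a new one.
    """
    rounds = []
    for t in sorted(times):
        if rounds and t - rounds[-1][-1] <= threshold:
            rounds[-1].append(t)      # still inside the current round
        else:
            rounds.append([t])        # silent period exceeded T: new round
    return rounds

def durations(rounds):
    """Duration of each round: last minus first observation time."""
    return [r[-1] - r[0] for r in rounds]

# Hypothetical arrival days for one source address.
times = [0, 2, 5, 9, 60, 64, 67]
rounds = segment_rounds(times, threshold=30)
print(len(rounds), durations(rounds))
```

With T = 30 the 51-day gap separates two rounds; with a larger T, such as 60, the same host collapses into a single round, which is why the choice of T matters.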
2. Infected Duration ( |K6| = 1,586 )
[Figure: average duration per round d [day] (axis 0 to 90) and number of rounds (axis 0 to 7) against threshold t [day] (0 to 400); at T = 30, μr = 1.67 and μd = 18.6]
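3. Drawback of Constant Threshold Segmentation
• No common threshold T: the same T gives a good segmentation for one host but fails for another.
[Figure: timelines of hosts A and B segmented into rounds (labels d1 to d7) with the same threshold T; one segmentation is marked "good" and the other "fails"]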
3. Adaptive Threshold Segmentation
• T* depends on the malicious host (source address)
[Figure: timelines of hosts A and B segmented into rounds (labels d1 to d5) with per-host thresholds TA* and TB*; annotations: "TA: too short", "TB: too large"]
Poisson Distribution
• Examples include
– The number of cars passing through an intersection
– The number of e-mails received in a day
• The probability that k packets arrive when λ packets arrive on average:
$P(N = k) = \frac{e^{-\lambda} \lambda^{k}}{k!}$
• Average arrival rate:
$\lambda = \frac{c}{d_0}$
c: total number of counts per year
d0: duration of one year
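The two formulas above can be evaluated with a short sketch. It assumes d0 is expressed in days (365), so that λ is a rate in packets per day; the function names are illustrative and not taken from the slides.

```python
import math

def arrival_rate(c, d0=365.0):
    """Average arrival rate lambda = c / d0 (assumption: d0 in days)."""
    return c / d0

def poisson_pmf(k, lam):
    """P(N = k) = exp(-lambda) * lambda**k / k! for a Poisson variable N."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical host that sent 730 packets in one year: 2 packets per day.
lam = arrival_rate(730)
for k in range(4):
    print(k, poisson_pmf(k, lam))
```

For a host with a low arrival rate, P(N = 0) is close to 1 on any given day, so a quiet day alone says little about whether the infection has ended.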
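3. Distribution of Inter-arrival Time
• Probability that the inter-arrival time t exceeds T (λ: average arrival rate): $P(t > T) = e^{-\lambda T}$
• Adaptive threshold, chosen so that this probability is 1%: $T^{*} = -\frac{\ln(0.01)}{\lambda}$
[Figure: # of packets (0 to 200) against inter-arrival time [min] (0 to 500), comparing the actual distribution with an approximation; marked values: 32%, 120, 187, 1%]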
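A sketch of how the adaptive threshold could be computed per host, assuming exponentially distributed inter-arrival times with rate λ = c / d0 and d0 = 365 days. The function name and the two example hosts are hypothetical.

```python
import math

def adaptive_threshold(c, d0=365.0, p=0.01):
    """Return T* such that P(t > T*) = exp(-lambda * T*) = p.

    c: packets observed from this host in the period d0 (assumed in days).
    """
    lam = c / d0                  # average arrival rate [packets/day]
    return -math.log(p) / lam     # T* = -ln(p) / lambda

# Two hypothetical hosts with different scanning intensity.
t_fast = adaptive_threshold(60)   # frequent scanner
t_slow = adaptive_threshold(10)   # infrequent scanner
print(round(t_fast, 1), round(t_slow, 1))
```

The frequent scanner gets a threshold of about 28 days and the infrequent one about 168 days, so a silent period that ends a round for the first host is still within normal variation for the second.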
3. Distribution of Packet Arrival Rate
[Figure: frequency (0 to 1000) against packet arrival rate [packet/day] (0 to 10); series: group1, group2]
Distribution of Rounds
[Figure: frequency (0 to 600) against duration d [day] (0 to 400), comparing the common threshold (T = 30) with the adaptive threshold (T*); marked value: 80]
Distribution of Source Address
[Figure: frequency (0 to 1000) against # of rounds per address (1 to 9); series: T*, T = 30]
Summary
Method                                   | round r (μr) | infection period d (μd) | count c (μc) | visit k (μk)
(1) Sampling and manual segmentation     | 1.49         | 24.6                    | 8.72         | 4.36
(2) Constant threshold                   | 1.67         | 18.2                    | 9.15         | 3.13
(3) Adaptive threshold                   | 1.57         | 32.3                    | 9.75         | 4.32
Conclusions
• We have proposed a new segmentation algorithm that uses an adaptive threshold based on the Poisson distribution.
• Our experiment shows:
– The average duration of an infection is 32 days a year
– The average host is infected 1.5 times a year
How long do you usually have a cold?
(3) Problems to be settled with the adaptive threshold
1. Uncertainty in identifying boundaries between rounds.
– Scanning behavior varies by worm
– Subjective differences
2. Too many packets
(1) Distribution of IP addresses (K6)
[Figure: frequency (log scale, 1 to 1000) against IP address /8 (0 to 200)]
(1) Port usage frequency (K6)
[Figure: total unique hosts (0 to 400) against port number (0 to 5000); labeled peaks: port 1433, port 4899, port 137]
(2) Difference of average observation period by port (K6: 1,586)
[Figure: frequency (0 to 200) against duration d [day] (0 to 160); series: ports 4899, 1433, 1434, 135, 445; annotation: T* = 30]
(3) Distribution of scan c
[Figure: unique IP addresses (0 to 800) against packets per year c (0 to 60); marked value: 20]
(4) Distribution by fitting
[Figure: estimated value of unique host addresses (0 to 2,000,000) against duration for fitting [day] (S030 to S120); series: n lower bound, n upper bound, n average; annotations: infection period 43 days, infection period 63 days, infection period 86 days]