Towards Modeling Legitimate and Unsolicited Email Traffic Using Social Network Properties
Farnaz Moradi, Tomas Olovsson, Philippas Tsigas
Legitimate and Unsolicited Email Traffic
The battle between spammers and anti-spam strategies is not over yet.
[Figure: number of emails per day over 14 days (y-axis up to 4 × 10^6), all email vs. legitimate email]
Legitimate and Unsolicited Email Communications
• Human-generated communications create implicit social networks
• Spam is sent automatically
  – It is expected that it does not exhibit the social network properties of human-generated communications
• Spam can be identified based on how it is sent
  – It is expected that this behavior is more difficult for the spammers to change than the content of the email
Outline
• Email Dataset
• Email Networks
• Social Network Properties
• Implications
• Conclusions
Email Dataset
• SMTP packets were collected (port 25)
• Packets were aggregated into TCP flows
• Emails were reconstructed from flows
• Emails were classified into Accepted and Rejected by receiving mail servers
• Accepted emails classified into Ham and Spam using a well-trained SpamAssassin
• Automatic anonymization of email addresses extracted from SMTP headers and removal of packet content
[Figure: collection point on the OptoSUNET core network: SUNET customers, access routers, 2 core routers, 10 Gb/s (x2) and 40 Gb/s links, NORDUnet, main Internet]
[Figure: data reduction: 797 M packets, 46.8 M flows, 20 M emails, of which 16.6 M rejected and 3.4 M accepted; the accepted emails split into spam and ham (1.5 M and 1.9 M)]
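The slides do not say how the addresses were anonymized. One common approach, shown here only as a minimal sketch, is a keyed hash: the same address always maps to the same pseudonym, so the network structure survives, while the original address cannot be read back without the key. The function name `pseudonymize`, the key value, and the 16-character truncation are all assumptions made for illustration.

```python
import hashlib
import hmac


def pseudonymize(address: str, key: bytes) -> str:
    """Map an email address to a stable pseudonym with HMAC-SHA256."""
    # Normalize so that case and surrounding whitespace do not create
    # two different pseudonyms for the same mailbox.
    normalized = address.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return digest[:16]  # truncation length is an arbitrary choice


# Hypothetical key; in practice it would be kept secret by the collector.
key = b"hypothetical-secret-key"
alice_1 = pseudonymize("Alice@example.org", key)
alice_2 = pseudonymize("alice@example.org ", key)
bob = pseudonymize("bob@example.org", key)
```

Because the mapping is deterministic, every email sent by one address still lands on one node of the email network after anonymization.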
Email Networks
• Implicit social networks:
  – Nodes (V): Email addresses
  – Edges (E): Transmitted emails
• Dataset A:
  – |V| = 10,544,647
  – |E| = 21,562,306
• Dataset B:
  – |V| = 4,525,687
  – |E| = 8,709,216
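The node and edge definitions above can be sketched as a small graph builder. This is an illustration under assumptions, not the authors' code: the input is taken to be a list of (sender, receiver) pairs, and repeated emails between the same pair are collapsed into one directed edge, which the slides do not specify. The name `build_email_network` is hypothetical.

```python
from collections import defaultdict


def build_email_network(emails):
    """Build a directed email network from (sender, receiver) pairs.

    Returns (nodes, edges, out_neighbors, in_neighbors).
    Assumption: repeated emails between the same pair count as one edge.
    """
    out_neighbors = defaultdict(set)
    in_neighbors = defaultdict(set)
    nodes = set()
    for sender, receiver in emails:
        nodes.update((sender, receiver))
        out_neighbors[sender].add(receiver)
        in_neighbors[receiver].add(sender)
    edges = {(s, r) for s, receivers in out_neighbors.items() for r in receivers}
    return nodes, edges, out_neighbors, in_neighbors


# Toy input: four emails among three (already anonymized) addresses.
emails = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]
nodes, edges, out_nb, in_nb = build_email_network(emails)
out_degree = {n: len(out_nb[n]) for n in nodes}
in_degree = {n: len(in_nb[n]) for n in nodes}
```

The out-degree of a node is the number of distinct addresses it sent to, and the in-degree is the number of distinct addresses it received from.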
Structural and Temporal Properties of Email Networks
• Do email networks exhibit similar structural and temporal properties to other social networks?
  – Scale free (power law degree distribution)
  – Small world (short path length and high clustering)
  – Connected components (giant core)
Scale-Free Networks
• Power law degree distribution
[Figure: in-degree and out-degree distributions (frequency vs. degree, log-log axes) for the Ham, Spam, Rejected, and Complete networks, Dataset A]
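A degree distribution like the ones plotted on these slides can be computed from a list of node degrees, and a power-law exponent can be estimated from it. The sketch below uses a standard approximate maximum-likelihood estimator for discrete data, α ≈ 1 + n / Σ ln(k_i / (k_min − 1/2)); the slides do not state which fitting method was used, so this choice, the function names, and the default `k_min=1` are assumptions.

```python
import math
from collections import Counter


def degree_frequency(degrees):
    """Return {degree: fraction of nodes with that degree}, ignoring degree 0."""
    counts = Counter(d for d in degrees if d > 0)
    total = sum(counts.values())
    return {k: c / total for k, c in sorted(counts.items())}


def power_law_exponent(degrees, k_min=1):
    """Approximate MLE of alpha for P(k) ~ k^(-alpha), using degrees >= k_min."""
    tail = [d for d in degrees if d >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(math.log(d / (k_min - 0.5)) for d in tail)
    return 1.0 + len(tail) / log_sum


# Toy degree sequence with a heavy tail: many low-degree nodes, few hubs.
degrees = [1] * 8 + [2] * 4 + [4] * 2 + [8]
freq = degree_frequency(degrees)
alpha = power_law_exponent(degrees)
```

Plotting `freq` on log-log axes gives the kind of curve shown in the figures; an approximately straight line is the usual visual sign of a power law.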
Scale-Free Networks
• Power law degree distribution
[Figure: in-degree and out-degree distributions (frequency vs. degree, log-log axes) for the Ham, Spam, Rejected, and Complete networks, Dataset B]
Small-World Networks
• Small average shortest path length
• High average clustering coefficient
[Figure: average clustering coefficient over 14 days, ham vs. spam, Datasets A and B]
[Figure: average shortest path length over 14 days, ham vs. spam, Datasets A and B]
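The two small-world measures can be sketched on an undirected adjacency map. This is an illustration, not the authors' implementation: it assumes edge direction is ignored, that every node appears as a key in `adj`, that nodes with fewer than two neighbors contribute a clustering coefficient of 0, and that path lengths are averaged over reachable ordered pairs only. Running a breadth-first search from every node is only feasible on small graphs; networks with millions of nodes would need sampling.

```python
from collections import deque
from itertools import combinations


def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # counts as 0 in the average
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_shortest_path_length(adj):
    """Mean hop distance over all reachable ordered node pairs (BFS per node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy graph: a triangle a-b-c with a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
clustering = average_clustering(adj)          # (1 + 1 + 1/3 + 0) / 4
path_length = average_shortest_path_length(adj)  # 16 hops over 12 pairs
```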
Connected Components
• Giant connected component
• Power law component size distribution
[Figure: connected-component size distribution (frequency vs. CC size, log-log axes), ham vs. spam, Datasets A and B]
[Figure: relative GCC size over 14 days, ham vs. spam, Datasets A and B]
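Component sizes and the relative size of the giant connected component (GCC) can be computed with a union-find structure. The sketch below assumes that edge direction is ignored (weakly connected components) and that the relative GCC size is the fraction of nodes in the largest component; the slides do not spell out either definition, and the function names are hypothetical.

```python
from collections import Counter


def component_sizes(nodes, edges):
    """Return connected-component sizes, largest first (direction ignored)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
    return sorted(Counter(find(n) for n in nodes).values(), reverse=True)


def relative_gcc_size(nodes, edges):
    """Fraction of all nodes that belong to the largest component."""
    sizes = component_sizes(nodes, edges)
    return sizes[0] / sum(sizes)


# Toy network: one chain of four nodes and one isolated pair.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]
sizes = component_sizes(nodes, edges)
gcc_fraction = relative_gcc_size(nodes, edges)
```

Counting how often each value occurs in `sizes` gives the component size distribution shown in the figures.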
Implications
• Spam does not exhibit the social network properties of human-generated communications
• The unsolicited email traffic causes anomalies in the structural properties of email networks
• These anomalies can be identified by using an outlier detection mechanism
[Figure: out-degree distributions (frequency vs. out-degree, log-log axes) of the Complete network, with outliers marked]
Identifying Spamming Nodes
[Figure: out-degree distributions (frequency vs. out-degree, log-log axes) with outliers marked, Dataset A, 1-day and 7-day networks]

Dataset   Network   Total spam   Spam sent by outliers (1<k<100)
A         1 day     68%          95.5%
A         7 days    70%          96.8%
A         14 days   70%          96.9%
B         1 day     40%          82.7%
B         7 days    35%          81.3%
B         14 days   39%          87.3%
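The slides do not describe the outlier detection mechanism itself. The sketch below is one simple stand-in, not the authors' method: it fits a least-squares line to the out-degree distribution in log-log space and flags degrees whose observed frequency exceeds the fitted value by a chosen factor. The factor of 3, the function names, and the reading of "1<k<100" as a restriction on the out-degree k are all assumptions. At least two distinct out-degrees are needed for the fit.

```python
import math
from collections import Counter


def fit_loglog_line(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def outlier_degrees(out_degree, factor=3.0, k_low=1, k_high=100):
    """Degrees k (k_low < k < k_high) that occur far more often than the fit predicts."""
    counts = Counter(k for k in out_degree.values() if k > 0)
    total = sum(counts.values())
    points = [(math.log10(k), math.log10(c / total)) for k, c in counts.items()]
    slope, intercept = fit_loglog_line(points)
    flagged = []
    for k, c in counts.items():
        expected = 10 ** (slope * math.log10(k) + intercept)
        if k_low < k < k_high and c / total > factor * expected:
            flagged.append(k)
    return sorted(flagged)


def suspicious_senders(out_degree, flagged):
    """Nodes whose out-degree is one of the flagged degrees."""
    flagged = set(flagged)
    return {node for node, k in out_degree.items() if k in flagged}


# Synthetic data: out-degrees following roughly k^-2, plus 500 hypothetical
# automated senders that all contact exactly 10 recipients.
out_degree = {}
for k in range(1, 21):
    for i in range(round(10000 / k**2)):
        out_degree[f"user{k}_{i}"] = k
for i in range(500):
    out_degree[f"bot{i}"] = 10

flagged = outlier_degrees(out_degree)
senders = suspicious_senders(out_degree, flagged)
```

In this synthetic example the flagged set also contains the legitimate users that happen to have the same out-degree as the automated senders, which is consistent with the table above reporting the share of spam sent by outliers rather than a perfect separation.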
Conclusions
• A network of legitimate email traffic can be modeled similarly to other social networks
  – Small-world, scale-free network
• A network of unsolicited traffic differs from social networks
  – Spammers do not emulate a social network
• This unsocial behavior of spam is not hidden in the mixture of email traffic
  – Spammers can be identified without inspecting the content of the emails
Thank You!