Author
others
View
0
Download
0
Embed Size (px)
UNIVERSITI PUTRA MALAYSIA
DEVELOPMENT OF A METHOD FOR FRAUD SEVERITY MEASUREMENT BASED ON USAGE PROFILING
MOHD SHAFRI BIN KAMARUDDIN
FSKTM 2006 14
DEVELOPMENT OF A METHOD FOR FRAUD SEVERITY MEASUREMENT BASED ON USAGE PROFILING
By
MOHD SHAFRI BIN KAMARUDDIN
Thesis Submitted to the School of Graduate Studies, Universiti Putra Malaysia, in Fulfilment of the Requirement for the Degree of Master of Science
July 2006
ii
DEDICATION
Dedicated to my mother; Roosnia Samsodin,
my wife; Rashidah Abidin,
my daughters; Aina Shahidah, Aina Shahirah, Aina Shafiqah
and my family
iii
ABSTRACT Abstract of thesis presented to the Senate of Universiti Putra Malaysia in
fulfilment of the requirement for the degree of Master of Science
DEVELOPMENT OF A METHOD FOR FRAUD SEVERITY MEASUREMENT BASED ON USAGE PROFILING
By
MOHD SHAFRI BIN KAMARUDDIN
July 2006
Chairman : Associate Professor Ramlan Mahmod, PhD
Faculty : Computer Science and Information Technology
The nature of fraud has changed from cloning fraud to subscription fraud, which makes
specialized detection methodologies inadequate. Instead, the focus is on the detection
methodologies that based on the subscriber’s calling activity or calling pattern, which
can be roughly divided into two main categories: absolute analysis and differential
analysis.
Absolute analysis is capable at detecting the extremes of fraudulent activity. However,
absolute analysis cannot trap all types of fraud especially usage behavior fraud related.
An alternative approach to this problem is to perform a differential analysis against
subscriber’s behavioral patterns. Certain behavioral patterns may be considered
anomalous or abnormal for certain subscriber and potentially indicative of fraud but
would be considered acceptable for another. In order to overcome the uncertainty in
iv
behavioral patterns, in this research, we propose to conduct the usage profiling at
individual subscriber level. Usage profiling is a process of generating calling statistic
based on predefined categories, which involve some form of aggregation from
subscriber’s calling activity or CDR.
Usage profiling process will generate two forms of usage profile : usage profile history
(UPH) and current usage profile (CUP). In fraud detection system, comparison of these
two types of usage profile will generate a measure known as fraud severity
measurement. Implementation of the Hellinger distance for measuring a fraud severity,
lack of detection accuracy as this method does not properly define the measurement
scale as the Hellinger distance method will generate variation of values for fraud severity
measurement. Therefore, it is very difficult to define the actual severity level of detected
fraud.
In this research, we propose a new method for measuring fraud severity. The advantages
of the method are detection accuracy and detection speed. With the new method, the
severity measurement scale is properly defined and the detection speed is faster than the
Hellinger distance.
v
ABSTRAK Abstrak tesis yang dikemukakan kepada Senat Universiti Putra Malaysia
sebagai memenuhi keperluan untuk ijazah Master Sains
PEMBANGUNAN KAEDAH BAGI PENGUKURAN TAHAP KEPARAHAN FRAUD MENGGUNAKAN USAGE PROFILING
Oleh
MOHD SHAFRI BIN KAMARUDDIN
Julai 2006
Pengerusi : Profesor Madya Ramlan Mahmod, PhD
Fakulti : Sains Komputer dan Teknologi Maklumat
Keadaan fraud telah berubah dari fraud pengklonan kepada fraud langganan, yang
menyebabkan kaedah pengesanan yang terhad tidak lagi sesuai digunakan. Malahan,
fokus kaedah pengesanan adalah tertumpu kepada kaedah pengesanan berdasarkan
kepada aktiviti panggilan atau ragam panggilan pelanggan yang boleh dibahagikan
kepada dua kaedah utama: “absolute analysis” atau “differential analysis”.
“Absolute analysis” berupaya untuk mengesan aktiviti fraud yang keterlaluan.
Bagaimanapun, “absolute analysis” tidak berupaya untuk mengesan kesemua jenis fraud
terutama sekali fraud yang berkaitan dengan ragam penggunaan. Penggunaan kaedah
“differential analysis” adalah dilihat sebagai kaedah alternatif kepada masalah ini.
Sebahagian daripada corak penggunaan mungkin dianggap ganjil dan berkemungkinan
menyebabkan kes fraud bagi sebahagian pelanggan, tetapi tidak bagi sesetengah yang
vi
lain. Bagi mengatasi masalah ketidaktentuan corak penggunaan ini, di dalam
penyelidikan ini, telah dicadangkan perlaksanaan kaedah “usage profiling” di peringkat
setiap individu pelanggan. “Usage profiling” adalah satu kaedah pengumpulan statistik
panggilan mengikut kategori yang telah ditetapkan berdasarkan kepada aktivti panggilan
pelanggan atau CDR.
“Usage profiling” akan menghasilkan dua jenis profail: profail penggunaan lepas dan
profail penggunaan terkini. Di dalam sistem pengesanan fraud, perbandingan kedua-dua
profail ini akan menghasilkan satu tahap pengukuran yang boleh menunjukkan tahap
keparahan sesuatu kes fraud tersebut. Perlaksanaan pengukuran keparahan kes fraud
menggunakan kaedah “Hellinger distance” mempunyai sedikit kelemahan dari aspek
ketepatan tahap keparahan sesuatu kes fraud itu, oleh kerana kaedah ini tidak
menentukan dengan baik skala sebenar tahap pengukuran yang digunakan
memandangkan ia menghasilkan nilai pengukuran keparahan fraud yang pelbagai. Ini
menyebabkab kesukaran untuk menentukan tahap keparahan sebenar kes fraud yang
dikesan.
Satu kaedah baru telah dicadangkan bagi mengukur tahap keparahan kes fraud di dalam
penyelidikan ini. Kelebihan kaedah baru ini adalah dari aspek ketepatan pengesanan dan
kepantasan pengesanan. Kaedah yang dicadangkan ini dapat menetapkan skala
pengukuran keparahan dengan baik disamping kepantasan pengesanan yang lebih baik
daripada kaedah “Hellinger distance” .
vii
ACKNOWLEDGEMENTS
Assalamualaikum.
Alhamdulillah, all praise to ALLAH S.W.T for giving me strength, patience and
motivation to complete this thesis as a fulfillment for the degree of Master of Science.
I would like to express my deepest appreciation and gratitude to the research supervisory
committee, lead by Prof. Madya Dr. Ramlan Mahmod, Prof. Madya Dr. Hj. Md. Nasir
Sulaiman and En. Mohd Taufik Abdullah for their guidance, support, encouragement and
valuable advice throughout my study in this faculty.
My deepest thanks to my mother, wife, daughters and family, for their support and
understanding during my study. I also like to thank all staffs at the Faculty of Computer
Science and Information Technology, UPM for their administrative support.
Finally, I would like to extend my gratitude to management of TM Research And
Development Sdn. Bhd. (TMR&D) for sponsoring and giving me opportunity to further
study in UPM.
Thank you all and may ALLAH S.W.T bless all individuals for their kindness.
Wassalam.
Mohd Shafri Kamaruddin
July 2006
viii
APPROVAL I certify that an Examination Committee has met on 6th July 2006 to conduct the final examination of Mohd Shafri bin Kamaruddin on his Master of Science thesis entitled "Development of a Method for Fraud Severity Measurement Based on Usage Profiling" in accordance with Universiti Pertanian Malaysia (Higher Degree) Act 1980 and Universiti Pertanian Malaysia (Higher Degree) Regulations 1981. The Committee recommends that the candidate be awarded the relevant degree. Members of the Examination Committee are as follows: Hamidah Ibrahim, PhD Associate Professor Faculty of Computer Science and Information Technology Universiti Putra Malaysia (Chairman) Hj. Mohd Hassan Selamat Associate Professor Faculty of Computer Science and Information Technology Universiti Putra Malaysia (Internal Examiner) Abdul Azim Abd. Azim, PhD Associate Professor Faculty of Computer Science and Information Technology Universiti Putra Malaysia (Internal Examiner) Abdullah Mohd Zin, PhD Professor Faculty of Information Science and Technology Universiti Kebangsaan Malaysia (External Examiner) ________________________________
HASANAH MOHD GHAZALI, PhD Professor/Deputy Dean School of Graduate Studies Universiti Putra Malaysia Date:
ix
This thesis submitted to the Senate of Universiti Putra Malaysia and has been accepted as fulfilment of the requirement for the degree of Master of Science. The members of the Supervisory Committee are as follows: Ramlan Mahmod, PhD Associate Professor Faculty of Computer Science and Information Technology Universiti Putra Malaysia (Chairman) Md. Nasir Sulaiman, PhD Associate Professor Faculty of Computer Science and Information Technology Universiti Putra Malaysia (Member) Mohd Taufik Bin Abdullah Lecturer Faculty of Computer Science and Information Technology Universiti Putra Malaysia (Member) ______________________
AINI IDERIS, PhD Professor/ Dean School of Graduate Studies Universiti Putra Malaysia
Date :
x
DECLARATION
I hereby declare that the thesis is based on my original work except for quotations and citations which have been duly acknowledged. I also declare that it has not previously or concurrently submitted for any other degree at UPM or other institutions.
_________________________________
MOHD SHAFRI BIN KAMARUDDIN
Date :
xi
TABLE OF CONTENTS
Page
DEDICATION ii ABSTRACT iii ABSTRAK v ACKNOWLEDGEMENTS vii APPROVAL viii DECLARATION x LIST OF TABLES xiv LIST OF FIGURES xv LIST OF ABBREVIATIONS xvii CHAPTER 1 INTRODUCTION 1.1
1.1 Introduction 1.1 1.2 Problem Statement 1.7 1.3 Objective 1.10 1.4 Scope 1.11 1.5 Methodology 1.13 1.6 Contribution 1.15 1.7 Organization of Thesis 1.16
2 LITERATURE REVIEW 2.1
2.1 Introduction 2.1 2.2 Fraud in Credit Card 2.1 2.3 Computer Intrusion 2.4 2.4 Fraud in Telecommunication 2.9
2.4.1 Types of fraud in all voice network 2.11 2.4.2 Additional types of fraud in fixed networks. 2.15 2.4.3 Additional types of fraud in mobile networks 2.23 2.4.4 The motivation to commit fraud 2.29 2.4.5 Advancement of Fraud Detection Techniques 2.31
2.5 Fraud Severity Measurement 2.37 2.5.1 Neural Network 2.38 2.5.2 Visualization 2.39 2.5.3 Rule Based 2.40 2.5.4 Model Based Reasoning 2.41 2.5.5 Statistical Approach 2.42
2.6 Summary 2.46
xii
3 THE PROPOSED METHOD 3.1 3.1 Introduction 3.1 3.2 A Proposed Method 3.1 3.3 Abnormal Curve Analysis 3.4 3.4 Summary 3.10
4 TOOL DESIGN AND IMPLEMENTATION 4.1
4.1 Introduction 4.1 4.2 Sample Data Requirement 4.1 4.3 System Architecture 4.3 4.4 Design Process 4.5
4.4.1 Defining the appropriate attribute 4.6 4.4.2 CDR (Call Data Records) data pre-processing and filtering 4.7 4.4.3 Defining attribute-capturing method. 4.8 4.4.4 Designing the table structure. 4.8 4.4.5 Designing Usage Profiling Process And Algorithm 4.9 4.4.6 Detection Process 4.20
4.5 Summary 4.27 5 EXPERIMENTS AND RESULTS 5.1
5.1 Introduction 5.1 5.2 Experimental Process 5.1 5.3 Usage Profiling Performance 5.2 5.4 UPH vs Standard UPH Analysis 5.4 5.5 Fraud Measurement Analysis 5.7
5.5.1 Standard UPH Using Minimum Method vs. CUP 5.7 5.5.2 Standard Usage Profile History Using Maximum
Method vs CUP 5.15 5.5.3 Standard Usage Profile History Using Average
Method vs. CUP 5.22 5.5.4 Summary On Fraud Severity Measurement Comparison 5.29
5.6 Detection Time Analysis 5.30 5.6.1 Hellinger Distance Detection Time vs Method of UPH Generation 5.31 5.6.2 New Method With Abnormal Curve Detection Time vs. Method of UPH Generation 5.33 5.6.3 Summary On Detection Time Comparison 5.35
5.7 Result Discussion 5.37 5.8 Error Analysis 5.39 5.9 Summary 5.39
xiii
6 CONCLUSION AND FUTURE WORK 6.1 6.1 Introduction 6.1 6.2 Conclusion 6.1 6.3 Future Work 6.2
6.3.1 Consideration of other supplementary information. 6.3 6.3.2 Generation of more details usage profile. 6.4 6.3.3 Defining and creating the default usage profile for
new subscriber. 6.5 6.3.4 Investigation on the drop in subscriber activity. 6.5
REFERENCES R.1 APPENDICES A.1 BIODATA OF THE AUTHOR B.1
xiv
LIST OF TABLES
Table Page
1 Attributes Description 3.6
2 Sample of Increase in Fraud Severity measurement’s Calculation 3.6
3 Details Calculation For Sample of Increase in Fraud Severity Measurement 3.7
4 Sample of Decrease in Fraud Severity measurement’s Calculation 3.8
5 Details Calculation For Sample of Decrease in Fraud Severity Measurement 3.9
6 UPH And CUP Attribute Description 4.17
7 Usage Profiling Performance 5.3
8 Sample Data Of Usage Profile History vs. Standard UPH
5.5
9 Sample Data Of Standard UPH Using Minimum Method vs. CUP
5.9
10 Sample Data Of Daily Fraud Measurement (Minimum Method)
5.13
11 Sample Data Of Standard UPH Using Maximum Method vs. CUP 5.17
12 Sample Data Of Daily Fraud Measurement (Maximum Method) 5.20
13 Sample Data Of Standard UPH Using Average Method vs. CUP
5.24
14 Sample Data Of Daily Fraud Measurement (Average Method)
5.27
15 Summary Of Fraud Severity Measurement
5.29
16 Sample Data Of Hellinger Distance Method Detection Time vs. Method of UPH Generation
5.31
17 Sample Data Of The New Method With Abnormal Curve Detection Time vs. Method of UPH Generation
5.33
18 Summary Of Detection Time
5.35
19 Fraud Severity Measurement And Detection Time Summary 5.37
xv
LIST OF FIGURES
Figure Page
1 System Architecture 4.5
2 General Usage Profiling Process 4.12
3 General Usage Profiling Algorithm 4.12
4 Detail Usage Profiling Process 4.14
5 Usage Profiling Algorithm 4.15
6 Standard UPH Generation Process Flow 4.19
7 Standard UPH Generation Algorithm 4.20
8 Detection Process Flow 4.24
9 Detection Algorithm 4.24
10 Detail Fraud Severity Measurement Process 4.26
11 Fraud Severity Measurement Algorithm 4.27
12 Comparison of Total Processing Time and Total CDR Based On Profiling Day
5.3
13 Usage Profile Analysis 5.6
14 Standard UPH Generation Method 5.6
15 Hellinger Distance Calculation (Minimum Method) 5.11
16 Measurement Comparison Between NM And NMAC (Minimum Method)
5.11
17 Daily Alarm Analysis Based On HD Method (Minimum Method) 5.14
18 Daily Alarm Analysis Based On NMAC (Minimum Method) 5.14
19 Hellinger Distance Calculation (Maximum Method) 5.18
20 Measurement Comparison Between NM And NMAC (Maximum Method)
5.18
xvi
21 Daily Alarm Analysis Based On HD Method (Maximum Method) 5.21
22 Daily Alarm Analysis Based On NMAC (Maximum Method) 5.21
23 Hellinger Distance Calculation (Average Method) 5.25
24 Measurement Comparison Between NM And NMAC (Average Method)
5.25
25 Daily Alarm Analysis Based On HD Method (Average Method) 5.28
26 Daily Alarm Analysis Based On NMAC (Average Method) 5.28
27 Hellinger Distance Method Of Detection Time vs. Method of UPH Generation
5.32
28 New Method With Abnormal Curve Of Detection Time vs. Method of UPH Generation
5.34
xvii
LIST OF ABBREVIATIONS
CDR Call Details Record
UPH Usage Profile History
CUP Current Usage Profile
CDMA Code Division Multiple Access
TELCO Telecommunication Company
MSC Mobile Subscriber Center
ITU International Telecommunication Union
1.1
CHAPTER 1
1INTRODUCTION
1.1 Introduction
Fraud in telecommunication can be defined as any dishonest or illegal use of services
where the intention of the sender is to avoid or reduce legitimate call charges (Johnson,
1996). Other definition of fraud is an attempt to obtain or gain illegitimate access to the
network in order to enjoy unbillable services and undeserved fees (Davis and Goyal,
1993 ). The term “fraud” has a particular meaning in legal perspective; however the term
is used broadly to mean misuse, dishonest intention or improper conduct without
implying any legal consequences.
Historically, earlier types of fraud used technological means to acquire free access.
Cloning of mobile phones by creating copies of mobile terminals with identification
numbers from legitimate subscribers was used as a means of gaining free access (Davis
and Goyal 1993). In the era of analog mobile terminals, identification numbers could be
easily captured by eavesdropping with suitable receiver equipment in public places,
where mobile phones were evidently used.
One specific type of fraud, tumbling, is quite prevalent in the United States (Davis and
Goyal, 1993). It exploits deficiencies in the validation of subscriber identity when a
mobile phone subscription is used outside of the subscriber’s home area. The fraudster
keep tumbling (switching between) captured identification numbers to gain access.
1.2
Davis and Goyal (1993) state that the tumbling and cloning fraud have been serious
threats to operators’ revenues.
The first fraud detection systems examine whether two instances of one subscription are
used at the same time (overlapping calls detection mechanism) or at locations far apart in
temporal proximity (velocity trap). Both the overlapping calls or calls collision (Patel ,
1997), and the velocity trap try to detect the existence of two mobile phones with
identical identification codes, which clearly evidencing cloning. As a countermeasure to
these fraud types, technological improvements are introduced together with
implementation of fraud detection system.
Fraud detection system can be considered as a basic tool to detect fraudulent activity
where its implementation may reveal fraudulent activity in telecommunication network.
This tool is very beneficial to network operator, who may lose some of their revenue due
to fraudulent activity as service fees or charges are uncollected.
However, new forms of fraud come into existence. A few years later, Johnson (1996)
and O’Shea (1997) reported that the so-called subscription fraud to be the trendiest and
the fastest-growing type of fraud. In similar spirit, Hoath (1998) characterized
subscription fraud as being probably the most significant and prevalent worldwide
telecommunications fraud type. In subscription fraud, a fraudster obtains a subscription
(possibly with false identification) and starts a fraudulent activity with no intention to
pay the bill. It is indeed non-technical in nature and by call selling, the entrepreneur-
1.3
minded fraudster can generate significant revenues for a minimal investment in a very
short period of time (Johnson, 1996).
As a countermeasure to subscription fraud, Barson et. al. (1996) conduct experiments for
detecting fraud using simulated calls, which consists of six different user types.
Supervised feed forward neural network was implemented to detect anomalous or
deviations in user calling activity. Simulated calls were extracted into two types of
features: one set describing the recent use and another set describing the long-term
behavior. Both set are accumulated statistics of call data over a different length of time
windows.
Burge and Shawe-Taylor, (1996) used the same concept of recent use and long term
behavior. They reported that fraudulent activity can be easily monitored through the
analysis on user behavior. User behaviors are reflected in calling detail or CDR, which
consists of information about the call. Differential analysis and absolute analysis can be
used to detect fraudulent activity. Burge and Shawe-Taylor,. (1997) implemented
unsupervised learning techniques in computing user behavior profiles over sequences of
call records. Hellinger distance method used to measure the changes between user
behavior profiles as alarm indicator.
User behavior profile can be classified into three different types of categories: usage
indicators, mobility indicators and deductive indicators (Burge, et. al 1997). Usage
indicators will show how the service is used. Mobility indicators refer to user’s mobility
while using service, and deductive indicators reflect by-product of fraudulent behavior.
1.4
Mobility and deductive indicators are capable to trap call velocity and call overlapping,
which are commonly related to phone cloning.
Moerau and Vandewalle, (1997) and Taniguchi, et. al (1998) used feed forward neural
network with supervised learning to classify the subscribers between fraud and non fraud
using call summary statistic. They compared subscriber’s past behavior and current
behavior to detect any abnormalities from the past behavior.
Combination of two unsupervised neural network based on user profile, was another
work by Moerau, et. al (1999). They attempt to monitor user profile based on A-number
analysis and B-number analysis. Monitoring a few indicators may not reflect the real
fraud scenario. Burge and Shawe-Taylor, (2001) implemented recurrent neural network
techniques by comparing past behavior and current behavior for fraud detection.
Comparison between both of them using Hellinger distance method, show certain
measurement for triggering alarm.
Bourkeche and Notare, (2002) also used behavior profile information by comparing
recent information and past information of mobile phone’s user, which they extracted
from usage logs. Radial basis function neural network (RBF NN) was used due to its
simplicity and flexibility to adapt to pattern changes. RBF NN is widely used for solving
classification and pattern recognition problem.
Most techniques applied in fraud detection system use calling activity to create behavior
profile for subscriber and try to detect deviations from these profiles (Yufeng, et. al,
1.5
2004). Two main approaches normally implemented in fraud detection system:
differential analysis and absolute analysis. Combination of these approaches capable to
verify certain rule against component of data set in calling activity. Flexible criteria can
be developed to detect any usage change based on user behavior history.
From the above explanation, it is evident that the detection mechanisms of the first
generation soon became inadequate. In the last few years, most of the works in fraud
detection system are based on usage profile, subscriber behavior or calling pattern. The
more advanced detection mechanisms must be based on individual subscriber behavior
as different subscriber may generate different calling pattern. Some of the calling pattern
may considered normal to certain subscriber but not to others.
As a complementary to detection techniques in fraud detection system, usage profiling
efforts in understanding the subscriber behavior or calling pattern in telecommunication
network may be required to improve operation performance in detecting fraudulent
activity. With an additional method of measuring the severity of fraudulent activity, the
fraud detection system even useful and important to network operator. Using usage
profiling method, individual subscriber behavior can be studied and examined through
interrogation of sequence call details record (CDR).
Even though, we are able to study the calling pattern up to individual level, yet there is
no specific sequence of CDR would be guaranteed as 100% fraudulent. Therefore, with
an additional method of measuring the fraudulent activity, network operator can properly
1.6
manage available resources they have to respond to the fraudulent subscriber based on
the fraud severity level generated by the measurement method.
In this study, two main approaches are used which are differential analysis and absolute
analysis. Differential analysis approach is used to detect changes between subscriber’s
calling behavior history and subscriber’s recent calling behavior history, which may
indicate fraudulent activity. Subscriber’s calling behavior history or also known as
Usage Profile History (UPH) and subscriber’s recent calling history or also known as
Current Usage Profile (CUP) are two types of profile generated by usage profiling.
Mean while, the absolute analysis is used as the mechanism to detect the fraudulent
activity by comparing the individual standard UPH and CUP, where the standard UPH
values are considered as threshold value for that particular behavior attribute.
This study also conducts analysis on three type of standard UPH generation methods:
minimum method, maximum method and average method. Standard UPH generation
method refers to the method to create a standard UPH for every subscriber based on a
series of subscriber’s UPH.
1.7
1.2 Problem Statement
In fraud detection system, it is very important to define the performance metric carefully.
Several detection techniques use metric like detection rate, false alarm rate and average
time of detection (Yufeng, et. al, 2004). The typical fraud detection techniques try to
maximize accuracy rate and minimize false alarm rate.
Accuracy of fraud severity measurement is one of the important aspects in fraud
detection system. Fraud severity measurement will be the main indicator for fraud
analyst to properly handle fraud cases. It will also allow the fraud analyst to prioritize
the fraud cases for investigation as the priority of the fraud cases play important role in
fraud detection.
Implementations of Hellinger distance for measuring a fraud severity have been carried
out in some previous works. (Fawcett and Provost, 1997), (Taniguchi, et. al., 1998)
(Burge and Shawe-Tylor, 2001)
The Hellinger distance is defined as following (Lachaud ,2005),( Guha, McGregor and
Suresh, 2005 ), (Poland and Hutter, 2005):
d = ( )21
ii
k
iHC −∑
=
, (1.0)
where C and H are the UPH and CUP, respectively, and K is the number of entries or
attributes in the profile record.