WELCOME TO ECOMIT 2013 SRILANKA

A survey on web usage mining techniques


Mr. Abdul Rahaman Wahab Sait

Lecturer, Shaqra University, Kingdom of Saudi Arabia

(Research Scholar, Alagappa University, India)

[email protected]

Dr. Meyappan

Professor, Dept. of Computer Science and Eng., Alagappa University, India

[email protected]

INTRODUCTION

• The Internet has become a popular medium for business.

• Millions of domains now exist on the Internet, which clearly shows how much it has grown.

• The web is also a database: a distributed one whose data are largely hidden, that is, stored in the deep web.

INTRODUCTION

Figure 1 - Country wise Domain Registration (only for 16 Countries)

INTRODUCTION

• The figure shows the 16 countries with the most registered domains in the world.

• The data were taken from www.webhosting.info as of 25.2.2013.

• E-business is the electronic version of business, run over the Internet: buying and selling take place online.

INTRODUCTION

• Competition is a constant in the world, and it pushes people to find new ideas that make them stand out.

• In this scenario, e-business needs data mining techniques to promote buying and selling activities.

• Web mining techniques are very useful for determining customer behavior from the huge pool of web data.

INTRODUCTION

• Web usage mining (WUM): WUM is used to discover interesting patterns that can be applied to many real-world problems, such as improving the presentation of a website, better understanding users' behavior, and recommending products.

INTRODUCTION

• Three major categories of pattern extraction: association rules, clustering, and classification.

• In this survey we present WUM techniques implemented using clustering and classification.

Role of WUM

• WUM helps to determine the frequent access behavior of users, so that the links they need can be identified and the overall performance of future access improved.

• It provides detailed feedback on user behavior, giving the website designer information on which to base redesign decisions.

• It can be used for statistical research on the site's users or customers.

Role of WUM

• It can be used for performance evaluation of the company or organization.

• It is usually an automated process in which web servers collect and report user access patterns in server access logs.

An Illustration of Pattern Extraction

• The above figure illustrates the method of WUM.

• Data collected from web logs are preprocessed and, if necessary, transformed into the required form and given as input to the pattern extraction methods.
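The preprocess, transform, and extract flow described above can be sketched as follows. This is only an illustration under stated assumptions: the log records, the list of ignored file suffixes, and the 30-minute session timeout are hypothetical choices, not values taken from the survey.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical, already-parsed log records: (client, timestamp, url, status).
RECORDS = [
    ("10.0.0.1", datetime(2013, 2, 25, 10, 0), "/index.html", 200),
    ("10.0.0.1", datetime(2013, 2, 25, 10, 1), "/logo.png", 200),
    ("10.0.0.1", datetime(2013, 2, 25, 10, 5), "/products.html", 200),
    ("10.0.0.1", datetime(2013, 2, 25, 11, 30), "/index.html", 200),
    ("10.0.0.2", datetime(2013, 2, 25, 10, 2), "/missing.html", 404),
    ("10.0.0.2", datetime(2013, 2, 25, 10, 3), "/products.html", 200),
]

IGNORED_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js")  # assumed list
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed cut-off, not from the survey


def clean(records):
    """Pre-processing: keep successful page requests only."""
    return [r for r in records
            if r[3] == 200 and not r[2].lower().endswith(IGNORED_SUFFIXES)]


def sessionize(records):
    """Group each client's requests into sessions split by inactivity."""
    by_client = defaultdict(list)
    for client, ts, url, _ in sorted(records, key=lambda r: (r[0], r[1])):
        by_client[client].append((ts, url))
    sessions = []
    for client, hits in by_client.items():
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > SESSION_TIMEOUT:
                sessions.append((client, current))
                current = []
            current.append(url)
        sessions.append((client, current))
    return sessions


def to_vectors(sessions):
    """Transformation: turn each session into a page-visit count vector."""
    pages = sorted({url for _, urls in sessions for url in urls})
    vectors = [[urls.count(page) for page in pages] for _, urls in sessions]
    return pages, vectors


sessions = sessionize(clean(RECORDS))
pages, vectors = to_vectors(sessions)
print(pages)
print(vectors)
```

The resulting vectors are one possible input form for a pattern extraction method such as clustering or classification.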

Web Logs

• Web logs are plain text files that record the activity of users. The above figure shows the types of web log.

• Web server – The most common place to store usage data and the primary source of data for web usage mining; the data are collected as users access web pages.

Web Logs

• Web proxy server – Primarily used for security purposes. A web proxy acts as an intermediate level of caching between client browsers and web servers.

• Client log – When a user browses a website, some data about the activity are stored on the client for future use by the client or by the web server.
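As an illustration of what a web server log entry contains, the sketch below parses one line in the Common Log Format, a widely used plain-text access log layout. The slides do not say which log format the surveyed work used, so the format choice, the sample line, and the helper name `parse_clf_line` are assumptions made for this example.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    # The request field normally looks like: METHOD URL PROTOCOL
    parts = entry["request"].split()
    if len(parts) >= 2:
        entry["method"], entry["url"] = parts[0], parts[1]
    else:
        entry["method"], entry["url"] = None, None
    return entry


# Made-up sample line, for illustration only.
SAMPLE = ('192.0.2.10 - - [25/Feb/2013:10:00:00 +0530] '
          '"GET /index.html HTTP/1.1" 200 2326')
entry = parse_clf_line(SAMPLE)
print(entry["host"], entry["url"], entry["status"])
```

Proxy and client logs carry similar fields, so the same kind of parsing step would apply to them with a different pattern.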

SURVEY - WUM Algorithms

• Clustering is the process of grouping similar observations into smaller groups within the larger population.

• Clustering is an active research topic in fields such as statistics, pattern recognition and machine learning.

SURVEY - WUM Algorithms

• Clustering is unsupervised learning of a hidden data concept.

• The following table shows the research based on the clustering algorithm.
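To make the idea of unsupervised grouping concrete, here is a minimal k-means sketch over session vectors. K-means is used only as a familiar example of a clustering algorithm; it is not claimed to be the method of any surveyed paper. The session data, the column meaning (visits to four hypothetical page categories), and the choice of the first k points as starting centroids are all assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def k_means(points, k, iterations=20):
    """Plain k-means; starts from the first k points so runs are repeatable."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical sessions: visit counts for [news, sports, shop, cart] pages.
SESSIONS = [
    [5, 4, 0, 0],
    [0, 1, 6, 3],
    [4, 5, 1, 0],
    [0, 0, 5, 4],
    [6, 3, 0, 1],
    [1, 0, 4, 5],
]

labels, centroids = k_means(SESSIONS, k=2)
print(labels)
```

No class labels are supplied: the two groups (content readers and shoppers in this toy data) emerge from the visit counts alone.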

Research based on Clustering Approach

SURVEY - WUM Algorithms

• Classification is one of the most important data mining functions; it assigns items in a collection to target categories or classes.

• Classification is a supervised learning method: processing and evaluation depend on the target class.

• The following table shows the research based on classification algorithms.
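The supervised setting can be illustrated with a small k-nearest-neighbor classifier. This is a sketch under assumptions: k-NN is chosen only because it is short, the labeled sessions and the class names "reader" and "buyer" are invented, and nothing here reproduces a specific surveyed technique.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(train, query, k=3):
    """Assign the majority class among the k training items nearest to query."""
    nearest = sorted(train, key=lambda item: squared_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled sessions: visit counts for [news, sports, shop, cart].
TRAIN = [
    ([5, 4, 0, 0], "reader"),
    ([4, 5, 1, 0], "reader"),
    ([6, 3, 0, 1], "reader"),
    ([0, 1, 6, 3], "buyer"),
    ([0, 0, 5, 4], "buyer"),
    ([1, 0, 4, 5], "buyer"),
]

print(knn_classify(TRAIN, [5, 5, 0, 0]))
print(knn_classify(TRAIN, [0, 1, 5, 5]))
```

Unlike clustering, the target classes are given in advance, and the quality of the result is judged against those known labels.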

SURVEY - WUM Algorithms

Research based on Classification Approach

CONCLUSION

• Clustering and classification techniques are the foundation for building intelligent websites.

• The main goal of this paper is to study the existing techniques implemented with clustering and classification algorithms.

• Further study will lead to a new automated web usage mining technique for web page classification and pattern extraction.

THANK YOU