WEB USAGE MINING

Contents

- Web Mining
- Web Mining Taxonomy
- Web Usage Mining
- Web analysis tools
- Pattern Discovery Tools and their different stages
- Pattern Analysis Tools and the techniques employed
- Web Usage Mining Process
- Web Usage Mining Architecture
- Research Directions
- Conclusion
- References

Web Mining

Web mining: the use of data mining techniques to automatically discover and extract information from Web documents and services.

Web mining research integrates work from several research communities, such as:
- Database (DB)
- Information retrieval (IR)
- The sub-areas of machine learning (ML) and natural language processing (NLP)

Mining the World-Wide Web

The WWW is a huge, widely distributed, global information source for:
- Information services: news, advertisements, consumer information, financial management, education, government, e-commerce, etc.
- Hyperlink information
- Access and usage information
- Web site contents and organization

Challenges on WWW Interactions

- Finding relevant information
- Creating knowledge from the available information
- Personalization of the information
- Learning about customers and individual users

Web mining can play an important role!

Web Mining Taxonomy

Web Mining
- Web Content Mining
- Web Usage Mining
- Web Structure Mining

Web Usage Mining

Web usage mining, also known as Web log mining, is the application of data mining techniques to discover interesting usage patterns from the secondary data derived from users' interactions while surfing the Web.

Organizations often generate and collect large volumes of data in their daily operations, including data from users' interactions with their Web sites.

Most of this information is usually generated automatically by Web servers and collected in server access logs.

Other sources of user information include referrer logs, which contain information about the referring pages for each page reference, and user registration or survey data gathered via tools such as CGI scripts.

Analysis of server access logs and user registration data provides valuable information on how to better structure a Web site in order to create a more effective presence for the organization.

Most of the existing Web analysis tools provide mechanisms for reporting user activity in the servers and various forms of data filtering.

Web analysis tools:

Using these tools, it is possible to determine the number of accesses to the server and to the individual files within the organization's Web space, the times or time intervals of visits, and the domain names and URLs of users of the Web server.
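
The kind of reporting described here can be sketched in a few lines of Python. This is a minimal illustration, assuming the server writes its access log in the Common Log Format; the sample entries, host names, and variable names are made up for the example and do not come from any particular analysis tool.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "METHOD path PROTOCOL" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF.match(line)
    return match.groupdict() if match else None

# Hypothetical sample entries, for illustration only.
SAMPLE_LOG = [
    'a.example.com - - [10/Oct/2000:13:55:36 -0700] "GET /company/ HTTP/1.0" 200 2326',
    'a.example.com - - [10/Oct/2000:13:56:02 -0700] "GET /company/products/ HTTP/1.0" 200 1840',
    'b.example.org - - [10/Oct/2000:14:01:10 -0700] "GET /company/ HTTP/1.0" 200 2326',
]

entries = [e for e in map(parse_line, SAMPLE_LOG) if e is not None]

hits_per_file = Counter(e["path"] for e in entries)   # accesses per file
hits_per_host = Counter(e["host"] for e in entries)   # accesses per client host
# The hour is the second colon-separated field of the timestamp.
hits_per_hour = Counter(e["time"].split(":")[1] for e in entries)

print(hits_per_file.most_common())
```

Lines that do not match the pattern are skipped rather than guessed at, which keeps malformed entries out of the counts.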

- Pattern Discovery Tools
- Pattern Analysis Tools

Pattern Discovery Tools

Emerging tools for user pattern discovery use sophisticated techniques from AI, data mining, psychology, and information theory to mine for knowledge from the collected data.

The WEBMINER system introduces a general architecture for Web usage mining. WEBMINER automatically discovers association rules and sequential patterns from server access logs.

Pirolli et al. use information foraging theory to combine path traversal patterns, Web page typing, and site topology information to categorize pages for easier access by users.

Pattern Analysis Tools

Once access patterns have been discovered, analysts need appropriate tools and techniques to understand, visualize, and interpret these patterns. Examples of such tools include:
- The WebViz system
- OLAP techniques, such as data cubes, for simplifying the analysis of usage statistics from server access logs

The WEBMINER system proposes an SQL-like query mechanism for querying the discovered knowledge.

Pattern Discovery from Web Transactions

Preprocessing Tasks
- Data Cleaning
- Transaction Identification

Discovery Techniques on Web Transactions
- Path Analysis
- Association Rules
- Sequential Patterns
- Clustering and Classification

Preprocessing Tasks

Data Cleaning: techniques to clean a server log to eliminate irrelevant items. Elimination of irrelevant items can be reasonably accomplished by checking the suffix of the URL name. For example, all log entries with filename suffixes such as gif, jpeg, GIF, JPEG, jpg, JPG, and map can be removed.
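
A minimal sketch of this suffix check, assuming the log has already been reduced to a list of requested URLs. The function names and the sample URLs are illustrative only.

```python
from urllib.parse import urlsplit

# Suffixes named on the slide; comparison is case-insensitive,
# so "GIF", "JPEG" and "JPG" are covered as well.
IRRELEVANT_SUFFIXES = {"gif", "jpeg", "jpg", "map"}

def is_relevant(url):
    """Return False for requests whose filename suffix marks them as irrelevant."""
    path = urlsplit(url).path            # drop any query string or fragment
    filename = path.rsplit("/", 1)[-1]   # last path component
    if "." not in filename:
        return True                      # no suffix, e.g. a directory request
    suffix = filename.rsplit(".", 1)[-1].lower()
    return suffix not in IRRELEVANT_SUFFIXES

def clean(urls):
    """Keep only the log entries that survive the suffix check."""
    return [u for u in urls if is_relevant(u)]

# Hypothetical requests, for illustration only.
requests = [
    "/company/index.html",
    "/company/logo.GIF",
    "/company/img/photo.jpg?v=2",
    "/company/maps/site.map",
    "/company/",
]
print(clean(requests))
```

Which suffixes count as irrelevant depends on the site; a site whose content is mainly images would need a different list.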

Transaction Identification

Here, sequences of page references are grouped into logical units representing Web transactions or user sessions.

Two types of transactions are defined:
- Navigation-content: each transaction consists of a single content reference and all of the navigation references in the traversal path leading to the content reference. These transactions can be used to mine for path traversal patterns.
- Content-only: each transaction consists of all of the content references for a given user session. These transactions can be used to discover associations between the content pages of a site.
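
The two transaction types can be sketched as follows. This is an illustration under two assumptions that the slide does not state: each page reference has already been labeled as "nav" or "content", and the traversal path leading to a content reference is taken to be the navigation references since the previous content reference.

```python
def navigation_content(session):
    """Split one session into navigation-content transactions.

    Each transaction ends with exactly one content reference and is preceded
    by the navigation references seen since the previous content reference.
    """
    transactions, path = [], []
    for url, kind in session:
        path.append(url)
        if kind == "content":
            transactions.append(path)
            path = []
    return transactions  # trailing navigation-only references are dropped

def content_only(session):
    """Return the single content-only transaction for one session."""
    return [url for url, kind in session if kind == "content"]

# Hypothetical labeled session, for illustration only.
session = [
    ("/company", "nav"),
    ("/company/products", "nav"),
    ("/company/products/file1.html", "content"),
    ("/company/whatsnew", "nav"),
    ("/company/products/file2.html", "content"),
    ("/company", "nav"),
]

print(navigation_content(session))
print(content_only(session))
```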

Discovery Techniques on Web Transactions

Path Analysis

Here a graph represents the physical layout of a Web site, with Web pages as nodes and hypertext links between pages as directed edges.

Other graphs could be formed based on the types of Web pages, with edges representing similarity between pages, or with edges weighted by the number of users who go from one page to another.

Path analysis could be used to determine the most frequently visited paths in a Web site.

Other examples of information that can be discovered through path analysis are:
- 70% of clients who accessed /company/products/file2.html did so by starting at /company and proceeding through /company/whatsnew, /company/products, and /company/products/file1.html;
- 80% of clients who accessed the site started from /company/products; or
- 65% of clients left the site after four or fewer page references.
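
Statistics of this kind can be computed directly from sessions. The sketch below assumes each session is an ordered list of page references; the sessions and function names are hypothetical and the numbers they produce are unrelated to the percentages quoted above.

```python
from collections import Counter

def transition_counts(sessions):
    """Count how many times users went directly from one page to another."""
    edges = Counter()
    for pages in sessions:
        edges.update(zip(pages, pages[1:]))
    return edges

def path_counts(sessions, length):
    """Count every contiguous path of `length` pages across all sessions."""
    paths = Counter()
    for pages in sessions:
        for i in range(len(pages) - length + 1):
            paths[tuple(pages[i:i + length])] += 1
    return paths

def share_leaving_within(sessions, max_refs):
    """Fraction of sessions that ended after at most `max_refs` page references."""
    short = sum(1 for pages in sessions if len(pages) <= max_refs)
    return short / len(sessions)

# Hypothetical sessions, for illustration only.
SESSIONS = [
    ["/company", "/company/whatsnew", "/company/products",
     "/company/products/file1.html"],
    ["/company", "/company/products", "/company/products/file1.html"],
    ["/company/products", "/company/products/file1.html",
     "/company/products/file2.html", "/company", "/company/whatsnew"],
]

edges = transition_counts(SESSIONS)
print(edges.most_common(1))
print(path_counts(SESSIONS, 3).most_common(1))
print(share_leaving_within(SESSIONS, 4))
```

The transition counts are the edge weights of the "number of users that go from one page to another" graph mentioned earlier.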

Association Rules

This technique is generally applied to databases of transactions where each transaction consists of a set of items.

In this setting, the problem is to discover all associations and correlations among data items.

In Web usage mining, each transaction consists of the set of URLs accessed by a client in one visit to the server. For example, using association rule discovery techniques we can find correlations such as the following:
- 40% of clients who accessed the Web page with URL /company/products/product1.html also accessed /company/products/product2.html; or
- 30% of clients who accessed /company/announcements/special-offer.html placed an online order in /company/products/product1.

Association Rules (contd.)

Because such transaction databases usually contain extremely large amounts of data, current association rule discovery techniques try to prune the search space according to the support of the items under consideration. Support is a measure based on the number of occurrences of user transactions within transaction logs.

Discovery of such rules for organizations engaged in electronic commerce can help in the development of effective marketing strategies.
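
Support and the "X% of clients who accessed A also accessed B" figure (usually called confidence) can be illustrated with a brute-force sketch over page pairs. The transactions, thresholds, and function name are assumptions made for the example. Unlike Apriori-style algorithms, which use the support threshold to avoid counting unpromising candidates at all, this sketch counts every pair and only filters afterwards.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support, min_confidence):
    """Return rules (x, y, support, confidence) meaning 'x implies y'."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n              # share of transactions containing both
        if support < min_support:
            continue                     # drop infrequent pairs
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]   # share of x-visits that include y
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules

# Hypothetical visits (sets of URLs), for illustration only.
P1 = "/company/products/product1.html"
P2 = "/company/products/product2.html"
OFFER = "/company/announcements/special-offer.html"
TRANSACTIONS = [{P1, P2, OFFER}, {P1, P2}, {P1}, {OFFER, P1}, {P2}]

rules = pair_rules(TRANSACTIONS, min_support=0.4, min_confidence=0.6)
for x, y, support, confidence in rules:
    print(f"{x} -> {y}  support={support:.2f} confidence={confidence:.2f}")
```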

Sequential Patterns

The problem of discovering sequential patterns is to find inter-transaction patterns such that the presence of a set of items is followed by another item in the time-stamp ordered transaction set.

By analyzing this information, the Web mining system can determine temporal relationships among data items such as the following:

- 30% of clients who visited /company/products/ had done a search in Yahoo within the past week on keyword w; or
- 60% of clients who placed an online order in /company/products/product1.html also placed an online order in /company/products/product4 within 15 days.

Clustering and Classification

Discovering classification rules allows one to develop a profile of items belonging to a particular group according to their common attributes. This profile can then be used to classify new data items that are added to the database. Examples of such profiles include the following:
- clients from state or government agencies who visit the site tend to be interested in the page /company/products/product1.html; or
- 50% of clients who placed an online order in /company/products/product2 were in the 20-25 age group and lived on the West Coast.

Clustering analysis allows one to group together clients or data items that have similar characteristics. Clustering of client information or data items in Web transaction logs can facilitate the development and execution of future marketing strategies, both online and off-line.
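
As one illustration of the clustering half, the sketch below groups clients whose visited-page sets are similar. The choice of Jaccard similarity with a simple one-pass "leader" scheme is an assumption made for brevity, not a method named in the source, and the client data is invented.

```python
def jaccard(a, b):
    """Similarity of two page sets: shared pages divided by all pages."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def leader_clusters(clients, threshold):
    """Assign each client to the first cluster whose leader is similar enough.

    `clients` maps a client id to the set of pages it visited. A client that
    matches no existing leader starts a new cluster and becomes its leader.
    """
    clusters = []  # list of (leader_pages, member_ids)
    for client_id, pages in clients.items():
        for leader_pages, members in clusters:
            if jaccard(pages, leader_pages) >= threshold:
                members.append(client_id)
                break
        else:
            clusters.append((pages, [client_id]))
    return [members for _, members in clusters]

# Hypothetical clients, for illustration only.
CLIENTS = {
    "c1": {"/company/products/product1.html", "/company/products/product2.html"},
    "c2": {"/company/products/product1.html", "/company/products/product2.html",
           "/company/whatsnew"},
    "c3": {"/company/jobs", "/company/about"},
    "c4": {"/company/jobs"},
}

groups = leader_clusters(CLIENTS, threshold=0.5)
print(groups)
```

The result depends on the order in which clients are processed and on the threshold, so a production system would more likely use an established clustering algorithm.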

Analysis of Discovered Patterns

Web site administrators are extremely interested in questions such as "How are people using the site?" and "Which pages are being accessed most frequently?". These questions require analysis of the structure of hyperlinks as well as the contents of the pages. The end products of such analysis might include:
1) the frequency of visits per document
2) the most recent visit per document
3) who is visiting which documents
4) the frequency of use of each hyperlink
5) the most recent use of each hyperlink

- Visualization Techniques
- OLAP Techniques
- Data & Knowledge Querying

Visualization Techniques

Visualization has been used very successfully in helping people understand various kinds of phenomena, both real and abstract.

The WebViz system is used for visualizing WWW access patterns. WebViz allows the analyst to selectively analyze the portion of the Web that is of interest by filtering out the irrelevant portions. The Web is visualized as a directed graph with cycles, where nodes are pages and edges are (inter-page) hyperlinks.

The visualization is composed of two windows, the WebViz control window and the display window. The first provides the analyst with controls to adjust the bindings, select a specific time to view, control the animation, and rearrange the layout. The second window's arrangement allows a document's access frequency to be represented by the width of the node representing it, while the node's color represents its recency of access.

OLAP Techniques

On-Line Analytical Processing (OLAP) is emerging as a very powerful paradigm for strategic analysis of databases in business settings.

The key characteristics of strategic analysis include very large data volume, explicit support for the temporal dimension, support for various kinds of information aggregation, and long-range analysis. This has led to the development of the data cube information model and of techniques for its efficient implementation.

Web usage data have much in common with the data in a data warehouse, so OLAP techniques are quite applicable, and the issue needs further investigation.
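
The data cube idea can be illustrated on usage data with a small sketch that aggregates a measure over every combination of dimensions. The fact records, the dimension names (page, day), and the function name are assumptions for the example; a real OLAP engine would not materialize the cube this naively.

```python
from collections import defaultdict
from itertools import combinations

def data_cube(facts, dimensions, measure):
    """Sum `measure` for every subset of `dimensions` (every group-by of the cube)."""
    cube = {}
    for r in range(len(dimensions) + 1):
        for dims in combinations(dimensions, r):
            cells = defaultdict(int)
            for fact in facts:
                key = tuple(fact[d] for d in dims)
                cells[key] += fact[measure]
            cube[dims] = dict(cells)
    return cube

# Hypothetical pre-aggregated usage facts, for illustration only.
FACTS = [
    {"page": "/company/products", "day": "2000-10-10", "hits": 3},
    {"page": "/company/products", "day": "2000-10-11", "hits": 5},
    {"page": "/company/whatsnew", "day": "2000-10-10", "hits": 2},
]

cube = data_cube(FACTS, ("page", "day"), "hits")
print(cube[()])          # grand total
print(cube[("page",)])   # rolled up over days
print(cube[("day",)])    # rolled up over pages (the temporal dimension)
```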

Data & Knowledge Querying

One of the reasons attributed to the great success of relational database technology has been the existence of a high-level, declarative query language, which allows an application to express what conditions must be satisfied by the data it needs, rather than having to specify how to get the required data.

With many patterns to mine, the analysis needs a focus. Such focus may be provided in at least two ways.

First, constraints may be placed on the database (perhaps in a declarative language).

Second, querying may be performed on the knowledge that has been extracted by the mining process. An SQL-like querying mechanism has been proposed for the WEBMINER system.
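
The second approach, querying the mined knowledge declaratively, can be illustrated with ordinary SQL. The sketch below stores a few rules in an in-memory SQLite table and filters them; it uses Python's standard sqlite3 module and plain SQL, not the WEBMINER query language, and the table layout and the support values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rules ("
    "antecedent TEXT, consequent TEXT, support REAL, confidence REAL)"
)
# Hypothetical mined rules; support values are made up for illustration.
conn.executemany(
    "INSERT INTO rules VALUES (?, ?, ?, ?)",
    [
        ("/company/products/product1.html",
         "/company/products/product2.html", 0.12, 0.40),
        ("/company/announcements/special-offer.html",
         "/company/products/product1", 0.05, 0.30),
        ("/company/whatsnew", "/company/products", 0.02, 0.15),
    ],
)

def strong_rules(prefix, min_confidence):
    """Rules whose antecedent starts with `prefix`, strongest first."""
    query = (
        "SELECT antecedent, consequent, confidence FROM rules "
        "WHERE antecedent LIKE ? AND confidence >= ? "
        "ORDER BY confidence DESC"
    )
    return conn.execute(query, (prefix + "%", min_confidence)).fetchall()

rows = strong_rules("/company/", 0.25)
for antecedent, consequent, confidence in rows:
    print(antecedent, "->", consequent, confidence)
```

The query states which rules are wanted (a URL prefix and a confidence floor) and leaves the retrieval strategy to the database, which is the declarative property the slide describes.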

Web Usage Mining Process

[Figure: Web usage mining process]

Web Usage Mining Architecture

[Figure: Web usage mining architecture]

Research Directions

Web usage mining, which is just starting as an area of research, has a number of open issues. The following are some directions for future research:
- Data Pre-Processing for Mining
- The Mining Process
- Analysis of Mined Knowledge
- WebSIFT Example

WebSIFT Example

The Web Site Information Filter System (WebSIFT) is a Web usage mining framework that uses the content and structure information from a Web site and identifies the interesting results from mining usage data.

Input of the mining process: server logs (access, referrer, and agent), HTML files, optional data.

Prototypical Web usage mining system.

Conclusion

Web usage mining, the use of data mining to find usage patterns, is a growing area, driven by the growth of Web-based applications.

Web usage data can be used to better understand how the Web is used, and this knowledge can be applied to better serve users.

Web usage patterns and data mining can be the basis for a great deal of future research.

References:

J. Srivastava, R. Cooley, M. Deshpande, and P.-N. Tan (Dept. of CSE, University of Minnesota), Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data, SIGKDD Explorations, Vol. 1, Issue 2, 2000.

Web Mining: Pattern Discovery from World Wide Web Transaction

R. Kosala and H. Blockeel (Dept. of CS, Katholieke Universiteit Leuven), Web Mining Research: A Survey.

B. Mobasher, R. Cooley, and J. Srivastava, Web Mining: Information and Pattern Discovery on the World Wide Web, Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'97), November 1997.

www.wikipedia.org

THANK YOU…