7/29/2019 Knowledge Discovery From Weblogs
A
SEMINAR REPORT
ON
Knowledge Discovery From Weblogs
Submitted in partial fulfillment of degree of
BACHELOR OF TECHNOLOGY
In
Information Technology
2012-13
Guided by: Mr. Saurabh Anand, Lecturer, Department of IT
Submitted by: Avtar Kishore Gaur, B. Tech. (IT), VIII Semester, IT/09/53
DEPARTMENT OF INFORMATION TECHNOLOGY
POORNIMA COLLEGE OF ENGINEERING
ISI 06, RIICO INSTITUTIONAL AREA
JAIPUR - 302 022
CONTENTS

1. Introduction
2. Fields in Web Log File
3. Mining Web Logs for Path Profiles
   3.1 Web Content Mining
   3.2 Web Log Mining for Prefetching
   3.3 Web Object Prediction
4. Web Mining Taxonomy
   4.1 Web Content Mining
       4.1.1 Classification of Multimedia Content and Websites
       4.1.2 Focused Crawling
       4.1.3 Clustering Web Objects
   4.2 Web Structure Mining
   4.3 Web Usage Mining
       4.3.1 Data Preparation
       4.3.2 Data Mining
       4.3.3 Web Usage Data
       4.3.4 Web Server Data
       4.3.5 Application Server Data
       4.3.6 Application Level Data
5. Advantages / Merits
6. Disadvantages / Demerits
7. Applications
   7.1 Search Engines
   7.2 Similarity Measures
   7.3 Ontology
   7.4 Recognition Technology
   7.5 Summarization
   7.6 E-commerce
   7.7 Content Management
   7.8 Information Aggregation
8. Conclusion
9. References
1. Introduction

Web usage mining is the process of obtaining interesting, constructive and implicit knowledge from activities related to the World Wide Web. Web servers trace and gather information about user interactions every time a user requests a particular resource. Evaluating the Web access logs helps in predicting user behavior and also assists in formulating the web structure. From the application point of view, information extracted from Web usage patterns can be directly applied to efficiently manage activities related to e-business, e-services, e-education, on-line communities and so on. However, since the size and density of the data grow rapidly, existing Web log file analysis tools may provide insufficient information, and hence more intelligent mining techniques are needed. Several approaches to web usage mining are available in the literature, each with its own merits and demerits. This report focuses on the study and analysis of various existing web usage mining techniques.
2. Fields in Web Log File
The following fields are recorded in a typical web log, shown here with values from two sample entries:

a) Web server: Apache
b) IP address: 66.249.71.6 and 180.76.5.92
c) User name: - and -
d) Timestamp: [23/Feb/2012:06:23:46 -0600] and [23/Feb/2012:06:11:04 -0600] (time of visit recorded by the web server)
e) Access request: "GET /robots.txt HTTP/1.1" and "GET / HTTP/1.1"
f) Result status code: 500 and 500 (Internal Server Error)
g) Bytes transferred: 7370 and 7370
h) User agent: Mozilla/5.0
i) Referrer URL: (compatible; Googlebot/2.1; +http://www.google.com/bot.html) and (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
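The fields above correspond to Apache's combined log format. As an illustrative sketch (the regular expression and field names are ours, not taken from any particular analysis tool), one such line can be parsed in Python as follows:

```python
import re

# Illustrative regex for Apache's Combined Log Format:
# IP, identity, user, timestamp, request, status, bytes, referrer, user agent
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

line = ('66.249.71.6 - - [23/Feb/2012:06:23:46 -0600] '
        '"GET /robots.txt HTTP/1.1" 500 7370 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')
entry = parse_log_line(line)
print(entry['ip'], entry['status'], entry['request'])
```

Each named group then maps directly onto one of the fields listed above.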
3. Mining Web Logs for Path Profiles
3.1 Web Content Mining:

The steps involved in mining web logs for path profiles are:

a. Data cleaning on web log data
b. Mining web logs for path profiles
c. Web object prediction
d. Learning to prefetch web documents
3.2 Web Log Mining for Prefetching
Caching and prefetching are effective approaches to the explosive growth in network users and Web services, and have been widely used in Web proxies, P2P, grid computing and wireless networks. Bringing some of the more popular items closer to end-users can improve network performance and thereby reduce download latency and network congestion. Web caching and prefetching are based on the temporal locality of user request sequences. The Independent Reference Model (IRM) and the Markov Reference Model (MRM) are the models most used for Web caching at present, while Markov-based models are mostly used for prefetching. The design of a replacement policy is always based on the characteristics of request sequences. Therefore, modeling user request sequences and Web object properties exactly and simply is important, and we hope to find optimal policies under these factors in a systematic manner. This report first analyzes and compares the Web caching and prefetching models in use today, and then, based on the measurement of relative popularity and byte cost, presents an optimal Web caching and prefetching model (PR PPM) that satisfies different performance metrics. Visiting sessions are kept separate. A path profile consists of frequent subsequences drawn from the frequently occurring paths; it helps us predict the pages that are most likely to be requested next.
3.3 Web Object Prediction
It is possible to train a path-based model for predicting future URLs based on a sequence of current URL accesses. This can be done on a per-user basis or on a per-server basis. The former requires that user sessions be recognized and separated through a filtering system, while the latter takes the simplistic view that all accesses on a server form a single long thread.
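A first-order Markov model is one simple way to build such a path-based predictor. The sketch below (the session data is hypothetical) counts observed URL-to-URL transitions and predicts the most frequently observed successor of the current page:

```python
from collections import Counter, defaultdict

def train_markov(sessions):
    """Count transitions url -> next_url over all visiting sessions."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            transitions[current][nxt] += 1
    return transitions

def predict_next(transitions, current):
    """Return the most frequently observed successor of `current`, or None."""
    followers = transitions.get(current)
    return followers.most_common(1)[0][0] if followers else None

# Hypothetical per-user sessions extracted from a web log
sessions = [
    ['/company', '/company/products', '/company/product1'],
    ['/company', '/company/products', '/company/product2'],
    ['/company', '/company/new', '/company/products', '/company/product1'],
]
model = train_markov(sessions)
print(predict_next(model, '/company/products'))  # '/company/product1', seen twice
```

A prefetcher would fetch the predicted page into the cache before the user actually requests it.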
4. Web Mining Taxonomy:
Web Mining can be broadly divided into three distinct categories, according to the
kinds of data to be mined:
4.1 Web Content Mining:

Web content mining techniques:
4.1.1 Classification of Multimedia Content and Websites:
In order to retrieve relevant knowledge, a system has to analyze web content first. Classification of web objects offers an automatic way to decide the relevance of web objects. Our focus in this area is the classification of websites or hosts. Since websites represent information on a more general level (e.g. a complete company) and are usually represented by multiple pages, classifying websites on top of webpage classification demands new algorithms.
4.1.2 Focused Crawling:
A focused web crawler takes a set of well-selected web pages exemplifying the user's interest. Searching for further relevant web pages, the focused crawler starts from the given pages and recursively explores the linked web pages. We are especially interested in crawling to retrieve complete websites, a task demanding new crawl strategies. While the crawlers used for refreshing the indices of web search engines perform a breadth-first search of the whole web, a focused crawler explores only a small portion of the web using a best-first search guided by the user's interest. Furthermore, we are interested in crawling for multimedia content on the web, retrieving topic-specific multimedia content instead of plain HTML documents.
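The best-first strategy described above can be sketched with a priority queue. In the sketch below, the link graph and relevance scores are hypothetical stand-ins for fetched pages and a topic classifier; a real crawler would fetch pages over the network and score them on the fly:

```python
import heapq

def focused_crawl(graph, relevance, seeds, limit=10):
    """Best-first crawl of an in-memory link graph (stands in for the web).
    Pages with higher relevance scores are expanded first."""
    frontier = [(-relevance[s], s) for s in seeds]
    heapq.heapify(frontier)
    visited = []
    seen = set(seeds)
    while frontier and len(visited) < limit:
        _, page = heapq.heappop(frontier)   # most relevant unvisited page
        visited.append(page)
        for link in graph.get(page, []):    # enqueue its outgoing links
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-relevance.get(link, 0.0), link))
    return visited

# Hypothetical link graph and topic-relevance scores
graph = {'/seed': ['/mining', '/sports'], '/mining': ['/weblogs'], '/sports': []}
relevance = {'/seed': 1.0, '/mining': 0.9, '/sports': 0.1, '/weblogs': 0.8}
print(focused_crawl(graph, relevance, ['/seed']))
```

Note how the low-relevance /sports page is visited last even though it is linked directly from the seed; a breadth-first crawler would have reached it earlier.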
4.1.3 Clustering Web Objects:
Focused crawling retrieves large numbers of relevant data items. In order to offer fast and more specific access to the query results, clustering is an established method for grouping the retrieved information to achieve better understanding. If the query results are websites, or combined objects such as images and their text descriptions, new algorithms are needed to handle these combined data types and find meaningful clusterings.
Clustering: Clustering is the process of grouping a set of physical or abstract objects into classes of similar objects.

Requirements of clustering in web mining:

1. Scalability
2. Ability to deal with different types of attributes
3. Discovery of clusters with arbitrary shape
4. Minimal requirements for domain knowledge to determine input parameters
5. Ability to deal with noisy data
6. High dimensionality
7. Interpretability and usability

Fig: clustering
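As an illustration of the grouping step, here is a minimal pure-Python k-means sketch over hypothetical session feature vectors (pages viewed, seconds on site). This is for exposition only; a real system would use a library implementation, richer features, and address the requirements listed above:

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: returns (centroids, labels). Educational sketch only."""
    random.seed(seed)
    centroids = random.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # assign each point to the nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # recompute each centroid as the mean of its cluster
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels

# Hypothetical session features: (pages viewed, seconds on site)
sessions = [(2, 30), (3, 45), (2, 40), (20, 600), (22, 650), (19, 580)]
centroids, labels = kmeans(sessions, k=2)
print(labels)  # casual visitors and engaged visitors fall into two clusters
```

With these well-separated inputs, the three short sessions and the three long sessions end up in different clusters.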
Association:
Association analysis identifies items or events that occur (or do not occur) together; it is used to search for frequent patterns. Suppose, for instance, that we are given the AllElectronics relational database of purchases. A web mining system may find association rules such as:

age(X, "20..29") ^ income(X, "20K..29K") => buys(X, "CD player")
[support = 2%, confidence = 60%]
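The support and confidence figures in such a rule can be computed directly by counting transactions. A small sketch (the transaction data is hypothetical, in the spirit of the AllElectronics example):

```python
def rule_stats(transactions, antecedent, consequent):
    """Support and confidence of antecedent -> consequent over transactions."""
    n = len(transactions)
    both = sum(1 for t in transactions
               if antecedent.issubset(t) and consequent.issubset(t))
    ante = sum(1 for t in transactions if antecedent.issubset(t))
    support = both / n                       # P(antecedent and consequent)
    confidence = both / ante if ante else 0  # P(consequent | antecedent)
    return support, confidence

# Hypothetical purchase records, each a set of attribute values and items
transactions = [
    {'age:20-29', 'income:20K-29K', 'CD player'},
    {'age:20-29', 'income:20K-29K', 'CD player'},
    {'age:20-29', 'income:20K-29K', 'laptop'},
    {'age:30-39', 'income:40K-49K', 'CD player'},
]
support, confidence = rule_stats(
    transactions, {'age:20-29', 'income:20K-29K'}, {'CD player'})
print(support, confidence)
```

Here the rule holds in 2 of 4 transactions (support 50%) and in 2 of the 3 transactions matching the antecedent (confidence about 67%).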
4.2 Web Structure Mining:
Web structure mining can be regarded as the process of discovering structure information from the Web. The structure of a typical Web graph consists of Web pages as nodes and hyperlinks as edges connecting related pages. This type of mining can be further divided into two kinds, based on the kind of structural data used.

There has been a significant body of work on hyperlink analysis.

Document structure: In addition, the content within a Web page can be organized in a tree-structured format, based on the various HTML and XML tags within the page. Mining efforts here have focused on automatically extracting document object model (DOM) structures out of documents.
Hyperlinks: A hyperlink is a structural unit that connects a Web page to a different location, either within the same Web page or on a different Web page. A hyperlink that connects to a different part of the same page is called an intra-document hyperlink, and a hyperlink that connects two different pages is called an inter-document hyperlink.
Web structure mining techniques:

Generating a structural summary about the web site and its web pages:
Categorizing the web pages and the related information at the inter-domain level depending upon the hyperlinks, discovering the web page structure, and discovering the nature of the hierarchy of hyperlinks in the website.

Finding information about web pages:
- Retrieving information about the relevance and the quality of a web page.
- Finding the authoritative sources on a topic and its content.
Inference on hyperlinks:
A web page contains not only information but also hyperlinks, which carry a huge amount of annotation. A hyperlink identifies the author's endorsement of the other web page.
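Link-based endorsement of this kind is what algorithms such as PageRank quantify: a page linked to by many (highly ranked) pages accumulates a high rank itself. A minimal iterative sketch over a hypothetical hyperlink graph:

```python
def pagerank(links, damping=0.85, iters=50):
    """Iterative PageRank over a dict page -> list of outgoing links."""
    pages = set(links) | {p for outs in links.values() for p in outs}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page in pages:
            outs = links.get(page, [])
            if outs:
                share = damping * rank[page] / len(outs)
                for target in outs:          # spread rank along hyperlinks
                    new[target] += share
            else:                            # dangling page: spread evenly
                for target in pages:
                    new[target] += damping * rank[page] / len(pages)
        rank = new
    return rank

# Hypothetical hyperlink graph: several pages endorse /home
links = {
    '/a': ['/home'],
    '/b': ['/home', '/a'],
    '/c': ['/home'],
    '/home': ['/a'],
}
rank = pagerank(links)
print(max(rank, key=rank.get))  # '/home' receives the most endorsements
```

The ranks form a probability distribution over pages, and the most-endorsed page scores highest.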
4.3 Web Usage Mining:
Web usage mining is the application of data mining techniques to discover interesting usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Usage data captures the identity or origin of Web users along with their browsing behavior at a Web site. Web usage mining itself can be classified further depending on the kind of usage data considered. The main web usage mining techniques are described below.
4.3.1 Data Preparation:
Data Collection:
Data collection is the first step of web usage mining; the authenticity and integrity of the data directly affect how smoothly the subsequent work is carried out and the quality of the final recommendations of characteristic services. Therefore, scientific, reasonable and advanced techniques must be used to gather the data. At present, web usage mining draws on three main data origins: server data, client data and intermediate data (proxy server data and packet detection).
Data Selection:
Data relevant to the analysis task is retrieved from the web.
Data Cleaning:
The purpose of data cleaning is to eliminate irrelevant items; such techniques are important for any type of web log analysis, not only data mining. According to the purposes of the particular mining application, irrelevant records in the web access log are eliminated during data cleaning. Since the target of web usage mining is to obtain the users' travel patterns, the following two kinds of records are unnecessary and should be removed:

1. Records of graphics, videos and format information. These records have filename suffixes such as GIF, JPEG and CSS, which can be found in the URI field of each record.
2. Records with a failed HTTP status code, identified by examining the status field of every record in the web access log.
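These two cleaning rules translate directly into a filter over log records. A sketch (the record layout and suffix list are illustrative, not from any particular tool):

```python
# Hypothetical cleaning pass implementing the two rules above:
# drop multimedia/format requests by suffix, drop failed requests by status.
IRRELEVANT_SUFFIXES = ('.gif', '.jpeg', '.jpg', '.css', '.js', '.png')

def clean_log(records):
    """Keep only records that describe successful page views."""
    kept = []
    for uri, status in records:
        if uri.lower().endswith(IRRELEVANT_SUFFIXES):
            continue              # rule 1: graphics / format files
        if not (200 <= status < 300):
            continue              # rule 2: failed HTTP status codes
        kept.append((uri, status))
    return kept

records = [('/index.html', 200), ('/logo.gif', 200),
           ('/style.css', 200), ('/page.html', 500), ('/about.html', 200)]
print(clean_log(records))  # only the two successful HTML page views remain
```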
4.3.2 Data Mining
Navigation Patterns:
Web page hierarchy of web site:
Example:
- 70% of users who accessed /company/product2 did so by starting at /company and proceeding through /company/new, /company/products and /company/product1.
- 80% of users who accessed the site started from /company/products.
- 65% of users left the site after four or fewer page references.
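Statistics like these come from counting navigation paths across sessions. A sketch of contiguous-subsequence (path) counting over hypothetical sessions mirroring the /company example:

```python
from collections import Counter

def frequent_paths(sessions, length, min_count):
    """Count contiguous subsequences (paths) of a given length across sessions,
    keeping only those occurring at least min_count times."""
    counts = Counter()
    for session in sessions:
        for i in range(len(session) - length + 1):
            counts[tuple(session[i:i + length])] += 1
    return {path: c for path, c in counts.items() if c >= min_count}

# Hypothetical sessions over the /company page hierarchy
sessions = [
    ['/company', '/company/new', '/company/products', '/company/product1'],
    ['/company', '/company/new', '/company/products', '/company/product2'],
    ['/company/products', '/company/product1'],
]
print(frequent_paths(sessions, length=2, min_count=2))
```

The surviving paths are exactly the frequent subsequences that make up a path profile.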
Sequential Patterns:

Fig: mining results
4.3.3 Web usage data:
The record of what actions a user takes with his mouse and keyboard while
visiting a site.
Sources:
- Server access logs
- Server referrer logs
- Agent logs
- Client-side cookies
- User profiles
- Search engine logs
- Database logs
Transfer/access log: The transfer/access log contains detailed information about each request that the server receives from users' web browsers.

Agent log: The agent log lists the browsers (including version number and platform) that people are using to connect to your server.

Referrer log: The referrer log contains the URLs of pages on other sites that link to your pages. That is, if a user reaches one of the server's pages by clicking on a link from another site, the URL of that site will appear in this log.
Error log: The error log keeps a record of errors and failed requests. A request may fail if the page contains a link to a file that does not exist, or if the user is not authorized to access a specific page or file.
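Before any of these logs can be mined per user, individual hits are typically grouped into sessions. A common heuristic, sketched below on hypothetical hits, starts a new session whenever an IP has been idle for more than 30 minutes:

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # widely used heuristic threshold

def sessionize(hits):
    """Group (ip, timestamp, url) hits into sessions per IP, splitting a
    session whenever two consecutive hits are over the timeout apart."""
    sessions = {}
    for ip, ts, url in sorted(hits, key=lambda h: (h[0], h[1])):
        user_sessions = sessions.setdefault(ip, [])
        if user_sessions and ts - user_sessions[-1][-1][1] <= SESSION_TIMEOUT:
            user_sessions[-1].append((url, ts))   # continue current session
        else:
            user_sessions.append([(url, ts)])     # start a new session
    return sessions

t0 = datetime(2012, 2, 23, 6, 0)
hits = [('66.249.71.6', t0, '/'),
        ('66.249.71.6', t0 + timedelta(minutes=5), '/robots.txt'),
        ('66.249.71.6', t0 + timedelta(hours=2), '/'),   # idle gap: new session
        ('180.76.5.92', t0, '/')]
sessions = sessionize(hits)
print(len(sessions['66.249.71.6']))  # 2 sessions for the first IP
```

The resulting sessions are the input to the path-profile, clustering and prediction techniques described earlier.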
4.3.4 Web Server Data:
These correspond to the user logs collected at the Web server. Typical data collected at a Web server include IP addresses, page references, and the access times of users.
4.3.5 Application Server Data:
Commercial application servers (e.g. WebLogic) have significant features in their frameworks that enable e-commerce applications to be built on top of them with little effort. A key feature is the ability to track various kinds of business events and log them in application server logs.
4.3.6 Application Level Data:
Finally, new kinds of events can always be defined in an application, and logging can be turned on for them, generating histories of these specially defined events.
5. Advantages / Merits:
Web usage mining has many advantages that make this technology attractive to many corporations, including government agencies. The predictive capability of mining applications can benefit society by identifying criminal activities. Companies can establish better customer relationships by giving customers exactly what they need; they can understand customer needs better and react to them faster. Companies can find, attract and retain customers, and can save on production costs by utilizing the acquired insight into customer requirements. This technology has enabled e-commerce to do personalized marketing, which eventually results in higher trade volumes. Government agencies use this technology to classify threats and fight against terrorism. Companies can increase profitability through target pricing based on the profiles created. They can even identify customers who may defect to a competitor, and try to retain them by providing promotional offers, thus reducing the risk of losing those customers.
Further merits:
- Easy to implement.
- Improves the quality of both public and personalized search engines.
- Enables personalized search engines that understand a person's search queries in a personal way by analyzing and profiling the user's search behaviour.
6. Disadvantages/ Demerits:
Some mining algorithms might use controversial attributes such as sex, race, religion, or sexual orientation to categorize individuals. These practices might contravene anti-discrimination legislation. The applications make it hard to identify the use of such controversial attributes, and there is no strong rule against the usage of such algorithms with such attributes. This process could result in the denial of a service or a privilege to an individual based on race, religion or sexual orientation; at present this situation can be avoided only by the high ethical standards maintained by the data mining company. The collected data is made anonymous so that the obtained data and patterns cannot be traced back to an individual. Although it might look as if this poses no threat to one's privacy, much extra information can be inferred by the application by combining two separate pieces of data from the user. Another important concern is that companies collecting the data for a specific purpose might use it for a totally different purpose, which essentially violates the user's interests. Web usage mining by itself does not create issues, but this technology, when used on data of a personal nature, might cause them. The most criticized ethical issue involving web usage mining is the invasion of privacy. Privacy
is considered lost when information concerning an individual is obtained, used, or disseminated, especially if this occurs without the individual's knowledge or consent. The obtained data is analyzed and clustered to form profiles; the data is made anonymous before clustering so that no personal profiles are formed. Even so, these applications de-individualize users by judging them by their mouse clicks. De-individualization can be defined as a tendency to judge and treat people on the basis of group characteristics instead of on their own individual characteristics and merits. The growing trend of selling personal data as a commodity encourages website owners to trade personal data obtained from their sites. This trend has increased the amount of data being captured and traded, increasing the likelihood of one's privacy being invaded. The companies which buy the data are obliged to make it anonymous, and these companies are considered the authors of any specific release of mining patterns. They are legally responsible for the contents of a release; any inaccuracies in it can result in serious lawsuits, but there is no law preventing them from trading the data.
7. Applications:

a. Search engines
b. Similarity measures
c. Ontology
d. Matching techniques
e. Recognition technology
f. Summarization
g. E-commerce
h. Content management
i. Database querying
j. Information aggregation
7.1 Search Engines:

Given the rate of growth of the Web, scalability of search engines is a key issue, as the amount of hardware and network resources needed is large and expensive. In addition, search engines are popular tools, so they face heavy constraints on query answer time. Efficient use of resources can therefore improve both scalability and answer time. One tool for achieving these goals is Web mining, which has three branches: link mining, usage mining, and content mining. One important analysis in all these cases is dynamic behavior. Here we give examples of link and usage mining related to search engines, as well as the related Web dynamics.
7.2 Similarity Measures:

Ranking model construction is an important topic in information retrieval and web mining. Recently, many approaches based on the idea of learning to rank have been proposed for this task, and most of them attempt to score all documents of different queries with a single function. A distributional similarity measure has been proposed for query-dependent ranking. In the query-dependent ranking framework, an individual ranking model is constructed for each training query and its associated documents. When a new query is asked, the documents retrieved for it are ranked according to the scores determined by a joint ranking model which is
combined from the individual models of similar training queries. The distributional similarity measure is used to calculate the similarities between queries. Experimental results show that this method is more effective than other approaches.
7.3 Ontology:

The World Wide Web today provides users access to extremely large websites containing much information of educational and commercial value. Due to the unstructured and semi-structured nature of web pages and the idiosyncratic design of websites, it is a challenging task to develop digital libraries for organising and managing digital content from the web. Web mining research over the last ten years has, on the other hand, made significant progress in categorising and extracting content from the web. An ontology represents a set of concepts and their interrelationships relevant to some knowledge domain; the knowledge provided by an ontology is extremely useful in defining the structure and scope for mining web content.
7.4 Recognition Technology:

The explosive growth of the internet has made it increasingly necessary for users to employ automatic tools to find, extract, filter and evaluate the resources available over the internet. There are powerful tools for finding information by category or by content, such as Yahoo, Google, etc. For these searches we introduce keywords, and the tools determine the web pages that contain those words. In trying to satisfy users' requirements, these consultations often return inconsistent documents, or documents that fulfil the search criteria but not the user's interest.
There is thus a need for new technologies that help us use the content of the web more efficiently. For this reason, in recent years a series of techniques allowing advanced processing of data on the internet have been developed. These techniques carry out a deep analysis in an automatic way, and they belong to the area known as web mining.
7.5 Summarization:

Hypermedia has emerged as a primary means for storing and structuring information, yet due to the continuously increasing size of this infrastructure, it is becoming ever more difficult for users to understand and navigate through such sites. To overcome these obstacles it is essential to use techniques that recover the web author's intentions and superimpose them on the user's retrieval context when summarizing websites.

Although most of the developing world is likely to first access the Internet through mobile phones, mobile devices are constrained by screen space, bandwidth and the user's limited attention span. Single-document summarization techniques have the potential to simplify information consumption on mobile phones by presenting only the most relevant information contained in the document.
7.6 E-commerce:

Nowadays, the web is an important part of our daily life, and it is now the best medium for doing business. Large companies rethink their business strategies, using the web to improve business. Business carried out on the Web offers potential customers or partners a place where their products and specific business can be found. Business presence through a company
web site has several advantages, as it breaks the barriers of time and space compared with a physical office. To differentiate themselves in the Internet economy, winning companies have realized that e-commerce transactions are more than just buying and selling; appropriate strategies are key to improving competitive power. One effective technique used for this purpose is data mining, the process of extracting interesting knowledge from data; web mining is the use of data mining techniques to extract information from web data.
7.7 Content Management:

With the rapid growth in business size, today's businesses orient towards electronic technologies; Amazon.com and eBay.com are some of the major stakeholders in this regard. Unfortunately, the enormous amount of hugely unstructured data on the web, even for a single commodity, has become a cause of ambiguity for consumers. Extracting valuable information from such ever-increasing data is an extremely tedious task and is fast becoming critical to the success of businesses. Web content mining can play a major role in solving these issues. It involves using efficient algorithmic techniques to search and retrieve the desired information from the seemingly impossible-to-search unstructured data on the Internet. Application of web content mining can be very encouraging in the areas of customer relations modeling, billing records, logistics investigations, product cataloguing and quality management. Here we review some very interesting, efficient yet implementable techniques from the field of web content mining and study their impact in areas specific to business user needs, focusing both on the customer as well as the
producer. The techniques reviewed include mining by developing a knowledge-base repository of the domain, iterative refinement of user queries for personalized search, using a graph-based approach for the development of a web crawler, and filtering information for personalized search using website captions. These techniques have been analyzed and compared on the basis of their execution time and the relevance of the results they produce for a particular search.
7.8 Information Aggregation:

Web data extraction services provide robust, cutting-edge solutions for extracting data from websites. Web SQL, for example, is used for creating turnkey web extraction applications such as price collectors and patent information aggregators.

XML Miner is a system and class library for mining data and text expressed in XML, extracting knowledge and re-using it in products and applications in the form of fuzzy-logic expert system rules.
8. Conclusion

The purpose of this report is to advocate the discovery of actionable knowledge from Web logs. We have presented several examples of actionable Web log mining. In our future work, we will further explore other types of actionable knowledge in Web applications, including the extraction of content knowledge and
knowledge integration from multiple Web sites. The first method is to mine a Web log for Markov models that can be used to improve the caching and prefetching of Web objects. A second method is to use the mined knowledge to build better, adaptive user interfaces. A third application is to use knowledge mined from a query web log to improve the search performance of an Internet search engine. Actionable knowledge is particularly attractive for Web applications because it can be consumed by machines rather than human developers. Furthermore, the effectiveness of the knowledge can be immediately put to the test, placing the merits of each type of knowledge, and of the methods for discovering it, under more objective scrutiny than before.
9. References
1. Qingtian Han, Xiaoyan Gao, Wenguo Wu, "Study on Web Mining Algorithm Based on Usage Mining", 9th International Conference on Computer-Aided Industrial Design and Conceptual Design (CAID/CD 2008), pp. 1121-1124, 2008.
2. Heydari, M., Helal, R.A., Ghauth, K.I., "A Graph-Based Web Usage Mining Method Considering Client Side Data", International Conference on Electrical Engineering and Informatics (ICEEI '09), Vol. 1, pp. 147-153, 2009.
3. Salin, S., Senkul, P., "Using Semantic Information for Web Usage Mining Based Recommendation", 24th International Symposium on Computer and Information Sciences (ISCIS 2009), pp. 236-241, 2009.
4. Chih-Hung Wu, Yen-Liang Wu, Yuan-Ming Chang, Ming-Hung Hung, "Web Usage Mining on the Sequences of Clicking Patterns in a Grid Computing Environment", International Conference on Machine Learning and Cybernetics (ICMLC), Vol. 6, pp. 2909-2914, 2010.
5. Gang Fang, Jia-Le Wang, Hong Ying, Jiang Xiong, "A Double Algorithm of Web Usage Mining Based on Sequence Number", International Conference on Information Engineering and Computer Science (ICIECS), pp. 1-4, 2009.
6. Raghavendra, P.S., Chowdhury, S.R., Kameswari, S.V., "Comparative Study of Neural Networks and k-Means Classification in Web Usage Mining", International Conference for Internet Technology and Secured Transactions (ICITST), pp. 1-7, 2010.
7. Hussain, T., Asghar, S., Fong, S., "A Hierarchical Cluster Based Preprocessing Methodology for Web Usage Mining", 6th International Conference on Advanced Information Management and Service (IMS), pp. 472-477, 2010.
8. Khosravi, M., Tarokh, M.J., "Dynamic Mining of Users' Interest Navigation Patterns Using Naive Bayesian Method", IEEE International Conference on Intelligent Computer Communication and Processing (ICCP), pp. 119-122, 2010.
9. Etminani, K., Delui, A.R., Yanehsari, N.R., Rouhani, M., "Web Usage Mining: Discovery of the Users' Navigational Patterns Using SOM", First International Conference on Networked Digital Technologies (NDT '09), pp. 224-249, 2009.
10. Shinde, S.K., Kulkarni, U.V., International Conference on Advanced Computer Theory and Engineering, pp. 973-977, 2008.
11. Yang Bin, Dong Xiangjun, Shi Fufu, "Research of Web Usage Mining Based on Negative Association Rules", International Forum on Computer Science-Technology and Applications (IFCSTA '09), Vol. 1, pp. 196-199, 2009.
12. Hussain, T., Asghar, S., Masood, N., "Web Usage Mining: A Survey on Preprocessing of Web Log File", International Conference on Information and Emerging Technologies (ICIET), pp. 1-6, 2010.