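Southern Taiwan University of Science and Technology (南台科技大學), Department of Computer Science and Information Engineering (資訊工程系)

A web page usage prediction scheme using sequence indexing and clustering techniques

Adviser: Yu-Chiang Li
Speaker: Gung-Shian Lin
Date: 2010/10/15

Data & Knowledge Engineering, Vol. 69, No. 4, pp. 371-382, 2010.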

Outline

1. Introduction
2. Definitions and background
3. Prediction/recommendation model
4. WST utilization – recommendation/prediction method
5. Evaluation
6. Conclusions and open issues

1. Introduction

We consider the problem of web page usage prediction in a web site by modeling users’ navigation history and web page content with weighted suffix trees.

We focus on the area of web data mining that exploits the navigational traces of users in order to extract knowledge about their preferences and behavior.

We propose two novel methods for modeling the user navigation history.

The first method exploits knowledge extracted only from the user access sequences in the web server log file.

The second method enhances the first one by utilizing web page content during the phase of access pattern extraction.

2. Definitions and background

3. Prediction/recommendation

WAS maintenance

Either the web server is programmed to store each web access sequence (WAS) in a separate repository, or an extraction process is run over the log files at the beginning of the preprocessing procedure.

Assume, for the sake of description, that there are N sequences forming the set S = {WAS_1, WAS_2, …, WAS_N}.
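A minimal sketch of the log-extraction option, assuming the log has already been parsed into hypothetical `(user, timestamp, page)` records. The 30-minute inactivity gap used to split one user's visits into separate sequences is an illustrative assumption, not something the slides specify.

```python
from collections import defaultdict

def extract_was(records, max_gap=1800):
    """Group (user, timestamp, page) records into web access sequences.

    A new sequence starts whenever a user is idle for more than max_gap
    seconds; the 30-minute default is an illustrative assumption.
    """
    by_user = defaultdict(list)
    for user, ts, page in records:
        by_user[user].append((ts, page))

    sequences = []
    for user in sorted(by_user):
        visits = sorted(by_user[user])  # order each user's visits by time
        current = [visits[0][1]]
        for (prev_ts, _), (ts, page) in zip(visits, visits[1:]):
            if ts - prev_ts > max_gap:
                sequences.append(current)  # close the previous sequence
                current = []
            current.append(page)
        sequences.append(current)
    return sequences

# Hypothetical parsed log records: (user, seconds, page)
log = [
    ("u1", 0, "A"), ("u1", 60, "B"), ("u2", 30, "A"),
    ("u1", 4000, "C"), ("u2", 90, "D"),
]
S = extract_was(log)
```

Here `S` plays the role of the set S above, with one list of pages per sequence.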

WAS clustering

Our decision to use k-windows as the clustering method was driven by a variety of reasons, such as the enhanced quality of the produced clusters and its inherently parallel nature.

(a) Sequential movements M2, M3, M4 of initial window M1. (b) Sequential enlargements E1, E2 of window M4.
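The movement step in (a) can be sketched as follows, assuming a square window described by a center and a half-width that is repeatedly re-centered on the mean of the points it encloses until it stops moving. The names `half_width` and `tol` and the sample points are illustrative assumptions, and the enlargement step in (b) is not shown.

```python
def points_in_window(points, center, half_width):
    """Return the points that fall inside the axis-aligned window."""
    return [p for p in points
            if all(abs(x - c) <= half_width for x, c in zip(p, center))]

def move_window(points, center, half_width, tol=1e-3, max_iter=100):
    """Re-center the window on the mean of the enclosed points until the
    shift of the center drops to tol or below (tol is an assumed threshold)."""
    center = tuple(center)
    for _ in range(max_iter):
        inside = points_in_window(points, center, half_width)
        if not inside:
            break  # an empty window has nowhere to move
        mean = tuple(sum(col) / len(inside) for col in zip(*inside))
        shift = max(abs(m - c) for m, c in zip(mean, center))
        center = mean
        if shift <= tol:
            break
    return center

# Three nearby points and one far-away point (hypothetical data)
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0)]
final_center = move_window(points, center=(0.5, 0.5), half_width=1.0)
```

With this data the window settles near (1, 1), the mean of the three nearby points, and never reaches the far-away point.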

(a) W1 and W2 satisfy the similarity condition and W1 is deleted. (b) W3 and W4 satisfy the merge operation and are considered to belong to the same cluster. (c) W5 and W6 have a small overlap and capture two different clusters.

An example of the application of the k-windows algorithm.

WAS clustering exploiting web page content

Direct sequence alignment (DSA)

In the alignment (global or local) of a pair of sequences, the score for aligning two characters (web pages) combines the importance label of each page with the similarity metric between them.

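\[
Score(P_i, P_j) =
\begin{cases}
1 \cdot \cos(T_{P_i}, T_{P_j}), & r(P_i) = r(P_j) \\
\frac{1}{2} \cdot \cos(T_{P_i}, T_{P_j}), & r(P_i) \neq r(P_j) \\
-1, & (r(P_i) = U \text{ or } r(P_j) = U) \text{ and } r(P_i) \neq r(P_j)
\end{cases}
\]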
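A minimal sketch of the DSA scoring function, assuming r(P) is the importance label of a page, U marks an unimportant page, and T_P is a numeric content vector. The labels "H" and "M", the value -1 for the unimportant case, and the precedence of that case over the 1/2 case are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def dsa_score(label_i, vec_i, label_j, vec_j, unimportant="U"):
    """Score for aligning two pages from their labels and content vectors."""
    if label_i == label_j:
        return cosine(vec_i, vec_j)           # same importance label
    if unimportant in (label_i, label_j):
        return -1.0                           # assumed penalty when one page is unimportant
    return 0.5 * cosine(vec_i, vec_j)         # different labels, both important

# Hypothetical labels: "H" (high), "M" (medium), "U" (unimportant)
same = dsa_score("H", [1, 0], "H", [1, 0])
mixed = dsa_score("H", [1, 0], "M", [1, 0])
penalised = dsa_score("H", [1, 0], "U", [1, 0])
```

Such a function can be passed as the substitution score to any standard global or local pairwise alignment routine.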

Sequence alignment with clustering preprocess (SACP)

Another way to incorporate the content of web pages into the sequence alignment algorithm is to first cluster the web pages by content.

\[
Score(P_i, P_j) =
\begin{cases}
1, & P_i, P_j \text{ in the same cluster} \\
0, & P_i, P_j \text{ not in the same cluster and } r(P_i), r(P_j) \neq U \\
-1, & r(P_i) = U \text{ or } r(P_j) = U
\end{cases}
\]
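A minimal sketch of how a cluster-based score could drive a global alignment, assuming pages have already been assigned content clusters and importance labels. The dictionaries `cluster` and `label`, the gap penalty of -1, the value -1 for unimportant pages, and the choice to test the same-cluster case first are all illustrative assumptions. The alignment itself is the standard Needleman-Wunsch recurrence, reduced to the score.

```python
def sacp_score(page_i, page_j, cluster, label, unimportant="U"):
    """Cluster-based score for aligning two pages."""
    if cluster[page_i] == cluster[page_j]:
        return 1
    if unimportant in (label[page_i], label[page_j]):
        return -1   # assumed penalty when one page is unimportant
    return 0

def global_alignment_score(seq_a, seq_b, score, gap=-1):
    """Needleman-Wunsch score of two page sequences (gap penalty is assumed)."""
    prev = [j * gap for j in range(len(seq_b) + 1)]
    for i, a in enumerate(seq_a, start=1):
        curr = [i * gap]
        for j, b in enumerate(seq_b, start=1):
            curr.append(max(prev[j - 1] + score(a, b),  # align a with b
                            prev[j] + gap,               # gap in seq_b
                            curr[j - 1] + gap))          # gap in seq_a
        prev = curr
    return prev[-1]

# Hypothetical content clusters and importance labels
cluster = {"home": 0, "index": 0, "news": 1, "sports": 1, "ads": 2}
label = {"home": "H", "index": "H", "news": "M", "sports": "M", "ads": "U"}

def pair_score(a, b):
    return sacp_score(a, b, cluster, label)

total = global_alignment_score(["home", "news"], ["index", "sports"], pair_score)
```

The two sequences share no page, yet they align perfectly because each pair of pages falls in the same content cluster.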

WAS cluster representation

When the WAS clustering procedure is over, each of the clusters is expressed as a weighted sequence.

Alternatively, progressive or iterative pairwise alignment could be used to produce the multiple sequence alignment.
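One way to picture a weighted sequence is as a list of positions, each holding the pages seen there together with their relative frequencies. The sketch below assumes the sequences of a cluster are already aligned to equal length with "-" as the gap symbol. Both the gap symbol and the treatment of gaps (they reduce the total weight at a position) are assumptions for illustration.

```python
from collections import Counter

def weighted_sequence(aligned, gap="-"):
    """Turn equal-length aligned sequences into per-position page frequencies."""
    n = len(aligned)
    columns = []
    for column in zip(*aligned):
        counts = Counter(page for page in column if page != gap)
        columns.append({page: c / n for page, c in counts.items()})
    return columns

# Hypothetical cluster of four aligned access sequences
cluster = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "-", "C"],
    ["A", "B", "C"],
]
ws = weighted_sequence(cluster)
# ws[2] maps "C" to 0.75 and "D" to 0.25
```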

4. WST utilization – recommendation/prediction method

The recommendation/prediction algorithm works as follows: when a new user arrives in the system, the user is assigned to the root of the generalized weighted suffix tree (gWST).

Weighted suffix tree navigation

We have a sample run of the recommendation algorithm.

Recommendation method run. Numbers in the nodes express their weight.
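The navigation idea can be sketched with a simplified stand-in for the gWST: a plain trie over all suffixes of a few weighted sequences, where each node accumulates weight. This is not the gWST construction itself. The `Node` class, the fallback to the root when a visited page has no matching child, and the rule of recommending the heaviest children are all assumptions for illustration.

```python
class Node:
    """Trie node holding an accumulated weight and its children by page."""
    def __init__(self):
        self.weight = 0.0
        self.children = {}

def add_sequence(root, pages, weight=1.0):
    """Insert every suffix of pages, adding weight along each path."""
    for start in range(len(pages)):
        node = root
        for page in pages[start:]:
            node = node.children.setdefault(page, Node())
            node.weight += weight

def recommend(root, visited, top_n=1):
    """Follow the visited pages from the root and return the heaviest children."""
    node = root
    for page in visited:
        # Assumption: restart from the root when the current node has no such child.
        nxt = node.children.get(page) or root.children.get(page)
        node = nxt if nxt is not None else root
    ranked = sorted(node.children.items(),
                    key=lambda kv: (-kv[1].weight, kv[0]))
    return [page for page, _ in ranked[:top_n]]

root = Node()  # a new user starts at the root
add_sequence(root, ["A", "B", "C"], weight=0.6)
add_sequence(root, ["A", "B", "D"], weight=0.4)
next_pages = recommend(root, ["A", "B"])
```

After visiting A and then B, the heavier child C is recommended ahead of D.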

5. Evaluation

Evaluation of the access-based method

We compare our experimental results with those of “A web page prediction model based on click-stream tree representation of user behavior”.

Evaluation of the access- and content-based methods

The context of the experiment was exactly the same as in the evaluation described in the previous section.

6. Conclusions and open issues

We have proposed various techniques for predicting web page usage patterns by modeling the users’ navigation history using string processing techniques.

Future work includes different ways of modeling web user access patterns.
