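Southern Taiwan University of Science and Technology (南台科技大學), Department of Computer Science and Information Engineering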
A web page usage prediction scheme using sequence indexing and clustering techniques
Adviser: Yu-Chiang Li
Speaker: Gung-Shian Lin
Date: 2010/10/15
Data & Knowledge Engineering, Vol. 69, No. 4, pp. 371-382, 2010.
Outline
1. Introduction
2. Definitions and background
3. Prediction/recommendation model
4. WST utilization – recommendation/prediction method
5. Evaluation
6. Conclusions and open issues
1. Introduction
We consider the problem of web page usage prediction in a web site by modeling users’ navigation history and web page content with weighted suffix trees.
We focus on the area of web data mining that exploits users' navigational traces to extract knowledge about their preferences and behavior.
We propose two novel methods for modeling the user navigation history.
The first method exploits knowledge extracted only from user access sequences in the web server log file.
The second method enhances the first one by utilizing web page content during the phase of access pattern extraction.
2. Definitions and background
3. Prediction/recommendation
WAS maintenance
Either we configure the web server to store each WAS (web access sequence) in a separate repository, or we program an extraction process that pulls the sequences from the log files at the beginning of the preprocessing procedure.
Assume, for the sake of description, that there are N sequences forming the set $S = \{WAS_1, WAS_2, \ldots, WAS_N\}$.
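The slides do not show how the WAS are extracted from the log. A minimal sketch, assuming simplified log records of the form (client id, ISO timestamp, page) and a 30-minute inactivity timeout that splits sessions (both are assumptions, not the paper's stated procedure), could build S like this:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity gap that ends a session; the slides give no value.
SESSION_TIMEOUT = timedelta(minutes=30)


def extract_was(records, timeout=SESSION_TIMEOUT):
    """Group (client_id, iso_timestamp, page) records into web access sequences."""
    by_client = defaultdict(list)
    for client, ts, page in records:
        by_client[client].append((datetime.fromisoformat(ts), page))

    sequences = []
    for client in sorted(by_client):
        visits = sorted(by_client[client])  # chronological order per client
        current = [visits[0][1]]
        for (prev_ts, _), (ts, page) in zip(visits, visits[1:]):
            if ts - prev_ts > timeout:  # long pause: close the current session
                sequences.append(current)
                current = []
            current.append(page)
        sequences.append(current)
    return sequences  # the set S = {WAS_1, ..., WAS_N}
```

With records for client 10.0.0.1 at 10:00 (A), 10:02 (B) and 11:00 (C), and for client 10.0.0.2 at 10:03 (A), the result is [["A", "B"], ["C"], ["A"]]: the 58-minute pause splits the first client's visits into two sequences.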
WAS clustering
Our decision to use k-windows as the clustering method was driven by several reasons, such as the improved quality of the produced clusters and the algorithm's inherently parallel nature.
(a) Sequential movements M2, M3, M4 of the initial window M1. (b) Sequential enlargements E1, E2 of window M4.
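The movement and enlargement steps in the figure can be sketched as below. This is a simplified illustration of k-windows, not the paper's implementation: the thresholds THETA_V, THETA_E and THETA_C are assumed values, windows are axis-aligned boxes given by a center and per-dimension half-widths, and how each WAS is mapped to a numeric point is not covered here.

```python
# Assumed thresholds (the slides do not give values); THETA_E is large
# only so that a small integer-grid example grows visibly.
THETA_V = 1e-3  # movement stops once the center shifts by less than this
THETA_E = 0.5   # relative enlargement tried on one dimension at a time
THETA_C = 0.05  # keep an enlargement only if the point count grows by > 5%


def inside(points, center, half):
    """Points lying in the axis-aligned window with the given center and half-widths."""
    return [p for p in points
            if all(abs(p[d] - center[d]) <= half[d] for d in range(len(center)))]


def move(points, center, half, max_iter=100):
    """Movement: re-center the window on the mean of its points until it stabilizes."""
    center = tuple(center)
    for _ in range(max_iter):
        pts = inside(points, center, half)
        if not pts:
            break
        mean = tuple(sum(p[d] for p in pts) / len(pts) for d in range(len(center)))
        shift = max(abs(m - c) for m, c in zip(mean, center))
        center = mean
        if shift < THETA_V:
            break
    return center


def enlarge(points, center, half):
    """Enlargement: widen one dimension at a time, keeping changes that add enough points."""
    half = list(half)
    center = move(points, center, half)
    grew = True
    while grew:
        grew = False
        for d in range(len(center)):
            before = len(inside(points, center, half))
            trial_half = half.copy()
            trial_half[d] *= 1 + THETA_E
            trial_center = move(points, center, trial_half)  # move after enlarging
            after = len(inside(points, trial_center, trial_half))
            if after > before * (1 + THETA_C):
                center, half, grew = trial_center, trial_half, True
    return center, half
```

For points on the integer grid 0..10 × 0..10 plus an outlier at (20, 20), a window started at (5, 5) with half-width 1.5 grows until it covers the whole grid and stops before reaching the outlier.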
(a) W1 and W2 satisfy the similarity condition, and W1 is deleted. (b) W3 and W4 satisfy the merge condition and are considered to belong to the same cluster. (c) W5 and W6 have a small overlap and capture two different clusters.
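The three cases in the figure correspond to tests on how many points two overlapping windows share. A minimal sketch, assuming the overlap ratio is measured against the window with fewer points and assuming threshold values THETA_S and THETA_M (neither is given in the slides):

```python
# Assumed thresholds; the slides do not give values.
THETA_S = 0.8  # overlap ratio at or above which two windows are "similar"
THETA_M = 0.1  # overlap ratio at or above which two windows are merged


def inside_idx(points, center, half):
    """Indices of the points lying in an axis-aligned window."""
    return {i for i, p in enumerate(points)
            if all(abs(p[d] - center[d]) <= half[d] for d in range(len(center)))}


def classify_pair(points, w1, w2, theta_s=THETA_S, theta_m=THETA_M):
    """Classify two windows, each given as (center, half_widths)."""
    in1 = inside_idx(points, *w1)
    in2 = inside_idx(points, *w2)
    if not in1 or not in2:
        return "separate"
    # Share of the smaller window's points that also lie in the other window.
    ratio = len(in1 & in2) / min(len(in1), len(in2))
    if ratio >= theta_s:
        return "similar"   # keep the window with more points, delete the other
    if ratio >= theta_m:
        return "merge"     # both windows belong to the same cluster
    return "separate"      # little or no overlap: two different clusters
```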
An example of the application of the k-windows algorithm.
WAS clustering exploiting web page content
Direct sequence alignment (DSA)
In the alignment (global or local) of a pair of sequences, the scoring function for aligning two characters (web pages) combines the importance label of each page with the similarity metric between them.
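$$
\mathrm{Score}(P_i, P_j) =
\begin{cases}
1 \cdot \cos(T_i, T_j), & r(P_i) = r(P_j) \\
\frac{1}{2} \cdot \cos(T_i, T_j), & r(P_i) \neq r(P_j) \\
-1, & (r(P_i) = U \text{ or } r(P_j) = U) \text{ and } r(P_i) \neq r(P_j)
\end{cases}
$$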
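To make the DSA score concrete, the sketch below plugs it into a standard Needleman-Wunsch global alignment. Assumptions not taken from the slides: T_i is a numeric content vector (for example term frequencies) of page P_i, the label U marks unimportant pages while other label values such as "I" and "L" are hypothetical, the gap penalty of -1 is arbitrary, and the -1 case is checked first because its condition overlaps the 1/2 cos case.

```python
import math

U = "U"  # assumed label for unimportant pages


def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def dsa_score(pi, pj, label, text):
    """DSA score of aligning page pi with page pj."""
    ri, rj = label[pi], label[pj]
    if ri != rj and U in (ri, rj):
        return -1.0  # assumed to take precedence over the 1/2 * cos case
    c = cosine(text[pi], text[pj])
    return c if ri == rj else 0.5 * c


def global_align_score(s, t, score, gap=-1.0):
    """Needleman-Wunsch global alignment score of two page sequences."""
    n, m = len(s), len(t)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(s[i - 1], t[j - 1]),
                           dp[i - 1][j] + gap,   # page of s aligned to a gap
                           dp[i][j - 1] + gap)   # page of t aligned to a gap
    return dp[n][m]
```

Because identical, identically labelled pages score 1.0, aligning the sequence AB with itself scores 2.0 when both pages have nonzero content vectors.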
Sequence alignment with clustering preprocess (SACP)
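Another way to incorporate the content of web pages into the sequence alignment algorithm is to cluster the web pages by content first. The alignment score then depends on whether two pages fall into the same content cluster:
$$
\mathrm{Score}(P_i, P_j) =
\begin{cases}
1, & P_i, P_j \text{ in the same cluster} \\
0, & P_i, P_j \text{ not in the same cluster and } r(P_i), r(P_j) \neq U \\
-1, & r(P_i) = U \text{ or } r(P_j) = U
\end{cases}
$$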
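A sketch of SACP under stated assumptions: the slides do not name the content-clustering method, so a simple greedy leader clustering over cosine similarity (threshold 0.5, hypothetical) stands in for it, page content is assumed to be a numeric vector, and the U case is checked first where it overlaps the same-cluster case. The resulting score can be plugged into any pairwise alignment routine, such as Needleman-Wunsch.

```python
import math

U = "U"  # assumed label for unimportant pages


def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def leader_clusters(text, threshold=0.5):
    """Greedy leader clustering of pages by content (a stand-in, not the paper's method)."""
    leaders, cluster = [], {}
    for page in sorted(text):
        for k, lead in enumerate(leaders):
            if cosine(text[page], text[lead]) >= threshold:
                cluster[page] = k
                break
        else:  # no existing leader is similar enough: open a new cluster
            leaders.append(page)
            cluster[page] = len(leaders) - 1
    return cluster


def sacp_score(pi, pj, cluster, label):
    """SACP score of aligning page pi with page pj."""
    if label[pi] == U or label[pj] == U:
        return -1  # assumed to take precedence when the cases overlap
    return 1 if cluster[pi] == cluster[pj] else 0
```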
WAS cluster representation
When the WAS clustering procedure is complete, each cluster is expressed as a weighted sequence. Alternatively, one could use progressive or iterative pairwise alignment to produce the multiple sequence alignment.
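The weighted-sequence representation can be illustrated by counting, column by column, which pages occur in an already aligned cluster. A minimal sketch, assuming the cluster members have been aligned to equal length with "-" as the gap symbol and that each weight is the page's share of all cluster members (so gaps simply lower the weights):

```python
from collections import Counter
from fractions import Fraction

GAP = "-"  # assumed gap symbol produced by the alignment step


def weighted_sequence(aligned):
    """Turn an aligned cluster of WAS into a list of {page: weight} columns."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must be aligned to the same length")
    columns = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned if s[col] != GAP)
        # Weight = share of cluster members with this page at this position.
        columns.append({page: Fraction(n, len(aligned))
                        for page, n in sorted(counts.items())})
    return columns
```

For the aligned cluster ABC, AB-, AD- the columns are {A: 1}, {B: 2/3, D: 1/3} and {C: 1/3}.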
4. WST utilization – recommendation/prediction method
The recommendation/prediction algorithm works as follows: when a new user arrives in the system, the user is assigned to the root of the generalized weighted suffix tree (gWST).
Weighted suffix tree navigation
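The navigation idea can be sketched with a plain, uncompressed suffix trie in place of the generalized weighted suffix tree. Assumptions not taken from the slides: node weights are suffix occurrence counts, the k heaviest children of the user's current node are recommended, and a page with no matching child restarts navigation from the root.

```python
class Node:
    """Trie node; weight counts how many suffixes pass through it."""

    def __init__(self):
        self.children = {}
        self.weight = 0


def build_suffix_trie(sequences):
    """Insert every suffix of every sequence into an uncompressed trie."""
    root = Node()
    for seq in sequences:
        for start in range(len(seq)):
            node = root
            for page in seq[start:]:
                node = node.children.setdefault(page, Node())
                node.weight += 1
    return root


class Navigator:
    """Tracks one user's position in the trie and recommends next pages."""

    def __init__(self, root, k=2):
        self.root, self.node, self.k = root, root, k  # new users start at the root

    def visit(self, page):
        nxt = self.node.children.get(page)
        if nxt is None:
            # No continuation from here: restart from the root (assumed rule).
            nxt = self.root.children.get(page, self.root)
        self.node = nxt

    def recommend(self):
        # Heaviest children first; ties broken alphabetically.
        ranked = sorted(self.node.children.items(),
                        key=lambda kv: (-kv[1].weight, kv[0]))
        return [page for page, _ in ranked[:self.k]]
```

For the sequences ABC, ABD and BC, a new user at the root is recommended B and then A; after visiting A and then B, the recommendations are C and D.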
A sample run of the recommendation algorithm is shown below.
Recommendation method run. Numbers in the nodes express their weight.
5. Evaluation
Evaluation of the access-based method
We compare our experimental results with those of "A web page prediction model based on click-stream tree representation of user behavior".
Evaluation of the access- and content-based methods
The experimental setup was exactly the same as in the evaluation described in the previous section.
6. Conclusions and open issues
We have proposed various techniques for predicting web page usage patterns by modeling users' navigation history with string processing techniques.
Future work includes different ways of modeling web user access patterns.