IMPROVING PERSONALIZED SEARCH ON SOCIAL WEB BASED ON SIMILARITIES BETWEEN USERS
Zhenghua Xu*, Thomas Lukasiewicz*, Oana Tifrea-Marciuska*
* Department of Computer Science, University of Oxford, UK {zhenghua.xu,thomas.lukasiewicz,oana.tifrea}@cs.ox.ac.uk
SUM 2014
Social Web Search Personalization
• Tags are valuable resources for Social Web personalization
– Good summaries of the corresponding documents
– Ideal data for privacy-enhanced personalization
• Collaborative tagging on the Social Web is called folksonomy.
Example
A folksonomy
• Users and documents
• Tags annotated by users to documents
[Figure: the users Alice, Bob, and Carl annotate the documents d1 (English comedy movie), d2 (Chinese action movie), and d3 (Chinese comedy movie) with the tags “Comedy” and “Action”.]
Personalization using folksonomy
State-of-the-art approaches that use social tags to personalize search on the Social Web generally rely on the similarity between two profiles:
• User profile (the tags assigned by a user to all online documents)
– Characterizes the user’s preferences (e.g., pAlice)
• General document profile (the tags assigned to a document by all users)
– Characterizes the social summary of the online document (e.g., pd1)
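The two profiles above can be sketched as tag-count vectors built from bookmarks. This is a minimal sketch, assuming each bookmark is a (user, tag, document) triple and that a tag's weight is its raw count; the function name `build_profiles` and the sample data are hypothetical, and the original work may weight tags differently.

```python
from collections import Counter, defaultdict

def build_profiles(bookmarks):
    """Build user profiles and general document profiles from bookmarks.

    bookmarks: iterable of (user, tag, document) triples.
    Returns two dicts mapping a user (or document) to a Counter of tags.
    Assumption: the weight of a tag is simply how often it occurs.
    """
    user_profiles = defaultdict(Counter)
    doc_profiles = defaultdict(Counter)
    for user, tag, doc in bookmarks:
        user_profiles[user][tag] += 1   # tags this user assigned to any document
        doc_profiles[doc][tag] += 1     # tags any user assigned to this document
    return user_profiles, doc_profiles

# Hypothetical bookmarks, for illustration only.
sample_bookmarks = [
    ("u1", "comedy", "doc_a"),
    ("u1", "comedy", "doc_b"),
    ("u2", "action", "doc_b"),
]
sample_user_profiles, sample_doc_profiles = build_profiles(sample_bookmarks)
print(dict(sample_user_profiles["u1"]))
print(dict(sample_doc_profiles["doc_b"]))
```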
Similarity measure
Cosine similarity
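As an illustration, cosine similarity between two tag profiles can be computed as below. This sketch assumes profiles are sparse dictionaries mapping tags to weights; returning 0 for an empty profile is a convention chosen here, not something stated in the slides, and the sample profiles are hypothetical.

```python
import math

def cosine_similarity(p, q):
    """Cosine similarity between two sparse tag-weight vectors (dicts)."""
    dot = sum(weight * q.get(tag, 0.0) for tag, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # convention chosen here for empty profiles
    return dot / (norm_p * norm_q)

# Hypothetical profiles, for illustration only.
p_user = {"comedy": 2, "action": 1}
p_doc = {"comedy": 1}
print(cosine_similarity(p_user, p_doc))
```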
Example
• Carl issues the query “Interesting Chinese film”
• The desired personalized ranking is d3 > d1 > d2.
[Figure: Carl and the documents d1 (English comedy movie), d2 (Chinese action movie), and d3 (Chinese comedy movie), with the tags “Comedy” and “Action”.]
State of the art: UP-PR
User Profile Personalized Ranking (UP-PR) [1]
• The personalized ranking function combines two terms:
– Score(q, d), the non-personalized textual matching score between query and document;
– Sim(pu, pd), the personalizing factor measuring the similarity between the user profile and the general document profile.
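The ranking formula itself is not preserved in this text. The sketch below assumes a linear combination weighted by α, namely α · Sim(pu, pd) + (1 − α) · Score(q, d); this form is an assumption made for illustration and should be checked against [1]. The function names and the sample values are hypothetical.

```python
def up_pr_score(sim_user_doc, text_score, alpha=0.5):
    """Assumed form: alpha * Sim(pu, pd) + (1 - alpha) * Score(q, d)."""
    return alpha * sim_user_doc + (1.0 - alpha) * text_score

def rank_documents(candidates, alpha=0.5):
    """Order document ids by descending combined score.

    candidates: dict mapping a document id to a (Sim, Score) pair.
    """
    return sorted(
        candidates,
        key=lambda doc: up_pr_score(*candidates[doc], alpha=alpha),
        reverse=True,
    )

# Hypothetical (Sim, Score) pairs, not the values behind the slides.
sample_candidates = {"doc_a": (0.9, 0.1), "doc_b": (0.2, 0.4)}
print(rank_documents(sample_candidates))
```

With α = 0.5, as in the slides' examples, both terms contribute equally, so a document with a weak textual score can only rank first if its profile similarity is correspondingly high.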
Example UP-PR
• With α = 0.5, Score(q, d1) = 0.68, Score(q, d2) = 0.55, and Score(q, d3) = 0.5, UP-PR computes a ranking score for each document.
• The resulting personalized ranking is d1 > d3 > d2.
• The desired ranking is d3 > d1 > d2.
Example UP-PR
• This ranking (d1 > d3 > d2) is intuitively inaccurate because:
– Sim(pCarl, pd3) should have a value similar to Sim(pCarl, pd1);
– Score(q, d3) and Score(q, d2) should be the highest textual matching scores.
Query: “Interesting Chinese film”
State of the art: SoPRa
Social Personalized Ranking (SoPRa) [2]
• The personalized ranking function combines three terms:
– Sim(pu, pd), the personalizing factor measuring the similarity between the user profile and the general document profile;
– Sim(q, pd), the social matching score, which measures how relevant the social summary of a document d is to q;
– Score(q, d), the non-personalized textual matching score between query and document.
Example SoPRa
• With α = β = δ = 0.5, Score(q, d1) = 0.68, Score(q, d2) = 0.55, and Score(q, d3) = 0.5, SoPRa computes a ranking score for each document.
• The resulting personalized ranking is d1 > d3 > d2, with only a narrow gap between d1 and d3.
• The desired ranking is d3 > d1 > d2.
• Score(q, d3) is low because d3 is an online video with little text.
Why does it not work?
For the query “Interesting Chinese film” we want d3 > d1 > d2.
• The general document profile pd3 does not correctly characterize Carl’s real perception of d3: tags from all users are treated equally, and the tag from Bob introduces a bias.
• Carl did not tag d3, so the information used for preference modeling is not comprehensive.
• Do not give tags from all users equal importance in the document profile.
• Extend the user profile with more useful information.
Reasons
• Different users have different perceptions of the same document
– Not all tags assigned by other users are equally helpful for summarizing a user’s real perception of a document
– The general document profile, which treats tags from all users with equal importance, cannot properly summarize a specific user’s personal perception
• Online annotations are sparse
– The user profile, based only on the tags assigned by the corresponding user, may not contain sufficient information to comprehensively characterize the user’s preferences
Our approach: D-PR
Dual Personalized Ranking
• Two novel profiles
– Personalized document profile: each user has a personalized document profile that characterizes his/her perception of a document
– Extended user profile: the sum of all personalized document profiles of u, which characterizes u’s preferences more comprehensively
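The extended user profile described above can be sketched as a tag-wise sum. This sketch assumes each personalized document profile of u is a dictionary of tag weights keyed by document id; the function name `extended_user_profile` and the sample weights are hypothetical.

```python
from collections import Counter

def extended_user_profile(personalized_doc_profiles):
    """Sum all personalized document profiles of one user, tag by tag.

    personalized_doc_profiles: dict mapping a document id to a
    {tag: weight} dict (the user's personalized profile of that document).
    """
    extended = Counter()
    for profile in personalized_doc_profiles.values():
        for tag, weight in profile.items():
            extended[tag] += weight
    return dict(extended)

# Hypothetical personalized document profiles, for illustration only.
sample_profiles = {
    "doc_a": {"comedy": 1.0},
    "doc_b": {"comedy": 0.5, "action": 0.25},
}
print(extended_user_profile(sample_profiles))
```

Because the sum also covers documents the user never tagged, the extended profile can carry information that the plain user profile lacks.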
Our approach: D-PR
Dual Personalized Ranking
• The personalized ranking function combines three terms:
– Sim(p’u, pu,d), the personalizing factor measuring the similarity between the extended user profile p’u and the personalized document profile pu,d;
– Sim(q, pd), the social matching score, which measures how relevant the social summary of a document d is to q;
– Score(q, d), the non-personalized textual matching score between query and document.
Personalized Document Profile
• Users with similar perceptions of existing documents will very likely also share similar perceptions of future documents.
• Given a document d and a user u, we use the perception similarities between u and other users as weights to sum up the tags assigned to d by the users having high perception similarities with u.
• The perception similarity of two users is measured by the similarity of their profiles, called the profile-based perception similarity.
Estimate of Personalized Document Profile
1. Select the set of users UT whose perception similarity with u is higher than a predefined threshold T.
2. Estimate u’s personalized document profile for a document d (denoted pu,d) by using the perception similarities as weights to sum up the tags assigned to d by the users in UT.
– vui,d is a weighted vector of tags in which the weight of a tag is the number of times that ui assigned the tag to d
– Ud is the set of users who annotated document d
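The two steps above can be sketched as follows. The exact formula is not preserved in this text, so the sketch assumes pu,d is the sum of Sim(pu, pui) · vui,d over the users ui that belong to both UT and Ud, with cosine similarity of user profiles as the perception similarity and no normalization. All function and variable names, and the sample data, are hypothetical.

```python
import math

def cosine(p, q):
    """Cosine similarity between two sparse tag-weight vectors (dicts)."""
    dot = sum(w * q.get(tag, 0.0) for tag, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def personalized_document_profile(u, d, user_profiles, annotations, threshold):
    """Estimate p_{u,d} as a similarity-weighted sum of tag vectors.

    user_profiles: dict user -> {tag: weight}
    annotations:   dict (user, document) -> {tag: count}, i.e. v_{ui,d}
    threshold:     the predefined threshold T
    """
    profile = {}
    for ui, p_ui in user_profiles.items():
        v = annotations.get((ui, d))
        if v is None:          # ui did not annotate d, so ui is not in U_d
            continue
        sim = cosine(user_profiles[u], p_ui)
        if sim <= threshold:   # ui is not in U_T
            continue
        for tag, count in v.items():
            profile[tag] = profile.get(tag, 0.0) + sim * count
    return profile

# Hypothetical data: "v" shares u's profile, "w" does not.
sample_users = {"u": {"a": 1}, "v": {"a": 1}, "w": {"b": 1}}
sample_annotations = {("v", "d"): {"a": 2}, ("w", "d"): {"b": 3}}
sample_result = personalized_document_profile(
    "u", "d", sample_users, sample_annotations, threshold=0.5
)
print(sample_result)
```

In this sketch the tags of "w" are dropped entirely, which mirrors the idea of weakening the influence of users whose perceptions differ from u's.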
Example D-PR
• Compute the perception similarities between Carl and the other users.
• With the threshold T set to 0.5, UT = {Alice, Bob, Carl}.
• With α = β = δ = 0.5, we obtain the desired ranking d3 > d1 > d2.
Analysis
• D-PR addresses the profile modeling problems of the state-of-the-art approaches in two ways:
– It uses perception similarities to weaken the influence of tags assigned by users with different perceptions.
– It obtains a personalized document profile for each document, so the extended user profile, computed by summing up all these personalized document profiles, contains more information and characterizes the user’s preferences more comprehensively.
Experimental Study
The dataset from [3] contains more than 100,000 URLs of online documents together with their social annotations retrieved from Delicious.com.
Evaluation Methodology
• Obtaining relevance judgments is an expensive, time-consuming process
– Who does it?
– What are the instructions?
– What is the level of agreement?
Evaluation Methodology
• Reciprocal rank (RR): the reciprocal of the rank at which the first relevant document is retrieved (very sensitive to rank position)
• Mean Reciprocal Rank (MRR): the average of the reciprocal ranks over a set of queries
MRR = (1/n) · (1/r1 + 1/r2 + ... + 1/rn)
• ri is the rank position of the ith user query’s first relevant document in the personalized search result ordering, and n is the total number of tested queries.
Example: two queries whose first relevant documents appear at ranks 1 and 2
RR = 1/1 = 1
RR = 1/2 = 0.5
MRR = (1 + 0.5)/2 = 0.75
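The MRR computation above can be reproduced with a few lines of code. This is a minimal sketch; the function name is hypothetical, and it assumes every query has at least one relevant document in its result list, so that each rank ri is defined.

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """Average of 1/r_i over all tested queries.

    first_relevant_ranks: list of 1-based rank positions of the first
    relevant document, one entry per query.
    """
    if not first_relevant_ranks:
        raise ValueError("at least one query is required")
    total = sum(1.0 / rank for rank in first_relevant_ranks)
    return total / len(first_relevant_ranks)

# The two-query example from the slide: ranks 1 and 2.
print(mean_reciprocal_rank([1, 2]))  # 0.75
```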
Evaluation Methodology
• It has been shown that if a user annotates a document with some tags, the user is very likely to visit this document when it appears in the results of a search that uses the same tags as the query.
– Therefore, for each selected bookmark (u, t, d), we create a query q = t, which is issued by user u and aims at finding document d.
– We remove all selected bookmarks from the data to avoid a bias that promotes the annotated document.
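The query construction above can be sketched as follows. How the bookmarks are selected is not stated in this text, so the sketch assumes uniform random sampling; the function name `build_test_queries`, its parameters, and the sample bookmarks are hypothetical.

```python
import random

def build_test_queries(bookmarks, sample_size, seed=0):
    """Turn sampled bookmarks into test queries and hold them out.

    bookmarks: list of (user, tag, document) triples.
    Returns (queries, remaining), where each query asks user u's search
    for tag t to retrieve document d, and remaining is the bookmark
    data with the selected bookmarks removed.
    Assumption: bookmarks are selected by uniform random sampling.
    """
    rng = random.Random(seed)
    selected = rng.sample(bookmarks, sample_size)
    selected_set = set(selected)
    remaining = [b for b in bookmarks if b not in selected_set]
    queries = [{"user": u, "query": t, "target": d} for (u, t, d) in selected]
    return queries, remaining

# Hypothetical bookmarks, for illustration only.
sample_bookmarks = [
    ("u1", "comedy", "doc_a"),
    ("u1", "action", "doc_b"),
    ("u2", "comedy", "doc_b"),
    ("u2", "drama", "doc_c"),
]
sample_queries, sample_remaining = build_test_queries(sample_bookmarks, 2)
print(len(sample_queries), len(sample_remaining))
```

Profiles would then be built from the remaining bookmarks only, so that the target document is not promoted by the very annotation that generated the query.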
Results
Summary and Outlook
• In this paper, we have proposed a dual personalized ranking (D-PR) function to improve the personalized ranking of search on the Social Web via
– an extended user profile
– a personalized document profile.
• In future research, we will apply our D-PR ranking function to other Social Web datasets to evaluate its performance on various kinds of social resources.
Questions?
References
[1] S. Xu, S. Bao, B. Fei, Z. Su, and Y. Yu. Exploring folksonomy for personalized search. In Proceedings of SIGIR, pages 155–162, 2008.
[2] M. R. Bouadjenek, H. Hacid, and M. Bouzeghoub. Sopra: A new social personalized ranking function for improving Web search. In Proceedings of SIGIR, pages 861–864, 2013.
[3] M. G. Noll and C. Meinel. The metadata triumvirate: Social annotations, anchor texts and search queries. In Proceedings of WI-IAT, pages 640–647, 2008.