IMPROVING PERSONALIZED SEARCH ON SOCIAL WEB BASED ON SIMILARITIES BETWEEN USERS


Zhenghua Xu*, Thomas Lukasiewicz*, Oana Tifrea-Marciuska*

* Department of Computer Science, University of Oxford, UK {zhenghua.xu,thomas.lukasiewicz,oana.tifrea}@cs.ox.ac.uk

SUM 2014

Social Web Search Personalization

• Tags are valuable resources for Social Web personalization:
  – They are good summaries of the corresponding documents.
  – They are ideal data for privacy-enhanced personalization.

• Collaborative tagging on the Social Web is called a folksonomy.

Example

A folksonomy

• Users and documents
• Tags assigned by users to documents

[Figure: a folksonomy with users Alice, Bob, and Carl; tags “Comedy” and “Action”; and documents d1, d2, d3 (an English comedy movie, a Chinese action movie, and a Chinese comedy movie).]

Personalization using folksonomy

State-of-the-art approaches that use social tags to personalize search on the Social Web generally rely on the similarity between two profiles:

• User profile (the tags assigned by a user to all online documents)
  – Characterizes the user’s preferences (e.g., p_Alice)

• General document profile (the tags assigned to a document by all users)
  – Characterizes the social summary of the online document (e.g., p_d1)

Similarity measure

Cosine similarity
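
To make the similarity measure concrete, here is a minimal sketch of cosine similarity between two tag profiles. It assumes that a profile is stored as a mapping from tag to weight; the function name `cosine_similarity` and the tag counts are invented for illustration and do not come from the slides.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two sparse tag-weight profiles (dicts)."""
    # Only tags present in both profiles contribute to the dot product.
    dot = sum(weight * q.get(tag, 0.0) for tag, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # assumption: an empty profile is similar to nothing
    return dot / (norm_p * norm_q)


# Hypothetical profiles: tag -> number of times the tag was assigned.
p_user = {"comedy": 2, "action": 1}
p_doc = {"comedy": 1}
similarity = cosine_similarity(p_user, p_doc)
```

Because only the angle between the two vectors matters, a user who tags heavily and a user who tags rarely can still have a high similarity if they use tags in the same proportions.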

Example

•  Carl issues the query “Interesting Chinese film”.

•  The desired personalized ranking is d3 > d1 > d2.

[Figure: folksonomy with tags “Comedy” and “Action”, user Carl, and documents d1, d2, d3; one document is labeled “Chinese comedy movie”.]

State of the art: User Profile Personalized Ranking (UP-PR) [1]

• The personalized ranking function combines:
  – Score(q, d): the non-personalized textual matching score between query q and document d;
  – Sim(p_u, p_d): the personalizing factor, which measures the similarity between the user profile and the general document profile.
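
As a rough illustration of how the two UP-PR components could be combined, the sketch below assumes a convex combination weighted by α. This form, the names `up_pr_score` and `rank_documents`, and the similarity values are assumptions made for illustration; only α = 0.5 and the Score(q, d) values are taken from the UP-PR example in this deck.

```python
def up_pr_score(text_score, profile_sim, alpha=0.5):
    """Hypothetical UP-PR score.

    ASSUMPTION: a convex combination in which alpha weights the
    personalizing factor Sim(p_u, p_d) and (1 - alpha) weights Score(q, d).
    """
    return alpha * profile_sim + (1.0 - alpha) * text_score


def rank_documents(documents, alpha=0.5):
    """Order document ids by descending score.

    documents maps a document id to (text_score, profile_sim).
    """
    return sorted(
        documents,
        key=lambda doc_id: up_pr_score(*documents[doc_id], alpha=alpha),
        reverse=True,
    )


# Text scores come from the example; the similarity values are invented.
documents = {
    "d1": (0.68, 0.6),
    "d2": (0.55, 0.3),
    "d3": (0.50, 0.6),
}
ranking = rank_documents(documents)
```

With these made-up similarities, d1 and d3 are equally close to the user, so the higher text score of d1 alone decides the order between them.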

Example UP-PR

•  Using UP-PR with α = 0.5, Score(q, d1) = 0.68, Score(q, d2) = 0.55, and Score(q, d3) = 0.5, we compute the ranking score of each document.

•  The resulting personalized ranking is d1 > d3 > d2.

•  The desired ranking, however, is d3 > d1 > d2.

Example UP-PR

•  This ranking (d1 > d3 > d2) is intuitively inaccurate because:
  – Sim(p_Carl, p_d3) should have a similar value to Sim(p_Carl, p_d1);
  – Score(q, d3) and Score(q, d2) should be the highest textual matching scores.

[Figure: folksonomy with tags “Comedy” and “Action”, user Carl, and documents d1, d2, d3. Query: “Interesting Chinese film”.]

State of the art: Social Personalized Ranking (SoPRa) [2]

• The personalized ranking function combines:
  – Sim(p_u, p_d): the personalizing factor, which measures the similarity between the user profile and the general document profile;
  – Sim(q, p_d): the social matching score, which measures how relevant the social summary of document d is to query q;
  – Score(q, d): the non-personalized textual matching score between query q and document d.

Example SOPRA

•  Using SoPRa with α = β = δ = 0.5, Score(q, d1) = 0.68, Score(q, d2) = 0.55, and Score(q, d3) = 0.5, we compute the ranking score of each document.

•  The resulting personalized ranking is d1 > d3 > d2, with only a narrow gap between d1 and d3.

•  The desired ranking, however, is d3 > d1 > d2.

•  Score(q, d3) is low because d3 is an online video that has little text.

Why does it not work?

[Figure: folksonomy with tags “Comedy” and “Action”, user Carl, and documents d1, d2, d3.]

•  For the query “Interesting Chinese film”, we want d3 > d1 > d2.

•  The general document profile p_d3 does not correctly characterize Carl’s real perception of d3: tags from all users are treated equally, so the tag from Bob introduces a bias.

•  Carl did not tag d3, so the information used for preference modeling is not comprehensive.


•  Do not treat tags from all users with equal importance when building the document profile.

•  Extend the user profile with more useful information.

Reasons

•  Different users have different perceptions of the same document.
  – Not all tags assigned by other users are equally helpful for summarizing a user’s real perception of a document.
  – The general document profile, which treats tags from all users with equal importance, cannot properly summarize a specific user’s personal perception.

•  Online annotations are sparse.
  – A user profile based only on the tags assigned by that user may not contain sufficient information to comprehensively characterize the user’s preferences.

Our approach: Dual Personalized Ranking (D-PR)

• Two novel profiles:
  – Personalized document profile: each user has a personalized document profile that characterizes his/her perception of the document.
  – Extended user profile: the sum of all personalized document profiles of u, which characterizes u’s preferences more comprehensively.

Our approach D-PR

• The personalized ranking function of Dual Personalized Ranking combines:
  – Sim(p’_u, p_{u,d}): the personalizing factor, which measures the similarity between the extended user profile p’_u and the personalized document profile p_{u,d};
  – Sim(q, p_d): the social matching score, which measures how relevant the social summary of document d is to query q;
  – Score(q, d): the non-personalized textual matching score between query q and document d.

Personalized Document Profile

• Users who have similar perceptions of existing documents will very likely also share similar perceptions of future documents.

• Given a document d and a user u, we use the perception similarities between u and other users as weights to sum up the tags assigned to d by the users who have high perception similarities with u.

• The perception similarity of two users can be measured by the similarity of their user profiles; this is called the profile-based perception similarity.

Estimate of Personalized Document Profile

1.  Select the set of users U_T whose perception similarity with u is higher than a predefined threshold T.

2.  Estimate u’s personalized document profile relative to a document d (denoted p_{u,d}) by using the perception similarities as weights to sum up the tags assigned to d by the users belonging to U_T, where:

  – v_{u_i,d} is a weighted vector of tags, in which the weight of a tag is the number of times that u_i assigned the tag to d;

  – U_d is the set of users who annotated document d.
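
The two estimation steps, together with the extended user profile obtained by summing the personalized document profiles, can be sketched as follows. This is a minimal sketch under stated assumptions: perception similarity is taken to be the cosine similarity of user profiles, the weighted sum is left unnormalized, and all function names and the toy annotations are hypothetical rather than taken from the slides.

```python
import math
from collections import defaultdict


def cosine_similarity(p, q):
    """Cosine similarity between two sparse tag-weight profiles (dicts)."""
    dot = sum(w * q.get(tag, 0.0) for tag, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def user_profile(annotations, user):
    """p_u: all tags the user assigned, summed over all documents."""
    profile = defaultdict(float)
    for (u, _doc), tags in annotations.items():
        if u == user:
            for tag, count in tags.items():
                profile[tag] += count
    return dict(profile)


def personalized_document_profile(annotations, user, doc, threshold):
    """p_{u,d}: similarity-weighted sum of v_{u_i,d} over users in U_T and U_d."""
    p_u = user_profile(annotations, user)
    profile = defaultdict(float)
    for (u_i, d), tags in annotations.items():
        if d != doc:
            continue  # u_i must have annotated this document (u_i in U_d)
        weight = cosine_similarity(p_u, user_profile(annotations, u_i))
        if weight > threshold:  # u_i must belong to U_T
            for tag, count in tags.items():
                profile[tag] += weight * count
    return dict(profile)


def extended_user_profile(annotations, user, threshold):
    """p'_u: sum of the user's personalized document profiles."""
    extended = defaultdict(float)
    for doc in {d for (_u, d) in annotations}:
        p_ud = personalized_document_profile(annotations, user, doc, threshold)
        for tag, weight in p_ud.items():
            extended[tag] += weight
    return dict(extended)


# Hypothetical annotations: (user, document) -> {tag: count}.
annotations = {
    ("alice", "d1"): {"comedy": 1},
    ("carl", "d1"): {"comedy": 1},
    ("bob", "d2"): {"action": 1},
    ("alice", "d3"): {"comedy": 1},
    ("bob", "d3"): {"action": 1},
}
p_carl_d3 = personalized_document_profile(annotations, "carl", "d3", 0.5)
p_carl_ext = extended_user_profile(annotations, "carl", 0.5)
```

In this toy data, Bob's "action" tag on d3 is dropped from Carl's personalized profile because Bob's profile shares no tags with Carl's, while the extended profile lets Carl inherit information about d3 even though he never tagged it.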

Example D-PR

• Compute the perception similarities between Carl and the other users.

• We set the threshold T to 0.5; therefore, U_T = {Alice, Bob, Carl}.

• With α = β = δ = 0.5, we obtain the desired ranking d3 > d1 > d2.

Analysis

• D-PR solves the profile modeling problems of the state-of-the-art approaches in two ways:
  – It uses the perception similarities to weaken the influence of tags assigned by users who have different perceptions.
  – It obtains a personalized document profile for each document, so the extended user profile, computed by summing up all of these personalized document profiles, contains enough information to characterize the user’s preferences more comprehensively.

Experimental Study

The dataset from [3] contains more than 100 000 URLs of online documents, together with their social annotations retrieved from Delicious.com.

Evaluation Methodology

• Obtaining relevance judgments is an expensive, time-consuming process:
  – Who does it?
  – What are the instructions?
  – What is the level of agreement?

• The reciprocal rank (RR) is the reciprocal of the rank at which the first relevant document is retrieved (it is very sensitive to rank position).

• The Mean Reciprocal Rank (MRR) is the average of the reciprocal ranks over a set of queries: MRR = (1/n) Σ_{i=1}^{n} 1/r_i

• Here, r_i is the ranking position of the first relevant document for the i-th user query in the personalized search result ordering, and n is the total number of tested queries.

MRR

RR = 1/1 = 1

RR = 1/2 = 0.5

MRR = (1+0.5)/2 = 0.75
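
The calculation above can be reproduced with a short sketch. The function name `mean_reciprocal_rank` is chosen here for illustration; its input is the list of rank positions r_i of the first relevant document for each query.

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """Average of 1/r_i over all queries; ranks are 1-based."""
    if not first_relevant_ranks:
        raise ValueError("at least one query is required")
    reciprocal_ranks = [1.0 / rank for rank in first_relevant_ranks]
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


# Two queries: first relevant document at rank 1 and at rank 2.
mrr = mean_reciprocal_rank([1, 2])
print(mrr)  # 0.75
```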


• It has been shown that if a user annotates a document with some tags, the user is very likely to visit this document when it appears in the results of a search that uses the same tags as the query.
  – Therefore, for each bookmark (u, t, d), we create a query q = t, which is issued by user u and aims at finding document d.
  – We remove all selected bookmarks from the data to avoid biasing the ranking toward the annotated document.
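
The query-generation procedure can be sketched as below. The random selection of bookmarks, the function name `build_test_queries`, and the toy bookmarks are assumptions for illustration; the slides do not say how the tested bookmarks were chosen.

```python
import random


def build_test_queries(bookmarks, sample_size, seed=0):
    """Turn selected bookmarks (u, t, d) into test queries and hold them out.

    Returns (queries, remaining), where remaining is the bookmark list with
    the selected bookmarks removed, to be used for building the profiles.
    ASSUMPTION: bookmarks are unique and are selected uniformly at random.
    """
    rng = random.Random(seed)
    selected = rng.sample(bookmarks, sample_size)
    selected_set = set(selected)
    # Each selected bookmark yields one query q = t, issued by u, targeting d.
    queries = [
        {"user": u, "query": t, "target": d} for (u, t, d) in selected
    ]
    remaining = [b for b in bookmarks if b not in selected_set]
    return queries, remaining


# Hypothetical bookmarks: (user, tag, document).
bookmarks = [
    ("alice", "comedy", "d1"),
    ("carl", "comedy", "d1"),
    ("bob", "action", "d2"),
    ("bob", "action", "d3"),
]
queries, remaining = build_test_queries(bookmarks, sample_size=2)
```

Each query can then be run against a ranking built only from `remaining`, and the position of its `target` document supplies the rank r_i used for MRR.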

Results

Summary and Outlook

• In this paper, we have proposed a dual personalized ranking (D-PR) function to improve the personalized ranking of search on the Social Web via:
  – an extended user profile;
  – a personalized document profile.

• In future research, we will apply our D-PR ranking function to other Social Web datasets to evaluate its performance on various kinds of social resources.

Questions?

References

[1] S. Xu, S. Bao, B. Fei, Z. Su, and Y. Yu. Exploring folksonomy for personalized search. In Proceedings of SIGIR, pages 155–162, 2008.

[2] M. R. Bouadjenek, H. Hacid, and M. Bouzeghoub. Sopra: A new social personalized ranking function for improving Web search. In Proceedings of SIGIR, pages 861–864, 2013.

[3] M. G. Noll and C. Meinel. The metadata triumvirate: Social annotations, anchor texts and search queries. In Proceedings of WI-IAT, pages 640–647, 2008.
