Social Book Search: A Combination of Personalized Recommendations and Retrieval Author: Justin van Wees Supervisor: Marijn Koolen Second assessor: Frank Nack Master Thesis Information Science Human Centered Multimedia August 23, 2012


Social Book Search: A Combination of Personalized Recommendations and Retrieval

Author: Justin van Wees
Supervisor: Marijn Koolen
Second assessor: Frank Nack

Master Thesis Information Science

Human Centered Multimedia, August 23, 2012


Outline

1. Background

2. Research questions

3. Data collection

4. Experiments and results

5. Conclusions

6. Discussion and future work

7. Questions


Current situation

• Traditional information retrieval (IR) models:

• developed for use on small collections

• designed for collections containing only officially published documents, annotated by professionals

• Many modern web (2.0) applications still use traditional models for search

• Millions of documents

• Combination of user-generated content (UGC) and professional metadata


Current situation

• A user uses an IR system to find the documents that are topically relevant to her information need

• Queries can return thousands of relevant documents

• Evaluating a large number of results is expensive for the user

• Other notions of relevance also matter, e.g. how well-written, popular, recent or fun a document is

• Combination of professional and user-generated metadata


Social Book Search Track

• Evaluate relative value of controlled book metadata versus social metadata (Koolen et al., 2012)

• Amazon.com and LibraryThing (LT) corpus

• ~2.8 million book records, both social and professional metadata

• Book search requests from LT discussion forums serve as topics; suggestions by other users serve as relevance judgements


Recommender Systems

• Recommender Systems (RSs) suggest items of interest to individuals or groups of users (Resnick and Varian, 1997)

• Assumes that an individual’s taste or interest in a particular item can be explained by features recorded by the RS (demographics, previous interactions, etc.)

• Different strategies: collaborative filtering (CF), content-, community-, knowledge-based, hybrid (Burke, 2007)

• Differs from traditional retrieval in terms of query formulation, source of relevance feedback and personalization (Furner, 2002)


Research Questions

• What data are we able to collect?

• Can we automatically make accurate predictions of a user’s preference for an unknown book?

• How do we combine results from IR system with RSs?

• Social Book Search scenario and data

Does a combination of techniques from the field of IR with those from RSs improve retrieval performance when searching for works in a large-scale online collaborative media catalogue?


Crawling LibraryThing

• Perform four different crawls of user profiles and personal catalogues

• For each crawl, also crawl links to other profiles

• Compare crawls to determine representativeness for entire LT user-base

• All crawls combined cover approximately 6% of the LT user base
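The crawl procedure above (seed profiles, then following links to other profiles) can be sketched as a breadth-first snowball crawl. This is a minimal sketch, assuming the link graph is already available as an in-memory dict; `snowball_crawl`, `links`, `max_depth` and `profile_overlap` are hypothetical names, and both the one-hop depth and the reading of "profile overlap" as the number of profiles shared by two crawls are assumptions, not details from the thesis.

```python
from collections import deque


def snowball_crawl(seeds, links, max_depth=1):
    """Breadth-first crawl starting from a list of seed profiles.

    links: hypothetical dict mapping a profile to the profiles it links to
    (friends, groups, interesting libraries). A real crawler would fetch
    these from LibraryThing. max_depth=1 (one hop) is an assumption.
    """
    seen = set(seeds)
    queue = deque((profile, 0) for profile in seeds)
    while queue:
        profile, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand profiles beyond the depth limit
        for other in links.get(profile, ()):
            if other not in seen:
                seen.add(other)
                queue.append((other, depth + 1))
    return seen


def profile_overlap(crawl_a, crawl_b):
    """Number of profiles found by both crawls (assumed meaning of 'overlap')."""
    return len(crawl_a & crawl_b)


links = {"alice": ["bob", "carol"], "bob": ["dave"], "dave": ["erin"]}
print(sorted(snowball_crawl(["alice"], links)))  # ['alice', 'bob', 'carol']
```

Raising `max_depth` turns the one-hop expansion into a deeper snowball sample, at the cost of a crawl that grows quickly with each extra hop.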

Crawl                    Seed list   Profiles   Unique works   Profile overlap
Forum users                  1,104     60,131      4,354,387
Random – 211 works           1,306      8,040      2,537,065             7,048
Random – 1,000 works         5,577     18,381      3,580,296            14,262
Random – 10,000 works       35,671     64,379      5,122,848            37,300
Total                            -     89,693      5,299,399                 -


Crawling LibraryThing

Crawl                      min.    max.   median    mean   std. dev.
Forum users
  Friends                     0     172      3.0    8.47      16.31
  Groups                      0      10      9.0    6.79       3.74
  Interesting Libraries       0     510      2.0   11.19      26.46
Random – 211 works
  Friends                     0      79      0.0    2.61       7.46
  Groups                      0      10      0.0    1.70       3.05
  Interesting Libraries       0     394      0.0    3.30      17.80
Random – 1,000 works
  Friends                     0      84      0.0    2.18       6.07
  Groups                      0      10      0.0    1.64       3.02
  Interesting Libraries       0     574      0.0    2.74      14.41
Random – 10,000 works
  Friends                     0   2,858      0.0    1.73      17.49
  Groups                      0      10      0.0    1.24       2.61
  Interesting Libraries       0     855      0.0    1.69      10.40
Total
  Friends                     0   2,858      1.0    2.14      12.77
  Groups                      0      10      0.0    1.18       2.44
  Interesting Libraries       0     855      0.0    1.27       8.00


Crawling LibraryThing

Crawl                  min.     max.   median       mean   std. dev.          sum
Forum users
  Unrated                 0   28,402    84.00     397.22      929.70   23,885,231
  Rated                   0   12,190     3.00      78.80      238.53    4,738,018
  Total                   0   28,402   148.00     476.02      980.88   28,623,249
Random – 211 works
  Unrated                 0   28,402   458.00   1,112.81    1,835.81    8,946,997
  Rated                   0   12,190    10.00     182.77      472.08    1,469,531
  Total                   0   28,402   657.00   1,295.58    1,908.65   10,416,528
Random – 1,000 works
  Unrated                 0   28,402   331.00     864.32    1,480.98   15,887,025
  Rated                   0   12,190     3.00     130.20      369.06    2,393,233
  Total                   0   28,402   475.00     994.52    1,539.15   18,280,258
Random – 10,000 works
  Unrated                 0   28,402   163.00     486.63      955.86   31,328,971
  Rated                   0   12,190     1.00      74.04      237.01    4,766,750
  Total                   0   28,402   201.00     560.68    1,000.50   36,095,721
Total
  Unrated                 0   28,402   102.00     378.18      834.94   33,920,353
  Rated                   0   12,190     1.00      62.85      206.40    5,637,097
  Total                   0   28,402   156.00     441.03      876.76   39,557,450


Generating Recommendations

• Collaborative filtering approach

• Unary and rated transactions

• Memory- and model-based recommenders

• Randomly split transactions (80% train/20% test) for performance evaluation
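A random 80/20 split of the transactions for evaluating the recommenders can be sketched as follows. This is a minimal sketch: the (user, work, rating) tuple layout, the function name `split_transactions` and the fixed seed are assumptions for illustration, not details from the thesis.

```python
import random


def split_transactions(transactions, test_fraction=0.2, seed=42):
    """Randomly split (user, work, rating) transactions into train and test sets.

    The fixed seed only makes the split reproducible; how the original
    split was seeded is not stated.
    """
    shuffled = list(transactions)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


transactions = [(user, work, 4.0) for user in range(10) for work in range(10)]
train, test = split_transactions(transactions)
print(len(train), len(test))  # 80 20
```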


Generating Recommendations

• Neighbourhood (Desrosiers and Karypis, 2011):

• Directly use user-item ratings to predict ratings for ‘unseen’ items

• Find n most similar neighbours (Pearson correlation)

• Use the weighted average rating given by the user’s neighbours

• Let neighbours ‘vote’ on unary transactions
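The neighbour selection step can be sketched as Pearson correlation between two users' ratings, followed by picking the n most similar users. A minimal sketch, assuming ratings are held in plain dicts; the function names are hypothetical, and computing the means over co-rated works only is one common variant, assumed here.

```python
import math


def pearson(ratings_u, ratings_v):
    """Pearson correlation between two users over the works both have rated.

    ratings_u, ratings_v: dicts mapping work -> rating. Means are taken over
    the co-rated works only (an assumed variant).
    """
    common = ratings_u.keys() & ratings_v.keys()
    if len(common) < 2:
        return 0.0  # too little overlap to estimate a correlation
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    cov = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    var_u = sum((ratings_u[i] - mean_u) ** 2 for i in common)
    var_v = sum((ratings_v[i] - mean_v) ** 2 for i in common)
    if var_u == 0 or var_v == 0:
        return 0.0  # constant ratings: correlation undefined
    return cov / math.sqrt(var_u * var_v)


def nearest_neighbours(user, all_ratings, n=25):
    """Return the n users most similar to `user` as (user, similarity) pairs."""
    sims = [
        (other, pearson(all_ratings[user], ratings))
        for other, ratings in all_ratings.items()
        if other != user
    ]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:n]


all_ratings = {
    "u": {"a": 1.0, "b": 2.0, "c": 3.0},
    "v": {"a": 2.0, "b": 4.0, "c": 5.5},
    "w": {"a": 5.0, "b": 3.0, "c": 1.0},
}
print(nearest_neighbours("u", all_ratings, n=1)[0][0])  # v
```

Comparing every user against every other user is quadratic in the number of profiles; at the scale of the crawled data some form of indexing or sampling would likely be needed.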


Generating Recommendations

• Singular Value Decomposition (SVD) (Schafer et al., 2007):

• Reduce domain complexity by mapping item space to k dimensions

• Remaining dimensions represent latent topics: preference classes of users and categorical classes of items

• Currently considered ‘state of the art’
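The SVD idea can be sketched with NumPy as a plain truncated SVD of the user x work rating matrix: keep the k largest singular values and read predicted preferences off the rank-k reconstruction. This is a minimal sketch under stated assumptions; the mean imputation of unknown ratings and the name `svd_predict` are hypothetical, and the thesis may have used a different (e.g. iterative or regularised) factorisation.

```python
import numpy as np


def svd_predict(ratings, k):
    """Predict all user x work ratings from a rank-k truncated SVD.

    ratings: 2-D array-like with np.nan for unknown ratings. Missing entries
    are filled with the work (column) mean before factorising; this
    imputation step is an assumption.
    """
    R = np.asarray(ratings, dtype=float)
    col_means = np.nanmean(R, axis=0)
    filled = np.where(np.isnan(R), col_means, R)  # broadcast column means over rows
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    # Keep only the k largest singular values: the k latent dimensions.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


R = [[5.0, 4.0, np.nan],
     [4.0, np.nan, 1.0],
     [1.0, 2.0, np.nan]]
predicted = svd_predict(R, k=1)
print(predicted.shape)  # (3, 3)
```

The predicted preference of user u for work i is then `predicted[u, i]`.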


Recommender Performance

Rated transactions

Method                  MAE      RMSE     P@5      P@10     P@50
Neighbourhood (N=25)    0.7813   1.0286   0.0712   0.0661   0.0614
Neighbourhood (N=50)    0.7721   1.0105   0.0376   0.0371   0.0339
Neighbourhood (N=100)   0.7633   0.9927   0.0246   0.0239   0.0232
SVD (K=50)              0.6210   0.8139   0.0021   0.0019   0.0026
SVD (K=100)             0.6203   0.8131   0.0025   0.0022   0.0028
SVD (K=150)             0.6192   0.8122   0.0281   0.0107   0.0030

Unary transactions

Method                  Accuracy   P@5      P@10     P@50
Neighbourhood (N=25)    0.2430     0.3711   0.2425   0.1829
Neighbourhood (N=50)    0.3014     0.3824   0.2561   0.1861
Neighbourhood (N=100)   0.3621     0.3640   0.2422   0.1812
SVD (K=50)              0.2240     0.0214   0.0198   0.0216
SVD (K=100)             0.2601     0.0219   0.0203   0.0229
SVD (K=150)             0.2676     0.0424   0.0212   0.0234
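The recommender metrics above can be computed roughly as follows: MAE and RMSE compare predicted with held-out ratings, accuracy counts correct unary predictions, and P@k checks how many of the top-k recommendations occur in the user's held-out transactions. A sketch only, assuming that reading of P@k; the function names are hypothetical, and how users with fewer than k test items were handled is not stated (here the denominator is always k).

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and true ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; penalises large errors more than MAE."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def accuracy(predicted, actual):
    """Fraction of unary (0/1) predictions that match the held-out value."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def precision_at_k(ranked, relevant, k):
    """Share of the top-k recommended works that occur in the user's test set."""
    return sum(1 for work in ranked[:k] if work in relevant) / k


print(mae([4.0, 3.0], [5.0, 3.0]))  # 0.5
print(precision_at_k(["a", "b", "c", "d", "e"], {"a", "c"}, 5))  # 0.4
```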


Retrieving Works

• Setup used for INEX 2012; top performing run

• Index consists of user-generated content

• Removed stopwords

• Stemming with Krovetz

• Topic titles as queries

• Language model

• Pseudo relevance feedback: 50 expansion terms from the top 10 results
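The retrieval setup can be sketched as query likelihood with Dirichlet smoothing, followed by a simple form of pseudo relevance feedback. Both are assumptions for illustration: the slides only say "language model" and "50 terms of top 10 results", so the smoothing method, μ = 2500, and the frequency-based choice of expansion terms (instead of, for example, a relevance model) are not taken from the thesis. Documents are assumed to be term lists that are already stopped and stemmed.

```python
import math
from collections import Counter


def dirichlet_score(query_terms, doc_tf, doc_len, coll_tf, coll_len, mu=2500):
    """log P(q | d) under a Dirichlet-smoothed unigram language model."""
    score = 0.0
    for term in query_terms:
        p_coll = coll_tf.get(term, 0) / coll_len
        if p_coll == 0:
            continue  # term absent from the whole collection: ignore it
        score += math.log((doc_tf.get(term, 0) + mu * p_coll) / (doc_len + mu))
    return score


def rank(query_terms, docs, mu=2500):
    """docs: dict doc_id -> list of terms (stopwords removed, stemmed)."""
    coll_tf = Counter(term for terms in docs.values() for term in terms)
    coll_len = sum(coll_tf.values())
    scored = [
        (doc_id, dirichlet_score(query_terms, Counter(terms), len(terms), coll_tf, coll_len, mu))
        for doc_id, terms in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def expansion_terms(ranking, docs, fb_docs=10, fb_terms=50):
    """Simplified PRF: the most frequent terms in the top fb_docs results."""
    counts = Counter()
    for doc_id, _ in ranking[:fb_docs]:
        counts.update(docs[doc_id])
    return [term for term, _ in counts.most_common(fb_terms)]


docs = {
    "w1": ["fantasy", "dragon", "dragon"],
    "w2": ["history", "war"],
    "w3": ["fantasy", "magic"],
}
ranking = rank(["dragon"], docs, mu=10)
print(ranking[0][0])  # w1
```

The expansion terms would then be added to the original query and the collection ranked again.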


Combining IR and RS

• Retrieval system: ranked list, probability score between 0 and 1 per work

• Recommendations: estimated preference of the user for a work, between 0.5 and 5.0 (rated) or 0 or 1 (unary)

• Normalise ratings

• ‘Boost’ works with the estimated preference, CombSUM (Fox and Shaw, 1994)

• Use average rating when no prediction can be made

• Introduce weight (λ) between systems
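The combination step can be sketched as a weighted CombSUM of the retrieval score and the normalised predicted preference, S(d) = (1 − λ)·P_Ret(d|q) + λ·P_CF(d). A minimal sketch, assuming min-max normalisation of ratings from [0.5, 5.0] to [0, 1] and a single fallback value standing in for "the average rating" (whether per user or global is not stated); `combine`, `normalise` and `fallback` are hypothetical names.

```python
def normalise(rating, low=0.5, high=5.0):
    """Min-max scale a rating to [0, 1] (assumed normalisation)."""
    return (rating - low) / (high - low)


def combine(ret_scores, predictions, lam, fallback, low=0.5, high=5.0):
    """Rerank works by S(d) = (1 - lam) * P_ret(d|q) + lam * P_cf(d).

    ret_scores:  dict work -> retrieval score in [0, 1]
    predictions: dict work -> predicted rating; works without a prediction
                 get `fallback` (e.g. an average rating) instead.
    For unary predictions pass low=0, high=1 so the 0/1 values are kept.
    """
    combined = {}
    for work, p_ret in ret_scores.items():
        p_cf = normalise(predictions.get(work, fallback), low, high)
        combined[work] = (1 - lam) * p_ret + lam * p_cf
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)


ret_scores = {"w1": 0.6, "w2": 0.5}
predictions = {"w2": 5.0}  # no prediction for w1
print(combine(ret_scores, predictions, lam=0.0, fallback=2.75)[0][0])  # w1
print(combine(ret_scores, predictions, lam=0.5, fallback=2.75)[0][0])  # w2
```

With λ = 0 the original retrieval ranking is returned unchanged; increasing λ lets the predicted preference reorder the results.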


Results

Method               λ           nDCG@10           P@10              R@10
Baseline             -           0.1437            0.1219            0.1494
Neighbourhood
  Rated (n=25)       0.0001700   0.1709 (18.93%)   0.1490 (22.23%)   0.1899 (27.11%)
  Rated (n=50)       0.0001855   0.1778 (23.73%)   0.1500 (23.05%)   0.1913 (28.05%)
  Rated (n=100)      0.0001800   0.1669 (16.14%)   0.1490 (22.23%)   0.1878 (25.70%)
  Unary (n=25)       0.0001500   0.1446 (0.63%)    0.1229 (0.82%)    0.1520 (1.74%)
  Unary (n=50)       0.0001500   0.1441 (0.28%)    0.1229 (0.82%)    0.1520 (1.74%)
  Unary (n=100)      0.0001500   0.1441 (0.28%)    0.1229 (0.82%)    0.1520 (1.74%)
SVD
  Rated (K=50)       0.0001800   0.1718 (19.55%)   0.1490 (22.23%)   0.1866 (24.90%)
  Rated (K=100)      0.0001850   0.1721 (19.76%)   0.1490 (22.23%)   0.1866 (24.90%)
  Rated (K=150)      0.0001850   0.1720 (19.69%)   0.1490 (22.23%)   0.1866 (24.90%)
  Unary (K=50)       0.0001500   0.1449 (0.84%)    0.1240 (1.72%)    0.1541 (3.15%)
  Unary (K=100)      0.0001550   0.1441 (0.28%)    0.1229 (0.82%)    0.1520 (1.74%)
  Unary (K=150)      0.0001550   0.1424 (-0.90%)   0.1250 (2.54%)    0.1561 (4.48%)

Percentages in parentheses: change relative to the baseline.
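The evaluation measures in this table, and the relative changes in parentheses, can be sketched as follows. A sketch only: the graded gains and the log2 rank discount are the common textbook form of nDCG, assumed rather than taken from the official INEX evaluation; the function names are hypothetical.

```python
import math


def ndcg_at_k(ranked, gains, k=10):
    """nDCG@k; gains maps a work to its graded relevance (missing = 0)."""
    dcg = sum(gains.get(work, 0) / math.log2(pos + 2) for pos, work in enumerate(ranked[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(pos + 2) for pos, gain in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked, relevant, k=10):
    """Share of all relevant works that appear in the top k."""
    return sum(1 for work in ranked[:k] if work in relevant) / len(relevant)


def relative_improvement(score, baseline):
    """Percentage change of a run over the baseline."""
    return 100 * (score - baseline) / baseline


print(round(relative_improvement(0.1709, 0.1437), 2))  # 18.93
```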


Conclusions

• Collected representative sample of user profiles

• Collaborative filtering is the obvious choice for the collected transaction data

• SVD best at estimating rated preference (lowest MAE and RMSE)

• Poor performance on unary transactions

• Successfully combined retrieval with personalized recommendations

• Rated transactions most useful (up to 23.73% improvement in nDCG@10 over the baseline)

• Personal preference is relevance evidence that can substantially improve retrieval performance in SBS


Discussion and Future Work

• Popularity as relevance evidence

• Value of λ depending on IR score distribution

• Other (mixtures of) RS setups

• Scaling, cold-start problems

• Trust and transparency of the system


Questions?


References

• R. Burke. Hybrid web recommender systems. In The Adaptive Web, pages 377–408. Springer-Verlag, 2007.

• C. Desrosiers and G. Karypis. A comprehensive survey of neighborhood-based recommendation methods. Recommender Systems Handbook, pages 107–144, 2011.

• E. Fox and J. Shaw. Combination of multiple searches. NIST Special Publication SP, pages 243–243, 1994.

• J. Furner. On recommending. Journal of the American Society for Information Science and Technology, 53(9):747–763, 2002.

• M. Koolen, G. Kazai, J. Kamps, A. Doucet, and M. Landoni. Overview of the INEX 2011 books and social search track. In S. Geva, J. Kamps, and R. Schenkel, editors, Focused Retrieval of Content and Structure: 10th International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX 2011), volume 7424 of LNCS. Springer, 2012.

• P. Resnick and H. Varian. Recommender systems. Communications of the ACM, 40(3):56–58, 1997.

• J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen. Collaborative Filtering Recommender Systems. International Journal of Electronic Business, 2(1):77, 2007. ISSN 14706067. doi: 10.1504/IJEB.2004.004560. URL http://www.springerlink.com/index/t87386742n752843.pdf.


Number of Books in Catalogue

[Figure: (a) unrated works, (b) rated works]


Document scoring

$$S(d) = (1 - \lambda)\,P_{Ret}(d \mid q) + \lambda\,P_{CF}(d)$$

• $P_{Ret}(d \mid q)$: work’s score obtained through the IR system

• $P_{CF}(d)$: estimated rating of the current user for the work, obtained through the RS

• $\lambda$: weight between the systems


Estimating preference (rated)

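$$\hat{r}_{ui} = \frac{\sum_{v \in N_i(u)} w_{uv}\, r_{vi}}{\sum_{v \in N_i(u)} |w_{uv}|}$$

• $\hat{r}_{ui}$: estimated preference of user u for item i

• $r_{vi}$: rating given by neighbour v to item i

• $w_{uv}$: preference similarity between users u and v

• $N_i(u)$: the k nearest neighbours of u that rated item i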

Desrosiers and Karypis, 2011
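The rated-preference formula above translates almost directly into code. A minimal sketch, assuming the neighbours N_i(u) and their similarities w_uv have already been computed; `predict_rating` is a hypothetical name, and returning None when no neighbour has rated the work is a choice made here so that a caller can fall back to an average rating.

```python
def predict_rating(neighbours):
    """Estimate r_ui as sum(w_uv * r_vi) / sum(|w_uv|) over v in N_i(u).

    neighbours: list of (w_uv, r_vi) pairs for the k-NN of u that rated item i.
    Returns None when no prediction can be made.
    """
    numerator = sum(w * r for w, r in neighbours)
    denominator = sum(abs(w) for w, _ in neighbours)
    if denominator == 0:
        return None
    return numerator / denominator


print(predict_rating([(0.75, 4.0), (0.25, 2.0)]))  # 3.5
```

Using |w_uv| in the denominator keeps the estimate on the rating scale even when some neighbours are negatively correlated with u.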


Estimating preference (unary)

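$$v_{ir} = \sum_{v \in N_i(u)} \delta(r_{vi} = r)\, w_{uv}$$

• $v_{ir}$: total vote for value r on item i

• $\delta(r_{vi} = r)$: 1 if neighbour v’s value for item i equals r, otherwise 0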

Desrosiers and Karypis, 2011
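The voting formula can be sketched in the same style: each neighbour adds its similarity w_uv to the value r it holds for item i. A minimal sketch, assuming unary transactions are encoded as r in {0, 1}; `vote` is a hypothetical name, and taking the value with the largest vote as the prediction is an assumption here.

```python
from collections import defaultdict


def vote(neighbours):
    """Compute v_ir = sum over v in N_i(u) of delta(r_vi = r) * w_uv.

    neighbours: list of (w_uv, r_vi) pairs. Returns (votes per value r,
    value with the largest vote), or ({}, None) if there are no neighbours.
    """
    votes = defaultdict(float)
    for w, r in neighbours:
        votes[r] += w  # delta(r_vi = r) is 1 only for the neighbour's own value
    if not votes:
        return {}, None
    return dict(votes), max(votes, key=votes.get)


print(vote([(0.8, 1), (0.5, 0), (0.4, 0)])[1])  # 0
```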