The Effectiveness of Collaborative Filtering Based Recommendation Systems Across Different Domains and Search Modes: Does a One-Size Recommendation System Fit All?

Il Im, Alexander Hars (Yonsei University, Inventivio GmbH)

ACM Transactions on Information Systems, Vol. 26, No. 1, Article 4, Nov 2007

2008. 03. 28.

Summarized and presented by Jaehui Park, IDS Lab., Seoul National University

Page 2: Il Im, Alexander Hars Yonsei University, Inventivio Gmbh

Copyright 2008 by CEBT

Outline

Introduction

Past studies on CF

Research issues and Hypothesis

Empirical study

Result and Discussion

Conclusion and Implication


Page 3: Il Im, Alexander Hars Yonsei University, Inventivio Gmbh

Introduction

Collaborative Filtering (CF)

One of the major technologies for personalization; it generates recommendations for users based on other users’ evaluations or preferences.

Major limitations

– CF has been used mostly for consumer products

The differences in CF across various domains need to be understood

– Lack of studies on user-side factors

Differences in users’ evaluations (e.g. different intentions) would affect the accuracy of recommendations

This article

– compares CF recommendations across different domains: research papers and consumer products

– examines user-side factors and their effects on CF systems


Page 4: Il Im, Alexander Hars Yonsei University, Inventivio Gmbh

Past studies on CF

Goldberg et al. [1992] applied the technology to IR

Miller et al. [1997] generated recommendations for users based on the evaluations of others with similar profiles, using the ratings of an appropriate reference group rather than the average rating of all users

Mainstream research

Focused on algorithms for generating recommendations

Focused on the applications and use of CF

Shortcomings of past CF research

– There has been little research on how the effectiveness of CF might vary across different domains

CF is used mainly for consumer products, such as CDs and movies

These products do not have much text information

They have few attributes

– Assumption that users’ evaluations remain constant

E.g. if Tom liked “Star Wars”, he is assumed to like it forever, on any occasion [Miller et al. 1997]


Page 5: Il Im, Alexander Hars Yonsei University, Inventivio Gmbh

Research Issues and Hypothesis Development

Many factors may affect the accuracy of CF

Hypothesis 1: The accuracy of a CF system increases as the total number of users increases.

– More users raise the probability of finding people with similar preferences

– Critical mass: a certain number of people is needed for a certain level of recommendation accuracy

– The accuracy may increase in different patterns depending on the product domain and other factors
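
The critical-mass argument behind Hypothesis 1 can be illustrated with a toy probability model. This is only a sketch under a strong assumption that does not come from the paper: every other user is taken to be "similar" to the target user independently and with the same probability `p`. The function names and the example values of `p` are hypothetical.

```python
import math


def prob_similar_user(n_users: int, p: float) -> float:
    """Probability that at least one of n_users other users is similar.

    Assumption (not from the paper): each user is similar to the target
    independently, with the same probability p.
    """
    return 1.0 - (1.0 - p) ** n_users


def critical_mass(p: float, target: float) -> int:
    """Smallest number of users for which prob_similar_user reaches target."""
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p and target must lie strictly between 0 and 1")
    # Solve 1 - (1 - p)^n >= target for n and round up to a whole user.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    # Illustrative values only: p = 0.05 is an arbitrary choice.
    for n in (10, 50, 100):
        print(n, round(prob_similar_user(n, 0.05), 3))
    print("users needed for a 90% chance:", critical_mass(0.05, 0.9))
```

Under this toy model, a smaller `p` (which one might associate with a more heterogeneous domain) pushes the critical mass up, which is one way to read the remark that accuracy may grow in different patterns depending on the domain.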


Page 6: Il Im, Alexander Hars Yonsei University, Inventivio Gmbh

Research Issues and Hypothesis Development

Hypothesis 2: The accuracy of CF as a function of the number of users will be greater for knowledge domains, such as research papers, than for consumer product domains, such as movies.

– Preference heterogeneity: the pattern of consumers’ preferences

– Different levels of heterogeneity may result in different patterns in the figure for H1

– People’s preferences in the movie domain are more homogeneous than in the research paper domain

Loosely coupled clusters will result in less accurate recommendations than tightly coupled clusters


Page 7: Il Im, Alexander Hars Yonsei University, Inventivio Gmbh

Research Issues and Hypothesis Development

Hypothesis 3: After some threshold, the accuracy of CF as a function of the number of users will be greater for the problemistic search mode than for the scanning mode.

– “What types of motivation do people have when conducting an information search?” [Vandenbosch and Huff 1997]

Categorization of El Sawy [1985]

Scanning: browsing through data in order to understand trends or sharpen a general understanding of the business (without specific questions)

Problemistic search: stimulated by a problem and directed toward a particular problem (with specific questions)

In the scanning mode, users’ evaluations would be more homogeneous: more overlap in users’ interests

In the problemistic mode, users’ evaluations would be more heterogeneous

– Performance arguments

Higher performance in the scanning mode than in the problemistic mode: similar criteria -> higher correlation -> higher performance [Miller et al. 1997]

Higher performance in the problemistic mode than in the scanning mode: in heterogeneous domains (e.g. problemistic search), each cluster will have high correlations

– Critical mass may resolve this

Page 8: Il Im, Alexander Hars Yonsei University, Inventivio Gmbh

Research Issues and Hypothesis Development

Hypothesis 4: The accuracy of a CF system is better for users in the same search mode than for users in mixed search modes.

– If users in different search modes are mixed, the recommendations will not be as accurate as for users in the same search mode, because their evaluations are based on different evaluation criteria


Page 9: Il Im, Alexander Hars Yonsei University, Inventivio Gmbh

Empirical Study

Setting

Data from two domains: movies and research papers

– 492 movies and 2000 abstracts of academic articles (IS Journal)

Similarity index: correlation coefficient

Reference group selection: best-n-neighbor

Users’ evaluation criteria

– Movies

Scanning mode: ‘in general’

Problemistic search: ‘for the specific occasion chosen’

– Papers

Scanning mode

Overall usefulness

Relevance of the paper for general IS research

Problemistic search

Usefulness

Relevance of the paper for the subject’s specific research project

Accuracy calculation: simulation method

Accuracy measures: MAE (mean absolute error), NMAE (normalized MAE)
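
The setting above (correlation-based similarity, best-n-neighbor reference groups, MAE and NMAE) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the mean-centered weighted prediction formula, the leave-one-out loop standing in for the "simulation method", the NMAE definition (MAE divided by the width of the rating scale), and the tiny `RATINGS` table are all assumptions made for the example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over items both users rated (0.0 if undefined)."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict(ratings, user, item, n=2):
    """Predict user's rating of item from the n most similar raters of it."""
    # Hide the target rating so it cannot leak into the similarity.
    known = {i: r for i, r in ratings[user].items() if i != item}
    if not known:
        return None
    user_mean = sum(known.values()) / len(known)
    neighbors = []
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = pearson(known, theirs)
        if sim > 0:  # keep only positively correlated users
            other_mean = sum(theirs.values()) / len(theirs)
            neighbors.append((sim, theirs[item] - other_mean))
    best = sorted(neighbors, reverse=True)[:n]  # best-n-neighbor selection
    if not best:
        return user_mean
    return user_mean + sum(s * d for s, d in best) / sum(s for s, _ in best)


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def nmae(predicted, actual, r_min, r_max):
    """MAE normalized by the width of the rating scale (assumed definition)."""
    return mae(predicted, actual) / (r_max - r_min)


def leave_one_out(ratings, n=2):
    """Hide each rating in turn, predict it, and collect the pairs."""
    predicted, actual = [], []
    for user, items in ratings.items():
        for item, rating in items.items():
            p = predict(ratings, user, item, n)
            if p is not None:
                predicted.append(p)
                actual.append(rating)
    return predicted, actual


# Made-up ratings on a 1-5 scale, for illustration only.
RATINGS = {
    "u1": {"m1": 5, "m2": 4, "m3": 1},
    "u2": {"m1": 5, "m2": 5, "m3": 2, "m4": 4},
    "u3": {"m1": 1, "m2": 2, "m3": 5, "m4": 2},
    "u4": {"m1": 4, "m2": 5, "m3": 1, "m4": 5},
}

if __name__ == "__main__":
    pred, act = leave_one_out(RATINGS)
    print("MAE:", mae(pred, act), "NMAE:", nmae(pred, act, 1, 5))
```

Because NMAE divides by the scale width, it allows errors to be compared across data sets that use different rating scales, which is presumably why it is reported alongside MAE when several domains are compared.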

Page 10: Il Im, Alexander Hars Yonsei University, Inventivio Gmbh

Results and Discussions

People evaluate items with broader (higher average) but similar (smaller standard deviation) criteria in the scanning mode, and with narrower and more diverse criteria in the problemistic search mode

Average evaluation: scanning mode > problemistic search mode

Standard deviation: scanning mode < problemistic search mode

The research papers received lower ratings than the movies

Research papers are probably a more heterogeneous domain
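
The comparison of averages and standard deviations by search mode could be computed along these lines. The ratings below are invented purely to show the calculation and are not the study's data; the `summarize` helper and the mode labels are hypothetical names.

```python
from statistics import mean, stdev


def summarize(ratings_by_mode):
    """Return {mode: (average rating, sample standard deviation)}."""
    return {
        mode: (mean(values), stdev(values))
        for mode, values in ratings_by_mode.items()
    }


# Invented ratings on a 1-5 scale, for illustration only.
TOY_RATINGS = {
    "scanning": [4, 4, 5, 4, 3, 4],
    "problemistic": [1, 5, 2, 4, 1, 3],
}

if __name__ == "__main__":
    for mode, (avg, sd) in summarize(TOY_RATINGS).items():
        print(f"{mode}: average={avg:.2f}, std dev={sd:.2f}")
```

In this toy data the scanning ratings are higher on average and less spread out, mirroring the direction of the pattern reported on the slide.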

Page 11: Il Im, Alexander Hars Yonsei University, Inventivio Gmbh

Results and Discussions

Number of Users and the Accuracy of CF Systems

Page 12: Il Im, Alexander Hars Yonsei University, Inventivio Gmbh

Results and Discussions

Number of Users and the Accuracy of CF Systems (EachMovie)

Page 13: Il Im, Alexander Hars Yonsei University, Inventivio Gmbh

Results and Discussions

Number of Users and the Accuracy of CF Systems (Book-Crossing)

Page 14: Il Im, Alexander Hars Yonsei University, Inventivio Gmbh

Results and Discussions

The Accuracy of CF Systems

Page 15: Il Im, Alexander Hars Yonsei University, Inventivio Gmbh

Results and Discussions

Mode of Search

Page 16: Il Im, Alexander Hars Yonsei University, Inventivio Gmbh

Results and Discussions

Summary

Page 17: Il Im, Alexander Hars Yonsei University, Inventivio Gmbh

Conclusion and Implication

Identifying key factors that would influence the accuracy of CF systems

Investigating the impact of those factors on accuracy

Limitations

Domain selection, data-set size, Book-Crossing data set

Subject selection, evaluation scale

Implications

The performance of CF systems is not domain-independent.

– Run a pilot test to estimate the suitability for the intended domain

The search mode of the users strongly influences the accuracy of the results.

– Collecting information about users’ search mode is not easy

Future research directions

More research on other product domains

How the patterns of evaluations affect the accuracy of CF systems

How search modes can be identified with minimal intrusion on users