
The Effectiveness of Collaborative Filtering Based Recommendation Systems Across Different Domains and Search Modes

Does a One-Size Recommendation System Fit All?

Il Im, Alexander Hars

Yonsei University, Inventivio GmbH

ACM Transactions on Information Systems, Vol. 26, No. 1, Article 4, Nov 2007

2008. 03. 28.

Summarized by Jaehui Park, IDS Lab., Seoul National University

Presented by Jaehui Park, IDS Lab., Seoul National University

Copyright 2008 by CEBT

Outline

Introduction

Past studies on CF

Research issues and Hypothesis

Empirical study

Result and Discussion

Conclusion and Implication



Introduction

Collaborative Filtering (CF)

One of the major technologies for personalization; it generates recommendations for users based on others’ evaluations or preferences.

Major limitations

– CF has been used mostly for consumer products

Differences in CF across various domains need to be understood

– Lack of studies on user-side factors

Differences in users’ evaluations would affect the accuracy of recommendations, e.g. different intentions

This article

– compares CF recommendations between different domains: research papers and consumer products

– examines user-side factors and their effects on CF systems



Past studies on CF

Goldberg et al. [1992] applied the technology to IR

Miller et al. [1997] generated recommendations for users based on the evaluations of others with similar profiles, using the ratings of an appropriate reference group rather than the average rating of all users.

Mainstream research

– Focused on algorithms for generating recommendations

– Focused on the applications and use of CF

Shortcomings of past CF research

– There has been little research about how the effectiveness of CF might vary across different domains

CF is used mainly for consumer products, such as CDs and movies

These products have little text information

They have few attributes

– Assumption that users’ evaluations remain constant

E.g. if Tom liked “Star Wars”, he should like it forever, on any occasion [Miller et al. 1997]



Research Issues and Hypothesis Development

Many factors may affect the accuracy of CF

Hypothesis 1

The accuracy of a CF system increases as the total number of users increases.

– More users raise the probability of finding people with similar preferences

– Critical mass: a certain number of people is needed to reach a certain level of recommendation

– The accuracy may increase in different patterns depending on the product domains and other factors




Hypothesis 2

The accuracy of CF as a function of the number of users will be greater for knowledge domains, such as research papers, than for consumer product domains, such as movies.

– Preference heterogeneity: the pattern of consumers’ preferences

– Different levels of heterogeneity may result in different patterns in the H1 figure

– People’s preferences in the movie domain are more homogeneous than in the research paper domain

Loosely coupled clusters will result in less accurate recommendations than tightly coupled clusters




Hypothesis 3

After some threshold, the accuracy of CF as a function of the number of users will be greater for the problemistic search mode than for the scanning mode.

– “What types of motivation do people have when conducting an information search?” [Vandenbosch and Huff 1997]

El Sawy’s [1985] categorization

Scanning: browsing through data in order to understand trends or sharpen one’s general understanding of the business (without specific questions)

Problemistic search: stimulated by a problem and directed towards a particular problem (with specific questions)

In the scanning mode, users’ evaluations would be more homogeneous: more overlap in users’ interests

In the problemistic search mode, users’ evaluations would be more heterogeneous

– Performance arguments

Higher performance in the scanning mode than in the problemistic search mode: similar criteria -> higher correlation -> higher performance [Miller et al. 1997]

Higher performance in the problemistic search mode than in the scanning mode: in heterogeneous domains (e.g. problemistic search), each cluster will have high correlations

– Critical mass may resolve this



Hypothesis 4

The accuracy of a CF system is better for users in the same search mode than for users in mixed search modes.

– If users in different search modes are mixed, the recommendations will not be as accurate as for users in the same search mode, because their evaluations are based on different evaluation criteria


Empirical Study

Setting

Data from two domains: movies and research papers

– 492 movies and 2000 abstracts of academic articles (IS Journal)

Similarity index: correlation coefficient

Reference group selection: best-n-neighbor

Users’ evaluation criteria

– Movies

Scanning mode: ‘in general’

Problemistic search: ‘for the specific occasion chosen’

– Papers

Scanning mode: overall usefulness; relevance of the paper for general IS research

Problemistic search: usefulness; relevance of the paper for the subject’s specific research project

Accuracy calculation: simulation method

Accuracy measures: MAE, NMAE
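A minimal sketch of the setting above, assuming a plain user-to-item ratings dictionary: Pearson correlation as the similarity index, a best-n-neighbor reference group, and MAE/NMAE as accuracy measures. The function names, the mean-centered weighted prediction, the rule of keeping only positively correlated neighbors, and the NMAE normalization by the rating-scale range are assumptions for illustration. The slides do not specify them, and this is not the authors' implementation.

```python
from math import sqrt


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation over the items both users rated; 0.0 if undefined."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    ma = mean(a[i] for i in common)
    mb = mean(b[i] for i in common)
    cov = sum((a[i] - ma) * (b[i] - mb) for i in common)
    va = sum((a[i] - ma) ** 2 for i in common)
    vb = sum((b[i] - mb) ** 2 for i in common)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def predict(ratings, user, item, n=2):
    """Predict `user`'s rating of `item` from the n most similar raters.

    Assumption: the prediction is the user's own mean plus the
    similarity-weighted, mean-centered ratings of the chosen neighbors.
    """
    # Hide the target item so it cannot leak into the similarity.
    known = {i: r for i, r in ratings[user].items() if i != item}
    if not known:
        raise ValueError("user needs at least one other rating")
    base = mean(known.values())
    candidates = []
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = pearson(known, theirs)
        if sim > 0:  # assumption: keep positively correlated users only
            candidates.append((sim, theirs))
    # Best-n-neighbor: keep the n highest correlations.
    neighbors = sorted(candidates, key=lambda c: c[0], reverse=True)[:n]
    if not neighbors:
        return base
    num = sum(s * (t[item] - mean(t.values())) for s, t in neighbors)
    den = sum(s for s, _ in neighbors)
    return base + num / den


def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def nmae(actual, predicted, r_min, r_max):
    """MAE divided by the rating-scale range (assumed normalization)."""
    return mae(actual, predicted) / (r_max - r_min)


# Hypothetical toy data, not taken from the study.
ratings = {
    "u1": {"a": 1, "b": 2, "c": 3, "d": 4},
    "u2": {"a": 2, "b": 3, "c": 4, "d": 5},
    "u3": {"a": 5, "b": 3, "c": 1, "d": 1},
}
guess = predict(ratings, "u1", "d", n=1)
print(guess, nmae([ratings["u1"]["d"]], [guess], r_min=1, r_max=5))
```

Dividing MAE by the scale range makes errors comparable when the two domains are rated on different scales, which is one plausible reason to report both measures.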


Results and Discussions

People evaluate items with broader (higher average) but similar (smaller standard deviations) criteria in the scanning mode, and with narrower and more diverse criteria in the problemistic search mode

Avg evaluation: Scanning mode > Problemistic search mode

Std Dev: Scanning mode < Problemistic search mode

The research papers received lower ratings than the movies

Research papers are probably a more heterogeneous domain



Number of Users and the Accuracy of CF Systems




Number of Users and the Accuracy of CF Systems (EachMovie)




Number of Users and the Accuracy of CF Systems (Book-Crossing)




the Accuracy of CF Systems




Mode of Search




Summary



Conclusion and Implication

Identifying key factors that would influence the accuracy of CF systems

Investigating the impact of those factors on accuracy

Limitations

Domain selection, data-set size, Book-Crossing data-set

Subject selection, evaluation scale

Implications

The performance of CF systems is not domain-independent.

– Pilot test to estimate the suitability for the intended domain

The search mode of the users strongly influences the accuracy of the results.

– Collecting information about users’ search modes is not easy

Future research directions

More research on other product domains

How the patterns of evaluations affect the accuracy of CF systems

How search modes can be identified with minimal intrusion to users
