Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems Stijn Geuens, Koen W. De Bock, Kristof Coussement


Page 1: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Stijn Geuens, Koen W. De Bock, Kristof Coussement

Page 2: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Recommendation Systems: Examples

AMS World Marketing Congress 2016, 07/20/2016


Page 4: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

How to Calculate Recommendations [Bobadilla et al. 2013; Adomavicius et al. 2008]

Classification based on calculation paradigm:

Memory-based [Goldberg, 1992]

Model-based [Koren, 2008]

Classification based on input data:

Socio-demographic information: demographic RecSys [e.g., Pazzani 1999; Porcel et al. 2012]

Product characteristics: content-based RecSys [e.g., Lang 1995; Meteren and Someren 2000]

Real-time navigation information: knowledge-based RecSys [e.g., Burke 2000]

Behavioral history: collaborative filtering RecSys [e.g., Herlocker et al. 2004]

Hybrid solutions [e.g., Burke 2002; Preece and Shneiderman 2009]

Page 5: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

A Shift Towards Hybrid Algorithms

Single data source systems: advantages and disadvantages [Bobadilla et al. 2013]

Hybridization resolves these issues and leads to better performance [Bobadilla et al. 2013]

Algorithm combination vs. data source combination [Bobadilla et al. 2013]

Burke’s classification [Burke, 2002]:

Weighting

Feature combination
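The difference between the two hybridization types named on this slide can be sketched in a few lines. This is only an illustration under simple assumptions: the function names, scores, weights, and feature values below are hypothetical and do not come from the study.

```python
def weighted_hybrid(scores_per_source, weights):
    """Weighting: blend the scores that separately trained recommenders
    assign to the same user-item pair."""
    return sum(w * s for w, s in zip(weights, scores_per_source))


def combine_features(*feature_vectors):
    """Feature combination: concatenate the features of all data sources
    into one vector, on which a single model is then trained."""
    combined = []
    for vector in feature_vectors:
        combined.extend(vector)
    return combined


if __name__ == "__main__":
    # Hypothetical scores from two single data source recommenders.
    print(weighted_hybrid([0.2, 0.8], [0.5, 0.5]))
    # Hypothetical feature vectors from three data sources.
    print(combine_features([1, 0], [0.3], [5]))
```

With weighting, each data source keeps its own model and only the outputs are merged. With feature combination, one model sees all features at once and can therefore learn interactions between data sources.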


Page 6: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Contributions

Go beyond creation of a hybrid algorithm by:

Creation of a decision framework for marketing academics and professionals to guide them in their efforts to create recommendation systems

Opening the black box of recommendation systems by introducing the concept of feature importance


Page 8: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Research Questions

Data:

RQ1.a. Do recommendation systems based on different single data sources differ in performance?

RQ1.b. Does combining different data sources add predictive performance?

Recommendation Calculation:

RQ2. Which hybridization technique performs best for algorithms with the optimal number of data sources?

Feature Importance:

RQ3. Which are the most important predictors in the best performing algorithm?

Page 9: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Framework


Page 10: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Framework


[Song, 2000; Kohavi et al., 2004]


Page 11: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Framework


[Rendle, 2010]

[Burke, 2002; Adomavicius & Tuzhilin, 2005]


Page 12: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Framework

[Lipton, 2014]

[Herlocker et al., 2004]

[Breiman, 2001]

Page 13: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Framework


Page 14: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Experimental Setup

8 different company-specific datasets

Product Category       Visitors   Products
Shoes                  31,536     11,712
Children's Clothing    16,752      3,956
Decoration             12,747      5,054
Lingerie               11,672      3,514
Furniture              20,507      6,481
Men's Clothing          8,412      4,737
Women's Clothing       50,336     12,979
Household linen        12,376      2,934


Page 17: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Experimental Setup

Evaluation metric: F1@5 [Lipton et al., 2014]

Method of analysis:

Evaluation: Data and Recommendation Calculation

Friedman aligned rank test with Li’s procedure for post hoc testing [García et al., 2010]

Interpretation: Variable importance

Implementation of Breiman’s (2001) method developed for random forests


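$$\mathrm{FeatImp}_i = \frac{F1@5_{\mathrm{Full}} - F1@5_{\mathrm{Random\ permutation}\ i}}{F1@5_{\mathrm{Full}}}$$

$$\mathrm{FeatImp}^{\mathrm{aggr}}_i = \frac{1}{d} \sum_{d} \mathrm{FeatImp}_i$$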
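The per-feature importance score can be illustrated with a small sketch: compute the metric on the intact data, shuffle one feature column, recompute the metric, and report the relative drop. The code below is a minimal illustration, not the authors' implementation. The function names, the `evaluate` callback (any function that returns a score such as F1@5 for a list of feature rows), and the fixed seed are assumptions made for the example.

```python
import random


def f1_at_k(recommended, relevant, k=5):
    """F1 score of the top-k recommended items against the relevant items."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def permutation_importance(evaluate, rows, feature_index, seed=0):
    """Relative drop in the metric after shuffling one feature column.

    evaluate: hypothetical callback mapping a list of feature rows to a score.
    rows: list of feature lists, one per observation.
    """
    full = evaluate(rows)
    if full == 0:
        raise ValueError("the full-model score must be non-zero")
    rng = random.Random(seed)
    column = [row[feature_index] for row in rows]
    rng.shuffle(column)  # break the link between this feature and the outcome
    permuted = [
        row[:feature_index] + [value] + row[feature_index + 1:]
        for row, value in zip(rows, column)
    ]
    return (full - evaluate(permuted)) / full
```

A feature that the model does not rely on yields a score of 0, because shuffling it leaves the metric unchanged. In practice the shuffle would typically be repeated several times and averaged to reduce the influence of a single random permutation.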



Page 20: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Results: Data

RQ1.a. Do recommendation systems based on different single data sources differ in performance?

Yes, single data source recommendation systems differ in performance.

When building a single data source recommender system, a company does best to focus on raw behavioral data (RBD) or, alternatively, product data (PD).

Dashed lines (----) indicate a non-significant difference at the 95% confidence level
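The significance statements on the result slides rest on the Friedman aligned rank test named in the experimental setup. As a rough illustration of what that test computes, the sketch below implements the aligned-rank statistic in its commonly stated form: observations are centered per dataset, ranked jointly, and the rank totals per algorithm and per dataset enter the statistic, which is compared against a chi-square distribution with k - 1 degrees of freedom. The function name and the input layout are assumptions, and Li's post hoc procedure is not covered.

```python
def aligned_rank_statistic(scores):
    """Friedman aligned rank statistic.

    scores[i][j] is the performance of algorithm j on dataset i
    (n datasets, k algorithms).
    """
    n = len(scores)
    k = len(scores[0])

    # Align: subtract each dataset's mean so datasets become comparable.
    aligned = []
    for i, row in enumerate(scores):
        mean = sum(row) / k
        for j, value in enumerate(row):
            aligned.append((value - mean, i, j))

    # Rank all n*k aligned values jointly; ties receive the average rank.
    aligned.sort(key=lambda item: item[0])
    ranks = [[0.0] * k for _ in range(n)]
    pos = 0
    while pos < len(aligned):
        end = pos
        while end + 1 < len(aligned) and aligned[end + 1][0] == aligned[pos][0]:
            end += 1
        average_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for _, i, j in aligned[pos:end + 1]:
            ranks[i][j] = average_rank
        pos = end + 1

    algorithm_totals = [sum(ranks[i][j] for i in range(n)) for j in range(k)]
    dataset_totals = [sum(row) for row in ranks]

    kn = k * n
    numerator = (k - 1) * (
        sum(t * t for t in algorithm_totals) - (k * n * n / 4) * (kn + 1) ** 2
    )
    denominator = (
        kn * (kn + 1) * (2 * kn + 1) / 6 - sum(t * t for t in dataset_totals) / k
    )
    return numerator / denominator
```

If all algorithms score identically on every dataset, the statistic is 0. Larger values indicate stronger evidence that at least one algorithm differs from the others.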



Page 23: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Results: Data

RQ1.b. Does combining different data sources add predictive performance?

Yes, performance increases when adding data sources

It is worthwhile for a company to investigate data source combination to improve performance of recommendation systems

Dotted lines (......) indicate a marginally significant difference


Page 26: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Results: Recommendation Calculation

RQ2. Which hybridization technique performs best for algorithms with the optimal number of data sources?

Factorization machines outperform a posteriori weighting of single data source algorithms

It is worthwhile for a company to investigate advanced hybridization techniques to improve the performance of recommendation systems


Page 27: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Results: Feature Importance

RQ3. Which are the most important predictors in the best performing algorithm?

Within the best performing algorithm (RQ1 and RQ2), the data sources differ in importance score: RBD > PD > CD > ABD

[Figure: bar chart of importance score (axis from 0% to 40%) per data source: Raw Behavioral Data (RBD), Product Data (PD), Customer Data (CD), Aggregated Behavioral Data (ABD)]

Page 28: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Results: Feature Importance

[Figure: bar chart of importance score (axis from 0% to 14%) per feature, with a legend for the data sources RBD, PD, CD, and ABD. Features in the order listed on the chart: Number of total purchases, Mean product rating, Total value of purchases, Length of relationship, Time since last purchase, Internal vs external, Value-based segmentation, Mean Product Rating, Explicit ratings, Number of children, Marital Status, Place of residence, Age of Children, Brand, Gender, Age, Internal search, Product Division 3, Product Division 2, Product Division 1, Purchases, Addition to cart, Views]


Page 33: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Conclusions

A framework to guide marketing professionals and academics in their efforts to create recommendation systems

Empirical validation of the framework on 8 datasets:

Single data source recommendation systems differ in performance

Combining data sources adds to the performance of recommendation systems

An advanced combination technique based on feature combination outperforms a posteriori weighting of single data source algorithms

RBD is the most important data source in the best performing model, followed by PD, CD, and finally ABD

Page 34: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Future Work

Incorporation of other evaluation metrics in the framework

Field test: evaluation of different recommendation strategies in terms of business metrics

Identification of the relationship between ‘academic’ metrics and business metrics


Page 35: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

References

J. Bobadilla, F. Ortega, A. Hernando, A. Gutierrez, Recommender systems survey, Knowl.-Based Syst., 46 (2013) 109-132

G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, IEEE Trans. Knowl. Data Eng., 17 (2005) 734-749

Y. Koren, Factorization meets the neighborhood: A multifaceted collaborative filtering model, 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Las Vegas, NV, 2008, pp. 426-434

M.J. Pazzani, A framework for collaborative, content-based and demographic filtering, Artif. Intell. Rev., 13 (1999) 393-408

C. Porcel, A. Tejeda-Lorente, M.A. Martinez, E. Herrera-Viedma, A hybrid recommender system for the selective dissemination of research resources in a technology transfer office, Inform. Sciences, 184 (2012) 1-19

R. Burke, Hybrid recommender systems: Survey and experiments, User Modeling and User-Adapted Interaction, 12 (2002) 331-370


Page 36: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

References

J.L. Herlocker, J.A. Konstan, L.G. Terveen, J.T. Riedl, Evaluating collaborative filtering recommender systems, ACM Trans. Inf. Syst., 22 (2004) 5-53

I.-Y. Song, Database Design for Real-World E-Commerce Systems, IEEE Data Engineering Bulletin, 23 (2000) 23-28.

R. Kohavi, L. Mason, R. Parekh, Z. Zheng, Lessons and Challenges from Mining Retail E-Commerce Data, Mach. Learn., 57 (2004) 83-113

S. Rendle, Factorization Machines, IEEE International Conference on Data Mining, Sydney, Australia, 2010

Z.C. Lipton, C. Elkan, B. Naryanaswamy, Optimal thresholding of classifiers to maximize F1 measure, in: T. Calders, F. Esposito, E. Hüllermeier, R. Meo (Eds.) Machine Learning and Knowledge Discovery in Databases, Springer Berlin Heidelberg 2014, pp. 225-239

L. Breiman, Random forests, Mach. Learn., 45 (2001) 5-32


Page 37: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Thank you for your attention

Contact: Stijn Geuens

IESEG School of Management, 3 Rue de la Digue, F-59000 Lille

(0)3.20.545.892

[email protected]

fr.linkedin.com/pub/stijn-geuens/

stijn.geuens

Page 38: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Appendix 1: Advantages and disadvantages of different systems

[Burke, 2002]

Collaborative Filtering; Content-based; Knowledge-Based; Demographic

Pros

No metadata engineering needed; Comparison between items possible; Deterministic; No metadata engineering needed

Serendipity in results; No metadata engineering needed; No cold-start; Serendipity in results

Adaptive; Adaptive

Cons

Scalability; Overspecialization; Knowledge engineering required; Long tail

Cold start for new users and items; Cold start for new users; Subjective; Cold start for new users

Long tail problem; Collection of product information; Static; Static

Stability

Page 39: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Appendix 2: Experimental Framework: Data

Data

Product Data

Three main product divisions

Brand

Mean product rating

Internal vs. external

Availability on the web

Customer Data

Age

Gender

Marital status

Place of residence

Number of children

Age of children

Aggregated Behavioral Data

RFM

Time since last purchase

Number of total purchases

Total value of purchases

Relationship features

Length of Relationship

Value-based segmentation

Mean product rating

Raw Behavioral Data

Explicit ratings

Purchases

Internal search

Addition to cart

Views


Page 40: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Appendix 2: Experimental Framework: Data


Page 42: Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems

Appendix 3: Experimental Framework: Recommendation Calculation

Factorization Machines

Introduced by Rendle (2010)

Based on Support Vector Machines (SVMs) and factorization models, combining the advantages of both.

SVM: works with any real-valued feature vector, which allows different data sources to be integrated

Factorization models: variable interactions are calculated from factorized parameters, which allows interactions to be estimated under huge sparsity, where SVMs fail.

General FM model equation of degree 2:
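The equation itself did not survive on this slide. For reference, the degree-2 factorization machine model as given in Rendle (2010) is:

$$\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j$$

Here $w_0$ is the global bias, $w_i$ the weight of feature $i$, and $\mathbf{v}_i$ the factor vector of feature $i$. A minimal sketch of the prediction step follows. It uses the linear-time reformulation of the pairwise term that Rendle (2010) also describes. The function name and argument layout are assumptions made for illustration, and parameter learning is not shown.

```python
def fm_predict(x, w0, w, V):
    """Degree-2 factorization machine prediction for one feature vector.

    x:  feature values, length n
    w0: global bias
    w:  linear weights, length n
    V:  factor matrix, n rows of length f (one factor vector per feature)
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))

    # Pairwise term via the identity
    #   sum_{i<j} <v_i, v_j> x_i x_j
    #     = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    pairwise = 0.0
    for f in range(len(V[0])):
        total = sum(V[i][f] * x[i] for i in range(len(x)))
        total_of_squares = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (total * total - total_of_squares)

    return w0 + linear + pairwise
```

Because every interaction weight is the inner product of two factor vectors, an interaction can be estimated even when the two features never co-occur in the training data, which is the sparsity advantage the slide refers to.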
