Towards Better Online Personalization: A Framework for Empirical Evaluation and Real-Life Validation of Hybrid Recommendation Systems


Stijn Geuens, Koen W. De Bock, Kristof Coussement

Recommendation Systems: Examples

AMS World Marketing Congress 2016, 07/20/2016


How to Calculate Recommendations [Bobadilla et al. 2013; Adomavicius et al. 2008]

Classification based on calculation paradigm:

Memory-based [Goldberg, 1992]

Model-based [Koren, 2008]

Classification based on input data:

Socio-demographic information: Demographic RecSys [e.g. Pazzani 1999; Porcel et al. 2012]

Product characteristics: Content-based RecSys [e.g. Lang 1995; Meteren and Someren 2000]

Real-time navigation information: Knowledge-based RecSys [e.g. Burke 2000]

Behavioral history: Collaborative filtering RecSys [e.g. Herlocker et al. 2004]

Hybrid solutions [e.g. Burke 2002; Preece and Shneiderman 2009]


A Shift Towards Hybrid Algorithms

Single data source systems: advantages and disadvantages [Bobadilla et al. 2013]

Hybridization resolves these issues and leads to better performance [Bobadilla et al. 2013]

Algorithm combination vs. data source combination [Bobadilla et al. 2013]

Burke’s classification [Burke, 2002]: Weighting

Feature combination
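As a minimal sketch of the two hybridization styles named above, the Python below contrasts weighting (blending the scores of separately built recommenders) with feature combination (merging the inputs of several data sources into one feature vector for a single model). The function names, the weights, and the toy scores are hypothetical illustrations and are not taken from the study.

```python
def weighted_hybrid(score_lists, weights):
    """Weighting: blend per-item scores from separately built recommenders."""
    if len(score_lists) != len(weights):
        raise ValueError("one weight per recommender is required")
    items = set().union(*(scores.keys() for scores in score_lists))
    total = sum(weights)
    return {
        item: sum(w * scores.get(item, 0.0)
                  for scores, w in zip(score_lists, weights)) / total
        for item in items
    }


def combine_features(*sources):
    """Feature combination: merge (source_name, features) pairs into one input."""
    combined = {}
    for source_name, features in sources:
        for name, value in features.items():
            # Prefix with the data source so equal feature names do not collide.
            combined[f"{source_name}:{name}"] = value
    return combined


# Hypothetical toy scores from a collaborative and a content-based recommender.
cf_scores = {"shoe_a": 0.9, "shoe_b": 0.2}
content_scores = {"shoe_a": 0.4, "shoe_b": 0.8, "shoe_c": 0.6}
blended = weighted_hybrid([cf_scores, content_scores], [0.75, 0.25])

# Hypothetical single feature vector built from two data sources.
features = combine_features(("customer", {"age": 34}), ("product", {"brand": "x"}))
```

In the weighting case each recommender is trained on its own and only the outputs are mixed; in the feature combination case a single model sees all data sources at once, which is what lets it learn interactions between them.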


Contributions

Going beyond the creation of a hybrid algorithm by:

Creating a decision framework that guides marketing academics and professionals in their efforts to create recommendation systems

Opening the black box of recommendation systems by introducing the concept of feature importance



Research Questions


Data:

RQ1.a. Do recommendation systems based on different single data sources differ in performance?

RQ1.b. Does combining different data sources add predictive performance?

Recommendation Calculation:

RQ2. Which hybridization technique performs best for algorithms with the optimal number of data sources?

Feature Importance:

RQ3. Which are the most important predictors in the best performing algorithm?


Framework

[Figure: framework diagram. Sources cited in the figure: Song, 2000; Kohavi et al., 2004; Rendle, 2010; Burke, 2002; Adomavicius & Tuzhilin, 2005; Lipton, 2014; Herlocker et al., 2004; Breiman, 2003]


Experimental Setup

8 different company-specific datasets


Product Category      Visitors   Products
Shoes                   31,536     11,712
Children's Clothing     16,752      3,956
Decoration              12,747      5,054
Lingerie                11,672      3,514
Furniture               20,507      6,481
Men's Clothing           8,412      4,737
Women's Clothing        50,336     12,979
Household linen         12,376      2,934


Evaluation metric: F1@5 [Lipton, 2014]
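The slides do not spell out how F1@5 is computed, so the following is only a sketch under a common convention: precision@5 is the share of the five recommended items that are relevant, recall@5 is the share of the relevant items that appear among those five, and F1@5 is their harmonic mean. The function name and the toy item identifiers are hypothetical.

```python
def f1_at_k(recommended, relevant, k=5):
    """F1@k for one user: harmonic mean of precision@k and recall@k.

    Assumes `recommended` is a ranked list without duplicates and that
    precision is computed against k, not against the list length.
    """
    top_k = list(recommended)[:k]
    relevant = set(relevant)
    hits = sum(1 for item in top_k if item in relevant)
    if hits == 0:
        return 0.0
    precision = hits / k
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 2 of the top 5 recommendations are among 3 relevant items.
score = f1_at_k(["a", "b", "c", "d", "e", "f"], {"b", "e", "z"})
```

A per-dataset score would then typically be the mean of this value over all evaluated visitors; whether the study averaged per visitor or pooled the counts is not stated on the slide.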



Method of analysis:

Evaluation (Data and Recommendation Calculation): Friedman aligned rank test with Li’s procedure for post hoc testing [García, 2010]
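To make the test concrete, here is a rough sketch of the Friedman aligned ranks statistic, assuming the usual form of the test: each dataset's mean score is subtracted from its scores, all aligned values are ranked together, and the rank totals per algorithm and per dataset enter the statistic T. The function names and the toy scores are hypothetical; the p-value (T is usually compared with a chi-square distribution with k - 1 degrees of freedom) and Li's post hoc procedure are left out.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = mean_rank
        pos = end + 1
    return ranks


def friedman_aligned_ranks(scores):
    """scores[i][j] is the score of algorithm j on dataset i.

    Returns the statistic T and the aligned rank total of each algorithm
    (a lower total means better scores, because rank 1 is the best value).
    """
    n, k = len(scores), len(scores[0])
    # Align: remove each dataset's mean so that datasets become comparable.
    aligned = [s - sum(row) / k for row in scores for s in row]
    # Rank all n * k aligned values together; negate so the best gets rank 1.
    ranks = average_ranks([-a for a in aligned])
    algo_totals = [sum(ranks[i * k + j] for i in range(n)) for j in range(k)]
    data_totals = [sum(ranks[i * k + j] for j in range(k)) for i in range(n)]
    numerator = (k - 1) * (
        sum(r * r for r in algo_totals) - (k * n * n / 4) * (k * n + 1) ** 2
    )
    denominator = (
        k * n * (k * n + 1) * (2 * k * n + 1) / 6
        - sum(r * r for r in data_totals) / k
    )
    return numerator / denominator, algo_totals


# Hypothetical scores of 2 algorithms on 3 datasets.
T, totals = friedman_aligned_ranks([[10, 6], [9, 7], [8, 2]])
```

Aligning before ranking is what distinguishes this variant from the plain Friedman test: ranks are comparable across datasets, which helps when only a few algorithms are compared.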

Interpretation (variable importance): implementation of Breiman’s (2003) permutation method developed for random forests


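\[
\mathrm{FeatImp}_{i} = \frac{F1@5_{\mathrm{Full}} - F1@5_{\mathrm{Random\ permutation}\ i}}{F1@5_{\mathrm{Full}}}
\]

\[
\mathrm{FeatImp}^{\mathrm{aggr}}_{i} = \frac{1}{d} \sum_{d} \mathrm{FeatImp}_{i}
\]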
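The feature importance formula above can be sketched in a few lines of Python: score the model on the original data, score it again after randomly permuting one feature, and report the relative drop. Everything named here is an assumption for illustration: `score_fn` stands for any function that evaluates an already trained model on a list of rows and returns its F1@5, rows are plain dictionaries, and the aggregate is taken to be the mean over datasets.

```python
import random


def permutation_importance(score_fn, rows, feature, seed=0):
    """Relative drop in the score after randomly permuting one feature column."""
    full = score_fn(rows)
    if full == 0:
        raise ValueError("full-model score is zero; relative importance is undefined")
    rng = random.Random(seed)
    shuffled = [row[feature] for row in rows]
    rng.shuffle(shuffled)
    # Copy each row and overwrite only the permuted feature.
    permuted_rows = [dict(row, **{feature: value}) for row, value in zip(rows, shuffled)]
    return (full - score_fn(permuted_rows)) / full


def aggregate_importance(per_dataset_importances):
    """Mean importance of one feature over datasets (assumed aggregation)."""
    return sum(per_dataset_importances) / len(per_dataset_importances)
```

A feature the model ignores yields an importance of zero, because permuting it leaves the score unchanged; in practice the permutation is often repeated with several seeds and averaged to reduce noise.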


Results: Data

RQ1.a. Do recommendation systems based on different single data sources differ in performance?


[Figure for RQ1.a; dashed lines (----) indicate a non-significant difference at the 95% confidence level]



Yes, single data source recommendation systems differ in performance

When building a single data source recommender system, a company does best to focus on a system based on raw behavioral data (RBD) or product data (PD)




RQ1.b. Does combining different data sources add predictive performance?


[Figure for RQ1.b; dotted lines (......) indicate a marginally significant difference]



Yes, performance increases when adding data sources

It is worthwhile for a company to investigate data source combination to improve performance of recommendation systems




Results: Recommendation Calculation

RQ2. Which hybridization technique performs best for algorithms with the optimal number of data sources?








Factorization machines outperform an a posteriori weighting of single data source algorithms

It is worthwhile for a company to investigate advanced hybridization techniques to improve the performance of recommendation systems


Results: Feature Importance

RQ3. Which are the most important predictors in the best performing algorithm?

Within the best performing algorithm (RQ1 and RQ2), the data sources can be ranked by importance score: RBD > PD > CD > ABD (Raw Behavioral Data, Product Data, Customer Data, Aggregated Behavioral Data)


[Figure: bar chart of importance per data source (axis from 0% to 40%) for Aggregated Behavioral Data, Customer Data, Product Data, and Raw Behavioral Data]



[Figure: bar chart of importance per feature (axis from 0% to 14%), grouped by data source (RBD, PD, CD, ABD). Features shown: number of total purchases, mean product rating, total value of purchases, length of relationship, time since last purchase, internal vs. external, value-based segmentation, mean product rating, explicit ratings, number of children, marital status, place of residence, age of children, brand, gender, age, internal search, product division 3, product division 2, product division 1, purchases, addition to cart, views]


Conclusions

A framework to guide marketing professionals and academics in their efforts to create recommendation systems

Empirical validation of the framework on 8 datasets:





Single data source recommendation systems differ in performance






Combining data sources adds to the performance of recommendation systems







An advanced combination technique based on feature combination outperforms a posteriori weighting of single data source algorithms

RBD is the most important data source in the best performing model, followed by PD, CD, and finally ABD


Future Work

Incorporation of other evaluation metrics in the framework

Field test: evaluation of different recommendation strategies in terms of business metrics

Identification of the relationship between ‘academic’ metrics and business metrics


References

J. Bobadilla, F. Ortega, A. Hernando, A. Gutierrez, Recommender systems survey, Knowl.-Based Syst., 46 (2013) 109-132

G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, IEEE Trans. Knowl. Data Eng., 17 (2005) 734-749

Y. Koren, Factorization meets the neighborhood: A multifaceted collaborative filtering model, 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Las Vegas, NV, 2008, pp. 426-434

M.J. Pazzani, A framework for collaborative, content-based and demographic filtering, Artif. Intell. Rev., 13 (1999) 393-408

C. Porcel, A. Tejeda-Lorente, M.A. Martinez, E. Herrera-Viedma, A hybrid recommender system for the selective dissemination of research resources in a technology transfer office, Inform. Sciences, 184 (2012) 1-19

R. Burke, Hybrid recommender systems: Survey and experiments, User Modeling and User-Adapted Interaction, 12 (2002) 331-370



J.L. Herlocker, J.A. Konstan, L.G. Terveen, J.T. Riedl, Evaluating collaborative filtering recommender systems, ACM Trans. Inf. Syst., 22 (2004) 5-53

I.-Y. Song, Database Design for Real-World E-Commerce Systems, IEEE Data Engineering Bulletin, 23 (2000) 23-28.

R. Kohavi, L. Mason, R. Parekh, Z. Zheng, Lessons and Challenges from Mining Retail E-Commerce Data, Mach. Learn., 57 (2004) 83-113

S. Rendle, Factorization Machines, IEEE International Conference on Data Mining, Sydney, Australia, 2010

Z.C. Lipton, C. Elkan, B. Narayanaswamy, Optimal thresholding of classifiers to maximize F1 measure, in: T. Calders, F. Esposito, E. Hüllermeier, R. Meo (Eds.) Machine Learning and Knowledge Discovery in Databases, Springer Berlin Heidelberg 2014, pp. 225-239

L. Breiman, Random forests, Mach. Learn., 45 (2001) 5-32


Thank you for your Attention

Contact: Stijn Geuens

IESEG School of Management

3 Rue de la Digue

F-59000 Lille

(0)3.20.545.892

[email protected]

fr.linkedin.com/pub/stijn-geuens/

stijn.geuens


Appendix 1: Advantages and disadvantages of different systems [Burke, 2002]

Columns: Collaborative Filtering | Content-based | Knowledge-based | Demographic

Pros

No metadata engineering needed | Comparison between items possible | Deterministic | No metadata engineering needed

Serendipity in results | No metadata engineering needed | No cold-start | Serendipity in results

Adaptive | Adaptive

Cons

Scalability | Overspecialization | Knowledge engineering required | Long tail

Cold start for new users and items | Cold start for new users | Subjective | Cold start for new users

Long tail problem | Collection of product information | Static | Static

Stability


Appendix 2: Experimental Framework: Data

Product Data: three main product divisions, brand, mean product rating, internal vs. external, availability on the web

Customer Data: age, gender, marital status, place of residence, number of children, age of children

Aggregated Behavioral Data:

RFM: time since last purchase, number of total purchases, total value of purchases

Relationship features: length of relationship, value-based segmentation, mean product rating

Raw Behavioral Data: explicit ratings, purchases, internal search, addition to cart, views


Appendix 3: Experimental Framework: Recommendation Calculation


Factorization Machines: introduced by Rendle (2010)

Combine the advantages of Support Vector Machines (SVMs) and factorization models.

SVM: works with any real-valued feature vector, which allows different data sources to be integrated

Factorization models: variable interactions are calculated from factorized parameters, which allows interactions to be estimated under huge sparsity, where SVMs fail.

General FM model equation of degree 2:
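The equation itself did not survive on this slide. In Rendle (2010), the degree-2 factorization machine for a feature vector x with n features, global bias w_0, linear weights w_i, and k-dimensional factor vectors v_i is:

\[
\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j
\]

As a minimal sketch, the Python below evaluates this model twice: once with the linear-time reformulation of the pairwise term and once literally as the double sum. The parameter values are hypothetical toy numbers, not fitted parameters from the study.

```python
def fm_predict(x, w0, w, V):
    """Degree-2 FM prediction using the linear-time form of the pairwise term.

    V[i] is the factor vector of feature i; all factor vectors have length k.
    """
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        # 0.5 * ((sum)^2 - sum of squares) equals the sum over pairs i < j.
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same model written literally as the double sum over feature pairs."""
    n = len(x)
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


# Hypothetical toy parameters: 3 features, 2 latent factors.
x = [1.0, 0.0, 2.0]
w0, w = 0.5, [0.1, 0.2, 0.3]
V = [[1.0, 0.0], [0.5, 0.5], [1.0, 2.0]]
y_fast = fm_predict(x, w0, w, V)
y_naive = fm_predict_naive(x, w0, w, V)
```

Because every interaction weight is a dot product of two factor vectors, a pair of features that never co-occurs in the training data still gets an interaction estimate, which is the sparsity argument made above.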
