Most large-scale commercial and social websites recommend options, such as products or people to connect with, to users. Recommendation engines sort through massive amounts of data to identify potential user preferences. This presentation, the first in a two-part series, explains the ideas behind recommendation systems and introduces you to the algorithms that power them. In Part 2, learn about some open source recommendation engines you can put to work.
Recommender Systems
♣ Application areas
In the Social Web
Even more …
♣ Personalized search
♣ "Computational advertising"
About the speaker
♣ Ramzi Alqrainy - Ramzi is a well-recognized expert in the Artificial Intelligence and Information Retrieval fields in the Middle East, and an active researcher and technology blogger with a focus on information retrieval.
Agenda
♣ What are recommender systems for? - Introduction
♣ How do they work (Part I)? - Collaborative Filtering
♣ How to measure their success? - Evaluation techniques
♣ How do they work (Part II)? - Content-based Filtering, Knowledge-Based Recommendations, Hybridization Strategies
♣ Advanced topics - Explanations, Human decision making
Why use Recommender Systems?
♣ Value for the customer
- Find things that are interesting
- Narrow down the set of choices
- Help me explore the space of options
- Discover new things
- Entertainment
- …
♣ Value for the provider
- Additional and probably unique personalized service for the customer
- Increase trust and customer loyalty
- Increase sales, click-through rates, conversion etc.
- Opportunities for promotion, persuasion
- Obtain more knowledge about customers
- …
Real-world check
♣ Myths from industry
- Amazon.com generates X percent of their sales through the recommendation lists (30 < X < 70)
- Netflix (DVD rental and movie streaming) generates X percent of their sales through the recommendation lists (30 < X < 70)
♣ There must be some value in it
- See recommendations of groups, jobs or people on LinkedIn
- Friend recommendation and ad personalization on Facebook
- Song recommendation at last.fm
- News recommendation at Forbes.com (plus 37% CTR)
♣ Academia
- A few studies exist that show the effect: increased sales, changes in sales behavior
Problem domain
♣ Recommendation systems (RS) help to match users with items
- Ease information overload
- Sales assistance (guidance, advisory, persuasion, …)

"RS are software agents that elicit the interests and preferences of individual consumers […] and make recommendations accordingly. They have the potential to support and improve the quality of the decisions consumers make while searching for and selecting products online." [Xiao & Benbasat, MISQ, 2007]

♣ Different system designs / paradigms
- Based on availability of exploitable data
- Implicit and explicit user feedback
- Domain characteristics
Recommender systems
♣ RS seen as a function [AT05]
♣ Given:
- User model (e.g. ratings, preferences, demographics, situational context)
- Items (with or without description of item characteristics)
♣ Find:
- Relevance score, used for ranking
♣ Finally:
- Recommend items that are assumed to be relevant
♣ But:
- Remember that relevance might be context-dependent
- Characteristics of the list itself might be important (diversity)
Paradigms of recommender systems
♣ Recommender systems reduce information overload by estimating relevance
♣ Personalized recommendations
♣ Collaborative: "Tell me what's popular among my peers"
♣ Content-based: "Show me more of the same that I've liked"
♣ Knowledge-based: "Tell me what fits based on my needs"
♣ Hybrid: combinations of various inputs and/or composition of different mechanisms
Recommender systems: basic techniques
♣ Collaborative
- Pros: no knowledge-engineering effort, serendipity of results, learns market segments
- Cons: requires some form of rating feedback, cold start for new users and new items
♣ Content-based
- Pros: no community required, comparison between items possible
- Cons: content descriptions necessary, cold start for new users, no surprises
♣ Knowledge-based
- Pros: deterministic recommendations, assured quality, no cold start, can resemble sales dialogue
- Cons: knowledge engineering effort to bootstrap, basically static, does not react to short-term trends
Collaborative Filtering (CF)
♣ The most prominent approach to generate recommendations
- used by large, commercial e-commerce sites
- well-understood; various algorithms and variations exist
- applicable in many domains (books, movies, DVDs, …)
♣ Approach
- use the "wisdom of the crowd" to recommend items
♣ Basic assumption and idea
- Users give ratings to catalog items (implicitly or explicitly)
- Customers who had similar tastes in the past will have similar tastes in the future
User-based nearest-neighbor collaborative filtering (1)
♣ The basic technique:
- Given an "active user" (Alice) and an item I not yet seen by Alice
- The goal is to estimate Alice's rating for this item, e.g., by:
♣ finding a set of users (peers) who liked the same items as Alice in the past and who have rated item I
♣ using, e.g., the average of their ratings to predict whether Alice will like item I
♣ doing this for all items Alice has not seen and recommending the best-rated

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
User-based nearest-neighbor collaborative filtering (2)
♣ Some first questions
- How do we measure similarity?
- How many neighbors should we consider?
- How do we generate a prediction from the neighbors' ratings?

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Measuring user similarity
♣ A popular similarity measure in user-based CF: Pearson correlation

sim(a, b) = Σ_{p∈P} (r_{a,p} - r̄_a)(r_{b,p} - r̄_b) / ( √(Σ_{p∈P} (r_{a,p} - r̄_a)²) · √(Σ_{p∈P} (r_{b,p} - r̄_b)²) )

a, b: users
r_{a,p}: rating of user a for item p
P: set of items rated both by a and b
r̄_a, r̄_b: the users' average ratings
Possible similarity values between -1 and 1

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3     sim = 0.85
User2     4      3      4      3      5     sim = 0.70
User3     3      3      1      5      4
User4     1      5      5      2      1     sim = -0.79
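The Pearson measure can be sketched in a few lines of Python. The dictionaries below are the example table's rows; means are taken over the co-rated items only, which reproduces the sim = 0.85 shown for User1 (a minimal illustration, not a production implementation):

```python
import math

def pearson_sim(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated.

    ratings_a, ratings_b: dicts mapping item -> rating.
    """
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0  # not enough overlap to correlate
    mean_a = sum(ratings_a[p] for p in common) / len(common)
    mean_b = sum(ratings_b[p] for p in common) / len(common)
    num = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    den_a = math.sqrt(sum((ratings_a[p] - mean_a) ** 2 for p in common))
    den_b = math.sqrt(sum((ratings_b[p] - mean_b) ** 2 for p in common))
    if den_a == 0 or den_b == 0:
        return 0.0  # one user rated everything the same
    return num / (den_a * den_b)

alice = {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4}
user1 = {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3}
print(round(pearson_sim(alice, user1), 2))  # → 0.85
```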
Pearson correlation
♣ Takes differences in rating behavior into account

[Chart: ratings (0-5) of Alice, User1 and User4 for Item1-Item4, showing similar rating patterns at different absolute levels]

♣ Works well in usual domains, compared with alternative measures such as cosine similarity
Making predictions
♣ A common prediction function:

pred(a, p) = r̄_a + Σ_{b∈N} sim(a, b) · (r_{b,p} - r̄_b) / Σ_{b∈N} |sim(a, b)|

♣ Calculate whether the neighbors' ratings for the unseen item i are higher or lower than their average
♣ Combine the rating differences, using the similarity as a weight
♣ Add/subtract the neighbors' bias from the active user's average and use this as a prediction
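The bullets above correspond to the standard weighted-deviation formula. A minimal sketch, using the similarity values 0.85 and 0.70 from the example table (the neighbor means 2.4 and 3.8 are each neighbor's average over their full rating row):

```python
def predict_rating(user_mean, neighbors, item):
    """pred(a, p) = mean_a + sum(sim * (r_bp - mean_b)) / sum(|sim|).

    neighbors: list of (similarity, neighbor_mean, neighbor_ratings) tuples.
    """
    num = den = 0.0
    for sim, mean_b, ratings_b in neighbors:
        if item in ratings_b:  # only neighbors who rated the item contribute
            num += sim * (ratings_b[item] - mean_b)
            den += abs(sim)
    return user_mean if den == 0 else user_mean + num / den

# Alice's mean is 4.0; User1 (sim 0.85, mean 2.4) and User2 (sim 0.70, mean 3.8)
neighbors = [
    (0.85, 2.4, {"Item5": 3}),
    (0.70, 3.8, {"Item5": 5}),
]
print(round(predict_rating(4.0, neighbors, "Item5"), 2))  # → 4.87
```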
Making recommendations
♣ Making predictions is typically not the ultimate goal
♣ Usual approach (in academia)
- Rank items based on their predicted ratings
♣ However
- This might lead to the inclusion of (only) niche items
- In practice: also take item popularity into account
♣ Approaches
- "Learning to rank": optimize according to a given rank evaluation metric (see later)
Improving the metrics / prediction function
♣ Not all neighbor ratings might be equally "valuable"
- Agreement on commonly liked items is not as informative as agreement on controversial items
- Possible solution: give more weight to items that have a higher variance
♣ Value of the number of co-rated items
- Use "significance weighting", e.g., by linearly reducing the weight when the number of co-rated items is low
♣ Case amplification
- Intuition: give more weight to "very similar" neighbors, i.e., where the similarity value is close to 1
♣ Neighborhood selection
- Use a similarity threshold or a fixed number of neighbors
Memory-based and model-based approaches
♣ User-based CF is said to be "memory-based"
- the rating matrix is directly used to find neighbors / make predictions
- does not scale for most real-world scenarios: large e-commerce sites have tens of millions of customers and millions of items
♣ Model-based approaches
- based on an offline pre-processing or "model-learning" phase
- at run-time, only the learned model is used to make predictions
- models are updated / re-trained periodically
- large variety of techniques used
- model-building and updating can be computationally expensive
Item-based collaborative filtering
♣ Basic idea:
- Use the similarity between items (and not users) to make predictions
♣ Example:
- Look for items that are similar to Item5
- Take Alice's ratings for these items to predict the rating for Item5

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
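A rating for Item5 can then be predicted as a similarity-weighted average of Alice's own ratings for the items most similar to it. A minimal sketch; the similarity values below are made-up placeholders for illustration, not values computed from the table:

```python
def item_based_predict(user_ratings, item_sims, target):
    """Weighted average of the user's ratings for items similar to `target`.

    user_ratings: dict item -> rating for the active user.
    item_sims: dict item -> similarity(item, target), precomputed offline.
    """
    num = sum(item_sims[i] * r for i, r in user_ratings.items() if i in item_sims)
    den = sum(abs(item_sims[i]) for i in user_ratings if i in item_sims)
    return num / den if den else None

alice = {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4}
sims_to_item5 = {"Item1": 0.8, "Item4": 0.6}  # hypothetical similarities
print(round(item_based_predict(alice, sims_to_item5, "Item5"), 2))  # → 4.57
```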
The cosine similarity measure
♣ Produces better results in item-to-item filtering
- for some datasets; no consistent picture in the literature
♣ Ratings are seen as vectors in n-dimensional space
♣ Similarity is calculated based on the angle between the vectors:

sim(a, b) = (a · b) / (|a| · |b|)

♣ Adjusted cosine similarity
- take average user ratings into account, transform the original ratings
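Both variants can be sketched as follows. The user means are taken over each user's full rating row of the example table (Alice is omitted because she has no Item5 rating); the 0.8 result is simply what this toy data yields:

```python
import math

def cosine_sim(v, w):
    """Plain cosine similarity between two rating vectors."""
    dot = sum(x * y for x, y in zip(v, w))
    return dot / (math.sqrt(sum(x * x for x in v)) * math.sqrt(sum(y * y for y in w)))

def adjusted_cosine_sim(col_i, col_j, user_means):
    """Adjusted cosine: subtract each user's mean rating first.

    col_i, col_j: ratings of two items by the same users (aligned lists).
    """
    vi = [r - m for r, m in zip(col_i, user_means)]
    vj = [r - m for r, m in zip(col_j, user_means)]
    return cosine_sim(vi, vj)

# Item1 and Item5 as rated by User1..User4
item1 = [3, 4, 3, 1]
item5 = [3, 5, 4, 1]
means = [2.4, 3.8, 3.2, 2.8]  # each user's mean over all five items
print(round(adjusted_cosine_sim(item1, item5, means), 2))  # → 0.8
```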
Pre-processing for item-based filtering
♣ Item-based filtering does not solve the scalability problem by itself
♣ Pre-processing approach by Amazon.com (in 2003)
- Calculate all pairwise item similarities in advance
- The neighborhood to be used at run-time is typically rather small, because only items that the user has rated are taken into account
- Item similarities are supposed to be more stable than user similarities
♣ Memory requirements
- Up to N² pairwise similarities to be memorized (N = number of items) in theory
- In practice, this is significantly lower (items with no co-ratings)
- Further reductions possible:
♣ minimum threshold for co-ratings (items that are rated by at least n users)
♣ limit the size of the neighborhood (might affect recommendation accuracy)
More on ratings
♣ Pure CF-based systems rely only on the rating matrix
♣ Explicit ratings
- Most commonly used (1 to 5, 1 to 7 Likert response scales)
- Research topics:
♣ "optimal" granularity of scale; indication that a 10-point scale is better accepted in the movie domain
♣ multidimensional ratings (multiple ratings per movie)
- Challenge:
♣ users are not always willing to rate many items; sparse rating matrices
♣ how to stimulate users to rate more items?
♣ Implicit ratings
- clicks, page views, time spent on some page, demo downloads, …
- Can be used in addition to explicit ones; question of correctness of interpretation
Data sparsity problems
♣ Cold start problem
- How to recommend new items? What to recommend to new users?
♣ Straightforward approaches
- Ask/force users to rate a set of items
- Use another method (e.g., content-based, demographic or simply non-personalized) in the initial phase
♣ Alternatives
- Use better algorithms (beyond nearest-neighbor approaches)
- Example:
♣ in nearest-neighbor approaches, the set of sufficiently similar neighbors might be too small to make good predictions
♣ assume "transitivity" of neighborhoods
Example algorithms for sparse datasets
♣ Recursive CF
- Assume there is a very close neighbor n of u who, however, has not rated the target item i yet
- Idea:
♣ apply the CF method recursively and predict a rating for item i for the neighbor
♣ use this predicted rating instead of the rating of a more distant direct neighbor

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      ?      sim = 0.85 with Alice; predict a rating for User1 first
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Graph-based methods
♣ "Spreading activation" (sketch)
- Idea: use paths of lengths > 3 to recommend items
- Length 3: recommend Item3 to User1
- Length 5: Item1 also recommendable
More model-based approaches
♣ Plethora of different techniques proposed in recent years, e.g.,
- Matrix factorization techniques, statistics
♣ singular value decomposition, principal component analysis
- Association rule mining
♣ compare: shopping basket analysis
- Probabilistic models
♣ clustering models, Bayesian networks, probabilistic Latent Semantic Analysis
- Various other machine learning approaches
♣ Costs of pre-processing
- Usually not discussed
- Incremental updates possible?
Matrix factorization
• SVD: M_k = U_k × Σ_k × V_k^T

U_k:         Dim1    Dim2
Alice        0.47   -0.30
Bob         -0.44    0.23
Mary         0.70   -0.06
Sue          0.31    0.93

V_k^T:
Dim1        -0.44  -0.57   0.06   0.38   0.57
Dim2         0.58  -0.66   0.26   0.18  -0.36

Σ_k:         Dim1    Dim2
Dim1         5.63    0
Dim2         0       3.23

• Prediction: r̂_ui = r̄_u + U_k(Alice) × Σ_k × V_k^T(EPL) = 3 + 0.84 = 3.84
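The prediction step can be replayed directly from the factor values shown above. Note two assumptions in this sketch: the "EPL" item is taken to be the fourth column of V_k^T, and the copied factors are rounded to two decimals, so the result matches the slide's 3.84 only approximately:

```python
# Rank-2 factors copied from the slide; the active user's mean rating is 3.
U_alice = [0.47, -0.30]
Sigma = [[5.63, 0.0], [0.0, 3.23]]
V_item = [0.38, 0.18]  # assumed: the V_k^T column for the target item "EPL"

def mf_predict(user_mean, u, sigma, v):
    """r_hat = mean_u + U_k(user) x Sigma_k x V_k^T(item) for k = 2."""
    latent = sum(u[i] * sigma[i][j] * v[j] for i in range(2) for j in range(2))
    return user_mean + latent

print(round(mf_predict(3.0, U_alice, Sigma, V_item), 2))  # ≈ 3.83 (slide: 3.84)
```

The diagonal Σ_k scales each latent dimension by its singular value, so the two dimensions contribute 0.47 · 5.63 · 0.38 and (-0.30) · 3.23 · 0.18 respectively.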
Association rule mining
♣ Commonly used for shopping behavior analysis
- aims at detection of rules such as "If a customer purchases baby-food then he also buys diapers in 70% of the cases"
♣ Association rule mining algorithms
- can detect rules of the form X => Y (e.g., baby-food => diapers) from a set of sales transactions D = {t1, t2, …, tn}
- measures of quality: support, confidence
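Support and confidence for a rule X => Y reduce to two frequency counts. A minimal sketch over toy transactions invented for illustration:

```python
def support_confidence(transactions, X, Y):
    """Support and confidence of the rule X => Y.

    transactions: list of sets of items; X, Y: sets of items.
    support    = P(X and Y appear in a transaction)
    confidence = P(Y appears | X appears)
    """
    n = len(transactions)
    both = sum(1 for t in transactions if X <= t and Y <= t)
    x_only = sum(1 for t in transactions if X <= t)
    support = both / n
    confidence = both / x_only if x_only else 0.0
    return support, confidence

sales = [
    {"baby-food", "diapers", "milk"},
    {"baby-food", "diapers"},
    {"baby-food", "bread"},
    {"bread", "milk"},
]
# support 0.5 (2 of 4 transactions), confidence 2/3 (2 of 3 baby-food baskets)
print(support_confidence(sales, {"baby-food"}, {"diapers"}))
```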
Probabilistic methods
♣ Basic idea (simplistic version for illustration):
- given the user/item rating matrix
- determine the probability that user Alice will like an item i
- base the recommendation on these probabilities
♣ Calculation of rating probabilities based on Bayes' theorem
- How probable is rating value "1" for Item5 given Alice's previous ratings?
- Corresponds to the conditional probability P(Item5=1 | X), where
♣ X = Alice's previous ratings = (Item1 = 1, Item2 = 3, Item3 = … )
- Can be estimated based on Bayes' theorem
♣ Usually more sophisticated methods are used
- clustering
- pLSA, …
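A naive Bayes version of this estimate, matching the slide's "simplistic version for illustration", might look as follows. Assumptions in this sketch: rating values are treated as independent given the target value, Laplace smoothing handles unseen combinations, and the earlier example table's rows serve as the rating database:

```python
def rating_probability(db, target_item, value, evidence):
    """P(target_item = value | evidence) via a naive Bayes estimate.

    db: list of per-user rating dicts (item -> rating).
    evidence: the active user's known ratings.
    """
    values = sorted({row[target_item] for row in db if target_item in row})
    scores = {}
    for v in values:
        rows_v = [row for row in db if row.get(target_item) == v]
        prior = len(rows_v) / len(db)
        likelihood = 1.0
        for item, r in evidence.items():
            hits = sum(1 for row in rows_v if row.get(item) == r)
            likelihood *= (hits + 1) / (len(rows_v) + 2)  # Laplace smoothing
        scores[v] = prior * likelihood
    return scores[value] / sum(scores.values())  # normalize over rating values

db = [
    {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
]
alice = {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4}
print(round(rating_probability(db, "Item5", 5, alice), 2))  # → 0.5
```

Rating 5 comes out most probable because the user who gave it (row two) agrees with Alice on the most items.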
Summarizing recent methods
♣ Recommendation is concerned with learning from noisy observations (x, y), where a function f with f(x) = ŷ has to be determined such that Σ (ŷ - y)² is minimal
♣ A variety of different learning strategies have been applied trying to estimate f(x)
- Non-parametric neighborhood models
- MF models, SVMs, Neural Networks, Bayesian Networks, …
Collaborative Filtering Issues
♣ Pros:
- well-understood, works well in some domains, no knowledge engineering required
♣ Cons:
- requires a user community, sparsity problems, no integration of other knowledge sources, no explanation of results
♣ What is the best CF method?
- In which situation and which domain? Inconsistent findings; always the same domains and data sets; differences between methods are often very small (1/100)
♣ How to evaluate the prediction quality?
- MAE / RMSE: What does an MAE of 0.7 actually mean?
- Serendipity: not yet fully understood
♣ What about multi-dimensional ratings?
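For reference, MAE and RMSE reduce to one line each (the predicted/actual ratings below are toy values, not data from the deck):

```python
import math

def mae(predicted, actual):
    """Mean absolute error of the predicted ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; penalizes large errors more strongly."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

predicted = [4.87, 3.2, 4.5]
actual = [5, 3, 4]
print(round(mae(predicted, actual), 2), round(rmse(predicted, actual), 2))  # → 0.28 0.32
```

The slide's question stands: both numbers depend on the rating scale, so an MAE of 0.7 on a 1-5 scale says little by itself.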
Recommender Systems in e-Commerce
♣ One Recommender Systems research question
- What should be in that list?
♣ Another question, both in research and practice
- How do we know that these are good recommendations?
("These have been in stock for quite a while now …")
♣ This might lead to …
- What is a good recommendation?
- What is a good recommendation strategy?
- What is a good recommendation strategy for my business?
What is a good recommendation? What are the measures in practice?
♣ Total sales numbers
♣ Promotion of certain items
♣ …
♣ Click-through rates
♣ Interactivity on platform
♣ …
♣ Customer return rates
♣ Customer satisfaction and loyalty
You have Questions and we have Answers