8/7/2019 Survey Recomender System Algorithm
1/33
Survey of Recommendation
Systems and Algorithms
Term Paper for
EE 380L: DATA MINING
Spring 2000
By
Yuan Qu
Xiaoyun Yang
Tianping Huang
May 5, 2000
Table of Contents
I. Introduction
II. Recommendation Systems
III. Algorithms
IV. Discussion
V. Reference
I. Introduction
In daily life, most of our choices rely on recommendations from other people, whether
by word of mouth, recommendation letters, movie and book reviews printed in
newspapers, or general surveys. In this information age, a huge volume of news is
published on the Internet every day. This creates a clear demand for automated
methods that locate and retrieve information according to each user's individual
interests. The growing number of people accessing the Internet also provides new
possibilities for organizing and recommending information.
Recommendation systems can assist and augment this natural social process.
These systems recommend what you may want now based on what you wanted in the past.
Their main purpose is to provide tools that let people leverage the information
hunting and gathering activities of other people or groups of people. Recommendation
systems have become an important application area and the focus of considerable
recent academic and commercial interest.
Recommendation systems fall into two basic categories. One is content-based
filtering; the other is collaborative filtering (or social filtering). In a
content-based filtering system, each user is assumed to operate independently. As a
result, document representations in content-based filtering systems can exploit only
information that can be derived from document contents. In a collaborative filtering
system, the representation of a document is based on evaluations of that document
made by its prior readers. The idea is that communities of shared interest can be
identified automatically by exchanging this sort of information. In practice, a
collaborative filtering system provides a basis for selecting information items
regardless of whether their content can be represented in a way that is useful for
selection. This paper focuses on collaborative filtering.
Collaborative filtering was introduced by the developers of the first
recommendation system, Tapestry, in 1992 [Goldberg, et al. 1992]. Within several
years the concept had been applied in dozens of publicly available systems, several
proprietary systems, and even some commercially available systems. In 1996, dozens
of researchers from academia and business gathered at UC Berkeley to share their
ideas and experiences about these emerging filtering methods [Collaborative
Filtering workshop, 1996]. They presented the vision and definition of collaborative
filtering and described some applications of the technique. Since then, a growing
number of published articles have demonstrated applications of collaborative
filtering methods.
This paper first surveys the recommendation systems available on the Internet and
describes the characteristics of each. It then introduces the algorithms of some
well-known recommendation systems in detail.
II. Recommendation Systems
Many recommendation systems are available on web sites. According to the purpose
of their application, recommendation systems can be classified into three
categories [Resnick, 1997], shown in Figure 1.
Figure 1. The recommendation systems categories
The systems in the first category recommend movies, music, videos, or other
services. In this category the database is relatively stable, much like a population
database, and may not change for years. Typical systems include EachMovie, Firefly,
and Morse. The second category covers news or articles in a newsgroup. The users in
a newsgroup generally have similar goals or interests. The database is also
relatively stable, though it may be updated every few weeks or sooner.
Representatives of these systems are Tapestry, GroupLens, and Lotus Notes. The last
category is for web page recommendation. The information in this category is
dynamic; that is, pages can be added to or deleted from the system at any time. At
the same time, the users may have different tastes. PHOAKS, GAB, and Fab are the
most useful systems of this kind.
A brief introduction to each recommendation system follows:
[Figure 1: tree diagram. Root: recommendation systems. Branches: movies (EachMovie, Morse, Firefly, ...); news or articles (Tapestry, GroupLens, Lotus Notes, ...); web pages (Phoaks, GAB, Fab, ...).]
Do-I-Care
When a user revisits a favorite Web page, the Do-I-Care system [Turnbull, 1998;
Collaborative Filtering workshop, 1996] alerts the user if the page has changed.
The system uses a model-based algorithm built on a Bayesian classifier. After some
users have trained the model many times, other users can get good predictions.
According to a report by Mark Ackerman (U. of California-Irvine)
[Collaborative Filtering workshop, 1996], the accuracy of Do-I-Care can reach 70-90%.
The accuracy is said to reach 100% in an application that tracks airline fare
sales.
Fab
In a collaborative filtering system, when a new item or a new user enters the
system, the system has no basis for calculating the similarity between users and no
way to consider the new item until some users have rated or recommended it. This is
called the cold-start problem. Content-based filtering does not have this problem.
To avoid it, the Fab recommendation system [Turnbull, 1998] combines collaborative
and content-based filtering.
The Fab system is a web-based recommendation service that incorporates both
collaborative and content-based filtering methods. User profiles are constructed as
collections of keywords contained in the documents that each user rates highly.
Documents are presented for rating when either the content of the document matches
previous documents that were rated highly, or neighboring users rate the document
highly. Every time a favorable or unfavorable rating is received, the profile of the
user is updated to reflect the new rating.
Collection agents are sent out over the web to look for documents with specific
content, each agent using a different set of keywords. The retrieved documents are
passed to a central server, where a selection agent matched to each user's profile
scours the documents looking for interesting material. Relevant documents are then
presented to the user for rating. This rating dynamically affects the selection
agent's behavior and changes the user's profile. The rating also affects the
collection agent that retrieved the document. Unpopular collection agents are
removed and replaced with more successful ones over time.
The Fab system combines the best features of both content-based and
collaborative filtering methods and also manages to keep the system dynamically
updated to the current users' tastes. One potential shortcoming is Fab's reliance on
explicit user feedback.
Firefly
The Firefly system [Turnbull, 1997 and 1998] provides recommendations based on
similarities between users. It was originally used for music and movie
recommendation and has since been extended to other media, such as newsgroups,
books, and web pages.
The system takes user profiles as input and uses a constrained Pearson algorithm
to make the best predictions between users. The basic idea of the algorithm is: a)
the system maintains a user profile, which records likes and dislikes of specific
items; b) the system compares the similarities of users and decides which group of
users the active user belongs to; and c) the system gives recommendations based on
the profiles of the similar users.
GAB
GAB [Wittenburg, et al., 1998] stands for group asynchronous browsing. The idea
of the GAB system is to collect and merge the bookmark and hotlist files of users
and then serve the merged files back to users. This means the system must be able
to access users' bookmarks and extract information from them, which raises privacy
concerns. To address the privacy problem, the system provides a mechanism that lets
a user mark his/her bookmarks as private or public.
The system uses a multi-tree data structure for the bookmarks. To keep users from
getting lost in hyperspace and to increase the connectivity of the merged
subject-tree database, the system defines sibling and cousin relations. Items A and
B are siblings if they belong to the same specific subject; they are cousins if they
belong to the same broad subject but not the same specific subject. The system has
also been applied to monitoring changes in the content of web pages.
Grassroots
The Grassroots system [Turnbull, 1998] is described as "A System Providing A
Uniform Framework for Communicating, Structuring, Sharing Information, and
Organizing People."
The system provides a special Web-page interface for accessing all of the
information it works with. In practice, Grassroots also lets participants continue
using other mechanisms and takes as much advantage of them as possible. The main
engine of the Grassroots system is a Web server and proxy server setup that can be
used with any Web browser.
GroupLens
Resnick [Resnick, et al. 1994] presented the GroupLens system, which is built on
a simple premise: "the heuristic that people who agreed in the past will probably
agree again". This system also uses a Pearson algorithm to compute predictions. In
its early stage, the system used explicit votes (on a 1 to 5 scale, where 1 stands
for "dislike it" and 5 for "like it"). The updated version also uses implicit
methods to get feedback from the user, such as monitoring reading time. The most
notable characteristics of the system are its openness and scalability.
Openness means that the system gives other researchers access to create clients
that work with the system's servers, or even to change those servers if they have
improvements. Scalability means that as the number of users increases, the system
can still provide accurate predictions, although the database and the calculation
time become very large.
Letizia & Let's Browse
Let's Browse and its predecessor, Letizia [Lieberman, 1996; Pryor, 1998], are web
agents that assist a user during his/her browsing experience. By monitoring a
user's behavior, such as browsing time on a web page, the Letizia system learns the
user's interests and provides recommendations. Let's Browse, an improvement on
Letizia, provides recommendations using the profiles of a group instead of a single
profile. If multiple users are reading the same page at the same time, Let's Browse
can determine which users are near the monitor and use their profiles to recommend
sites for the entire group.
Lotus Notes
Lotus Notes [Turnbull, 1998] is a system that serves as a foundation for
collaborative filtering techniques. The system serves newsgroups. All Notes users
should have similar goals or information interests because they are working in the
same group.
Lotus provides a feature that lets people annotate documents. After annotation,
the user can send or distribute these links or comments to others. To protect
users' privacy, the system uses an agent to represent each individual. These agents
extract significant phrases from the documents that the user reads and then
exchange the learning results anonymously.
Mosaic
The Mosaic system [Turnbull, 1997] was the first Web tool that facilitated
collaboration. As in the Pointers recommendation system, Mosaic users can publish
and distribute bookmarks and add comments to web pages. This simple feature enabled
users to actively share information with others.
PHOAKS
Terveen [Terveen et al., 1997] first introduced the PHOAKS (People Helping One
Another Know Stuff) system, which recommends URLs that are likely to interest
users. The system automatically recognizes web resource references in newsgroup
messages, attempts to classify them, and introduces them to other users. In other
words, the system scans the group's messages and extracts the most important URLs
in these messages. After sorting these links, the system recommends the URLs to
users. The system uses implicit feedback and also takes role specialization into
account.
Pointers
This system [Maltz, 1995] is implemented inside the Lotus Notes environment. If
one person is an expert in an area, the other users in the group would like to see
his/her recommendations. The system therefore provides a mechanism that lets the
information mediators in a workgroup easily distribute references to and commentary
on documents they find. This mechanism is realized through pointers. A pointer
consists of a URL link, contextual information, and optional comments by the
sender. The system is very easy to use but not anonymous.
Siteseer
Siteseer [Turnbull, 1997] is a collaborative system using web browser bookmarks
to find neighbors and recommend sites. Users with significant overlap in bookmark
listings are determined to be close to one another, allowing previously unvisited sites to
be recommended to one another.
Tapestry
This is the first collaborative recommendation system [Goldberg, 1992]. It uses
free-text annotations or explicit "like it" or "hate it" annotations. The system is
designed for newsgroups, so it is not easy for a group to explore new areas.
Yahoo!
Turnbull [Turnbull, 1998] considered Yahoo! a recommendation system that realizes
collaborative filtering manually. An expert updates the Yahoo! index as quickly as
possible, which means that every site is examined by a person when it is added. The
system also allows web users to submit pages. Because of its openness, the form of
the Yahoo! index has become very popular and has become a classification standard.
WebWatcher
The WebWatcher system [Joachims, 1996] acts like a tour guide in a museum. It
provides interactive communication between the server and users and makes
recommendations. A user who enters the system can state his/her interest by typing
it in, and the system then recommends related web sites. This is not the same as a
keyword-based search engine: WebWatcher uses the user's profile and other users'
previous tours to calculate the similarities between users and predict the user's
interest. The system also uses the users' experience to reinforce its learning.
III. Algorithms on Collaborative Filtering
Today recommendation systems are used in many fields; virtually every topic of
potential interest to users is covered by special-purpose recommendation systems:
Web pages, news stories, emails, movies, music videos, books, CDs, restaurants, and
many more. These recommendation systems predict a user's interests and preferences
based on all users' profiles, using information retrieval techniques. The
underlying techniques used in today's recommendation systems fall into two distinct
categories: content-based filtering and collaborative filtering. Content-based
filtering uses actual content features of items, while collaborative filtering
predicts a new user's preferences from other users' ratings, assuming that
like-minded people tend to make similar choices. Here we concentrate on the
algorithms used in collaborative filtering.
Collaborative filtering, or recommender, systems predict additional topics or
products a new user might like, based on a database of user preferences. Many
collaborative filtering algorithms have been proposed. Breese et al. [Breese, et
al., 1998] classified these algorithms into two categories: memory-based algorithms
and model-based algorithms. Following their classification, we have collected and
classified the collaborative filtering algorithms available so far.
Memory-based Algorithms
These algorithms are called memory-based because they operate over the entire
user database to make predictions. They all try to find the similarity or
correlation between the new (active) user and the other users in the database. Each
user's preferences are represented by his or her votes (explicit or implicit) on
products, which could be anything related to the user's interests. The new user has
an average vote over the products he/she has rated. The predicted vote of the new
user on another product is then calculated by adding to this average a weighted sum
of the other users' votes. The weights are determined by the similarity between the
new user and each other user: the more similar two users are, the larger the weight
and the greater the contribution to the sum. Let $I_i$ be the set of items on which
user $i$ has voted and $v_{i,j}$ the vote of user $i$ on product $j$. The average
vote of user $i$ is:
$$\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}$$
The predicted vote of the new (active) user $a$ on product $j$ is:
$$p_{a,j} = \bar{v}_a + k \sum_{i=1}^{n} w(a,i)\,(v_{i,j} - \bar{v}_i)$$
where $k$ is a normalizing factor and $w(a,i)$ is the weight that user $i$
contributes to the active user's prediction.
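The prediction rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: votes are stored as dictionaries mapping item to vote, the weight function is passed in as a parameter, and the normalizing factor k is taken to be one over the sum of absolute weights (a common choice, but an assumption here). The names `mean_vote` and `predict_vote` are hypothetical.

```python
def mean_vote(votes):
    """Average of a user's recorded votes; `votes` maps item -> vote."""
    return sum(votes.values()) / len(votes)


def predict_vote(active, others, item, weight):
    """Mean vote of the active user plus a normalized weighted sum of the
    other users' deviations from their own mean votes on `item`."""
    total, norm = 0.0, 0.0
    for votes in others:
        if item not in votes:
            continue  # users who have not voted on the item contribute nothing
        w = weight(active, votes)
        total += w * (votes[item] - mean_vote(votes))
        norm += abs(w)
    if norm == 0.0:
        return mean_vote(active)  # no usable neighbors: fall back to the mean
    return mean_vote(active) + total / norm  # assumption: k = 1 / sum(|w|)


active = {"a": 4, "b": 2}
others = [{"a": 5, "b": 1, "c": 5}]
prediction = predict_vote(active, others, "c", lambda a, b: 1.0)
```

Any of the weight definitions that follow can be supplied as the `weight` argument.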
The weights are calculated by comparing the votes on the set of products that
both the active user and the other user have rated. Three major methods of defining
the weights are collected here.
Mean Squared Differences:
This method defines the weight as the inverse of the mean squared difference
between the two users' votes:
$$w(a,i) = \frac{1}{\overline{(v_{a,j} - v_{i,j})^2}}$$
where the bar denotes the mean over the products $j$ that both users have rated.
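As a rough sketch, the mean squared difference weight can be computed over the commonly rated items as follows. The dictionary representation, the function name `msd_weight`, and the handling of the two edge cases (no common items, identical votes) are assumptions made for illustration.

```python
def msd_weight(a, b):
    """Inverse of the mean squared difference over items both users rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0  # assumption: no overlap means no evidence of similarity
    msd = sum((a[j] - b[j]) ** 2 for j in common) / len(common)
    # Identical votes give a zero distance; callers may prefer to cap this.
    return 1.0 / msd if msd > 0 else float("inf")


a = {"x": 5, "y": 1}
b = {"x": 3, "y": 2}
w = msd_weight(a, b)  # mean squared difference is (4 + 1) / 2 = 2.5
```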
Pearson Correlation:
$$w(a,i) = \frac{\sum_j (v_{a,j} - \bar{v}_a)(v_{i,j} - \bar{v}_i)}{\sqrt{\sum_j (v_{a,j} - \bar{v}_a)^2 \sum_j (v_{i,j} - \bar{v}_i)^2}}$$
where the sums run over the products $j$ that both users have rated.
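A minimal Python sketch of the Pearson correlation weight follows. It assumes votes are stored as item-to-vote dictionaries, takes each user's mean over all of that user's votes, and sums only over commonly rated items; returning 0 when the denominator vanishes is an illustrative choice, and `pearson_weight` is a hypothetical name.

```python
import math


def pearson_weight(a, b):
    """Pearson correlation between two users over their common items."""
    common = a.keys() & b.keys()
    mean_a = sum(a.values()) / len(a)
    mean_b = sum(b.values()) / len(b)
    num = sum((a[j] - mean_a) * (b[j] - mean_b) for j in common)
    den = math.sqrt(
        sum((a[j] - mean_a) ** 2 for j in common)
        * sum((b[j] - mean_b) ** 2 for j in common)
    )
    return num / den if den else 0.0  # assumption: undefined correlation -> 0


u1 = {"x": 1, "y": 2, "z": 3}
u2 = {"x": 2, "y": 4, "z": 6}  # same ordering of preferences as u1
u3 = {"x": 3, "y": 2, "z": 1}  # opposite ordering
```

Here `pearson_weight(u1, u2)` is close to 1 and `pearson_weight(u1, u3)` is close to -1, matching the intuition that the weight measures agreement in relative preference rather than in absolute vote values.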
Vector Similarity:
This method defines the weight as the cosine of the angle between the active
user's vote vector and the other user's vote vector.
$$w(a,i) = \sum_j \frac{v_{a,j}}{\sqrt{\sum_{k \in I_a} v_{a,k}^2}} \cdot \frac{v_{i,j}}{\sqrt{\sum_{k \in I_i} v_{i,k}^2}}$$
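The vector similarity weight is the cosine between the two vote vectors, which can be sketched as below. The dictionary representation and the name `vector_similarity` are assumptions; each norm is taken over all items that user has voted on, and the dot product over the items both have voted on.

```python
import math


def vector_similarity(a, b):
    """Cosine of the angle between two users' vote vectors."""
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    dot = sum(a[j] * b[j] for j in a.keys() & b.keys())
    return dot / (norm_a * norm_b)


p = {"x": 3, "y": 4}
q = {"x": 4, "z": 3}
sim = vector_similarity(p, q)  # dot = 12, norms = 5 and 5
```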
Improvement on Memory-based Algorithms
To improve the performance of the standard memory-based algorithms, several
modifications have been proposed.
Default Voting:
book1 book2 book3 book4 book5 book6
user 1 5 1
user 2 3 1 5
user 3 3 5 4
user 4 4 2 ?
Usually we are dealing with very sparse databases in which users have not voted
(explicitly or implicitly) on many products. Standard memory-based algorithms use
only the entries at the intersection of two users' voted sets. In the example above,
to calculate the weight that user 1 contributes, we can use only the ratings for
book1. To deal with this problem, default votes are introduced: in most cases a
neutral or somewhat negative preference is assigned to the unobserved products, so
that the union of the voted sets can be used in the weight calculation instead of
the intersection. This method does not necessarily improve the performance of
memory-based algorithms, however, because an unobserved product is not necessarily
a less interesting one.
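Default voting can be sketched as a preprocessing step that fills both users' vote sets out to their union before a weight is computed. The default value of 2 and the example votes below are hypothetical, chosen only for illustration, as is the name `with_default_votes`.

```python
def with_default_votes(a, b, default=2):
    """Extend two users' votes to the union of their voted items,
    assigning `default` wherever a user has no recorded vote."""
    items = a.keys() | b.keys()
    filled_a = {j: a.get(j, default) for j in items}
    filled_b = {j: b.get(j, default) for j in items}
    return filled_a, filled_b


active = {"book1": 4, "book3": 2}
other = {"book1": 5, "book2": 1}
fa, fb = with_default_votes(active, other)
# Both dictionaries now cover book1, book2, and book3.
```

The filled dictionaries can then be passed to any of the weight definitions, which will see three common items instead of one.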
Inverse User Frequency:
The idea of inverse user frequency is that universally liked products are less
useful than less common products for capturing the similarity between users. The
weight is therefore modified by introducing a factor $f_j$, defined as:
$$f_j = \log \frac{n}{n_j}$$
where $n$ is the total number of users and $n_j$ is the number of users who have
voted on product $j$. The resulting correlation weight is
$$w(a,i) = \frac{\sum_j f_j \sum_j f_j v_{a,j} v_{i,j} - \left(\sum_j f_j v_{a,j}\right)\left(\sum_j f_j v_{i,j}\right)}{\sqrt{UV}}$$
where
$$U = \sum_j f_j \left( \sum_j f_j v_{a,j}^2 - \Big( \sum_j f_j v_{a,j} \Big)^2 \right)$$
$$V = \sum_j f_j \left( \sum_j f_j v_{i,j}^2 - \Big( \sum_j f_j v_{i,j} \Big)^2 \right)$$
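The inverse user frequency factor itself is straightforward to compute. The sketch below assumes each user's votes are an item-to-vote dictionary and uses the natural logarithm; the function name and the toy data are hypothetical. It covers only the factor f_j, not the full weighted correlation.

```python
import math


def inverse_user_frequency(users):
    """Return f_j = log(n / n_j) for every item that received a vote."""
    n = len(users)
    counts = {}
    for votes in users:
        for j in votes:
            counts[j] = counts.get(j, 0) + 1
    return {j: math.log(n / n_j) for j, n_j in counts.items()}


users = [{"a": 5, "b": 3}, {"a": 4}, {"a": 2, "c": 1}]
f = inverse_user_frequency(users)
```

Item "a" is voted on by every user, so its factor is log(1) = 0 and it no longer contributes to the similarity, while the rarely voted items "b" and "c" receive the largest factor.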
Case Amplification:
Case amplification emphasizes the contribution of the most similar users to the
prediction by amplifying the weights close to 1. The new weights are calculated as
below:
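$$w'_{a,i} = \begin{cases} w_{a,i}^{\rho} & \text{if } w_{a,i} \ge 0 \\ -(-w_{a,i})^{\rho} & \text{if } w_{a,i} < 0 \end{cases}$$
where $\rho$ is the amplification power.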
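Case amplification, in the form given by Breese et al., raises the magnitude of each weight to a power while keeping its sign. A minimal sketch follows; the parameter name `rho` and its default of 2.5 are assumptions for illustration.

```python
def amplify(w, rho=2.5):
    """Raise the magnitude of a weight to the power rho, preserving sign."""
    return w ** rho if w >= 0 else -((-w) ** rho)


strong = amplify(0.9, 2.0)   # stays relatively large
weak = amplify(0.5, 2.0)     # shrinks much more in relative terms
negative = amplify(-0.5, 2.0)
```

With a power above 1, weights near 1 are reduced only slightly while small weights shrink sharply, so the most similar users dominate the prediction.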
categories. In the same example below, the original 4-by-6 matrix is reduced to
4 by 3, and users have more votes in common.
book1 book2 book3 book4 book5 book6
user 1 5 1
user 2 3 1 5
user 3 3 5 4
user 4 4 2 ?
category1 category2 category3
The new votes of users on categories are calculated as:
$$v_{i,c} = \overline{v_{i,j}}, \quad j \in c$$
Each entry of the new matrix is the average of a given user's votes over the
products in that category.
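The averaging of product votes into category votes can be sketched as follows. The mapping of books to categories (two books per category) and the example votes are assumptions made for illustration, as is the name `category_votes`.

```python
def category_votes(votes, category_of):
    """Average a user's product votes within each category."""
    sums, counts = {}, {}
    for item, vote in votes.items():
        c = category_of[item]
        sums[c] = sums.get(c, 0) + vote
        counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


# Hypothetical assignment: two books per category.
category_of = {
    "book1": "category1", "book2": "category1",
    "book3": "category2", "book4": "category2",
    "book5": "category3", "book6": "category3",
}
reduced = category_votes({"book1": 4, "book2": 2, "book5": 3}, category_of)
```

A category in which the user has no votes simply stays missing in the reduced matrix.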
The categories may be predefined or unknown. To deal with unknown categories,
the EM algorithm can be used.
This method can be combined with all the other algorithms (including the
model-based algorithms). It is placed here because the original author uses it
together with the correlation algorithm.
Model-based Algorithms
Model-based algorithms first build a descriptive model from the users'
preferences; recommendations are then predicted from the model. From a probabilistic
perspective, collaborative filtering can be viewed as calculating the expected value
of a vote, given the user's profile or previous votes. With votes taking integer
values from 0 to $m$:
$$p_{a,j} = E(v_{a,j}) = \sum_{i=0}^{m} \Pr(v_{a,j} = i \mid v_{a,k}, k \in I_a)\, i$$
Cluster Models:
Based on the idea that there are certain groups or types of users that capture a
common set of preferences and tastes, Breese et al. proposed a cluster method in
which like-minded users are classified into the same group. Given a user's class
membership, the user's votes are assumed to be independent, so the joint probability
of class and votes can be calculated with the naive Bayes formulation:
$$\Pr(C = c, v_1, \ldots, v_n) = \Pr(C = c) \prod_{i=1}^{n} \Pr(v_i \mid C = c)$$
Once we know the probability of observing an individual of a particular class with a
particular set of votes, the expectation of a future vote can easily be calculated.
Since the classes and the number of classes are unknown, the EM algorithm is used to
find the model structure with maximum likelihood.
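Once the class priors and the per-class vote probabilities are known, the expected vote follows from the naive Bayes factorization. The sketch below assumes those parameters are already given (the EM fitting step is not shown), and all numbers, class names, and function names are hypothetical.

```python
def class_posterior(observed, prior, cond):
    """Posterior over classes given observed votes, using the naive Bayes
    factorization: prior[c] times the product of cond[c][item][vote]."""
    joint = {}
    for c, p in prior.items():
        for item, vote in observed.items():
            p *= cond[c][item][vote]
        joint[c] = p
    z = sum(joint.values())
    return {c: p / z for c, p in joint.items()}


def expected_vote(observed, item, prior, cond):
    """Expected vote on `item`, averaging over the class posterior."""
    post = class_posterior(observed, prior, cond)
    return sum(
        post[c] * sum(v * p for v, p in cond[c][item].items()) for c in post
    )


# Hypothetical two-class model with binary votes on items x and y.
prior = {"A": 0.5, "B": 0.5}
cond = {
    "A": {"x": {0: 0.1, 1: 0.9}, "y": {0: 0.2, 1: 0.8}},
    "B": {"x": {0: 0.9, 1: 0.1}, "y": {0: 0.7, 1: 0.3}},
}
ev = expected_vote({"x": 1}, "y", prior, cond)
```

A user who voted 1 on x is most likely in class A (posterior 0.9), so the expected vote on y is pulled toward class A's preference.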
Ungar [Ungar, et al., 1998] proposed a new clustering method. Unlike the standard
cluster models, it assumes that people come from classes (e.g., intellectual or fun)
and that products also come from classes. Here is an example from their paper:
Batman Rambo Andre Hiver Whispers Star Wars
Lyle y y
Ellen y y y
Jason y y
Fred y y
Dean y y y
In this movie database example, people can be classified as intellectual or fun,
and movies belong to three categories: action, foreign, and classic. A y in the
table means that the person likes the movie. For each person/movie class pair, the
probability that there is a y in the table is
             action  foreign  classic
intellectual 0/6     5/9      2/3
fun          3/4     0/6      2/2
Based on this observation, they establish a model that contains three sets of
parameters: $P_k$ (the probability that a random person is in class $k$), $P_l$ (the
probability that a random movie is in class $l$), and $P_{kl}$ (the probability that
a person in class $k$ is linked to a movie in class $l$).
Here, the class assignments are unknown. They tried repeated clustering and Gibbs
sampling. In the repeated clustering method, people are first clustered based on
movies and movies based on people; on the second and later passes, people are
clustered based on movie clusters and movies based on people clusters. For the
clustering they use k-means instead of the EM algorithm because of the constraint
that a person is always in the same class and a movie is always in the same class.
They reported that the Gibbs sampling method outperforms repeated clustering.
Bayesian Network Models:
An alternative model formulation for probabilistic collaborative filtering is a
Bayesian belief network with a node corresponding to each product in the database.
Missing data can be represented by a "no vote" value. After a learning algorithm is
applied to train the belief network, each item in the resulting network has a set of
parent items that are the best predictors of its votes. A decision tree can be used
to represent each conditional probability table.
Neural Network Models:
As with the Bayesian network models, collaborative filtering can be seen as a
classification task. Based on a set of ratings from users for products, we can
induce a model for each user that allows us to classify unseen products into two or
more classes. Missing data can be indicated by a "no vote" state. Here is an example
given in the paper by Billsus and Pazzani [Billsus, D. and Pazzani, M., 1998].
I1 I2 I3 I4 I5
U1 4 3
U2 1 2
U3 3 4 2 4
U4 4 2 1 ?
where Ui is the ith user and Ii is the ith item. Users rate the items from 1 to 4,
with 4 the highest rating. Since in the end only items the active user would like
are recommended, they transform the rating matrix by replacing each rating greater
than 2 with 1 and every other rating with 0. To represent the "no vote" value, they
further split every user into two features (like and dislike).
E1 E2 E3
U1 like 1 0 1
U1 dislike 0 0 0
U2 like 0 0 0
U2 dislike 0 1 0
U3 like 1 1 0
U3 dislike 0 0 1
Class like dislike dislike
Here U4's ratings for I1, I2, and I3 are the class labels. After converting a data
set of user ratings for items into this format, we can apply virtually any
supervised learning algorithm.
Other Algorithms
A hybrid memory- and model-based approach:
Pennock [Pennock, David M. and Horvitz, Eric 1999] proposed a CF method called
personality diagnosis (PD), which can be seen as a hybrid between the memory- and
model-based approaches. All data is maintained throughout the process, new data can
be added incrementally, and predictions have meaningful probabilistic semantics.
In this algorithm, each user's preferences are interpreted as a manifestation of
an underlying personality type. Because users' votes are affected by environmental
factors, such as previous users' votes and the user's current mood, they assume that
all users report their ratings with Gaussian noise. If we define user $i$'s
personality type as a vector of true ratings $V_i^{true}$, then user $i$'s actual
rating is drawn from an independent normal distribution,
$$\Pr(v_{i,j} = x \mid v_{i,j}^{true} = y) \propto e^{-(x-y)^2 / 2\sigma^2}$$
where $\sigma$ is a free parameter.
They further assume that the distribution of voting vectors in the database is
representative of the distribution in the target population of users, so that
$$\Pr(V_a^{true} = V_i) = \frac{1}{n}$$
where $n$ is the total number of users in the database. The probability that the
active user has the same personality type as any other user can then be calculated
by applying Bayes' rule:
$$\Pr(V_a^{true} = V_i \mid v_{a,1} = x_1, \ldots, v_{a,m} = x_m) \propto \Pr(v_{a,1} = x_1 \mid v_{a,1}^{true} = v_{i,1}) \cdots \Pr(v_{a,m} = x_m \mid v_{a,m}^{true} = v_{i,m}) \cdot \Pr(V_a^{true} = V_i)$$
The probability distribution of the active user's vote on an unseen product $j$ is
then
$$\Pr(v_{a,j} = x_j \mid v_{a,1} = x_1, \ldots, v_{a,m} = x_m) = \sum_{i=1}^{n} \Pr(v_{a,j} = x_j \mid V_a^{true} = V_i) \cdot \Pr(V_a^{true} = V_i \mid v_{a,1} = x_1, \ldots, v_{a,m} = x_m)$$
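The personality diagnosis computation can be sketched in Python as below. Several choices here are assumptions made for illustration rather than details from the source: votes are stored as dictionaries, only items rated by both users enter the likelihood product, users who have not rated the target item are skipped, and the uniform 1/n prior is dropped because it cancels when the result is normalized. The name `pd_predict` is hypothetical.

```python
import math


def pd_predict(active, others, item, values, sigma=1.0):
    """Distribution over the active user's vote on `item`."""

    def likelihood(x, y):
        # Unnormalized Gaussian noise model for reporting x when the true vote is y.
        return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

    scores = {x: 0.0 for x in values}
    for votes in others:
        if item not in votes:
            continue
        # Likelihood that the active user shares this user's personality type.
        p_same = 1.0
        for j, x in active.items():
            if j in votes:
                p_same *= likelihood(x, votes[j])
        for x in values:
            scores[x] += likelihood(x, votes[item]) * p_same
    total = sum(scores.values())
    if total == 0:
        return {}  # nobody in the database rated the item
    return {x: s / total for x, s in scores.items()}


active = {"a": 5}
others = [{"a": 5, "b": 1}, {"a": 1, "b": 5}]
dist = pd_predict(active, others, "b", values=[1, 2, 3, 4, 5])
best = max(dist, key=dist.get)
```

The active user agrees with the first user on item "a", so the predicted distribution for "b" concentrates near that user's vote of 1.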
Improvements:
We have now seen the memory-based and model-based collaborative filtering
methods. Both have their advantages and drawbacks. Memory-based methods are simple
and easy to implement, but they can be time- and space-consuming. At least two
problems are hard for memory-based methods to handle:
1) Missing data: To find the similarity between users, the difference (distance)
between users has to be computed. If data are missing, either only the products on
which all users voted are used, or a vote is assigned to the missing data. The first
approach has problems with sparse databases. In the second, assigning average or
somewhat negative votes to the missing data may obscure the similarity between
users.
2) Memory-based methods cannot handle the situation in which two users are very
similar but have not rated the same set of products. For example,
product1 product2 product3 product4 product5 product6
user1 1 0 1 1 1
user2 0 1 1 1 1
user3 1 ?
User1 and user2 are very similar in this example; however, when we use
memory-based methods to predict user3's preference for product6, only user1's votes
can be used.
Among model-based methods, clustering can handle missing data to some extent by
grouping products into fewer categories; the new votes for categories are averaged
over the available votes for the products in each category. But clustering methods
may over-generalize and hurt performance. Bayesian network and neural network models
can handle missing data and problem (2) above reasonably well. For large databases
containing many users, however, we end up with thousands of features while the
amount of training data is very limited, and those models become impractical.
Recently, a promising algorithm has been proposed. The idea is that users rate
products based on the latent features of the products. All products in the database
share a set of common features, and users rate products highly because they rate
those features highly. By factoring people's ratings into features using linear
algebra, we can predict how users will react to documents they have not seen before,
based on their preferences for these features. Singular Value Decomposition (SVD)
allows us to break data sets down into these components and analyze the principal
components of the data. We will see below how SVD can be used to capture the hidden
features and reduce the dimension of the database.
Singular Value Decomposition:
The user rating vectors can be represented by an $m \times n$ matrix $A$, with $m$
users and $n$ products,
$$A = [a_{i,j}]$$
where $a_{i,j}$ is the rating of user $i$ for product $j$. Through singular value
decomposition, $A$ can be factored into $USV^T$, where $U$ and $V$ are orthogonal
matrices and $S$ is zero except for its diagonal entries, which are the singular
values of $A$. $U$ represents the response of each user to certain features. $V$
represents the amount of each feature present in each product. $S$ relates to the
importance of each feature in the overall determination of the rating. Here is an
example given by Pryor [Pryor, H. Michael, 1998] in his report. Suppose the rating
matrix $A$ is,
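$$A = \begin{pmatrix} 5 & 4 & 2 & 6 \\ 3 & 7 & 5 & 2 \\ 6 & 4 & 1 & 4 \end{pmatrix}$$
The SVD of A would be (each pair of singular vectors is determined only up to a
common sign):
$$U = \begin{pmatrix} 0.6000 & 0.4124 & 0.6855 \\ 0.5811 & -0.8136 & -0.0192 \\ 0.5498 & 0.4099 & -0.7278 \end{pmatrix}$$
$$S = \begin{pmatrix} 14.4890 & 0.0000 & 0.0000 & 0.0000 \\ 0.0000 & 4.9324 & 0.0000 & 0.0000 \\ 0.0000 & 0.0000 & 1.6550 & 0.0000 \end{pmatrix}$$
$$V = \begin{pmatrix} 0.5551 & 0.4218 & -0.6023 & 0.3889 \\ 0.5982 & -0.4878 & -0.1835 & -0.6088 \\ 0.3213 & -0.5744 & 0.3306 & 0.6764 \\ 0.4805 & 0.5041 & 0.7031 & -0.1437 \end{pmatrix}$$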
The feature associated with the singular value 14.4890 in S is the most important
one. The dimension of S can therefore be reduced by keeping only the most important
features, in this case only the one represented by 14.4890. The original rating
matrix is converted into the feature space through
$$AV = US$$
and the new rating matrix M is
$$M = US'$$
where $S'$ keeps only the selected singular values, together with the matching
columns of $U$. In this case $S' = [14.4890]$. Once we have the new rating matrix M
in the feature space, we can apply memory-based or model-based methods to it. It
has been shown that exploiting latent structure in matrices of user ratings can lead
to improved predictive performance.
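The reduction to the single most important feature can be illustrated without a linear algebra library. The sketch below does not compute a full SVD; it swaps in power iteration, a different and simpler technique that recovers only the leading singular value and its singular vectors. The matrix entries are an assumed reading of the 3-by-4 example, and the function name is hypothetical.

```python
import math


def leading_singular_triple(A, iterations=100):
    """Power iteration on A^T A: returns (u, sigma, v) for the largest
    singular value of A, with u and v the matching unit singular vectors."""
    m, n = len(A), len(A[0])
    v = [1.0] * n
    for _ in range(iterations):
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]  # A v
        v = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]  # A^T u
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return u, sigma, v


# Assumed reading of the example rating matrix (3 users, 4 products).
A = [[5, 4, 2, 6],
     [3, 7, 5, 2],
     [6, 4, 1, 4]]
u, sigma, v = leading_singular_triple(A)

# One-feature representation of each user: first column of U times sigma.
M = [ui * sigma for ui in u]
```

Each user is now described by a single number in `M`, and memory-based or model-based methods could be run on this reduced representation instead of the full rating matrix.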
Current recommender systems use both Content-Based Filtering (CBF) and
Collaborative Filtering (CF) methods. CBF filters information by matching its content
against users' interests, so it can filter information that has not yet been evaluated
by other people. For this reason CBF and CF are combined in recommender systems: CBF
handles products that have not yet been rated, while CF recommends new products based
on previous users' votes.
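To make the content-matching idea concrete, here is a minimal sketch of a content-based score that needs no votes from other users. It is an illustration only: the bag-of-words profile, the whitespace tokenization, and the names `cosine` and `content_score` are assumptions, not the method of any system surveyed here.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def content_score(item_text, liked_texts):
    """Score a new item against a profile built from texts the user liked."""
    profile = Counter()
    for text in liked_texts:
        profile.update(text.lower().split())
    return cosine(Counter(item_text.lower().split()), profile)

liked = ["data mining survey", "mining association rules"]
related = content_score("text mining tutorial", liked)
unrelated = content_score("cooking pasta recipes", liked)
print(related > unrelated)  # prints True
```

Because the score depends only on the item's own content and the user's profile, it can be computed for a product that nobody has rated yet.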
IV. Discussion
As discussed above, future recommendation systems should have the
following features:
1) Solve the cold-start problem.
General collaborative recommendation systems suffer from this problem: the
system has no basis for recommending a new item to users or for providing accurate
predictions for a new user. Since content-based filtering is based on the features of
the item, it has no such cold-start problem. The Fab system integrates content-based
filtering and collaborative filtering. Based on this integration, Condliff et al.
[Condliff, Michelle Keim, et al. 1998] propose a Bayesian methodology for recommendation
systems. The proposal uses Bayesian theory to give a good prediction by fully
incorporating all of the available data, such as user ratings, user features, and item
features. Claypool et al. [Claypool, Mark, et al. 1999] also provide an approach to the
cold-start problem. Their system is based on a weighted average of the content-based
filtering prediction and the collaborative filtering prediction.
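The weighted-average idea can be sketched in a few lines. This is a simplified illustration, not the implementation of Claypool et al.: the function name `hybrid_prediction`, the fixed weight, and the fallback to the content-based prediction when no collaborative prediction exists are all assumptions made for the example.

```python
def hybrid_prediction(cbf_pred, cf_pred, cf_weight):
    """Blend a content-based and a collaborative prediction.

    cf_weight is the share given to the collaborative prediction (0 to 1).
    cf_pred may be None for a cold-start item or user (an assumption of this
    sketch), in which case only the content-based prediction is used.
    """
    if not 0.0 <= cf_weight <= 1.0:
        raise ValueError("cf_weight must lie between 0 and 1")
    if cf_pred is None:
        return cbf_pred
    return cf_weight * cf_pred + (1.0 - cf_weight) * cbf_pred

print(hybrid_prediction(4.0, 2.0, 0.25))   # 0.25 * 2.0 + 0.75 * 4.0 = 3.5
print(hybrid_prediction(4.0, None, 0.25))  # falls back to 4.0
```

In practice the weight would be tuned, for instance by giving the collaborative part more influence as more votes accumulate.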
2) Easy for users to participate or vote
Generally speaking, people do not like to provide recommendations, although
they like to receive them. Because the system depends on users' votes to calculate
the similarities between users, it is very important to get enough data from the
users. The system should therefore provide a very easy interface for a user to vote or
provide annotations. Although explicit annotations or votes feed the calculation
directly, implicit feedback from users does more to reduce the sparsity of the rating
matrix used for the similarity calculation. The implicit methods include monitoring users' behavior and
monitoring users' browsing time on the page. The longer a person stays on a page, the
more interest that person is assumed to have in it. The system can also use compensation
methods. For example, a user who wants further recommendations must first vote on what
he or she has read.
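One way to picture implicit feedback is to turn browsing time into a rough rating and use it only where no explicit vote exists. The sketch below is purely illustrative: the linear mapping, the 120-second cap, the 0 to 5 scale, and the names `implicit_rating` and `fill_with_implicit` are assumptions, not values taken from any system discussed here.

```python
def implicit_rating(seconds_on_page, max_seconds=120.0, scale=5.0):
    """Map browsing time to a rating in [0, scale]; longer stays score higher."""
    clipped = min(max(seconds_on_page, 0.0), max_seconds)
    return scale * clipped / max_seconds

def fill_with_implicit(explicit_votes, browsing_seconds):
    """Merge explicit votes with implicit ratings; explicit votes take priority."""
    merged = {item: implicit_rating(sec) for item, sec in browsing_seconds.items()}
    merged.update(explicit_votes)
    return merged

votes = {"article_a": 4}                        # one explicit vote
browsing = {"article_a": 10, "article_b": 120}  # seconds spent on each page
print(fill_with_implicit(votes, browsing))
```

The user now has two rated items instead of one, which is the sense in which implicit feedback reduces the sparsity of the rating matrix.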
3) Privacy
Privacy becomes an issue when a system collects information about its users, so
important social issues arise on an individual scale as well. In collaborative filtering,
users share document annotations. On the one hand, people do not like to release their
private identification; on the other hand, people like to see who made the annotations.
For example, if an annotation is provided by an expert in the area, people in the group
are more inclined to read the information. The system should provide a mechanism that
allows a user to adopt a pseudonym, and it should also provide different levels of
privacy protection.
4) Algorithm
A good algorithm should have the following features:
1. handling missing data
2. handling sparse data
3. cost-efficiency
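As an illustration of the first two requirements, the sketch below computes a Pearson correlation between two users using only the items both have rated, and returns a neutral value when the overlap is too small. It assumes NumPy, marks missing ratings with NaN, and takes each user's mean over the co-rated items only, which is one of several variants; the name `pearson_corated` and the `min_overlap` threshold are hypothetical.

```python
import numpy as np

def pearson_corated(u, v, min_overlap=2):
    """Pearson correlation over items rated by both users (NaN = missing)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    both = ~np.isnan(u) & ~np.isnan(v)      # items rated by both users
    if both.sum() < min_overlap:
        return 0.0                          # too little shared data to judge
    du = u[both] - u[both].mean()
    dv = v[both] - v[both].mean()
    denom = np.sqrt((du ** 2).sum() * (dv ** 2).sum())
    return float((du * dv).sum() / denom) if denom > 0 else 0.0

nan = float("nan")
print(pearson_corated([5, nan, 3, 1], [4, 2, nan, 2]))  # co-rated items 1 and 4
print(pearson_corated([5, nan], [nan, 3]))              # no overlap: 0.0
```

The computation is a single pass over the two rating vectors, so its cost grows linearly with the number of products.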
V. References
Ariyoshi, Yusuke, 1999. Improvement of Combination Information Filtering Method
Based on Reliabilities.
http://www-ai.cs.uni-dortmund.de/EVENTS/IJCAI99-MLIF/papers.html
Billsus, D. and Pazzani, M., 1998. Learning Collaborative Filters. Proceedings of
ICML98, 46-53. Morgan Kaufmann.
Breese, J., Heckerman, D., Kadie, C., 1998. Empirical Analysis of Predictive Algorithms
for collaborative Filtering. Proceedings of the Fourteenth Conference on
Uncertainty in Artificial Intelligence, Madison, WI.
Claypool, Mark; Gokhale, Anuja; Miranda, Tim et al., 1999. Combining Content-Based
and Collaborative Filters in an Online Newspaper.
http://www.cs.wpi.edu/~claypool/papers/content-collab/
Collaborative Filtering workshop, 1996, Berkeley, CA. Webpage:
http://www.sims.berkeley.edu/resources/collab/collab-report.htr.
Condliff, Michelle Keim; Lewis, David D.; Madigan, David and Posse, Christian ; 1998,
Bayesian Mixed-Effects Models for Recommender Systems.
http://www.cs.umbc.edu/~ian/sigir99-rec/
Goldberg, D., Nichols, D., Oki, B. M. and Terry, D., 1992. Using collaborative filtering
to weave an information tapestry. Commun. ACM 35, 12.
Joachims, Thorsten; Freitag, Dayne and Mitchell, Tom, 1996. WebWatcher: A Tour
Guide for the World Wide Web.
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-6/web-agent/www/project-home.html
Lieberman, H., 1996. Letizia: An Agent That Assists Web Browsing. MIT Media Lab.
Maltz, David and Ehrlich, Kate 1995: Pointing the way: active collaborative filtering.
http://www.acm.org/sigchi/chi95/Electronic/documnts/papers/ke_bdy.htm.
Oard, Douglas W. and Marchionini, Gary 1996, A Conceptual FrameWork for Text
Filtering. http://www.ee.umd.edu/medlab/filter/papers/filter/filter.html
Pennock, David M. and Horvitz, Eric 1999. Collaborative Filtering by Personality
Diagnosis: A Hybrid Memory- and Model-Based Approach.
http://www.research.microsoft.com/~horvitz/cfpd.htm
Pryor, H. Michael, 1998. The Effects of Singular Value Decomposition on Collaborative
Filtering. Computer Science Technical Report, Dartmouth College. PCS-TR98-338.
Resnick, Paul and Varian, Hal R. 1997, Recommender Systems. COMMUNICATIONS
OF THE ACM. March 1997/vol. 40, No.3.
Resnick, Paul; Iacovou, Neophytos et al., 1994. GroupLens: An Open Architecture
for Collaborative Filtering of Netnews. Proceedings of the ACM 1994
Conference on Computer Supported Cooperative Work, Chapel Hill, NC, pages
175-186.
Shardanand, Upendra and Maes, Pattie 1995. Social Information Filtering: Algorithms
for Automating Word of Mouth.
http://www.acm.org/sigchi/chi95/Electronic/documnts/papers/us_bdy.htm
Terveen, Loren G.; Hill, William C. et al., 1998. Building Task-Specific Interfaces
to High Volume Conversational Data.
http://www.acm.org/sigchi/chi97/proceedings/paper/lgt.htm
Turnbull, Don: Augmenting Information Seeking on the World Wide Web Using
Collaborative Filtering Techniques. 1998,
http://donturn.fis.utoronto.ca/research/augmentis.htn
Turnbull, Don: KMDI Final Summary: Collaborative Filtering. 1997,
http://donturn.fis.utoronto.ca/research/kmdi-cf.html
Ungar, Lyle H. and Foster, Dean P., 1998. A Formal Statistical Approach to
Collaborative Filtering in AAAI Workshop on Recommendation System.
http://www.cis.upenn.edu/~ungar/papers.html
Wittenburg, Kent, Duco Das, Will Hill, and Larry Stead, 1998, Group Asynchronous
Browsing on the World Wide Web.
http://www.w3.org/Conferences/WWW4/Papers/98/