
Collaborative Ranking with Social Relationships for Top-N Recommendations

Dimitrios Rafailidis
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki, Greece
[email protected]

Fabio Crestani
Faculty of Informatics
Università della Svizzera italiana (USI)
Lugano, Switzerland
[email protected]

ABSTRACT
Recommendation systems have gained a lot of attention because of their importance for handling the unprecedentedly large amount of available content on the Web, such as movies, music, books, etc. Although Collaborative Ranking (CR) models can produce accurate recommendation lists, in practice several real-world problems decrease their ranking performance, such as the sparsity and cold-start problems. Here, to account for the fact that the selections of social friends can improve the recommendation accuracy, we propose SCR, a Social CR model. Our model learns personalized ranking functions collaboratively, using the notion of Social Reverse Height, that is, considering how well the relevant items of users and their social friends have been ranked at the top of the list. The reason we focus on the top of the list is that users mainly see the top-N recommendations, and not the whole ranked list. In our experiments with a benchmark data set from Epinions, we show that our SCR model performs better than state-of-the-art CR models that either consider social relationships or focus on the ranking performance at the top of the list.

CCS Concepts
• Information systems → Collaborative and social computing systems and tools;

Keywords
Recommendation systems; collaborative ranking; social relationships

1. INTRODUCTION
The collaborative filtering strategy has been widely followed in recommendation systems, where users with similar preferences tend to get similar recommendations [5]. User preferences are expressed explicitly in the form of ratings or implicitly in the form of number of views, clicks, purchases, etc.


Relevant studies examine how to predict the rating of a user on an unseen item, also known as the rating prediction problem [8]. Instead of focusing on the accuracy of rating prediction, ranking-based models try to produce accurate ranked lists for the top-N recommendation problem [9]. In relevant studies [2, 13], it has been shown that ranking-based models achieve higher recommendation accuracy than rating prediction-based approaches in the top-N recommendation problem.

Collaborative Ranking (CR) models learn a personalized scoring/ranking function to rank the recommended items for each individual, while all functions are constructed collaboratively across all the users [9, 10]. In [2], the authors introduce CR models that perform a push at the top of the recommendation list, by considering that what matters in a recommendation system is the ranking performance at the top of the list, that is, the items the user will actually see. In doing so, the CR models that focus on the ranking performance at the top of the list achieve high ranking accuracy in the top-N recommendation problem.

Although CR models can generate accurate recommendation lists, in practice several real-world problems decrease the recommendation accuracy, such as the sparsity and the cold-start problems [5]. Several approaches have been proposed for handling them in the rating prediction problem, assuming that users tend to trust the recommendations of their social friends [1]. However, little effort has been made to exploit social relationships in CR models, with the studies in [6, 14] being notable exceptions. Nonetheless, the CR models in [6, 14] do not focus on the ranking performance at the top of the list when learning the ranking functions.

Therefore, a pressing challenge is to generate accurate recommendations by exploiting social relationships while simultaneously focusing on the top of the list. In this paper, we propose a Social CR model, namely SCR. Our CR model learns the personalized ranking function of each individual in a collaborative manner, using the notion of Social Reverse Height. When learning the ranking functions, SCR attempts to push users' relevant items above the irrelevant ones at the top of the list, by considering the ranking positions of the relevant items of their friends. We formulate the objective function of the proposed SCR model as a minimization problem, and we provide an efficient optimization algorithm based on alternating minimization [4] and gradient descent. Our experiments on a data set from Epinions1 demonstrate the superiority of our SCR model over state-of-the-art CR models.

1 http://www.epinions.com/


2. RELATED WORK
Learning to rank methods have been widely studied in Information Retrieval and in recommendation systems. The goal is to define a ranking function over the items and then learn it by minimizing some loss function. Liu [7] categorizes learning to rank methods into point-wise, list-wise and pair-wise. In short, point-wise approaches predict ranking scores for individual items. List-wise approaches consider an individual training example as an entire list of items and use loss functions to express the distance between the reference list and the output list from the ranking model. Representative list-wise approaches in recommendation systems are CofiRank [12] and CLiMF [10], which use loss functions based on Normalized Discounted Cumulative Gain and Reciprocal Rank, respectively. Pair-wise approaches make a prediction for every pair of items concerning their relative ordering in the final list. A pair-wise recommendation approach is the Bayesian Personalized Ranking framework (BPR) [9], a CR model that learns the personalized ranking functions collaboratively. To account for the fact that users focus on recommendations at the top of the list, in [2] the authors introduce optimization algorithms for three CR models, P-Push, Inf-Push and RH-Push, which follow different strategies when learning the personalized ranking functions. P-Push tries to push irrelevant items in the top-N list below those that have been selected as relevant; Inf-Push attempts to move the most irrelevant item in the list below all the relevant ones; while RH-Push tries to move the relevant items above the irrelevant ones. However, none of the aforementioned models exploits social relationships in the learning process.

Although several approaches incorporate social relationships in the rating prediction problem, e.g. [8], few attempts have been made in the top-N recommendation problem. Krohn-Grimberghe et al. [6] present the MR-BPR model, where they combine Multi-Relational matrix factorization with the BPR framework [9] to model both users' feedback on items and on social relationships. Zhao et al. [14] propose SBPR, a Social Bayesian Personalized Ranking model that incorporates social relationships into a pair-wise ranking model, assuming that users tend to assign higher ranks to items that their friends prefer. However, MR-BPR and SBPR do not focus on the ranking performance at the top of the list.

3. PROBLEM FORMULATION
Let U and I be the sets of users and items, respectively. We assume that users express their preferences over the items by marking them as relevant or irrelevant (either via explicit or implicit feedback), stored in a (|U| × |I|) matrix R, with [R]_ij = 1 if user i has selected item j as relevant, -1 as irrelevant, and 0 if user i has not expressed her preference over item j. For each user i the sets of relevant and irrelevant items are denoted as I_i^+ and I_i^-, with x_i^+ ∈ I_i^+, x_i^- ∈ I_i^-, and n_i = |I_i^+ ∪ I_i^-| being the total number of items marked by user i. For each user i ∈ U, we define the Reverse Height of a relevant item x_i^+ as follows [2]:

Definition 1. The Reverse Height RH_i(x_i^+) of a relevant item x_i^+ in the recommendation list of user i is the number of irrelevant items ranked above x_i^+.

In our setting, we assume that each user i has a set F_i of friends with trusted social relationships. Given x_i^+ ∈ I_i^+, x_i^- ∈ I_i^-, and x_w^+ ∈ I_w^+, with w ∈ F_i, we define the Social Reverse Height of a relevant item x_i^+ as follows:

Definition 2. The Social Reverse Height SRH_i(x_i^+) of a relevant item x_i^+ in the recommendation list of user i is the sum of (1) the number of irrelevant items x_i^- ranked above x_i^+; and (2) the number of irrelevant items x_i^- ranked above the relevant items x_w^+ of all her |F_i| friends.

Definition 3. The goal of the proposed SCR model is to learn the personalized ranking functions collaboratively by trying to minimize SRH_i(x_i^+), that is, pushing up the relevant items x_i^+ of user i above the irrelevant items x_i^-, by considering the ranking positions of the relevant items x_w^+ of her |F_i| friends.
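To make Definitions 1 and 2 concrete, the following toy sketch (ours, not code from the paper) counts the Reverse Height and an unweighted Social Reverse Height for a single relevant item; the item identifiers and the scores standing in for the ranking function r_i are hypothetical, and the social weights s_iw introduced later in Section 4.1 are ignored here.

```python
# Toy illustration of Definitions 1 and 2 (our sketch).
# Scores play the role of the ranking function r_i: a higher score means ranked higher.

def reverse_height(scores, relevant_item, irrelevant_items):
    """RH_i(x+): number of the user's irrelevant items ranked above the relevant item."""
    return sum(1 for x in irrelevant_items if scores[x] >= scores[relevant_item])

def social_reverse_height(scores, relevant_item, irrelevant_items, friends_relevant):
    """SRH_i(x+): RH_i(x+) plus, for each relevant item x_w+ of each friend,
    the number of user i's irrelevant items ranked above x_w+ (Definition 2,
    without the s_iw weights and normalizations added in Eq. (1))."""
    srh = reverse_height(scores, relevant_item, irrelevant_items)
    for items_w in friends_relevant:            # one list of relevant items per friend
        for x_w in items_w:
            srh += sum(1 for x in irrelevant_items if scores[x] >= scores[x_w])
    return srh

# Hypothetical ranking scores of user i over a few items.
scores = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.2, "e": 0.6}
print(reverse_height(scores, "c", ["b", "d"]))                  # 1: only "b" is above "c"
print(social_reverse_height(scores, "c", ["b", "d"], [["e"]]))  # 2: "b" is also above the friend's "e"
```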

4. SOCIAL COLLABORATIVE RANKING

4.1 Social Reverse Height
A trusted friend w ∈ F_i might point to an interesting item that does not match the preferences of user i [1]. Hence, to measure the social influence between user i and her friend w we define the social regularization term s_iw = |I_i^+ ∩ I_w^+|, that is, the size of the intersection of the sets of relevant items of user i and her friend w. For each user i we normalize s_iw in [0, 1], by dividing each s_iw value by the maximum value over all her |F_i| friends.
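As a small illustration of the social regularization term, the sketch below (ours; the function and variable names are hypothetical) computes s_iw as the relevant-item overlap with each friend and then normalizes the values into [0, 1] by the per-user maximum, as described above.

```python
# Sketch of the social regularization weights s_iw = |I_i^+ ∩ I_w^+|,
# normalized per user by the maximum overlap over her friends.

def social_weights(relevant_i, friends_relevant):
    """relevant_i: set of items user i marked relevant.
    friends_relevant: dict friend_id -> set of that friend's relevant items."""
    overlaps = {w: len(relevant_i & items_w) for w, items_w in friends_relevant.items()}
    max_overlap = max(overlaps.values(), default=0)
    if max_overlap == 0:
        return {w: 0.0 for w in overlaps}   # no common relevant items with any friend
    return {w: s / max_overlap for w, s in overlaps.items()}

# Example: user i shares 2 relevant items with friend 7 and 1 with friend 9.
print(social_weights({1, 2, 3}, {7: {2, 3, 8}, 9: {3, 5}}))  # {7: 1.0, 9: 0.5}
```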

Given a personalized ranking function r_i(x), with i ∈ U and x ∈ I, according to Definition 2 we have:

$$ SRH_i(x_i^+) = \sum_{x_i^- \in I_i^-} \bigg\{ \mathbf{1}\big[r_i(x_i^+) \le r_i(x_i^-)\big] + \frac{1}{|F_i|} \sum_{w \in F_i} \frac{1}{|I_w^+|} \sum_{x_w^+ \in I_w^+} s_{iw}\, \mathbf{1}\big[r_i(x_w^+) \le r_i(x_i^-)\big] \bigg\} \quad (1) $$

where 1[·] is the indicator function and s_iw the social regularization term. Note that r_i(x_w^+) expresses the ranking of the relevant item x_w^+ of friend w using the ranking function of user i. Minimizing Eq. (1) directly is intractable due to the non-convex indicator function. Thus, we take a surrogate g_i(·) as follows:

$$ g_i(x_i^+, x_i^-) = r_i(x_i^+) - r_i(x_i^-) + \frac{1}{|F_i|} \sum_{w \in F_i} \frac{1}{|I_w^+|} \sum_{x_w^+ \in I_w^+} s_{iw} \big( r_i(x_w^+) - r_i(x_i^-) \big) \quad (2) $$

4.2 Objective Function
In the proposed SCR model we follow a collaborative strategy, instead of learning a ranking function r_i(·) for each individual i separately. Given the low-rank d decomposition R̂ ≈ U^T V of the initial matrix R, with U and V being (d × |U|) and (d × |I|) matrices, we consider u_i and v_j as the (d × 1) latent vectors of user i and item j, that is, the i-th and j-th columns of U and V, respectively. For readability, we denote the relevant item of user i as a = x_i^+, the irrelevant item of user i as b = x_i^-, and the relevant item of a friend w ∈ F_i as c = x_w^+. Using the representation of the latent factors u_i and v_j, Eq. (2) can be rewritten as:

$$ g_i(a, b) = u_i^T (v_a - v_b) + \frac{1}{|F_i|} \sum_{w \in F_i} \frac{1}{|I_w^+|} \sum_{c \in I_w^+} s_{iw}\, u_w^T (v_c - v_b) \quad (3) $$

where a ∈ I_i^+, b ∈ I_i^- and c ∈ I_w^+. As the logistic loss function is a convex upper bound to the indicator function, based on Eq. (3) we can reformulate Eq. (1) as:


$$ SRH_i(a) = \sum_{b \in I_i^-} \log\big(1 + \exp(-g_i(a, b))\big) \quad (4) $$

According to Definition 3, for all the relevant items a ∈ I_i^+ we have to minimize the SRH_i(a) value of Eq. (4). Hence, ∀ i ∈ U the goal is to minimize the following objective function (a log-log loss function [2]) with respect to matrices U and V:

$$ L(U, V) = \sum_{i=1}^{|U|} \bigg\{ \frac{1}{n_i} \sum_{a \in I_i^+} \log\big(1 + SRH_i(a)\big) \bigg\} + \frac{\lambda}{2} \big( \|U\|^2 + \|V\|^2 \big) \quad (5) $$

where the last term is the regularization term, used to avoid model overfitting, and λ is the regularization parameter.
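To tie Eqs. (3)-(5) together, here is a minimal NumPy sketch (ours, with hypothetical data structures) that evaluates g_i(a, b), the logistic surrogate SRH_i(a), and the objective L(U, V); it is meant to mirror the formulas, not to be an efficient implementation.

```python
import numpy as np

def g(U, V, i, a, b, friends, s, pos_items):
    """Eq. (3): g_i(a, b) with latent factors (U: d x |users|, V: d x |items|).
    friends[i] lists the friends of user i, s[(i, w)] is the normalized social
    weight s_iw, and pos_items[w] lists the relevant items of user w."""
    val = U[:, i] @ (V[:, a] - V[:, b])
    if friends.get(i):
        social = 0.0
        for w in friends[i]:
            social += sum(s[(i, w)] * (U[:, w] @ (V[:, c] - V[:, b]))
                          for c in pos_items[w]) / len(pos_items[w])
        val += social / len(friends[i])
    return val

def srh(U, V, i, a, neg_items, friends, s, pos_items):
    """Eq. (4): logistic surrogate of the Social Reverse Height of item a for user i."""
    return sum(np.log1p(np.exp(-g(U, V, i, a, b, friends, s, pos_items)))
               for b in neg_items[i])

def objective(U, V, pos_items, neg_items, friends, s, lam):
    """Eq. (5): log-log loss over all users plus the L2 regularization term."""
    loss = 0.0
    for i in pos_items:
        n_i = len(pos_items[i]) + len(neg_items[i])
        loss += sum(np.log1p(srh(U, V, i, a, neg_items, friends, s, pos_items))
                    for a in pos_items[i]) / n_i
    return loss + 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))
```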

4.3 Model Learning
The goal is to learn (compute) the matrices U and V that minimize the loss function L(U, V) in Eq. (5). We follow the alternating minimization strategy [4] using the gradient descent optimization algorithm, where we update U keeping V fixed, and then update V keeping the updated U fixed. Based on the (sub)gradients of the objective function L(U, V) with respect to u_i and v_j, the update rules for each iteration t + 1 are:

$$ u_i^{t+1} \leftarrow u_i^t - \eta \nabla_{u_i} L(U^t, V^t), \quad i = 1 \ldots |U| \quad (6) $$

$$ v_j^{t+1} \leftarrow v_j^t - \eta \nabla_{v_j} L(U^{t+1}, V^t), \quad j = 1 \ldots |I| \quad (7) $$

where η controls the learning rate.
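The update scheme of Eqs. (6)-(7) can be sketched as a simple alternating loop; in the sketch below (ours), grad_U and grad_V are hypothetical placeholders for routines that return the full gradient matrices built from Eqs. (10) and (12), which are derived in Sections 4.3.1 and 4.3.2.

```python
import numpy as np

def fit(num_users, num_items, grad_U, grad_V, d=40, eta=1e-4, iters=100, seed=0):
    """Alternating gradient descent over U and V (Eqs. (6)-(7)).
    grad_U(U, V) and grad_V(U, V) are assumed to return (d x |U|) and (d x |I|)
    gradient matrices assembled from Eqs. (10) and (12), respectively."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.01, size=(d, num_users))
    V = rng.normal(scale=0.01, size=(d, num_items))
    for _ in range(iters):
        U = U - eta * grad_U(U, V)   # update U with V fixed, Eq. (6)
        V = V - eta * grad_V(U, V)   # update V with the updated U fixed, Eq. (7)
    return U, V
```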

4.3.1 Computing the Gradient ∇_{u_i} L(U, V)
We apply the chain rule to Eq. (5) to take the gradient of L with respect to u_i, that is, the composition of the gradients ∇_{u_i} g_i(a, b) and ∇_{g_i(a,b)} SRH_i(a) with respect to u_i in Eq. (3) and g_i(a, b) in Eq. (4), respectively. Note that the gradient of the regularization term in Eq. (5) with respect to u_i equals λu_i. Thus, provided that i ≠ w, we have:

$$ \nabla_{u_i} g_i(a, b) = v_a - v_b \quad (8) $$

$$ \nabla_{g_i(a,b)} SRH_i(a) = \frac{-1}{1 + \exp(g_i(a, b))} \quad (9) $$

Hence, the gradient of L with respect to u_i in Eq. (5) equals:

$$ \nabla_{u_i} L(U, V) = \frac{1}{n_i} \sum_{a \in I_i^+} \bigg\{ \frac{1}{1 + SRH_i(a)} \sum_{b \in I_i^-} \frac{1}{1 + \exp(g_i(a, b))} (v_b - v_a) \bigg\} + \lambda u_i \quad (10) $$
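As a sanity check of Eq. (10), the following sketch (ours) accumulates the gradient of L with respect to a single user vector u_i; g_val and srh_val are hypothetical callables returning g_i(a, b) from Eq. (3) and the surrogate SRH_i(a) from Eq. (4), e.g. the helpers sketched after Eq. (5).

```python
import numpy as np

def grad_u_i(U, V, i, pos_items, neg_items, g_val, srh_val, lam):
    """Eq. (10): gradient of L with respect to u_i (our sketch).
    g_val(i, a, b) -> g_i(a, b); srh_val(i, a) -> surrogate SRH_i(a)."""
    d = U.shape[0]
    grad = np.zeros(d)
    n_i = len(pos_items[i]) + len(neg_items[i])
    for a in pos_items[i]:
        outer = 1.0 / (1.0 + srh_val(i, a))
        inner = np.zeros(d)
        for b in neg_items[i]:
            inner += (V[:, b] - V[:, a]) / (1.0 + np.exp(g_val(i, a, b)))
        grad += outer * inner
    return grad / n_i + lam * U[:, i]
```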

4.3.2 Computing the Gradient ∇_{v_j} L(U, V)
When calculating the gradient of g_i(a, b) with respect to v_j in Eq. (3), there are the following two cases:

$$ \nabla_{v_j} g_i(a, b) = \begin{cases} u_i, & \text{if } j = a, \\[4pt] -\Big[\, u_i + \frac{1}{|F_i|} \sum_{w \in F_i} \frac{1}{|I_w^+|} \sum_{c \in I_w^+} s_{iw} u_w \Big], & \text{if } j = b. \end{cases} \quad (11) $$

Similarly, we apply the chain rule to Eq. (5) to compute the gradient of L with respect to v_j, that is, the composition of the gradient ∇_{v_j} g_i(a, b), based on the two cases in Eq. (11), and the gradient ∇_{g_i(a,b)} SRH_i(a), as presented in Eq. (9).

The gradient of the regularization term in Eq. (5) with respect to v_j equals λv_j. Let U_j^+ and U_j^- be the sets of users who selected item j as relevant and irrelevant, respectively. We can divide the sum over all users into sums over U_j^- and U_j^+ [2], in order to calculate the gradient of L with respect to v_j as follows:

$$ \nabla_{v_j} L(U, V) = \sum_{i \in U_j^-} \frac{1}{n_i} \sum_{a \in I_i^+} \bigg\{ \frac{1}{1 + SRH_i(a)} \cdot \frac{1}{1 + \exp(g_i(a, b))} \Big[\, u_i + \frac{1}{|F_i|} \sum_{w \in F_i} \frac{1}{|I_w^+|} \sum_{c \in I_w^+} s_{iw} u_w \Big] \bigg\} $$
$$ - \sum_{i \in U_j^+} \frac{1}{n_i} \sum_{b \in I_i^-} \bigg\{ \frac{1}{1 + SRH_i(a)} \cdot \frac{1}{1 + \exp(g_i(a, b))}\, u_i \bigg\} + \lambda v_j \quad (12) $$

where b = j in the first sum (user i has marked item j as irrelevant) and a = j in the second sum (user i has marked item j as relevant).

Having computed U and V based on Eqs. (6) and (7), with the respective gradients in Eqs. (10) and (12), we calculate the low-rank d approximation R̂ ≈ U^T V of the initial (|U| × |I|) matrix R, and we generate a recommendation list for each user i by sorting the items in the respective i-th row of R̂ in descending order.
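Once U and V are learned, producing the top-N list for a user amounts to sorting that user's row of R̂ = U^T V. A minimal sketch (ours) follows; the optional exclusion of already-marked items is a common evaluation choice that the text does not spell out.

```python
import numpy as np

def top_n(U, V, i, N=10, exclude=None):
    """Rank items for user i by the i-th row of R_hat = U^T V (descending scores)."""
    scores = U[:, i] @ V                 # i-th row of U^T V, shape (|I|,)
    order = np.argsort(-scores)          # item indices, best first
    if exclude:
        order = [j for j in order if j not in exclude]
    return list(order[:N])
```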

5. EXPERIMENTAL EVALUATION

5.1 Data Set & Evaluation Protocol
In our experiments we used a publicly available data set2 from Epinions, which consists of 71,002 users and 104,356 items. The total number of ratings is 571,235 on a 5-star scale, where we considered 4-5 star ratings as relevant to a user and 1-3 star ratings as irrelevant. The total number of trusted social relationships is 508,960. The data set is divided into two subsets: the training set and the test set. Following the evaluation protocol of similar studies [13, 14], for users with fewer than five ratings, one randomly selected rating is inserted into the test set. For users with five ratings or more, 10% of their ratings, selected at random, are moved to the test set. The training set is further split into two subsets: the cross-validation training set and the cross-validation test set, which are used to determine the tuning parameters of each examined ranking model. We repeated our experiment ten times, and we report mean values and standard deviations on the (actual) test set over the runs.
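The per-user split described above can be sketched as follows (our reading of the protocol; the function and variable names are hypothetical, and rounding the 10% share up to at least one rating is our assumption, not stated in the text).

```python
import random

def split_ratings(user_ratings, test_fraction=0.1, min_ratings=5, seed=42):
    """user_ratings: dict user_id -> list of (item_id, rating) tuples.
    Returns (train, test) dicts following the protocol of [13, 14] as described above."""
    rng = random.Random(seed)
    train, test = {}, {}
    for u, ratings in user_ratings.items():
        ratings = list(ratings)
        rng.shuffle(ratings)
        # fewer than min_ratings: one rating to the test set; otherwise ~10% of them
        k = 1 if len(ratings) < min_ratings else max(1, int(test_fraction * len(ratings)))
        test[u] = ratings[:k]
        train[u] = ratings[k:]
    return train, test
```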

We evaluated the performance of the examined ranking models in terms of Recall (R@N), which is defined as the ratio of the relevant items in the top-N ranked list over all the relevant items for each user. In addition, we used the Normalized Discounted Cumulative Gain (NDCG@N) metric, which considers the ranking of the relevant items in the top-N list. For each user the Discounted Cumulative Gain is defined as:

$$ DCG@N = \sum_{j=1}^{N} \frac{2^{rel_j} - 1}{\log_2(j + 1)} \quad (13) $$

where rel_j represents the relevance score of the item at position j, binary relevance in our case, that is, relevant or irrelevant to the user. NDCG@N is the ratio of DCG@N over the ideal iDCG@N value for each user, that is, the DCG@N value given the ratings in the test set. In our experiments we averaged R@N and NDCG@N over all users.
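For completeness, a small sketch (ours) of R@N and NDCG@N as used above: `relevant` is the set of one user's relevant test-set items and `ranked` is that user's recommendation list; the ideal DCG is computed from the same binary relevance labels, as described in the text.

```python
import math

def recall_at_n(ranked, relevant, N=10):
    """R@N: fraction of the user's relevant items that appear in the top-N list."""
    if not relevant:
        return 0.0
    return len(set(ranked[:N]) & relevant) / len(relevant)

def ndcg_at_n(ranked, relevant, N=10):
    """NDCG@N with binary relevance (Eq. (13)): DCG@N over the ideal DCG@N."""
    dcg = sum((2 ** (1 if ranked[j] in relevant else 0) - 1) / math.log2(j + 2)
              for j in range(min(N, len(ranked))))
    ideal_hits = min(len(relevant), N)
    idcg = sum(1.0 / math.log2(j + 2) for j in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Example: items 3 and 7 are relevant; the model ranks 7 first.
print(recall_at_n([7, 1, 4, 3], {3, 7}, N=3))  # 0.5
print(ndcg_at_n([7, 1, 4, 3], {3, 7}, N=3))    # ~0.61
```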

2 http://alchemy.cs.washington.edu/data/epinions/


5.2 Results
To evaluate the parameter sensitivity of SCR, we varied the learning rate η in the range [10^-5, 10^-2] and settled on η = 10^-4. This means that a more conservative learning strategy is required, as higher values of η decreased the performance of SCR by falling into local optima when minimizing the loss function in Eq. (5). In Figure 1, we vary the number of latent factors d and the regularization parameter λ, while keeping η = 10^-4 fixed. For presentation purposes, we plot R@10 as a percentage (%). Figures 1(a)-(b) show that there is only a slight increase in R@10 for d ≥ 40, while there is a drop in R@10 when λ is in the range 0.01-1; we therefore select d = 40 and λ = 0.001.

Figure 1: Effect of (a) the number of latent factors d and (b) the regularization parameter λ on R@10 (%).

We use CofiRank3 [12] as a baseline, and we compare the proposed SCR method with P-Push, Inf-Push and RH-Push using the optimization algorithms4 of [2]. In addition, we compare SCR with MR-BPR5 [6], a state-of-the-art method for ranking with social relationships in the recommendation problem. We also evaluate the performance of SCR against SBPR [14], a ranking model that also considers social relationships in the learning process, assuming that users tend to assign higher ranks to items that their friends prefer. In all ranking models, we followed the same cross-validation strategy to select the optimal parameters as in SCR. In Table 1 we compare the examined ranking models, where we observe that P-Push, Inf-Push and RH-Push outperform CofiRank, as these models focus on the ranking performance at the top of the list, while MR-BPR, SBPR and SCR achieve better performance than the models that do not exploit social relationships. The proposed SCR model beats the competitors by exploiting social relationships and focusing on the top of the ranking list. Using the paired t-test, we found that the differences between the reported results for SCR and the competitive approaches were statistically significant for p < 0.05.

3 https://github.com/markusweimer/cofirank
4 http://www-users.cs.umn.edu/~christa/
5 http://www.ismll.uni-hildesheim.de/software

Table 1: Methods comparison in terms of NDCG@10, R@10 and R@100, with bold values denoting the best scores (* p < 0.05 in paired t-test). MR-BPR, SBPR and SCR exploit social relationships.

            NDCG@10          R@10             R@100
CofiRank    .1423 ± .0043    .0698 ± .0075    .2398 ± .0047
P-Push      .1671 ± .0058    .1134 ± .0044    .2977 ± .0054
Inf-Push    .1563 ± .0091    .1073 ± .0053    .2789 ± .0035
RH-Push     .1574 ± .0079    .1266 ± .0062    .2980 ± .0028
MR-BPR      .1798 ± .0084    .1401 ± .0089    .3081 ± .0051
SBPR        .1886 ± .0076    .1490 ± .0055    .3296 ± .0042
SCR         .2065 ± .0064*   .1596 ± .0043*   .3654 ± .0035*

6. CONCLUSION AND FUTURE WORK
In this study, we presented SCR, a CR model that focuses on the ranking accuracy at the top of the list while also considering users' trusted social relationships. Our learning strategy uses the notion of Social Reverse Height to consider the ranking positions of friends' relevant items, and our experiments showed that SCR outperforms other state-of-the-art ranking models. An interesting future direction is to exploit both trust and distrust social relationships in the proposed SCR model, accounting for the fact that users with distrust relationships usually do not have similar preferences [3]. In addition, we plan to evaluate the performance of SCR in terms of diversity, that is, how well we can generate diversified recommendations in SCR to capture the interest range of the target user [11].

7. REFERENCES
[1] A. J. Chaney, D. M. Blei, and T. Eliassi-Rad. A probabilistic model for using social networks in personalized item recommendation. In RecSys, pages 43–50, 2015.
[2] K. Christakopoulou and A. Banerjee. Collaborative ranking with a push at the top. In WWW, pages 205–215, 2015.
[3] R. Forsati, M. Mahdavi, M. Shamsfard, and M. Sarwat. Matrix factorization with explicit trust and distrust side information for improved social recommendation. TOIS, 32(4):17:1–17:38, 2014.
[4] P. Jain, P. Netrapalli, and S. Sanghavi. Low-rank matrix completion using alternating minimization. In STOC, pages 665–674, 2013.
[5] Y. Koren, R. M. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8):30–37, 2009.
[6] A. Krohn-Grimberghe, L. Drumond, C. Freudenthaler, and L. Schmidt-Thieme. Multi-relational matrix factorization using Bayesian personalized ranking for social network data. In WSDM, pages 173–182, 2012.
[7] T.-Y. Liu. Learning to rank for information retrieval. Found. Trends Inf. Retr., 3(3):225–331, 2009.
[8] H. Ma, H. Yang, M. R. Lyu, and I. King. SoRec: social recommendation using probabilistic matrix factorization. In CIKM, pages 931–940, 2008.
[9] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In UAI, pages 452–461, 2009.
[10] Y. Shi, A. Karatzoglou, L. Baltrunas, M. Larson, N. Oliver, and A. Hanjalic. CLiMF: Learning to maximize reciprocal rank with collaborative less-is-more filtering. In RecSys, pages 139–146, 2012.
[11] Y. Shi, X. Zhao, J. Wang, M. Larson, and A. Hanjalic. Adaptive diversification of recommendation results via latent factor portfolio. In SIGIR, pages 175–184, 2012.
[12] M. Weimer, A. Karatzoglou, Q. V. Le, and A. J. Smola. COFI RANK – maximum margin matrix factorization for collaborative ranking. In NIPS, pages 1593–1600, 2007.
[13] X. Yang, H. Steck, Y. Guo, and Y. Liu. On top-k recommendation using social networks. In RecSys, pages 67–74, 2012.
[14] T. Zhao, J. McAuley, and I. King. Leveraging social connections to improve personalized ranking for collaborative filtering. In CIKM, pages 261–270, 2014.
