Semi-Disentangled Representation Learning in Recommendation System
Weiguang Chen (cwg@hnu.edu.cn), Hunan University
Wenjun Jiang∗ (jiangwenjun@hnu.edu.cn), Hunan University
Xueqi Li (lee_xq@hnu.edu.cn), Hunan University
Kenli Li (lkl@hnu.edu.cn), Hunan University
Albert Zomaya (albert.zomaya@sydney.edu.au), University of Sydney
Guojun Wang (csgjwang@gzhu.edu.cn), Guangzhou University
ABSTRACT
Disentangled representation has been widely explored in many fields due to its maximal compactness, interpretability, and versatility. Recommendation systems also need disentanglement to make representations more explainable and general for downstream tasks. However, two challenges slow its broader application: the lack of fine-grained labels and the complexity of user-item interactions. To alleviate these problems, we propose a Semi-Disentangled Representation Learning method (SDRL) based on autoencoders. SDRL divides each user/item embedding into two parts, the explainable and the unexplainable, so as to achieve proper disentanglement while preserving complex information in the representation. The explainable part consists of an internal block for individual-based features and an external block for interaction-based features. The unexplainable part is composed of an other block for the remaining information. Experimental results on three real-world datasets demonstrate that the proposed SDRL not only effectively expresses user and item features but also improves explainability and generality compared with existing representation methods.
1 INTRODUCTION
Disentangled representation learning aims at separating an embedding into multiple independent parts, which makes it more explainable and general [2]. The core idea of existing methods is to minimize the reconstruction error of the whole representation while simultaneously maximizing the independence among the different parts [4, 13]. It has been successfully applied to image representation, and researchers have verified its superiority on many downstream tasks, e.g., image generation [1, 4] and style transfer [16]. Disentangled representation is also required by recommendation systems to distinguish the various hidden intentions behind the same behavior [23, 36]. However, two obstacles slow its extensive application: the lack of enough fine-grained labels and the complexity of user-item interactions. Furthermore, Locatello et al. [19] theoretically demonstrate the difficulty and even impossibility of unsupervised disentanglement and propose solutions using a few labels. This inspires us to put forward a Semi-Disentangled Representation Learning (SDRL) approach for recommendation systems based on limited labels.
Specifically, we introduce an example with Fig. 1 to explain the requirements and challenges of disentangled representation for recommendation and to clarify our motivations in this paper. For
∗ Wenjun Jiang is the corresponding author.
1 Images come from https://www.imdb.com/.
[Figure 1: An Illustration of User-Item Interactions 1. Users u1, u2 and u3 interact with the movies Avengers: Endgame, Marriage Story, Little Women, Sen to Chihiro no kamikakushi and Frozen II.]
instance, users u1 and u3 have watched the same movie Marriage Story, but their motivations may differ. u1 chooses it probably because he is a fan of the actress Scarlett Johansson, while u3's motivation may be an interest in romance movies. Another scene (i.e., users u1 and u2 watch the same movie Sen to Chihiro no kamikakushi) reflects a similar phenomenon: u2 likes animation movies, while u1 watches it perhaps because u1 and u2 are friends. Disentangled representation can help distinguish these different intentions behind the same behavior, so as to improve representation accuracy and offer clues to explain why the items are provided.
However, it is hardly possible to develop complete and accurate disentanglement in a recommendation system, considering that it lacks fine-grained labels and that user-item interactions are complicated. In the example above, there are usually not enough labels to build completely fine-grained aspects, e.g., that u1 is a fan of Scarlett Johansson. On the other hand, some unknowable and random factors also affect users' decisions, e.g., u2 invites his friend u1 to watch Sen to Chihiro no kamikakushi together. Disentanglement based on incomplete or inaccurate factors may decrease the expressive ability of the representation.
To overcome these limitations, we propose a semi-disentangled representation method, SDRL, which separates the representation into three parts to express internal, external, and other remaining complex information, respectively. In particular, we present the internal block to denote features related to the individual itself, e.g., product category, movie style, and user age. The external block represents characteristics from user-item interactions, e.g., user ratings and implicit feedback.
arXiv:2010.13282v1 [cs.SI] 26 Oct 2020
Conference'17, July 2017, Washington, DC, USA. Weiguang Chen, Wenjun Jiang, Xueqi Li, Kenli Li, Albert Zomaya, and Guojun Wang
Besides, we introduce the other block to hold the information that may not be contained by the former two blocks, as well as random factors in real scenes. Moreover, in addition to reducing the overall reconstruction error, we utilize category information and user-item ratings as the supervision for the internal block and the external block, respectively. In this way, SDRL can not only capture complex interactive information but also express various features in different blocks. To sum up, the main contributions are as follows:
• We identify the problem of semi-disentangled representation learning in recommendation systems, to preserve complex information while achieving proper disentanglement. As far as we know, we are the first to study this problem.
• We propose a method, SDRL, to address the problem. It divides the representation into three blocks: the internal block, the external block, and the other block. The former two express individual- and interaction-based features; the other block contains the remaining information.
• The experimental results demonstrate that the proposed SDRL improves the accuracy of two downstream tasks, as well as enhances the explainability of the representation.
2 TASK FORMULATION
To improve both the disentanglement and the expressive ability of representations in recommendation systems, we separate embeddings into three blocks: the internal block, the external block, and the other block. We formally define these concepts and the problem to solve in this paper.
2.1 Key Concepts
Definition 1: internal block. It contains features about the individual itself, which are extracted from content information.
Definition 2: external block. It expresses features based on interactions, i.e., user-item ratings, implicit feedback, etc.
Definition 3: other block. The other block denotes characteristics excluding those contained by the former blocks.
In this work, we utilize category information to supervise the internal block. User-item ratings are employed as the supervision of the external block. The other block generalizes features that belong to individuals and interactions but are beyond the supervision (e.g., the color of a product or the director of a movie), as well as some random factors.
2.2 Problem Definition
We identify the problem of semi-disentangled representation in recommendation systems: to preserve complicated information while strengthening the explainability and generality of the representation with proper disentanglement. It requires embedding all features into three blocks using limited labels as supervision. It can be formally defined as follows.
Input: The normalized user-item ratings R and the item categories C^I are taken as the input, where R_{i,j} = r denotes that the i-th user rates the j-th item with r points, and C^I_{j,k} = 1 means that the j-th item belongs to the k-th category.
Output: Each user and item is represented as a d-dimensional vector consisting of three blocks: the internal block, the external block, and the other block.
Objective: The goal is to (1) make the representation Z preserve more of the original information of users and items and (2) encourage the internal block and the external block to express more features related to categories and interactions, respectively. The objective function is defined as follows:
maximize P(U, I, R | Z) + P(C | Z_int) + P(R | Z_ext),   (1)

where U and I represent the initial embeddings of users and items, Z denotes their representation in the semi-disentangled space, Z_int and Z_ext are the corresponding internal block and external block, and C = C^I ∪ C^U contains the category labels of items and users. P(A | B) denotes the probability of generating A given B.
[Figure 2: The Framework of SDRL. Encoders map user and item representations from the initial space into the semi-disentangled space (internal, external, and other blocks), supervised by category information and ratings; decoders reconstruct the initial space.]
3 SDRL: THE DETAILS
We propose a semi-disentangled representation learning method, SDRL, for recommendation systems. The framework is shown in Fig. 2. It consists of two major components. (1) Node representation (i.e., user representation and item representation) employs autoencoders to preserve the characteristics of users, items, and their interactions. (2) Supervised semi-disentanglement utilizes category information and user-item ratings to encourage the internal block and the external block to express more individual and interactive features, respectively. We also point out some possible directions to extend SDRL.
3.1 Node Representation
We exploit autoencoders to transform representations from the initial space into the semi-disentangled space.
The autoencoder is an unsupervised deep neural network [14, 35], which has been extensively applied to network representation learning [31, 38]. It has two modules, an encoder and a decoder, defined as follows:
f(x) = s(Wx + b),   (2)
g(y) = s(W'y + b').   (3)
Semi-Disentangled Representation Learning in Recommendation System Conferenceβ17, July 2017, Washington, DC, USA
The encoder is the mapping f that transforms the input vector x ∈ R^d into the latent representation y ∈ R^{d'}, where W ∈ R^{d'×d} is a weight matrix and b ∈ R^{d'} is an offset vector. The mapping g is called the decoder; it reconstructs y in the latent space into x' in the initial space, where W' ∈ R^{d×d'} and b' ∈ R^d denote the weight matrix and the offset vector, respectively. s(·) is an activation function.
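As a concrete illustration, Eqs. (2)-(3) can be sketched in a few lines of numpy. The dimensions (d = 8, d' = 3) and the sigmoid activation are illustrative choices made here, not taken from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d, d_latent = 8, 3  # illustrative sizes, not from the paper

# Encoder f(x) = s(Wx + b) and decoder g(y) = s(W'y + b')
W = rng.normal(size=(d_latent, d))
b = rng.normal(size=d_latent)
W_prime = rng.normal(size=(d, d_latent))
b_prime = rng.normal(size=d)

def encode(x):
    return sigmoid(W @ x + b)

def decode(y):
    return sigmoid(W_prime @ y + b_prime)

x = rng.random(d)
y = encode(x)       # latent representation, shape (3,)
x_rec = decode(y)   # reconstruction in the initial space, shape (8,)
```

Training would adjust W, b, W', b' to minimize the distance between x and x_rec, which is the reconstruction objective discussed next.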
The objective of an autoencoder is to minimize the reconstruction error. The stacked autoencoder [35] is a widely used variant that has been experimentally shown to improve representation quality. Therefore, we employ it to generate the representations of users and items.
Different from typical autoencoders, we use three encoders, Encoder_int, Encoder_ext, and Encoder_oth, to produce the corresponding internal block, external block, and other block, respectively. The generation process for users is as follows:
Z^U_int = Encoder_int(R),   (4)
Z^U_ext = Encoder_ext(R),   (5)
Z^U_oth = Encoder_oth(R),   (6)
Z^U = Concatenate(Z^U_int, Z^U_ext, Z^U_oth),   (7)
Û = Decoder(Z^U).   (8)
Z^U_int, Z^U_ext, and Z^U_oth represent the above three blocks, respectively; Z^U denotes the embedding of the users U in the semi-disentangled space, and Û is the generated representation of the users in the initial space. The representation of items follows a similar process. The goal of the process is to reconstruct the representations of users and items in the initial space, as well as the user-item ratings, with autoencoders. Considering that the number of unobserved interactions (i.e., those without a rating) far exceeds that of observed ones, we employ Binary Cross Entropy (BCE) as the basic metric. We define the reconstruction loss as follows:
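The generation process of Eqs. (4)-(8) can be sketched as follows. The toy sizes (6 items, 4-dimensional blocks) and the single-layer affine-plus-sigmoid encoders are simplifying assumptions made here, standing in for the stacked autoencoders actually used:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_items, d_block = 6, 4        # toy sizes, not from the paper
R_u = rng.random(n_items)      # one user's normalized rating vector

def make_layer(out_dim, in_dim):
    """One affine + sigmoid layer, standing in for a stacked encoder/decoder."""
    W = rng.normal(size=(out_dim, in_dim))
    b = rng.normal(size=out_dim)
    return lambda x: sigmoid(W @ x + b)

# Three separate encoders, one per block (Eqs. 4-6)
encoder_int = make_layer(d_block, n_items)
encoder_ext = make_layer(d_block, n_items)
encoder_oth = make_layer(d_block, n_items)

z_int, z_ext, z_oth = encoder_int(R_u), encoder_ext(R_u), encoder_oth(R_u)
z_u = np.concatenate([z_int, z_ext, z_oth])  # Eq. 7: semi-disentangled embedding

decoder = make_layer(n_items, 3 * d_block)   # Eq. 8: map back to the initial space
R_u_hat = decoder(z_u)                       # reconstructed ratings for this user
```

The key structural point is that each block gets its own encoder over the same input, so supervision applied to one block does not directly constrain the others.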
Loss_recon = BCE(U, Û) + BCE(I, Î) + BCE(R, R̂),   (9)
where Û and Î denote the reconstructed users and items, and R̂ represents the ratings predicted by matching Z^U and Z^I. BCE is defined as follows, where y denotes the labels, ŷ the predicted values, and n the number of entries in y:
BCE(y, ŷ) = -(1/n) Σ_{i=1}^{n} [ y_i log(ŷ_i) + (1 - y_i) log(1 - ŷ_i) ].   (10)
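A direct implementation of Eq. (10); the clipping constant eps is a numerical-stability detail added here, not part of the paper:

```python
import numpy as np

def bce(y, y_hat, eps=1e-9):
    """Binary cross entropy of Eq. (10), averaged over the n entries of y.

    eps clipping avoids log(0); it is a standard implementation detail,
    not something the paper specifies.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1.0 - eps)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
```

A near-perfect prediction yields a near-zero loss, while a hedging prediction of 0.5 everywhere yields log 2 ≈ 0.693, which is what makes BCE a usable reconstruction metric for the sparse 0/1 interaction targets.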
3.2 Supervised Semi-Disentanglement
Disentangled representation with weak supervision has demonstrated its effectiveness in computer vision [3, 21]. This successful application, together with the complexity of interactions, inspires us to improve representation disentanglement using limited labels in recommendation systems.
3.2.1 Internal Block Supervision. We employ the category information C as the supervision for the internal block. C^I represents the item-category correspondence based on side information. C^U denotes the user preference over categories, extracted from the ratings and C^I, and is calculated as follows:
C^U_i = normalize( Σ_{I_j ∈ I, R_{i,j} > 0} R_{i,j} C^I_j ).   (11)
C^U_i and C^I_j denote the category vectors of U_i and I_j, respectively. We sum the products of the rating R_{i,j} and the item category vector C^I_j over all rated items, and normalize the result as U_i's category preference C^U_i. The loss function is as follows:
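Eq. (11) can be sketched on toy data as follows; L1 normalization is assumed here, since the paper does not specify which normalization it uses:

```python
import numpy as np

# Toy data (hypothetical): 2 users x 3 items, 3 items x 2 categories
R = np.array([[5.0, 0.0, 3.0],
              [0.0, 4.0, 0.0]])
CI = np.array([[1, 0],   # item 0 -> category 0
               [0, 1],   # item 1 -> category 1
               [1, 1]])  # item 2 -> both categories

def user_category_preference(R, CI):
    """Sum R_{i,j} * CI_j over rated items (R_{i,j} > 0), then L1-normalize."""
    scores = (R * (R > 0)) @ CI
    totals = scores.sum(axis=1, keepdims=True)
    return scores / np.where(totals == 0, 1.0, totals)

CU = user_category_preference(R, CI)
# User 0: 5*[1,0] + 3*[1,1] = [8,3] -> [8/11, 3/11]
# User 1: 4*[0,1]           = [0,4] -> [0, 1]
```

These soft category vectors then serve as BCE targets for the internal block in Eq. (12).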
Loss_int = BCE(Ĉ_int, C),   (12)
where Ĉ_int denotes the category features of users and items predicted from the corresponding internal blocks.
3.2.2 External Block Supervision. We utilize the ratings R to supervise the external block so that it contains more interactive information. The loss function on the external block is as follows:
Loss_ext = BCE(R̂_ext, R),   (13)

where R̂_ext denotes the ratings predicted based on the external block.
3.2.3 Semi-Disentanglement Analysis. Most existing methods explicitly improve block independence with mutual information or distance correlation [5], which may not be well applicable in recommendation. The major reason is that we need to improve representation disentanglement while preserving interrelated characteristics. Hence, we propose the semi-disentangled method SDRL. It does not separate the whole embedding into explainable blocks as in disentangled representation learning, i.e., the other block preserves no explicit meanings or factors. Furthermore, it does not force independence among the different blocks. Instead, it only pushes the explainable blocks to express more of the corresponding characteristics, i.e., encouraging the internal block to express more category-based information and the external block to contain more interactive features.
3.3 Model Optimization and Extension
We aim to improve the expressive ability and proper disentanglement of the representation at the same time, so we combine the loss functions of node representation and semi-disentanglement as follows:
Loss = Loss_recon + Loss_int + Loss_ext.   (14)
3.3.1 Setting-oriented Extension. We separate the explainable part of the representation into two blocks using autoencoders. As the data characteristics and label information change, it is flexible to adjust the number and type of blocks, as well as to switch the basic representation method. There are some possible extensions.
When more fine-grained labels are available, even if not for every sample, it is reasonable and convenient to add more blocks and optimize them with a small number of labels as in [21]. In another setting where the extracted features are independent of each other, variational autoencoders (VAE) [15] may be a good choice to replace autoencoders.
3.3.2 Task-oriented Extension. The objective of this work is to produce general representations, so only a simple match of embeddings is employed for the various downstream tasks. In fact, different tasks may emphasize different factors. For a specific task, some follow-up modules may be required, and it is easy to extend SDRL with them.
For instance, for tasks with supervision (e.g., rating prediction, node classification), the attention mechanism [34] is an intuitively excellent option as a follow-up module to allocate different weights to different factors. For tasks without labels (e.g., serendipity recommendation [17]), pre-assigned weights could probably improve performance. In brief, the proposed semi-disentangled representation provides the opportunity to adaptively or manually pay different amounts of attention to known factors across various tasks.
4 EXPERIMENTS
To validate the effectiveness and explainability of SDRL, as well as the role of the various semi-disentangled blocks, we conduct intensive experiments on three real-world datasets. We briefly introduce the experimental settings in Section 4.1. Section 4.2 presents the comparison between our method and the baselines on Top-K recommendation and item classification. In Sections 4.3 and 4.4, we perform ablation experiments and hyper-parameter analysis to study the impacts of different blocks and parameters in SDRL. Finally, we demonstrate the semi-disentanglement and explainability of the representation with visualization and a case study in Section 4.5.
4.1 Experimental Settings
We perform experiments on three real-world datasets: MovieLens-latest-small (ml-ls), MovieLens-10m (ml-10m) [10] and Amazon-Book [11, 24], whose statistics are shown in Table 1. We filter out users and items with fewer than 20 ratings and employ 80% of the ratings as training data and the rest as test data. We compare our method with four baselines, NetMF [27, 29], ProNE [39], VAECF [18] and MacridVAE [22], based on four metrics: recall, precision, F1 and ndcg [30].
Table 1: Statistics of Datasets.

dataset       #Users   #Items   #Ratings    #Categories   Density
ml-ls         610      9742     100836      18            1.7%
ml-10m        71567    10681    10000054    18            1.3%
Amazon-Book   52643    91599    2984108     22            0.062%
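The preprocessing described above (minimum-interaction filtering and the 80/20 split) might be sketched as follows. The rating triples are synthetic stand-ins, and applying the count filter in a single pass (rather than iteratively) is an assumption the paper does not settle:

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical rating triples: (user_id, item_id, rating)
ratings = list(zip(rng.integers(0, 30, 500),
                   rng.integers(0, 40, 500),
                   rng.integers(1, 6, 500)))

def filter_min_interactions(triples, min_count=20):
    """Drop users/items with fewer than min_count ratings (single pass)."""
    user_counts, item_counts = {}, {}
    for u, i, _ in triples:
        user_counts[u] = user_counts.get(u, 0) + 1
        item_counts[i] = item_counts.get(i, 0) + 1
    return [t for t in triples
            if user_counts[t[0]] >= min_count and item_counts[t[1]] >= min_count]

def split_80_20(triples, rng):
    """Random 80/20 train/test split over the surviving ratings."""
    idx = rng.permutation(len(triples))
    cut = int(0.8 * len(triples))
    return [triples[j] for j in idx[:cut]], [triples[j] for j in idx[cut:]]

kept = filter_min_interactions(ratings, min_count=5)  # small threshold for toy data
train, test = split_80_20(kept, rng)
```

The paper's actual threshold is 20 ratings; the toy threshold here is lowered only so the synthetic data survives the filter.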
4.1.1 Baselines. We briefly introduce the four baselines and the four variants of our proposed method SDRL.
NetMF: NetMF [27] casts network embedding as matrix factorization, unifying four network embedding methods: DeepWalk [26], LINE [33], PTE [32], and node2vec [9].
ProNE: ProNE [39] is a fast and scalable network representationapproach consisting of two modules, sparse matrix factorizationand propagation in the spectral space.
VAECF: Liang et al. develop a variant of Variational AutoEn-coders for Collaborative Filtering [18] on implicit feedback data,which is a non-linear probabilistic model.
MacridVAE: MacridVAE [22] is one of the state-of-the-art methods for learning disentangled representations for recommendation, achieving macro and micro disentanglement.
To study the impacts of the three blocks, the internal block (int), the external block (ext), and the other block (oth), we develop several variants through different combinations. For variants with two blocks, we set their proportion to 1:1.
SDRL-(int+ext) generates node embeddings consisting of two blocks, the internal block and the external block.
SDRL-(int+oth) keeps the internal block and the other block.
SDRL-(ext+oth) consists of the external block and the other block.
SDRL-(whole) represents a node as a single whole embedding, similar to common representation methods, but its optimization is based on autoencoders as in SDRL.
4.2 Performance Comparison
We verify the effectiveness of SDRL on two common downstream tasks in recommendation systems: Top-K recommendation and item classification. We run the experiments 20 times and report the average values and standard deviations. We highlight the best baseline values in bold and calculate the corresponding improvements.
[Figure 3: Statistics of Categories Grouped by Users. Panels (a) ml-ls, (b) ml-10m, and (c) Amazon-Book plot the fraction of users (% User) against the number of related categories (# Category).]
Top-K Recommendation. We generate the Top-5, 10 and 15 items according to the predicted ratings as recommendations. The comparison results on the three datasets are shown in Tables 2, 3 and 4, respectively. Based on these results, we find that our method SDRL achieves steady improvements over the baselines. It improves F1 (ndcg) by at least 20.41% on MovieLens-latest-small, 35.24% on MovieLens-10m and 0.29% on Amazon-Book. This demonstrates that the proposed representation method SDRL significantly improves Top-K recommendation performance, especially on the MovieLens datasets ml-ls and ml-10m. A possible reason is that denser interactions (as shown in Table 1) reflect richer features, which makes the (semi-)disentangled representation more accurate overall.
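For reference, the evaluation protocol can be sketched as below. The binary-relevance form of ndcg@K and ranking items purely by predicted score are assumptions about details the paper leaves implicit:

```python
import numpy as np

def f1_at_k(recommended, relevant, k):
    """F1@K from the hits among the top-k recommended items."""
    hits = len(set(recommended[:k]) & set(relevant))
    if hits == 0:
        return 0.0
    precision, recall = hits / k, hits / len(relevant)
    return 2 * precision * recall / (precision + recall)

def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG@K (each relevant item has gain 1)."""
    dcg = sum(1.0 / np.log2(rank + 2)
              for rank, item in enumerate(recommended[:k]) if item in relevant)
    idcg = sum(1.0 / np.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / idcg

scores = np.array([0.9, 0.1, 0.8, 0.4, 0.7])  # one user's predicted ratings
top5 = list(np.argsort(-scores))              # items ranked by predicted score
```

Averaging these per-user values over all test users yields the table entries reported below.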
Item Classification. Item classification also plays an important role in recommendation systems, e.g., for the cold-start task [41]. Since VAECF and MacridVAE do not generate item representations, we only use NetMF and ProNE as baselines, with item categories as labels, for item classification. We employ the item embeddings as the input
Table 2: Comparison on Top-K recommendation (%). (ml-ls)

method       F1@5           F1@10          F1@15           ndcg@5          ndcg@10         ndcg@15
NetMF        2.9573(±0.0)   4.7776(±0.0)   5.1616(±0.0)    3.793(±0.0)     5.2728(±0.0)    6.9132(±0.0)
ProNE        3.0491(±0.1)   4.5284(±0.14)  5.3116(±0.17)   3.8125(±0.18)   5.243(±0.13)    7.0057(±0.11)
VAECF        5.5595(±0.29)  7.7409(±0.33)  8.9691(±0.43)   8.7243(±0.55)   9.8197(±0.36)   11.3705(±0.35)
MacridVAE    5.1578(±0.15)  7.3867(±0.26)  8.5167(±0.32)   8.0452(±0.33)   9.0503(±0.32)   10.6263(±0.23)
improvement  30.0171        27.9192        25.4228         20.4062         25.8338         28.8774
SDRL         7.2283(±0.27)  9.9021(±0.27)  11.2493(±0.26)  10.5046(±0.57)  12.3565(±0.3)   14.654(±0.26)
Table 3: Comparison on Top-K recommendation (%). (ml-10m)

method       F1@5           F1@10          F1@15           ndcg@5          ndcg@10         ndcg@15
NetMF        3.9352(±0.0)   6.2(±0.0)      7.4752(±0.0)    4.831(±0.0)     6.8198(±0.0)    9.2123(±0.0)
ProNE        2.1065(±0.04)  3.1941(±0.06)  3.8371(±0.05)   2.3357(±0.05)   3.5148(±0.04)   4.9629(±0.03)
VAECF        5.1655(±0.07)  7.5696(±0.1)   8.881(±0.1)     8.033(±0.18)    9.4863(±0.13)   11.4674(±0.11)
MacridVAE    5.1832(±0.07)  7.5236(±0.11)  8.8513(±0.14)   7.0934(±0.18)   9.0107(±0.16)   11.4481(±0.15)
improvement  44.6963        38.8184        35.2393         42.2793         38.2288         37.0189
SDRL         7.4999(±0.16)  10.508(±0.23)  12.0106(±0.25)  11.4293(±0.46)  13.1128(±0.35)  15.7125(±0.27)
Table 4: Comparison on Top-K recommendation (%). (Amazon-Book)

method       F1@5           F1@10           F1@15           ndcg@5          ndcg@10         ndcg@15
NetMF        9.0322(±0.0)   10.9837(±0.0)   11.5754(±0.0)   13.4461(±0.0)   12.623(±0.0)    13.8484(±0.0)
ProNE        6.818(±0.07)   8.8102(±0.05)   9.5218(±0.07)   9.9492(±0.14)   9.9032(±0.06)   11.1619(±0.04)
VAECF        6.1366(±0.09)  7.9531(±0.14)   8.6195(±0.14)   9.6312(±0.19)   9.3344(±0.14)   10.3365(±0.1)
MacridVAE    8.0043(±0.12)  9.9379(±0.14)   10.5185(±0.17)  12.1028(±0.22)  11.5303(±0.13)  12.6398(±0.11)
improvement  0.2879         4.3592          5.9549          2.0549          5.4543          7.1127
SDRL         9.0582(±0.07)  11.4625(±0.12)  12.2647(±0.14)  13.7224(±0.18)  13.3115(±0.14)  14.8334(±0.1)
Table 5: Comparison on item classification (%). (ml-ls)

method       recall    precision  micro_F1
NetMF        50.3539   59.8585    54.6957
ProNE        53.1639   62.73      57.5517
improvement  12.4686   13.6479    13.0066
SDRL         59.7927   71.2913    65.0372
Table 6: Comparison on item classification (%). (ml-10m)

method       recall    precision  micro_F1
NetMF        66.4181   65.6991    66.0564
ProNE        67.7764   66.7775    67.2731
improvement  18.0496   19.9254    18.9872
SDRL         80.0098   80.0832    80.0464
Table 7: Comparison on item classification (%). (Amazon-Book)

method       recall    precision  micro_F1
NetMF        51.4335   47.3786    49.3226
ProNE        51.2197   47.3555    49.2115
improvement  83.7268   76.2521    79.759
SDRL         94.4971   83.5058    88.6618
and an MLPClassifier 2 as the classification algorithm for all methods. The comparison results are shown in Tables 5, 6 and 7, respectively.
2https://scikit-learn.org/
We observe that our method SDRL outperforms the baselines on all three datasets with the help of category supervision.
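The classification protocol can be sketched with scikit-learn's MLPClassifier. The embeddings and labels below are synthetic stand-ins for the item embeddings and category labels, and the hidden-layer size is an arbitrary choice:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
# Hypothetical stand-ins for item embeddings and their category labels
X = rng.normal(size=(120, 16))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy binary labels

# Train on the first 100 items, predict on the held-out 20
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X[:100], y[:100])
pred = clf.predict(X[100:])
```

In the paper's setting, X would be the learned item embeddings (per method) and y the item categories, with recall, precision, and micro-F1 computed over the predictions.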
4.3 Ablation Analysis
To study the impacts of the various blocks in SDRL, we conduct an ablation study under the same settings as the comparison experiments. In addition to highlighting the best values in bold, we also underline the second- and third-best values in the results.
Top-K Recommendation. Based on the comparison results in Tables 8, 9 and 10, we have three major findings. (1) Overall, SDRL with all three blocks and SDRL-(int+ext) achieve relatively better performance. We infer that both category- and rating-based features play an important role in Top-K recommendation. (2) Another finding is that variants with separated blocks usually outperform the variant with only one block, i.e., SDRL-(whole). This demonstrates that semi-disentanglement significantly improves Top-K recommendation performance.
(3) We also find a difference between the results on the MovieLens datasets (i.e., ml-ls and ml-10m) and Amazon-Book. On the MovieLens datasets, SDRL-(int+oth) outperforms SDRL-(ext+oth), while on Amazon-Book SDRL-(ext+oth) performs better. That is to say, the internal block has a bigger impact on the MovieLens datasets, and the external block has more effect on Amazon-Book. Further analysis shows that this difference may be closely related to the statistical characteristics of the datasets. As shown in Fig. 3, on the MovieLens datasets, over 80% of users relate to at least 14 categories (77.78% of all categories); on Amazon-Book, over 80% of
Table 8: Comparison on Top-K recommendation with different combinations of blocks (%). (ml-ls)

method          F1@5           F1@10          F1@15           ndcg@5          ndcg@10         ndcg@15
SDRL            7.2283(±0.27)  9.9021(±0.27)  11.2493(±0.26)  10.5046(±0.57)  12.3565(±0.3)   14.654(±0.26)
SDRL-(int+ext)  6.9607(±0.22)  9.7205(±0.24)  11.1285(±0.27)  9.8062(±0.48)   12.0343(±0.31)  14.5021(±0.25)
SDRL-(int+oth)  7.1316(±0.21)  9.9035(±0.26)  11.3039(±0.24)  10.247(±0.51)   12.2864(±0.29)  14.8077(±0.19)
SDRL-(ext+oth)  6.7585(±0.28)  9.2792(±0.19)  10.5929(±0.2)   9.5182(±0.55)   11.5124(±0.26)  13.985(±0.18)
SDRL-(whole)    6.2179(±0.75)  9.146(±0.59)   10.4072(±0.51)  12.796(±1.96)   13.1785(±1.05)  13.864(±0.6)
Table 9: Comparison on Top-K recommendation with different combinations of blocks (%). (ml-10m)

method          F1@5           F1@10           F1@15           ndcg@5          ndcg@10         ndcg@15
SDRL            7.4999(±0.16)  10.508(±0.23)   12.0106(±0.25)  11.4293(±0.46)  13.1128(±0.35)  15.7125(±0.27)
SDRL-(int+ext)  6.8778(±0.23)  9.5076(±0.34)   10.8903(±0.44)  10.5319(±1.02)  11.8883(±0.81)  14.2037(±0.75)
SDRL-(int+oth)  7.432(±0.29)   10.4417(±0.38)  11.9863(±0.44)  11.936(±1.22)   13.2881(±0.88)  15.5704(±0.74)
SDRL-(ext+oth)  5.7576(±0.22)  8.4256(±0.28)   9.886(±0.46)    8.5747(±0.77)   10.3502(±0.62)  12.8529(±0.49)
SDRL-(whole)    5.1166(±0.39)  7.5039(±0.37)   8.7737(±0.53)   10.1397(±2.41)  10.757(±2.44)   11.8431(±2.08)
Table 10: Comparison on Top-K recommendation with different combinations of blocks (%). (Amazon-Book)

method          F1@5           F1@10           F1@15           ndcg@5          ndcg@10         ndcg@15
SDRL            9.0582(±0.07)  11.4625(±0.12)  12.2647(±0.14)  13.7224(±0.18)  13.3115(±0.14)  14.8334(±0.1)
SDRL-(int+ext)  8.8361(±0.07)  11.1924(±0.1)   12.0149(±0.11)  13.3619(±0.17)  12.9737(±0.12)  14.5178(±0.08)
SDRL-(int+oth)  8.1967(±0.1)   10.4707(±0.12)  11.3059(±0.14)  12.3478(±0.18)  12.1008(±0.1)   13.5747(±0.08)
SDRL-(ext+oth)  9.0518(±0.08)  11.4633(±0.09)  12.3021(±0.1)   13.7024(±0.12)  13.3441(±0.09)  14.9128(±0.06)
SDRL-(whole)    1.908(±0.05)   2.4953(±0.05)   2.8446(±0.03)   3.0402(±0.11)   2.9335(±0.05)   3.3497(±0.02)
Table 11: Comparison on item classification with different combinations of blocks (%). (ml-ls)

method          recall    precision  micro_F1
SDRL            59.7927   71.2913    65.0372
SDRL-(int+ext)  62.1688   74.263     67.6797
SDRL-(int+oth)  61.9033   73.8443    67.3484
SDRL-(ext+oth)  52.5572   62.046     56.908
SDRL-(whole)    39.9923   42.0991    41.0177
Table 12: Comparison on item classification with different combinations of blocks (%). (ml-10m)

method          recall    precision  micro_F1
SDRL            80.0098   80.0832    80.0464
SDRL-(int+ext)  77.1831   77.2037    77.1933
SDRL-(int+oth)  76.558    76.4345    76.4961
SDRL-(ext+oth)  51.3015   49.2074    50.2283
SDRL-(whole)    47.1581   43.9449    45.492
Table 13: Comparison on item classification with different combinations of blocks (%). (Amazon-Book)

method          recall    precision  micro_F1
SDRL            94.4971   83.5058    88.6618
SDRL-(int+ext)  98.2341   86.0925    91.7634
SDRL-(int+oth)  97.9884   85.9162    91.556
SDRL-(ext+oth)  53.6879   49.6705    51.6008
SDRL-(whole)    48.922    45.5       47.1487
users relate to at least 12 categories (only 54.55% of all categories). The statistical characteristics of the datasets can affect the role of different features in downstream tasks, and (semi-)disentanglement increases the flexibility to emphasize certain features.
We also conduct a deeper analysis in terms of model optimization to explore what causes this difference. As introduced in Section 3, we employ Loss_int (Equation 12) to encourage the internal block to express relatively more category-based features. Meanwhile, the external block is expected to contain more interactive features through the optimization of Loss_ext (Equation 13). In addition, Loss_recon (Equation 9) optimizes the whole reconstruction of users, items and ratings, which makes the internal block also contain some interactive features.
We plot the trends of each part of the loss function for the three methods (i.e., SDRL, SDRL-(int+oth) and SDRL-(ext+oth)) in Fig. 4. We observe similar trends among Loss_recon, Loss_int and Loss_ext on the MovieLens datasets in Figs. 4(a), 4(d) and 4(g) (and in Figs. 4(b), 4(e) and 4(h)). That is, the training process fairly optimizes the internal block, the external block and the whole reconstruction. However, on Amazon-Book, in Figs. 4(c), 4(f) and 4(i), Loss_int decreases noticeably faster than the other two losses.
This demonstrates that during model training on Amazon-Book, the optimization of the internal block plays a more important role. This bias pushes the embeddings to contain more category-based features and relatively fewer rating-based features. The effect appears especially significant in SDRL-(int+oth), where Loss_ext is removed from the loss function. However, in Top-K recommendation, interactive features based on ratings may play the major part. This probably explains why SDRL-(int+oth) performs worse than SDRL-(ext+oth) on Amazon-Book. Similarly, on the MovieLens datasets, SDRL-(ext+oth) generates representations without the supervision of category-based information. Meanwhile, for SDRL-(int+oth), the
[Figure 4: Illustration of Losses in the model training. Nine panels plot loss_int, loss_ext and loss_recon against the training epoch (0-300): (a)-(c) SDRL, (d)-(f) SDRL-(int+oth), (g)-(i) SDRL-(ext+oth), on ml-ls, ml-10m and Amazon-Book, respectively.]
fair optimization encourages the embeddings to contain both category information and interactive features. Therefore, SDRL-(int+oth) outperforms SDRL-(ext+oth) on the MovieLens datasets.
Last but not least, this difference also validates the importance of employing multiple features as supervision for different blocks in representation learning, i.e., supervised (semi-)disentangled representation.
4.4 Hyper-parameter Analysis
It is flexible to adjust the proportions of the internal block, the external block and the other block in the embeddings. We set the proportion to 2:1:1, 1:2:1 and 1:1:2 to test performance on Top-15 recommendation. The results in Fig. 5 show that the variant with proportion 2:1:1 (i.e., a bigger proportion for the internal block) outperforms the others on the MovieLens datasets. On Amazon-Book, the variant with a bigger proportion for the external block performs better. Therefore, we set the proportion in SDRL to 2:1:1 on the MovieLens datasets and 1:2:1 on Amazon-Book.
[Figure 5 omitted: F1@15 (%) on ml-ls, ml-10m and Amazon-Book under block proportions 2:1:1, 1:2:1 and 1:1:2.]
Figure 5: Impact of Block Proportion.
4.5 Representation Visualization
In SDRL, the internal block and external block are expected to respectively express features from the individual itself and from user-item interactions. We visualize node representations to qualitatively analyze the semi-disentanglement of the two parts.
Conferenceβ17, July 2017, Washington, DC, USA Weiguang Chen, Wenjun Jiang, Xueqi Li, Kenli Li, Albert Zomaya, and Guojun Wang
4.5.1 Visualization based on Item Representation. For clarity, we choose the 7 most common categories as labels and visualize the internal block and external block of items using t-SNE [28], respectively. The results are shown in Fig. 6. We find that in the left figures, nodes of the same color (i.e., items belonging to the same category) are more clustered. This indicates that the internal block contains more category-related information than the external block, which is consistent with our expectation.
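The projection step can be sketched as follows. To keep the example dependency-free, a plain PCA projection stands in for t-SNE, and the embeddings and category labels are random placeholders rather than the paper's data.

```python
# Dependency-free sketch of the visualization idea: project one block of the
# item embeddings to 2-D and inspect whether same-category items cluster.
# PCA stands in for t-SNE here; data are random placeholders.
import numpy as np

def pca_2d(X):
    """Project rows of X onto their top-2 principal components."""
    Xc = X - X.mean(axis=0)
    # right singular vectors of the centered data are the principal axes
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

rng = np.random.default_rng(0)
internal_block = rng.normal(size=(70, 64))   # internal block of 70 items
categories = rng.integers(0, 7, size=70)     # 7 most common categories as labels
coords = pca_2d(internal_block)
print(coords.shape)  # (70, 2)
# a scatter plot of coords colored by `categories` would give one panel of Fig. 6
```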
[Figure 6 omitted: t-SNE projections of item representations in six panels: (a) ml-ls (int), (b) ml-ls (ext), (c) ml-10m (int), (d) ml-10m (ext), (e) Amazon-Book (int), (f) Amazon-Book (ext).]
Figure 6: Visualization of Item Representation.
4.5.2 Visualization based on User Representation. We also select four users (IDs 551, 267, 313 and 216) on MovieLens-latest-small and visualize their representations in Fig. 7. For the 128-dimension embeddings on MovieLens-latest-small, the first 64 bits represent the internal block, the next 32 bits the external block, and the last 32 bits the other block. Among them, users No. 551, No. 267 and No. 313 share a similar internal block while the embedding of user No. 216 shows a different one. Based on the source data displayed in Fig. 7(b), we find that the former three users take interest in action, adventure and thriller movies while the latter likes comedy movies the most. Meanwhile, there is similar information in the external block among users No. 216, No. 267 and No. 313 (especially users No. 216 and No. 313), while user No. 551 is slightly different. Based on the statistics of common items in Fig. 7(c), we find relatively more common items among users No. 216, No. 313 and No. 267 than between user No. 551 and the others.
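The block-wise comparison behind this analysis can be sketched as below. The slicing follows the layout stated above (first 64 bits internal, next 32 external, last 32 other); the user embeddings themselves are random placeholders, and the cosine-similarity comparison is our illustrative choice, not necessarily the paper's metric.

```python
# Hypothetical sketch: slice 128-dimension user embeddings into their blocks
# and compare two users block by block with cosine similarity.
import numpy as np

def blocks(emb):
    """Return (internal, external, other) slices of a 128-dim embedding."""
    return emb[:64], emb[64:96], emb[96:]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
users = {uid: rng.random(128) for uid in (216, 267, 313, 551)}  # placeholders

int_216, ext_216, _ = blocks(users[216])
int_313, ext_313, _ = blocks(users[313])
# similarity on the internal block vs. the external block
print(round(cosine(int_216, int_313), 3), round(cosine(ext_216, ext_313), 3))
```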
[Figure 7 omitted: (a) heatmaps of the 128-dimension embeddings of users 216, 267, 313 and 551; (b) user preference over the movie categories (Action, Adventure, Animation, Children, Comedy, Crime, Documentary, Drama, Fantasy, Film-Noir, Horror, Musical, Mystery, Romance, Sci-Fi, Thriller, War, Western); (c) number of common items per user pair.]
Figure 7: Relation among User Representation, Preference on Category and Common Items.
In short, in line with expectation, the internal block and external block express more features of category and user-item interactions, respectively. This also improves the interpretability of node representations and offers clues for generating explanations. For instance, when the recommended item better matches the target user on the internal block, a category-based explanation may be more reasonable, while a better match on the external block indicates that an interaction-based explanation is more suitable.
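That heuristic can be sketched directly. Everything here is an assumption for illustration: the function name, the block layout (64/32 bits), and the use of cosine similarity as the matching score.

```python
# Illustrative sketch: choose an explanation style by comparing how well the
# user and item match on the internal block versus the external block.
import numpy as np

def pick_explanation(user_emb, item_emb):
    """Return "category" or "interaction" depending on the better-matching block."""
    u_int, u_ext = user_emb[:64], user_emb[64:96]
    i_int, i_ext = item_emb[:64], item_emb[64:96]
    sim = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    # category-based explanation when the internal-block match dominates
    return "category" if sim(u_int, i_int) >= sim(u_ext, i_ext) else "interaction"

rng = np.random.default_rng(1)
print(pick_explanation(rng.random(128), rng.random(128)))
```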
4.6 Summary of Experiments
In summary, we have the following findings from the experiments. (1) Overall, SDRL gains stable and general improvements, demonstrating its effectiveness in representation learning. (2) Consistent with expectation, the internal block and external block respectively express more category-based and interaction-based features. (3) Semi-disentanglement enhances the explainability and generality of representations in recommendation systems.
5 RELATED WORK
We briefly review related work on representation in recommendation systems and on disentangled representation.
5.1 Representation in Recommendation System
Representation learning transforms the sparse data in recommendation systems into structured embeddings, providing great convenience for complex network processing [6]. Based on the type of source data, existing representation methods can be divided into three categories: structure-based [9, 26], content-based [39, 40] and both-based [8, 12, 37]. Among them, both-based approaches have received much attention in recent years, e.g., graph neural networks (GNNs), which initialize representations with content features and then update them with structure information.
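The "initialize with content, update with structure" pattern can be sketched with one untrained propagation step. This is a generic, minimal GNN-style layer for illustration only (mean aggregation with self-loops, no learned weights), not any specific model from the cited works.

```python
# Minimal sketch of the GNN pattern: node states start from content features
# and one propagation step folds in structure via neighbor averaging.
import numpy as np

def gnn_layer(H, adj):
    """One propagation step: average each node's neighbors (plus itself)."""
    A = adj + np.eye(adj.shape[0])   # add self-loops so a node keeps its own state
    deg = A.sum(axis=1, keepdims=True)
    return (A @ H) / deg             # mean aggregation, no learned weights

content = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # content-feature init
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)                  # interaction graph
H = gnn_layer(content, adj)          # structure information folded in
print(H.round(2))
```

Stacking several such layers (with learned weight matrices and nonlinearities between them) recovers the usual GNN recipe.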
Our method SDRL also employs content and structure features as the input. The major difference between SDRL and GNN-based approaches is that SDRL embeds content- and structure-based information into different blocks (i.e., the internal block and external block, respectively), while they represent it as a whole.
5.2 Disentangled Representation
Disentangled representation learning aims to separate embeddings into distinct and explainable factors in an unsupervised way [2, 7]. It has been successfully applied in computer vision [25]. Recently, Locatello et al. [19, 20] demonstrated that unsupervised disentangled representation learning without inductive biases is theoretically impossible. To deal with this problem, some works with (semi-)supervision have been proposed [3, 21]. However, the assumption of factor independence makes typical disentangled representation inapplicable to recommendation.
Our method differs from existing disentangled representation works in that we do not separate the whole embedding (i.e., we preserve the other block, which carries no specific meaning) and we do not explicitly force the independence of different factors. In this way, we preserve the more complicated relations in recommendation systems while achieving proper disentanglement.
6 CONCLUSIONS
To improve both the disentanglement and accuracy of representation, we propose a semi-disentangled representation method for recommendation, SDRL. We take advantage of category features and user-item ratings as supervision for the proposed internal block and external block. To the best of our knowledge, this is the first attempt to develop semi-disentangled representation and to improve disentanglement with supervision in recommendation systems. We conduct intensive experiments on three real-world datasets, and the results validate the effectiveness and explainability of SDRL. In future work, we will try to extract more labels and use them to separate the explainable part into fine-grained features for broader applications. We are also interested in studying the unexplainable part more deeply to manage its uncertainty.
REFERENCES
[1] Yazeed Alharbi and Peter Wonka. Disentangled Image Generation Through Structured Noise Injection. In CVPR 2020, pages 5133–5141, 2020.
[2] Y. Bengio, A. Courville, and P. Vincent. Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Mach. Intell., 35(8):1798–1828, August 2013.
[3] Junxiang Chen and Kayhan Batmanghelich. Weakly Supervised Disentanglement by Pairwise Similarities. In AAAI 2020, pages 3495–3502, 2020.
[4] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. In NIPS 2016, pages 2172–2180, 2016.
[5] Pengyu Cheng, Martin Renqiang Min, Dinghan Shen, Christopher Malon, Yizhe Zhang, Yitong Li, and Lawrence Carin. Improving Disentangled Text Representation Learning with Information-Theoretic Guidance. In ACL 2020, pages 7530–7541, 2020.
[6] Peng Cui, Xiao Wang, Jian Pei, and Wenwu Zhu. A Survey on Network Embedding. IEEE Trans. Knowl. Data Eng., 31(5):833–852, 2019.
[7] Kien Do and Truyen Tran. Theory and Evaluation Metrics for Learning Disentangled Representations. In ICLR 2020, 2020.
[8] Xinyu Fu, Jiani Zhang, Ziqiao Meng, and Irwin King. MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding. In WWW 2020, pages 2331–2341, 2020.
[9] Aditya Grover and Jure Leskovec. node2vec: Scalable Feature Learning for Networks. In KDD 2016, pages 855–864, 2016.
[10] F. Maxwell Harper and Joseph A. Konstan. The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst., 5(4):19:1–19:19, 2016.
[11] Ruining He and Julian J. McAuley. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In WWW 2016, pages 507–517, 2016.
[12] Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yong-Dong Zhang, and Meng Wang. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In SIGIR 2020, pages 639–648, 2020.
[13] Irina Higgins, Loïc Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. In ICLR 2017, 2017.
[14] Geoffrey E. Hinton and Richard S. Zemel. Autoencoders, Minimum Description Length and Helmholtz Free Energy. In NIPS 1993, pages 3–10, 1993.
[15] Diederik P. Kingma and Max Welling. An Introduction to Variational Autoencoders. Found. Trends Mach. Learn., 12(4):307–392, 2019.
[16] Dmytro Kotovenko, Artsiom Sanakoyeu, Sabine Lang, and Bjorn Ommer. Content and Style Disentanglement for Artistic Style Transfer. In ICCV 2019, pages 4421–4430, 2019.
[17] Xueqi Li, Wenjun Jiang, Weiguang Chen, Jie Wu, Guojun Wang, and Kenli Li. Directional and Explainable Serendipity Recommendation. In WWW 2020, pages 122–132, 2020.
[18] Dawen Liang, Rahul G. Krishnan, Matthew D. Hoffman, and Tony Jebara. Variational Autoencoders for Collaborative Filtering. In WWW 2018, pages 689–698, 2018.
[19] Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations. In ICML 2019, volume 97, pages 4114–4124, 2019.
[20] Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. A Commentary on the Unsupervised Learning of Disentangled Representations. In AAAI 2020, pages 13681–13684, 2020.
[21] Francesco Locatello, Michael Tschannen, Stefan Bauer, Gunnar Rätsch, Bernhard Schölkopf, and Olivier Bachem. Disentangling Factors of Variations Using Few Labels. In ICLR 2020, 2020.
[22] Jianxin Ma, Chang Zhou, Peng Cui, Hongxia Yang, and Wenwu Zhu. Learning Disentangled Representations for Recommendation. In NeurIPS 2019, pages 5712–5723, 2019.
[23] Jianxin Ma, Chang Zhou, Hongxia Yang, Peng Cui, Xin Wang, and Wenwu Zhu. Disentangled Self-Supervision in Sequential Recommenders. In KDD 2020, pages 483–491, 2020.
[24] Julian J. McAuley, Christopher Targett, Qinfeng Shi, and Anton van den Hengel. Image-Based Recommendations on Styles and Substitutes. In SIGIR 2015, pages 43–52, 2015.
[25] József Németh. Adversarial Disentanglement with Grouped Observations. In AAAI 2020, pages 10243–10250, 2020.
[26] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online Learning of Social Representations. In KDD 2014, pages 701–710, 2014.
[27] Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In WSDM 2018, pages 459–467, 2018.
[28] Paulo E. Rauber, Alexandre X. Falcão, and Alexandru C. Telea. Visualizing Time-Dependent Data Using Dynamic t-SNE. In EuroVis 2016, pages 73–77, 2016.
[29] Benedek Rozemberczki, Oliver Kiss, and Rik Sarkar. Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs. In CIKM 2020, 2020.
[30] Guy Shani and Asela Gunawardana. Evaluating Recommendation Systems. In Recommender Systems Handbook, pages 257–297, 2011.
[31] Dharahas Tallapally, Rama Syamala Sreepada, Bidyut Kr. Patra, and Korra Sathya Babu. User Preference Learning in Multi-Criteria Recommendations Using Stacked Auto Encoders. In RecSys 2018, pages 475–479, 2018.
[32] Jian Tang, Meng Qu, and Qiaozhu Mei. PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks. In KDD 2015, pages 1165–1174, 2015.
[33] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale Information Network Embedding. In WWW 2015, pages 1067–1077, 2015.
[34] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention Is All You Need. In NIPS 2017, pages 5998–6008, 2017.
[35] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre-Antoine Manzagol. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. J. Mach. Learn. Res., 11:3371–3408, 2010.
[36] Xiang Wang, Hongye Jin, An Zhang, Xiangnan He, Tong Xu, and Tat-Seng Chua. Disentangled Graph Collaborative Filtering. In SIGIR 2020, pages 1001–1010, 2020.
[37] Xiang Wang, Yaokun Xu, Xiangnan He, Yixin Cao, Meng Wang, and Tat-Seng Chua. Reinforced Negative Sampling over Knowledge Graph for Recommendation. In WWW 2020, pages 99–109, 2020.
[38] Yao Wu, Christopher DuBois, Alice X. Zheng, and Martin Ester. Collaborative Denoising Auto-Encoders for Top-N Recommender Systems. In WSDM 2016, pages 153–162, 2016.
[39] Jie Zhang, Yuxiao Dong, Yan Wang, Jie Tang, and Ming Ding. ProNE: Fast and Scalable Network Representation Learning. In IJCAI 2019, pages 4278–4284, 2019.
[40] Kai Zhao, Ting Bai, Bin Wu, Bai Wang, Youjie Zhang, Yuanyu Yang, and Jian-Yun Nie. Deep Adversarial Completion for Sparse Heterogeneous Information Network Embedding. In WWW 2020, pages 508–518, 2020.
[41] Qian Zhao, Jilin Chen, Minmin Chen, Sagar Jain, Alex Beutel, Francois Belletti, and Ed H. Chi. Categorical-Attributes-Based Item Classification for Recommender Systems. In RecSys 2018, pages 320–328, 2018.