Recommender Systems and
Linked Open Data
Tommaso Di Noia
Polytechnic University of Bari, Italy
11th Reasoning Web Summer School, Berlin, August 1, 2015
[email protected] @TommasoDiNoia
Agenda
• A quick introduction to Linked Open Data
• Recommender systems
• Evaluation
• Recommender Systems and Linked Open Data
Linked (Open) Data
Some definitions:
– A method of publishing data on the Web
– (An instance of) the Web of Data
– A huge database distributed in the Web
– Linked Data is the Semantic Web done right
Web vs Linked Data
                 Web            Linked Data
Analogy          File System    Database
Designed for     Men            Machines (Software Agents)
Main elements    Documents      Things
Links between    Documents      Things
Semantics        Implicit       Explicit
Courtesy of Prof. Enrico Motta, The Open University, Milton Keynes, UK. Semantic Web: Technologies and Applications.
URI
• Every resource/entity/thing/relation is identified by a (unique) URI
– URI: <http://dbpedia.org/resource/Berlin>
– CURIE: dbpedia:Berlin
– URI: <http://purl.org/dc/terms/subject>
– CURIE: dcterms:subject
Which vocabularies/ontologies?
• Most popular on http://prefix.cc (July 25, 2015)
– YAGO: http://yago-knowledge.org/resource/
– FOAF: http://xmlns.com/foaf/0.1/
– DBpedia Ontology: http://dbpedia.org/ontology/
– DBpedia Properties: http://dbpedia.org/property/
Which vocabularies/ontologies?
• Most popular on http://lov.okfn.org (July 25, 2015)
– VANN: http://purl.org/vocab/vann/
– SKOS: http://www.w3.org/2004/02/skos/core#
– FOAF
– DCTERMS
– DCE: http://purl.org/dc/elements/1.1/
RDF – Resource Description Framework
• Basic element: triple
[subject] [predicate] [object]
subject: URI
predicate: URI
object: URI | Literal
Literal: "string"@lang | "string"^^datatype
RDF – Resource Description Framework
dbpedia:Berlin dbo:country dbpedia:Germany .
dbpedia:Berlin rdfs:label "Berlin"@en .
dbpedia:Berlin rdfs:label "Berlino"@it .
dbpedia:Berlin dbo:populationTotal "3517424"^^xsd:integer .
dbpedia:Berlin dcterms:subject category:Capitals_in_Europe .
dbpedia:Berlin rdf:type yago:UrbanArea108675967 .
dbpedia:Germany dbo:language dbpedia:German_Language .
dbpedia:Germany dbo:firstDriverCountry dbpedia:2014_German_Grand_Prix .
RDF – Resource Description Framework
[Figure: the triples above drawn as a graph. Nodes: Berlin, Germany, 2014_German_Grand_Prix, German_Language, Capitals_in_Europe, UrbanArea108675967 and the literals "Berlin"@en, "Berlino"@it, "3517424"^^xsd:integer. Edge labels: country, language, firstDriverCountry, type, subject, label, populationTotal.]
RDFS and OWL in two statements
dbo:country rdfs:range dbo:Country .
dbpedia:Berlin owl:sameAs freebase:Berlin .
SPARQL
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX category: <http://dbpedia.org/resource/Category:>
SELECT DISTINCT ?name ?city WHERE {
  ?city dcterms:subject category:Capitals_in_Europe .
  ?city rdfs:label ?name .
  ?city dbo:populationTotal ?population .
  FILTER (?population < 30000) .
}
SPARQL
curl -g -H 'Accept: application/json' 'http://dbpedia.org/sparql?query=SELECT+DISTINCT+?name+?city+WHERE+{?city+dcterms:subject+category:Capitals_in_Europe+.+?city+rdfs:label+?name+.+?city+dbpedia-owl:populationTotal+?population+.+FILTER+(?population+<+30000)+.}'
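The curl call above passes the query unencoded and relies on -g to get the braces through. A minimal Python sketch of the same GET request URL with the query percent-encoded is given here; the helper name build_sparql_url is hypothetical, and only the standard library is used (no request is actually sent).

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint used in the curl call above

QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX category: <http://dbpedia.org/resource/Category:>
SELECT DISTINCT ?name ?city WHERE {
  ?city dcterms:subject category:Capitals_in_Europe .
  ?city rdfs:label ?name .
  ?city dbo:populationTotal ?population .
  FILTER (?population < 30000)
}"""


def build_sparql_url(endpoint: str, query: str) -> str:
    """Return a GET URL whose 'query' parameter is percent-encoded."""
    return endpoint + "?" + urlencode({"query": query})


url = build_sparql_url(ENDPOINT, QUERY)
# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

The resulting URL can be fetched with any HTTP client, sending the same Accept: application/json header as in the curl example.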
Personalized Information Access
• Help the user in finding the information they might be interested in
• Consider their preferences/past behaviour
• Filter irrelevant information
Recommender Systems
• Help users in dealing with Information/Choice Overload
• Help to match users with items
Some definitions
– In its most common formulation, the recommendation problem is reduced to the problem of estimating ratings for the items that have not been seen by a user.
[G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. TKDE, 2005.]
– Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user.
[F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.]
The problem
• Estimate a utility function to automatically predict how much a user will like an item which is unknown to them.
Input
Set of users: $U = \{u_1, \ldots, u_M\}$
Set of items: $X = \{x_1, \ldots, x_N\}$
Utility function: $f: U \times X \to \mathbb{R}$
Output
$\forall u \in U,\; x'_u = \arg\max_{x \in X} f(u, x)$
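As a rough illustration of the output definition, the sketch below picks the item with the highest utility for one user. The function name best_item, the toy scores and the restriction to items the user has not seen yet are assumptions made for the example, not part of the slide.

```python
def best_item(utility, user, items, seen):
    """Return the unseen item that maximises utility(user, item)."""
    candidates = [x for x in items if x not in seen]
    return max(candidates, key=lambda x: utility(user, x))


# Hypothetical utility values for a single user "u1".
SCORES = {("u1", "Titanic"): 2.0, ("u1", "Argo"): 4.5, ("u1", "Heat"): 3.0}


def toy_utility(user, item):
    return SCORES[(user, item)]


recommended = best_item(toy_utility, "u1", ["Titanic", "Argo", "Heat"], seen={"Titanic"})
```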
The rating matrix
          The Matrix  Titanic  I love shopping  Argo  Love Actually  The Hangover
Tommaso   5           1        2                4     3              ?
Vito      2           4        5                3     5              2
Phuong    4           3        2                4     1              3
Jessica   3           5        1                5     2              4
Paolo     4           4        5                3     5              2
The rating matrix (in the real world)
          The Matrix  Titanic  I love shopping  Argo  Love Actually  The Hangover
Tommaso   5           ?        ?                4     3              ?
Vito      2           4        5                ?     5              ?
Phuong    ?           3        ?                4     ?              3
Jessica   3           5        ?                5     2              ?
Paolo     4           4        5                ?     5              2
Recommendation techniques
• Content-based
• Collaborative filtering
• Demographic
• Knowledge-based
• Community-based
• Hybrid recommender systems
Collaborative Recommender Systems
Collaborative RSs recommend items to a user by identifying other users with a similar profile
[Figure: the user profile (Item1, 5; Item2, 1; Item5, 4; Item10, 5; …) and the profiles of the other users (Item1, 4; Item2, 2; Item5, 5; Item10, 3; …) feed the Recommender System, which returns the Top-N Recommendations (Item7, Item15, Item11, …).]
Content-based Recommender Systems
CB-RSs recommend items to a user based on their description and on the profile of the user's interests
[Figure: the user profile (Item1, 5; Item2, 1; Item5, 4; Item10, 5; …) and the items' descriptions (Item1, Item2, …, Item100) feed the Recommender System, which returns the Top-N Recommendations (Item7, Item15, Item11, …).]
Knowledge-based Recommender Systems
KB-RSs recommend items to a user based on their description and domain knowledge encoded in a knowledge base
[Figure: the items' descriptions (Item1, Item2, …, Item100) and a knowledge base feed the Recommender System, which returns the Top-N Recommendations (Item7, Item15, Item11, …).]
Collaborative Filtering
• Memory-based
– Mainly based on k-NN
– Does not require any preliminary model building phase
• Model-based
– Learn a predictive model before computing recommendations
User-based Collaborative Recommendation
          The Matrix  Titanic  I love shopping  Argo  Love Actually  The Hangover
Tommaso   5           1        2                4     3              ?
Vito      2           4        5                3     5              2
Phuong    4           3        2                4     1              3
Jessica   3           5        1                5     2              4
Paolo     4           4        5                3     5              2
Pearson's correlation coefficient
Rate prediction
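The two steps named on this slide (user similarity via Pearson's correlation, then rate prediction from the nearest neighbours) can be sketched as follows on the example matrix. The slide's own prediction formula did not survive extraction, so the mean-centred weighted average, the choice k=2 and the rule of keeping only positively correlated neighbours are assumptions of this sketch.

```python
from math import sqrt

ITEMS = ["The Matrix", "Titanic", "I love shopping", "Argo", "Love Actually", "The Hangover"]
RATINGS = {
    "Tommaso": [5, 1, 2, 4, 3, None],
    "Vito":    [2, 4, 5, 3, 5, 2],
    "Phuong":  [4, 3, 2, 4, 1, 3],
    "Jessica": [3, 5, 1, 5, 2, 4],
    "Paolo":   [4, 4, 5, 3, 5, 2],
}


def mean(values):
    """Mean of the known (non-None) ratings."""
    known = [v for v in values if v is not None]
    return sum(known) / len(known)


def pearson(a, b):
    """Pearson correlation computed on the items rated by both users."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    ma = sum(x for x, _ in pairs) / len(pairs)
    mb = sum(y for _, y in pairs) / len(pairs)
    num = sum((x - ma) * (y - mb) for x, y in pairs)
    den = sqrt(sum((x - ma) ** 2 for x, _ in pairs)) * sqrt(sum((y - mb) ** 2 for _, y in pairs))
    return num / den if den else 0.0


def predict(user, item, k=2):
    """Mean-centred weighted average over the k most similar users (assumed formula)."""
    j = ITEMS.index(item)
    target = RATINGS[user]
    scored = [(pearson(target, r), r) for name, r in RATINGS.items()
              if name != user and r[j] is not None]
    top = sorted(scored, key=lambda t: t[0], reverse=True)[:k]
    top = [(s, r) for s, r in top if s > 0]  # keep positively correlated neighbours only
    if not top:
        return mean(target)
    offset = sum(s * (r[j] - mean(r)) for s, r in top) / sum(s for s, _ in top)
    return mean(target) + offset


prediction = predict("Tommaso", "The Hangover")
```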
Item-based Collaborative Recommendation
          The Matrix  Titanic  I love shopping  Argo  Love Actually  The Hangover
Tommaso   5           1        2                4     3              ?
Vito      2           4        5                3     5              2
Phuong    4           3        2                4     1              3
Jessica   3           5        1                5     2              4
Paolo     4           4        5                3     5              2
Cosine Similarity
$sim(x_i, x_j) = \frac{x_i \cdot x_j}{|x_i| \cdot |x_j|} = \frac{\sum_u r_{u,x_i} \cdot r_{u,x_j}}{\sqrt{\sum_u r_{u,x_i}^2} \cdot \sqrt{\sum_u r_{u,x_j}^2}}$
Adjusted Cosine Similarity
Rate prediction
$\hat{r}(u_i, x') = \frac{\sum_{x \in X_{u_i}} sim(\vec{x}, \vec{x}') \cdot r_{x,u_i}}{\sum_{x \in X_{u_i}} sim(\vec{x}, \vec{x}')}$
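A minimal sketch of item-based prediction with cosine similarity, using the example matrix with users as rows. Restricting each similarity to the users who rated both items is an assumption of this sketch; the prediction is the similarity-weighted average of the target user's own ratings.

```python
from math import sqrt

# Rows: Tommaso, Vito, Phuong, Jessica, Paolo. Columns: the six films above.
R = [
    [5, 1, 2, 4, 3, None],
    [2, 4, 5, 3, 5, 2],
    [4, 3, 2, 4, 1, 3],
    [3, 5, 1, 5, 2, 4],
    [4, 4, 5, 3, 5, 2],
]


def cosine(i, j):
    """Cosine similarity between item columns i and j over co-rating users."""
    pairs = [(row[i], row[j]) for row in R if row[i] is not None and row[j] is not None]
    num = sum(a * b for a, b in pairs)
    den = sqrt(sum(a * a for a, _ in pairs)) * sqrt(sum(b * b for _, b in pairs))
    return num / den if den else 0.0


def predict(u, target):
    """Similarity-weighted average of the ratings user u gave to other items."""
    rated = [j for j, r in enumerate(R[u]) if r is not None and j != target]
    sims = {j: cosine(j, target) for j in rated}
    den = sum(sims.values())
    return sum(sims[j] * R[u][j] for j in rated) / den if den else 0.0


hangover_for_tommaso = predict(0, 5)
```

The adjusted cosine variant named on the slide would subtract each user's mean rating before computing the similarity; it is not shown here.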
Content-Based Recommender Systems
• Items are described in terms of attributes/features
• A finite set of values is associated to each feature
• Item representation is a (Boolean) vector
Content-based
CB-RSs try to recommend items similar* to those a given user has liked in the past
[P. Lops, M. de Gemmis, G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. Recommender Systems Handbook. 2011]
• Heuristic-based
– Usually adopt techniques borrowed from IR
• Model-based
– Often we have a model for each user
(*) similar from a content-based perspective
CB drawbacks
• Content overspecialization
• Portfolio effect
• Sparsity / Cold-start
– New user
Knowledge-based Recommender Systems
• Conversational approaches
• Reasoning techniques
– Case-based reasoning
– Constraint reasoning
Hybrid recommender systems
• Weighted
• Switching
• Mixed
• Feature combination
• Cascade
• Feature augmentation
• Meta-level
Robin D. Burke. Hybrid recommender systems: Survey and experiments. User Model. User-Adapt. Interact., 12(4):331-370, 2002.
Protocols
• Rated test-items
• All unrated items: compute a score for every item not rated by the user (also items not appearing in the user test set)
Accuracy metrics for rating prediction
Mean Absolute Error
Root Mean Squared Error
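The formulas for the two metrics are not legible in this extraction. The sketch below uses their standard definitions (mean of absolute errors, and square root of the mean squared error) over paired predicted and true ratings; the function names and sample values are illustrative.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean Absolute Error over paired ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error over paired ratings."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical predictions and held-out ratings.
predicted = [4, 2, 5]
actual = [5, 2, 3]
mae_value = mae(predicted, actual)
rmse_value = rmse(predicted, actual)
```

RMSE penalises the single large error (5 vs 3) more heavily than MAE does, which is the usual reason for reporting both.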
MAE and RMSE drawback
• Not very suitable for top-N recommendation
– Errors at the top of the recommendation list are weighted the same as errors at the bottom
Accuracy metrics for top-N recommendation
Precision @ N
$P_u@N = \frac{|L_u(N) \cap TS_u^+|}{N}$
Recall @ N
$R_u@N = \frac{|L_u(N) \cap TS_u^+|}{|TS_u^+|}$
Normalized Discounted Cumulative Gain @ N
$L_u(N)$ is the recommendation list up to the N-th element
$TS_u^+$ is the set of relevant test items for $u$
$IDCG@N$ indicates the score obtained by an ideal ranking of $L_u(N)$
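A small sketch of the three top-N metrics for a single user. Precision and recall follow the definitions above; the nDCG formula itself is not legible in the slides, so binary relevance and a log2 position discount are assumptions here, with IDCG taken as the DCG of the same list re-ranked ideally.

```python
from math import log2


def precision_at_n(ranked, relevant, n):
    """|L_u(N) ∩ TS_u^+| / N"""
    return len(set(ranked[:n]) & relevant) / n


def recall_at_n(ranked, relevant, n):
    """|L_u(N) ∩ TS_u^+| / |TS_u^+|"""
    return len(set(ranked[:n]) & relevant) / len(relevant)


def ndcg_at_n(ranked, relevant, n):
    """DCG of the list divided by the DCG of its ideal re-ranking (binary gains)."""
    top = ranked[:n]
    dcg = sum(1 / log2(pos + 2) for pos, x in enumerate(top) if x in relevant)
    hits = sum(1 for x in top if x in relevant)
    idcg = sum(1 / log2(pos + 2) for pos in range(hits))
    return dcg / idcg if idcg else 0.0


# Hypothetical recommendation list and relevant test items.
ranked = ["a", "b", "c", "d"]
relevant = {"b", "d", "z"}
p = precision_at_n(ranked, relevant, 4)
r = recall_at_n(ranked, relevant, 4)
g = ndcg_at_n(ranked, relevant, 4)
```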
Is it all about precision?
• Diversity
– Avoid recommending only items in a small subset of the catalog
– Suggest diverse items in the recommendation list
• Novelty
– Recommend items in the long tail
• Serendipity
– Suggest unexpected but interesting items
Diversity
Intra-List Diversity
$ILD_u@N = \frac{1}{2} \sum_{x_i \in L_u(N)} \sum_{x_j \in L_u(N)} (1 - sim(x_i, x_j))$
$ILD@N = \frac{1}{|U|} \sum_{u \in U} ILD_u@N$
Aggregate Diversity
$ADiv@N = \frac{|\bigcup_{u \in U} L_u(N)|}{|X|}$
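The two diversity measures can be sketched as follows. Summing over unordered pairs matches the 1/2 factor of the ILD definition when sim is symmetric and sim(x, x) = 1; the Jaccard similarity over genre sets and the toy catalog are assumptions used only to make the example runnable.

```python
from itertools import combinations


def ild(rec_list, sim):
    """Intra-list diversity: sum of (1 - sim) over unordered pairs of the list."""
    return sum(1 - sim(a, b) for a, b in combinations(rec_list, 2))


def aggregate_diversity(rec_lists, catalog):
    """Fraction of the catalog that appears in at least one recommendation list."""
    return len(set().union(*rec_lists)) / len(catalog)


# Hypothetical item descriptions.
GENRES = {"a": {"crime"}, "b": {"crime", "drama"}, "c": {"comedy"}, "d": {"drama"}}


def jaccard(x, y):
    return len(GENRES[x] & GENRES[y]) / len(GENRES[x] | GENRES[y])


ild_value = ild(["a", "b", "c"], jaccard)
adiv_value = aggregate_diversity([["a", "b"], ["b", "c"]], list(GENRES))
```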
Content-Based Recommender Systems
P. Lops, M. de Gemmis, G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. In: P. Kantor, F. Ricci, L. Rokach, B. Shapira, editors, Recommender Systems Handbook: A Complete Guide for Research Scientists & Practitioners
Need of domain knowledge! We need rich descriptions of the items!
No suggestion is available if the analyzed content does not contain enough information to discriminate items the user might like from items the user might not like.*
(*) P. Lops, M. de Gemmis, G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. In: P. Kantor, F. Ricci, L. Rokach and B. Shapira, editors, Recommender Systems Handbook: A Complete Guide for Research Scientists & Practitioners
The quality of CB recommendations is correlated with the quality of the features that are explicitly associated with the items.
Limited Content Analysis
Traditional Content-based RecSys
• Based on keyword/attribute-based item representations
• Rely on the quality of the content analyzer to extract expressive item features
• Lack of knowledge about the items
Semantic-aware approaches
Traditional Ontological/Semantic Recommender Systems make use of limited domain ontologies.
What about Linked Data?
Use Linked Data to mitigate the limited content analysis issue
• Plenty of structured data available
• No Content Analyzer required
Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
Why RS + LOD
• Standardized access to data
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?actor WHERE {
  dbpedia:Pulp_Fiction dbo:starring ?actor .
}

PREFIX yago: <http://yago-knowledge.org/resource/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
CONSTRUCT {
  ?book ?p ?o .
  ?book yago:linksTo ?yagolink .
}
WHERE {
  SERVICE <http://live.dbpedia.org/sparql> {
    ?book rdf:type dbpedia-owl:Book .
    ?book ?p ?o .
    ?book owl:sameAs ?yago .
    FILTER(regex(str(?yago), "http://yago-knowledge.org/resource/")) .
  }
  SERVICE <http://lod2.openlinksw.com/sparql> {
    ?yago yago:linksTo ?yagolink .
  }
}
Direct item Linking
dbpedia:I_Am_Legend_(film)
dbpedia:Troy_(film)
dbpedia:Scarface_(1983_film)
dbpedia:Scarface:_The_World_Is_Yours
Direct Item Linking
• The easy way
SELECT DISTINCT ?uri ?title WHERE {
  ?uri rdf:type dbpedia-owl:Film .
  ?uri rdfs:label ?title .
  FILTER langMatches(lang(?title), "EN") .
  FILTER regex(?title, "matrix", "i")
}
Direct Item Linking
• Other approaches
– DBpedia Lookup
https://github.com/dbpedia/lookup
– Silk Framework
http://silk-framework.com/
Item Graph Analyzer
• Build your own knowledge graph
– Select relevant properties. Possible solutions:
  • Ontological properties
  • Categorical properties
  • Frequent properties
– Explore the graph up to a limited depth
Different item feature representations
• Direct properties
• Property paths
• Node paths
• Neighborhoods
• …
Datasets
Subset of Movielens mapped to DBpedia
Subset of Last.fm mapped to DBpedia
Subset of The Library Thing mapped to DBpedia
Mappings
http://sisinflab.poliba.it/semanticweb/lod/recsys/datasets/
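Vector Space Model for LOD
[Figure: the films Righteous Kill and Heat linked through the properties starring, director, subject/broader and genre to resources such as Robert De Niro, Jon Avnet, Serial killer films, Drama, Al Pacino, Brian Dennehy, Heist films and Crime films. For each property (here starring, with values Robert De Niro, Al Pacino and Brian Dennehy) the two films are represented as vectors over the values of that property.]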
Vector Space Model for LOD
STARRING              Al Pacino (v1)  Robert De Niro (v2)  Brian Dennehy (v3)
Righteous Kill (m1)   X               X                    X
Heat (m2)             X               X

                      v1              v2                   v3
Righteous Kill (x1)   w_{v1,x1}       w_{v2,x1}            w_{v3,x1}
Heat (x2)             w_{v1,x2}       w_{v2,x2}            0
$w_{AlPacino,Heat} = tf_{AlPacino,Heat} \cdot idf_{AlPacino}$
$tf \in \{0,1\}$
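A minimal sketch of the TF-IDF weighting for the starring property, with binary tf as stated above. The idf definition log(N / n_v), where n_v is the number of items having value v, and the third, actor-less catalog entry are assumptions added so that the weights of shared actors are not all zero.

```python
from math import log

# Values of the starring property per item; the third entry is hypothetical.
STARRING = {
    "Righteous Kill": {"Al Pacino", "Robert De Niro", "Brian Dennehy"},
    "Heat": {"Al Pacino", "Robert De Niro"},
    "Hypothetical Film": set(),
}


def tfidf(catalog):
    """Return {item: {value: tf * idf}} with tf in {0, 1} and idf = log(N / n_v)."""
    n_items = len(catalog)
    values = sorted(set().union(*catalog.values()))
    df = {v: sum(v in feats for feats in catalog.values()) for v in values}
    return {
        item: {v: (1 if v in feats else 0) * log(n_items / df[v]) for v in values}
        for item, feats in catalog.items()
    }


weights = tfidf(STARRING)
```

Brian Dennehy appears in fewer items than Al Pacino, so he receives the larger idf and discriminates more between the films.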
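Vector Space Model for LOD
$sim^{starring}(x_i, x_j) = \frac{w_{v_1,x_i} \cdot w_{v_1,x_j} + w_{v_2,x_i} \cdot w_{v_2,x_j} + w_{v_3,x_i} \cdot w_{v_3,x_j}}{\sqrt{w_{v_1,x_i}^2 + w_{v_2,x_i}^2 + w_{v_3,x_i}^2} \cdot \sqrt{w_{v_1,x_j}^2 + w_{v_2,x_j}^2 + w_{v_3,x_j}^2}}$
$\alpha_{starring} \cdot sim^{starring}(x_i, x_j) + \alpha_{director} \cdot sim^{director}(x_i, x_j) + \alpha_{subject} \cdot sim^{subject}(x_i, x_j) + \ldots = sim(x_i, x_j)$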
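The per-property cosine and its linear combination can be sketched with sparse dictionaries as vectors. The alpha values, the director identifiers d1 and d2 and the unit weights are made up for illustration; in practice the weights would be the TF-IDF values of each property.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    num = sum(w * b.get(k, 0.0) for k, w in a.items())
    den = sqrt(sum(w * w for w in a.values())) * sqrt(sum(w * w for w in b.values()))
    return num / den if den else 0.0


def combined_similarity(xi, xj, alpha):
    """sim(xi, xj) = sum over properties p of alpha[p] * sim^p(xi, xj)."""
    return sum(a * cosine(xi[p], xj[p]) for p, a in alpha.items())


# Hypothetical per-property vectors for two films.
righteous_kill = {"starring": {"v1": 1.0, "v2": 1.0, "v3": 1.0}, "director": {"d1": 1.0}}
heat = {"starring": {"v1": 1.0, "v2": 1.0}, "director": {"d2": 1.0}}
alpha = {"starring": 0.7, "director": 0.3}

similarity = combined_similarity(righteous_kill, heat, alpha)
```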
VSM Content-based Recommender
We predict the rating using a Nearest Neighbor Classifier wherein the similarity measure is a linear combination of local property similarities.
If this similarity is greater than or equal to 0, we suggest the movie m to the user u.
Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, Markus Zanker. Linked Open Data to support Content-based Recommender Systems. 8th International Conference on Semantic Systems (I-SEMANTICS) - 2012
Selected properties
heuristic-based → model-based
Property subset evaluation
The subject+broader solution is better than only subject or subject+more broaders.
The best solution is achieved with subject+broader+genres.
Too many broaders introduce noise.
Rated test items protocol
Path-based features
Analysis of complex relations between the user preferences and the target item
Vito Claudio Ostuni, Tommaso Di Noia, Eugenio Di Sciascio, Roberto Mirizzi. Top-N Recommendations from Implicit Feedback leveraging Linked Open Data. 7th Conference on Recommender Systems (RecSys), 2013
Data model
Implicit Feedback Matrix $\hat{S}$
     i1  i2  i3  i4
u1   1   1   0   0
u2   1   0   1   0
u3   0   1   1   0
u4   0   1   0   1
Knowledge Graph
Path-based features
Path: acyclic sequence of relations $(s, \ldots, r_l, \ldots, r_L)$
Frequency of the j-th path in the sub-graph related to u and x
• The more the paths, the more relevant the item.
• Different paths have different meanings.
• Not all types of paths are relevant.
Example: u3 -s- i2 -p2- e1 -p1- i1 gives the path (s, p2, p1)
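Counting path-based features between a user and an item can be sketched with a depth-first search over a labelled graph. The toy edge list below combines the implicit feedback matrix of the data model (label s) with two knowledge-graph edges taken from the example path; traversing edges in both directions and capping paths at three relations are assumptions of this sketch.

```python
from collections import Counter, defaultdict

# (node, relation, node) edges: feedback edges "s" plus two knowledge-graph edges.
EDGES = [
    ("u1", "s", "i1"), ("u1", "s", "i2"),
    ("u2", "s", "i1"), ("u2", "s", "i3"),
    ("u3", "s", "i2"), ("u3", "s", "i3"),
    ("u4", "s", "i2"), ("u4", "s", "i4"),
    ("i2", "p2", "e1"), ("i1", "p1", "e1"),
]


def path_features(edges, user, item, max_len=3):
    """Count acyclic paths from user to item, grouped by their relation sequence."""
    adj = defaultdict(list)
    for a, label, b in edges:
        adj[a].append((label, b))
        adj[b].append((label, a))  # assumption: edges are traversed in both directions
    counts = Counter()

    def walk(node, labels, visited):
        if node == item and labels:
            counts[tuple(labels)] += 1
            return
        if len(labels) == max_len:
            return
        for label, nxt in adj[node]:
            if nxt not in visited:
                walk(nxt, labels + [label], visited | {nxt})

    walk(user, [], {user})
    return counts


features = path_features(EDGES, "u3", "i1")
```

The resulting counter is the feature vector for the pair (u3, i1): one dimension per distinct relation sequence.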
Problem formulation
Feature vector: $w_{ux} \in \mathbb{R}^D$
Set of relevant items for u: $X_u^+ = \{x \in X \mid \hat{s}_{ux} = 1\}$
Set of irrelevant items for u: $X_u^- = \{x \in X \mid \hat{s}_{ux} = 0\}$
Sample of irrelevant items for u: $X_u^{-*} \subseteq X_u^-$
Training Set: $TR = \bigcup_u \{\langle w_{ux}, \hat{s}_{ux} \rangle \mid x \in (X_u^+ \cup X_u^{-*})\}$
[Figure: graph connecting the users u1-u4, the items x1-x4 and the entities e1-e5, built up over successive slides while the paths are counted.]
Path-based features
path(1) (s, s, s): 2
path(2) (s, p2, p1): 2
path(3) (s, p2, p3, p1): 1
Evaluation of different ranking functions
[Figure: recall@5 vs. user profile size (given 5, 10, 20, 30, 50, All) on Movielens for the ranking functions BagBoo, GBRT and Sum; y-axis from 0 to 0.6.]
Evaluation of different ranking functions
[Figure: recall@5 vs. user profile size (given 5, 10, 20, All) on Last.fm for the ranking functions BagBoo, GBRT and Sum; y-axis from 0 to 0.6.]
Comparative approaches
• BPRMF, Bayesian Personalized Ranking for Matrix Factorization
• BPRLin, Linear Model optimized for BPR (Hybrid alg.)
• SLIM, Sparse Linear Methods for Top-N Recommender Systems
• SMRMF, Soft Margin Ranking Matrix Factorization
MyMediaLite
Comparison with other approaches
[Figure: precision@5 vs. user profile size (given 5, 10, 20, 30, 50, All) on Movielens for SPrank, BPRMF, SLIM, BPRLin and SMRMF; y-axis from 0 to 0.6.]
Comparison with other approaches
[Figure: precision@5 vs. user profile size (given 5, 10, 20, All) on Last.fm for SPrank, BPRMF, SLIM, BPRLin and SMRMF; y-axis from 0 to 0.6.]
Graph-based Item Representation
[Figure: The Godfather and American Gangster connected through subject links to the categories Mafia_films, Gangster_films, Films_about_organized_crime_in_the_United_States, Best_Picture_Academy_Award_winners, Best_Thriller_Empire_Award_winners and Films_shot_in_New_York_City.]
Vito Claudio Ostuni, Tommaso Di Noia, Roberto Mirizzi, Eugenio Di Sciascio. A Linked Data Recommender System using a Neighborhood-based Graph Kernel. The 15th International Conference on Electronic Commerce and Web Technologies, 2014
Graph-based Item Representation
[Figure, built up over several slides: the graph of The Godfather and American Gangster with their subject categories, extended with broader links from those categories to Films_about_organized_crime, Films_about_organized_crime_by_country and Awards_for_best_film.]
Exploit entity descriptions
h-hop Item Neighborhood Graph
[Figure: neighborhood graph of The Godfather, with subject and broader links among The Godfather, Mafia_films, Gangster_films, Films_about_organized_crime, Best_Picture_Academy_Award_winners, Awards_for_best_film and Films_shot_in_New_York_City.]
Kernel Methods
Work by embedding data in a vector space and looking for linear patterns in such space
$x \to \phi(x)$
[Figure: the map $\phi$ from the input space to the feature space.]
[Kernel Methods for General Pattern Analysis. Nello Cristianini. http://www.kernel-methods.net/tutorials/KMtalk.pdf]
We can work in the new space F by specifying an inner product function between points in it
$k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$
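As a generic illustration of a kernel being an inner product in feature space (not the graph kernel used later in the talk), the sketch below checks that the degree-2 polynomial kernel on 2-D points equals the inner product of an explicit feature map. The choice of kernel and the sample points are assumptions made for the example.

```python
from math import sqrt


def dot(a, b):
    """Inner product of two equally long vectors."""
    return sum(p * q for p, q in zip(a, b))


def phi(x):
    """Explicit feature map for the kernel k(a, b) = (a . b)^2 on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(a, b):
    """Same value as dot(phi(a), phi(b)), computed without building phi."""
    return dot(a, b) ** 2


a, b = (1.0, 2.0), (3.0, 0.5)
implicit = poly_kernel(a, b)
explicit = dot(phi(a), phi(b))
```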
h-hop Item Entity-based Neighborhood Graph Kernel
$k_{G^h}(x_i, x_j) = \langle \phi_{G^h}(x_i), \phi_{G^h}(x_j) \rangle$
Explicit computation of the feature map
$\phi_{G^h}(x_i) = (w_{x_i,e_1}, w_{x_i,e_2}, \ldots, w_{x_i,e_m}, \ldots, w_{x_i,e_t})$
$w_{x_i,e_m}$: entity importance in the item neighborhood graph. It is computed from:
– # edges involving $e_m$ at $l$ hops from $x_i$, a.k.a. frequency of the entity in the item neighborhood graph
– a factor taking into account at which hop the entity appears
Weights computation example
[Figure: item $x_i$ with the entities e1 and e2 at hop 1 and the entities e4 and e5 at hop 2, connected through the properties p2 and p3; h = 2.]
$c_{P^1(x_i), e_1} = 2$
$c_{P^1(x_i), e_2} = 1$
$c_{P^2(x_i), e_4} = 1$
$c_{P^2(x_i), e_5} = 2$
Informative entity about the item even if not directly related to it
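The weight formula itself is not legible in this extraction. The sketch below assumes that each entity weight is the sum over hops of the edge count at that hop multiplied by a hop-dependent factor, and uses 1 / (1 + log l) as a hypothetical decay; both the combination rule and the decay are assumptions, while the counts are those of the example above.

```python
from math import log


def alpha(hop):
    """Hypothetical decay factor: entities at farther hops count less."""
    return 1.0 / (1.0 + log(hop))


def feature_map(counts_by_hop):
    """Entity weights: sum over hops of alpha(hop) * edge count (assumed formula)."""
    weights = {}
    for hop, counts in counts_by_hop.items():
        for entity, count in counts.items():
            weights[entity] = weights.get(entity, 0.0) + alpha(hop) * count
    return weights


def kernel(wa, wb):
    """Inner product of two sparse entity-weight vectors."""
    return sum(w * wb.get(e, 0.0) for e, w in wa.items())


# Edge counts from the example: hop 1 -> e1: 2, e2: 1; hop 2 -> e4: 1, e5: 2.
w_xi = feature_map({1: {"e1": 2, "e2": 1}, 2: {"e4": 1, "e5": 2}})
```

Under this decay e5 keeps a non-zero weight although it is two hops away, which is the point made on the slide about entities not directly related to the item.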
Experimental Settings
• Trained an SVM regression model for each user
• Accuracy Evaluation: Precision, Recall
• Novelty Evaluation: Entropy-based Novelty (All Items protocol) [the lower the better]
Comparative approaches
• NB: 1-hop item neigh. + Naive Bayes classifier
• VSM: 1-hop item neigh. Vector Space Model (tf-idf) + SVM regr
• WK: 2-hop item neigh. Walk-based kernel + SVM regr
Comparison with other approaches (i)
[Figure: Prec@10 in the [20/80], [40/60] and [80/20] settings for NK-bestPrec, NK-bestEntr, NB, VSM and WK; y-axis from 0 to 0.8. Rated test items protocol.]
Comparison with other approaches (ii)
[Figure: EBN@10 in the [20/80], [40/60] and [80/20] settings for NK-bestPrec, NK-bestEntr, NB, VSM and WK; y-axis from 0 to 1.8.]
The FreeSound case study
Vito Claudio Ostuni, Sergio Oramas, Tommaso Di Noia, Xavier Serra, Eugenio Di Sciascio. A Semantic Hybrid Approach for Sound Recommendation. 24th World Wide Web Conference, 2015
FreeSound Knowledge Graph
Item textual descriptions enrichment: Entity Linking tools can be used to enrich item textual descriptions with LOD
h-hop Item Node-Based Neighborhood Graph Kernel
$k_{G^h}(x_i, x_j) = \langle \phi_{G^h}(x_i), \phi_{G^h}(x_j) \rangle$
Explicit computation of the feature map
$\phi_{G^h}(x_i) = (w_{x_i,p^*_1}, \ldots, w_{x_i,p^*_m}, \ldots, w_{x_i,p^*_t})$
The weights combine:
– # sequences and subsequences of nodes from $x_i$ to $e_m$
– a normalization factor
Hybrid Recommendation via Feature Combination
The hybridization is based on the combination of different data sources
Final approach: collaborative + LOD + textual description + tags
Item Feature Vector
| u1 u2 u3 … | entity1 entity2 … | keyw1 keyw2 … | tag1 … |
– u1, u2, u3, …: users who rated the item
– entity1, entity2, …: entities from the knowledge graph (explicit feature mapping)
– keyw1, keyw2, …: keywords extracted from the textual description
– tag1, …: tags associated to the item
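Feature combination amounts to concatenating the four blocks into a single item vector. A sparse sketch follows; the key prefixes, the binary weights for users, keywords and tags, and the sample values are illustrative assumptions.

```python
def item_feature_vector(raters, entity_weights, keywords, tags):
    """Concatenate collaborative, LOD, keyword and tag features into one sparse vector."""
    vector = {}
    vector.update({f"user:{u}": 1.0 for u in raters})                     # users who rated the item
    vector.update({f"entity:{e}": w for e, w in entity_weights.items()})  # knowledge-graph entities
    vector.update({f"keyword:{k}": 1.0 for k in keywords})                # from the textual description
    vector.update({f"tag:{t}": 1.0 for t in tags})                        # tags associated to the item
    return vector


# Hypothetical sound item.
example = item_feature_vector(
    raters=["u1", "u3"],
    entity_weights={"entity1": 2.0},
    keywords=["rain"],
    tags=["field-recording"],
)
```

Prefixing the keys keeps the blocks apart, so a tag and a keyword with the same surface form remain distinct features.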
Hybrid approaches: some lessons learnt
• Feature combination hybrid approach
– adding collaborative features to item content feature vectors can considerably improve recommendation accuracy
• Semantic Enrichment
– semantics can help improve performance beyond accuracy, such as novelty and catalog coverage
Select the domain(s) of your RS
SELECT (COUNT(?i) AS ?num) ?c
WHERE {
  ?i a ?c .
  FILTER(regex(str(?c), "^http://dbpedia.org/ontology")) .
}
GROUP BY ?c
ORDER BY DESC(?num)
A comparison between DBpedia and Freebase
           Accuracy  Coverage  Diversity  Novelty
Freebase   +         +         -          -
DBpedia    -         -         +          +
Phuong Nguyen, Paolo Tomeo, Tommaso Di Noia, Eugenio Di Sciascio. Content-based recommendations via DBpedia and Freebase: a case study in the music domain. The 14th International Semantic Web Conference - ISWC 2015
A comparison between DBpedia and Freebase
           Accuracy  Coverage  Diversity  Novelty
1-hop      -         -         -          +
2-hop      +         +         +          -
Conclusions
• Linked Open Data to enrich the content descriptions of items
• Exploit different characteristics of the semantic network to represent/learn features
• Improved accuracy
• Improved novelty
• Improved Aggregate Diversity
• Entity linking for a better exploitation of text-based data
• Select the right approach, dataset, set of properties to build your RS
Open issues
• Generalize to graph pattern extraction to represent features
• Automatically select the triples related to the domain of interest
• Automatically select meaningful properties to represent items
• Analysis with respect to «knowledge coverage» of the dataset
– What is the best approach?
Not covered here
• User profile
• Preferences
• Context-aware
• Knowledge-based approaches
• …
Many thanks to the RecSys crew @ SisInf Lab
Roberto Mirizzi
now at Yahoo! CA
Vito Claudio Ostuni
now at
Jessica Rosati
PhD Fellowship Awardee @
Paolo Tomeo
Jindřich Mynarz
Phuong Nguyen
Sergio Oramas
Aleksandra Karpus
Visiting Students and PostDoc