Discovery Hub is an exploratory search engine (http://en.wikipedia.org/wiki/Exploratory_search) that helps you discover things you might like or be interested in. It widens your cultural and knowledge horizons by revealing and explaining unexpected information. Want a film recommendation related to writers you like? Want to discover bands at the crossroads of an electro label and a rock label you like? Interested in more complex, composite recommendations based on your deepest interests, such as a combination of a writer, a film and a band? Or maybe something simpler? If you have a thirst for discovery and knowledge, Discovery Hub has answers for you.

Discovery Hub is based on leading-edge semantic web technologies. It lets you discover new and unknown items of interest starting from what you like. With Discovery Hub you interactively explore DBpedia, a huge knowledge graph derived from Wikipedia data. DBpedia is composed of approximately 4 million entities linked by more than 270 million connections, and covers many topics such as arts, technology, sciences and sport.

Discovery Hub lets you perform queries in an innovative way and helps you navigate rich results. As a hub, it redirects you to other platforms (YouTube, Deezer and more) so that you can benefit from your discoveries. The results are explained in depth through 3 explanatory features. It supports composite explorations, i.e. starting from several items of interest, and offers advanced exploration modes such as serendipitous, multi-lingual and fine-grained ones.

Discovery Hub V2 is more social! You can like a topic and share it on Twitter but, more importantly, you can now share the searches and collections you have made with your Discovery Hub followers. And of course you can also follow your friends and other interesting people if you find them.
COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED.
Discovery Hub - a discovery engine on top of DBpedia
Nicolas Marie, Fabien Gandon, Damien Legrand, Myriam Ribière
CONTEXT
RESEARCH QUESTION
RESEARCH - Proposition - Implementation - Operational prototypes - User evaluations
PUBLICATION
Discovery Hub
Search is only a partially solved problem
[White, 2006]
Gary Marchionini, 2006
Exploration/discovery today: « Claude Monet » + impressionism
Lookup today: « Claude Monet » + birthday
Search is only a partially solved problem, White 2006
The degree of structure of the web content is the determining factor for the type of functionality that search engines can provide,
Bizer et al., 2012
SEMANTIC WEB
• “The Semantic Web is a mesh of information linked up in such a way as to be easily processable by machines, on a global scale. You can think of it as being an efficient way of representing data on the World Wide Web, or as a globally linked database.” Marianna Sigala, Luisa Mich, Jamie Murphy
Tim Berners-Lee, WWW1994
[Stankovic, 2012]
http://en.wikipedia.org/wiki/Claude_Monet: 3,000+ words, 13 pages
http://dbpedia.org/resource/Claude_Monet: 191 triples
• Accessible through:
  Browsers
  Dumps
  SPARQL endpoint
SELECT * WHERE { <http://dbpedia.org/resource/Claude_Monet> <http://dbpedia.org/property/influencedBy> ?x }
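A query like the one above is typically sent to an endpoint as the `query` parameter of an HTTP GET request. The sketch below only builds that request URL and checks that the query survives encoding; the endpoint address `http://dbpedia.org/sparql` and the helper names `influenced_by_query` and `build_request_url` are assumptions made for illustration and do not come from the slides.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint address; the slide only says "SPARQL endpoint".
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"


def influenced_by_query(resource_uri: str) -> str:
    """Return the slide's query for a given resource URI."""
    return (
        "SELECT * WHERE { "
        f"<{resource_uri}> <http://dbpedia.org/property/influencedBy> ?x "
        "}"
    )


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as the `query` parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


query = influenced_by_query("http://dbpedia.org/resource/Claude_Monet")
url = build_request_url(DBPEDIA_ENDPOINT, query)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == query)
```

No network call is made here, so the sketch says nothing about what the endpoint actually returns.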
DBpedia: 3.77 million things, 270 million facts (open)
Google Knowledge Graph: 500 million things, 3.5 billion facts (closed)
Linked Open Data cloud: 31+ billion facts (open)
Google Knowledge Graph, 2012
Since 1995 Since 2007
2001 2007
• Linked-data-based exploratory search systems
User interest, start point
Interactive result space
Results choice
Ranking
Sorting/categorization
Explanation
dbpedia:Claude Monet
State of the art: Seevl, 2010
State of the art: Yovisto, 2010
State of the art: LED, 2010
State of the art: MORE, 2012
State of the art: Aemoo, 2011
State of the art: Kaminskas et al., 2011
State of the art: Google Knowledge Panel, 2012
• Challenge: enable on-the-fly linked data processing for exploratory search
• 3 major benefits:
  Results freshness
  Composite exploration enablement
  Fine-grained querying capabilities
Freshness
• 3.77 million resources
• Possible combinations of 2 resources: 14,212,900,000,000
• Possible combinations of 3 resources: 53,582,633,000,000,000,000
Composite interest exploration: given my interest in X and Y, what can I discover or learn that is related to all of these resources?
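The two figures above can be reproduced with a few lines of arithmetic. They match n² and n³ for n = 3.77 million, which suggests they count ordered selections with repetition; that reading is inferred from the numbers, not stated on the slide. The variable names are illustrative only.

```python
from math import comb

n = 3_770_000  # number of DBpedia resources quoted on the slide

ordered_pairs = n ** 2     # seed pairs, order and repetition allowed
ordered_triples = n ** 3   # seed triples, order and repetition allowed

print(f"{ordered_pairs:,}")
print(f"{ordered_triples:,}")

# Unordered combinations of distinct seeds are smaller,
# by a factor of roughly 2 for pairs and 6 for triples.
print(f"{comb(n, 2):,}")
print(f"{comb(n, 3):,}")
```

Either way of counting leads to the same practical point: the space of composite queries is far too large to precompute, which is the motivation for processing the data on the fly.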
Fine-grained querying capabilities
• Artists_from_Paris
• French_painters --
• Impressionist_painters ++
painted --
painted ++
influenced by ++
Fine-grained querying
• 1960s_science_fiction_films
• American_epic_films
• Films_set_on_the_Moon
• Artificial_intelligence_in_fiction
• Space_adventure_films, …
directed by
cinematography by
Refer to the publications for the complete algorithm
Spreading activation basis – monocentric
[Figure: activation spreading from Claude_Monet; panels: Iteration 0, Iteration 1, Iteration 2]
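The idea of monocentric spreading activation can be sketched on a toy graph. This is a generic, textbook-style version and not the Discovery Hub algorithm, which the slides defer to the publications; the function name `spread_activation`, the `decay` parameter and the edges of `toy_graph` are all assumptions made for illustration.

```python
from collections import defaultdict


def spread_activation(graph, seed, max_pulse=2, decay=0.5):
    """Generic spreading activation from a single seed.

    graph maps each node to a list of neighbor nodes.
    Returns a dict node -> accumulated activation, seed excluded.
    """
    activation = {seed: 1.0}  # iteration 0: only the seed is active
    total = defaultdict(float)
    for _ in range(max_pulse):
        incoming = defaultdict(float)
        for node, value in activation.items():
            neighbors = graph.get(node, [])
            for neighbor in neighbors:
                # Each node splits its decayed activation among its neighbors.
                incoming[neighbor] += decay * value / len(neighbors)
        activation = dict(incoming)
        for node, value in activation.items():
            total[node] += value
    total.pop(seed, None)
    return dict(total)


# Hypothetical edges between resources named in the slides.
toy_graph = {
    "Claude_Monet": ["Musée d'Orsay", "Musée de l'Orangerie"],
    "Musée d'Orsay": ["Claude_Monet", "Vincent Van Gogh"],
    "Musée de l'Orangerie": ["Claude_Monet", "Vincent Van Gogh"],
    "Vincent Van Gogh": ["Musée d'Orsay", "Musée de l'Orangerie"],
}

scores = spread_activation(toy_graph, "Claude_Monet")
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, scores[node])
```

After two iterations the seed's direct neighbors hold the most activation, and the node two hops away has received a smaller share through both of them.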
Spreading activation basis – polycentric
[Figure: nodes shown include Claude_Monet, Musée d’Orsay, Musée de l’Orangerie, Vincent Van Gogh, …]
Propagation domain: artist, book, film, museum, river, television show, university, writer, …
[Figure: weighting of neighbor nodes by type and category. Nodes: Wheeler_School, Art Institute of Chicago, Gustave_Courbet, Cadmium_sulfide, Farmington_Mountain. Types: DBO:School, DBO:Museum, DBO:Artist, DBO:ChemicalSubstance, DBO:Mountain. Categories: cat:Impressionist painters, cat:Alumni_of_Beaux_Arts. Weights shown: 2, 0, 0, 0, 3 + 2]
• How to execute it fast on a very large graph?
Very large graph
Locate the processing
1. sparql endpoint = http://xxx/sparql
2. seed(s) = xxx_Beatles
3. compute the propagation domain (w(i,o))
(4. find a path between the seeds)
5. import path nodes & their neighbors
6. for (i = 1; i <= maxPulse; i++) {
7.   pulse
8.   if (sampleSize <= maxSampleSize) {
9.     extend the sample
10.  }
11. }
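The loop in steps 6 to 11 can be sketched as follows for a single seed. This is a simplified reading, not the published algorithm: the propagation domain (step 3) and the path search between seeds (step 4) are left out, the sample size is assumed to be the number of imported edges, and `fetch_neighbors` is a hypothetical stand-in for a remote SPARQL call.

```python
def explore(seed, fetch_neighbors, max_pulse=3, max_sample_size=4):
    """Pulse over a locally imported sample, extending it while it is small."""
    sample = {seed: list(fetch_neighbors(seed))}  # step 5: seed neighborhood
    activation = {seed: 1.0}
    scores = {}
    for _ in range(max_pulse):
        # Pulse: propagate only along edges already present in the sample.
        pulsed = {}
        for node, value in activation.items():
            neighbors = sample.get(node, [])
            for neighbor in neighbors:
                share = value / len(neighbors)
                pulsed[neighbor] = pulsed.get(neighbor, 0.0) + share
        activation = pulsed
        for node, value in activation.items():
            if node != seed:
                scores[node] = scores.get(node, 0.0) + value
        # Extend the sample around the most activated nodes,
        # as long as the local sample has not passed the size limit.
        for node in sorted(activation, key=activation.get, reverse=True):
            sample_size = sum(len(edges) for edges in sample.values())
            if sample_size > max_sample_size:
                break
            if node not in sample:
                sample[node] = list(fetch_neighbors(node))
    return scores, sample


# Toy "remote" graph; a real run would query the endpoint instead.
remote = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

scores, sample = explore("A", remote.__getitem__)
print(scores)
print(sorted(sample))
```

With the limit set to 4 edges, only the neighborhoods of A, B and C are imported; D still receives activation but is never expanded, and E is never reached. This mirrors the trade-off the `maxSampleSize` parameter controls.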
select distinct ?x ?y where {
  service <sparqlEndpoint> {
    select * where {
      ?a (<…wikiPageWikiLink> | ^<…wikiPageWikiLink>){0,X} :: $path ?b
      filter (?a = <resource1> && ?b = <resource2>)
    }
  }
  graph $path { ?x ?p ?y }
  filter (?x != <resource1> && ?x != <resource2>)
}
Path query using Kgram for polycentric spreading activation (SA)
• Analysis method
Analysis performed on a set of 100,000 queries
maxSampleSize
Similarity of top 100 results (Kendall-Tau) from one loading limit to another
[Chart: x-axis: triples loading limit, 0 to 20,000; left y-axis: response time in ms, 0 to 4,500; right y-axis: Kendall-Tau (KT), 0 to 1]
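Kendall-Tau compares two rankings by counting the item pairs they order the same way and the pairs they order differently. A minimal sketch, assuming both rankings contain exactly the same items with no ties, is given below; how the slides handle results that appear in only one of the two top-100 lists is not stated, and the function name `kendall_tau` is illustrative.

```python
from itertools import combinations


def kendall_tau(ranking_a, ranking_b):
    """Kendall tau between two rankings of the same items (no ties).

    Returns a value in [-1, 1]; 1 means the two orders are identical.
    """
    if set(ranking_a) != set(ranking_b):
        raise ValueError("both rankings must contain the same items")
    position_b = {item: index for index, item in enumerate(ranking_b)}
    concordant = discordant = 0
    # combinations() yields pairs (x, y) with x ranked before y in ranking_a.
    for x, y in combinations(ranking_a, 2):
        if position_b[x] < position_b[y]:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / pairs if pairs else 1.0


before = ["r1", "r2", "r3", "r4"]
after = ["r1", "r3", "r2", "r4"]  # one adjacent swap
print(kendall_tau(before, after))
```

A value that approaches 1 as the loading limit grows would indicate that importing more triples no longer changes the order of the top results.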
maxPulse
Similarity of top 100 results (shared results, KT) from one iteration to another
[Chart: x-axis: iterations, 1 to 10; left y-axis: shared results, 0 to 100; right y-axis: Kendall-Tau (KT), 0 to 1]
Query response time histogram
[Chart: response time axis in milliseconds, 0 to 20,000]
Algorithm visualization, available @ http://www.youtube.com/user/wearediscoveryhub
Polycentric query propagation visualization, iteration 0
• In red: Claude Monet
• In blue: Musée d’Orsay
• In purple: overlap
Polycentric query propagation visualization, iteration 6
• In red: Claude Monet
• In blue: Musée d’Orsay
• In purple: overlap
Polycentric queries, average distances of top results from each seed (semantic spreading activation).

                  Distance A   Distance B   Gap A - B   Max distance A   Max distance B
poly1 - top 10    1.53         1.68         0.34        /                /
poly2 - top 10    1.52         1.66         0.33        /                /
poly1 - top 100   1.90         2.12         0.49        2.60             2.60
poly2 - top 100   1.88         2.11         0.48        2.58             2.58
Influence of graph diameter on algorithm convergence
• We studied the convergence of the algorithm according to various graph metrics. For this purpose we generated many graphs with the GraphStream graph library. Conclusion: the diameter is crucial.
http://graphstream-project.org/doc/Generators/
Influence of graph diameter on algorithm convergence
[Chart: x-axis: iterations, 1 to 20; y-axis: results similarity, 0 to 1; one series per graph diameter: 4.14, 6.72, 9.94, 13.28, 15.43, 19.59, 22.03, 24.87, 28.85]
Influence of graph diameter on algorithm convergence
[Chart: x-axis: iterations, 1 to 20; y-axis: average rank variation, 0 to 80; one series per graph diameter: 4.14, 6.72, 9.94, 13.28, 15.43, 19.59, 22.03, 24.87, 28.85]
• Discovery Hub is an exploratory search engine that helps you discover things you might like or be interested in. It widens your cultural and knowledge horizons by revealing and explaining unexpected information.
• It lets you perform queries in an innovative way and helps you navigate rich results. As a hub, it redirects you to other platforms (YouTube, Deezer and more) so that you can benefit from your discoveries. The results are explained in depth through 3 explanatory features.
• Discovery Hub supports simple and composite explorations, i.e. starting from one or several items of interest. It offers advanced exploration modes such as serendipitous, multi-lingual and fine-grained ones, and is able to combine them.
1. Start from what you like or are interested in
2. Explore, discover, understand
3. Be redirected to great platforms to experience your discoveries
Book
Band
Film
Discovery Hub V1
Discovery Hub V2
• 3 features to understand the results: common properties
• 3 features to understand the results: Wikipedia cross-references
• 3 features to understand the results: explanatory graph
Internationalization
Serendipitous mode
[Figure: activation spreading from Claude_Monet, with nodes marked "?"; panels: Iteration 0, Iteration 1, Iteration 2]
Multi-lingual mode
dbpedia:Claude_Monet sameAs fr.dbpedia:Claude_Monet
Fine search mode
1960s_science_fiction_films
Films_set_on_the_Moon
Artificial_intelligence_in_fiction
Space_adventure_films
Top 4 films
Fine search mode
Directed by Stanley Kubrick
Top 4 films
Fine search mode
1960s_science_fiction_films
Films_set_on_the_Moon
Artificial_intelligence_in_fiction
Space_adventure_films
Directed by Stanley Kubrick
Top 4 films
Multi-criteria mode
« Surprise mode »
Multi-lingual
Fine-search
Demo videos, available @ http://www.youtube.com/user/wearediscoveryhub
• Monocentric queries: evaluated positively in the movie domain against another algorithm, the sSVM implemented in the MORE movie recommender
[Chart: rating scale from "Not interesting at all" to "Very interesting"]
Scores for partial lists
- Films_about_criticism_and_refusal_of_work
- Anti-modernist_films
- Fiction_with_unreliable_narrators
[Chart: "Interesting" score, 0 to 2, for Neutral and Personalized]
• Polycentric queries: evaluated positively
[Charts: Relevance, scale 0 to 3 from "Not interesting at all" to "Very interesting"; Discovery, scale 0 to 3 from "Not surprising at all" to "Very surprising"]
• Overlap between relevance and unexpectedness:
61.6% of results were rated as strongly relevant or relevant by the participants.
65% of results were rated as strongly unexpected or unexpected.
35.42% of results were rated both as strongly relevant or relevant and strongly unexpected or unexpected.
• Explanation features
[Chart: helpfulness, scale 0 to 3 from "Not helpful at all" to "Very helpful", for monocentric and polycentric queries; bars: common properties, Wiki-based, graph-based, overall]
Publications:
• Nicolas Marie, Fabien Gandon, Myriam Ribière, Florentin Rodio. Discovery Hub: on-the-fly linked data exploratory search. I-Semantics 2013, Graz, 4-6 September (paper).
• Nicolas Marie, Fabien Gandon, Damien Legrand, Myriam Ribière. Exploratory Search on the top of DBpedia chapters with the Discovery Hub Application. ESWC 2013, Montpellier, 26-30 May (demo + poster).
• Nicolas Marie, Olivier Corby, Fabien Gandon, Myriam Ribière. Composite interests exploration thanks to on-the-fly linked data spreading activation. Hypertext 2013, Paris, 1-3 May (paper).
• Clare J. Hooper, Nicolas Marie, Evangelos Kalampokis. Dissecting the Butterfly: Representation of Disciplines Publishing at the Web Science Conference Series. Web Science 2012, Northwestern University, Evanston, United States, 22-24 June (paper).
• Nicolas Marie, Fabien Gandon. Advanced social objects recommendation in multidimensional social networks. Social Object Workshop 2011, MIT, Boston, USA (paper).
ncmarie3&@gmail.com
http://ncmarie.tumblr.com
http://discoveryhub.co
Thank you!