Assigning Global Relevance Scores to DBpedia Facts. Philipp Langer, Patrick Schulze, Stefan George, Tobias Metzke, Ziawasch Abedjan, Gjergji Kasneci. DESWeb, 03/31/2014


Assigning Global Relevance Scores to DBpedia Facts

Philipp Langer, Patrick Schulze, Stefan George, Tobias Metzke, Ziawasch Abedjan, Gjergji Kasneci

DESWeb 03/31/2014


Structured Data

■ Advantages of structured data over unstructured data:
  □ Search for explicit facts
  □ Summarization of possibly interesting information
  □ Automated knowledge discovery

■ Google Knowledge Graph

■ RDF knowledge bases
  □ DBpedia, YAGO/NAGA

A handful of salient facts about the query entity.


Querying YAGO

■ Asking for classes to which Albert Einstein belongs


Querying DBpedia

■ Asking for classes to which Albert Einstein belongs

predicate   object
rdf:type    owl:Thing
rdf:type    dbpedia:Agent
rdf:type    dbpedia:Person
rdf:type    dbpedia:Scientist
rdf:type    umbel:Scientist
rdf:type    schema:Person
rdf:type    yago:Astronomer109818343
rdf:type    foaf:Person
rdf:type    19th-centuryAmericanPeople
rdf:type    19th-centuryGermanPeople


Challenge

select distinct ?p ?o where { dbpedia:Barack_Obama ?p ?o }

p                o
rdf:type         owl:Thing
rdf:type         dbpedia:Person
rdf:type         yago:Person100007846
...              ...
rdf:type         dbpedia:Politician
...              ...
dbpedia:spouse   dbpedia:Michelle_Obama

Web Documents

p                   o
owl:orderInOffice   President of the United States
dbpedia:type        dbpedia:Politician
dbpedia:spouse      dbpedia:Michelle_Obama
owl:birthPlace      dbpedia:Honolulu
dbpprop:residence   dbpedia:White_House
...                 ...
rdf:type            owl:Thing


Challenges

■ Big Data: DBpedia 3.8, ClueWeb corpus

■ Architecture: text extraction, score computation/ranking, query processing

■ Evaluation: conducting user studies

■ Ranking strategies: improve the ranking results


Overview

Languages:

• Python

• Java

• SPARQL

• JavaScript

Frameworks:

• Django

• Lucene

[Architecture diagram; labels: Web application (Django); DBpedia Endpoint (Apache Jena); Application Data (Postgres); Web corpus (Lucene Index); User Studies; Querying; Ranking strategies; Intra DBpedia strategies; Web Corpus strategies]


Ranking Facts

■ Query types:
  □ Subject queries - return all physicists
      SELECT ?s { ?s type Physicist }
  □ Property queries - return all facts related to Einstein
      SELECT ?p ?o { Albert_Einstein ?p ?o }

■ Ranking strategies
  □ Ranking by frequency and document frequency
  □ Ranking by information diversity
  □ Random walk
  □ Web-based co-occurrence statistics
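
The two query types can be illustrated on a small in-memory triple list. This is only a minimal sketch: the toy triples and the helper names `subject_query` and `property_query` are hypothetical and stand in for a real SPARQL endpoint.

```python
# Toy triple store of (subject, predicate, object) tuples.
# The data is illustrative only and not taken from DBpedia.
TRIPLES = [
    ("Albert_Einstein", "type", "Physicist"),
    ("Albert_Einstein", "spouse", "Mileva Maric"),
    ("Albert_Einstein", "residence", "Switzerland"),
    ("Isaac_Newton", "type", "Physicist"),
    ("Barack_Obama", "type", "Politician"),
]


def subject_query(triples, predicate, obj):
    """Mimics SELECT ?s { ?s <predicate> <obj> }: all subjects holding this fact."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def property_query(triples, subject):
    """Mimics SELECT ?p ?o { <subject> ?p ?o }: all facts about one subject."""
    return [(p, o) for s, p, o in triples if s == subject]


physicists = subject_query(TRIPLES, "type", "Physicist")
einstein_facts = property_query(TRIPLES, "Albert_Einstein")
print(physicists)
print(einstein_facts)
```

Both helpers return results in storage order, which is exactly the unranked behaviour that the ranking strategies are meant to improve.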


Ranking by frequency and document frequency

Subject document of „Albert Einstein“:

<Albert_Einstein>
  <topic> <Nobel_laureates>;
  <topic> <Theoretical_physicists>;
  <topic> <German_physicists>;
  <topic> <American_inventors>;
  <type> <Scientist>;
  <type> <Person>;
  <type> <Thing>;
  <residence> "Switzerland";
  <residence> "Austria-Hungary";
  <residence> "German Empire";
  <spouse> "Mileva Maric";
  ...

Predicate document of „topic“:

<Newton> <topic> <Theoretical_physicists>.
<Newton> <topic> <Nobel_laureates>.
<Newton> <topic> <Mathematicians>.
<Newton> <topic> <Optical_physicists>.
<Newton> <topic> <History_of_calculus>.
<Newton> <topic> <English_alchemists>.
<Einstein> <topic> <Theoretical_physicists>.
<Einstein> <topic> <Nobel_laureates>.
<Einstein> <topic> <German_physicists>.
<Einstein> <topic> <American_inventors>.

Object document of „Theoretical physicists“:

<Isaac_Newton> <topic> <Theoretical_physicists>.
<Albert_Einstein> <topic> <Theoretical_physicists>.
<Bruno_Coppi> <topic> <Theoretical_physicists>.
<Ravi_Gomatam> <topic> <Theoretical_physicists>.
...

[Shady et al ESWC’11]
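
The grouping of triples into subject, predicate, and object documents can be sketched as follows. This is an illustration under stated assumptions, not the cited method: the function name `build_documents` and the toy triples are hypothetical, and the scoring applied on top of these documents is not shown because the slides do not specify it.

```python
from collections import defaultdict

# Toy triples (subject, predicate, object); illustrative data only.
TRIPLES = [
    ("Albert_Einstein", "topic", "Nobel_laureates"),
    ("Albert_Einstein", "topic", "Theoretical_physicists"),
    ("Albert_Einstein", "type", "Scientist"),
    ("Isaac_Newton", "topic", "Theoretical_physicists"),
    ("Isaac_Newton", "topic", "Mathematicians"),
]


def build_documents(triples):
    """Group every triple into three 'documents', keyed by its subject,
    its predicate, and its object respectively."""
    subject_docs = defaultdict(list)
    predicate_docs = defaultdict(list)
    object_docs = defaultdict(list)
    for s, p, o in triples:
        subject_docs[s].append((p, o))
        predicate_docs[p].append((s, o))
        object_docs[o].append((s, p))
    return subject_docs, predicate_docs, object_docs


subject_docs, predicate_docs, object_docs = build_documents(TRIPLES)
print(len(subject_docs["Albert_Einstein"]))         # number of facts about Einstein
print(len(predicate_docs["topic"]))                 # number of triples using "topic"
print(len(object_docs["Theoretical_physicists"]))   # number of triples with this object
```

Once the documents exist, term frequency and document frequency can be counted over them like over ordinary text documents.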


Ranking by frequency and document frequency

■ Subject queries:
  □ Global relevance

Isaac Newton
academicAdvisor ...; birthDate ...; birthPlace ...; comment ...; ethnicity ...; field ...; influenced ...; influencedBy ...; knownFor ...; label ...; notableStudent ...; subject ...; subject ...; type ...;

Ravi Gomatam
subject ...; subject ...; subject ...; subject ...; subject ...;


Limitations for Property Queries

■ Property queries:
  □ Globally relevant but distinctive to the given subject
    – type Person vs. type Scientist
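
Why "type Person" is less distinctive than "type Scientist" can be made concrete with an inverse-document-frequency style weight. The slides do not give a scoring formula, so the IDF weight below is swapped in purely as an illustration; the function name and the toy triples are hypothetical.

```python
import math

# Toy triples (subject, predicate, object); illustrative data only.
TRIPLES = [
    ("Albert_Einstein", "type", "Person"),
    ("Albert_Einstein", "type", "Scientist"),
    ("Isaac_Newton", "type", "Person"),
    ("Isaac_Newton", "type", "Scientist"),
    ("Barack_Obama", "type", "Person"),
    ("Michelle_Obama", "type", "Person"),
]


def inverse_document_frequency(triples, predicate, obj):
    """log(N / n): N is the number of distinct subjects, n the number of
    subjects that carry the fact (predicate, obj)."""
    subjects = {s for s, _, _ in triples}
    matching = {s for s, p, o in triples if p == predicate and o == obj}
    if not matching:
        raise ValueError("fact does not occur in the triple set")
    return math.log(len(subjects) / len(matching))


idf_person = inverse_document_frequency(TRIPLES, "type", "Person")
idf_scientist = inverse_document_frequency(TRIPLES, "type", "Scientist")
print(idf_person, idf_scientist)
```

In this toy data every subject is a Person, so that fact receives weight zero, while Scientist applies to only half of the subjects and is weighted higher.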


Ranking by diversity

■ Following a probabilistic model
  □ Property queries:
    – Properties and objects that are as discriminative as possible
  □ Subject queries:


Random Walk Model

■ Consider the knowledge base as a directed graph
  □ Already applied in [Kasneci CIKM’09]
  □ Problem: literals have no outgoing link

■ Use Wiki Pagelinks and Infobox Property Mappings
  □ Entities with high indegree, such as countries, are favored
    – Good for subject queries
    – Bad for property queries
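
The effect described above, where high-indegree entities are favored and literals act as dead ends, can be reproduced with a generic PageRank-style power iteration. This is a minimal sketch and not necessarily the model used in the talk or in [Kasneci CIKM’09]: the damping factor, the uniform redistribution of dead-end mass, the function name, and the toy graph are all assumptions.

```python
def random_walk_scores(edges, damping=0.85, iterations=100):
    """PageRank-style power iteration over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Nodes without outgoing links (e.g. literals) would trap the walker,
        # so their probability mass is spread uniformly over all nodes.
        dangling_mass = sum(score[node] for node in nodes if not out_links[node])
        base = (1.0 - damping + damping * dangling_mass) / n
        new_score = {node: base for node in nodes}
        for source in nodes:
            targets = out_links[source]
            for target in targets:
                new_score[target] += damping * score[source] / len(targets)
        score = new_score
    return score


# Toy graph: three physicists link to a country; one edge points to a literal.
EDGES = [
    ("Albert_Einstein", "Germany"),
    ("Albert_Einstein", "Max_Planck"),
    ("Albert_Einstein", '"1879-03-14"'),
    ("Max_Planck", "Germany"),
    ("Werner_Heisenberg", "Germany"),
]

scores = random_walk_scores(EDGES)
for node, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {value:.3f}")
```

In this toy graph the country collects the most probability mass, which matches the observation that such a ranking suits subject queries better than property queries.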


Co-occurrence statistics

Web Documents

■ Lemur Project ClueWeb09 Category-B web corpus
  □ 50 million web documents (1.5 TB)
  □ Only English-language documents
  □ Includes approx. 2.7 million Wikipedia articles

■ Create an inverted index

■ Consider different word distance limits as documents

■ Rank subject-object pairs
  □ „Albert Einstein“ and „Physicist“
  □ Store only pairwise co-occurrence:
  □ Compute frequency of s:
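
Counting co-occurrences within a word distance limit can be sketched as follows. The exact formulas are missing from the slide text, so normalizing the pair count by the frequency of the subject is an assumption, as are the function names, the single-token terms, and the three toy documents.

```python
def cooccurrence_count(tokens, term_a, term_b, max_distance):
    """Count position pairs where term_a and term_b are at most
    max_distance tokens apart."""
    positions_a = [i for i, token in enumerate(tokens) if token == term_a]
    positions_b = [i for i, token in enumerate(tokens) if token == term_b]
    return sum(
        1 for i in positions_a for j in positions_b if abs(i - j) <= max_distance
    )


def score_pair(documents, subject, obj, max_distance):
    """Pairwise co-occurrence count divided by the frequency of the subject
    (assumed normalization; returns 0.0 if the subject never occurs)."""
    pair_count = 0
    subject_count = 0
    for text in documents:
        tokens = text.lower().split()
        subject_count += tokens.count(subject)
        pair_count += cooccurrence_count(tokens, subject, obj, max_distance)
    return pair_count / subject_count if subject_count else 0.0


# Toy corpus; illustrative only.
DOCUMENTS = [
    "einstein was a physicist who developed relativity",
    "the physicist met einstein in berlin",
    "einstein played the violin",
]

wide_window = score_pair(DOCUMENTS, "einstein", "physicist", max_distance=3)
narrow_window = score_pair(DOCUMENTS, "einstein", "physicist", max_distance=2)
print(wide_window, narrow_window)
```

Shrinking the window lowers the score here because one of the two co-occurrences spans three tokens, which shows why different distance limits are worth comparing.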


Evaluation

■ User study 1
  □ 8 queries
  □ all results
  □ 12 users
  □ 19 approaches/configurations

■ 1-4: irrelevant to highly relevant

■ User study 2
  □ 8+20 queries
  □ top-10 results of best 4 approaches side-by-side
  □ 10 users
  □ Best 3 approaches from user study 1


Top 4 Approaches in User study 1


User study 2


Results Example: Theoretical Physicists

Subject
Albert Einstein
Isaac Newton
Galileo Galilei
James Clerk Maxwell
Richard Feynman
Stephen Hawking
Max Planck
Enrico Fermi
Werner Heisenberg
Pierre-Simon Laplace

DBpedia Random Walk Model


Results Example: Albert Einstein

DBpedia

predicate   object
rdf:type    owl:Thing
rdf:type    dbpedia:Agent
rdf:type    dbpedia:Person
rdf:type    dbpedia:Scientist
rdf:type    umbel:Scientist
rdf:type    schema:Person
rdf:type    yago:Astronomer109818343
rdf:type    foaf:Person
rdf:type    19th-centuryAmericanPeople
rdf:type    19th-centuryGermanPeople

Co-occurrence statistics

predicate          object
fields             Physics
field              Physics
deathPlace         United States
placeOfDeath       United States
shortDescription   Physicists
description        Physicist
type               Scientist
ethnicity          Jewish
subject            Einstein family
residence          Switzerland


Conclusions

■ Investigated multiple approaches to rank DBpedia facts
  □ Information theory, statistical reasoning, random walk, and co-occurrence statistics in web documents

■ DBpedia knowledge base already provides enough information to improve the ranking of results

■ Improvement of property queries through web-based co-occurrence statistics

■ We provide the annotated datasets at
  □ https://www.hpi.uni-potsdam.de/naumann/sites/dbpedia/