Walking Linked Data: a graph traversal approach to explain clusters

Ilaria Tiddi, Mathieu d’Aquin, Enrico Motta


My slides at the Consuming Linked Data workshop (COLD2014) at ISWC2014

Page 1: Walking Linked Data: a graph traversal approach to explain clusters

Walking Linked Data: a graph traversal approach to explain clusters

Ilaria Tiddi, Mathieu d’Aquin, Enrico Motta

Page 2: Walking Linked Data: a graph traversal approach to explain clusters

Problem: explaining patterns

Data: women/men literacy rate from UNESCO [1]

In which countries are men more educated than women?

Page 3: Walking Linked Data: a graph traversal approach to explain clusters

Problem: explaining patterns

Data: women/men literacy rate from UNESCO [1]

In which countries are men more educated than women?

The yellow countries.

[Figure: world map colored by literacy gap. Legend (Education): Men, Women, Equal]

Page 4: Walking Linked Data: a graph traversal approach to explain clusters

Problem: explaining patterns

Data: women/men literacy rate from UNESCO [1]

In which countries are men more educated than women?

[Figure: world map colored by literacy gap. Legend (Education): Men, Women, Equal]

How do you know?

Page 5: Walking Linked Data: a graph traversal approach to explain clusters

Problem: explaining behaviors

We explain thanks to our own (background) knowledge.

Can we do the same with the knowledge from Linked Data?

Page 6: Walking Linked Data: a graph traversal approach to explain clusters

Linked Data contain explanations, but where?

[Figure: the countries of the dataset as nodes: :India, :UK, :Ethiopia, :US, :Somalia]

Page 7: Walking Linked Data: a graph traversal approach to explain clusters

Linked Data contain explanations, but where?

[Figure: the five countries (:India, :UK, :Ethiopia, :US, :Somalia), each linked by sameAs to its db: counterpart (db:India, db:UK, db:Ethiopia, db:US, db:Somalia)]

Page 8: Walking Linked Data: a graph traversal approach to explain clusters

Linked Data contain explanations, but where?

[Figure: the five countries (:India, :UK, :Ethiopia, :US, :Somalia), each linked by sameAs to its db: counterpart (db:India, db:UK, db:Ethiopia, db:US, db:Somalia); each db: entity has an outgoing dc:subject edge]

Page 9: Walking Linked Data: a graph traversal approach to explain clusters

Linked Data contain explanations, but where?

[Figure: the five countries (:India, :UK, :Ethiopia, :US, :Somalia), each linked by sameAs to its db: counterpart; the db: entities point through dc:subject to categories, which are connected through skos:relatedMatch to db:Category:LeastDevelopedCountries and db:Category:LiberalCountries]

Page 10: Walking Linked Data: a graph traversal approach to explain clusters

Linked Data contain explanations, but where?

[Figure: the five countries (:India, :UK, :Ethiopia, :US, :Somalia), each linked by sameAs to its db: counterpart; each db: entity has a dbp:gdp edge to one of the values 600/pp, 1,200/pp, 3,800/pp, 36,000/pp, 49,000/pp]

Page 11: Walking Linked Data: a graph traversal approach to explain clusters

Linked Data contain explanations, but where?

[Figure: the five countries (:India, :UK, :Ethiopia, :US, :Somalia), each linked by sameAs to its db: counterpart and through dbp:gdp to one of the values 600/pp, 1,200/pp, 3,800/pp, 36,000/pp, 49,000/pp; the values are grouped, with a ≤ marker on 1,200/pp and a ≥ marker on 49,000/pp]

Page 12: Walking Linked Data: a graph traversal approach to explain clusters

Looking for explanations in the graph

Given a graph of Linked Data where

URIs are nodes

RDF properties are edges

[Figure: :India, :Ethiopia and :Somalia connected through sameAs, dc:subject, dbp:gdp and skos:related edges to values such as 4,000/pp and cat:LeastDevelopedCountries]
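A minimal sketch of this view of Linked Data, assuming the graph is held in memory as a nested dictionary. The triples below are illustrative only (which country links to which category is not recoverable from the slide), and `build_graph` and `follow` are hypothetical helper names, not part of Dedalo.

```python
from collections import defaultdict

# Illustrative triples: URIs are nodes, RDF properties are edges.
TRIPLES = [
    (":Ethiopia", "owl:sameAs", "db:Ethiopia"),
    (":Somalia", "owl:sameAs", "db:Somalia"),
    ("db:Ethiopia", "dc:subject", "cat:LeastDevelopedCountries"),
    ("db:Somalia", "dc:subject", "cat:LeastDevelopedCountries"),
]


def build_graph(triples):
    """Index triples as graph[subject][property] -> set of objects."""
    graph = defaultdict(lambda: defaultdict(set))
    for subject, prop, obj in triples:
        graph[subject][prop].add(obj)
    return graph


def follow(graph, node, path):
    """Return the nodes reached from `node` by walking `path` (a tuple of properties)."""
    frontier = {node}
    for prop in path:
        frontier = {o for n in frontier for o in graph.get(n, {}).get(prop, ())}
    return frontier


graph = build_graph(TRIPLES)
reached = follow(graph, ":Ethiopia", ("owl:sameAs", "dc:subject"))
```

Here a path is simply a sequence of edge labels, so walking `owl:sameAs` and then `dc:subject` from `:Ethiopia` ends at the category node.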

Page 13: Walking Linked Data: a graph traversal approach to explain clusters

Looking for explanations in the graph

[Figure: :India, :Ethiopia and :Somalia connected through sameAs, dc:subject, dbp:gdp and skos:related edges to values such as 4,000/pp and cat:LeastDevelopedCountries]

Find

the ending value pointed to by most entities in the cluster

the best path to follow in order to further expand the graph
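The first of these two goals can be sketched as a simple count, assuming we already know which end values each cluster entity reaches through a given path. The `reached` mapping is made-up data and `most_pointed_value` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical end values reached from each cluster entity via one path.
reached = {
    ":Ethiopia": {"cat:LeastDevelopedCountries"},
    ":Somalia": {"cat:LeastDevelopedCountries"},
    ":India": {"cat:SouthAsianCountries"},
}


def most_pointed_value(reached_by_entity):
    """Return (value, count) for the value reached by the most entities, or None."""
    counts = Counter(v for values in reached_by_entity.values() for v in values)
    if not counts:
        return None
    return counts.most_common(1)[0]


best = most_pointed_value(reached)
```

Each entity contributes at most once per value because its reached values are stored as a set.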

Page 14: Walking Linked Data: a graph traversal approach to explain clusters

A* algorithm for Linked Data

Best-first search algorithm

Given an initial node and a final node, find the least expensive path between them

Path cost function: f(path) = g(path) + h(path), where g(path) is the actual cost and h(path) is the future cost

Without knowledge of the graph

Search in the graph for the best path and explanation

The graph is iteratively built by URI dereferencing

No need to know the Linked Data graph a priori
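For reference, a textbook A* can be sketched as below. This is the generic algorithm, not Dedalo's adaptation of it: `neighbors` and `h` are caller-supplied functions (assumed names), and `TOY` is an invented graph. Because neighbors are requested only when a node is expanded, the graph does not have to be known in advance, which mirrors the point about building it iteratively.

```python
import heapq
from itertools import count


def a_star(start, goal, neighbors, h):
    """Generic A*: neighbors(node) yields (next_node, edge_cost); h(node) estimates the remaining cost."""
    tie = count()  # tie-breaker so the heap never has to compare nodes or paths
    frontier = [(h(start), next(tie), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        _, _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was already found
        for nxt, cost in neighbors(node):  # the graph is discovered lazily here
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                f = new_g + h(nxt)  # f = g + h
                heapq.heappush(frontier, (f, next(tie), new_g, nxt, path + [nxt]))
    return None, float("inf")


# Invented toy graph; with h = 0 the search behaves like uniform-cost search.
TOY = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 1)], "D": []}
path, cost = a_star("A", "D", lambda n: TOY[n], lambda n: 0)
```

With a zero heuristic the estimate is trivially admissible, so the returned path is the cheapest one in the toy graph.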

Page 15: Walking Linked Data: a graph traversal approach to explain clusters

Dedalo: an A* process for Linked Data

Building the graph (URI dereferencing)

Choosing the best path

Finding the best explanation

Iteratively building a Linked Data graph and looking for an explanation of the pattern

Page 16: Walking Linked Data: a graph traversal approach to explain clusters

Dedalo: an A* process for Linked Data

Dereference URIs through HTTP GET

take an entity

read its properties and values

add them to the graph

[Figure: the initial entities (:India, …) are linked through owl:sameAs to db:Ethiopia; dereferencing db:Ethiopia returns dc:subject db:Category:AfricanCountries and dbp:gdp 1,200]

Page 17: Walking Linked Data: a graph traversal approach to explain clusters

Dedalo: an A* process for Linked Data

Dereference URIs through HTTP GET

take an entity

read its properties and values

add them to the graph

[Figure: the initial entities (:India, …) are linked through owl:sameAs to db:Ethiopia; dereferencing db:Ethiopia returns dc:subject db:Category:AfricanCountries and dbp:gdp 1,200, which are added to the graph]
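The three steps above can be sketched as follows. To keep the example runnable offline, the HTTP GET is replaced by a lookup in an in-memory dictionary: `FAKE_WEB`, `fake_fetch` and `dereference` are hypothetical names, and a real fetcher would instead request the URI with an RDF `Accept` header and parse the response. Only the db:Ethiopia values come from the slide.

```python
from collections import defaultdict

# Stand-in for the Web of Data: what dereferencing each URI would return.
FAKE_WEB = {
    "db:Ethiopia": [
        ("dc:subject", "db:Category:AfricanCountries"),
        ("dbp:gdp", "1,200"),
    ],
}


def fake_fetch(uri):
    """Assumed fetcher: returns (property, value) pairs for a URI, or nothing if unknown."""
    return FAKE_WEB.get(uri, [])


def dereference(graph, uri, fetch=fake_fetch):
    """Take an entity, read its properties and values, add them to the graph."""
    added = []
    for prop, value in fetch(uri):
        if value not in graph[uri][prop]:
            graph[uri][prop].add(value)
            added.append((prop, value))
    return added  # only the edges that were new


graph = defaultdict(lambda: defaultdict(set))
graph[":Ethiopia"]["owl:sameAs"].add("db:Ethiopia")
added = dereference(graph, "db:Ethiopia")
```

Passing the fetcher as a parameter keeps the graph-building logic independent of how the data is actually retrieved.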

Page 18: Walking Linked Data: a graph traversal approach to explain clusters

Dedalo: an A* process for Linked Data

Collect new paths (sequences of edges)

add the new property to the previous path

owl:sameAs → dc:subject

owl:sameAs → dbp:gdp

evaluate new paths with Entropy [1]

ent(owl:sameAs → dc:subject)

ent(owl:sameAs → dbp:gdp)

add to the pile of paths (the first one is chosen)

owl:sameAs → dc:subject

owl:sameAs → dbp:gdp

owl:sameAs

[1] Tiddi et al., ESWC 2014

[Figure: the graph built so far from the initial entities (:India, …)]
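A rough sketch of this step, under loud assumptions: the entropy measure is defined in [1] and is not reproduced on the slide, so plain Shannon entropy over the values a path reaches is used here as a stand-in, and sorting lowest-first is an arbitrary choice. `values_by_path` is invented data, and `extend_paths` and `shannon_entropy` are hypothetical helper names.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the distribution of `values`; a stand-in for the measure in [1]."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extend_paths(path, new_properties):
    """Add each newly discovered property to the previous path."""
    return [path + (prop,) for prop in new_properties]


# Invented values reached by each candidate path, one per entity.
values_by_path = {
    ("owl:sameAs", "dc:subject"): ["cat:A", "cat:A", "cat:B", "cat:B"],
    ("owl:sameAs", "dbp:gdp"): ["600", "1,200", "3,800", "36,000"],
}

candidates = extend_paths(("owl:sameAs",), ["dc:subject", "dbp:gdp"])
# Pile of paths ordered by score; the first one is chosen for the next expansion.
pile = sorted(candidates, key=lambda p: shannon_entropy(values_by_path[p]))
```

In this toy data the category path reaches two repeated values while the GDP path reaches four distinct ones, so the two paths get different scores and the ordering of the pile is well defined.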

Page 19: Walking Linked Data: a graph traversal approach to explain clusters

Dedalo: an A* process for Linked Data

Build explanations (path + final nodes)

Each of the values the new path points to

e1 = owl:sameAs → dc:subject

e2 = owl:sameAs → dc:subject

Compare numerical values if the property is a datatype

e2 = owl:sameAs → dbp:gdp ≥ 1,200

e3 = owl:sameAs → dbp:gdp ≤ 1,200

Rank explanations according to the F-Measure

[Figure: the initial URIs (countries), with the set of URIs pointing to a value (db:Category:SouthAsianCountries, db:Category:AfricanCountries, 1,200) overlapping the set of URIs in the cluster]
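The ranking step can be sketched with the standard F-Measure (harmonic mean of precision and recall), comparing the URIs that match an explanation with the URIs in the cluster. Whether Dedalo uses exactly this balanced form is an assumption; the entity names, GDP figures and the helper names `f_measure` and `matching_entities` are invented for illustration.

```python
import operator


def f_measure(matching, cluster):
    """Standard F1 between the URIs matching an explanation and the URIs in the cluster."""
    matching, cluster = set(matching), set(cluster)
    overlap = len(matching & cluster)
    if overlap == 0:
        return 0.0
    precision = overlap / len(matching)
    recall = overlap / len(cluster)
    return 2 * precision * recall / (precision + recall)


def matching_entities(value_by_entity, compare, threshold):
    """Entities whose numeric value satisfies `compare(value, threshold)`."""
    return {e for e, v in value_by_entity.items() if compare(v, threshold)}


# Invented data: GDP per entity, and the cluster we want to explain.
gdp = {":c1": 600, ":c2": 1200, ":c3": 3800, ":c4": 36000}
cluster = {":c1", ":c2", ":c3"}

explanations = {
    "gdp <= 1200": matching_entities(gdp, operator.le, 1200),
    "gdp >= 1200": matching_entities(gdp, operator.ge, 1200),
}
ranked = sorted(explanations, key=lambda e: f_measure(explanations[e], cluster), reverse=True)
```

A numeric value thus yields two candidate explanations, one per comparison direction, and the F-Measure decides which of them describes the cluster better.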

Page 20: Walking Linked Data: a graph traversal approach to explain clusters

Dedalo: experiments

Explanation                                                                F-Measure   Time (s)

Countries where men are more educated than women
skos:exactMatch → dbp:hdiRank ≥ 126                                        87.8%       197
skos:exactMatch → dc:subject → db:Category:Least_Developed_Countries       74.7%       524
skos:exactMatch → dbp:gdpPPPPerCapitaRank ≥ 89                             68.3%       269

Countries where women are more educated than men
skos:exactMatch → dbp:hdiRank ≤ 119                                        63.4%       198
skos:exactMatch → dbp:gdpPPPPerCapitaRank ≤ 56                             62.3%       236

Countries where education is equal
skos:exactMatch → dbp:gdpPPPRank ≥ 64                                      62.0%       234
skos:exactMatch → dbp:gdpPPPPerCapitaRank ≥ 29                             61.0%       268

Page 21: Walking Linked Data: a graph traversal approach to explain clusters

Conclusions and future work

Dedalo, an A* process to search for explanations within Linked Data

From a pattern to explain

Finds the path to the best explanation

Using Entropy and F-Measure

Focusing on the bias introduced by incomplete data [2]

Combining atomic explanations [3]

Evaluating Dedalo on a large use case: Google Trends

[2, 3] Tiddi et al., EKAW 2014

Page 22: Walking Linked Data: a graph traversal approach to explain clusters

Thanks! Questions?