Dedalo: looking for Clusters’ Explanations in a Labyrinth of Linked Data

Ilaria Tiddi, Mathieu d’Aquin, Enrico Motta

Knowledge Media Institute, The Open University

May 28, 2014


Presentation of Dedalo at the Extended Semantic Web Conference 2014 (ESWC2014) in Crete.


The Knowledge Discovery process

• Explaining patterns requires background knowledge.

• Background knowledge is usually provided by the experts.

• Background knowledge comes from different domains.

• Experts might not be aware of some background knowledge.

Explaining clusters: an example

Authors clustered according to the papers they wrote together.

How to explain those clusters?

Explaining clusters – the easy solution

Use an expert

“Each cluster represents a research group in KMi.”

Can one trust those experts?

Explaining clusters – the nice solution

Use Inductive Logic Programming (ILP)

E+ (positive examples): attendsESWC(M.dAquin). attendsESWC(V.Lopez).

E− (negative examples): attendsESWC(E.Motta).

B (background knowledge about E = E+ ∪ E−): submitted(M.dAquin). submitted(V.Lopez). submitted(E.Motta). accepted(V.Lopez). accepted(M.dAquin).

Learn an explanation H for the relation attendsESWC(X) that is complete ($B \cup H \models E^+$) and consistent ($B \cup H \not\models E^-$):

attendsESWC(X) <- submitted(X) ∧ accepted(X)
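The two conditions can be checked mechanically. The minimal Python sketch below encodes the slide's facts as (predicate, constant) pairs, an assumption that only works for unary predicates, and tests a given rule body; it does not search for rules the way a real ILP system would.

```python
# Toy encoding of the slide's example: unary facts as (predicate, constant) pairs.
positives = {"M.dAquin", "V.Lopez"}   # E+: attendsESWC(X) should hold
negatives = {"E.Motta"}               # E-: attendsESWC(X) should not hold
background = {                        # B
    ("submitted", "M.dAquin"), ("submitted", "V.Lopez"), ("submitted", "E.Motta"),
    ("accepted", "V.Lopez"), ("accepted", "M.dAquin"),
}

def covers(body, x, facts):
    """The rule head holds for x if every predicate of the body holds for x in B."""
    return all((pred, x) in facts for pred in body)

def is_complete(body, pos, facts):
    """B ∪ H entails every positive example."""
    return all(covers(body, x, facts) for x in pos)

def is_consistent(body, neg, facts):
    """B ∪ H entails no negative example."""
    return not any(covers(body, x, facts) for x in neg)

# H: attendsESWC(X) <- submitted(X) ∧ accepted(X)
rule_body = ("submitted", "accepted")
print(is_complete(rule_body, positives, background))          # True
print(is_consistent(rule_body, negatives, background))        # True
# The shorter rule attendsESWC(X) <- submitted(X) also covers E.Motta:
print(is_consistent(("submitted",), negatives, background))   # False
```

Dropping accepted(X) from the body keeps the rule complete but makes it inconsistent, which is why both literals are needed.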

Explaining clusters – still the nice solution

E+ (positive examples): inMyCluster(M.dAquin). inMyCluster(V.Lopez). inMyCluster(M.Sabou).

E− (negative examples): inMyCluster(M.Fernandez). inMyCluster(H.Saif). inMyCluster(C.Pedrinaci). inMyCluster(J.Domingue).

B (background knowledge about E = E+ ∪ E−): ?

inMyCluster(X) <- ?

Explaining clusters – the cool solution

Integrate ILP with Linked Data

Explaining clusters – the cool solution

E+ (positive examples): inMyCluster(M.dAquin). inMyCluster(V.Lopez). inMyCluster(M.Sabou).

E− (negative examples): inMyCluster(M.Fernandez). inMyCluster(H.Saif). inMyCluster(C.Pedrinaci). inMyCluster(J.Domingue).

B (background knowledge about E = E+ ∪ E−): topic(M.dAquin, SemanticWeb). topic(M.Sabou, SemanticWeb). topic(V.Lopez, SemanticWeb). topic(H.Saif, SocialWeb). topic(C.Pedrinaci, SemanticWebServices). topic(J.Domingue, SemanticWebServices). topic(M.Fernandez, SocialWeb).

inMyCluster(X) <- topic(X, SemanticWeb)
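With the topic facts in B, a single-literal rule can be found by brute force. A minimal sketch, assuming rules restricted to one topic(X, v) literal (a simplification of the ILP hypothesis space, not a general ILP learner):

```python
from collections import defaultdict

positives = {"M.dAquin", "V.Lopez", "M.Sabou"}                        # E+
negatives = {"M.Fernandez", "H.Saif", "C.Pedrinaci", "J.Domingue"}    # E-
topic = {                                                             # B
    ("M.dAquin", "SemanticWeb"), ("M.Sabou", "SemanticWeb"), ("V.Lopez", "SemanticWeb"),
    ("H.Saif", "SocialWeb"), ("C.Pedrinaci", "SemanticWebServices"),
    ("J.Domingue", "SemanticWebServices"), ("M.Fernandez", "SocialWeb"),
}

def learn_topic_rules(pos, neg, facts):
    """Values v for which inMyCluster(X) <- topic(X, v) is complete and consistent."""
    people_by_value = defaultdict(set)
    for person, value in facts:
        people_by_value[value].add(person)
    return [value for value, people in sorted(people_by_value.items())
            if pos <= people             # complete: covers all of E+
            and not (people & neg)]      # consistent: covers none of E-

print(learn_topic_rules(positives, negatives, topic))  # ['SemanticWeb']
```

This only succeeds because the toy B already contains the discriminating fact; with real Linked Data such a fact may sit several properties away from the clustered items.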

Is this enough?

Producing Linked Data Explanations

People working in the same place, on similar topics, or on the same project are likely to write papers together.

Producing Linked Data Explanations

People working under the same person or with the same partner are likely to write papers together.

Producing Linked Data Explanations

People working under people interested in the same thing write papers together.

Integrating ILP and Linked Data

Add to B each Linked Data explanation $h_i = \langle p_k \rangle.\langle v_k \rangle$*, where:

• $p_k$ (path): a chain of RDF properties, $p_k = \{prop_0 \to prop_1 \to \dots \to prop_n\}$

• $v_k$ (value): the final instance reached by the path

• roots($h_i$): the elements having $h_i$ in common; for the cluster $C_i$ above, roots($h_i$) = {ou:M.dAquin, ou:V.Lopez, ou:M.Sabou}

* paths may be spread across different datasets

Example: $h_i$ = 〈ou:project→ou:ledBy→foaf:topic〉.〈edu:SemanticWeb〉, with $p_k$ = 〈ou:project→ou:ledBy→foaf:topic〉 and $v_k$ = 〈edu:SemanticWeb〉

Building each $h_i$:
– how?
– which chains of properties?
– where to find the good ones?
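To make the definitions concrete, here is a minimal Python sketch that represents $p_k$ as a tuple of properties, follows it over a list of triples, and computes roots($h_i$) for a given $v_k$. The triples and the intermediate nodes (ou:ProjA, ou:PersonX and so on) are hypothetical toy data, not the KMi dataset, and dereferencing real URIs across datasets is not modelled.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, property, object); not real KMi data.
triples = [
    ("ou:M.dAquin", "ou:project", "ou:ProjA"), ("ou:V.Lopez", "ou:project", "ou:ProjA"),
    ("ou:M.Sabou", "ou:project", "ou:ProjB"), ("ou:H.Saif", "ou:project", "ou:ProjC"),
    ("ou:ProjA", "ou:ledBy", "ou:PersonX"), ("ou:ProjB", "ou:ledBy", "ou:PersonY"),
    ("ou:ProjC", "ou:ledBy", "ou:PersonZ"),
    ("ou:PersonX", "foaf:topic", "edu:SemanticWeb"),
    ("ou:PersonY", "foaf:topic", "edu:SemanticWeb"),
    ("ou:PersonZ", "foaf:topic", "edu:SocialWeb"),
]

idx = defaultdict(set)                  # (subject, property) -> {objects}
for s, p, o in triples:
    idx[(s, p)].add(o)

def follow(start, path, idx):
    """All nodes reached from `start` by following the property chain `path`."""
    frontier = {start}
    for prop in path:
        frontier = {o for node in frontier for o in idx.get((node, prop), ())}
    return frontier

def roots(path, value, items, idx):
    """Items from which `path` leads to `value`, i.e. those sharing <path>.<value>."""
    return {r for r in items if value in follow(r, path, idx)}

items = {"ou:M.dAquin", "ou:V.Lopez", "ou:M.Sabou", "ou:H.Saif"}
h_path = ("ou:project", "ou:ledBy", "foaf:topic")   # p_k
print(sorted(roots(h_path, "edu:SemanticWeb", items, idx)))
# ['ou:M.Sabou', 'ou:M.dAquin', 'ou:V.Lopez']
```

Note that two different projects (ou:ProjA, ou:ProjB) lead to the same value, which is why the shared explanation only appears at the end of the three-property chain.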

Dedalo – An iterative Linked Data traversal

Scoring hypotheses

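WRacc¹ (Weighted Relative Accuracy), where R is the set of clustered items and $C_i$ the cluster to explain:

$$\mathrm{WRacc}(h_i) = \frac{|\mathrm{roots}(h_i)|}{|R|} \left( \frac{|\mathrm{roots}(h_i) \cap C_i|}{|\mathrm{roots}(h_i)|} - \frac{|C_i|}{|R|} \right)$$

¹ Geng et al. (2006). Interestingness measures for data mining: A survey.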
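The formula is the coverage of $h_i$ times its gain in precision over the prior probability of $C_i$. A minimal Python sketch, applied to the toy people from the ILP example (so |R| = 7 and |$C_i$| = 3; these are illustration values, not data from the experiments):

```python
def wracc(roots_h, cluster, population_size):
    """WRacc(h_i) = |roots|/|R| * (|roots ∩ C_i|/|roots| - |C_i|/|R|)."""
    if not roots_h:
        return 0.0                                  # h_i covers nothing
    coverage = len(roots_h) / population_size       # |roots(h_i)| / |R|
    precision = len(roots_h & cluster) / len(roots_h)
    prior = len(cluster) / population_size          # |C_i| / |R|
    return coverage * (precision - prior)

R = {"M.dAquin", "V.Lopez", "M.Sabou", "M.Fernandez", "H.Saif", "C.Pedrinaci", "J.Domingue"}
C_i = {"M.dAquin", "V.Lopez", "M.Sabou"}

# topic(X, SemanticWeb) covers exactly C_i
print(round(wracc({"M.dAquin", "V.Lopez", "M.Sabou"}, C_i, len(R)), 3))            # 0.245
# A hypothetical h_i that also covers H.Saif is penalised for the false positive
print(round(wracc({"M.dAquin", "V.Lopez", "M.Sabou", "H.Saif"}, C_i, len(R)), 3))  # 0.184
```

A hypothesis whose precision falls below the prior |$C_i$|/|R| gets a negative score, so explanations shared mostly by items outside the cluster are ranked last.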

Dedalo – An iterative Linked Data traversal

How to define the interestingness of a path $p_k$?

How to reach the best $h_i$ in the shortest time?
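The slides do not spell out the traversal loop, so the following is only a hypothetical sketch of one way to organise it: a best-first search that, at each cycle, pops the most promising path, extends it by one property, and scores every resulting $\langle p_k \rangle.\langle v_k \rangle$ with WRacc. The path-scoring heuristic is pluggable; the example plugs in the Path Length baseline (shorter paths first). Function names, the toy triples and the cycle limit are assumptions.

```python
import heapq
from collections import defaultdict

def path_length(path, reached, cluster):
    """'Len' baseline heuristic: shorter paths score higher."""
    return -len(path)

def traverse(triples, items, cluster, score_path, max_cycles=5):
    """Hypothetical best-first traversal; returns (WRacc, path, value) of the best h_i."""
    out = defaultdict(set)                          # subject -> {(property, object)}
    for s, p, o in triples:
        out[s].add((p, o))
    n = len(items)
    ends = {(): {r: {r} for r in items}}            # path -> {item: nodes reached}
    frontier = [(0, ())]                            # min-heap of (-score, path)
    best = (float("-inf"), None, None)
    for _ in range(max_cycles):
        if not frontier:
            break
        _, path = heapq.heappop(frontier)           # most promising path so far
        extensions = defaultdict(lambda: defaultdict(set))
        for item, nodes in ends[path].items():
            for node in nodes:
                for prop, obj in out.get(node, ()):
                    extensions[path + (prop,)][item].add(obj)
        for new_path, reached in extensions.items():
            ends[new_path] = reached
            roots_by_value = defaultdict(set)       # v_k -> roots(<new_path>.<v_k>)
            for item, objs in reached.items():
                for obj in objs:
                    roots_by_value[obj].add(item)
            for value, rts in roots_by_value.items():
                w = len(rts) / n * (len(rts & cluster) / len(rts) - len(cluster) / n)
                if w > best[0]:
                    best = (w, new_path, value)
            heapq.heappush(frontier, (-score_path(new_path, reached, cluster), new_path))
    return best

triples = [  # hypothetical toy data
    ("ou:M.dAquin", "ou:project", "ou:ProjA"), ("ou:V.Lopez", "ou:project", "ou:ProjA"),
    ("ou:M.Sabou", "ou:project", "ou:ProjB"), ("ou:H.Saif", "ou:project", "ou:ProjC"),
    ("ou:ProjA", "ou:ledBy", "ou:PersonX"), ("ou:ProjB", "ou:ledBy", "ou:PersonY"),
    ("ou:ProjC", "ou:ledBy", "ou:PersonZ"),
    ("ou:PersonX", "foaf:topic", "edu:SemanticWeb"),
    ("ou:PersonY", "foaf:topic", "edu:SemanticWeb"),
    ("ou:PersonZ", "foaf:topic", "edu:SocialWeb"),
]
items = {"ou:M.dAquin", "ou:V.Lopez", "ou:M.Sabou", "ou:H.Saif"}
cluster = {"ou:M.dAquin", "ou:V.Lopez", "ou:M.Sabou"}
print(traverse(triples, items, cluster, path_length))
# (0.1875, ('ou:project', 'ou:ledBy', 'foaf:topic'), 'edu:SemanticWeb')
```

Swapping `path_length` for another scorer changes only which path is expanded next; hypotheses are always scored with WRacc.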

Dedalo – Comparing Heuristics

• We chose to compare different strategies.

• We want to find the path $p_k$ leading to the best $h_i$ in the shortest time.

• We want to reduce time and computational cost.

• Path Length: length of $p_k$, in number of properties composing it

• Path Frequency: frequency of the paths in the graph

• Adapted PMI: joint and individual distribution of $p_k$ and $C_i$

• Adapted TF–IDF: how important $p_k$ (term) is in $C_i$ (document)

• Delta: |vals($p_k$)| ≈ |C|

• Entropy²: distribution of |vals($p_k$)|

• Conditional Entropy: distribution of |vals($p_k$)| w.r.t. $C_i$

² Shannon, C. (1948). A Mathematical Theory of Communication.
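The list gives only one-line definitions. One plausible reading, sketched below in Python, is the Shannon entropy of the values a path reaches over the items, and the conditional entropy of those values given membership in $C_i$. This is an assumption for illustration: the adapted measures in Dedalo may be defined differently, and whether low or high scores are preferred is not assumed here.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of the distribution of values reached by a path p_k."""
    total = len(values)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(values).values())

def conditional_entropy(values, in_cluster):
    """H(V | C): entropy of the path values inside and outside C_i, weighted by group size."""
    total = len(values)
    h = 0.0
    for flag in (True, False):
        group = [v for v, f in zip(values, in_cluster) if f == flag]
        if group:
            h += len(group) / total * entropy(group)
    return h

# Hypothetical values reached through <foaf:topic> for four items, three of them in C_i
vals = ["SemanticWeb", "SemanticWeb", "SemanticWeb", "SocialWeb"]
labels = [True, True, True, False]
print(round(entropy(vals), 3))              # 0.811
print(conditional_entropy(vals, labels))    # 0.0: values are pure within each group
```

A conditional entropy of 0 means that, once cluster membership is known, the path's value is fully determined, which is the kind of path that separates $C_i$ from the rest.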

Dedalo’s Heuristics

$C_i$ = {ou:M.dAquin, ou:V.Lopez, ou:M.Sabou}

Top-ranked path top($p_k$) for each heuristic:

• Path Frequency: 〈foaf:topic〉

• Adapted TF–IDF: 〈ou:exMember〉

• Entropy: 〈ou:project→ou:ledBy→foaf:topic〉

Experiments – KMi co-authorship

• Authors clustered according to their co-authorships.

• Network partitioning clustering, |R| = 92, |C| = 6

[Figure: WRacc (y-axis) per cycle (x-axis) for the Semantic Web authors and Learning Analytics authors clusters, one curve per heuristic: Len, Fq, D, Ent, C.Ent, TFIDF, PMI]

Size of $C_i$ | $h_i$ | WRacc
22 | 〈org:hasMembership→ox:hasPrincipalInvestigator→org:hasMembership〉.〈ou:SmartProducts〉 | 0.128
23 | 〈org:hasMembership→ox:hasPrincipalInvestigator→org:hasMembership〉.〈ou:SocialLearn〉 | 0.127

Experiments – KMi Publications

• Papers clustered according to their keywords.

• XK-Means clustering, |R| = 865, |C| = 6

[Figure: WRacc (y-axis) per cycle (x-axis) for the Learning Analytics papers and Semantic Web papers clusters, one curve per heuristic: Len, Fq, D, Ent, C.Ent, TFIDF, PMI]

Size of $C_i$ | $h_i$ | WRacc
601 | 〈dc:creator→ntag:isRelatedTo〉.〈ou:LearningAnalytics〉 | 0.042
220 | 〈dc:creator→ntag:isRelatedTo〉.〈ou:SemanticWeb〉 | 0.073

Experiments – Huddersfield’s dataset

• Books clustered according to the students’ faculties.

• K-Means clustering, |R| = 6969, |C| = 14

[Figure: WRacc (y-axis) per cycle (x-axis) for the Music students' borrowings and Theatre students' borrowings clusters, one curve per heuristic: Len, Fq, D, Ent, C.Ent, TFIDF, PMI]

Size of $C_i$ | $h_i$ | WRacc
335 | 〈dc:subject→skos:broader〉.〈lcsh:PhysicalScience〉 | 0.005
919 | 〈dc:creator→bl:hasCreated→dc:subject〉.〈bl:EnglishDrama〉 | 0.013

Experiments – Comparing heuristics

Heuristics speed comparison (time in seconds; KMiA = KMi co-authorship, KMiP = KMi Publications, Hud = Huddersfield).

Heuristic | KMiA1 | KMiA2 | KMiP1 | KMiP2 | Hud1 | Hud2
Len | 1.64 | 4.15 | 8.95 | 9.01 | 69.13 | 135.5
Freq | 2.57 | 4.35 | 7.5 | 9.29 | 180 | 180
PMI | 2.05 | 3.88 | 11.28 | 18.42 | 180 | 180
TF–IDF | 1.69 | 3.18 | 10.61 | 17.19 | 180 | 180
Delta | 2.02 | 3.92 | 180 | 180 | 180 | 180
Entropy | 4.19 | 3.27 | 7.1 | 7.3 | 41.15 | 105.09
Conditional Entropy | 2.64 | 3.89 | 7.48 | 7.55 | 70.91 | 40.89

• Len, Freq: fast but inaccurate baselines.

• Entropy, Conditional Entropy: the best-performing measures, reducing redundancy (following wrong paths) and time effort.

• PMI, TF–IDF, Delta: might work on less homogeneous clusters.

Conclusions

• Linked Data – automatically explaining clusters

• Dedalo – traversing Linked Data to reveal explanations

• Entropy – driving the search in the Linked Data cloud

Beyond Dedalo: Dedalo works as long as the domain is limited. New use cases require extending it.

Future work: the OU students enrolment dataset

• Add sameAs linking

• Use literals

• Aggregate atomic rules

• Explore new hypothesis evaluation measures

Thanks for your attention!³


Questions? Better to ask the robot than the experts!

³ Special thanks to the KMi (happy) faces.