43
COLIBRI Unsupervised Link Discovery Through Knowledge Base Repair Axel-Cyrille Ngonga Ngomo Mohamed Ahmed Sherif Klaus Lyko ESWC 2014, Crete, Greece

Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Embed Size (px)

Citation preview

Page 1: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

COLIBRIUnsupervised Link Discovery Through Knowledge Base

Repair

Axel-Cyrille Ngonga Ngomo Mohamed Ahmed Sherif Klaus Lyko

ESWC 2014, Crete, Greece

Page 2: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Outline

1 Motivation

2 Approach

3 Evaluation

4 Conclusion and Future Work

Ngonga Ngomo· Sherif · Lyko Colibri 2/24

Page 3: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Why Link Discovery?

1 Fourth principle

2 Links are central for• Cross-ontology QA• Data Integration• Reasoning• Federated Queries• ...

Ngonga Ngomo· Sherif · Lyko Colibri 2/24

Page 4: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Why is it difficult?• Time complexity

• Large number of triples• Quadratic runtime

• Complexity of specifications• Combination of several attributes required for high precision• Tedious discovery of most adequate mapping• Dataset-dependent similarity functions

http://saim.aksw.org

Ngonga Ngomo· Sherif · Lyko Colibri 3/24

Page 5: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Solution

1 Use unsupervised link discovery• No need for training data• Minimizes load on user

2 Combine results of linking tasksover n > 2 knowledge bases

• Make explicit use of the topologyof the Data Web

3 Repair noisy data to improve linkdiscovery

• Address different quality ofdatasets across the Data Web

Ngonga Ngomo· Sherif · Lyko Colibri 4/24

Page 6: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Outline

1 Motivation

2 Approach

3 Evaluation

4 Conclusion and Future Work

Ngonga Ngomo· Sherif · Lyko Colibri 5/24

Page 7: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Colibri overview

K1 K2. . . Kn

Unsupervised LD

Voting

Mappings

Repair

Voting matrices

Repair instances

Ngonga Ngomo· Sherif · Lyko Colibri 5/24

Page 8: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Colibri overview

K1 K2. . . Kn

Unsupervised LD

Voting

Mappings

Repair

Voting matrices

Repair instances

Ngonga Ngomo· Sherif · Lyko Colibri 5/24

Page 9: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Colibri overview

K1 K2. . . Kn

Unsupervised LD

Voting

Mappings

Repair

Voting matrices

Repair instances

Ngonga Ngomo· Sherif · Lyko Colibri 5/24

Page 10: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Colibri overview

K1 K2. . . Kn

Unsupervised LD

Voting

Mappings

Repair

Voting matrices

Repair instances

Ngonga Ngomo· Sherif · Lyko Colibri 5/24

Page 11: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Colibri overview

K1 K2. . . Kn

Unsupervised LD

Voting

Mappings

Repair

Voting matrices

Repair instances

Ngonga Ngomo· Sherif · Lyko Colibri 5/24

Page 12: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Colibri overview

K1 K2. . . Kn

Unsupervised LD

Voting

Mappings

Repair

Voting matrices

Repair instances

Ngonga Ngomo· Sherif · Lyko Colibri 5/24

Page 13: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Key Concepts

• Mapping matrix

• M12 =

1 0 00 1 00 0 0

ex2:1

ex2:2

ex2:3

ex1:1

ex1:2

ex1:3

1

1

K2K1

Ngonga Ngomo· Sherif · Lyko Colibri 6/24

Page 14: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Key Concepts

• Pseudo-F-measure as objective function

• P(Mij) =|links(Ki ,Mij )|+|links(Kj ,Mij )|

2|Mij |

• R(Mij) =|links(Ki ,Mij )|+|links(Kj ,Mij )|

|Ki |+|Kj |

• Fβ = (1 + β2) PRβ2P+R

Example:

• P(M12) = 1

• R(M12) = 23

• F1(M12) = 45

ex2:1

ex2:2

ex2:3

ex1:1

ex1:2

ex1:3

1

1

K2K1

Ngonga Ngomo· Sherif · Lyko Colibri 7/24

Page 15: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 1: Unsupervised Link Discovery

K1 K2. . . Kn

Unsupervised LD

VotingRepair

Mappings

Voting matrices

Repair instances

Ngonga Ngomo· Sherif · Lyko Colibri 8/24

Page 16: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 1: Unsupervised Link Discovery

• Link all pairs (Ki ,Kj) using any unsupervised link discoveryapproach

• Here, Euclid• Specifications are points in a similarity space• Find accurate specification by using hierarchical grid search• Detect specification which maximizes Fβ

Ngonga Ngomo· Sherif · Lyko Colibri 9/24

Page 17: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 1: Unsupervised Link Discovery

• Mapping matrices

• M12 =

1 0 00 1 00 0 0

• M13 =

1 0 10 0.5 00 0 0.5

• M23 =

1 0 00 0.5 00 0 0.5

ex1:1 ex1:2 ex1:3

ex2:1

ex2:2

ex2:3

ex3:1

ex3:2

ex3:3

1

1

11

0.5

0.51

0.5

0.5

K3

K1

K2

Ngonga Ngomo· Sherif · Lyko Colibri 10/24

Page 18: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 2: Voting

K1 K2. . . Kn

Unsupervised LD

VotingRepair

Mappings

Voting matrices

Repair instances

Ngonga Ngomo· Sherif · Lyko Colibri 11/24

Page 19: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 2: Voting

Ki s

Kjt

1

Kk z

1

1

Ngonga Ngomo· Sherif · Lyko Colibri 12/24

Page 20: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 2: Voting

Ki s

Kjt

1

Kk z

1

1

Ngonga Ngomo· Sherif · Lyko Colibri 12/24

Page 21: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 2: Voting

Ki s

Kjt

1

Kk z

1

1

Ngonga Ngomo· Sherif · Lyko Colibri 12/24

Page 22: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 2: Voting

• Vij = 1n−1

Mij +n∑

k=1k 6=i ,j

MikMkj

• Mapping matrices

• M12 =

1 0 00 1 00 0 0

• M13 =

1 0 10 0.5 00 0 0.5

• M23 =

1 0 00 0.5 00 0 0.5

ex1:1 ex1:2 ex1:3

ex2:1

ex2:2

ex2:3

ex3:1

ex3:2

ex3:3

1

1

11

0.5

0.51

0.5

0.5

K3

K1

K2

V12 =

1 0 0.250 0.625 00 0 0.125

Ngonga Ngomo· Sherif · Lyko Colibri 13/24

Page 23: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 2: Voting

• Vij = 1n−1

Mij +n∑

k=1k 6=i ,j

MikMkj

• Mapping matrices

• M12 =

1 0 00 1 00 0 0

• M13 =

1 0 10 0.5 00 0 0.5

• M23 =

1 0 00 0.5 00 0 0.5

ex1:1 ex1:2 ex1:3

ex2:1

ex2:2

ex2:3

ex3:1

ex3:2

ex3:3

1

1

11

0.5

0.51

0.5

0.5

K3

K1

K2

V12 =

1 0 0.250 0.625 00 0 0.125

Ngonga Ngomo· Sherif · Lyko Colibri 13/24

Page 24: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 2: Voting

• Vij = 1n−1

Mij +n∑

k=1k 6=i ,j

MikMkj

• Mapping matrices

• M12 =

1 0 00 1 00 0 0

• M13 =

1 0 10 0.5 00 0 0.5

• M23 =

1 0 00 0.5 00 0 0.5

ex1:1 ex1:2 ex1:3

ex2:1

ex2:2

ex2:3

ex3:1

ex3:2

ex3:3

1

1

11

0.5

0.51

0.5

0.5

K3

K1

K2

V12 =

1 0 0.250 0.625 00 0 0.125

Ngonga Ngomo· Sherif · Lyko Colibri 13/24

Page 25: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 2: Voting

• Voting matrices

• V12 =

1 0 0.250 0.625 00 0 0.125

• V13 =

1 0 0.50 0.5 00 0 0.25

• V23 =

1 0 00 0.5 00 0 0.25

• Post-processed matrices

• V12 =

1 0 00 0.625 00 0 0.125

ex1:1 ex1:2 ex1:3

ex2:1

ex2:2

ex2:3

ex3:1

ex3:2

ex3:3

1

1

11

0.5

0.51

0.5

0.5

K3

K1

K2

Ngonga Ngomo· Sherif · Lyko Colibri 14/24

Page 26: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 2: Voting

• Assume links in Vij to be correct

• vij = 1→ All matrices agree onhow to link (Ki ,Kj)e.g., V12(ex1:1, ex2:1)

• For all vij < 1 assume either

1 Missing linkse.g., V12(ex1:3, ex2:3) notcontained in M12

2 Weak linkse.g., V12(ex1:2, ex2:2) < 1 isdue to M13(ex1:2, ex3:2) andM32(ex3:2, ex2:2) being 0.5

V12 =

1 0 00 0.625 00 0 0.125

ex1:1 ex1:2 ex1:3

ex2:1

ex2:2

ex2:3

ex3:1

ex3:2

ex3:3

1

1

11

0.5

0.51

0.5

0.5

K3

K1

K2

Ngonga Ngomo· Sherif · Lyko Colibri 15/24

Page 27: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 2: Voting

• Assume links in Vij to be correct

• vij = 1→ All matrices agree onhow to link (Ki ,Kj)e.g., V12(ex1:1, ex2:1)

• For all vij < 1 assume either

1 Missing linkse.g., V12(ex1:3, ex2:3) notcontained in M12

2 Weak linkse.g., V12(ex1:2, ex2:2) < 1 isdue to M13(ex1:2, ex3:2) andM32(ex3:2, ex2:2) being 0.5

V12 =

1 0 00 0.625 00 0 0.125

ex1:1ex1:1 ex1:2 ex1:3

ex2:1ex2:1

ex2:2

ex2:3

ex3:1ex3:1

ex3:2

ex3:3

11

11

11

0.5

0.511

0.5

0.5

K3

K1

K2

Ngonga Ngomo· Sherif · Lyko Colibri 15/24

Page 28: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 2: Voting

• Assume links in Vij to be correct

• vij = 1→ All matrices agree onhow to link (Ki ,Kj)e.g., V12(ex1:1, ex2:1)

• For all vij < 1 assume either

1 Missing linkse.g., V12(ex1:3, ex2:3) notcontained in M12

2 Weak linkse.g., V12(ex1:2, ex2:2) < 1 isdue to M13(ex1:2, ex3:2) andM32(ex3:2, ex2:2) being 0.5

V12 =

1 0 00 0.625 00 0 0.125

ex1:1 ex1:2 ex1:3ex1:3

ex2:1

ex2:2

ex2:3ex2:3

ex3:1

ex3:2

ex3:3

1

1

11

0.5

0.51

0.5

0.5

K3

K1

K2

Ngonga Ngomo· Sherif · Lyko Colibri 15/24

Page 29: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 2: Voting

• Assume links in Vij to be correct

• vij = 1→ All matrices agree onhow to link (Ki ,Kj)e.g., V12(ex1:1, ex2:1)

• For all vij < 1 assume either

1 Missing linkse.g., V12(ex1:3, ex2:3) notcontained in M12

2 Weak linkse.g., V12(ex1:2, ex2:2) < 1 isdue to M13(ex1:2, ex3:2) andM32(ex3:2, ex2:2) being 0.5

V12 =

1 0 00 0.625 00 0 0.125

ex1:1 ex1:2ex1:2 ex1:3

ex2:1

ex2:2ex2:2

ex2:3

ex3:1

ex3:2ex3:2

ex3:3

1

1

11

0.50.5

0.51

0.50.5

0.5

K3

K1

K2

Ngonga Ngomo· Sherif · Lyko Colibri 15/24

Page 30: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 3: Repair

K1 K2. . . Kn

Unsupervised LD

VotingRepair

Mappings

Voting matrices

Repair instances

Ngonga Ngomo· Sherif · Lyko Colibri 16/24

Page 31: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 3: Repair

• Goal: Repair instance data soas to improve vij < 1

• Link to be repaired is(ex1:2, ex2:2).

• Reason for this link:

• rs = ex1:2 and• rt = ex3:2.

• Computing average similarity:

• σ(ex1:2) = 0.75 while• σ(ex3:2) = 0.5.

• Colibri overwrite the valuesof ex3:2 with those of ex1:2.

V12 =

1 0 00 0.625 00 0 0.125

ex1:1 ex1:2 ex1:3

ex2:1

ex2:2

ex2:3

ex3:1

ex3:2

ex3:3

1

1

1

1

0.5

0.5

1

0.5

0.5

K3

K1

K2

Ngonga Ngomo· Sherif · Lyko Colibri 17/24

Page 32: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 3: Repair

• Goal: Repair instance data soas to improve vij < 1

• Link to be repaired is(ex1:2, ex2:2).

• Reason for this link:

• rs = ex1:2 and• rt = ex3:2.

• Computing average similarity:

• σ(ex1:2) = 0.75 while• σ(ex3:2) = 0.5.

• Colibri overwrite the valuesof ex3:2 with those of ex1:2.

V12 =

1 0 00 0.625 00 0 0.125

ex1:1 ex1:2ex1:2 ex1:3

ex2:1

ex2:2ex2:2

ex2:3

ex3:1

ex3:2ex3:2

ex3:3

1

1

1

1

0.50.5

0.5

1

0.50.5

0.5

K3

K1

K2

Ngonga Ngomo· Sherif · Lyko Colibri 17/24

Page 33: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 3: Repair

• Goal: Repair instance data soas to improve vij < 1

• Link to be repaired is(ex1:2, ex2:2).

• Reason for this link:

• rs = ex1:2 and• rt = ex3:2.

• Computing average similarity:

• σ(ex1:2) = 0.75 while• σ(ex3:2) = 0.5.

• Colibri overwrite the valuesof ex3:2 with those of ex1:2.

V12 =

1 0 00 0.625 00 0 0.125

ex1:1 ex1:2ex1:2 ex1:3

ex2:1

ex2:2ex2:2

ex2:3

ex3:1

ex3:2ex3:2

ex3:3

1

1

1

11

0.50.5

0.5

1

0.50.5

0.5

K3

K1

K2

Ngonga Ngomo· Sherif · Lyko Colibri 17/24

Page 34: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Step 3: Repair

• Goal: Repair instance data soas to improve vij < 1

• Link to be repaired is(ex1:2, ex2:2).

• Reason for this link:

• rs = ex1:2 and• rt = ex3:2.

• Computing average similarity:

• σ(ex1:2) = 0.75 while• σ(ex3:2) = 0.5.

• Colibri overwrite the valuesof ex3:2 with those of ex1:2.

V12 =

1 0 00 0.625 00 0 0.125

ex1:1 ex1:2ex1:2 ex1:3

ex2:1

ex2:2ex2:2

ex2:3

ex3:1

ex3:2ex3:2

ex3:3

1

1

1

1

0.50.5

0.5

1

0.50.5

0.5

K3

K1

K2

Ngonga Ngomo· Sherif · Lyko Colibri 17/24

Page 35: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Outline

1 Motivation

2 Approach

3 Evaluation

4 Conclusion and Future Work

Ngonga Ngomo· Sherif · Lyko Colibri 18/24

Page 36: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Benchmark Generation Approach

• So far, no benchmark for linking n > 2 knowledge bases

• Benchmark generation approach (Ferrara et al., 2011)

• Generated m − 1 copies of initial dataset K1

• Alteration operators:• Misspellings• Abbreviations• Word permutations

• Alteration strategy:• Pick random resource according to alteration probability• Pick random operator

Ngonga Ngomo· Sherif · Lyko Colibri 18/24

Page 37: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Experimental Setup

• Datasets:• Two synthetic datasets (OAEI2010)• Three real-world datasets (Koepcke et al., 2010)

• Colibri:• Maximal number of iterations = 10• Number of knowledge bases = {3, 4, 5}• Alteration probability ap = {10%, 20%, . . . , 50%}• Repeat each experiment 5 times

Ngonga Ngomo· Sherif · Lyko Colibri 19/24

Page 38: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Experimental Results (synthetic dataset)

KBs FEuclid FColibri Runtime (sec) Repaired links

3 0.89 0.98 0.4 434 0.90 1.00 0.9 355 0.88 1.00 1.3 34

• Restaurant dataset

• Average values after 10 iterations

• Alteration probability ap = 50%

Ngonga Ngomo· Sherif · Lyko Colibri 20/24

Page 39: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Experimental Results (real-world dataset)

KBs FEuclid FColibri Runtime (sec) Repaired links

3 0.86 0.98 81.8 3004 0.85 0.99 160.4 1505 0.84 0.88 246.8 60

• Amazon dataset

• Average values after 10 iterations

• Alteration probability ap = 50%

Ngonga Ngomo· Sherif · Lyko Colibri 21/24

Page 40: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Results on the Restaurants dataset

• Alteration probabilityap = 50%

• Knowledge bases = 5

iterationNr 2 4 60

20

40

60

80

100

PrecisionRecallF-MeasureError rate

Full results at:https://github.com/AKSW/LIMES/tree/master/

evaluationsResults/colibri

Ngonga Ngomo· Sherif · Lyko Colibri 22/24

Page 41: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Outline

1 Motivation

2 Approach

3 Evaluation

4 Conclusion and Future Work

Ngonga Ngomo· Sherif · Lyko Colibri 23/24

Page 42: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Conclusion and Future Work• Conclusion

• Presented Colibri• Improved F-measure of Euclid up to 14%

• Future Work• Evaluation on other datasets• Interactive scenarios (i.e., consult user before dataset repair)• Combination with other unsupervised solutions (e.g., Eagle)

Ngonga Ngomo· Sherif · Lyko Colibri 23/24

Page 43: Colibri - Unsupervised Link Discovery through Knoeledge Base Repare

Outline Motivation Approach Evaluation Conclusion and Future Work

Thank You!

Questions?Mohamed Sherif

Augustusplatz 10D-04109 Leipzig

[email protected]://aksw.org/MohamedSherif

http://limes.sf.net

Ngonga Ngomo· Sherif · Lyko Colibri 24/24