Transcript
Page 1: A Classification of Schema-based Matching Approaches

A Classification of Schema-based Matching Approaches

Pavel Shvaiko

Meaning Coordination and Negotiation Workshop, ISWC

8th November 2004, Hiroshima, Japan

Page 2: A Classification of Schema-based Matching Approaches


Outline

Introduction

Classification of schema-based matching approaches

Matching systems

Conclusions

Future work

Page 3: A Classification of Schema-based Matching Approaches


Introduction

Page 4: A Classification of Schema-based Matching Approaches


Semantic Web and the Match operator

Information sources (e.g., database schemas, taxonomies or ontologies) can be viewed as graph-like structures containing terms and their inter-relationships

Match is one of the key operators for enabling the Semantic Web since it takes two graph-like structures and produces a mapping between the nodes of the graphs that “correspond” semantically to each other

Page 5: A Classification of Schema-based Matching Approaches


Example: Two XML schemas

[Figure: two XML schemas, labeled HT and FT]

Page 6: A Classification of Schema-based Matching Approaches


Schema matching vs Ontology alignment

Differences:

Database schemas often do not provide explicit semantics for their data

Ontologies are logical systems that themselves incorporate semantics (intuitive or formal)

E.g., ontology definitions as a set of logical axioms

Ontology data models are richer (the number of primitives is higher, and they are more complex) than schema data models

E.g., OWL allows defining new classes as unions or intersections of other classes

Commonalities:

Ontologies can be viewed as schemas for knowledge bases

Techniques developed for both problems are of mutual benefit

Page 7: A Classification of Schema-based Matching Approaches


Matching

[Figure: the Match operator. Diagram labels: S1, S2, {M}, {M'}, Parameters (e.g., weights, thresholds), Auxiliary Information (e.g., lexicons, thesauri)]

Mapping element M is a 5-tuple < ID, e1, e2, n, R >

n = { x ∈ [0,1] }

R = { =, ⊑, ⊒, ⊥, ⊓ }
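The 5-tuple above maps naturally onto a small record type. The following is a minimal sketch, not the author's implementation: the class and field names are illustrative, the relation names other than equivalence are an assumed reading of the relation set R, and the element paths in the usage line are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    # Assumed reading of the relation set R; only "=" is spelled out on the slide.
    EQUIVALENT = "="
    LESS_GENERAL = "⊑"
    MORE_GENERAL = "⊒"
    DISJOINT = "⊥"
    OVERLAPPING = "⊓"


@dataclass(frozen=True)
class MappingElement:
    """One mapping element < ID, e1, e2, n, R >."""
    id: str       # unique identifier of the mapping element
    e1: str       # element of the first schema
    e2: str       # element of the second schema
    n: float      # confidence measure in [0, 1]
    r: Relation   # relation holding between e1 and e2

    def __post_init__(self):
        # Reject confidence values outside the [0, 1] range.
        if not 0.0 <= self.n <= 1.0:
            raise ValueError("confidence n must lie in [0, 1]")


# Hypothetical usage: element paths are made up for illustration.
m = MappingElement("m1", "HT/Digital_Cameras", "FT/Photo_and_Cameras",
                   0.9, Relation.LESS_GENERAL)
```

Making the record immutable keeps a produced mapping safe to share between matchers; whether a real system would do so is a design choice, not something the slides state.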

Page 8: A Classification of Schema-based Matching Approaches


Classification of Schema-based Matching Approaches

Page 9: A Classification of Schema-based Matching Approaches


Schema matching approaches

Individual matchers
  Schema-based
    Element-level
      Linguistic
        • Names
        • Descriptions
      Constraint-based
        • Types
        • Keys
    Structure-level
      Constraint-based
        • Graph matching
  Instance-based
    Element-level
      Linguistic
        • IR (word frequencies, key terms)
      Constraint-based
        • Value pattern and ranges

Combined matchers
  Hybrid
  Composite
    • manual composition
    • automatic composition

Taxonomy from [E. Rahm, P. Bernstein, 2001]

Page 10: A Classification of Schema-based Matching Approaches


Semantic view on matching

What is missing in the taxonomy of schema matching approaches we have just seen?

Two new criteria:

Heuristic vs formal:

Heuristic techniques try to guess relations which may hold between similar labels or graph structures

Formal techniques have model-theoretic semantics, which is used to justify their results

Implicit vs explicit:

Implicit techniques are syntax-driven techniques

E.g., techniques which consider labels as strings, analyze data types, or compute the soundex of schema/ontology elements

Explicit techniques exploit the semantics of labels

E.g., thesauri, ontologies
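An implicit, syntax-driven technique of the kind mentioned on this slide treats labels purely as strings. The following is a minimal sketch under stated assumptions: it uses the similarity ratio from Python's standard `difflib` module as a stand-in for whatever string measure a real matcher would use, and the function names and the threshold value are illustrative, not taken from any of the systems discussed.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lowercase a label and treat underscores and hyphens as spaces."""
    return label.replace("_", " ").replace("-", " ").lower().strip()


def string_similarity(a, b):
    """Similarity in [0, 1] computed only from the characters of the labels."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_labels(labels1, labels2, threshold=0.6):
    """Return (label1, label2, score) for every pair at or above the threshold."""
    pairs = []
    for a in labels1:
        for b in labels2:
            score = string_similarity(a, b)
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return pairs
```

A matcher like this knows nothing about what the labels mean: it can pair abbreviations that share characters, but it cannot tell that two differently spelled words are synonyms, which is the gap the explicit techniques address.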

Page 11: A Classification of Schema-based Matching Approaches


Schema Matching Approaches

Individual matchers
  Schema-based
    Element-level
      Linguistic
        • Names
        • Descriptions
      Constraint-based
        • Types
        • Keys
    Structure-level
      Constraint-based
        • Graph matching

New criteria:
  Heuristic vs Formal
  Implicit vs Explicit

Page 12: A Classification of Schema-based Matching Approaches


Schema-based Matching Approaches

Heuristic Techniques
  Element-level
    Implicit
      String-based
        - Names
        - Descriptions
      Constraint-based
        - Type similarity
        - Key properties
    Explicit
      Auxiliary Information
        - Precompiled dictionary
        - Lexicons
  Structure-level
    Implicit
      Constraint-based
        - Graph matching
        - Children
        - Leaves
    Explicit
      Constraint-based
        - Taxonomic structure

Formal Techniques
  Element-level
    Explicit
      Ontology-based
        - OWL properties
  Structure-level
    Explicit
      Reasoner-based
        - Propositional SAT
        - Modal SAT

Page 13: A Classification of Schema-based Matching Approaches


Heuristic Techniques

Element-level explicit techniques

Precompiled dictionary (Cupid, COMA)

E.g., syn key - "NKN:Nikon = syn"

Lexicons (S-Match, CTXmatch)

E.g., WordNet: Camera is a hypernym of Digital Camera, therefore Digital_Cameras ⊑ Photo_and_Cameras

Structure-level explicit techniques

Taxonomic structure (Anchor-Prompt, NOM)

E.g., given that Digital_Cameras ⊑ Photo_and_Cameras, FJFLM and FujiFilm can be found as an appropriate match
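The lexicon-based step above, deriving a relation between two labels from hypernym links, can be sketched as follows. This is a toy illustration only: the `HYPERNYMS` dictionary is a hand-written stand-in for a lexicon such as WordNet, its entries are assumptions rather than actual WordNet output, and the function names are hypothetical.

```python
# Toy stand-in for a lexicon: word -> set of its direct hypernyms.
# Entries are illustrative assumptions, not real WordNet data.
HYPERNYMS = {
    "digital camera": {"camera"},
    "camera": {"photographic equipment"},
}


def is_hyponym_of(word, candidate):
    """True if `candidate` is reachable from `word` by following hypernym links."""
    seen, stack = set(), [word]
    while stack:
        current = stack.pop()
        for parent in HYPERNYMS.get(current, ()):
            if parent == candidate:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def lexical_relation(w1, w2):
    """Return '=', '⊑' (less general), '⊒' (more general), or None if unknown."""
    if w1 == w2:
        return "="
    if is_hyponym_of(w1, w2):
        return "⊑"
    if is_hyponym_of(w2, w1):
        return "⊒"
    return None
```

Unlike a string comparison, the result here depends only on the lexicon's links, so two labels with no characters in common can still be related.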

Example

Page 14: A Classification of Schema-based Matching Approaches


Formal Techniques

Example

Element-level explicit techniques

OWL properties (NOM)

E.g., sameClassAs constructor explicitly states that one class is equivalent to the other

Digital_Cameras = Camera DigitalPhoto_Producer

Structure-level explicit techniques

Propositional satisfiability (SAT) (S-Match, CTXmatch)

The approach is to translate the matching problem, namely the two graphs (trees) and the mapping queries, into a propositional formula and then to check that formula for validity

Modal SAT (S-Match)

The idea is to enhance propositional logic with modal logic (or ALC DL) operators. The matching problem is therefore translated into a modal logic formula, which is then checked for validity using sound and complete satisfiability search procedures.
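The propositional approach can be illustrated with a small validity check. This is a minimal sketch under stated assumptions, not the procedure used by S-Match or CTXmatch: the mapping query is assumed to take the form "axioms imply relation", the variable names are hypothetical, and validity is decided by brute-force enumeration of truth assignments instead of a real SAT solver.

```python
from itertools import product


def is_valid(formula, variables):
    """A formula is valid iff no truth assignment falsifies it,
    i.e. iff its negation is unsatisfiable."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if not formula(assignment):
            return False  # this assignment satisfies the negation
    return True


def implies(p, q):
    return (not p) or q


# Hypothetical atomic propositions standing for the meanings of two labels.
VARIABLES = ["digital_camera", "camera"]


def axioms(v):
    # Assumed background knowledge: every digital camera is a camera.
    return implies(v["digital_camera"], v["camera"])


def less_general_query(v):
    # axioms -> (digital_camera -> camera)
    return implies(axioms(v), implies(v["digital_camera"], v["camera"]))


def equivalence_query(v):
    # axioms -> (digital_camera <-> camera)
    return implies(axioms(v), v["digital_camera"] == v["camera"])
```

With these definitions, `is_valid(less_general_query, VARIABLES)` holds while `is_valid(equivalence_query, VARIABLES)` does not, so only the "less general" relation would be returned. Enumeration is exponential in the number of variables, which is why real systems rely on dedicated satisfiability procedures.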

Page 15: A Classification of Schema-based Matching Approaches


Matching Systems

Page 16: A Classification of Schema-based Matching Approaches


Characteristics of state-of-the-art matchers

Conclusions

Page 17: A Classification of Schema-based Matching Approaches


Uses of Classification

The proposed classification provides a common conceptual basis, and hence can be used to compare existing schema/ontology matching systems analytically

It can help in designing a new matching system, or an elementary matcher, by taking advantage of state-of-the-art solutions

Page 18: A Classification of Schema-based Matching Approaches


Future Work

Provide a more detailed view on the general properties of matching algorithms

Add language-based techniques to the classification, e.g., tokenization, lemmatization, elimination

Extend the classification by taking into account DL-based matchmaking solutions

Extend the classification by adding newly emerging matching techniques and the systems implementing them, e.g., OLA, QOM

Compare matching systems also experimentally, with the help of benchmarks

Page 19: A Classification of Schema-based Matching Approaches


References

Knowledge Web project: http://knowledgeweb.semanticweb.org/

Project website at DIT - ACCORD: http://www.dit.unitn.it/~accord/

P. Shvaiko: A classification of schema-based matching approaches. Technical Report, DIT-04-93, University of Trento, 2004.

E. Rahm, P. Bernstein: A survey of approaches to automatic schema matching. In Very Large Databases Journal, 10(4):334-350, 2001.

F. Giunchiglia, P. Shvaiko: Semantic matching. In The Knowledge Engineering Review Journal, 18(3):265-280, 2003.

P. Bouquet, L. Serafini, S. Zanobini: Semantic coordination: a new approach and an application. In Proceedings of ISWC, 130-145, 2003.

Page 20: A Classification of Schema-based Matching Approaches


Thank you!

