22
A survey of approaches to automatic schema matching Erhard Rahm, Universität für Informatik, Leipzig Philip A. Bernstein, Microsoft Research VLDB 2001

A survey of approaches to automatic schema matching

  • Upload
    torie

  • View
    45

  • Download
    1

Embed Size (px)

DESCRIPTION

A survey of approaches to automatic schema matching. Erhard Rahm, Universität für Informatik, Leipzig Philip A. Bernstein, Microsoft Research VLDB 2001. - PowerPoint PPT Presentation

Citation preview

Page 1: A survey of approaches to automatic schema matching

A survey of approaches to automatic schema matching

Erhard Rahm, Universität für Informatik, LeipzigPhilip A. Bernstein, Microsoft Research

VLDB 2001

Page 2: A survey of approaches to automatic schema matching

2

ProblemSchema matching: produce a mapping between elements of two schemas

such that the elements in the mapping correspond semantically to each other.

Cust

C#

Cname

FirstName

LastName

Customer

CustID

Company

Contact

Schema 1 Schema 2

A real-world problem:Schema integration, Data warehouses, E-commerce, Semantic query processing

Page 3: A survey of approaches to automatic schema matching

3

Problem (cont.)

The paper surveys approaches for automated schema matching and presents a taxonomy.

• Manual schema matching: tedious, time-consuming, error-prone and therefore expensive.

• Automated schema matching: the solution

Page 4: A survey of approaches to automatic schema matching

4

Overview• Problem and applications• Match operator• Classification

– Schema level matchers– Instance level matchers– Combining matchers

• Prototype implementations• Conclusion• Critique

Page 5: A survey of approaches to automatic schema matching

5

Match• Match is an abstract operator for implementing schema matching

– Input: two input schemas– Output: a set of mapping elements

• Match is based on heuristics that approximate what the user considers to be a good match

• Implementations of match produces ’match candidates’• Not possible to determine all matches automatically

Page 6: A survey of approaches to automatic schema matching

6

Match (cont.)

Match candidates

S1.C# = S2.CustID

S1.Cname = S2.Company

S1.Firstname = S2.Contact

Match

Schema 1

Schema 2

User acceptance

Matches

S1.C# = S2.CustID

S1.Cname = S2.Company

Page 7: A survey of approaches to automatic schema matching

7

Generic Match Architecture

Page 8: A survey of approaches to automatic schema matching

8

Classification

Page 9: A survey of approaches to automatic schema matching

9

Classification

Page 10: A survey of approaches to automatic schema matching

10

Schema-level matchers• Element-level

– Linguistic approaches:• Similarity of names, e.g. FirstName first_name• Equality of synonyms, e.g. car automobile• Equality of hypernyms, i.e. book publication, article publication• Description matching:

S1: empn // employee nameS2: name // name of employee

– Constraint-based approaches:• Data types, e.g. varchar text• Value ranges• Uniqueness

• Structural-level

Page 11: A survey of approaches to automatic schema matching

11

Classification

Page 12: A survey of approaches to automatic schema matching

12

Instance-level matchers• Linguistic characterization

– Keywords, frequencies of words, combinations, etc.

CName

Microsoft

Apple

Microsoft

Microsoft

Lenovo

Schema 1 Schema 2

Company

IBM

Microsoft

Apple

Microsoft

Apple

EmpName

Allan

Steve

Bob

Carol

match

Page 13: A survey of approaches to automatic schema matching

13

Instance-level matchers• Constraint-based characterization

– Character patterns and numerical value ranges

Price

$19.80

$136.25

$5.00

$64.36

Schema 1 Schema 2

Paid

$24.20

$32.54

$532.00

$33.33

match

Page 14: A survey of approaches to automatic schema matching

14

Classification

Page 15: A survey of approaches to automatic schema matching

15

Combining matchers• The best result is archived by combining multiple matchers• Two types:

– Hybrid matchers– Composite matchers

Hybrid matcher

DatatypesNames

Value ranges

Match candidates

...

...

...

Page 16: A survey of approaches to automatic schema matching

16

Combining matchers• The best result is archived by combining multiple matchers• Two types:

– Hybrid matchers– Composite matchers

Composite matcher

Match candidates

...

...

...

Name matcher

Datatypematcher

Page 17: A survey of approaches to automatic schema matching

17

Prototype implementations

Page 18: A survey of approaches to automatic schema matching

20

Example: SemInt• 15 contraint-based, 5 contant-based matching criteria• Each criteria is mapped to a range [0..1] for every element. Yields an N-

dimensional point for N matching criteria

1

1

0

0 Field length

Data type

C#CustID

CName

Company

Page 19: A survey of approaches to automatic schema matching

21

Conclusion• Proposes a taxonomy• Characterizes and compares previous implementations using this taxonomy• Useful for:

– Programmers who need to implement Match– Researchers looking to develop better algorithms

• Proposes subjects for further research:– Test of performance and accuracy of existing approaches– Better utilization of instance-level information

Page 20: A survey of approaches to automatic schema matching

22

CritiqueGood:• Provides a good overview of the subject, Fig. 2 and Table 5 in particular• Good at pointing out subjects that should be researched further• Taxonomy is easy to understand and is explained well

Could be improved:• Does not compared performance or correctness of implementations• No examples in the descripton of existing implementations• Lacking good examples of structural level matching• Relative performance of implementations are mentioned only once: ”Cupid

performed somewhat better overall”. Cupid is developed by the authors.

Page 21: A survey of approaches to automatic schema matching

Questions?Questions?

Page 22: A survey of approaches to automatic schema matching

Questions?