Mar 27, 2008, Christiano Santiago. Schema Matching. Matching Large XML Schemas (Erhard Rahm, Hong-Hai Do, Sabine Maßmann); Putting Context into Schema Matching (Philip Bohannon, Eiman Elnahrawy, Wenfei Fan, Michael Flaster); COMA - A System for Flexible Combination of Schema Matching Approaches (Hong-Hai Do, Erhard Rahm)


Schema Matching

Matching Large XML Schemas: Erhard Rahm, Hong-Hai Do, Sabine Maßmann

Putting Context into Schema Matching: Philip Bohannon, Eiman Elnahrawy, Wenfei Fan, Michael Flaster

COMA - A System for Flexible Combination of Schema Matching Approaches: Hong-Hai Do, Erhard Rahm


Goals

Introductory concepts on schema matching

Context-sensitive versus context-insensitive matching

Complexity of XSD schemas


Agenda

Terminology

Different Approaches

XML Schema Definition

Context-Insensitive

Context-Sensitive

Q&A


Terminology

Schema matching (meaning): the process of identifying that two objects are semantically related.

Mapping (conversion): the transformations between the objects.


Terminology

Student(Name, SSN, Level, Major, Marks)

GradStudent(Name, ID, Major, Grades)

Student.Name ≈ GradStudent.Name (match)

Student.SSN ≈ GradStudent.ID (match)

Student.Marks ≈ GradStudent.Grades (transformation)
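The correspondences above can be approximated with a simple name-based matcher. The sketch below is illustrative only: the synonym table, the 0.8 threshold, and the function names are assumptions, and none of the papers is claimed to work this way.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table; a real matcher would consult a dictionary or thesaurus.
SYNONYMS = {frozenset({"ssn", "id"}), frozenset({"marks", "grades"})}


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] based on equality, synonyms, then string overlap."""
    a, b = a.lower(), b.lower()
    if a == b or frozenset({a, b}) in SYNONYMS:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def match(source, target, threshold=0.8):
    """Map each source attribute to its most similar target attribute, if any."""
    result = {}
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        if name_similarity(s, best) >= threshold:
            result[s] = best
    return result


student = ["Name", "SSN", "Level", "Major", "Marks"]
grad_student = ["Name", "ID", "Major", "Grades"]
correspondences = match(student, grad_student)
```

Level has no counterpart in GradStudent, so it stays unmatched. The sketch only finds the correspondences; converting Marks into Grades would still need a separate transformation.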


Schema Matching


Context

Context-insensitive

Context-sensitive


Different Approaches

Schema-level matchers

Instance-level matchers

Hybrid matchers

Reusing matching information


Schema-Level Matchers

Only consider schema information: name, description, data type, relationship, constraints, number of nesting levels.


Instance-Level Matchers

Use instance-level data to gather insight into the content and meaning of schema elements.

Linguistic: Dept, DeptName, EmpName

Constraints: 416-7362100, M3J1P3
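A constraint-based instance matcher can be sketched by checking which value pattern the samples of a column follow. Everything here is an assumption made for illustration: the pattern catalogue, the column names, and the target sample values are invented; only 416-7362100 and M3J1P3 come from the slide.

```python
import re

# Hypothetical pattern catalogue; labels and regexes are illustrative assumptions.
PATTERNS = {
    "phone": re.compile(r"^\d{3}-\d{7}$"),
    "postal_code": re.compile(r"^[A-Z]\d[A-Z]\d[A-Z]\d$"),
}


def infer_kind(values):
    """Return the label of the first pattern that every sample value satisfies."""
    for label, pattern in PATTERNS.items():
        if values and all(pattern.match(v) for v in values):
            return label
    return None


def instance_match(source_cols, target_cols):
    """Pair columns whose instance values follow the same pattern."""
    pairs = []
    for s_name, s_vals in source_cols.items():
        kind = infer_kind(s_vals)
        if kind is None:
            continue
        for t_name, t_vals in target_cols.items():
            if infer_kind(t_vals) == kind:
                pairs.append((s_name, t_name, kind))
    return pairs


# Column names and target values are made-up sample data.
source = {"Tel": ["416-7362100"], "Zip": ["M3J1P3"]}
target = {"Contact": ["905-5551234"], "PostCode": ["L4K2B7"]}
pairs = instance_match(source, target)
```

The column names share no substring here, so a purely name-based matcher would miss both pairs; the instance values are what reveal them.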


Hybrid Matchers

Combine more than one matching approach.


Reusing Matching Information

Use previous matching information for future matching tasks.

Structures or substructures often repeat.

Caution: Salary and Income may be considered identical in a payroll application but not in a tax reporting application.


XML Schema Definition (XSD)

Data types: 19 built-in primitive data types, 25 built-in derived data types, and user-defined complex types.


XML Schema Definition (XSD)

Complex type definition:

    <complexType name="myNewNameType">
      <complexContent>
        <restriction base="anyType">
          <sequence>
            <element name="name" type="string" />
            <element name="location" type="string" />
          </sequence>
          <attribute name="position" type="string" />
        </restriction>
      </complexContent>
    </complexType>

Element declaration using the type:

    <element name="employee" type="dc:myNewNameType" />

Instance, where position is the attribute and name and location are the child elements:

    <dc:employee position="trainer">
      <dc:name>Don Smith</dc:name>
      <dc:location>Dallas, TX</dc:location>
    </dc:employee>


XML Schema Definition (XSD)

Shared schema components


XML Schema Definition (XSD)

Match system approaches:

COMA: path-based

Cupid: materialized

Scalability issue: XCBL Order schema contains 1451 components, including 91 shared types. After resolving the shared components, 26000+ nodes/paths were identified.
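The growth from components to paths comes from unfolding shared components: a type referenced from several places contributes one copy of its whole subtree per reference. The toy schema below is a hypothetical example, not the XCBL Order schema, and `count_paths` is an illustrative helper.

```python
from functools import lru_cache

# Hypothetical toy schema: each component maps to the components it contains.
# "Party" is a shared component referenced from three places.
SCHEMA = {
    "Order": ["Buyer", "Seller", "ShipTo"],
    "Buyer": ["Party"],
    "Seller": ["Party"],
    "ShipTo": ["Party"],
    "Party": ["Name", "Address"],
    "Address": ["Street", "City"],
    "Name": [],
    "Street": [],
    "City": [],
}


@lru_cache(maxsize=None)
def count_paths(node: str) -> int:
    """Count the nodes of the tree obtained by fully unfolding `node`.

    Each node of the unfolded tree corresponds to one distinct path from `node`.
    """
    return 1 + sum(count_paths(child) for child in SCHEMA[node])


components = len(SCHEMA)
paths = count_paths("Order")
```

In this toy case 9 components unfold into 19 paths. The ratio grows quickly when shared components are nested inside other shared components.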


XML Schema Definition (XSD)

Distributed schemas: XSD allows a schema to be distributed over several schema documents (.xsd files) and namespaces.


XML Schema Definition (XSD)

Determining similarity between and matching complex types can be as difficult as matching two complete schemas.


Standard Schema Matching (Context-Insensitive)

Matchers: matching algorithms compute similarity scores between a pair of attributes.

Weights: scores are weighted; confidence scores are identified based on standard statistical techniques.

Selection of best matches.
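The three steps (matchers, weights, selection) can be sketched as a weighted average followed by a threshold. The matcher names, scores, weights, and the 0.5 threshold below are all made-up values for illustration, and the statistical derivation of confidence scores is not modelled.

```python
def combine(scores_by_matcher, weights):
    """Weighted average of per-matcher similarity scores for each attribute pair."""
    total = sum(weights.values())
    pairs = set().union(*(scores.keys() for scores in scores_by_matcher.values()))
    return {
        pair: sum(weights[m] * scores_by_matcher[m].get(pair, 0.0) for m in weights) / total
        for pair in pairs
    }


def select_best(combined, threshold=0.5):
    """Keep, per source attribute, the highest-scoring target at or above the threshold."""
    best = {}
    for (src, tgt), score in combined.items():
        if score >= threshold and score > best.get(src, (None, 0.0))[1]:
            best[src] = (tgt, score)
    return best


# Hypothetical scores from two matchers over (source, target) attribute pairs.
scores = {
    "name": {("SSN", "ID"): 0.2, ("SSN", "Name"): 0.3, ("Marks", "Grades"): 0.4},
    "datatype": {("SSN", "ID"): 1.0, ("SSN", "Name"): 0.0, ("Marks", "Grades"): 0.8},
}
weights = {"name": 1.0, "datatype": 1.0}
best = select_best(combine(scores, weights))
```

With equal weights, SSN is matched to ID rather than Name because the data type matcher outweighs the weak name similarity.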


Fragment-Based Schema Matching (Context-Insensitive)

Fragment identification

Identifying fragment-pair candidates

Fragment matching

Result combination
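The four steps can be walked through on a toy example. This is a rough sketch under simplifying assumptions: fragments are given as input rather than identified automatically, both fragment pairs and elements are compared by plain string similarity, and the thresholds and names are hypothetical.

```python
from difflib import SequenceMatcher


def sim(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fragment_match(src_fragments, tgt_fragments, frag_threshold=0.6, elem_threshold=0.8):
    # Step 1 (fragment identification) is assumed done: the inputs map each
    # fragment name to the element names it contains.
    # Step 2: keep only fragment pairs whose names are similar enough.
    candidates = [
        (s, t)
        for s in src_fragments
        for t in tgt_fragments
        if sim(s, t) >= frag_threshold
    ]
    # Step 3: match elements only inside each candidate fragment pair.
    # Step 4: combine the per-fragment results into one list of correspondences.
    result = []
    for s, t in candidates:
        for s_elem in src_fragments[s]:
            for t_elem in tgt_fragments[t]:
                if sim(s_elem, t_elem) >= elem_threshold:
                    result.append((f"{s}.{s_elem}", f"{t}.{t_elem}"))
    return result


src = {"Address": ["street", "city"], "Contact": ["phone"]}
tgt = {"PostalAddress": ["Street", "City", "Zip"], "Item": ["price"]}
matches = fragment_match(src, tgt)
```

Only one of the four possible fragment pairs survives step 2, so element comparison runs on a much smaller search space than a full cross product of all elements.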


Prototype

Based on COMA (COmbining MAtch algorithm)

Support for multi-file schemas

Multiple matching strategies

Fragment-based approach

Result combination


COMA

Schema representation

Schemas are represented by rooted DAGs (Directed Acyclic Graphs).


COMA

Directed Acyclic Graphs

Directed graph

With no cycles

Part tree and part graph

Used in critical path analysis, expression tree evaluation, and game evaluation


COMA

Match processing

reusability


Continuity of this work

2004: COMA prototype

2005: COMA++, which extended the previous COMA prototype. High quality and fast execution times; default combination of 4 matchers.

2007: MOMA (Mapping-based Object Matching)


Context Schema Matching (Context-Sensitive)

False Negatives

RS.price.price → RT.music.price when RS.price.prcode = “reg”

RS.price.price → RT.music.sale when RS.price.prcode = “sale”
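A contextual match can be read as a match that holds only for the source rows satisfying a condition. The sketch below applies the two conditions on prcode to sample rows; the `id` column, the row values, and the function name are assumptions added for illustration.

```python
# Made-up sample rows for the source table RS.price; the id column is hypothetical.
price_rows = [
    {"id": 1, "prcode": "reg", "price": 14.99},
    {"id": 1, "prcode": "sale", "price": 9.99},
    {"id": 2, "prcode": "reg", "price": 12.50},
]

# Contextual matches: the value of prcode decides which target attribute
# of RT.music receives RS.price.price.
CONTEXTUAL_MATCHES = {"reg": "price", "sale": "sale"}


def to_music(rows):
    """Build target rows by routing each price through its contextual match."""
    music = {}
    for row in rows:
        target_attr = CONTEXTUAL_MATCHES.get(row["prcode"])
        if target_attr is None:
            continue  # no match applies in this context
        record = music.setdefault(row["id"], {"id": row["id"], "price": None, "sale": None})
        record[target_attr] = row["price"]
    return list(music.values())


music_rows = to_music(price_rows)
```

A context-insensitive matcher sees a single column RS.price.price and can map it to at most one of the two target attributes, which is how the false negative arises.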


Context Schema Matching (Context-Sensitive)

Two techniques for selecting contextual matches:

MultiTable: find the single match with the highest confidence for every target attribute

QualTable: find the best matches on a per-table basis


Context Schema Matching (Context-Sensitive)

Experimental Results

“Because of its poor performance, MultiTable is not considered further”


Conclusion

Current schema matching approaches still have to improve for large and complex schemas.

The large search space increases the likelihood for false matches as well as execution times.

Further difficulties for schema matching are posed by the high expressive power and versatility of modern schema languages like XSD.


Questions