Mar 27, 2008 Christiano Santiago 1
Schema Matching
Matching Large XML Schemas
Erhard Rahm, Hong-Hai Do, Sabine Maßmann
Putting Context into Schema Matching
Philip Bohannon, Eiman Elnahrawy, Wenfei Fan, Michael Flaster
COMA - A System for Flexible Combination of Schema Matching Approaches
Hong-Hai Do, Erhard Rahm
Goals
Introductory concepts of schema matching
Context-sensitive versus context-insensitive matching
Complexity of XSD schemas
Agenda
Terminology
Different Approaches
XML Schema Definition
Context-Insensitive
Context-Sensitive
Q&A
Terminology
Schema matching (meaning): the process of identifying that two objects are semantically related.
Mapping (conversion): the transformations between the objects.
Terminology
Student(Name, SSN, Level, Major, Marks)
GradStudent(Name, ID, Major, Grades)
Student.Name ≈ GradStudent.Name (match)
Student.SSN ≈ GradStudent.ID (match)
Student.Marks ≈ GradStudent.Grades (match that requires a transformation)
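To make the distinction concrete, here is a minimal Python sketch that records the three correspondences above as matches and then applies a mapping that converts a Student record into a GradStudent record. The slide does not say how Marks become Grades; the letter-grade rule in `marks_to_grade` (and its thresholds) is a hypothetical assumption, as is the sample record.

```python
# Matches: which source attribute corresponds to which target attribute.
MATCHES = {
    "Student.Name": "GradStudent.Name",
    "Student.SSN": "GradStudent.ID",
    "Student.Marks": "GradStudent.Grades",
}


def marks_to_grade(marks: float) -> str:
    """Hypothetical conversion rule: numeric marks to a letter grade."""
    if marks >= 80:
        return "A"
    if marks >= 70:
        return "B"
    if marks >= 60:
        return "C"
    return "F"


# Mapping: how values are converted along each match.
# Plain matches copy the value; Marks -> Grades needs a transformation.
TRANSFORMS = {"Student.Marks": marks_to_grade}


def apply_mapping(student: dict) -> dict:
    """Build a GradStudent record from a Student record using the matches."""
    grad = {}
    for src, tgt in MATCHES.items():
        src_attr = src.split(".", 1)[1]
        tgt_attr = tgt.split(".", 1)[1]
        convert = TRANSFORMS.get(src, lambda v: v)  # identity for plain matches
        grad[tgt_attr] = convert(student[src_attr])
    return grad


student = {"Name": "Ann Lee", "SSN": "000-00-0000", "Level": 2, "Major": "CS", "Marks": 85}
print(apply_mapping(student))
```

The matches alone only say which attributes correspond; the mapping is what actually produces target values. Attributes without a listed match (Level, and Major, which the slide does not list) are not carried over.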
Schema Matching
Context
Context-insensitive
Context-sensitive
Different Approaches
Schema-level matchers
Instance-level matchers
Hybrid matchers
Reusing matching information
Schema-Level Matchers
Only consider schema information:
Name
Description
Data type
Relationship
Constraints
Number of nesting levels
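As an example of a schema-level matcher that uses only element names, the sketch below scores name similarity with character trigrams and the Dice coefficient. The choice of trigrams and Dice is an assumption for illustration; the slides do not prescribe a particular name measure.

```python
def trigrams(name: str) -> set:
    """Character trigrams of a lower-cased name, padded so short names still yield trigrams."""
    padded = f"  {name.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def name_similarity(a: str, b: str) -> float:
    """Dice coefficient over trigram sets: 2|A & B| / (|A| + |B|), in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))


print(name_similarity("Name", "Name"))
print(name_similarity("Marks", "Grades"))
print(round(name_similarity("EmpName", "EmployeeName"), 2))
```

Identical names score 1.0, while Marks and Grades share no trigrams and score 0.0: a name-only matcher misses synonyms, which is one reason such matchers are combined with others.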
Instance-Level Matchers
Use instance-level data to gather insight into the content and meaning of schema elements
Linguistic: e.g. Dept, DeptName, EmpName
Constraints: e.g. 416-7362100 (phone number), M3J1P3 (postal code)
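Constraint-based instance matching can be sketched by profiling each column's values against value patterns and comparing the profiles. The two regular expressions (a phone number like 416-7362100 and a Canadian postal code like M3J1P3), the column names and values, and the cosine comparison are all illustrative assumptions.

```python
import math
import re

# Hypothetical value patterns; a real matcher would use many more features.
PATTERNS = {
    "phone": re.compile(r"^\d{3}-?\d{3}-?\d{4}$"),
    "postal_code": re.compile(r"^[A-Z]\d[A-Z] ?\d[A-Z]\d$"),
}


def profile(values: list) -> list:
    """Fraction of a column's values matching each pattern, in a fixed order."""
    return [sum(bool(p.match(v)) for v in values) / len(values) for p in PATTERNS.values()]


def cosine(u: list, v: list) -> float:
    """Cosine similarity of two profiles; 0.0 if either profile is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical columns whose names share nothing, but whose values do.
source = {"Phone": ["416-7362100", "905-5551234"], "Zip": ["M3J1P3", "M5V 2T6"]}
target = {"TelNo": ["4165550000", "647-555-0199"], "PostCode": ["K1A0B1"]}

for s, s_vals in source.items():
    for t, t_vals in target.items():
        print(s, t, round(cosine(profile(s_vals), profile(t_vals)), 2))
```

Phone/TelNo and Zip/PostCode score 1.0 and the cross pairs 0.0, even though no name matcher would relate these column names.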
Hybrid-Level Matchers
Combine more than one matching approach
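A hybrid matcher can draw on schema-level and instance-level evidence within one score. In the sketch below, name-token overlap and value overlap are both measured with Jaccard similarity and the stronger of the two is taken; the tokenizer, the use of `max`, and the sample columns are assumptions chosen for illustration.

```python
import re


def name_tokens(name: str) -> set:
    """Split a camelCase or snake_case name into lower-case tokens, e.g. 'EmpName' -> {'emp', 'name'}."""
    return {t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)}


def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|, or 0.0 for two empty sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def hybrid_score(name_a: str, values_a: list, name_b: str, values_b: list) -> float:
    """Use whichever kind of evidence (names or instance values) is stronger."""
    schema_level = jaccard(name_tokens(name_a), name_tokens(name_b))
    instance_level = jaccard(set(values_a), set(values_b))
    return max(schema_level, instance_level)


# Names agree, values do not: the schema-level evidence decides.
print(hybrid_score("EmpName", ["Ann"], "emp_name", ["Bob"]))
# Names disagree, values overlap: the instance-level evidence decides.
print(round(hybrid_score("DeptName", ["Sales", "HR"], "Division", ["HR", "Sales", "IT"]), 2))
```

Taking the maximum lets either kind of evidence establish a candidate match; a weighted average would be the more conservative alternative.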
Reusing Matching Information
Use previous matching information for future matching tasks
Structures or substructures often repeat
Caution: a reused match may not hold in a new context, e.g. Salary & Income in Payroll vs. Tax Reporting
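One way to reuse earlier results, in the spirit of COMA's reuse of previous match results, is to compose them: if S1↔S2 and S2↔S3 were matched before, candidate S1↔S3 correspondences follow through the shared S2 elements. The element names and scores are hypothetical, and multiplying similarities along the chain (keeping the best chain) is just one simple choice.

```python
from collections import defaultdict


def compose(ab: dict, bc: dict) -> dict:
    """Compose match results {(a, b): sim} and {(b, c): sim} into {(a, c): sim}.

    Each (a, c) pair is derived through a shared intermediate element b;
    its score is sim(a, b) * sim(b, c), keeping the best chain (an assumption).
    """
    by_b = defaultdict(list)
    for (b, c), sim in bc.items():
        by_b[b].append((c, sim))
    ac = {}
    for (a, b), sim_ab in ab.items():
        for c, sim_bc in by_b[b]:
            score = sim_ab * sim_bc
            if score > ac.get((a, c), 0.0):
                ac[(a, c)] = score
    return ac


# Hypothetical results from two earlier matching tasks.
payroll_to_hr = {("Payroll.Salary", "HR.Income"): 0.9, ("Payroll.EmpId", "HR.Id"): 1.0}
hr_to_tax = {("HR.Income", "Tax.Income"): 0.8, ("HR.Id", "Tax.TaxpayerId"): 0.7}

for pair, score in compose(payroll_to_hr, hr_to_tax).items():
    print(pair, round(score, 2))
```

The derived Payroll.Salary ↔ Tax.Income pair is exactly the kind of result the caution refers to: in a tax-reporting context, income may cover more than salary, so composed matches are candidates to confirm, not final answers.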
XML Schema Definition (XSD)
Data types:
19 built-in primitive data types
25 built-in derived data types
User-defined complex types
XML Schema Definition (XSD)
Complex type definition:
<complexType name="myNewNameType">
  <complexContent>
    <restriction base="anyType">
      <sequence>
        <element name="name" type="string" />
        <element name="location" type="string" />
      </sequence>
      <attribute name="position" type="string" />
    </restriction>
  </complexContent>
</complexType>
Element declaration:
<element name="employee" type="dc:myNewNameType" />
Instance:
<dc:employee position="trainer">
  <dc:name>Don Smith</dc:name>
  <dc:location>Dallas, TX</dc:location>
</dc:employee>
Attribute: position
Child elements: name, location
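To see what a matcher extracts from such a definition, the sketch below parses a namespaced copy of the complex type with Python's standard `xml.etree.ElementTree` and lists its child elements and attributes. The `xs:` prefix, the `xmlns:dc` binding and the `targetNamespace` URI `http://example.org/dc` are assumptions added so the fragment becomes a well-formed, self-contained schema document; the original omits them.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dc="http://example.org/dc" targetNamespace="http://example.org/dc">
  <xs:complexType name="myNewNameType">
    <xs:complexContent>
      <xs:restriction base="xs:anyType">
        <xs:sequence>
          <xs:element name="name" type="xs:string"/>
          <xs:element name="location" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="position" type="xs:string"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="employee" type="dc:myNewNameType"/>
</xs:schema>
"""


def describe_complex_type(schema_xml: str, type_name: str) -> dict:
    """Return the child element and attribute names declared in a named complex type."""
    root = ET.fromstring(schema_xml.strip())
    ctype = root.find(f"xs:complexType[@name='{type_name}']", {"xs": XS})
    return {
        "elements": [e.get("name") for e in ctype.iter(f"{{{XS}}}element")],
        "attributes": [a.get("name") for a in ctype.iter(f"{{{XS}}}attribute")],
    }


print(describe_complex_type(SCHEMA, "myNewNameType"))
```

The matcher sees `name` and `location` as child elements and `position` as an attribute, the same split the slide labels.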
XML Schema Definition (XSD)
Shared schema components
XML Schema Definition (XSD)
Match system approaches:
COMA: path-based
Cupid: materialized
Scalability issue: the xCBL Order schema contains 1,451 components, including 91 shared types. After resolving the shared components, more than 26,000 nodes/paths were identified.
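The blow-up comes from shared components: every path from the root through a shared type becomes a separate node once the schema is expanded into a tree. The sketch counts those paths with memoization, without materializing them, on a hypothetical miniature schema in which a shared `Address` type is used from three places and itself uses a shared `Contact` type.

```python
from functools import lru_cache

# Hypothetical schema graph: node -> child nodes. Shared types have several parents.
SCHEMA = {
    "Order": ["BuyerParty", "SellerParty", "ShipTo"],
    "BuyerParty": ["Address"],
    "SellerParty": ["Address"],
    "ShipTo": ["Address"],
    "Address": ["Street", "City", "Contact"],
    "Contact": ["Phone", "Email"],
    "Street": [], "City": [], "Phone": [], "Email": [],
}


@lru_cache(maxsize=None)
def paths_below(node: str) -> int:
    """Nodes in the tree obtained by expanding `node` (itself included), i.e. root-to-node paths."""
    return 1 + sum(paths_below(child) for child in SCHEMA[node])


print(len(SCHEMA), "components ->", paths_below("Order"), "paths")
```

Here 10 components already expand to 22 paths; each additional level of nested shared types multiplies the count again, which is how a schema with 1,451 components can yield tens of thousands of paths.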
XML Schema Definition (XSD)
Distributed schemas: XSD allows a schema to be distributed over several schema documents (.xsd files) and namespaces.
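Before matching, a distributed schema has to be assembled from all of its documents. The sketch follows the `schemaLocation` of `xs:include` and `xs:import` elements to collect every document belonging to a schema, visiting each one once. The in-memory file names, contents and the `urn:example:common` namespace are hypothetical stand-ins for .xsd files on disk.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema documents, keyed by file name.
DOCS = {
    "order.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:include schemaLocation="party.xsd"/>
        <xs:import namespace="urn:example:common" schemaLocation="common.xsd"/>
    </xs:schema>""",
    "party.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:import namespace="urn:example:common" schemaLocation="common.xsd"/>
    </xs:schema>""",
    "common.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
        targetNamespace="urn:example:common"/>""",
}


def collect_documents(start: str) -> list:
    """Depth-first traversal of include/import references; each document is visited once."""
    seen, stack, order = set(), [start], []
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        order.append(name)
        root = ET.fromstring(DOCS[name])
        for ref in root:  # include/import are direct children of xs:schema
            if ref.tag in (XS + "include", XS + "import") and ref.get("schemaLocation"):
                stack.append(ref.get("schemaLocation"))
    return order


print(collect_documents("order.xsd"))
```

The `seen` set keeps `common.xsd`, referenced twice, from being loaded twice and also guards against documents that reference each other.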
XML Schema Definition (XSD)
Determining the similarity between complex types, and matching them, can be as difficult as matching two complete schemas.
Standard Schema Matching (Context-Insensitive)
Matchers: matching algorithms that compute similarity scores between a pair of attributes
Weights: scores are weighted; confidence scores are identified based on standard statistical techniques
Selection of best matches
Fragment-Based Schema Matching (Context-Insensitive)
Fragment identification
Identifying fragment-pair candidates
Fragment matching
Result combination
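The four steps can be sketched on toy schemas given as nested dicts. Every choice here is a simplifying assumption and not the method of the paper: fragments are the subtrees directly below the root, candidate fragment pairs are chosen by case-insensitive equality of their root names, and elements inside a pair are matched by case-insensitive name equality.

```python
def leaves(tree: dict) -> list:
    """Names of the leaf elements in a nested {name: subtree} dict."""
    out = []
    for name, sub in tree.items():
        out.extend(leaves(sub) if sub else [name])
    return out


def identify_fragments(schema: dict) -> dict:
    """Step 1: treat each subtree directly below the root as one fragment."""
    return {root: leaves(sub) for root, sub in schema.items()}


def candidate_pairs(frags_a: dict, frags_b: dict) -> list:
    """Step 2: pair fragments whose root names are equal (case-insensitive)."""
    return [(ra, rb) for ra in frags_a for rb in frags_b if ra.lower() == rb.lower()]


def match_fragment(elems_a: list, elems_b: list) -> set:
    """Step 3: match elements within one fragment pair by name."""
    return {(a, b) for a in elems_a for b in elems_b if a.lower() == b.lower()}


def fragment_match(schema_a: dict, schema_b: dict) -> set:
    """Step 4: combine the per-fragment results into one result for the whole schemas."""
    frags_a, frags_b = identify_fragments(schema_a), identify_fragments(schema_b)
    result = set()
    for ra, rb in candidate_pairs(frags_a, frags_b):
        result |= {(f"{ra}.{a}", f"{rb}.{b}")
                   for a, b in match_fragment(frags_a[ra], frags_b[rb])}
    return result


# Hypothetical purchase-order schemas.
po1 = {"Address": {"Street": {}, "City": {}}, "Contact": {"Name": {}, "Phone": {}}}
po2 = {"address": {"street": {}, "city": {}, "zip": {}}, "Item": {"Name": {}, "Price": {}}}
print(sorted(fragment_match(po1, po2)))
```

Contact.Name and Item.Name are never compared because their fragments were not paired: restricting matching to candidate fragment pairs shrinks the search space and avoids such false matches.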
Prototype
Based on COMA: COmbining MAtch algorithms
Support for schemas spread over multiple files
Multiple matching strategies
Fragment-based approach
Result combination
COMA
Schema representation
Schemas are represented by rooted DAGs (Directed Acyclic Graphs).
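In this representation a shared element is a single node with several parents, and an element in a given context is identified by its path from the root. A minimal sketch, assuming a plain adjacency dict with hypothetical node names, enumerates those paths:

```python
def root_paths(dag: dict, root: str) -> list:
    """All paths from the root to every node of a rooted DAG, as dotted strings."""
    paths = []

    def walk(node: str, prefix: list) -> None:
        path = prefix + [node]
        paths.append(".".join(path))
        for child in dag.get(node, []):
            walk(child, path)

    walk(root, [])
    return paths


# Hypothetical purchase-order schema: Address is shared by ShipTo and BillTo.
PO = {"PO": ["ShipTo", "BillTo"], "ShipTo": ["Address"], "BillTo": ["Address"],
      "Address": ["Street", "City"]}
for p in root_paths(PO, "PO"):
    print(p)
```

The single Address node appears on two paths, so a path-based matcher can tell PO.ShipTo.Address.Street apart from PO.BillTo.Address.Street even though the graph stores Street only once.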
COMA
Directed Acyclic Graphs
Directed graph with no cycles
Part tree, part graph
Used in critical path analysis, expression tree evaluation and game evaluation
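Because the schema graph must have no cycles, a loader can check that property with a topological sort (Kahn's algorithm): if not every node can be ordered, the graph contains a cycle. A recursive type definition, such as a part containing sub-parts of the same type, would show up this way. The adjacency-dict format and node names are hypothetical.

```python
from collections import deque


def is_acyclic(graph: dict) -> bool:
    """Kahn's algorithm: repeatedly remove nodes with no incoming edges."""
    nodes = set(graph) | {c for children in graph.values() for c in children}
    indegree = {n: 0 for n in nodes}
    for children in graph.values():
        for c in children:
            indegree[c] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for c in graph.get(n, []):
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # Nodes left over all lie on, or below, a cycle.
    return removed == len(nodes)


print(is_acyclic({"PO": ["ShipTo", "BillTo"], "ShipTo": ["Address"], "BillTo": ["Address"]}))
print(is_acyclic({"Part": ["SubParts"], "SubParts": ["Part"]}))
```

The shared Address node (two parents) is fine for a DAG; only the Part/SubParts loop is rejected.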
COMA
Match processing
reusability
Continuation of this work
2004: COMA prototype
2005: COMA++, an extension of the previous COMA prototype: high quality and fast execution times; default combination of 4 matchers
2007: MOMA: Mapping-based Object Matching
Context Schema Matching (Context-Sensitive)
False Negatives
RS.price.price → RT.music.price when RS.price.prcode = “reg”
RS.price.price → RT.music.sale when RS.price.prcode = “sale”
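The idea behind a contextual match can be sketched by splitting the source column on each value of a condition attribute and matching each slice separately. The rows, the target columns, the choice of `prcode` as the only candidate condition, and the similarity measure (closeness of mean prices) are hypothetical assumptions; this is not the scoring used by Bohannon et al.

```python
from statistics import mean

# Hypothetical source rows of RS.price: (price, prcode).
rs_price = [(19.99, "reg"), (24.99, "reg"), (9.99, "sale"), (12.49, "sale")]
# Hypothetical target columns of RT.music.
rt_music = {"price": [18.99, 22.99, 25.99], "sale": [8.99, 11.99]}


def closeness(xs: list, ys: list) -> float:
    """Toy similarity for positive values: 1 when the means coincide, lower as they drift apart."""
    mx, my = mean(xs), mean(ys)
    return 1 - abs(mx - my) / max(mx, my)


def contextual_matches(rows: list, targets: dict) -> dict:
    """For each condition value, pick the target column most similar to that slice of prices."""
    result = {}
    for code in sorted({c for _, c in rows}):
        slice_ = [p for p, c in rows if c == code]
        best = max(targets, key=lambda t: closeness(slice_, targets[t]))
        result[f'RS.price.prcode = "{code}"'] = f"RT.music.{best}"
    return result


for condition, target in contextual_matches(rs_price, rt_music).items():
    print(f"RS.price.price -> {target} when {condition}")
```

Compared as a whole, RS.price.price mixes regular and sale prices and fits neither target column well; split by the condition, each slice clearly matches one column, recovering both matches.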
Context Schema Matching (Context-Sensitive)
Two techniques for selecting contextual matches:
MultiTable: find the single match with the highest confidence for every target attribute
QualTable: find the best matches on a per-table basis
Context Schema Matching (Context-Sensitive)
Experimental Results
“Because of its poor performance, MultiTable is not considered further”
Conclusion
Current schema matching approaches still need to improve for large and complex schemas.
The large search space increases both the likelihood of false matches and the execution times.
Further difficulties for schema matching are posed by the high expressive power and versatility of modern schema languages like XSD.
Questions