OWL-based Semantic Conflicts Detection and Resolution for Data Interoperability
Changqing Li, Tok Wang Ling
Department of Computer Science
School of Computing
National University of Singapore
Preliminary and motivation
OWL-based Semantic Conflicts Detection and Resolution
Q & A
Introduction
Data interoperability and integration is a long-standing challenge for the database research community.
Ontologies provide shared knowledge among different data sources
Clarify the semantics of information.
Provide a way to solve the interoperability problem in database integration
Introduction (Cont.)
OWL is being promoted as a standard web ontology language.
In the future a considerable number of ontologies will be created based on OWL.
Therefore, automatically detecting semantic conflicts based on OWL will greatly expedite semantic interoperability and greatly reduce the manual work needed to detect semantic conflicts.
Ontology Definition
An ontology defines the basic terms and relations comprising the vocabulary of a topic area, as well as the rules for combining terms and relations to define extensions to the vocabulary [1].
1. Robert Neches, Richard Fikes, Timothy W. Finin, Thomas R. Gruber, Ramesh Patil, Ted E. Senator, William R. Swartout: Enabling Technology for Knowledge Sharing. AI Magazine 12(3): pp36-56 (1991)
SHOE
The Simple HTML Ontology Extensions (SHOE) [2] extend HTML with machine-readable knowledge annotations.
2. Sean Luke and Jeff Heflin: SHOE Specification 1.01. http://www.cs.umd.edu/projects/plus/SHOE/spec.html
RDF
The Resource Description Framework (RDF) [3] is a W3C recommendation for the Semantic Web [4].
It defines a simple model to describe relationships among resources in terms of properties and values.
SVO form (Subject-Verb-Object)
Resource-Property-Value
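The resource-property-value model can be pictured as a list of three-part statements. The sketch below is only an illustration in plain Python, not an RDF library; the `Triple` type, the `values_of` helper and the `ex:` names are hypothetical.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # the property (the "verb")
    obj: str        # the value of the property


# Hypothetical statements; the ex: names are illustrative only.
triples = [
    Triple("ex:Product1", "ex:price", "100"),
    Triple("ex:Product1", "ex:name", "Widget"),
]


def values_of(subject: str, predicate: str, graph: List[Triple]) -> List[str]:
    """Return every value attached to `subject` through `predicate`."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]
```

Calling `values_of("ex:Product1", "ex:price", triples)` looks up the value of one property of one resource.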
3. Ora Lassila and Ralph R. Swick: Resource description framework (RDF). http://www.w3c.org/TR/WD-rdf-syntax
4. The SemanticWeb Homepage. http://www.semanticweb.org
RDFS
RDF Schema (RDFS) [5] is the primitive description language of RDF.
It provides some basic primitives:
subClassOf
subPropertyOf
5. Dan Brickley and R.V. Guha. Resource Description Framework (RDF) Schema Specification 1.0, W3C Candidate Recommendation 27 March 2000. http://www.w3.org/TR/rdf-schema/
DAML+OIL
DARPA Agent Markup Language (DAML) [6]
Helps machines understand semantic concepts and relationships.
Ontology Inference Layer (OIL) [7]
Extends RDFS with additional language primitives not present in RDFS.
DAML+OIL [8] is the successor of RDFS.
It is a combination of DAML and OIL.
It defines semantically richer primitives.
6. The DARPA Agent Markup Language Homepage. http://daml.semanticweb.org/
7. The Ontology Inference Layer OIL Homepage. http://www.ontoknowledge.org/oil/TR/oil.long.html
8. DAML+OIL Definition. http://www.daml.org/2001/03/daml+oil
OWL
DAML+OIL is evolving into OWL (Web Ontology Language) [9].
OWL is almost the same as DAML+OIL
Some primitives of DAML+OIL are renamed in OWL for easier understanding.
e.g., sameClassAs is changed to equivalentClass
9. Frank van Harmelen, Jim Hendler, Ian Horrocks, Deborah L. McGuinness, Peter F. Patel-Schneider and Lynn Andrea Stein. OWL Web Ontology Language Reference. http://www.w3.org/TR/owl-ref/
Primitives of OWL
The prefix owl before the colon is the namespace.
owl:equivalentClass
owl:equivalentProperty
owl:sameIndividualAs
owl:disjointWith
owl:differentFrom
Our Extension of OWL (EOWL)
We extend OWL with the following primitives:
eowl:orderingProperty
eowl:overlap
eowl:properSubClassOf
eowl:properSubPropertyOf
OWL-based Semantic Conflicts Cases
A. Name conflicts
B. Order sensitive conflicts
C. Scaling conflicts
D. Whole and part conflicts
E. Partial similarity conflicts
F. Swap conflicts
A. Name conflicts
Example A. Two distributed data warehouses
One is used to analyze the United States market
country, state, city and district
The other is used to analyze the China market
country, province, city and county
Based on the context, Province is defined as equivalent to State using the OWL primitive owl:equivalentClass.
To resolve this conflict, one name needs to be changed to the referenced name.
A. Name conflicts (Cont.)
owl:equivalentClass is the indicator to detect synonym conflicts
Change Province to State, the name referenced in the ontology definition.
Fig. A. Detection of synonym conflicts
A. Name conflicts (Cont.)
Case A. Synonyms. The OWL primitives owl:equivalentClass, owl:equivalentProperty and owl:sameIndividualAs are indicators to detect this case.
Conflict Resolution Rule A. If synonym conflicts are detected, different attribute names with the same semantics need to be translated to the same name (referenced name) for smooth data interoperability.
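Rule A can be sketched in a few lines. This is only a minimal illustration, assuming the ontology has already been read into (subject, predicate, object) statements; `ONTOLOGY`, `synonym_map` and `resolve_names` are hypothetical names, not part of OWL or of any library.

```python
# Statements as they might be read from an ontology (hypothetical data).
ONTOLOGY = [
    ("Province", "owl:equivalentClass", "State"),
]

# The OWL primitives named in Case A as synonym indicators.
SYNONYM_INDICATORS = {
    "owl:equivalentClass",
    "owl:equivalentProperty",
    "owl:sameIndividualAs",
}


def synonym_map(ontology):
    """Map each defined name to the referenced name it is declared equivalent to."""
    return {s: o for s, p, o in ontology if p in SYNONYM_INDICATORS}


def resolve_names(attributes, ontology):
    """Translate synonyms to the referenced name; leave other names unchanged."""
    mapping = synonym_map(ontology)
    return [mapping.get(name, name) for name in attributes]


china = ["Country", "Province", "City", "County"]
resolved = resolve_names(china, ONTOLOGY)  # Province is renamed to State
```

The subject of the equivalence statement is treated as the name to change and the object as the referenced name, following Example A.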
B. Order sensitive conflicts
Example B. Consider the highest three scores of a course.
The highest three scores of course A are listed as 90, 95, 100 in ascending order.
The highest three scores of course B are listed as 98, 95, 93 in descending order.
highestThreeScores is defined as an eowl:orderingProperty in the ontology.
The sequences of the highest three scores for courses A and B should both be adjusted to ascending order or both to descending order.
By default, adjust to the sequence of the first one, e.g. the sequence of course A.
B. Order sensitive conflicts (Cont.)
We can further define ascending or descending order for more precise semantics.
Fig. B. Detection of order sensitive conflicts (highest three scores of a course)
B. Order sensitive conflicts (Cont.)
Case B. Order sensitive. EOWL primitive eowl:orderingProperty and RDF primitive rdf:Seq are indicators to detect this case.
Conflict Resolution Rule B. If order sensitive conflicts are detected, the member sequences need to be adjusted to the same criterion for smooth data interoperability, by default the sequence of the first one.
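Rule B applied to Example B might look like the sketch below. It assumes each sequence is already sorted in one direction; `is_ascending` and `align_order` are hypothetical helper names.

```python
def is_ascending(seq):
    """True when every member is less than or equal to the next one."""
    return all(a <= b for a, b in zip(seq, seq[1:]))


def align_order(reference, other):
    """Return `other` sorted in the same direction as `reference`."""
    return sorted(other, reverse=not is_ascending(reference))


course_a = [90, 95, 100]   # ascending; the first sequence is the default
course_b = [98, 95, 93]    # descending
aligned_b = align_order(course_a, course_b)
```

Here course A acts as the reference, so the scores of course B are reordered to ascending order.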
C. Scaling conflicts
Example C. Consider two database schemas
Product(ID, Price)
Product(ID, Price)
One price may refer to US dollars, while the other may refer to Singapore dollars. Figure 4 shows some concepts about a currency ontology; price is defined
Translate the prices to refer to the same currency unit, by default the unit of the first one.
C. Scaling conflicts (Cont.)
Fig. C. Detection of scaling conflicts
C. Scaling conflicts (Cont.)
Case C. Semantic conflicts may exist if the value of a data type property comprises both value and unit (Scaling). RDF primitive rdf:parseType="Resource" and OWL primitive owl:DatatypeProperty are indicators for this case.
Conflict Resolution Rule C. If scaling conflicts are detected, the values should be translated to refer to the same unit for smooth data interoperability, by default the first unit.
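A minimal sketch of Rule C follows, with each price held as a (value, unit) pair. The conversion factors in `RATE_TO_USD` are made up for illustration and are not real exchange rates; `convert` and `align_units` are hypothetical helper names.

```python
from decimal import Decimal

# Made-up conversion factors into USD, for illustration only.
RATE_TO_USD = {"USD": Decimal("1"), "SGD": Decimal("0.5")}


def convert(amount, unit, target_unit):
    """Convert `amount` from `unit` to `target_unit` through USD."""
    in_usd = amount * RATE_TO_USD[unit]
    return in_usd / RATE_TO_USD[target_unit]


def align_units(first, second):
    """Express `second` (value, unit) in the unit of `first`, the default."""
    _, target = first
    value, unit = second
    return (convert(value, unit, target), target)


price_1 = (Decimal("10"), "USD")
price_2 = (Decimal("30"), "SGD")
aligned = align_units(price_1, price_2)  # price_2 expressed in USD
```

`Decimal` is used instead of floats so that currency arithmetic does not pick up binary rounding errors.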
D. Whole and part conflicts
Example D. Consider schemas
Person(ID, name)
Person(ID, surname, givenName)
surname and givenName are both defined as proper sub-properties of name, using eowl:properSubClassOf.
eowl:properSubClassOf has clearer semantics than rdfs:subClassOf, because rdfs:subClassOf is ambiguous between two meanings: eowl:properSubClassOf and owl:equivalentClass.
Divide the whole attribute name into the part attributes surname and givenName,
or combine the part attributes surname and givenName in the correct sequence to form the whole attribute name.
D. Whole and part conflicts (Cont.)
Fig. D1. Detection of whole and part conflicts
Fig. D2. Detection of whole and part conflicts
D. Whole and part conflicts (Cont.)
Case D. Semantic conflicts may exist if one concept is completely contained in another concept (Whole and part). EOWL primitives eowl:properSubClassOf and eowl:properSubPropertyOf are indicators to detect this case.
Conflict Resolution Rule D. If whole and part conflicts are detected, the whole attributes should be divided into part attributes or the part attributes should be combined together to whole attributes for smooth data interoperability.
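The two directions of Rule D can be sketched for the name example. The sketch assumes, purely for illustration, that a whole name is written surname first and that the surname contains no space; `split_name` and `combine_name` are hypothetical helpers.

```python
def split_name(name):
    """Divide the whole attribute into (surname, givenName).

    Assumption: the surname comes first and is followed by a single space.
    """
    surname, _, given_name = name.partition(" ")
    return surname, given_name


def combine_name(surname, given_name):
    """Combine the part attributes, in sequence, into the whole attribute."""
    return f"{surname} {given_name}"


parts = split_name("Ling Tok Wang")
whole = combine_name(*parts)
```

The sequence of the parts matters: combining in the wrong order would produce a different whole value, which is why the ordering convention must be agreed on before integration.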
E. Partial similarity conflicts
Example E. Integrating ResearchAssistant and GraduateStudent
The relationship between research assistant and graduate student is overlap because some research assistants are also graduate students,
but not all research assistants are graduate students,
and not all graduate students are research assistants.
After integration, there should be three schemas:
Research Assistant but not Graduate Student: RNotG
Graduate Student but not Research Assistant: GNotR
Both Research Assistant and Graduate Student: RAndG
E. Partial similarity conflicts (Cont.)
Fig. E. Detection of partial similarity conflicts
E. Partial similarity conflicts (Cont.)
Case E. Semantic conflicts may exist if two concepts overlap (Partial similarity). EOWL primitive eowl:overlap is the indicator to detect this case.
Conflict Resolution Rule E. If partial similarity conflicts are detected, the overlap part should be separated before integration.
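Separating the overlap as in Example E amounts to three set operations. The sketch below assumes records are identified by a shared ID; the function name `separate_overlap` and the sample IDs are hypothetical.

```python
def separate_overlap(research_assistants, graduate_students):
    """Split two overlapping collections of IDs into three disjoint groups."""
    r, g = set(research_assistants), set(graduate_students)
    return {
        "RNotG": r - g,   # research assistants who are not graduate students
        "GNotR": g - r,   # graduate students who are not research assistants
        "RAndG": r & g,   # people who are both
    }


# Hypothetical IDs, for illustration only.
ras = ["p1", "p2", "p3"]
grads = ["p2", "p3", "p4"]
groups = separate_overlap(ras, grads)
```

The three resulting groups are pairwise disjoint and together cover every input record, so nothing is duplicated or lost after integration.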
F. Swap conflicts
Example F. Continued from Example A
In China, county is contained in city (city has the larger area).
In the US, city is contained in county (county has the larger area).
The domain (County) of property region:containedIn in the China ontology is ju