OWL-based Semantic Conflicts Detection and Resolution for Data Interoperability Changqing Li, Tok Wang Ling Department of Computer Science School of Computing National University of Singapore


OWL-based Semantic Conflicts Detection and Resolution for Data Interoperability

Changqing Li, Tok Wang Ling

Department of Computer Science, School of Computing

National University of Singapore

2

Outline

Introduction

Preliminary and motivation

OWL-based Semantic Conflicts Detection and Resolution

Conclusion

Q & A

3

Introduction

Data interoperability and integration is a long-standing challenge to the database research community.

An ontology provides shared knowledge among different data sources.

It clarifies the semantics of information.

It provides a way to solve the interoperability problem in database integration.

4

Introduction (Cont.)

OWL is being promoted as a standard web ontology language.

In the future, a considerable number of ontologies will be created based on OWL.

Therefore, automatically detecting semantic conflicts based on OWL will greatly speed up the achievement of semantic interoperability and greatly reduce the manual work needed to detect semantic conflicts.

5

Ontology Definition

An ontology defines the basic terms and relations comprising the vocabulary of a topic area, as well as the rules for combining terms and relations to define extensions to the vocabulary [1].

1. Robert Neches, Richard Fikes, Timothy W. Finin, Thomas R. Gruber, Ramesh Patil, Ted E. Senator, William R. Swartout: Enabling Technology for Knowledge Sharing. AI Magazine 12(3): pp. 36-56 (1991)

6

Ontology Languages

SHOE

RDF

RDFS

DAML+OIL

OWL

7

SHOE

The Simple HTML Ontological Extensions (SHOE) [2] extend HTML with machine-readable knowledge annotations.

2. Sean Luke and Jeff Heflin: SHOE Specification 1.01. http://www.cs.umd.edu/projects/plus/SHOE/spec.html

8

RDF

Resource Description Framework (RDF) [3] is a W3C recommendation for the Semantic Web [4].

It defines a simple model to describe relationships among resources in terms of properties and values.

SVO form (Subject-Verb-Object)

Resource-Property-Value

3. Ora Lassila and Ralph R. Swick: Resource description framework (RDF). http://www.w3c.org/TR/WD-rdf-syntax

4. The SemanticWeb Homepage. http://www.semanticweb.org

9

RDF (Cont.)

<ResourceA>
  <propertyA>
    <ResourceB>
      <propertyB>
        <ResourceC>
          <propertyC>ValueC</propertyC>
        </ResourceC>
      </propertyB>
    </ResourceB>
  </propertyA>
</ResourceA>

Figure labels: the nested ResourceB element is the value of propertyA; the nested ResourceC element is the value of propertyB.
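The nesting in the figure above can be read as a chain of Resource-Property-Value statements. The following is a minimal sketch of that reading, not a real RDF/XML parser: it assumes a simplified document in which resource and property elements strictly alternate, and the element names (ResourceA, propertyA, and so on) are just the placeholders from the figure.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the nested figure; this is NOT full RDF/XML
# (no rdf:RDF wrapper, no rdf:about), only the alternating nesting.
DOC = """
<ResourceA>
  <propertyA>
    <ResourceB>
      <propertyB>
        <ResourceC>
          <propertyC>ValueC</propertyC>
        </ResourceC>
      </propertyB>
    </ResourceB>
  </propertyA>
</ResourceA>
"""


def triples(resource):
    """Yield (resource, property, value) for a resource element and its nested resources."""
    for prop in resource:
        nested = list(prop)
        if nested:
            # The property value is itself a resource: emit the link, then recurse.
            for child in nested:
                yield (resource.tag, prop.tag, child.tag)
                yield from triples(child)
        else:
            # The property value is a literal.
            yield (resource.tag, prop.tag, (prop.text or "").strip())


result = list(triples(ET.fromstring(DOC)))
for triple in result:
    print(triple)
```

Each printed tuple corresponds to one Subject-Verb-Object statement of the model.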

10

RDFS

RDF Schema (RDFS) [5] is the primitive description language of RDF.

It provides some basic primitives:

subClassOf

subPropertyOf

…

5. Dan Brickley and R.V. Guha. Resource Description Framework (RDF) Schema Specification 1.0, W3C Candidate Recommendation 27 March 2000. http://www.w3.org/TR/rdf-schema/

11

DAML+OIL

DARPA Agent Markup Language (DAML) [6]

Helps machines understand semantic concepts and relationships.

Ontology Inference Layer (OIL) [7]

Extends RDFS with additional language primitives not yet present in RDFS.

DAML+OIL [8] is the successor of RDFS.

It is a combination of DAML and OIL.

More semantically rich primitives are defined.

6. The DARPA Agent Markup Language Homepage. http://daml.semanticweb.org/

7. The Ontology Inference Layer OIL Homepage. http://www.ontoknowledge.org/oil/TR/oil.long.html

8. DAML+OIL Definition. http://www.daml.org/2001/03/daml+oil

12

OWL

DAML+OIL is evolving into OWL (Web Ontology Language) [9].

OWL is almost the same as DAML+OIL

Some primitives of DAML+OIL are renamed in OWL for easier understanding.

e.g., “sameClassAs” is changed to “equivalentClass” …

9. Frank van Harmelen, Jim Hendler, Ian Horrocks, Deborah L. McGuinness, Peter F. Patel-Schneider and Lynn Andrea Stein. OWL Web Ontology Language Reference. http://www.w3.org/TR/owl-ref/

13

Primitives of OWL

“owl” before “:” is the namespace prefix.

owl:equivalentClass

owl:equivalentProperty

owl:sameIndividualAs

owl:disjointWith

owl:differentFrom

…

14

Our Extension of OWL (EOWL)

We extend OWL with the following primitives:

eowl:orderingProperty

eowl:overlap

eowl:properSubClassOf

eowl:properSubPropertyOf

…

15

OWL-based Semantic Conflicts Cases

A. Name conflicts

B. Order sensitive conflicts

C. Scaling conflicts

D. Whole and part conflicts

E. Partial similarity conflicts

F. Swap conflicts

16

A. Name conflicts

Example A. Consider two distributed data warehouses.

One is used to analyze the United States market: country, state, city and district.

The other is used to analyze the China market: country, province, city and county.

Based on the context, “Province” is defined as equivalent to “State” using the OWL primitive “owl:equivalentClass”.

To resolve this conflict, one of the names needs to be changed to the referenced name.

17

A. Name conflicts (Cont.)

<owl:Class rdf:ID="Province">
  <rdfs:label>Province</rdfs:label>
  <owl:equivalentClass rdf:resource="#State"/>
</owl:Class>

Fig. A. Detection of synonym conflicts

“owl:equivalentClass” is the indicator to detect synonym conflicts

Change the name to “State”, because “State” is the name referenced in the ontology definition.

18

A. Name conflicts (Cont.)

Case A. Synonyms. The OWL primitives “owl:equivalentClass”, “owl:equivalentProperty” and “owl:sameIndividualAs” are indicators to detect this case.

Conflict Resolution Rule A. If synonym conflicts are detected, different attribute names with the same semantics need to be translated to the same name (the referenced name) for smooth data interoperability.
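Case A and Rule A can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: it assumes the ontology fragment of Fig. A is wrapped in an rdf:RDF element with the standard RDF, RDFS and OWL namespace URIs, and the helper names synonym_map and rename, as well as the capitalised schema list, are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF = "{%s}" % NS["rdf"]

# Fig. A wrapped in an rdf:RDF root (the wrapper is an assumption).
ONTOLOGY = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Province">
    <rdfs:label>Province</rdfs:label>
    <owl:equivalentClass rdf:resource="#State"/>
  </owl:Class>
</rdf:RDF>"""


def synonym_map(ontology_xml):
    """Detection: map each class name to the referenced name it is equivalent to."""
    root = ET.fromstring(ontology_xml)
    mapping = {}
    for cls in root.findall("owl:Class", NS):
        for eq in cls.findall("owl:equivalentClass", NS):
            mapping[cls.get(RDF + "ID")] = eq.get(RDF + "resource").lstrip("#")
    return mapping


def rename(schema, mapping):
    """Resolution: translate every synonym to its referenced name."""
    return [mapping.get(name, name) for name in schema]


# Hypothetical attribute list for the China data warehouse.
china_schema = ["Country", "Province", "City", "County"]
resolved = rename(china_schema, synonym_map(ONTOLOGY))
print(resolved)
```

The same pattern would apply to “owl:equivalentProperty” by looking up property elements instead of classes.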

19

B. Order sensitive conflicts

Example B. Consider the highest three scores of a course.

The highest three scores of course A are listed as “90, 95, 100” in ascending order.

The highest three scores of course B are listed as “98, 95, 93” in descending order.

“highestThreeScores” is defined as an “eowl:orderingProperty” in the ontology.

The sequences of the highest three scores for courses A and B should both be adjusted to ascending order or both to descending order.

By default, adjust to the sequence of the first one, e.g. the sequence of course A.

20

B. Order sensitive conflicts (Cont.)

Fig. B. Detection of order sensitive conflicts

<eowl:orderingProperty rdf:ID="highestThreeScores">
  <rdfs:label>highest three scores of a course</rdfs:label>
  <rdfs:domain rdf:resource="#Course"/>
  <rdfs:range rdf:resource="xsd#integer"/>
</eowl:orderingProperty>

We can further define ascending or descending order for more precise semantics.

21

B. Order sensitive conflicts (Cont.)

Case B. Order sensitive. The EOWL primitive “eowl:orderingProperty” and the RDF primitive “rdf:Seq” are indicators to detect this case.

Conflict Resolution Rule B. If order sensitive conflicts are detected, the member sequences need to be adjusted according to the same criterion for smooth data interoperability; by default, the sequence of the first one is used.
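Rule B can be illustrated with the scores from Example B. The sketch below assumes that the property has already been identified as an ordering property; the function names sequence_order and align_to_first are hypothetical.

```python
def sequence_order(values):
    """Return 'ascending', 'descending', or None if the sequence is not monotonic."""
    pairs = list(zip(values, values[1:]))
    if all(a <= b for a, b in pairs):
        return "ascending"
    if all(a >= b for a, b in pairs):
        return "descending"
    return None


def align_to_first(first, second):
    """Reorder `second` so that it follows the ordering of `first` (the default in Rule B)."""
    order = sequence_order(first)
    if order is None:
        raise ValueError("the first sequence has no detectable order")
    return sorted(second, reverse=(order == "descending"))


course_a = [90, 95, 100]   # ascending
course_b = [98, 95, 93]    # descending
aligned_b = align_to_first(course_a, course_b)
print(aligned_b)  # [93, 95, 98]
```

If the ontology also declared the intended direction explicitly, that declaration could replace the inference done by sequence_order.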

22

C. Scaling conflicts

Example C. Consider two database schemas:

Product(ID, Price)

Product(ID, Price)

One price may be in US dollars, while the other may be in Singapore dollars. Figure 4 shows some concepts about a currency ontology; “price” is defined

Translate the prices so that they refer to the same currency unit; by default, the unit of the first one is used.

23

C. Scaling conflicts (Cont.)

Fig. C. Detection of scaling conflicts

<owl:DatatypeProperty rdf:ID="price">
  <rdfs:domain rdf:resource="#Product"/>
  <rdfs:range rdf:parseType="Resource">
    <rdf:value/>
    <currency:CurrencyUnit/>
  </rdfs:range>
</owl:DatatypeProperty>

24

C. Scaling conflicts (Cont.)

Case C. Semantic conflicts may exist if the value of a datatype property comprises both a value and a unit (Scaling). The RDF primitive “rdf:parseType="Resource"” and the OWL primitive “owl:DatatypeProperty” are indicators for this case.

Conflict Resolution Rule C. If scaling conflicts are detected, the values should be translated to refer to the same unit for smooth data interoperability; by default, the first unit is used.
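Rule C amounts to a unit conversion once each price carries its currency unit. The following sketch makes several assumptions that are not in the slides: prices are held as (id, amount, unit) tuples, the rate table RATES_TO_USD contains made-up illustrative values, and the function names are hypothetical.

```python
from decimal import Decimal

# Hypothetical conversion rates to a common base (USD); the figures are
# illustrative only and must come from a real rate source in practice.
RATES_TO_USD = {"USD": Decimal("1"), "SGD": Decimal("0.75")}


def convert(amount, from_unit, to_unit, rates=RATES_TO_USD):
    """Convert an amount between currency units via the common base."""
    in_base = Decimal(amount) * rates[from_unit]
    return (in_base / rates[to_unit]).quantize(Decimal("0.01"))


def unify_prices(first, second):
    """Translate all (id, amount, unit) rows to the unit of the first source (Rule C default)."""
    target = first[0][2]

    def to_target(rows):
        return [(pid, convert(amount, unit, target), target) for pid, amount, unit in rows]

    return to_target(first) + to_target(second)


us_products = [("p1", "10", "USD")]
sg_products = [("p2", "100", "SGD")]
unified = unify_prices(us_products, sg_products)
print(unified)
```

Decimal is used instead of float so that monetary amounts are not distorted by binary rounding.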

25

D. Whole and part conflicts

Example D. Consider the schemas:

Person(ID, name)

Person(ID, surname, givenName)

“surname” and “givenName” are both defined as proper sub-properties of “name”, using “eowl:properSubPropertyOf”.

“eowl:properSubClassOf” has clearer semantics than “rdfs:subClassOf”, because “rdfs:subClassOf” is ambiguous between two meanings: “eowl:properSubClassOf” and “owl:equivalentClass”.

Divide the whole attribute “name” into the part attributes “surname” and “givenName”,

or combine the part attributes “surname” and “givenName” in the correct sequence to form the whole attribute “name”.

26

D. Whole and part conflicts (Cont.)

Fig. D1. Detection of whole and part conflicts

<rdf:Property rdf:ID="surname">
  <eowl:properSubPropertyOf rdf:resource="#name"/>
</rdf:Property>

Fig. D2. Detection of whole and part conflicts

<rdf:Property rdf:ID="givenName">
  <eowl:properSubPropertyOf rdf:resource="#name"/>
</rdf:Property>

27

D. Whole and part conflicts (Cont.)

Case D. Semantic conflicts may exist if one concept is completely contained in another concept (Whole and part). The EOWL primitives “eowl:properSubClassOf” and “eowl:properSubPropertyOf” are indicators to detect this case.

Conflict Resolution Rule D. If whole and part conflicts are detected, the whole attributes should be divided into part attributes, or the part attributes should be combined into whole attributes, for smooth data interoperability.
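Both directions of Rule D can be sketched for the “name” attribute of Example D. This is a simplified illustration: the function names and the surname_first flag are hypothetical, and dividing a name on whitespace assumes a single-token surname, which does not hold for all real names.

```python
def combine(surname, given_name, surname_first=False):
    """Combine the part attributes into the whole attribute `name` in a chosen sequence."""
    parts = [surname, given_name] if surname_first else [given_name, surname]
    return " ".join(parts)


def divide(name, surname_first=False):
    """Divide the whole attribute `name` into parts.

    Assumes the surname is a single whitespace-separated token at one end
    of the name; real data usually needs a more careful rule.
    """
    tokens = name.split()
    if len(tokens) < 2:
        raise ValueError("cannot divide a single-token name")
    if surname_first:
        return {"surname": tokens[0], "givenName": " ".join(tokens[1:])}
    return {"surname": tokens[-1], "givenName": " ".join(tokens[:-1])}


whole = combine("Smith", "John")
parts = divide("John Smith")
print(whole, parts)
```

Combining is the safer direction, since it only needs the agreed sequence, whereas dividing depends on how reliably the parts can be recognised inside the whole value.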

28

E. Partial similarity conflicts

Example E. Integration of ResearchAssistant and GraduateStudent.

The relationship between research assistant and graduate student is an overlap, because some research assistants are also graduate students,

but not all research assistants are graduate students,

and not all graduate students are research assistants.

After integration, there should be three schemas:

Research Assistant but not Graduate Student (RNotG)

Graduate Student but not Research Assistant (GNotR)

both Research Assistant and Graduate Student (RAndG)

29

E. Partial similarity conflicts (Cont.)

Fig. E. Detection of partial similarity conflicts

<owl:Class rdf:ID="ResearchAssistant">
  <eowl:overlap rdf:resource="#GraduateStudent"/>
</owl:Class>

30

E. Partial similarity conflicts (Cont.)

Case E. Semantic conflicts may exist if two concepts overlap (Partial similarity). The EOWL primitive “eowl:overlap” is the indicator to detect this case.

Conflict Resolution Rule E. If partial similarity conflicts are detected, the overlapping part should be separated before integration.
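The separation required by Rule E corresponds to three set operations. The sketch below assumes that members of both concepts can be matched by a shared identifier; the identifiers and the function name partition are hypothetical, while the keys RNotG, GNotR and RAndG follow Example E.

```python
def partition(research_assistants, graduate_students):
    """Split two overlapping concepts into the three schemas of Example E."""
    ra, gs = set(research_assistants), set(graduate_students)
    return {
        "RNotG": ra - gs,   # research assistant but not graduate student
        "GNotR": gs - ra,   # graduate student but not research assistant
        "RAndG": ra & gs,   # both research assistant and graduate student
    }


# Hypothetical person identifiers shared by both sources.
schemas = partition({"p1", "p2", "p3"}, {"p2", "p3", "p4"})
print(schemas)
```

The three resulting sets are pairwise disjoint, so no person appears twice after integration.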

31

F. Swap conflicts

Example F. Continued from Example A.

In China, a county is contained in a city (the city has the larger area).

In the US, a city is contained in a county (the county has the larger area).

The domain (“County”) of property “region:containedIn” in the China ontology is just the range of the same property “region:containedIn” in the US ontology.

The range (“City”) of property “region:containedIn” in the China ontology is just the domain of the same property “region:containedIn” in the US ontology.

We can add “China.” or “US.” before “City” and “County” for smooth data interoperability.

32

F. Swap conflicts (Cont.)

Fig. F1. Detection of swap conflicts (the relationship between city and county in the China ontology)

<owl:Class rdf:ID="County">
  <region:containedIn rdf:resource="#City"/>
</owl:Class>

Fig. F2. Detection of swap conflicts (the relationship between city and county in the US ontology)

<owl:Class rdf:ID="City">
  <region:containedIn rdf:resource="#County"/>
</owl:Class>

33

F. Swap conflicts (Cont.)

Case F. Semantic conflicts may exist if the domain of a property in the first ontology is the range of the same property in the second ontology, and the range of the property in the first ontology is the domain of the same property in the second ontology (Swap).

Conflict Resolution Rule F. If swap conflicts are detected, context restrictions (see Example F) should be added to the schema for smooth data interoperability.
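Case F and Rule F can be sketched over (domain, property, range) triples taken from the two ontologies. This is an illustrative sketch under the assumption that such triples have already been extracted; the function names find_swaps and qualify are hypothetical, and the context prefixes follow Example F.

```python
def find_swaps(first, second):
    """Detection: triples of `first` whose domain and range are swapped in `second`."""
    return sorted((d, p, r) for (d, p, r) in first if (r, p, d) in second)


def qualify(triples, context):
    """Resolution: add a context prefix such as 'China.' or 'US.' to each concept name."""
    return {(f"{context}.{d}", p, f"{context}.{r}") for (d, p, r) in triples}


china = {("County", "region:containedIn", "City")}
us = {("City", "region:containedIn", "County")}

swaps = find_swaps(china, us)
resolved = qualify(china, "China") | qualify(us, "US")
print(swaps)
print(sorted(resolved))
```

After qualification, “China.City” and “US.City” are distinct names, so the two containment statements no longer contradict each other.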

34

Conclusion

We extend OWL with several primitives that have clearer semantics.

We summarize several OWL-based cases in which semantic conflicts are easily encountered.

The conflict resolution rules for each case are presented.

In the future, OWL will be frequently used to build ontologies, and this paper provides a computer-aided approach to detect and resolve semantic conflicts for smooth data interoperability.

35

References

1. Robert Neches, Richard Fikes, Timothy W. Finin, Thomas R. Gruber, Ramesh Patil, Ted E. Senator, William R. Swartout: Enabling Technology for Knowledge Sharing. AI Magazine 12(3): pp. 36-56 (1991)

2. Sean Luke and Jeff Heflin: SHOE Specification 1.01. http://www.cs.umd.edu/projects/plus/SHOE/spec.html

3. Ora Lassila and Ralph R. Swick: Resource description framework (RDF). http://www.w3c.org/TR/WD-rdf-syntax

4. The SemanticWeb Homepage. http://www.semanticweb.org

5. Dan Brickley and R.V. Guha. Resource Description Framework (RDF) Schema Specification 1.0, W3C Candidate Recommendation 27 March 2000. http://www.w3.org/TR/rdf-schema/

6. The DARPA Agent Markup Language Homepage. http://daml.semanticweb.org/

7. The Ontology Inference Layer OIL Homepage. http://www.ontoknowledge.org/oil/TR/oil.long.html

8. DAML+OIL Definition. http://www.daml.org/2001/03/daml+oil

9. Frank van Harmelen, Jim Hendler, Ian Horrocks, Deborah L. McGuinness, Peter F. Patel-Schneider and Lynn Andrea Stein. OWL Web Ontology Language Reference. http://www.w3.org/TR/owl-ref/