{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Components of the same challenge? Invited Talk, International Workshop on Ontology Matching collocated with the 5th International Semantic Web Conference ISWC-2006 , November 5, 2006, Athens GA Professor Amit Sheth Special Thanks: Meena Nagarajan Acknowledgment: SemDis project, funded by NSF


Page 1

{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Components of the same challenge?

Invited Talk, International Workshop on Ontology Matching, collocated with the 5th International Semantic Web Conference (ISWC-2006), November 5, 2006, Athens, GA

Professor Amit Sheth

Special Thanks: Meena Nagarajan
Acknowledgment: SemDis project, funded by NSF

Page 2

Information System needs and Ontology Matching goals

• Generation I (federated DB / multidatabases), 1980s: Data (schema, "semantic data modeling")
  – Mermaid, DDTS, Multibase, MRDSM, ADDS, IISS, Omnibase, ...
• Generation II (mediators), 1990s: Metadata (domain model)
  – InfoHarness, VisualHarness, InfoSleuth, KMed, DL-I projects, Infoscopes, HERMES, SIMS, Garlic, TSIMMIS, Harvest, RUFUS, ...
• Generation III (information brokering), 1997...: Semantics (ontology, context, relationships, KB)
  – VideoAnywhere, InfoQuilt, OBSERVER, Semantic Web, some DL-II projects, Semagix SCORE, Applied Semantics, SemDis, ISIS

Page 3

Information systems - From mediators to information brokering

• Mediators between heterogeneous information sources
  – InfoHarness, VisualHarness, InfoSleuth, SIMS, Garlic etc.

[Figure: VisualHarness architecture. Labels: End User Web Browsers; Internet; IH Clients; IH Server; IH administrative tools; Metadata Database (Metabase) (Oracle); Information Resources; Raw Data (Text, Image, Video, Audio); Repository 1 ... Repository m]

Circa 1992-1996.

Page 4

Information systems - From mediators to information brokers

• Information brokers
  – InfoQuilt, OBSERVER etc.

[Figure: information brokering. Labels: INFORMATION CONSUMERS (Corporations, Universities, People, Government, Programs); User Query; INFORMATION BROKERING (Domain Specific Ontologies); INFORMATION PROVIDERS (Information System, Data Repository, Newswires, Universities, Corporations, Research Labs)]

Circa 1996-2000

Page 5

Need for querying across multiple ontologies

[Figure: OBSERVER architecture. Labels: User Query; Query Processor; Mappings/Ontology Server; Ontologies; Repositories; IRM (Interontologies Relationships)]

Circa 1994, 1996-2002

Page 6

Ontology Matching – goals

• Goals of ontology matching (and mapping, or integration)
  – Shallow analysis to identify dependencies for integration
  – Deeper analysis to create mappings for query based transformations / integration
  – Integrate schemas to create a global schema
  – Integrate instance bases

Sheth, Review of a real world experience in database schema integration (Bellcore, ca. 1993)

Page 7

Ontology Matching – changing notions

• Given the distributed nature of modeling domains and metadata, the need for matching advanced to Information Integration
• Now
  – Query processing not limited to multiple databases or ontologies, but multiple domains and sources of information
  – Exploiting structured, semi-structured and unstructured data sources, multi-model Web sources

Page 8

The process of Ontology Matching

• Different for purposes of merging / aligning ontologies
  – The relationships that need to be discovered are limited to equivalence / inclusion / disjointness / overlap mappings
• Different for purposes of information integration to analytics to discovery
  – Need for discovering more complex mappings
    • Named relationships / associations
    • Graph based / numerical mappings
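
The set-style mappings listed above (equivalence / inclusion / disjointness / overlap) can be illustrated with a minimal sketch that compares the instance sets of two classes. The function name and the idea of deciding the relationship purely from shared instance identifiers are assumptions made for illustration; the talk does not prescribe a procedure.

```python
def classify_extent_relationship(a, b):
    """Classify how two class extents (sets of instance ids) relate.

    Hypothetical helper: decides the mapping type from set comparison only.
    """
    a, b = set(a), set(b)
    if a == b:
        return "equivalence"
    if a < b or b < a:          # one extent is a proper subset of the other
        return "inclusion"
    if a & b:                   # some, but not all, instances are shared
        return "overlap"
    return "disjointness"


# Illustrative instance ids, not taken from any real ontology.
print(classify_extent_relationship({"e1", "e2"}, {"e1", "e2", "e3"}))
```

Named relationships such as "causes" carry no such set semantics, which is why they need evidence beyond the class extents.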

Page 9

Top down and bottom up view to ontology matching

• Top Down: schema + instance integration to provide information integration

[Figure: Labels: Ontologies; Heterogeneous data; Semantic metadata; Horizontal Semantic Integration; Vertical Semantic Integration; Integration Ontology; Complex Mapping; Relationship. Sample text: "Today, the Food and Drug Administration (FDA) is announcing that it has asked Pfizer, Inc. to voluntarily withdraw Bextra from the market. Pfizer has agreed to suspend sales and marketing of Bextra in the , pending further discussions with the agency."]

Page 10

Top down and bottom up view to ontology matching

• Bottom up: exploit external data sources to drive schema matching

[Figure: Labels: Ontologies; Heterogeneous data; Semantic metadata; Horizontal Semantic Integration; Vertical Semantic Integration; Integration Ontology; Complex Mapping; Relationship. Sample text: "Today, the Food and Drug Administration (FDA) is announcing that it has asked Pfizer, Inc. to voluntarily withdraw Bextra from the market. Pfizer has agreed to suspend sales and marketing of Bextra in the , pending further discussions with the agency."]

Page 11

A step back

DB vs. Ontology - Fundamental differences

Page 12

Schema integration goals – DB vs. Ontology

• DB schema integration goal
  – “Defining an integrated view of the data for all applications using the data.”
• Ontology schema integration goal
  – “Defining an agreement between multiple ontology schemas modeled for the same domain.”

Page 13

Goals are different because of differences in:

• The modeling paradigms
  – A database schema is a model for the data that one or more applications intend to use.
  – An ontology is a model of knowledge for a bounded region of interest (also known as a domain)
• Data vs. Knowledge: a DB instance base is not the same as an ontology instance base
  – A database models data to be used by one or more applications
  – An ontology models knowledge about a domain, independent of the application

Page 14

Modeling Database vs. Ontology schemas - Fundamental differences

Axis of comparison | Database schemas | Ontology schemas
Modeling perspective | Intended to model data being used by one or more applications | Intended to model a domain
Structure vs. Semantics | Emphasis while modeling is on structure of the tables | Emphasis while modeling is on the semantics of the domain: relationships, and also facts / knowledge / ground truth

Page 15

Axis of comparison | Database schemas | Ontology schemas
Agreement | Limited to a syntactic agreement between applications using the data | Symbolizes agreement of the modeling of a domain possibly used by applications in varying contexts
Instance metadata modeling / expressiveness | Limited expressivity in capturing instance level metadata due to static schemas | More expressive modeling paradigm
Context of modeling | Well defined by applications using the data | Modeling of a domain irrespective of applications

Choice of modeling affects the possible space of heterogeneities and therefore the process of matching.

In both cases, however, the schema is only an abstraction of the real world; the real power/semantics lies at the instance level.

Page 16

The space of heterogeneities in DB schema integration

• Conflicts/Heterogeneities in DB schema integration
  – Model / representation: relational vs. network vs. hierarchical models
  – Structural / schematic:
    • Domain Incompatibilities
    • Entity Definition Incompatibilities
    • Data Value Incompatibilities
    • Abstraction level Incompatibilities
• Largely syntactic and structural; relatively few semantic conflicts

(Sheth/Kashyap 1992, Kim/Seo 1993, Kashyap/Sheth 1996)

Page 17

The space of heterogeneities in ontology schema integration

• Conflicts/Heterogeneities in ontology schema integration
  – Significant conflicts in perception of a domain: semantic conflicts
  – Other heterogeneities are similar to those in the DB world
    • Model / representation: OWL/RDF; topic maps etc.
    • Structural: modeling as an entity vs. an attribute/property; generalization vs. abstraction etc.
• Largely semantic conflicts; comparable syntactic conflicts

Page 18

Key Observations

• There are significant philosophical differences in how a DB schema and an Ontology schema are modeled

• In spite of these distinctions, many schema matching techniques overlap significantly.

• Have we advanced the state of the art in ontology schema matching?

Page 19

Schema Integration – DB vs. Ontology

Have we advanced the state of the art?

Page 20

Schema Integration – techniques used (schema level)

Schema matching techniques | Information exploited: DB | Information exploited: Ontology
Syntactic. Linguistic: matching names, descriptions, namespaces etc. Constraint-based: constraint matches on data types, value ranges, uniqueness, cardinalities etc. | Matching table and column level names and constraints | Matching class, property / relationship, and attribute level names and constraints
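
As a rough illustration of the linguistic, schema-level technique described above, the sketch below scores name similarity between the element names of two schemas using Python's standard `difflib.SequenceMatcher`. The helper names, the normalization rules and the 0.8 threshold are assumptions chosen for the example, not something the talk specifies.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a schema element name and drop common separators."""
    return name.lower().replace("_", "").replace("-", "").replace(" ", "")


def name_similarity(a, b):
    """String similarity in [0, 1] between two normalized element names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_names(source, target, threshold=0.8):
    """Pair each source name with its most similar target name.

    Only pairs scoring at or above the (assumed) threshold are kept.
    `target` must be non-empty.
    """
    matches = []
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        score = name_similarity(s, best)
        if score >= threshold:
            matches.append((s, best, round(score, 2)))
    return matches


# Hypothetical column names (DB) and property names (ontology).
print(match_names(["author_name", "price"], ["AuthorName", "title"]))
```

The same scoring applies whether the names come from tables and columns or from classes and properties, which is one way to see why the DB and ontology techniques overlap.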

Page 21

Schema Integration – techniques used (schema level)

Schema matching techniques | Information exploited: DB | Information exploited: Ontology
Structural. Constraint-based: tree / graph structure matching | Matching structures of relational tables | Matching class hierarchies and structures

Page 22

Schema Integration – techniques used (instance level, DB and Ontology)

• Linguistic
  – IR techniques, word frequencies, key terms, combination of key terms etc.
• Constraint based
  – Numerical value patterns, ranges useful for recognizing phone numbers etc.
• Hybrid approaches use a combination of all techniques
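
The instance-level, constraint-based idea above (recognizing phone numbers from value patterns) can be sketched as follows. The regular expression covers only a few North American style formats, and the 0.9 cut-off is an arbitrary assumption; both are illustrative rather than part of the talk.

```python
import re

# Assumed pattern: 10 digits with optional parentheses and separators.
PHONE = re.compile(r"^\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}$")


def fraction_matching(values, pattern=PHONE):
    """Fraction of non-empty values that match the pattern."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return 0.0
    return sum(1 for v in cleaned if pattern.match(v)) / len(cleaned)


def looks_like_phone_column(values, min_fraction=0.9):
    """Guess whether a column or property holds phone numbers."""
    return fraction_matching(values) >= min_fraction


# Two hypothetical columns from different sources.
print(looks_like_phone_column(["706-555-0100", "(706) 555 0101", "7065550102"]))
print(looks_like_phone_column(["Athens", "GA", "30602"]))
```

Two columns that both pass such a test become candidates for a match even when their names share nothing.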

Page 23

Discovered semantic relationships

• State of the art – in DBs and Ontologies
  – Relationships with set semantics: overlap / disjointness / exclusion / equivalence / subsumption
  – Their logical encodings are what they mean
• Of more interest is discovering arbitrary named relationships
  – Relationships such as works_for or causes have “real-world” semantics. Their encoding in first order logic lacks semantic grounding.
• Matching and mapping are closely tied. The ability to capture complex mappings (e.g., semantic proximity) puts significantly different demands on matching

Page 24

Key Observation

• DB and Ontology schema matching techniques overlap significantly
  – Not much advancement since DB schema integration efforts
• Ontologies formalize the semantics of a domain, but matching is still primarily syntactic / structural.
  – The semantics of ‘named relationships’ is largely unexploited
• The real semantics lies in the relationships connecting entities
  – Modeled as first class objects in Ontologies
  – In DB, they are not explicit and have to be inferred

Page 25

(Complex) named relationships and Ontology Matching

Page 26

(Complex) named relationships - example

[Figure: Entity labels: VOLCANO; LOCATION; ASH RAIN; PYROCLASTIC FLOW; ENVIRON.; LOCATION; PEOPLE; WEATHER; PLANT; BUILDING. Relationship labels: DESTROYS; COOLS TEMP; DESTROYS; KILLS]

Page 27

Discovering such (complex) named relationships

• Matching techniques have exhausted Schema + Instance properties
• Ontology modeling decouples schema + instance base
  – Tremendous opportunity to exploit knowledge present outside the ontology knowledge base (external structured, semi-structured and unstructured data sources)

Page 28

Knowledge discovery and validation

[Figure: Labels: PubMed etc.; DBs; Relevant docs; Query and update; Prediction of pathways, symptoms of diseases, other complex relationships]

Page 29

A Vision for Ontology Matching : Discovering simple to complex matches – from schema, instances and corpus

[Figure: Labels: SIMPLE TO COMPLEX MATCHES; Ontologies; Heterogeneous data; Semantic metadata. Sample text: "Today, the Food and Drug Administration (FDA) is announcing that it has asked Pfizer, Inc. to voluntarily withdraw Bextra from the market. Pfizer has agreed to suspend sales and marketing of Bextra in the , pending further discussions with the agency."]

Possible identifiable matches: equivalence / inclusion / overlap / disjointness

Possible to identify more complex relationships from the corpus.

Page 30

Corpus based schema matching

Page 31

The Intuition

[Figure: Labels: UMLS; MeSH; PubMed; Biologically active substance; Lipid; Disease or Syndrome; affects; causes; complicates; instance_of; Fish Oils ??????? Raynaud’s Disease; 9284 documents; 4733 documents; 5 documents]

Page 32

The Method – Identify entities and Relationships in Parse Tree

[Figure: parse tree. Labels: Modifiers; Modified entities; Composite Entities]

Page 33

Key Observation

• What is interesting is not the entity “estrogen” or “endometrium”

• The real knowledge lies in the complex and modified entities, e.g. “an excessive endogenous stimulation by estrogen”

Current KR frameworks do not model this. Capturing this might affect the way we think of matching and mapping.

Page 34

Converting candidate relationships to ontology matches

• Linguistic and statistical challenges:
  – Variations of entities, relationships and associations
• Translating instance level findings to the schema level
  – GOING FROM several discovered relationships like “Deficiency in magnesium causes Migraine” TO “substance X causes condition Y”
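
Going from instance level findings to a schema level statement, as described above, can be sketched by replacing each entity with its class and counting how often the resulting class-level triple recurs. The function name, the `min_support` cut-off and the toy data are hypothetical; real extraction would also have to deal with the linguistic variation noted in the first bullet.

```python
from collections import Counter


def lift_to_schema(instance_triples, instance_class, min_support=2):
    """Generalize (subject, relation, object) instance triples to class level.

    instance_class maps an instance id to its class; triples mentioning an
    unknown instance are skipped. Returns class-level triples whose count
    reaches the (assumed) min_support.
    """
    counts = Counter()
    for subj, rel, obj in instance_triples:
        s_cls = instance_class.get(subj)
        o_cls = instance_class.get(obj)
        if s_cls is not None and o_cls is not None:
            counts[(s_cls, rel, o_cls)] += 1
    return {triple: n for triple, n in counts.items() if n >= min_support}


# Illustrative data only; x1, y1, ... stand for extracted entities.
types = {"x1": "Substance", "x2": "Substance", "y1": "Condition", "y2": "Condition"}
found = [("x1", "causes", "y1"), ("x2", "causes", "y2"), ("x1", "affects", "y2")]
print(lift_to_schema(found, types))
```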

Page 35

Discovery vs. Validation of relationships – two sides of the coin

• Discovering complex relationships from text is a hard problem
  – Natural Language challenges (not all sentences are well formed)
• Validating complex relationships / hypotheses is relatively simpler

Page 36

Corpus based Hypothesis validation

Does magnesium alleviate effects of migraine in patients?

One possible hypothesized connection between magnesium and migraine….

[Figure: Labels: Magnesium; Migraine; Stress; Calcium Channel Blockers; Patient; isa; affectedBy; inhibit; Complex Query; PubMed; Supporting document sets retrieved]
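
One very simplified reading of corpus based validation is to treat the hypothesis as a chain of entity pairs and retrieve, for each pair, the documents that mention both entities. The sketch below does exactly that over an in-memory toy corpus. Co-occurrence is a crude stand-in for real relationship extraction, and the function names, the corpus and the particular hops are all assumptions; the figure does not fully specify the hypothesized path.

```python
def supporting_documents(docs, hypothesis):
    """Find documents that mention both entities of each hypothesized hop.

    docs: dict mapping a document id to the set of entity labels found in it.
    hypothesis: list of (entity_a, entity_b) pairs forming the chain.
    """
    support = {}
    for a, b in hypothesis:
        support[(a, b)] = sorted(
            doc_id for doc_id, entities in docs.items()
            if a in entities and b in entities
        )
    return support


def is_supported(support, min_docs=1):
    """A hypothesis counts as supported if every hop has enough documents."""
    return all(len(ids) >= min_docs for ids in support.values())


# Toy corpus with hypothetical annotations, not real PubMed data.
corpus = {
    "d1": {"Magnesium", "Calcium Channel Blockers"},
    "d2": {"Calcium Channel Blockers", "Migraine"},
    "d3": {"Stress", "Migraine"},
}
hops = [("Magnesium", "Calcium Channel Blockers"),
        ("Calcium Channel Blockers", "Migraine")]
result = supporting_documents(corpus, hops)
print(result, is_supported(result))
```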

Page 37

From matching to mappings – several challenges

• Mappings are not always simple mathematical / string transformations
• Examples of complex mappings
  – Associations / paths between classes
  – Graph based / form fitting functions

[Figure: association graph. Nodes: E1:Reviewer; E2:Paper; E3:Person; E4:Paper; E5:Person; E6:Person; E7:Submission. Edge labels: author_of (x5); knows (x2)]

[Figure note: Number of earthquakes with magnitude > 7 almost constant. So if at all, then nuclear tests only cause earthquakes with magnitude < 7]
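
An association / path mapping of the kind listed above can be sketched as a shortest-path search over labeled edges, traversing each edge in both directions. The edge list below is hypothetical: it reuses node and relationship names from the figure, but which nodes the figure actually connects cannot be recovered from the slide text.

```python
from collections import deque


def find_association(edges, start, goal):
    """Breadth-first search for a shortest undirected path of labeled edges.

    edges: iterable of (subject, label, object). Inverse traversal is marked
    by appending "^-1" to the label. Returns a list of (node, label, node)
    steps, [] if start == goal, or None if no path exists.
    """
    graph = {}
    for s, label, o in edges:
        graph.setdefault(s, []).append((label, o))
        graph.setdefault(o, []).append((label + "^-1", s))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for label, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, label, nxt)]))
    return None


# Assumed edges for illustration only.
edges = [("E5", "author_of", "E2"), ("E3", "author_of", "E2"), ("E3", "knows", "E1")]
print(find_association(edges, "E5", "E1"))
```

The returned path, rather than a single equivalence link, is the mapping between the two end classes.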

Page 38

The take home message

Page 39

A world beyond simple matches and mappings

• The distinction between schema and instances is slowly disappearing

• Integrating new and external data sources, mining and analyzing them is gaining importance.

• Tremendous opportunities and challenges in using more information than what is modeled in a schema and captured in an instance base.

Need to go beyond well-mannered schemas and knowledge representations, and relatively simpler mappings

Page 40

For more information

LSDIS Lab: http://lsdis.cs.uga.edu
Kno.e.sis Center: http://www.knoesis.org