
Semantic Interoperability: Automatically Resolving Vocabularies

4th Semantic Interoperability Conference February 10, 2006

Chuck Mosher
8500 Leesburg Pike
Vienna, VA
cmosher@metamatrix.com

Interoperable Information Backbone

• Enterprise-wide data abstraction layer for applications
• Integrated views of data from multiple sources
  – Relational databases, applications, files
• Reusable Data Services for data consistency
• Metadata-driven data management and integration
• Complements other data integration tools (ETL, EAI, quality, etc.)

[Diagram: the MetaMatrix Enterprise Data Service Layer sits between Applications and Data Sources.]

Data Services

• A type of Web Service
• Does all of the work to transform any data in any format to a W3C-compliant service
  – Implements all of the logic to effect the transformation
  – Provides access to data sources, regardless of source API or technology
• Does not implement application logic
• Decouples the data from the application while making the data discoverable and accessible

Model-Based Approach Maximizes Reuse
Data Abstraction Without Coding

[Diagram: Enterprise Information Sources (EIS), including XML, databases, warehouses, spreadsheets, services, geo-spatial data, and rich media, are modeled as Reusable Integrated Business Objects. These are exposed as Information Services with WSDL contracts over ODBC, JDBC, and SOAP to Information Consumers: custom apps, Web Services and business processes, packaged apps, reporting and analytics, EAI, and data warehouses.]

[Diagram: modeling layers: Data, Model, Meta-model, Meta Object Facility (MOF).]

MetaMatrix MetaBase Modeler

• Model disparate information sources
  – Relational DBs
  – Content Management Systems
  – Files
  – Services
  – Applications
• Uses and retains domain-specific modeling terminology
  – Relational models have “Tables”, “Foreign Keys”, “Columns”, etc.
  – UML models have “Packages”, “Classes”, “Attributes”, etc.

MetaMatrix MetaBase Modeler

• Define reusable data services/business objects
• Transformations defined with:
  – Selects
  – Joins
  – Criteria
  – Unions
  – Functions
  – User defined
• Perform schema and semantic matching, data type conversion
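
The transformation building blocks listed above can be illustrated with an ordinary SQL view. This is only a sketch using Python's built-in sqlite3 module, not the MetaBase Modeler itself; the tables, columns, and the `location` view are hypothetical names chosen for the example.

```python
import sqlite3

# Hypothetical source tables standing in for two physical data sources.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE buildings (bldg_id INTEGER, bldg_type TEXT, city TEXT);
CREATE TABLE depots (depot_number INTEGER, city TEXT);
CREATE TABLE cities (city TEXT, state TEXT);
INSERT INTO buildings VALUES (12, 'office', 'Vienna'), (13, 'warehouse', 'Albany');
INSERT INTO depots VALUES (7, 'Vienna');
INSERT INTO cities VALUES ('Vienna', 'VA'), ('Albany', 'NY');

-- A virtual "location" object defined only by a transformation.
CREATE VIEW location AS
  SELECT 'B' || CAST(b.bldg_id AS TEXT) AS location_id,  -- data type conversion
         UPPER(b.bldg_type) AS location_type,            -- function
         c.state AS state
  FROM buildings b JOIN cities c ON b.city = c.city      -- join
  WHERE b.bldg_type <> 'warehouse'                       -- criteria
  UNION ALL                                              -- union
  SELECT 'D' || CAST(d.depot_number AS TEXT), 'DEPOT', c.state
  FROM depots d JOIN cities c ON d.city = c.city;
""")

# Consumers query the view without knowing the underlying sources.
rows = conn.execute("SELECT * FROM location ORDER BY location_id").fetchall()
print(rows)
```

The view holds no data of its own, which is the sense in which such a data service abstracts the sources without application code.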

Semantic Mediation: The Problem

[Diagram: multiple internal/external information sources (authoritative, redundant, overlapping) are mapped through transformations (T) into a logical data model. That enterprise-wide or COI-driven data model serves as the rationalization and semantic mediation layer, covering harmonization and the data catalog/dictionary. Above it, aggregate data services (relational or XML, such as a virtual XML document; application-specific) are accessed via ODBC, JDBC, or SOAP APIs by Web Services, portal applications, and business intelligence applications.]

Field names to reconcile across sources: bldg_id, SITENUM, Facility_ID, Location_ID, bldg_type, Depot_Number, Location_Type

Building Enterprise Semantic Model(s)

[Diagram: the same layered architecture, labeled with J-1 Manpower / Personnel, J-2 Intelligence, J-3 Operations, J-4 Logistics (GCSS), J-5 Plans & Policy, J-6 C4CS, J-7 Operational Plans, and J-8 Force Structure. Multiple internal/external information sources (authoritative, redundant, overlapping) feed transformations (T) into enterprise-wide or COI-driven data models (rationalization, harmonization, data catalogs), which Web Services, portal applications, and business intelligence applications reach over ODBC/JDBC, JDBC, and SOAP.]

Biggest Challenge in Creating Data Services?

• Semantics!!!
• Structural differences are straightforward
• Differing definitions among data sources
• Differing vocabularies among COIs
• Established, emerging, and evolving data standards
  – C2IEDM, JC3IEDM, GJXDM, NIEM, GFM, many more
• Not addressed by ETL, EAI, SOA

A Previously Intractable Problem

• TWPDES has 1000+ core entities

• NIEM has 100,000+!

• Even a limited program with a dozen data sources could yield tens of thousands of potential mappings

• Humans cannot address this without help

• Indeed, it has stopped many data integration/reconciliation programs in their tracks.
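
A rough back-of-the-envelope count shows how quickly candidate mappings grow. The sketch below assumes every element of one source is a potential match for every element of every other source; the figure of 30 elements per source is an illustrative assumption, not a number from the talk.

```python
from math import comb

def candidate_mappings(num_sources: int, elements_per_source: int) -> int:
    """Count element pairs across every pair of distinct sources."""
    source_pairs = comb(num_sources, 2)            # unordered pairs of sources
    return source_pairs * elements_per_source ** 2  # every element against every element

# A dozen sources with an assumed 30 elements each.
pairs = candidate_mappings(12, 30)
print(pairs)
```

With these assumed sizes the count is 59,400, and it grows with the square of the vocabulary size, so larger schemas push it far higher.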

Automated Semantic Matching


DISCLAIMER

• Semantic matching can't really be done automatically yet!

• Requires intelligence to understand the context and semantics.

• So use computers to do most of the work but then have the user confirm or check the result.

The Matching Problem

• Given two symbols, calculate a measure of the relationship between them:

amount    quantity

Doesn’t seem so hard…

The Matching Problem

• Given two symbols, calculate a measure of the relationship between them:

ftuqky    aqfkyeyr

This is what a computer “sees.”

The Matching Problem

• Even after extracting likely symbols, matching is a difficult problem.
• Symbols alone are not enough to generate good matches:
  – “ID” -> “SocialSecurityNumber” or “NY”
• The solution relies on context:
  – “NJ”, “MA”, “CA”, “ID”
  – “Ego”, “SuperEgo”, “ID”
• MatchIt provides that context
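
The role of context can be sketched with a toy sense inventory: each candidate sense of a symbol lists words that tend to appear beside it, and the sense with the largest overlap with the neighboring symbols wins. The `SENSES` table and `pick_sense` function are hypothetical illustrations, not the product's knowledge base or API.

```python
# Hypothetical sense inventory: sense name -> words that typically co-occur with it.
SENSES = {
    "ID": {
        "Idaho (US state)": {"NJ", "MA", "CA", "NY"},
        "id (psychoanalysis)": {"EGO", "SUPEREGO"},
        "identifier": {"NAME", "NUMBER", "CODE"},
    }
}

def pick_sense(symbol, context):
    """Choose the sense whose typical neighbors overlap most with the context."""
    neighbors = {word.upper() for word in context}
    senses = SENSES[symbol.upper()]
    return max(senses, key=lambda sense: len(senses[sense] & neighbors))

print(pick_sense("ID", ["NJ", "MA", "CA", "ID"]))
print(pick_sense("ID", ["Ego", "SuperEgo", "ID"]))
```

The same symbol resolves to a state abbreviation in the first list and to the psychoanalytic term in the second, purely because of its neighbors.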


MatchIT 1.0

• Integrated component of the MetaMatrix Semantic Data Services product
• Based on ontology-driven semantic knowledge base
  – Word relationships, dictionaries, lexicons, thesauri
• Plug-in architecture
• Standards-compliant:
  – OWL
  – RDF
  – Inference engines
  – OSGi
  – Eclipse
  – JDBC

(Semi-)Automated Semantic Mediation

[Diagram: data source services (FBI, CBP, NYC, NY, NJ). “Gender ID” is matched to “Person Sex Code” with a confidence of 90% because the ontology records that “Sex” is semantically related to “Gender”.]

• An extensible semantic knowledge base provides dictionary- and thesaurus-like information on “words”, their “meanings”, and their relationships to other words.
• A sophisticated set of matching algorithms provides string similarity matches and semantic matches with confidence ratings and explanations.
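
A match that carries both a confidence rating and an explanation might look like the sketch below. The thesaurus groups and the scoring formula are assumptions made for illustration; the score is not meant to reproduce the 90% figure on the slide.

```python
import re

# Toy thesaurus (assumption): words in the same set are treated as related.
RELATED = [{"sex", "gender"}, {"id", "identifier", "code"}]

def tokens(name):
    """Split a field name into lowercase words."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", name)]

def related(a, b):
    return a == b or any(a in group and b in group for group in RELATED)

def match(source, target):
    """Return (confidence, explanations) for two field names."""
    src, tgt = tokens(source), tokens(target)
    if not src or not tgt:
        return 0.0, []
    reasons = [f"{s!r} ~ {t!r}" for s in src for t in tgt if related(s, t)]
    # Confidence: fraction of all words, on both sides, that found a partner.
    hit_src = sum(any(related(s, t) for t in tgt) for s in src)
    hit_tgt = sum(any(related(s, t) for s in src) for t in tgt)
    return (hit_src + hit_tgt) / (len(src) + len(tgt)), reasons

confidence, reasons = match("Gender ID", "Person Sex Code")
print(confidence, reasons)
```

Returning the reasons alongside the score lets a reviewer see why two names were paired before accepting the match.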


Matching Techniques

• MatchIT uses two types of matching techniques:
  – String Matching
    • Attempts to determine string similarity based on the lexical distance between them.
  – Semantic Matching
    • Attempts to determine string similarity based on the ontological distance between them within a semantic ontology.
• Generate Match Sets
• Can be run individually or in combinations
• Pluggable architecture allows for algorithmic extensibility
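
One way to picture matchers that run individually or in combination is a small registry of scoring functions. The sketch below is an assumed design, not the product's plug-in interface; it uses `difflib.SequenceMatcher` from the Python standard library for the string matcher and a one-entry toy synonym set for the semantic matcher.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict

Matcher = Callable[[str, str], float]
MATCHERS: Dict[str, Matcher] = {}  # hypothetical plug-in registry

def register(name: str):
    """Decorator that adds a matcher to the registry under a name."""
    def wrap(fn: Matcher) -> Matcher:
        MATCHERS[name] = fn
        return fn
    return wrap

@register("string")
def string_match(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

SYNONYMS = {frozenset({"amount", "quantity"})}  # toy thesaurus (assumption)

@register("semantic")
def semantic_match(a: str, b: str) -> float:
    a, b = a.lower(), b.lower()
    return 1.0 if a == b or frozenset({a, b}) in SYNONYMS else 0.0

def combined(a: str, b: str, use=("string", "semantic")) -> float:
    """Run the selected matchers and keep the best score."""
    return max(MATCHERS[name](a, b) for name in use)

print(combined("amount", "quantity"))
print(combined("amount", "quantity", use=("string",)))
```

A new algorithm is added by registering another function, without touching the code that combines the scores.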


String Matching

• What is the lexical distance between two symbols?
  – “PUZZLE”, “PUZZ”
  – “ID”, “IDENTIFIER”
  – “STRONG”, “SONG”
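
Levenshtein edit distance is one common measure of lexical distance. The talk does not say which measure is used, so the sketch below is an illustrative choice rather than the product's algorithm.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Scale the distance into 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

for left, right in [("PUZZLE", "PUZZ"), ("ID", "IDENTIFIER"), ("STRONG", "SONG")]:
    print(left, right, levenshtein(left, right), round(similarity(left, right), 2))
```

On these pairs the distances are 2, 8, and 2, so “ID”/“IDENTIFIER” scores lower than “STRONG”/“SONG” even though only the first pair is related. That is one reason string similarity alone does not settle a match.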


Semantic Matching

• How semantically similar are two concepts?

[Diagram: “is a” hierarchy. Car and truck are each a motor vehicle, which is a self-propelled vehicle, which is a wheeled vehicle, which is a vehicle. Airplane is a heavier-than-air craft, which is an aircraft, which is a craft, which is a vehicle.]

• Car and truck are very similar
• Car and airplane are less similar
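
Ontological distance can be sketched as the number of “is a” steps between two concepts through their nearest common ancestor. The `IS_A` table below is an assumed reading of the slide's hierarchy, and path length is only one of several possible distance measures.

```python
# Child -> parent links, as read from the slide's "is a" diagram (assumption).
IS_A = {
    "car": "motor vehicle",
    "truck": "motor vehicle",
    "motor vehicle": "self-propelled vehicle",
    "self-propelled vehicle": "wheeled vehicle",
    "wheeled vehicle": "vehicle",
    "airplane": "heavier-than-air craft",
    "heavier-than-air craft": "aircraft",
    "aircraft": "craft",
    "craft": "vehicle",
}

def ancestors(concept):
    """The concept followed by each of its ancestors, nearest first."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain

def distance(a, b):
    """Count 'is a' steps from a up to the nearest shared ancestor and down to b."""
    up_a, up_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    raise ValueError("no common ancestor")

print(distance("car", "truck"))     # meet at "motor vehicle"
print(distance("car", "airplane"))  # meet only at "vehicle"
```

Under this reading car and truck are 2 steps apart while car and airplane are 8 steps apart, which mirrors the slide's "very similar" versus "less similar".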


Semantic Matching Objectives

• Find and rank the potential matches, but let the user review and decide for sure.

• I.e., eliminate 99+% of the things that don't match, and let the user review the <1%.

• Many times, a user can visually scan a small list of the top 1% and very quickly agree or disagree with the results.

• Favor false positives over false negatives.
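
The review step can be sketched as ranking every scored candidate and handing only the top slice to a person. The `shortlist` function, the 1% default, and the synthetic scores below are illustrative assumptions.

```python
def shortlist(scored_pairs, keep_percent=1):
    """Return the highest-scoring candidates for human review."""
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    # Ceiling division rounds the cut-off up, so borderline candidates are
    # kept: a false positive costs one glance, a false negative is lost.
    keep = max(1, -(-len(ranked) * keep_percent // 100))
    return ranked[:keep]

# Synthetic candidates: (pair label, score between 0 and 1).
candidates = [(f"pair-{i}", i / 1000) for i in range(1000)]
review_list = shortlist(candidates)
print(len(review_list), review_list[0])
```

Out of 1,000 candidates the reviewer sees 10, ordered best first, and confirms or rejects each one.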


Semantic Matching in MetaMatrix

[Diagram: MetaBase Modeler architecture.
Sources: Enterprise Information Sources (RDBMS, file system, XML, custom, any source) reached through the MetaMatrix Connector Framework and JDBC for data/content access; conceptual/logical/physical data model representations (relational, XML, domain [UML/ER], ontologies [OWL/RDF]) reached through the MetaMatrix Importer Framework for metadata access, with import and export.
Matching: instance-level match and schema-level match, supported by the MatchIt Ontology and the Semantic Knowledge Base (lexicons, fact repository, onomasticons) through ontological semantics access.
Workflow: find matches, analyze, visualize, collaborate, transform, ending with data harmonization complete.
Storage: MetaBase Repository with versioned models and files, search index, and web reporting.]

Example


Overall process

• Import two nontrivial vocabularies
  – ERwin model of large data warehouse
  – TWPDES XML schema
• Extract symbols
  – Schema-specific tokenization algorithms
• Assign semantics to each
  – Symbols are keys into dictionaries
• Perform semantic matching between them
• Analyze results
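
The "extract symbols" and "assign semantics" steps can be sketched as splitting a schema name into words and using each word as a dictionary key. The tokenization rule (underscores plus camel-case boundaries) and the `DICTIONARY` entries are assumptions for illustration; the actual tokenizers are schema-specific.

```python
import re

def tokenize(symbol):
    """Split a schema name on separators and camel-case boundaries."""
    words = []
    for part in re.split(r"[_\W]+", symbol):
        # Runs of capitals (acronyms), capitalized or lowercase words, digits.
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [w.lower() for w in words]

# Hypothetical dictionary: extracted symbol -> meaning.
DICTIONARY = {"bldg": "building", "id": "identifier", "facility": "facility"}

def assign_semantics(symbol):
    """Look up each extracted token; unknown tokens map to None for later editing."""
    return {tok: DICTIONARY.get(tok) for tok in tokenize(symbol)}

print(tokenize("Facility_ID"))
print(tokenize("SocialSecurityNumber"))
print(assign_semantics("bldg_id"))
```

Tokens that come back as `None` are the ones a user would define by hand in the generated symbol dictionary.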

ERwin Data Warehouse Model

TWPDES XML Schema

Mapping Classes for each XML fragment in hierarchy

Generated Symbol Dictionary (TWPDES)

Generated Symbol Dictionary (ERwin model)

Editing the Dictionary

Modify Definition

Editing the Semantics

Control Senses

Target Model

Match Results

Examine Details

Match Details

Matches Used to Build Mappings

The Integrating Function of the Common Semantic Model – via Domain-level Mapping

From Pat Cassidy & COSMO

[Diagram: the domain terms “Obligation” and “Duty” are each linked by SameAs to “GenericObligation” in the common semantic model.]
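
Domain-level mapping through a common model can be sketched as resolving each domain term to its common concept and comparing the results. The `SAME_AS` table encodes only the two links shown on the slide, and the `equivalent` helper is hypothetical.

```python
# Domain term -> concept in the common semantic model (from the slide's SameAs links).
SAME_AS = {
    "Obligation": "GenericObligation",
    "Duty": "GenericObligation",
}

def common_concept(term):
    """Resolve a term to its common-model concept, or to itself if unmapped."""
    return SAME_AS.get(term, term)

def equivalent(a, b):
    """Two domain terms are mediated as equal when they share a common concept."""
    return common_concept(a) == common_concept(b)

print(equivalent("Obligation", "Duty"))
print(equivalent("Obligation", "Depot"))
```

Each domain maps once to the common model instead of once to every other domain, which keeps the number of mappings linear in the number of vocabularies.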


MatchIt Semantic Matching Tool

• A way to use ontologies in a world where nearly 100% of what already exists is not in an ontology.

• Map connections between ontologies that are being built and artifacts currently in use:
  – RDBMS schemas
  – XML and XSD files
  – Spreadsheet data
  – More coming, including ontologies!

• Map an imported model to a Vocabulary, and a Vocabulary to an Ontological structure

Thank you
