Semantic Interoperability: Automatically Resolving Vocabularies
4th Semantic Interoperability Conference, February 10, 2006
Chuck Mosher
8500 Leesburg Pike
Vienna, VA
cmosher@metamatrix.com
2
Interoperable Information Backbone
• Enterprise-wide data abstraction layer for applications
• Integrated views of data from multiple sources
– Relational databases, applications, files
• Reusable Data Services for data consistency
• Metadata-driven data management and integration
• Complements other data integration tools (ETL, EAI, quality, etc.)
[Diagram: Applications, the MetaMatrix Enterprise Data Service Layer, and Data Sources]
3
Data Services
• A type of Web Service
• Does all of the work to transform any data in any format into a W3C-compliant service
– Implements all of the logic to effect the transformation
– Provides access to data sources regardless of source API or technology
• Does not implement application logic
• Decouples the data from the application while making the data discoverable and accessible
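As a rough illustration of the data-service idea, the sketch below hides two differently shaped sources behind one function that always returns the same XML document, so a consumer never sees the source API. The source formats, the element names (`sales`, `sale`, `value`), and the function `sales_service` are hypothetical; this is a Python stand-in, not how MetaMatrix generates its services.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical source 1: a flat file with its own column names.
CSV_SOURCE = "sale_no,amount\nS-1,19.99\nS-2,5.00\n"

# Hypothetical source 2: records as returned by some application API.
API_SOURCE = [{"SaleID": "A-9", "Value": "42.50"}]


def read_csv_source(text):
    """Adapt the flat-file source to a uniform (id, value) record."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"id": row["sale_no"], "value": row["amount"]}


def read_api_source(records):
    """Adapt the API source to the same uniform record."""
    for rec in records:
        yield {"id": rec["SaleID"], "value": rec["Value"]}


def sales_service():
    """Data service: one XML shape for consumers, whatever the source API."""
    root = ET.Element("sales")
    for source in (read_csv_source(CSV_SOURCE), read_api_source(API_SOURCE)):
        for rec in source:
            sale = ET.SubElement(root, "sale", id=rec["id"])
            ET.SubElement(sale, "value").text = rec["value"]
    return ET.tostring(root, encoding="unicode")


print(sales_service())
```

Adding a third source means writing one more adapter; the XML that consumers see does not change.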
4
Model-Based Approach Maximizes Reuse: Data Abstraction Without Coding
[Diagram: Information Consumers (custom apps, web services and business processes, packaged apps, reporting and analytics, EAI and data warehouses) use Exposed Information Services over ODBC, JDBC, and SOAP, with WSDL contracts. The services are built from Reusable Integrated Business Objects, which draw on Enterprise Information Sources (EIS): XML, databases, warehouses, spreadsheets, services, geo-spatial data, rich media, …]
5
[Diagram: modeling layers, from Data through Model and Meta-model to the Meta Object Facility (MOF)]
6
MetaMatrix MetaBase Modeler
• Model disparate information sources
– Relational DBs
– Content management systems
– Files
– Services
– Applications
• Uses and retains domain-specific modeling terminology
– Relational models have “Tables”, “Foreign Keys”, “Columns”, etc.
– UML models have “Packages”, “Classes”, “Attributes”, etc.
7
MetaMatrix MetaBase Modeler
• Define reusable data services/business objects
• Transformations defined with:
– Selects
– Joins
– Criteria
– Unions
– Functions
– User-defined functions
• Perform schema and semantic matching and data type conversion
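The transformation vocabulary on this slide maps closely onto SQL. The sketch below uses Python's built-in `sqlite3` to define one reusable virtual object, a `customer` view, over two hypothetical sources using a select, a join, criteria, a function, and a union. All table and column names are invented; in the MetaBase Modeler these transformations are modeled rather than hand-written, so treat this only as an analogy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical physical sources with overlapping content.
    CREATE TABLE src_a_customer (cust_id INTEGER, name TEXT, region_cd TEXT);
    CREATE TABLE src_a_region (region_cd TEXT, region_name TEXT);
    CREATE TABLE src_b_client (client_no INTEGER, client_name TEXT, region TEXT);

    INSERT INTO src_a_customer VALUES (1, 'acme', 'E'), (2, 'globex', 'W');
    INSERT INTO src_a_region VALUES ('E', 'East'), ('W', 'West');
    INSERT INTO src_b_client VALUES (10, 'initech', 'East'), (11, 'umbrella', 'North');

    -- Reusable virtual business object: select + join + function + criteria + union.
    CREATE VIEW customer AS
        SELECT c.cust_id AS id, UPPER(c.name) AS name, r.region_name AS region
        FROM src_a_customer AS c
        JOIN src_a_region AS r ON r.region_cd = c.region_cd
        UNION ALL
        SELECT client_no, UPPER(client_name), region
        FROM src_b_client
        WHERE region IN ('East', 'West');
""")

rows = conn.execute("SELECT id, name, region FROM customer ORDER BY id").fetchall()
print(rows)
```

Consumers query `customer` without knowing that two sources, a join, and a filter sit behind it; that separation is what makes the object reusable.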
8
Semantic Mediation: The Problem
[Diagram: multiple internal/external information sources (authoritative, redundant, overlapping) feed, through transformations (T), a Logical Data Model: an enterprise-wide or COI-driven data model serving as the rationalization and semantic mediation layer, with harmonization and a data catalog/dictionary. Aggregate data services built on it are relational or XML and application-specific, are accessed via ODBC, JDBC, or SOAP APIs (including as a virtual XML document), and serve web services, portal applications, and business intelligence applications. Example source element names shown: bldg_id, SITENUM, Facility_ID, Location_ID, bldg_type, Depot_Number, Location_Type.]
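Once someone has decided which source elements mean the same thing, mediation itself is mechanical. A minimal sketch, assuming hypothetical source names (`src_a` to `src_d`) and a hypothetical logical model with `facility_id` and `facility_type` attributes; which element belongs to which source, and that these pairs really are equivalent, are assumptions made for illustration:

```python
# Hypothetical mapping: (source, source element) -> logical-model attribute.
MEDIATION_MAP = {
    ("src_a", "bldg_id"): "facility_id",
    ("src_a", "bldg_type"): "facility_type",
    ("src_b", "SITENUM"): "facility_id",
    ("src_c", "Location_ID"): "facility_id",
    ("src_c", "Location_Type"): "facility_type",
    ("src_d", "Depot_Number"): "facility_id",
}


def mediate(source, record):
    """Rename a source record's elements to the logical model.

    Returns the mediated record plus the elements that have no mapping yet,
    so they can be routed to a person for review.
    """
    mediated, unmapped = {}, []
    for element, value in record.items():
        attribute = MEDIATION_MAP.get((source, element))
        if attribute is None:
            unmapped.append(element)
        else:
            mediated[attribute] = value
    return mediated, unmapped


print(mediate("src_b", {"SITENUM": "0042", "POC": "Smith"}))
```

The hard part is filling in `MEDIATION_MAP` across thousands of elements, which is the problem the rest of this talk addresses.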
9
Building Enterprise Semantic Model(s)
[Diagram: multiple internal/external information sources (authoritative, redundant, overlapping) feed, through transformations (T), several enterprise-wide or COI-driven data models (rationalization, harmonization, data catalogs), exposed via ODBC/JDBC, JDBC, and SOAP to web services, portal applications, and business intelligence applications. Communities shown: J-1 Manpower / Personnel, J-2 Intelligence, J-3 Operations, J-4 Logistics (GCSS), J-5 Plans & Policy, J-6 C4CS, J-7 Operational Plans, J-8 Force Structure.]
10
Biggest Challenge in Creating Data Services?
• Semantics!!!
• Structural differences are straightforward
• Differing definitions among data sources
• Differing vocabularies among COIs
• Established, emerging, and evolving data standards
– C2IEDM, JC3IEDM, GJXDM, NIEM, GFM, and many more
• Not addressed by ETL, EAI, SOA
11
A Previously Intractable Problem
• TWPDES has 1000+ core entities
• NIEM has 100,000+!
• Even a limited program with a dozen data sources could yield tens of thousands of potential mappings
• Humans cannot address this without help
• Indeed, it has stopped many data integration/reconciliation programs in their tracks.
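A back-of-the-envelope count shows how quickly candidate mappings pile up. The element counts below (20 elements per source, 1,000 entities in a shared model) are assumptions chosen for illustration, not figures from any program:

```python
from math import comb


def pairwise_candidates(num_sources, elements_per_source):
    """Element pairs to review if every pair of sources is compared exhaustively."""
    return comb(num_sources, 2) * elements_per_source ** 2


def hub_candidates(num_sources, elements_per_source, model_entities):
    """Element/entity pairs to review if each source is mapped to one shared model."""
    return num_sources * elements_per_source * model_entities


print(pairwise_candidates(12, 20))   # 66 source pairs * 400 element pairs -> 26400
print(hub_candidates(12, 20, 1000))  # 12 * 20 * 1000 -> 240000
```

Even with these modest assumptions, both counts land in the tens to hundreds of thousands, far more than a team can inspect by hand.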
Automated Semantic Matching
13
DISCLAIMER
• Semantic matching can't really be done automatically yet!
• Requires intelligence to understand the context and semantics.
• So use computers to do most of the work, then have the user confirm or check the result.
14
The Matching Problem
• Given two symbols, calculate a measure of the relationship between them:
amount, quantity
Doesn’t seem so hard…
15
The Matching Problem
• Given two symbols, calculate a measure of the relationship between them:
ftuqky, aqfkyeyr
This is what a computer “sees.”
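The two garbled words are a consistent letter substitution of “amount” and “quantity” (a→f, m→t, o→u, and so on). Because a purely lexical measure only compares characters for equality, it scores the enciphered pair exactly as it scores the English pair. The sketch uses Python's `difflib.SequenceMatcher` as a stand-in lexical measure:

```python
from difflib import SequenceMatcher

# Letter substitution implied by the slide: amount -> ftuqky, quantity -> aqfkyeyr.
CIPHER = str.maketrans("amountqiy", "ftuqkyaer")


def lexical_score(a, b):
    """Character-level similarity in [0, 1]; knows nothing about meaning."""
    return SequenceMatcher(None, a, b).ratio()


plain = ("amount", "quantity")
coded = tuple(word.translate(CIPHER) for word in plain)

print(coded)  # ('ftuqky', 'aqfkyeyr')
print(lexical_score(*plain) == lexical_score(*coded))  # True
```

To the program, “amount” and “quantity” are just as opaque as “ftuqky” and “aqfkyeyr”; any knowledge that they mean the same thing has to come from outside the strings.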
16
The Matching Problem
• Even after extracting likely symbols, matching is a difficult problem.
• Symbols alone are not enough to generate good matches:
– Should “ID” match “SocialSecurityNumber” or “NY”?
• The solution relies on context:
– “NJ”, “MA”, “CA”, “ID” (ID as a state)
– “Ego”, “SuperEgo”, “ID” (id as in psychoanalysis)
• MatchIt provides that context
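One simple way to use context is to compare a symbol's neighbors against a short signature for each candidate sense and pick the best overlap (a simplified Lesk-style heuristic). The sense inventory below is hypothetical and tiny; it is not how MatchIt represents context:

```python
# Hypothetical sense inventory: for each ambiguous symbol, words that tend
# to appear alongside each of its senses.
SENSES = {
    "ID": {
        "Idaho (U.S. state)": {"NJ", "MA", "CA", "NY", "TX", "STATE"},
        "identifier": {"SSN", "SOCIALSECURITYNUMBER", "KEY", "NUMBER", "CODE"},
        "id (psychoanalysis)": {"EGO", "SUPEREGO", "FREUD"},
    }
}


def disambiguate(symbol, context):
    """Return the sense whose signature shares the most words with the context."""
    key = symbol.upper()
    neighbors = {c.upper() for c in context if c.upper() != key}
    senses = SENSES[key]
    # max() keeps the first sense listed when overlaps tie.
    return max(senses, key=lambda sense: len(senses[sense] & neighbors))


print(disambiguate("ID", ["NJ", "MA", "CA", "ID"]))        # Idaho (U.S. state)
print(disambiguate("ID", ["Ego", "SuperEgo", "ID"]))       # id (psychoanalysis)
print(disambiguate("ID", ["SocialSecurityNumber", "ID"]))  # identifier
```

With no informative neighbors every overlap is zero and `max` simply returns the first sense listed, which is why a real system needs much richer context than this.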
17
MatchIT 1.0
• An integrated component of the MetaMatrix Semantic Data Services product
• Based on an ontology-driven semantic knowledge base
– Word relationships, dictionaries, lexicons, thesauri
• Plug-in architecture
• Standards-compliant:
– OWL
– RDF
– Inference engines
– OSGi
– Eclipse
– JDBC
18
(Semi-)Automated Semantic Mediation
[Diagram: data source services from FBI, CBP, NYC, NY, and NJ. Because the ontology records that “Sex” is semantically related to “Gender”, Gender ID is matched to Person Sex Code with 90% confidence.]
• An extensible semantic knowledge base provides dictionary- and thesaurus-like information on “words”, their “meanings”, and their relationships to other words.
• A sophisticated set of matching algorithms provides string-similarity matches and semantic matches, with confidence ratings and explanations.
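To make “a match with a confidence rating and an explanation” concrete, here is a minimal sketch that splits two element names into words, looks up word relationships in a hypothetical knowledge base, and reports both a score and the reasons behind it. The relationship weights (0.8 and 0.5) and the averaging rule are invented for illustration and do not reproduce MatchIT's scoring:

```python
import re

# Hypothetical knowledge base: related word pairs and how strongly they are related.
RELATED = {
    frozenset({"sex", "gender"}): 0.8,
    frozenset({"id", "code"}): 0.5,
}


def words(name):
    """Lowercase alphabetic words in an element name."""
    return re.findall(r"[a-z]+", name.lower())


def explain_match(source, target):
    """Average, over the source's words, the best relation to any target word."""
    scores, reasons = [], []
    for s in words(source):
        best, why = 0.0, f"'{s}': no related word found"
        for t in words(target):
            weight = 1.0 if s == t else RELATED.get(frozenset({s, t}), 0.0)
            if weight > best:
                best, why = weight, f"'{s}' ~ '{t}' ({weight:.0%})"
        scores.append(best)
        reasons.append(why)
    return sum(scores) / len(scores), reasons


score, reasons = explain_match("Gender ID", "Person Sex Code")
print(f"{score:.0%}", reasons)
```

The score is asymmetric: extra target words such as “person” are ignored, which is one of several reasonable design choices.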
19
Matching Techniques
• MatchIT uses two types of matching techniques:
– String matching: attempts to determine the similarity of two strings from the lexical distance between them.
– Semantic matching: attempts to determine the similarity of two strings from the ontological distance between them within a semantic ontology.
• Generates match sets
• Techniques can be run individually or in combination
• Pluggable architecture allows for algorithmic extensibility
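The pluggable, combinable design might look like this in miniature: each matcher is a function returning a score in [0, 1], a registry holds the available matchers, and a match set keeps each pair's best score along with the matcher that produced it. The synonym sets and function names are hypothetical, and `difflib.SequenceMatcher` stands in for whatever lexical measure MatchIT actually uses:

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

Matcher = Callable[[str, str], float]  # returns a score in [0, 1]

# Hypothetical synonym sets standing in for a semantic knowledge base.
SYNSETS = [{"amount", "quantity", "qty"}, {"sex", "gender"}]


def string_matcher(a: str, b: str) -> float:
    """Lexical similarity of the raw symbols."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def semantic_matcher(a: str, b: str) -> float:
    """1.0 if both symbols appear in the same synonym set, else 0.0."""
    a, b = a.lower(), b.lower()
    return 1.0 if any(a in s and b in s for s in SYNSETS) else 0.0


# Registry: plugging in a new algorithm is just adding an entry.
MATCHERS: Dict[str, Matcher] = {"string": string_matcher, "semantic": semantic_matcher}


def match_set(sources: List[str], targets: List[str],
              use: List[str]) -> List[Tuple[str, str, float, str]]:
    """Score every source/target pair with the selected matchers, best first."""
    results = []
    for a in sources:
        for b in targets:
            name, score = max(((m, MATCHERS[m](a, b)) for m in use),
                              key=lambda pair: pair[1])
            results.append((a, b, score, name))
    return sorted(results, key=lambda r: r[2], reverse=True)


for row in match_set(["amount", "sex"], ["quantity", "gender"], ["string", "semantic"]):
    print(row)
```

Passing `use=["string"]` or `use=["semantic"]` runs a single technique; passing both runs them in combination.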
20
String Matching
• What is the lexical distance between two symbols?
– “PUZZLE”, “PUZZ”
– “ID”, “IDENTIFIER”
– “STRONG”, “SONG”
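Levenshtein edit distance is one common way to measure lexical distance; the slides do not say whether MatchIT uses it, so treat this as illustrative. The sketch computes the distance for the slide's three pairs and normalizes it by the longer string's length:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance to [0, 1]; 1.0 means identical (case-insensitive)."""
    a, b = a.upper(), b.upper()
    longest = max(len(a), len(b)) or 1
    return 1 - levenshtein(a, b) / longest


for a, b in [("PUZZLE", "PUZZ"), ("ID", "IDENTIFIER"), ("STRONG", "SONG")]:
    print(a, b, levenshtein(a, b), round(similarity(a, b), 2))
```

PUZZLE/PUZZ and STRONG/SONG both come out at distance 2 (similarity about 0.67), while ID/IDENTIFIER, a genuine abbreviation, is at distance 8 (similarity 0.2). Lexical distance alone rates an unrelated pair as close as a related one and misses an obvious abbreviation, which is why it is paired with semantic matching.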
21
Semantic Matching
• How semantically similar are two concepts?
[Figure: is-a hierarchy. Car and truck are each a motor vehicle, which is a self-propelled vehicle, which is a wheeled vehicle, which is a vehicle. Airplane is a heavier-than-air craft, which is an aircraft, which is a craft, which is a vehicle.]
Car and truck are very similar.
Car and airplane are less similar.
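A common way to turn such a hierarchy into a number is to count the is-a edges on the shortest path between two concepts and map the count to 1 / (1 + path length), as WordNet-style path similarity does. The sketch encodes the slide's hierarchy (reading the arrows, as in WordNet, with craft under vehicle); it is a generic technique, not necessarily MatchIT's measure of ontological distance:

```python
# Is-a edges from the slide, child -> parent (single inheritance assumed).
IS_A = {
    "car": "motor vehicle",
    "truck": "motor vehicle",
    "motor vehicle": "self-propelled vehicle",
    "self-propelled vehicle": "wheeled vehicle",
    "wheeled vehicle": "vehicle",
    "airplane": "heavier-than-air craft",
    "heavier-than-air craft": "aircraft",
    "aircraft": "craft",
    "craft": "vehicle",
}


def ancestors(concept):
    """The concept followed by each ancestor up to the root, nearest first."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def path_length(a, b):
    """Is-a edges on the path from a up to the lowest common ancestor and down to b."""
    depth_from_a = {c: i for i, c in enumerate(ancestors(a))}
    for j, c in enumerate(ancestors(b)):
        if c in depth_from_a:  # first shared ancestor is the lowest one
            return depth_from_a[c] + j
    return None  # no common ancestor


def path_similarity(a, b):
    d = path_length(a, b)
    return None if d is None else 1 / (1 + d)


print(path_length("car", "truck"), path_length("car", "airplane"))
```

Car and truck meet at motor vehicle, two edges apart (similarity 1/3); car and airplane only meet at vehicle, eight edges apart (similarity 1/9), which matches the slide's intuition.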
22
Semantic Matching Objectives
• Find and rank the potential matches, but let the user review and decide for sure.
• That is, eliminate the 99+% of candidates that don't match, and let the user review the remaining <1%.
• Often a user can visually scan a short list of the top 1% and very quickly agree or disagree with the results.
• Favor false positives over false negatives.
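These objectives translate into a simple policy: rank everything, keep a small top slice for review, and set the cutoff generously so that doubtful candidates stay in front of the reviewer instead of being silently discarded. The parameter values below (keep 1%, a floor of five candidates, a minimum score of 0.2) are hypothetical:

```python
import math


def shortlist(candidates, keep_fraction=0.01, min_keep=5, min_score=0.2):
    """Return the top-ranked (pair, score) candidates for human review.

    A low min_score and a min_keep floor deliberately favor false
    positives: rejecting an extra suggestion is cheap for a reviewer,
    while a match dropped here may never be found.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    k = max(min_keep, math.ceil(len(ranked) * keep_fraction))
    return [c for c in ranked[:k] if c[1] >= min_score]


# Usage: 1,000 hypothetical candidate pairs scored 0.000 .. 0.999.
candidates = [(f"pair{i}", i / 1000) for i in range(1000)]
review = shortlist(candidates)
print(len(review), review[0])
```

Here the reviewer sees 10 of 1,000 candidates, best first.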
23
Semantic Matching in MetaMatrix
[Diagram: enterprise information sources (relational, XML, file system, JDBC/RDBMS, custom, any source) and their representations (conceptual/logical/physical data models, domain models [UML/ER], ontologies [OWL/RDF]) are brought into the MetaBase Modeler via the MetaMatrix Importer Framework (metadata access) and the MetaMatrix Connector Framework (data/content access). Also shown: the MetaBase Repository (versioned models and files, search index, web reporting); the MatchIt ontology and semantic knowledge base (lexicons, fact repository, onomasticons) providing ontological semantics access; schema-level and instance-level matching; Find Matches (analyze, visualize, collaborate, transform); import/export; and the end state, “Data Harmonization Complete.”]
Example
25
Overall process
• Import two nontrivial vocabularies
– ERwin model of a large data warehouse
– TWPDES XML schema
• Extract symbols
– Schema-specific tokenization algorithms
• Assign semantics to each symbol
– Symbols are keys into dictionaries
• Perform semantic matching between them
• Analyze results
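The slides do not show MetaMatrix's schema-specific tokenizers. As a generic illustration of symbol extraction, the sketch below splits element names on underscores, camelCase boundaries, and digit runs, then expands abbreviations through a small hypothetical dictionary, the point where “symbols are keys into dictionaries” would come in:

```python
import re

# Hypothetical abbreviation dictionary; real entries would come from the
# semantic knowledge base, keyed by the extracted symbol.
ABBREVIATIONS = {"bldg": "building", "id": "identifier", "num": "number", "qty": "quantity"}

# Runs of capitals not followed by a lowercase letter (acronyms such as "ID"),
# capitalized or lowercase words, or digit runs.
TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def extract_symbols(name):
    """Split a schema element name into lowercase symbols, expanding abbreviations.

    Handles snake_case, camelCase, and acronyms; an all-caps run such as
    "SITENUM" cannot be split this way and comes back as one symbol.
    """
    return [ABBREVIATIONS.get(p.lower(), p.lower()) for p in TOKEN.findall(name)]


for name in ["bldg_id", "Facility_ID", "PersonSexCode", "XMLDomain", "SITENUM"]:
    print(name, extract_symbols(name))
```

Names like SITENUM need a dictionary-driven split, one plausible reason for making tokenizers schema-specific.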
26
ERwin Data Warehouse Model
27
TWPDES XML Schema
Mapping classes for each XML fragment in the hierarchy
28
Generated Symbol Dictionary (TWPDES)
29
Generated Symbol Dictionary (ERwin model)
30
Editing the Dictionary
Modify Definition
31
Editing the Semantics
Control Senses
32
Target Model
Match Results
33
Examine Details
34
Match Details
35
Matches Used to Build Mappings
36
The Integrating Function of the Common Semantic Model via Domain-Level Mapping
From Pat Cassidy & COSMO
[Diagram: Obligation and Duty, each linked to GenericObligation by a SameAs relation]
37
MatchIt Semantic Matching Tool
• A way to use ontologies in a world where nearly 100% of what already exists is not in an ontology.
• Map connections between ontologies that are being built and artifacts currently in use:
– RDBMS schemas
– XML and XSD files
– Spreadsheet data
– More coming, including ontologies!
• Map an imported model to a Vocabulary, and a Vocabulary to an Ontological structure
Thank you