oMAP: An Implemented Framework for Automatically Aligning OWL Ontologies
SWAP, December 2005
Raphaël Troncy, Umberto Straccia
ISTI-CNR
Outline
• Motivations
• oMAP
  • A formal framework
  • The different classifiers used
• Evaluation
• Conclusion
Motivations
• Heterogeneity of information systems
• Ontologies as a solution to data heterogeneity on the Web
• Ontologies are themselves heterogeneous:
  • knowledge representation language
  • degree of formalization
• Semantic Web:
  • more and more OWL/RDF ontologies on the Web
  • need for comparing, reusing and merging ontologies
    • that partially cover the same domain
    • that are different versions of the same ontology
Motivations (cont.)
• Distributed Information Retrieval:
  • Resource selection: the agent has to select a subset of relevant resources
  • Query reformulation: for every selected resource, the agent has to reformulate its information need accordingly
  • Data fusion & rank aggregation: finally, the results from the selected resources have to be merged
Aligning Ontologies
• A matching operator:
  • Input: a set of discrete entities (tables, XML elements, classes, properties…)
  • Output:
    • the relationship holding between the entities (subsumption, equivalence, disjointness…)
    • a confidence measure $v \in [0,1]$
• Automatic vs. manual techniques
• Numerous works from various communities:
  • schema matching, machine learning, data integration
oMAP: A Formal Framework
• Inspirations:
  • Formal work in data exchange [Fagin et al., 2003]
  • GLUE: combining several specialized components to find the best set of mappings [Doan et al., 2003]
• Notation:
  • A mapping is a tuple $M = (T, S, \Sigma)$
  • $S$ and $T$ are the source and target ontologies
  • $S_i$ (resp. $T_j$) is an OWL entity (class, datatype property, object property) of the source (resp. target) ontology
  • $\Sigma$ is a set of mapping rules of the form $\alpha_{ij}\; T_j \leftarrow S_i$, where $\alpha_{ij}$ is the confidence of the rule
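As a minimal sketch, the notation can be mirrored by two Python data types. The class and attribute names (`MappingRule`, `Mapping`, `add_rule`) are hypothetical and not taken from the oMAP implementation; the check that confidences lie in [0, 1] is likewise an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MappingRule:
    """A weighted mapping rule alpha_ij : T_j <- S_i (hypothetical encoding)."""
    source: str   # S_i, an entity of the source ontology S
    target: str   # T_j, an entity of the target ontology T
    alpha: float  # confidence alpha_ij of the rule


@dataclass
class Mapping:
    """A mapping M = (T, S, Sigma), with entities identified by their names."""
    target_entities: set
    source_entities: set
    rules: set = field(default_factory=set)  # Sigma

    def add_rule(self, source: str, target: str, alpha: float) -> None:
        # A rule must relate an entity of S to an entity of T.
        if source not in self.source_entities or target not in self.target_entities:
            raise ValueError("a rule maps an entity of S to an entity of T")
        # Assumption: confidences lie in [0, 1].
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        self.rules.add(MappingRule(source, target, alpha))


m = Mapping(target_entities={"Conference", "Address"},
            source_entities={"Event", "Location"})
m.add_rule("Event", "Conference", 0.8)
```

Making `MappingRule` frozen lets rules be stored in a set, which matches $\Sigma$ being a set of rules.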
oMAP: Overall Strategy
• A three-step process:
  • Form possible $\Sigma$ sets and estimate their quality based on the quality measures of their mapping rules
  • For each mapping rule $T_j \leftarrow S_i$, estimate its confidence $\alpha_{ij}$, which also depends on the $\Sigma$ it belongs to
  • Use heuristics to iteratively build the final set of mappings
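The slides do not detail the heuristics, so the following is only a hypothetical greedy sketch of the last step: at each iteration it re-estimates every remaining rule against the current $\Sigma$ (mirroring the fact that $\alpha_{ij}$ depends on $\Sigma$), keeps the best one if it clears a threshold, and assumes one-to-one mappings. `build_mappings`, `score` and `threshold` are illustrative names.

```python
from typing import Callable, Iterable, List, Tuple

Rule = Tuple[str, str, float]  # (S_i, T_j, alpha_ij)


def build_mappings(sources: Iterable[str], targets: Iterable[str],
                   score: Callable[[str, str, List[Rule]], float],
                   threshold: float = 0.5) -> List[Rule]:
    """Greedily grow Sigma one rule at a time (hypothetical heuristic).

    score(s, t, sigma) estimates alpha for the rule t <- s given the rules
    already accepted in sigma, so confidences are recomputed every iteration.
    """
    sigma: List[Rule] = []
    free_s, free_t = set(sources), set(targets)
    while free_s and free_t:
        # Re-score every remaining candidate against the current Sigma.
        alpha, s, t = max((score(s, t, sigma), s, t)
                          for s in free_s for t in free_t)
        if alpha < threshold:
            break  # no remaining rule is confident enough
        sigma.append((s, t, alpha))
        # One-to-one assumption: each entity is mapped at most once.
        free_s.discard(s)
        free_t.discard(t)
    return sigma


def exact_match(s: str, t: str, sigma: List[Rule]) -> float:
    # Toy scoring function that ignores sigma.
    return 1.0 if s.lower() == t.lower() else 0.1


result = build_mappings(["Person", "Paper"], ["person", "paper", "review"], exact_match)
```

Ties on $\alpha$ are broken by comparing entity names, which keeps the result deterministic regardless of set iteration order.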
oMAP: Combining Classifiers
• Weight of a mapping rule:
  • $\alpha_{ij} = w(S_i, T_j, \Sigma)$
• Using different classifiers:
  • $w(S_i, T_j, CL_k)$ is classifier $CL_k$'s approximation of the rule $T_j \leftarrow S_i$
• Combining the approximations:
  • use of a priority list $CL_1 \prec CL_2 \prec \dots \prec CL_n$
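The slides do not spell out how the priority list is applied. One plausible reading, sketched below under that assumption, is to consult $CL_1$ first and fall back to the next classifier only when the current one gives a null approximation. The function names and the two toy classifiers are hypothetical.

```python
from typing import Callable, Sequence

# A classifier maps a pair (S_i, T_j) to its approximation w(S_i, T_j, CL_k).
Classifier = Callable[[str, str], float]


def combine_by_priority(classifiers: Sequence[Classifier], s: str, t: str) -> float:
    """Return the weight given by the highest-priority classifier that fires.

    classifiers is ordered CL_1, CL_2, ..., CL_n, highest priority first.
    """
    for classify in classifiers:
        weight = classify(s, t)
        if weight > 0.0:
            return weight  # lower-priority classifiers are not consulted
    return 0.0  # no classifier found any evidence for T_j <- S_i


def same_name(s: str, t: str) -> float:
    return 1.0 if s == t else 0.0


def same_prefix(s: str, t: str) -> float:
    # Deliberately crude stand-in for a weaker classifier.
    return 0.5 if s[:3].lower() == t[:3].lower() else 0.0


priority = [same_name, same_prefix]
```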
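Terminological Classifiers
• Same entity names (or URIs):
$$w(S_i, T_j, CL_N) = \begin{cases} 1 & \text{if } S_i, T_j \text{ have the same name} \\ 0 & \text{otherwise} \end{cases}$$
• Same entity name stems:
$$w(S_i, T_j, CL_S) = \begin{cases} 1 & \text{if } S_i, T_j \text{ have the same stem} \\ 0 & \text{otherwise} \end{cases}$$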
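A minimal sketch of the two classifiers. Names are compared on the local part of the URI, and the stemmer is a hypothetical toy suffix stripper used only for illustration (a real system would rely on a proper stemming algorithm); `local_name`, `words`, `stem`, `w_name` and `w_stem` are illustrative names.

```python
import re

# Hypothetical toy suffix list; a real system would use a proper stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def local_name(uri: str) -> str:
    """Keep the fragment or last path segment, e.g. '.../onto#Author' -> 'Author'."""
    return re.split(r"[#/]", uri)[-1]


def words(name: str) -> list:
    """Split 'hasAuthors' or 'has_authors' into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)]


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def w_name(s: str, t: str) -> float:
    """w(S_i, T_j, CL_N): 1 if both entities have the same (local) name."""
    return 1.0 if local_name(s) == local_name(t) else 0.0


def w_stem(s: str, t: str) -> float:
    """w(S_i, T_j, CL_S): 1 if both names reduce to the same sequence of stems."""
    stems_s = [stem(w) for w in words(local_name(s))]
    stems_t = [stem(w) for w in words(local_name(t))]
    return 1.0 if stems_s == stems_t else 0.0
```

For example, `hasAuthors` and `has_author` have different names but the same stems, so only $CL_S$ fires for them.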
Terminological Classifiers (cont.)
• String distance between names:
$$w(S_i, T_j, CL_{LD}) = \frac{dist_{Levenshtein}(S_i, T_j)}{\max(length(S_i), length(T_j))}$$
• WordNet distance between names:
$$w(S_i, T_j, CL_{WN}) = \begin{cases} 1 & \text{if } S_i, T_j \text{ are synonyms} \\ \max\left(sim, \dfrac{2 \cdot lcs}{length(S_i) + length(T_j)}\right) & \text{otherwise} \end{cases}$$
• $lcs$ is the length of the longest common substring between $S_i$ and $T_j$
• $sim = \dfrac{|synonym(S_i) \cap synonym(T_j)|}{|synonym(S_i) \cup synonym(T_j)|}$
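A minimal sketch of both weights. `SYNONYMS` is a hypothetical hand-made table standing in for WordNet, and treating every word as its own synonym is an assumption. `w_ld` follows the slide's formula literally, so it is 0 for identical names and grows with dissimilarity.

```python
# Hypothetical stand-in for WordNet: a tiny hand-made synonym table.
SYNONYMS = {
    "paper": {"article", "publication"},
    "article": {"paper"},
}


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common (contiguous) substring."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def w_ld(s: str, t: str) -> float:
    """w(S_i, T_j, CL_LD) as written on the slide (normalized edit distance)."""
    return levenshtein(s, t) / max(len(s), len(t), 1)


def synonyms(word: str) -> set:
    # Assumption: every word counts as its own synonym.
    return {word} | SYNONYMS.get(word, set())


def w_wn(s: str, t: str) -> float:
    """w(S_i, T_j, CL_WN) using the toy synonym table above."""
    s, t = s.lower(), t.lower()
    syn_s, syn_t = synonyms(s), synonyms(t)
    if t in syn_s or s in syn_t:
        return 1.0
    sim = len(syn_s & syn_t) / len(syn_s | syn_t)  # Jaccard overlap of synonym sets
    return max(sim, 2 * lcs_length(s, t) / (len(s) + len(t)))
```

For `Author` and `CoAuthor`, there is no synonym overlap, but the common substring `author` gives $2 \cdot 6 / (6 + 8) = 12/14$.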
Machine Learning-Based Classifiers
• Collecting individuals:
  • the label for named individuals
  • the data value for datatype properties
  • the type for anonymous individuals and for the range of object properties
• Recursion on the OWL definition:
  • depth parameter
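A minimal sketch of how individuals could be turned into word tuples, assuming a simplified in-memory representation. `Individual`, `collect` and the exact meaning given to `depth` are hypothetical. With depth 0 it yields the tuples u1 and u2 of the Conference/Address example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Individual:
    rdf_type: str                                     # type of the individual
    label: Optional[str] = None                       # label, for named individuals
    data: List[str] = field(default_factory=list)     # datatype property values
    objects: List[str] = field(default_factory=list)  # ids of object property values


def collect(ind_id: str, kb: Dict[str, Individual], depth: int = 0) -> Tuple[str, ...]:
    """Build the word tuple u describing one individual (hypothetical rules).

    The label is used when present, otherwise the type; datatype values are
    appended; object property values are expanded recursively while depth > 0
    and replaced by their type once the depth budget is exhausted.
    """
    ind = kb[ind_id]
    words = [ind.label] if ind.label is not None else [ind.rdf_type]
    words.extend(ind.data)
    for obj_id in ind.objects:
        if depth > 0:
            words.extend(collect(obj_id, kb, depth - 1))
        else:
            words.append(kb[obj_id].rdf_type)
    return tuple(words)


kb = {
    "x1": Individual("Conference", label="Int. Conf. on WISE", objects=["x2"]),
    "x2": Individual("Address", data=["New York City", "USA"]),
}
```

Raising `depth` to 1 makes `x1` absorb the city and country of `x2` instead of only its type.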
Machine Learning-Based Classifiers (cont.)
• Example:
Individual(x1 type(Conference) value(label "Int. Conf. on WISE") value(location x2))
Individual(x2 type(Address) value(city "New York City") value(country "USA"))
u1 = ("Int. Conf. on WISE", "Address")
u2 = ("Address", "New York City", "USA")
• Naïve Bayes text classifier:
$$w(S_i, T_j, CL_{NB}) = \sum_{(x,u) \in T_j} \Pr(S_i) \cdot \prod_{m \in u} \Pr(m \mid S_i)$$
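A minimal sketch of the Naïve Bayes weight, assuming each concept's instances have already been collected into word tuples. Whitespace tokenization, Laplace smoothing and the class name `NaiveBayes` are assumptions, not details given on the slide. Following the formula, the score of $T_j$ is an unnormalized sum over all of its instances.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Content = Tuple[str, ...]  # a tuple u of strings collected for one individual


def tokenize(u: Content) -> List[str]:
    # Each collected string is split into lowercase words m.
    return [w.lower() for value in u for w in value.split()]


class NaiveBayes:
    """Naive Bayes trained on the instances of the source concepts S_i."""

    def __init__(self, source_instances: Dict[str, Sequence[Content]]):
        total = sum(len(us) for us in source_instances.values())
        self.prior = {c: len(us) / total for c, us in source_instances.items()}
        self.counts = {c: Counter(m for u in us for m in tokenize(u))
                       for c, us in source_instances.items()}
        self.vocab = set().union(*self.counts.values())

    def likelihood(self, m: str, c: str) -> float:
        # Laplace smoothing (an assumption) keeps unseen words from zeroing the product.
        counts = self.counts[c]
        return (counts[m] + 1) / (sum(counts.values()) + len(self.vocab))

    def weight(self, c: str, target_instances: Sequence[Content]) -> float:
        # w(S_i, T_j, CL_NB) = sum over (x, u) in T_j of Pr(S_i) * prod_{m in u} Pr(m | S_i)
        return sum(self.prior[c] * math.prod(self.likelihood(m, c) for m in tokenize(u))
                   for u in target_instances)


nb = NaiveBayes({
    "Conference": [("Int. Conf. on WISE", "Address")],
    "Address": [("Address", "New York City", "USA")],
})
event = [("Int. Conf. on ISWC", "Address")]  # instances of a hypothetical target concept
```

Here the target instance shares four words with the `Conference` instance, so `nb.weight("Conference", event)` exceeds `nb.weight("Address", event)`.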
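Structural and Semantics-Based Classifier
• If $S_i$ and $T_j$ are property names:
$$w(S_i, T_j, \Sigma) = \begin{cases} 0 & \text{if } T_j \leftarrow S_i \notin \Sigma \\ w'(S_i, T_j, \Sigma) & \text{otherwise} \end{cases}$$
• If $S_i$ and $T_j$ are concept names¹:
$$w(S_i, T_j, \Sigma) = \begin{cases} 0 & \text{if } T_j \leftarrow S_i \notin \Sigma \\ w'(S_i, T_j, \Sigma) & \text{if } T_j \leftarrow S_i \in \Sigma \text{ and } D = \emptyset \\ \dfrac{1}{|Set| + 1} \cdot \left( w'(S_i, T_j, \Sigma) + \max_{Set} \sum_{(C_i, D_j) \in Set} w(C_i, D_j, \Sigma) \right) & \text{otherwise} \end{cases}$$
¹ Where $D = D(S_i) \times D(T_j)$; $D(S_i)$ is the set of direct parent concepts of $S_i$ (and likewise for $T_j$).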
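The sketch below implements the concept-name case of the formula above under two assumptions that the slide leaves open: $Set$ ranges over maximal one-to-one pairings of direct parents drawn from $D$ (so $|Set| = \min(|D(S_i)|, |D(T_j)|)$), and $w'$ is supplied as a precomputed table. `w_concept`, `matchings` and `w_prime` are hypothetical names; pairings are enumerated by brute force, which is only practical for small parent sets.

```python
from itertools import permutations
from typing import Dict, Iterator, List, Set, Tuple

Pair = Tuple[str, str]


def matchings(xs: List[str], ys: List[str]) -> Iterator[List[Pair]]:
    """Yield every maximal one-to-one pairing between xs and ys."""
    if len(xs) <= len(ys):
        for perm in permutations(ys, len(xs)):
            yield list(zip(xs, perm))
    else:
        for perm in permutations(xs, len(ys)):
            yield list(zip(perm, ys))


def w_concept(s: str, t: str, sigma: Set[Pair], w_prime: Dict[Pair, float],
              parents_s: Dict[str, List[str]], parents_t: Dict[str, List[str]]) -> float:
    """Structural weight of T_j <- S_i for concept names (sketch)."""
    if (s, t) not in sigma:
        return 0.0
    ps, pt = parents_s.get(s, []), parents_t.get(t, [])
    if not ps or not pt:  # D = D(S_i) x D(T_j) is empty
        return w_prime.get((s, t), 0.0)
    # Best pairing of parents; the hierarchies are assumed to be acyclic.
    best = max(sum(w_concept(c, d, sigma, w_prime, parents_s, parents_t)
                   for c, d in pairing)
               for pairing in matchings(ps, pt))
    return (w_prime.get((s, t), 0.0) + best) / (min(len(ps), len(pt)) + 1)


sigma = {("Article", "Paper"), ("Publication", "Document")}
w_prime = {("Article", "Paper"): 0.8, ("Publication", "Document"): 0.6}
parents_s = {"Article": ["Publication"]}
parents_t = {"Paper": ["Document"]}
```

With these toy values, `Publication`/`Document` keeps its own weight 0.6, and `Article`/`Paper` averages its weight 0.8 with its parents' 0.6.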
Structural and Semantics-Based Classifier (cont.)
• Let $C_S = (Q\,R.C)$ and $D_T = (Q'\,R'.D)$; then¹:
$$w(C_S, D_T, \Sigma) = w_Q(Q, Q') \cdot w(R, R', \Sigma) \cdot w(C, D, \Sigma)$$
• Let $C_S = (op\; C_1 \dots C_m)$ and $D_T = (op'\; D_1 \dots D_n)$; then²:
$$w(C_S, D_T, \Sigma) = w_{op}(op, op') \cdot \frac{\max_{Set} \sum_{(C_i, D_j) \in Set} w(C_i, D_j, \Sigma)}{\min(m, n)}$$
¹ Where $Q, Q'$ are quantifiers, $R, R'$ are property names and $C, D$ are concept expressions.
² Where $op, op'$ are concept constructors and $n, m \geq 1$.
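Structural and Semantics-Based Classifier (cont.)
• Possible values for the $w_{op}$ and $w_Q$ weights
w_op:
       ⊓     ⊔     ¬
⊓      1    1/4    0
⊔            1     0
¬                  1
[w_Q table: weights between pairs of quantifiers, with some entries depending on the cardinalities n and m]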
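A minimal recursive sketch of the two expression cases, using the $w_{op}$ values from the table (upper triangle, looked up symmetrically). Because the $w_Q$ values are not reproduced here, `W_Q` is a hypothetical placeholder that only rewards identical quantifiers. The tuple encoding of expressions, the assumption that $Set$ ranges over maximal one-to-one pairings of arguments, and all function names are illustrative.

```python
from itertools import permutations
from typing import Callable, Dict, Iterator, Tuple, Union

# Concept expressions: a concept name (str), or a tuple
#   ("and" | "or" | "not", C1, ..., Cm)  for constructors, or
#   ("all" | "some", R, C)               for quantified restrictions.
Expr = Union[str, tuple]

# w_op weights from the table above.
W_OP: Dict[Tuple[str, str], float] = {
    ("and", "and"): 1.0, ("and", "or"): 0.25, ("and", "not"): 0.0,
    ("or", "or"): 1.0, ("or", "not"): 0.0, ("not", "not"): 1.0,
}
# Hypothetical placeholder for w_Q: identical quantifiers only.
W_Q: Dict[Tuple[str, str], float] = {("all", "all"): 1.0, ("some", "some"): 1.0}
QUANTIFIERS = {"all", "some"}
CONSTRUCTORS = {"and", "or", "not"}


def w_op(a: str, b: str) -> float:
    return W_OP.get((a, b), W_OP.get((b, a), 0.0))


def matchings(xs, ys) -> Iterator[list]:
    """Yield every maximal one-to-one pairing between xs and ys."""
    if len(xs) <= len(ys):
        for perm in permutations(ys, len(xs)):
            yield list(zip(xs, perm))
    else:
        for perm in permutations(xs, len(ys)):
            yield list(zip(perm, ys))


def w_expr(c: Expr, d: Expr, w_atom: Callable[[str, str], float],
           w_prop: Callable[[str, str], float]) -> float:
    """Weight between two concept expressions C_S and D_T (sketch)."""
    if isinstance(c, str) and isinstance(d, str):
        return w_atom(c, d)  # base case: two concept names
    if isinstance(c, str) or isinstance(d, str):
        return 0.0  # assumption: a name never matches a complex expression here
    if c[0] in QUANTIFIERS and d[0] in QUANTIFIERS:
        # C_S = (Q R.C), D_T = (Q' R'.D)
        return (W_Q.get((c[0], d[0]), 0.0) * w_prop(c[1], d[1])
                * w_expr(c[2], d[2], w_atom, w_prop))
    if c[0] in CONSTRUCTORS and d[0] in CONSTRUCTORS:
        # C_S = (op C_1 ... C_m), D_T = (op' D_1 ... D_n)
        args_c, args_d = list(c[1:]), list(d[1:])
        best = max(sum(w_expr(x, y, w_atom, w_prop) for x, y in pairing)
                   for pairing in matchings(args_c, args_d))
        return w_op(c[0], d[0]) * best / min(len(args_c), len(args_d))
    return 0.0


atoms = {("Person", "Human"): 1.0, ("Paper", "Article"): 0.8}  # toy name weights


def w_atom(a: str, b: str) -> float:
    return atoms.get((a, b), 0.0)


def w_prop(r: str, s: str) -> float:
    return 1.0 if r == s else 0.0
```

For example, `("and", "Person", ("some", "writes", "Paper"))` against `("and", "Human", ("some", "writes", "Article"))` pairs the names (1.0) and the restrictions (0.8), giving $1 \cdot 1.8 / 2 = 0.9$.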
Evaluation
• More and more techniques and tools for aligning ontologies:
  • it is difficult to compare all the approaches theoretically
  • pragmatism: evaluation campaigns and contests
• I3CON: based on the NIST Text Retrieval Conference model
• EON: systematic benchmark tests on all OWL constructs
• OAEI: http://oaei.inrialpes.fr
• Alignment API [Euzenat, ISWC 2004]:
  • a common format for representing and exchanging the alignments found
  • tools and metrics for evaluating these alignments
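The traditional information retrieval measures used for such evaluations can be computed over alignments viewed as sets of (source, target) pairs. This is a plain sketch of the standard precision, recall and F-measure definitions, not the Alignment API's own evaluators, and it ignores confidence values and relation types (an assumption). The entity pairs are made up for illustration.

```python
from typing import Iterable, Tuple

Pair = Tuple[str, str]  # (source entity, target entity)


def evaluate(found: Iterable[Pair], reference: Iterable[Pair]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure of an alignment against a reference."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


found = {("Article", "Paper"), ("Person", "Human"), ("Title", "Name")}
reference = {("Article", "Paper"), ("Person", "Human"),
             ("Author", "Writer"), ("Year", "Date")}
p, r, f = evaluate(found, reference)
```

Two of the three proposed pairs are correct and two of the four expected pairs are found, so precision is 2/3, recall is 1/2 and the F-measure is 4/7.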
• 3 series of tests on bibliographic ontologies:
  • simple tests: identity, specialization/generalization of the language
  • systematic tests: some features of the initial ontology are progressively discarded
  • complex tests: aligning 4 real ontologies available on the Web
• The directory real-world case consists of aligning Web site directories using the large dataset
Conclusion
• oMAP: a formal framework for automatically aligning OWL ontologies
  • combining several specific classifiers:
    • terminological classifiers
    • machine learning-based classifiers
    • structural and semantics-based classifier
• Empirical evaluation on benchmark tests:
  • using traditional information retrieval metrics
  • machine resources (memory, computation time…) not yet considered
Future Work
• Alignment:
  • using additional classifiers:
    • kNN, KL-distance, WordNet or other terminological resources
    • theoretically straightforward but difficult in practice
  • finding complex alignments, e.g. name = firstName + lastName
• Distributed Information Retrieval:
  • automated selection of relevant resources
Useful Links
• oMAP: http://homepages.cwi.nl/~troncy/oMAP/
• Tutorial: Schema and Ontology Matching @ ESWC http://dit.unitn.it/~accord/Presentations/ESWC'05-MatchingHandOuts.pdf
• Alignment API: http://co4.inrialpes.fr/align/align.html
• OAEI: http://oaei.inrialpes.fr/
• State of the Art:
  • P. Shvaiko and J. Euzenat: A Survey of Schema-based Matching Approaches. Journal on Data Semantics (JoDS), 2005
  • KW Consortium: State of the Art on Ontology Alignment. Knowledge Web D2.2.3, 2004