Combining Ontology Matchers via Anomaly Detection
Alexander C. Müller and Heiko Paulheim
Motivation
Most high-performing matching systems use multiple matchers
How to combine multiple matchers into a single result?
Common approaches (selection of):
average, maximum, minimum matching score
voting
expert-modeled weights (e.g., 0.4·m₁ + 0.3·m₂ + 0.3·m₃)
supervised learning
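The common aggregation strategies listed above can be sketched as small functions over the scores that several matchers assign to one candidate pair. This is a minimal illustration, assuming scores in [0, 1], a hypothetical voting threshold of 0.5, and the example weights from the slide; the function names are made up for this sketch.

```python
from statistics import mean
from typing import Sequence

# Each function aggregates the scores that several base matchers (m1, m2, ...)
# assigned to ONE candidate pair. Scores are assumed to lie in [0, 1].

def aggregate_average(scores: Sequence[float]) -> float:
    return mean(scores)

def aggregate_max(scores: Sequence[float]) -> float:
    return max(scores)

def aggregate_min(scores: Sequence[float]) -> float:
    return min(scores)

def aggregate_vote(scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Majority voting: a match if more than half of the matchers reach the
    (hypothetical) threshold."""
    votes = sum(1 for s in scores if s >= threshold)
    return votes > len(scores) / 2

def aggregate_weighted(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Expert-modeled weights, e.g. 0.4*m1 + 0.3*m2 + 0.3*m3."""
    return sum(w * s for w, s in zip(weights, scores))

# Usage: three matchers scoring one candidate pair (hypothetical values).
pair_scores = [0.9, 0.6, 0.2]
print(aggregate_average(pair_scores))                          # approx. 0.567
print(aggregate_max(pair_scores), aggregate_min(pair_scores))  # 0.9 0.2
print(aggregate_vote(pair_scores))                             # True (2 of 3 votes)
print(aggregate_weighted(pair_scores, [0.4, 0.3, 0.3]))        # approx. 0.6
```

All of these need either hand-set parameters (threshold, weights) or labeled data (supervised learning), which is what the unsupervised proposal below aims to avoid.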
Proposal: use anomaly detection as an unsupervised aggregation method
Idea
Common definitions of anomaly/outlier detection:
Outlier or anomaly detection methods are used to identify observations "that appear to deviate markedly from other members of the same sample", i.e., observations "that appear to be inconsistent with the remainder of the data"
Rationale: for two ontologies with n and m concepts, there are n×m candidate pairs
the majority are non-matches
the actual matches are a minority (and differ markedly from the rest)
so, we should be able to identify them as outliers
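A small sketch of this rationale: the candidate space is the Cartesian product of the two concept sets, and, assuming a one-to-one alignment (at most one match per concept), matches make up at most min(n, m) of the n×m candidates. The toy concept names are hypothetical.

```python
from itertools import product

def candidate_pairs(concepts_a, concepts_b):
    """All n x m candidate correspondences between two ontologies."""
    return list(product(concepts_a, concepts_b))

def max_match_share(n: int, m: int) -> float:
    """Upper bound on the fraction of candidates that are matches,
    ASSUMING a one-to-one alignment (at most min(n, m) matches)."""
    return min(n, m) / (n * m)

# Hypothetical toy ontologies (not from the source).
onto_a = ["Paper", "Author", "Review", "Conference"]
onto_b = ["Article", "Writer", "Review", "Event", "Person"]

pairs = candidate_pairs(onto_a, onto_b)
print(len(pairs))                   # 20 candidates (4 x 5)
print(max_match_share(4, 5))        # 0.2
print(max_match_share(1000, 1000))  # 0.001, i.e. at most 0.1% are matches
```

The share of matches shrinks as the ontologies grow, which is why treating matches as a small, deviating minority is plausible.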
Outlier Detection in a Nutshell
Given a set of instances as feature vectors, outlier detection assigns an outlier score to each instance
higher outlier score → higher degree of outlierness
Common approaches:
distance-based
density-based
clustering-based
model-based
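As an illustration of the distance-based family, the sketch below scores each instance by its distance to its k-th nearest neighbour, so isolated instances receive higher scores. This is one common distance-based technique chosen for illustration; the slides do not say it is the detector used. The value of k and the sample points are hypothetical.

```python
import math

def knn_outlier_scores(points, k=2):
    """Distance-based outlier score: Euclidean distance to the k-th nearest
    neighbour. A higher score means a higher degree of outlierness.
    Requires 1 <= k < len(points)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical 2-D feature vectors: four similar instances and one deviating one.
points = [(0.10, 0.10), (0.12, 0.09), (0.11, 0.13), (0.09, 0.10), (0.90, 0.85)]
scores = knn_outlier_scores(points, k=2)
print(max(range(len(points)), key=scores.__getitem__))  # 4, the deviating instance
```

Density-, clustering-, and model-based methods replace the k-NN distance with other notions of "deviating", but produce the same kind of per-instance score.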
Aggregating Matchers via Anomaly Detection
We run a set of base matchers
Each base matcher score becomes a numerical feature
Thus, our feature vectors consist of individual matching scores
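Turning base matcher scores into features can be sketched as follows: every candidate pair becomes a vector with one dimension per matcher. The three string matchers here are hypothetical stand-ins built on Python's standard library, not the element-based matchers actually used in COMMAND.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical element-based base matchers; each returns a score in [0, 1].
def exact_label(a: str, b: str) -> float:
    return 1.0 if a.lower() == b.lower() else 0.0

def edit_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def trigram_jaccard(a: str, b: str) -> float:
    def grams(s: str) -> set:
        s = s.lower()
        # Strings shorter than 3 characters yield themselves as a single gram.
        return {s[i:i + 3] for i in range(max(len(s) - 2, 1))}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb)

BASE_MATCHERS = [exact_label, edit_similarity, trigram_jaccard]

def feature_vectors(concepts_a, concepts_b, matchers=BASE_MATCHERS):
    """Map every candidate pair to its vector of base matcher scores."""
    return {(a, b): [m(a, b) for m in matchers]
            for a, b in product(concepts_a, concepts_b)}

vectors = feature_vectors(["Paper", "Review"], ["Article", "review", "Event"])
print(vectors[("Review", "review")])  # [1.0, 1.0, 1.0]
```

Each row of this table is one instance for the outlier detector; adding a matcher adds a dimension.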
Aggregating Matchers via Anomaly Detection
[Figure: example from the conference dataset (note: reduced to two dimensions!)]
COMMAND: Full Pipeline
Run a set of element-based matchers (28 different ones); find a non-correlated subset
Run a set of structure-based matchers (five different ones) on that subset
Collect all results into feature vectors
Perform dimensionality reduction:
removing correlated matchers
Principal Component Analysis (PCA)
Run outlier detection
Normalize outlier scores
Select mapping candidates
Perform optional repair step
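The pipeline steps above can be sketched end to end in plain Python. This is a minimal sketch under several assumptions, not the COMMAND implementation: correlated matchers are removed greedily with a hypothetical Pearson threshold of 0.9, PCA is computed by power iteration, k-nearest-neighbour distance stands in for the outlier detector, scores are min-max normalized, and candidates are selected with a hypothetical threshold of 0.5. Structure-based matchers and the repair step are omitted, and all function names are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two score columns; 0.0 if either is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def drop_correlated(columns, max_abs_r=0.9):
    """Greedily keep matcher columns not strongly correlated with a kept one."""
    kept = []
    for idx, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= max_abs_r for k in kept):
            kept.append(idx)
    return kept

def pca_project(rows, n_components=2, iters=200, seed=0):
    """Project mean-centred rows onto the leading principal components,
    found by power iteration with deflation of the covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(x[i] * x[j] for x in X) / (n - 1) for j in range(d)] for i in range(d)]
    rng, components = random.Random(seed), []
    for _ in range(min(n_components, d)):
        v, norm = [rng.random() + 0.1 for _ in range(d)], 0.0
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:
                break
            v = [a / norm for a in w]
        if norm < 1e-12:
            break  # no variance left: stop adding components
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(x[j] * c[j] for j in range(d)) for c in components] for x in X]

def knn_outlier_scores(points, k=3):
    """Distance to the k-th nearest neighbour (higher = more outlying)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[min(k, len(dists)) - 1])
    return scores

def min_max(scores):
    """Normalize outlier scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]

def aggregate(pairs, vectors, threshold=0.5, k=3):
    """Return (pair, normalized outlier score) for every selected candidate."""
    columns = [list(col) for col in zip(*vectors)]
    kept = drop_correlated(columns)                     # remove correlated matchers
    reduced = [[v[i] for i in kept] for v in vectors]
    projected = pca_project(reduced, n_components=2)    # PCA
    scores = min_max(knn_outlier_scores(projected, k))  # outlier scores, normalized
    return [(p, s) for p, s in zip(pairs, scores) if s >= threshold]  # selection

# Synthetic demo data (hypothetical): 30 non-matches with low scores and
# 3 matches with high scores; the third matcher duplicates the first.
pairs = [f"n{i}" for i in range(30)] + ["m0", "m1", "m2"]
vectors = [[0.05 + 0.01 * (i % 5), 0.10 + 0.01 * (i % 7)] for i in range(30)]
vectors += [[0.90, 0.95], [0.95, 0.90], [0.92, 0.97]]
vectors = [[a, b, a] for a, b in vectors]
selected = aggregate(pairs, vectors)
print(sorted(p for p, _ in selected))  # ['m0', 'm1', 'm2']
```

A k-NN detector flags any isolated candidate, including ones with unusually low scores; this sketch does not check on which side of the data an outlier lies. k should also be at least as large as a tight group of matches, otherwise nearby matches mask each other.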
COMMAND: Results
Good results on the biblio benchmark dataset: up to 67% F-measure
Median results on the conference dataset: up to 68% F-measure
Difficulties on the anatomy dataset: only a subset of matchers could be run, for scalability reasons
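The results above are reported as F-measure. As a reminder of how it is computed for an alignment, the sketch below compares a system alignment with a reference alignment, both as sets of correspondences. It assumes the balanced F-measure (harmonic mean of precision and recall) and ignores confidence values and relation types; the entity names are hypothetical.

```python
def evaluate(alignment, reference):
    """Precision, recall and balanced F-measure of an alignment against a
    reference alignment; both are sets of (source, target) pairs."""
    correct = len(alignment & reference)
    precision = correct / len(alignment) if alignment else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical alignments: 2 of 3 found correspondences are correct,
# and 2 of 4 reference correspondences are found.
found = {("Paper", "Article"), ("Review", "Review"), ("Author", "Event")}
reference = {("Paper", "Article"), ("Review", "Review"),
             ("Author", "Writer"), ("Conference", "Event")}
p, r, f = evaluate(found, reference)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```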
Discussion and Conclusion
Proof of concept: anomaly detection is suitable for matcher aggregation
non-trivial combination of matcher scores (PCA, outlier score)
automatic selection of a suitable subset of matchers
Future work:
address scalability issues
try more anomaly detection approaches