Combining Ontology Matchers via Anomaly Detection

Alexander C. Müller and Heiko Paulheim

Motivation

Most high-performing matching systems use multiple matchers

How to combine multiple matchers into a single result?

Common approaches (a selection):

average, maximum, minimum matching score

voting

expert-modeled weights (e.g., 0.4·m1 + 0.3·m2 + 0.3·m3)

supervised learning
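The simple aggregation schemes listed above can be sketched for a single candidate pair as follows. This is a minimal illustration, not the authors' code: the function name `aggregate`, the voting threshold of 0.5, and the example scores are assumptions made for this sketch; only the weights 0.4/0.3/0.3 come from the slide.

```python
from statistics import mean


def aggregate(scores, weights=None, vote_threshold=0.5):
    """Combine the scores of several matchers for one candidate pair.

    `vote_threshold` is a hypothetical cut-off: a matcher "votes" for the
    pair when its score reaches that value.
    """
    result = {
        "average": mean(scores),
        "maximum": max(scores),
        "minimum": min(scores),
        # Voting: fraction of matchers whose score reaches the threshold.
        "voting": sum(s >= vote_threshold for s in scores) / len(scores),
    }
    if weights is not None:
        # Expert-modeled weights, e.g. 0.4*m1 + 0.3*m2 + 0.3*m3.
        result["weighted"] = sum(w * s for w, s in zip(weights, scores))
    return result


# Made-up scores of three matchers m1, m2, m3 for one candidate pair.
combined = aggregate([0.9, 0.4, 0.8], weights=[0.4, 0.3, 0.3])
```

All of these schemes look at one candidate pair in isolation, which is the main difference from the outlier-based proposal that follows.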

Proposal: use anomaly detection as an unsupervised aggregation method

Idea

Common definitions of anomaly/outlier detection: outlier or anomaly detection methods are used to find instances "that appear to deviate markedly from other members of the same sample", i.e. instances "that appear to be inconsistent with the remainder of the data"

Rationale:

for two ontologies with n and m concepts, there are n × m candidates

the majority are non-matches

the actual matches are a minority (that differ markedly from the rest)

so, we should be able to identify them as outliers

Outlier Detection in a Nutshell

Given a set of instances as feature vectors:

outlier detection assigns an outlier score to each instance

a higher outlier score means a higher degree of outlierness

Common approaches:

distance based

density based

clustering based

model based
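To make the notion of an outlier score concrete, here is a minimal sketch of one distance-based approach: each instance is scored by its mean distance to its k nearest neighbours. The slides do not say which detector was used, so the function name `knn_outlier_scores`, the choice k=2, and the sample points are assumptions for illustration only. The sketch needs Python 3.8 or later for `math.dist`.

```python
import math


def knn_outlier_scores(vectors, k=2):
    """Score each vector by its mean distance to its k nearest neighbours.

    A larger score means the vector lies further away from the bulk of the
    data, i.e. it is "more of an outlier". Requires k < len(vectors).
    """
    scores = []
    for i, v in enumerate(vectors):
        # Distances from v to every other vector, smallest first.
        dists = sorted(
            math.dist(v, w) for j, w in enumerate(vectors) if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


# Four points close together (the "non-matches") and one far away.
points = [(0.1, 0.1), (0.1, 0.2), (0.2, 0.1), (0.2, 0.2), (0.9, 0.95)]
scores = knn_outlier_scores(points, k=2)
```

In this toy data the last point receives the largest score, which mirrors the rationale that true matches stand out from the majority of non-matches.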

Aggregating Matchers via Anomaly Detection

We run a set of base matchers

Each base matcher score becomes a numerical feature

Thus, our feature vectors consist of the individual matching scores
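The construction of feature vectors can be sketched as follows. The three base matchers below are simple stand-ins chosen for this example (string similarity, token overlap, common prefix); they are not the matchers used by the authors, and the concept names are invented. Each of the n × m candidate pairs gets one feature per matcher.

```python
from difflib import SequenceMatcher
from itertools import product


def edit_similarity(a, b):
    """String similarity in [0, 1] based on difflib's matching blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a, b):
    """Jaccard overlap of the underscore-separated tokens of both names."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


def prefix_similarity(a, b):
    """Length of the common prefix relative to the longer name."""
    a, b = a.lower(), b.lower()
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / (max(len(a), len(b)) or 1)


# Hypothetical set of base matchers; each one yields one numerical feature.
BASE_MATCHERS = [edit_similarity, token_jaccard, prefix_similarity]


def feature_vectors(concepts_a, concepts_b, matchers=BASE_MATCHERS):
    """Map every candidate pair (a, b) to its vector of matcher scores."""
    return {
        (a, b): [m(a, b) for m in matchers]
        for a, b in product(concepts_a, concepts_b)
    }


onto_a = ["Conference_Paper", "Author", "Review"]
onto_b = ["Paper", "Paper_Author", "Reviewer"]
features = feature_vectors(onto_a, onto_b)  # 3 x 3 = 9 candidate pairs
```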

Aggregating Matchers via Anomaly Detection

Example from the conference dataset (note: reduced to two dimensions!)

Full Pipeline

Run set of element-based matchers

find non-correlated subset

Run set of structure-based matchers on that subset

Collect all results into feature vectors

Perform dimensionality reduction

removing correlated matchers

Principal Component Analysis

Run outlier detection

Perform optional repair step
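One way to find a non-correlated subset of matchers is a greedy filter on the Pearson correlation between matcher score columns. The slides do not state how the subset is chosen, so the greedy strategy, the threshold of 0.9, and the names `pearson` and `select_uncorrelated` are assumptions of this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column is treated as uncorrelated here
    return cov / (sx * sy)


def select_uncorrelated(columns, threshold=0.9):
    """Greedily keep columns whose |r| with every kept column is below
    `threshold` (a hypothetical cut-off). Returns the kept column indices."""
    kept = []
    for idx, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(idx)
    return kept


# One list per matcher, one entry per candidate pair (made-up scores).
columns = [
    [0.9, 0.1, 0.2, 0.8],  # matcher 0
    [0.8, 0.2, 0.3, 0.7],  # matcher 1: almost a linear copy of matcher 0
    [0.1, 0.9, 0.1, 0.6],  # matcher 2: behaves differently
]
kept = select_uncorrelated(columns)
```

Here matcher 1 is dropped because it adds almost no information beyond matcher 0, while matcher 2 is kept.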



Run set of element-based matchers (28 different ones)

find non-correlated subset

Run set of structure-based matchers (five different ones) on that subset

Collect all results into feature vectors

Perform dimensionality reduction

removing correlated matchers

Principal Component Analysis

Run outlier detection

Normalize outlier scores

Select mapping candidates

Perform optional repair step
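The last two scoring steps, normalizing outlier scores and selecting mapping candidates, might look like the sketch below. The slides give no details, so min-max normalization, the threshold of 0.5, and the greedy one-to-one selection are assumptions for illustration; the raw scores are made up.

```python
def min_max_normalize(scores):
    """Rescale raw outlier scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:
        return {pair: 0.0 for pair in scores}
    return {pair: (s - lo) / span for pair, s in scores.items()}


def select_candidates(normalized, threshold=0.5):
    """Greedy one-to-one selection, highest normalized score first.

    `threshold` is a hypothetical cut-off below which pairs are discarded.
    """
    used_a, used_b, mapping = set(), set(), []
    ranked = sorted(normalized.items(), key=lambda kv: kv[1], reverse=True)
    for (a, b), s in ranked:
        if s < threshold:
            break  # all remaining pairs score even lower
        if a not in used_a and b not in used_b:
            mapping.append((a, b, s))
            used_a.add(a)
            used_b.add(b)
    return mapping


# Made-up raw outlier scores for some candidate pairs.
raw = {
    ("Author", "Paper_Author"): 4.2,
    ("Author", "Reviewer"): 1.0,
    ("Review", "Reviewer"): 3.4,
    ("Review", "Paper"): 0.6,
    ("Conference_Paper", "Paper"): 3.8,
    ("Conference_Paper", "Paper_Author"): 2.6,
}
mapping = select_candidates(min_max_normalize(raw))
```

Normalization makes the threshold independent of the scale of the chosen outlier detector, which is why it comes before candidate selection in the pipeline.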

Results

Good results on biblio benchmark dataset

up to 67% F-measure

Median results on conference

up to 68% F-measure

Difficulties on anatomy dataset

only a subset of matchers could be run for scalability reasons
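For reference, the F-measure reported above is the harmonic mean of precision and recall against a reference alignment. The sketch below shows the computation on invented alignments; the function name `f_measure` and the example pairs are assumptions, not data from the evaluation.

```python
def f_measure(found, reference):
    """Return (precision, recall, F1) of a found alignment against a
    reference alignment, both given as collections of (a, b) pairs."""
    found, reference = set(found), set(reference)
    tp = len(found & reference)  # correctly found correspondences
    precision = tp / len(found) if found else 0.0
    recall = tp / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented example: 2 of 3 found pairs are correct, 2 of 4 reference
# pairs were found.
found = [("Author", "Paper_Author"), ("Review", "Reviewer"), ("Chair", "Paper")]
reference = [
    ("Author", "Paper_Author"),
    ("Review", "Reviewer"),
    ("Conference_Paper", "Paper"),
    ("Chair", "Chairman"),
]
p, r, f = f_measure(found, reference)
```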

Discussion and Conclusion

Proof of Concept:

Anomaly detection is suitable for matcher aggregation

non-trivial combination of matcher scores (PCA, outlier score)

automatic selection of a suitable subset of matchers

Future work:

address scalability issues

try more anomaly detection approaches
