
Page 1: Pay-as-you-go Reconciliation in Schema Matching Networks

Pay-as-you-go Reconciliation in Schema Matching Networks

Nguyen Quoc Viet Hung (1), Nguyen Thanh Tam (1), Zoltán Miklós (2), Karl Aberer (1), Avigdor Gal (3), and Matthias Weidlich (4)

(1) École Polytechnique Fédérale de Lausanne
(2) Université de Rennes 1
(3) Technion – Israel Institute of Technology
(4) Imperial College London

Page 2: Pay-as-you-go Reconciliation in Schema Matching Networks

ICDE | 2014 2

Schema Matching - Where?

WWW

Cloud

Large enterprises

P2P Networks

Collaborative Systems

Schema matching is the process of establishing correspondences between the attributes of schemas for the purpose of data integration.

Page 3: Pay-as-you-go Reconciliation in Schema Matching Networks

Private PhD Thesis Defense | 12.2013 3

Schema Matching Network

Traditional approach: Mediated schema

Our approach: Schema Matching Network

[Figure: schemas S1, S2, S3 shown under each approach]

A network of schemas that are matched against each other

Require consensus on schema

Updated frequently

Page 4: Pay-as-you-go Reconciliation in Schema Matching Networks

Pay-as-you-go Reconciliation

Reconciliation is the process of asking a human user to give feedback on correspondences.

Need for reconciliation: automatic techniques use heuristics, so their results are inherently uncertain.

[Figure: example network of three schemas (s1: EoverI, s2: BBC, s3: DVDizzy) with attributes a1: releaseDate, a2: screeningDate, a3: availabilityDate, a4: productionDate and candidate correspondences c1 to c5]

Attribute names are quite similar, so automatic matching tools often fail to identify the correct correspondences.

Pay-as-you-go reconciliation

Uncertainty reduction: incrementally improve matching quality with minimal user effort

Instantiation (selective matching): instantiate a single trusted set of correspondences

Page 5: Pay-as-you-go Reconciliation in Schema Matching Networks

System Overview

General approach:
1. Develop a probabilistic matching network (pSMN), which makes it possible to measure the overall uncertainty of the network
2. Reduce network uncertainty: guide user feedback with minimal effort
3. Instantiate a selective matching: maintain a good set of attribute correspondences to make the system available at any time

Page 6: Pay-as-you-go Reconciliation in Schema Matching Networks

Outline

Probabilistic Schema Matching Network (pSMN):
  Model
  Computation
Uncertainty Reduction
Instantiation of the selective matching
Experimental results
Conclusion and future work

Page 7: Pay-as-you-go Reconciliation in Schema Matching Networks

pSMN - Modeling

A probabilistic schema matching network is modeled as a quintuple $N = \langle S, G_S, \Gamma, C, P \rangle$:
$S$: set of schemas
$G_S$: interaction graph, which represents the connections in the network
$C$: set of attribute correspondences
$\Gamma$: set of integrity constraints
$P$: set of probabilities; each probability $p_c$ is associated with a correspondence $c \in C$

An integrity constraint formalizes a natural property of the network, for example:
1-1 constraint
Cycle constraint (transitivity)
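The model above can be pictured as a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `MatchingNetwork`, its field names, and the representation of a constraint as a predicate over a set of correspondences are all assumptions made for illustration. Only the 1-1 constraint is shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# An attribute is identified by (schema name, attribute name).
Attribute = Tuple[str, str]
# A correspondence is an unordered pair of attributes from two different schemas.
Correspondence = FrozenSet[Attribute]
# A constraint returns True when a set of correspondences satisfies it.
Constraint = Callable[[Set[Correspondence]], bool]


def one_to_one(selected: Set[Correspondence]) -> bool:
    """1-1 constraint: an attribute matches at most one attribute of each other schema."""
    seen = set()
    for corr in selected:
        a, b = tuple(corr)
        for attr, other in ((a, b), (b, a)):
            key = (attr, other[0])  # the attribute plus the schema it is matched into
            if key in seen:
                return False
            seen.add(key)
    return True


@dataclass
class MatchingNetwork:
    schemas: Dict[str, Set[str]]            # S: schema name -> attribute names
    interaction_graph: Set[FrozenSet[str]]  # G_S: pairs of schemas that are matched
    constraints: List[Constraint]           # Gamma: integrity constraints
    correspondences: Set[Correspondence]    # C: candidate correspondences
    probabilities: Dict[Correspondence, float] = field(default_factory=dict)  # P

    def is_consistent(self, selected: Set[Correspondence]) -> bool:
        """True when the selected correspondences violate no integrity constraint."""
        return all(check(selected) for check in self.constraints)
```

Representing constraints as predicates keeps the sketch open to further constraints (such as the cycle constraint) without changing the class.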

Page 8: Pay-as-you-go Reconciliation in Schema Matching Networks

pSMN - Computing

Probability of a correspondence:
Semantics: indicates the correctness of the correspondence
Source: integrity constraints and user input
Idea: a correspondence that is involved in many violations has a high chance of being problematic

Computation:

Step 1: construct all possible matching instances $\Omega = \{I_1, \ldots, I_n\}$. A matching instance is a maximal set of correspondences that satisfies all integrity constraints and user input.

Step 2: compute $p_c$ as the fraction of matching instances that contain $c$:

$p_c = \frac{|\{I \in \Omega : c \in I\}|}{|\Omega|}$

Challenge: computing the probabilities exactly has high complexity. We use non-uniform sampling and a view-maintenance technique to approximate the probabilities efficiently.
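To make the two computation steps concrete, here is a brute-force sketch of the exact computation that the sampling approach is meant to avoid. It is an illustration under assumptions, not the method from the talk: the function names are hypothetical, the toy network and its pairwise conflicts are invented, and user input is modeled as sets of approved and disapproved correspondences.

```python
from itertools import combinations


def matching_instances(correspondences, is_consistent, approved=(), disapproved=()):
    """Enumerate maximal consistent subsets by brute force (exponential; tiny inputs only)."""
    approved = frozenset(approved)
    candidates = sorted(set(correspondences) - set(disapproved))
    consistent = [
        frozenset(subset)
        for size in range(len(candidates) + 1)
        for subset in combinations(candidates, size)
        if approved <= frozenset(subset) and is_consistent(frozenset(subset))
    ]
    # A matching instance is maximal: no other consistent subset strictly contains it.
    return [s for s in consistent if not any(s < t for t in consistent)]


def correspondence_probabilities(correspondences, instances):
    """p_c = (number of instances containing c) / (number of instances)."""
    if not instances:
        raise ValueError("no matching instance satisfies the constraints and user input")
    return {c: sum(c in inst for inst in instances) / len(instances) for c in correspondences}


# Hypothetical toy network: c2 and c3 each conflict with c4 and c5.
CONFLICTS = {frozenset(p) for p in [("c2", "c4"), ("c2", "c5"), ("c3", "c4"), ("c3", "c5")]}


def no_conflicts(selected):
    """Assumed constraint check: no conflicting pair is selected together."""
    return not any(pair <= selected for pair in CONFLICTS)


C = ["c1", "c2", "c3", "c4", "c5"]
instances = matching_instances(C, no_conflicts)
probs = correspondence_probabilities(C, instances)
```

In this toy network the only matching instances are {c1, c2, c3} and {c1, c4, c5}, so c1 gets probability 1 and every other correspondence gets 0.5. The enumeration visits every subset, which is exactly why an approximation is needed for realistic networks.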

Network uncertainty: quantify the uncertainty of the pSMN based on entropy:

$H(C, P) = -\sum_{c \in C} \left[ p_c \log p_c + (1 - p_c) \log (1 - p_c) \right]$
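The entropy-based measure can be computed directly from the probabilities. The sketch below assumes the uncertainty is the sum of the binary entropies of the individual correspondences, uses base-2 logarithms (the slide does not state the base), and applies the usual convention that 0 log 0 = 0. The function name is hypothetical.

```python
import math


def network_uncertainty(probabilities):
    """Sum of binary entropies of all correspondence probabilities, in bits."""
    def binary_entropy(p):
        if p in (0.0, 1.0):
            return 0.0  # a certain correspondence contributes no uncertainty
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    return sum(binary_entropy(p) for p in probabilities.values())


# One certain correspondence and four maximally uncertain ones.
example = {"c1": 1.0, "c2": 0.5, "c3": 0.5, "c4": 0.5, "c5": 0.5}
uncertainty = network_uncertainty(example)
```

Each correspondence with probability 0.5 contributes one bit, so the example network has an uncertainty of 4 bits; a network whose probabilities are all 0 or 1 has uncertainty 0.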

Page 9: Pay-as-you-go Reconciliation in Schema Matching Networks

Outline

Probabilistic Schema Matching Network (pSMN):
  Model
  Computation
Uncertainty Reduction
Instantiation of the selective matching
Experimental results
Conclusion and future work

Page 10: Pay-as-you-go Reconciliation in Schema Matching Networks

Reduce Network Uncertainty

Goal: guide the user to give feedback with minimal effort

Problem (UNCERTAINTY MINIMIZATION WITH LIMITED EFFORT BUDGET). Given a probabilistic matching network $\langle S, G_S, \Gamma, C, P \rangle$ and a budget of user effort $k$, find a set of correspondences $D \subseteq C$ with $|D| \le k$ such that the network uncertainty remaining after the user has validated $D$ is minimal.

Page 11: Pay-as-you-go Reconciliation in Schema Matching Networks

Approach – Use heuristic ordering

Idea: ask the user about the correspondences with the highest information gain first.

Information gain: the reduction in network uncertainty achieved by validating a correspondence $c$:

$IG(c) = H(C, P) - H(C, P \mid c)$

where $H(C, P)$ is the current network uncertainty and $H(C, P \mid c)$ is the expected network uncertainty when knowing the true value of $c$.

Example: two possible solutions, {c1,c2,c3} and {c1,c4,c5}.
Ask c1 first: the network is unchanged, so there is no uncertainty reduction.
Ask c2 first: only one solution is left, so the network becomes certain.

[Figure: three panels showing schemas SA, SB, SC. The first two panels contain correspondences c1 to c5; the third panel contains only c1, c2, c3.]
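The example on this slide can be checked numerically. The sketch below is an illustration under assumptions rather than the authors' procedure: it takes the two solutions as the full set of matching instances, conditions on a user answer simply by filtering those instances (a real system would recompute or resample them), and uses base-2 entropy. All function names are hypothetical.

```python
import math


def probabilities(correspondences, instances):
    """p_c = fraction of instances that contain c."""
    return {c: sum(c in i for i in instances) / len(instances) for c in correspondences}


def uncertainty(probs):
    """Sum of binary entropies, in bits, with 0 log 0 treated as 0."""
    return sum(
        0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
        for p in probs.values()
    )


def information_gain(c, correspondences, instances):
    """Current uncertainty minus the expected uncertainty once the truth of c is known."""
    before = uncertainty(probabilities(correspondences, instances))
    approved = [i for i in instances if c in i]         # worlds where c is correct
    disapproved = [i for i in instances if c not in i]  # worlds where c is wrong
    expected_after = 0.0
    for remaining in (approved, disapproved):
        if remaining:
            weight = len(remaining) / len(instances)    # p_c or 1 - p_c
            expected_after += weight * uncertainty(probabilities(correspondences, remaining))
    return before - expected_after


def rank_by_gain(correspondences, instances, budget):
    """Return the `budget` correspondences with the highest information gain."""
    gains = {c: information_gain(c, correspondences, instances) for c in correspondences}
    return sorted(correspondences, key=lambda c: gains[c], reverse=True)[:budget]


C = ["c1", "c2", "c3", "c4", "c5"]
INSTANCES = [frozenset({"c1", "c2", "c3"}), frozenset({"c1", "c4", "c5"})]
gain_c1 = information_gain("c1", C, INSTANCES)
gain_c2 = information_gain("c2", C, INSTANCES)
```

Asking about c1 yields a gain of 0 bits because c1 appears in both solutions, while asking about c2 removes all 4 bits of uncertainty. A full ordering strategy would recompute the gains after every answer instead of ranking once.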

Page 12: Pay-as-you-go Reconciliation in Schema Matching Networks

Instantiate a selective matching

Goal: maintain a single trusted set of correspondences

Goodness measures for a set of correspondences $I \subseteq C$:
Repair distance: the information lost by eliminating correspondences to guarantee the integrity constraints: $\Delta(I) = |C \setminus I|$
Likelihood: the collective correctness of the correspondences: $u(I) = \prod_{c \in I} p_c$

Instantiation problem: given a schema matching network, identify a set of correspondences $I \subseteq C$ with minimal repair distance (w.r.t. $C$) and maximal likelihood.
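The two goodness measures are easy to state in code. This is a minimal sketch under two assumptions that the garbled slide text only partly confirms: repair distance counts the correspondences of C that are left out of the instance, and likelihood is the product of the probabilities of the correspondences that are kept. The function names and the example probabilities are hypothetical.

```python
import math


def repair_distance(all_correspondences, instance):
    """Number of candidate correspondences that had to be dropped to obtain the instance."""
    return len(set(all_correspondences) - set(instance))


def likelihood(instance, probabilities):
    """Product of the probabilities of the correspondences kept in the instance."""
    return math.prod(probabilities[c] for c in instance)


C = ["c1", "c2", "c3", "c4", "c5"]
PROBS = {"c1": 1.0, "c2": 0.5, "c3": 0.5, "c4": 0.5, "c5": 0.5}
instance = {"c1", "c2", "c3"}
distance = repair_distance(C, instance)
score = likelihood(instance, PROBS)
```

For the instance {c1, c2, c3}, two correspondences (c4 and c5) are dropped, so the repair distance is 2, and the likelihood is 1.0 × 0.5 × 0.5 = 0.25.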

Page 13: Pay-as-you-go Reconciliation in Schema Matching Networks

Approach

The instantiation problem is NP-complete, so we use a heuristic approach.

Algorithm:
Step 1: Initialization - pick a sampled matching instance $I_0$ with minimal repair distance
Step 2: Optimization - randomized local search

[Figure: matching instances (all satisfy the constraints) plotted by repair distance and likelihood. Legend: non-sampled instance; sampled instance; I0: sampled + minimal repair distance; Iopt: minimal repair distance + maximal likelihood. Randomized local search leads from I0 to Iopt.]
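The slides do not spell out the local search, so the following is only one plausible sketch of the idea, with several stated assumptions: integrity constraints are reduced to pairwise conflicts, a move adds a random missing correspondence, removes whatever conflicts with it, and then greedily re-extends the result, and candidates are compared lexicographically (smaller repair distance first, then higher likelihood). The names `score`, `neighbour`, and `local_search` and the example data are hypothetical.

```python
import math
import random


def score(instance, all_corr, probs):
    """Lexicographic objective: fewer dropped correspondences first, then higher likelihood."""
    return (-(len(all_corr) - len(instance)), math.prod(probs[c] for c in instance))


def neighbour(instance, c, pool, conflicts, rng):
    """Add c, drop everything that conflicts with it, then re-extend to a maximal set."""
    candidate = {x for x in instance if frozenset((x, c)) not in conflicts}
    candidate.add(c)
    others = [x for x in pool if x not in candidate]
    rng.shuffle(others)
    for x in others:
        if all(frozenset((x, y)) not in conflicts for y in candidate):
            candidate.add(x)
    return candidate


def local_search(initial, all_corr, probs, conflicts, steps=200, seed=0):
    """Hill climbing with random moves; keeps a candidate only if its score improves."""
    rng = random.Random(seed)
    pool = sorted(all_corr)
    best = set(initial)
    for _ in range(steps):
        absent = [x for x in pool if x not in best]
        if not absent:
            break  # nothing left to add
        candidate = neighbour(best, rng.choice(absent), pool, conflicts, rng)
        if score(candidate, all_corr, probs) > score(best, all_corr, probs):
            best = candidate
    return best


# Hypothetical toy network: c2 and c3 each conflict with c4 and c5.
C = ["c1", "c2", "c3", "c4", "c5"]
CONFLICTS = {frozenset(p) for p in [("c2", "c4"), ("c2", "c5"), ("c3", "c4"), ("c3", "c5")]}
PROBS = {"c1": 1.0, "c2": 0.9, "c3": 0.8, "c4": 0.3, "c5": 0.2}
I0 = {"c1", "c4", "c5"}
I_opt = local_search(I0, C, PROBS, CONFLICTS)
```

Starting from {c1, c4, c5}, the search moves to {c1, c2, c3}, which has the same repair distance but a higher likelihood. Because every move repairs conflicts before it is evaluated, each candidate stays consistent, mirroring the slide's note that all matching instances satisfy the constraints.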

Page 14: Pay-as-you-go Reconciliation in Schema Matching Networks

Outline

Probabilistic Schema Matching Network (pSMN):
  Model
  Computation
Uncertainty Reduction
Instantiation of the selective matching
Experimental results
Conclusion and future work

Page 15: Pay-as-you-go Reconciliation in Schema Matching Networks

Experiment – Dataset and Setting

Datasets:
Business Partner: schemas from enterprise systems
Purchase Order: purchase order e-business schemas
University Application Form: schemas from Web interfaces of American university application forms
WebForm: schemas from Web forms of different domains
Thalia: schemas describing university courses

Metrics:
Precision: measures quality improvement at each user interaction step $i$, with $G$ being the exact match: $\mathrm{Prec}_i = |D_i \cap G| / |D_i|$
User effort: the percentage of feedback steps relative to the size of the matcher output: $E_i = i / |C|$
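Both metrics are simple ratios. The sketch below assumes precision is measured on the set of correspondences held at a given step against the exact match G, and that user effort is the number of feedback steps divided by the number of correspondences produced by the matcher. The function names and example sets are hypothetical.

```python
def precision(selected, exact_match):
    """Fraction of the selected correspondences that belong to the exact match."""
    selected = set(selected)
    if not selected:
        return 0.0  # convention chosen here for an empty selection
    return len(selected & set(exact_match)) / len(selected)


def user_effort(feedback_steps, matcher_output):
    """Feedback steps relative to the size of the matcher output."""
    return feedback_steps / len(matcher_output)


matcher_output = ["c1", "c2", "c3", "c4", "c5"]
exact_match = {"c1", "c2", "c3"}
current = {"c1", "c2", "c4"}

prec = precision(current, exact_match)
effort = user_effort(2, matcher_output)
```

Here two of the three selected correspondences are correct, giving a precision of 2/3, and two feedback steps over five candidates correspond to a user effort of 40%.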

Page 16: Pay-as-you-go Reconciliation in Schema Matching Networks

Efficiency of guiding strategy on uncertainty reduction

Goal: compare the guiding and non-guiding strategies on uncertainty reduction

Evaluation procedure:
Increase user effort
Upon each user input, measure the network uncertainty and precision

Interesting finding: the heuristic ordering strategy saves up to 48% of user effort compared to random ordering.

Page 17: Pay-as-you-go Reconciliation in Schema Matching Networks

Efficiency of guiding strategy on instantiation

Goal: compare the guiding and non-guiding strategies on instantiation

Evaluation procedure:
Increase user effort
Measure the precision and recall of the instantiated matching

Interesting finding: the heuristic ordering strategy outperforms the baseline by an average of 15% in precision and 14% in recall.

Page 18: Pay-as-you-go Reconciliation in Schema Matching Networks

Conclusions

We introduce the concepts of schema matching networks and probabilistic matching networks.
We define a model for pay-as-you-go reconciliation on top of matching networks.
We propose a guiding technique to reduce network uncertainty and a heuristic approach to instantiate a selective matching.
In experiments with real-world schemas, our guiding strategy outperforms the baseline:
Saving user effort by up to 48%
Increasing precision (15%) and recall (14%)

Page 19: Pay-as-you-go Reconciliation in Schema Matching Networks

Future Work

Generalizing pay‐as‐you‐go reconciliation for crowdsourced models: Business process matching

Ontology alignment

Page 20: Pay-as-you-go Reconciliation in Schema Matching Networks

THANK YOU

Q&A