Schema Mediation and Query Processing in Peer Data Management Systems
Presenter: Jie Zhao
Supervisor: Rachel Pottinger
Sept. 29, 2006



Preliminaries

Datalog
    Q(x) :- Airport(x, Vancouver)
    head: Q(x); body: Airport(x, Vancouver)
    Airport:
        Code  City
        SEA   Seattle
        YVR   Vancouver
Mapping for heterogeneous schemas
    Correspondences between two schemas
    A medium for exchanging data, transferring queries, etc.
PDMS (Peer Data Management System)
    Each peer has a database
    Peers can leave or join the network voluntarily
    Mappings between some peers are provided
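As a rough illustration of how the Datalog query above reads, the following Python sketch evaluates Q(x) :- Airport(x, Vancouver) against the two-row Airport table. The function name `evaluate_q` and the tuple encoding of the table are illustrative assumptions, not part of the presentation.

```python
# Airport(Code, City) as a set of tuples, taken from the slide's example table.
AIRPORT = {("SEA", "Seattle"), ("YVR", "Vancouver")}


def evaluate_q(airport, city="Vancouver"):
    """Evaluate Q(x) :- Airport(x, city).

    Keep the rows whose City column equals the constant in the body,
    then project onto the head variable x (the Code column).
    """
    return {code for (code, row_city) in airport if row_city == city}


result = evaluate_q(AIRPORT)
print(result)
```

The body atom acts as a selection on City, and the head acts as a projection onto Code.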


A general query answering case in PDMS

[Figure: three peers, UBC, UW, and UT, each with a local database and a local schema. Mapping UBC_UW relates the UBC and UW schemas; Mapping UW_UT relates the UW and UT schemas.]


A general query answering case in PDMS (Cont.)

[Figure: the same three peers. A user poses Query Q over UBC (QUBC). Query reformulation through Mapping UBC_UW yields Query Q’ over UW (QUW), and through Mapping UW_UT yields Query Q” over UT (QUT). Reformulated results are returned along the same path.]


Previous methods can only query the local schema

[Figure: peers UW and UBC, each with a local database and a local schema, connected by Mapping UW_UBC. The user is at UW.]

UW relation: conf-paper(title, venue, year, pages)
UBC relation: conf-paper(title, venue, year, URL)

Query that a UW user can ask:

    q(x) :- conf-paper(t, v, y, x).

The UW user can never ask for information about URL!


What we’d like to improve…

Want to access more information, e.g. URL
Get rid of the restrictive query format, e.g. local schema only
Improve the comprehensibility of the PDMS
Reconsider the difficulties and complexity raised by mapping composition
Make good use of indirect mapping information

We have a method for mediated schema creation in PDMS that solves all of these


Challenges

How to create the mediated schema without a centralized authority?

How to arrive at the same mediated schema regardless of where mediation starts?

How can an automatically created mediated schema be comprehensible to users?

How can human intervention be minimized?

Where to store the mediated schema, and how to update it?


Related Work

Bernstein et al.: a vision for incorporating database research into the P2P scenario
Piazza project: provides a complete prototype for query answering in PDMS
Fagin et al.: use SO (second-order) logic as the mapping language
HePToX: XQuery reformulation
Hyperion: uses both data-level and schema-level mappings to specify the correspondences between acquainted peers
PeerDB: uses keywords as the basis for relation matching


Outline

Semantics in Conjunctive Mappings
Peer Schema Mediation
Updating the Mediated Schema
A Study of Mapping Composition
Experimental Study


Introducing concepts into conjunctive mappings

A conjunctive mapping has the following form:

    conf-paper(title,venue,yr) :- UW.conf-paper(title,venue,yr,pages)
    conf-paper(title,venue,yr) :- UBC.conf-paper(title,venue,yr,URL)

IDB name: “conf-paper”
Component: each Datalog query above is a component
Subgoal: each relation in the body, e.g. “UW.conf-paper(title,venue,yr,pages)”
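The three terms above (IDB name, component, subgoal) can be pictured as a small data structure. The sketch below is only one possible encoding; the class names `Atom`, `Component`, and `ConjunctiveMapping` are hypothetical and do not come from MePSys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    relation: str   # e.g. "UW.conf-paper", or the IDB name when used as a head
    args: tuple     # variable names in positional order


@dataclass(frozen=True)
class Component:
    head: Atom      # the IDB atom on the left of ":-"
    body: tuple     # the subgoals (Atoms over peer relations)


@dataclass(frozen=True)
class ConjunctiveMapping:
    components: tuple

    def idb_name(self):
        # All components of one mapping are assumed to share one IDB name.
        names = {c.head.relation for c in self.components}
        if len(names) != 1:
            raise ValueError("components must share one IDB name")
        return names.pop()

    def subgoals(self):
        return [atom for c in self.components for atom in c.body]


head = Atom("conf-paper", ("title", "venue", "yr"))
mapping = ConjunctiveMapping((
    Component(head, (Atom("UW.conf-paper", ("title", "venue", "yr", "pages")),)),
    Component(head, (Atom("UBC.conf-paper", ("title", "venue", "yr", "URL")),)),
))
print(mapping.idb_name())
print([a.relation for a in mapping.subgoals()])
```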


Introducing concepts into conjunctive mappings (Cont.)

Intuitively, a concept describes the common object across different schemas
Informally, two mappings CM1 and CM2 have the same concept if:
    CM1 and CM2 have the same IDB name
    Q1 and Q2, the queries constructed from the overlapping subgoals of CM1 and CM2, are equivalent
    The subgoals are compatible


Introducing concepts into conjunctive mappings (Cont.)

Mappings that express the same concept:
    Mapping 1, from UW to UBC:
        Paper(title,venue) :- UW.paper(title,venue,yr,pages)
        Paper(title,venue) :- UBC.paper(title,venue,author,URL)
    Mapping 2, from UBC to UT:
        Paper(title,author) :- UBC.paper(title,venue,author,URL)
        Paper(title,author) :- UT.paper(title,author,area)

Mappings that do not express the same concept:
    Mapping 1, from A to B:
        Manager(x, y) :- A.Mgr(x, y)
        Manager(x, y) :- B.Mgr1(x, y)
    Mapping 2, from B to C:
        Manager(x) :- B.Mgr1(x, x)
        Manager(x) :- C.SelfMgr(x)

Mapping compatibility check before merge
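The two examples above can be told apart by a very simplified check. The sketch below is an assumption-laden stand-in for the real test: instead of full query equivalence over the overlapping subgoals, it only compares the repeated-variable pattern of each shared relation, and it additionally assumes that two mappings must share at least one relation. The function names are hypothetical.

```python
def equality_pattern(args):
    """Map each argument to the index of its first occurrence,
    so (x, y) gives (0, 1) and (x, x) gives (0, 0)."""
    first_seen = {}
    return tuple(first_seen.setdefault(a, i) for i, a in enumerate(args))


def same_concept(m1, m2):
    """Simplified same-concept test on mappings given as
    (idb_name, [(relation, args), ...])."""
    idb1, subgoals1 = m1
    idb2, subgoals2 = m2
    if idb1 != idb2:
        return False
    patterns1 = {rel: equality_pattern(args) for rel, args in subgoals1}
    patterns2 = {rel: equality_pattern(args) for rel, args in subgoals2}
    overlap = patterns1.keys() & patterns2.keys()
    # Assumption: no shared relation means nothing links the two mappings.
    return bool(overlap) and all(patterns1[r] == patterns2[r] for r in overlap)


paper1 = ("Paper", [("UW.paper", ("title", "venue", "yr", "pages")),
                    ("UBC.paper", ("title", "venue", "author", "URL"))])
paper2 = ("Paper", [("UBC.paper", ("title", "venue", "author", "URL")),
                    ("UT.paper", ("title", "author", "area"))])
mgr1 = ("Manager", [("A.Mgr", ("x", "y")), ("B.Mgr1", ("x", "y"))])
mgr2 = ("Manager", [("B.Mgr1", ("x", "x")), ("C.SelfMgr", ("x",))])

print(same_concept(paper1, paper2))
print(same_concept(mgr1, mgr2))
```

In the Manager example the shared relation B.Mgr1 is used as Mgr1(x, y) in one mapping and as Mgr1(x, x) in the other, which is what this simplified check picks up.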


Pottinger’s Schema Mediation Algorithm for DIS

Base of our approach

[Figure: Local Schema UW and Local Schema UBC, each over its local database, are related by Mapping UW_UBC. Mediated Schema M is related to them by Mapping M_UW and Mapping M_UBC.]


Peer Schema Mediation – How the system works

[Figure: peers X, B, and C; mediation starts at X. B and C also have mappings with other peers.]

What each peer holds:
    Peer X: X, MapX_B, MapX_C
    Peer B: B, MapX_B, MapB_C, MapB_D
    Peer C: MapX_C, MapB_C

Steps:
    t1: X creates Mt1 based on X, MapX_B, MapX_C
    t2: X sends Mt1 to B and to C
    t3: B checks and updates its local relation information in Mt1 based on B; C does the same based on C. Each then confirms Mt1 or sends its updates to X
    t4: X gets responses from B and C, and computes Mt4 containing X, B, C and MapX_B, MapX_C
    t5: X broadcasts Mt4 and the corresponding MappingTable
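The message flow t1 to t5 can be traced with a toy simulation. Everything below is an assumption made for illustration: schemas are reduced to sets of relation names, "checking" means comparing a draft entry with the peer's real schema, and the names `draft_mt1`, `neighbour_reply`, and `run_mediation` are hypothetical.

```python
def draft_mt1(initiator, neighbours, local_schemas, mapped_relations):
    """t1: draft Mt1 from the initiator's schema and from the neighbour
    relations that its mappings mention."""
    mt1 = {initiator: set(local_schemas[initiator])}
    for peer in neighbours:
        mt1[peer] = set(mapped_relations[(initiator, peer)])
    return mt1


def neighbour_reply(peer, mt1, local_schemas):
    """t3: confirm Mt1's entry for this peer, or return a corrected entry."""
    actual = set(local_schemas[peer])
    if mt1[peer] == actual:
        return ("confirm", None)
    return ("update", actual)


def run_mediation(initiator, neighbours, local_schemas, mapped_relations):
    mt1 = draft_mt1(initiator, neighbours, local_schemas, mapped_relations)
    # t2 and t3: Mt1 is sent to every neighbour, and each one replies.
    replies = {p: neighbour_reply(p, mt1, local_schemas) for p in neighbours}
    # t4: fold the replies into Mt4.
    mt4 = {peer: set(rels) for peer, rels in mt1.items()}
    for peer, (kind, entry) in replies.items():
        if kind == "update":
            mt4[peer] = entry
    # t5: broadcast, modelled here as every peer receiving Mt4.
    broadcast = {peer: mt4 for peer in [initiator, *neighbours]}
    return replies, mt4, broadcast


schemas = {"X": {"paper"}, "B": {"paper", "author"}, "C": {"venue"}}
mapped = {("X", "B"): {"paper"}, ("X", "C"): {"venue"}}
replies, mt4, broadcast = run_mediation("X", ["B", "C"], schemas, mapped)
print(replies)
print(mt4)
```

In this run B corrects its entry because X's mapping only mentioned one of B's relations, while C simply confirms.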


Schema Mediation Strategy

As explained in the previous slide

Merging two schemas is based on MappingTables


MappingTable creation

Purpose:
    Relate a relation in M for a concept with subgoals from mappings
    Transform unstructured mapping information into structured form
    Easy to reconstruct the original mappings from the MappingTables
    Indirect mapping information can easily be represented in a MappingTable; this is hard to do using mappings alone

Example: [example MappingTable not preserved]


Merge Two MappingTables

The MappingTable merging process follows these general principles:
    Related attributes should be positioned in the same column
    Unrelated attributes are in different columns
    Overlapping local relations in the two MappingTables are how we determine the indirect mapping information
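The principles above can be sketched on the Paper example from the earlier slides. The encoding is an assumption: a MappingTable is modelled as a list of columns, and each column is a dictionary from a local relation to the attribute it contributes. The function `merge_tables` is hypothetical and is not the MePSys implementation.

```python
def merge_tables(t1, t2):
    """Merge two MappingTables given as lists of columns.

    Two columns are fused when they share a cell, that is, the same
    attribute of the same local relation. Overlapping relations thereby
    carry indirect mapping information across the tables.
    """
    merged = []
    for col in t1 + t2:
        col = dict(col)
        changed = True
        while changed:
            changed = False
            for existing in list(merged):
                if any(existing.get(rel) == attr for rel, attr in col.items()):
                    col.update(existing)
                    merged.remove(existing)
                    changed = True
        merged.append(col)
    return merged


# M1 from Mapping 1 (UW to UBC), M2 from Mapping 2 (UBC to UT).
m1 = [
    {"UW.paper": "title", "UBC.paper": "title"},
    {"UW.paper": "venue", "UBC.paper": "venue"},
    {"UW.paper": "yr"},
    {"UW.paper": "pages"},
    {"UBC.paper": "author"},
    {"UBC.paper": "URL"},
]
m2 = [
    {"UBC.paper": "title", "UT.paper": "title"},
    {"UBC.paper": "author", "UT.paper": "author"},
    {"UBC.paper": "venue"},
    {"UBC.paper": "URL"},
    {"UT.paper": "area"},
]

m3 = merge_tables(m1, m2)
for column in m3:
    print(column)
```

UW.paper and UT.paper never appear in the same input table, yet their title attributes end up in one column because both are related to UBC.paper's title.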


Merge Two MappingTables (Cont.)

M3: result of merging M1 and M2


Compute GLAV Mappings for Each Local Peer


Query Reformulation

Reformulate queries in both directions:
    Q over E → Q’ over M
    Q’ over M → Q over E
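A toy version of the two directions, restricted to simple projections, might look as follows. This is not GLAV-based reformulation as used in the system; it only renames attributes through a hypothetical column map built from the conf-paper example, and the names `COLUMNS`, `to_mediated`, and `to_local` are assumptions.

```python
# Hypothetical column map: mediated attribute -> {peer relation: local attribute}.
COLUMNS = {
    "title": {"UW.conf-paper": "title", "UBC.conf-paper": "title"},
    "venue": {"UW.conf-paper": "venue", "UBC.conf-paper": "venue"},
    "year": {"UW.conf-paper": "year", "UBC.conf-paper": "year"},
    "pages": {"UW.conf-paper": "pages"},
    "URL": {"UBC.conf-paper": "URL"},
}


def to_mediated(relation, local_attrs):
    """E -> M: rewrite a projection over a local relation into mediated columns."""
    rewritten = []
    for attr in local_attrs:
        matches = [m for m, cells in COLUMNS.items() if cells.get(relation) == attr]
        if not matches:
            raise KeyError(f"{relation}.{attr} is not in the mediated schema")
        rewritten.append(matches[0])
    return rewritten


def to_local(mediated_attrs):
    """M -> E: for each peer relation that covers every requested column,
    list the local attributes that answer the projection."""
    relations = {rel for cells in COLUMNS.values() for rel in cells}
    plans = {}
    for rel in sorted(relations):
        if all(rel in COLUMNS[m] for m in mediated_attrs):
            plans[rel] = [COLUMNS[m][rel] for m in mediated_attrs]
    return plans


print(to_mediated("UW.conf-paper", ["title", "pages"]))
print(to_local(["title", "URL"]))
```

Under these assumptions a query for title and URL posed over M is routed to UBC's relation only, which is the kind of request a UW user could not express over the UW schema alone.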


Information that each peer maintains in the system set-up phase

Each peer E stores:
    E’s local database schema
    A list of mappings between E and its acquaintances
    The current version of the mediated schema M
    The MappingTable set corresponding to M
    GLAV mappings from M to E


Adding a Peer to the Network

Some peers build applications over M after the system setup phase
When a new peer joins, M will change; how to handle those already-built applications?
Keep the transformation information so that old applications remain usable

[Figure: (a) right after the system setup phase; (b) sometime later, D joins…]


Dropping a Peer from the Network

Strategy One: A peer’s leaving the network triggers a schema mediation process from the very beginning
    BAD: too much system work is spent on schema mediation alone
Strategy Two: Redo the schema mediation once every assigned period
    Two ways to know X is leaving:
        1. X notifies any other node before departure
        2. Another peer pings or communicates with X
    BAD: the previously created mediated schema becomes useless
Strategy Three:
    X leaves without notifying others
    X’s acquaintance Y will recognize X’s leaving
    Y computes the new mediated schema
    BAD:
        Y needs to be able to recognize which relations in the MappingTable come from X
        Peers can easily lose connection with others


Dropping a Peer from the Network (Cont.)

Strategy Four: X wants to leave:
    X calculates a new mediated schema
    X assigns its acquaintance another acquaintance from its acquaintance list
    “Removal” operator: given M and the peer X that is to be removed, compute the remaining part
    The removed part can be relations, or attributes in relations
    Good because:
        All previously constructed applications remain available
        All peers are still connected
        No redundant work results: mediation does not start from the beginning
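One way to picture the “Removal” operator is on a column-style table, where each column maps local relations to the attributes they contribute. This encoding and the function `remove_peer` are assumptions for illustration; the actual operator is not specified on the slide.

```python
def remove_peer(columns, peer):
    """Drop every cell contributed by the departing peer's relations,
    and drop any column that becomes empty as a result."""
    remaining = []
    for col in columns:
        kept = {rel: attr for rel, attr in col.items()
                if not rel.startswith(peer + ".")}
        if kept:
            remaining.append(kept)
    return remaining


# Hypothetical mediated table over peers UW, UBC, and UT.
m = [
    {"UW.paper": "title", "UBC.paper": "title", "UT.paper": "title"},
    {"UW.paper": "venue", "UBC.paper": "venue"},
    {"UW.paper": "yr"},
    {"UW.paper": "pages"},
    {"UBC.paper": "author", "UT.paper": "author"},
    {"UBC.paper": "URL"},
    {"UT.paper": "area"},
]

after = remove_peer(m, "UT")
for column in after:
    print(column)
```

Columns that UT shared with other peers survive with UT's cell removed, while the column that only UT contributed disappears, so the rest of the table does not have to be rebuilt.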


Information that each peer maintains in the system steady state

Each peer stores the following information:
    Local schema
    Mappings to its acquaintances
    Current mediated schema, MappingTables, and mappings to its own schema
    Previous versions of the mediated schema on which the local peer has built applications, and mappings from them to the new mediated schema
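The list above suggests a per-peer record along the following lines. This is a sketch under assumptions: the class and field names are hypothetical, and the contents of schemas and mappings are left as plain dictionaries and lists.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MediatedVersion:
    version: int
    schema: dict             # mediated relation -> list of attributes
    mapping_tables: list
    mappings_to_local: list  # mappings from this version to the peer's own schema


@dataclass
class PeerState:
    local_schema: dict
    acquaintance_mappings: dict = field(default_factory=dict)
    current: Optional[MediatedVersion] = None
    # Old versions that local applications still use:
    # version number -> (old version, mapping from it to the newer version)
    retained: dict = field(default_factory=dict)

    def install(self, new_version, mapping_old_to_new=None, has_applications=False):
        """Switch to a new mediated schema, keeping the old one only if
        local applications were built on it."""
        if self.current is not None and has_applications:
            self.retained[self.current.version] = (self.current, mapping_old_to_new)
        self.current = new_version


state = PeerState(local_schema={"paper": ["title", "venue", "yr", "pages"]})
state.install(MediatedVersion(1, {"Paper": ["title", "venue"]}, [], []))
state.install(
    MediatedVersion(2, {"Paper": ["title", "venue", "author"]}, [], []),
    mapping_old_to_new={"Paper": "Paper"},
    has_applications=True,
)
print(state.current.version, sorted(state.retained))
```

This sketch stores each old-to-new mapping only once; keeping retained mappings pointed at the latest version after several updates would need additional logic that the slide does not describe.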


A Study of Mapping Composition

MePSys only considers input mappings that are:
    Mappings with the same concept
    Free of such complicating factors as self-joins and self-restrictive components
Our approach transforms the problem of mapping composition into another: using the mediated schema to relate different schemas


Some facts

[Madhavan and Halevy] The number of composed mappings does not depend on the number of the input mappings

[Madhavan and Halevy] The composition of finite mappings may result in an infinite set of composed mappings

[Fagin et al.] The composition of two mappings in first-order logic might not be expressible in first-order logic


Analysis for the Study

We compared Piazza, the SO logic algorithm, and MePSys

Whether the Piazza method is expressive or not depends entirely on whether existential attributes in the second schema are mapped to the third schema

The second-order logic mapping composition algorithm can handle cases with composed non-identical self-join components
    However, the results are hard to understand

MePSys does not handle patterns with self-restrictive components
    Mappings in such patterns do not support concepts

MePSys has yet to realize the mediation of schemas if mappings contain composed non-identical self-join components

Aside from these two special groups of patterns, using the mediated schema to relate different sources is decidable.


System Settings

FreePastry
    A P2P network layer using an efficient routing strategy
    Each node maintains a routing table
    Keeps track of its immediate neighbors
    Provides the functionality of notifying applications of message arrival, node failures, etc.
Emulab
    Network emulation testbed
    Access to different machines to emulate nodes in a real network
    900M memory with 2992.787 MHz processor
Input schemas and mappings
    Input schema follows the TPC-H standard
    Avg num of acquaintances per peer
    Avg num of relations per peer schema
    Avg num of attributes in a relation


Experiment 1: Schema Mediation in MePSys


Experiment 2: Query Reformulation

For queries of similar size (less than 1k), the reformulation time is predictable


Experiment 2: Query Reformulation (Cont.)

In the maximum case, reformulating a query 10 times takes only 2% of the total time


Experiment 3: Updating the Mediated Schema

Computing a new mediated schema always takes less than 2% of the total time

Updating takes almost no time


Our contributions

MePSys, in which a mediated schema is created dynamically and any information in the network can be queried without additional global services
An efficient algorithm, PSM, to create a mediated schema in PDMS and further create mappings to local sources
The idea of automatically detecting specific concepts in mappings
A study of how mapping composition impacts query reformulation with existing approaches
A solution to the problem of updating the mediated schema
Experiments on the efficiency and scalability of MePSys


Future Work

Explore the semantic issues that arise when a broader range of mappings is considered, e.g., mappings with self-joins, mappings with different IDB names, etc.

Consider further optimization issues in the future system

Design a better approach to updating the mediated schema for local schema evolution


Acknowledgement


Thank you!

Questions?