1
Schema Mediation and Query Processing in Peer Data Management Systems
Presenter: Jie Zhao
Supervisor: Rachel Pottinger
Sept. 29, 2006
2
Preliminaries
Datalog
Q(x) :- Airport(x, Vancouver)
head: Q(x); body: Airport(x, Vancouver)
Airport:
Code City
SEA Seattle
YVR Vancouver
Mapping for heterogeneous schemas
Correspondences between two schemas
A medium for exchanging data, transferring queries, etc.
PDMS (Peer Data Management System)
Each peer has a database
Peers can leave or join the network voluntarily
Mappings between some peers are provided
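As a minimal sketch of what the Datalog query on this slide computes, the Airport table can be modeled as a list of (Code, City) tuples and the query as a filter. The names `airport` and `q` are illustrative only and do not come from the talk.

```python
# Airport table from the slide, modeled as (Code, City) tuples.
airport = [("SEA", "Seattle"), ("YVR", "Vancouver")]

def q(airport_rows):
    """Q(x) :- Airport(x, Vancouver): every code x whose city is Vancouver."""
    return [code for code, city in airport_rows if city == "Vancouver"]

answers = q(airport)
print(answers)
```

The head variable x is the only value returned; the constant Vancouver in the body acts as a selection condition.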
3
A general query answering case in PDMS
[Figure: three peers, UBC, UW, and UT, each with a local schema over a local database. Mapping UBC_UW connects UBC and UW; Mapping UW_UT connects UW and UT.]
4
A general query answering case in PDMS
[Figure: the same three peers, UBC, UW, and UT, connected by Mapping UBC_UW and Mapping UW_UT. A user poses query Q over UBC (QUBC). Query reformulation produces query Q’ over UW (QUW) and query Q” over UT (QUT). The reformulated results are returned along the same path.]
5
Previous methods can only pose queries over the local schema
[Figure: peers UW and UBC, each with a local schema over a local database, connected by Mapping UW_UBC. The user is at UW.]
Assume relation at UW: conf-paper(title, venue, year, pages)
Assume relation at UBC: conf-paper(title, venue, year, URL)
Query that a UW user can ask:
q(x) :- conf-paper(t, v, y, x).
The UW user can never ask for the URL!
6
What we’d like to improve…
Access more information, e.g. the URL
Get rid of the restrictive query format, e.g. local schema only
Improve the comprehensibility of the PDMS
Reconsider the difficulties and complexity raised by mapping composition
Make good use of indirect mapping information
We have a method for mediated schema creation in PDMS that solves all of these
7
Challenges
How to create the mediated schema without a centralized authority?
How to arrive at the same mediated schema regardless of where mediation starts?
How can an automatically created mediated schema be comprehensible to users?
How can human intervention be minimized?
Where to store the mediated schema, and how to update it?
8
Related Work
Bernstein et al.: a vision for bringing database research into the P2P scenario
Piazza project: provides a complete prototype for query answering in PDMS
Fagin et al.: use second-order (SO) logic as the mapping language
HePToX: XQuery reformulation
Hyperion: uses both data-level and schema-level mappings to specify the correspondences between acquainted peers
PeerDB: uses keywords as the basis for relation matching
9
Outline
Semantics in Conjunctive Mappings
Peer Schema Mediation
Updating the mediated schema
A study of Mapping composition
Experimental Study
10
Introducing concept into conjunctive mappings
A conjunctive mapping has the following form:
conf-paper(title,venue,yr) :- UW.conf-paper(title,venue,yr,pages)
conf-paper(title,venue,yr) :- UBC.conf-paper(title,venue,yr,URL)
IDB name: “conf-paper”
Component: each Datalog query above is a component
Subgoal: each relation in the body, e.g. “UW.conf-paper(title,venue,yr,pages)”
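The terminology on this slide (IDB name, component, subgoal) can be sketched as a small data structure. The class names `Atom`, `Component`, and `ConjunctiveMapping` are hypothetical, and the check that all components share one IDB name is an assumption drawn from the example, not a stated rule of the system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    relation: str   # e.g. "UW.conf-paper"
    args: tuple     # variable names, e.g. ("title", "venue", "yr", "pages")

@dataclass(frozen=True)
class Component:
    head: Atom      # left of ":-"
    body: tuple     # tuple of Atom; each one is a subgoal

@dataclass(frozen=True)
class ConjunctiveMapping:
    components: tuple

    def idb_name(self):
        # Assumption: every component of one mapping uses the same head relation.
        names = {c.head.relation for c in self.components}
        if len(names) != 1:
            raise ValueError("components use different IDB names")
        return next(iter(names))

    def subgoals(self):
        return [atom for c in self.components for atom in c.body]

head = Atom("conf-paper", ("title", "venue", "yr"))
mapping = ConjunctiveMapping((
    Component(head, (Atom("UW.conf-paper", ("title", "venue", "yr", "pages")),)),
    Component(head, (Atom("UBC.conf-paper", ("title", "venue", "yr", "URL")),)),
))
print(mapping.idb_name(), [s.relation for s in mapping.subgoals()])
```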
11
Introducing concept into conjunctive mappings (Cont.)
Intuitively, a concept describes the common object across different schemas
Informally, two mappings CM1 and CM2 have the same concept if:
CM1 and CM2 have the same IDB names
Q1 and Q2, the queries constructed from the overlapping subgoals of CM1 and CM2, are equivalent
Subgoals should be compatible
12
Introducing concept into conjunctive mappings (Cont.)
Mappings that express the same concept:
Mapping 1, from UW to UBC:
Paper(title,venue) :- UW.paper(title,venue,yr,pages)
Paper(title,venue) :- UBC.paper(title,venue,author,URL)
Mapping 2, from UBC to UT:
Paper(title,author) :- UBC.paper(title,venue,author,URL)
Paper(title,author) :- UT.paper(title,author,area)
Mappings that do not express the same concept:
Mapping 1, from A to B:
Manager(x, y) :- A.Mgr(x, y)
Manager(x, y) :- B.Mgr1(x, y)
Mapping 2, from B to C:
Manager(x) :- B.Mgr1(x, x)
Manager(x) :- C.SelfMgr(x)
Mapping compatibility is checked before merging
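The two examples above can be told apart with a deliberately simplified check. The sketch below is not the talk's definition of a concept: it replaces the query-equivalence test with a much weaker stand-in that only compares the repeated-variable pattern of the subgoals two mappings share. The dictionary layout and the function names are assumptions made for illustration.

```python
def equality_pattern(args):
    """Canonical pattern of repeated variables: (x, y) -> (0, 1), (x, x) -> (0, 0)."""
    first_seen = {}
    return tuple(first_seen.setdefault(a, len(first_seen)) for a in args)

def same_concept(cm1, cm2):
    """Simplified stand-in: same IDB name, at least one shared subgoal relation,
    and each shared relation is used with the same repeated-variable pattern."""
    if cm1["idb"] != cm2["idb"]:
        return False
    shared = cm1["subgoals"].keys() & cm2["subgoals"].keys()
    if not shared:
        return False
    return all(
        equality_pattern(cm1["subgoals"][r]) == equality_pattern(cm2["subgoals"][r])
        for r in shared
    )

paper_uw_ubc = {"idb": "Paper", "subgoals": {
    "UW.paper": ("title", "venue", "yr", "pages"),
    "UBC.paper": ("title", "venue", "author", "URL")}}
paper_ubc_ut = {"idb": "Paper", "subgoals": {
    "UBC.paper": ("title", "venue", "author", "URL"),
    "UT.paper": ("title", "author", "area")}}
mgr_a_b = {"idb": "Manager", "subgoals": {"A.Mgr": ("x", "y"), "B.Mgr1": ("x", "y")}}
mgr_b_c = {"idb": "Manager", "subgoals": {"B.Mgr1": ("x", "x"), "C.SelfMgr": ("x",)}}

print(same_concept(paper_uw_ubc, paper_ubc_ut), same_concept(mgr_a_b, mgr_b_c))
```

In the Manager pair, B.Mgr1(x, y) and B.Mgr1(x, x) use the shared relation differently, which is what the simplified check picks up.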
13
Outline
Semantics in Conjunctive Mappings
Peer Schema Mediation
Updating the mediated schema
A study of Mapping composition
Experimental Study
14
Pottinger’s Schema Mediation Algorithm for DIS
The basis of our approach
[Figure: local schemas UW and UBC, each over its local database, are related by Mapping UW_UBC. The algorithm produces Mediated Schema M together with Mapping M_UW and Mapping M_UBC.]
15
Peer Schema Mediation – How the system works
[Figure: peers X, B, and C; mediation starts at X. B and C also have mappings with other peers.]
What each peer holds at the start:
Peer X: X, MapX_B, MapX_C
Peer B: B, MapX_B, MapB_C, MapB_D
Peer C: MapX_C, MapB_C
t1: X creates Mt1 based on X, MapX_B, MapX_C
t2: X sends Mt1 to B and to C
t3: B checks and updates its local relation information in Mt1 based on B; C does the same based on C
B and C each confirm Mt1 or send the updated Mt1 back to X
t4: X gets the responses from B and C, then computes Mt4 containing X, B, C and MapX_B, MapX_C
t5: X broadcasts Mt4 and the corresponding MappingTable
16
Schema Mediation Strategy
As explained in the previous slide
Merging two schemas is based on MappingTables
17
MappingTable creation
Purpose:
Relate a relation in M for a concept to the subgoals from the mappings
Transform unstructured mapping information into a structured form
Make it easy to reconstruct the original mappings from the MappingTables
Represent indirect mapping information easily in a MappingTable; this is hard to do using mappings alone
Example: [MappingTable figure not preserved]
18
Merge Two MappingTables
The MappingTable merging process follows these general principles:
Related attributes should be positioned in the same column
Unrelated attributes go in different columns
Overlapping local relations in the two MappingTables determine the indirect mapping information
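The merging principles can be sketched under a strong simplifying assumption: a MappingTable is modeled here as a list of columns, each column being a dictionary from a local relation to the attribute it contributes. This layout and the function `merge_mapping_tables` are hypothetical; the actual MappingTable structure from the talk is not preserved in this transcript. The example data reuses the Paper mappings between UW, UBC, and UT.

```python
def merge_mapping_tables(m1, m2):
    """Merge m2 into a copy of m1. A column of m2 joins an existing column when
    they share a (relation, attribute) entry; otherwise it becomes a new column."""
    merged = [dict(col) for col in m1]
    for col in m2:
        target = next(
            (c for c in merged if any(c.get(rel) == attr for rel, attr in col.items())),
            None,
        )
        if target is None:
            merged.append(dict(col))   # unrelated attribute: its own column
        else:
            target.update(col)         # related attributes: same column
    return merged

# M1 relates UW.paper and UBC.paper; M2 relates UBC.paper and UT.paper.
m1 = [
    {"UW.paper": "title", "UBC.paper": "title"},
    {"UW.paper": "venue", "UBC.paper": "venue"},
    {"UW.paper": "yr"},
    {"UW.paper": "pages"},
    {"UBC.paper": "author"},
    {"UBC.paper": "URL"},
]
m2 = [
    {"UBC.paper": "title", "UT.paper": "title"},
    {"UBC.paper": "venue"},
    {"UBC.paper": "author", "UT.paper": "author"},
    {"UBC.paper": "URL"},
    {"UT.paper": "area"},
]
m3 = merge_mapping_tables(m1, m2)
for column in m3:
    print(column)
```

The overlapping relation UBC.paper is what links the two tables: after merging, the title column holds UW.paper, UBC.paper, and UT.paper together, which is the indirect UW to UT correspondence that neither input table states on its own.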
19
Merge Two MappingTables (Cont.)
M3: result of merging M1 and M2
20
Compute GLAV Mappings for Each Local Peer
21
22
Query Reformulation
Reformulate queries in both directions:
Q over E → Q’ over M
Q’ over M → Q over E
23
Information that each peer maintains in the system set-up phase
Each peer E stores:
E’s local database schema
A list of mappings between E and its acquaintances
The current version of the mediated schema M
The MappingTable set corresponding to M
GLAV mappings from M to E
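The per-peer state listed above can be sketched as a record type. The class `PeerState`, its field names, and the choice of plain dictionaries, lists, and strings as containers are all illustrative assumptions; the talk does not specify a storage format.

```python
from dataclasses import dataclass, field

@dataclass
class PeerState:
    # Local database schema: relation name -> attribute names.
    local_schema: dict
    # Mappings between this peer and each acquaintance, keyed by acquaintance name.
    acquaintance_mappings: dict = field(default_factory=dict)
    # Current version of the mediated schema M (empty until mediation has run).
    mediated_schema: dict = field(default_factory=dict)
    # MappingTable set corresponding to M.
    mapping_tables: list = field(default_factory=list)
    # GLAV mappings from M to this peer's schema.
    glav_mappings: list = field(default_factory=list)

uw = PeerState(
    local_schema={"conf-paper": ["title", "venue", "year", "pages"]},
    acquaintance_mappings={
        "UBC": [
            "conf-paper(title,venue,yr) :- UW.conf-paper(title,venue,yr,pages)",
            "conf-paper(title,venue,yr) :- UBC.conf-paper(title,venue,yr,URL)",
        ]
    },
)
print(uw.local_schema, list(uw.acquaintance_mappings))
```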
24
Outline
Semantics in Conjunctive Mappings
Peer Schema Mediation
Updating the mediated schema
A study of Mapping composition
Experimental Study
25
Adding a Peer to the Network
Some peers build applications over M after the system setup phase
When a new peer joins, M changes; how do we handle the applications already built?
Keep the transformation information so that old applications remain usable
(a) Right after the system setup phase
(b) Sometime later, D joins…
26
Dropping a Peer from the Network
Strategy One: a peer’s leaving the network triggers a schema mediation process from the very beginning
BAD: too much system work is spent on schema mediation alone
Strategy Two: redo the schema mediation once every assigned period
Two ways to know X is leaving:
1. X notifies another node before departure
2. Another peer pings or communicates with X
BAD: the previously created mediated schema becomes useless
Strategy Three:
X leaves without notifying others
X’s acquaintance Y recognizes that X has left
Y computes the new mediated schema
BAD:
Y needs to be able to recognize which relations in the MappingTable come from X
Peers can easily lose connection with others
27
Dropping a Peer from the Network (Cont.)
Strategy Four: X wants to leave:
X calculates a new mediated schema
X assigns its acquaintance another acquaintance from its acquaintance list
“Removal” operator: given M and the peer X that is to be removed, compute the remaining part
The removed part can be relations or attributes in relations
Good because:
All previously constructed applications remain available
All peers are still connected
No redundant work results: mediation does not start from the beginning
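A minimal sketch of the “Removal” idea, assuming a MappingTable is modeled as a list of columns where each column maps a local relation (prefixed with its peer name) to an attribute. That layout and the function `remove_peer` are hypothetical and only illustrate computing the remaining part without restarting mediation.

```python
def remove_peer(mapping_table, peer):
    """Drop every entry contributed by `peer`; columns left empty disappear."""
    prefix = peer + "."
    remaining = []
    for column in mapping_table:
        kept = {rel: attr for rel, attr in column.items() if not rel.startswith(prefix)}
        if kept:                      # an attribute only the leaving peer had is removed
            remaining.append(kept)
    return remaining

table = [
    {"UW.paper": "title", "UBC.paper": "title", "UT.paper": "title"},
    {"UW.paper": "venue", "UBC.paper": "venue"},
    {"UBC.paper": "author", "UT.paper": "author"},
    {"UT.paper": "area"},
]
after_ut_leaves = remove_peer(table, "UT")
for column in after_ut_leaves:
    print(column)
```

Columns shared with other peers survive with the leaving peer's entry stripped, while the area column, which only UT contributed, is removed entirely.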
28
Information that each peer maintains in the system-steady state
Each peer stores the following information:
Local schema
Mappings to its acquaintances
Current mediated schema, MappingTables, and mappings to its own schema
Previous versions of the mediated schema on which the local peer has built applications, and mappings to the new mediated schema
29
Outline
Semantics in Conjunctive Mappings
Peer Schema Mediation
Updating the mediated schema
A study of Mapping composition
Experimental Study
30
A study of Mapping Composition
MePSys restricts its input mappings:
Mappings must have the same Concept
Complicating factors such as self-join and self-restrictive components are ignored
Our approach transfers the problem of mapping composition into another: using the mediated schema to relate different schemas
31
Some facts
[Madhavan and Halevy] The number of composed mappings does not depend on the number of input mappings
[Madhavan and Halevy] The composition of finite mappings may result in an infinite set of composed mappings
[Fagin et al.] The composition of two mappings in first-order logic might not be expressible in first-order logic
32
Analysis for the Study
We compared Piazza, the SO logic algorithm, and MePSys
Whether the Piazza method is expressive or not depends entirely on whether existential attributes in the second schema are mapped to the third schema
The second-order logic mapping composition algorithm can handle cases with composed non-identical self-join components
However, the results are hard to understand
MePSys does not handle patterns with self-restrictive components
Mappings in such patterns do not support concepts
MePSys has yet to realize the mediation of schemas if mappings contain composed non-identical self-join components
Aside from these two special groups of patterns, using the mediated schema to relate different sources is decidable.
33
Outline
Semantics in Conjunctive Mappings
Peer Schema Mediation
Updating the mediated schema
A study of Mapping composition
Experimental Study
34
System Settings
FreePastry
A P2P network layer that uses an efficient routing strategy
Each node maintains a routing table
Each node keeps track of its immediate neighbors
Provides the functionality of notifying applications of message arrival, node failures, etc.
Emulab
Network emulation testbed
Access to different machines to emulate nodes in a real network
900 MB of memory with a 2992.787 MHz processor
Input schemas and mappings
Input schema follows the TPC-H standard
Avg num of acquaintances per peer
Avg num of relations per peer schema
Avg num of attributes in a relation
35
Experiment 1: Schema Mediation in MePSys
36
Experiment 2: Query Reformulation
For queries of similar size (less than 1k), the reformulation time is predictable
37
Experiment 2: Query Reformulation (Cont.)
In the maximum case, reformulating a query 10 times takes only 2% of the total time
38
Experiment 3: Updating the Mediated Schema
Computing a new mediated schema always takes less than 2% of the total time
Updating takes almost no time
39
Our contributions
MePSys, in which a mediated schema is created dynamically and any information in the network can be queried without additional global services
An efficient algorithm, PSM, that creates a mediated schema in a PDMS and then creates mappings to the local sources
The idea of automatically detecting specific Concepts in mappings
A study of how mapping composition impacts query reformulation in existing approaches
A solution to the problem of updating the mediated schema
Experiments on the efficiency and scalability of MePSys
40
Future Work
Explore the semantic issues that arise when a broader range of mappings is considered, e.g., mappings with self-join or mappings with different IDB names
Consider more optimization issues in the future system
Design a better approach to updating the mediated schema when local schemas evolve
41
Acknowledgement
42
Thank you!
Questions?