1/27
Replication and Query Processing in the APPA Data Management System
Reza AKBARINIA, Vidal MARTINS, Esther PACITTI, Patrick VALDURIEZ
2/27
Motivation
Advanced applications
  They must deal with semantically rich data
  They use a high-level SQL-like query language
  Applications
    Epidemiological study
    Astronomical data sharing
Little work on managing data replication in the presence of updates
  Gnutella and KaZaA: static files (no updates)
  Freenet: update propagation downward to closely connected peers
  ActiveXML: on demand (web services)
  P-Grid: rumor spreading (probabilistic guarantees for consistency)
3/27
Motivation
Replication in distributed systems
  Synchronous replication (ROWA)
  Asynchronous replication
    Preventive replication
    Optimistic replication
    Rumor spreading
We propose a new P2P system to address
  Data replication in the context of advanced applications
  Query processing in the presence of advanced replication capabilities
4/27
Outline
Motivation
APPA Architecture
Data Replication
Query Processing
Validation
Conclusion
5/27
APPA Architecture
APPA
  Advanced Services: Replication, Caching, Query Processing, ...
  Basic Services: Consensus, P2P Data Management, …
  P2P Network: Key-based Storage and Retrieval, Peer Linking, Peer ID Assignment, Peer Communication
Internet
7/27
Data Replication
Replication Model
  Assumptions
    Frequent and unpredictable network changes
    Small world
  Based on a lazy multi-master scheme
  Log-based reconciliation to solve replica divergence
Schema management
  r-lsd: local schema description of relation r
  r-csd: common schema description of relation r
  Each peer defines mapping functions between r-lsd and r-csd
Data storage
  Each peer stores tuples using r-lsd and r-csd schemas
  Updates in one schema are mapped to the other
  Multi-master groups
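The mapping between r-lsd and r-csd and the mirroring of updates can be illustrated with a minimal sketch. The relation, the attribute names, and the PeerStore class are hypothetical and only serve the illustration; the slides do not say how APPA represents mapping functions.

```python
# Hypothetical attribute mapping for a relation "patient":
# local names (r-lsd) on the left, common names (r-csd) on the right.
LSD_TO_CSD = {"nom": "name", "ville": "city", "age": "age"}
CSD_TO_LSD = {csd: lsd for lsd, csd in LSD_TO_CSD.items()}


def lsd_to_csd(tuple_lsd):
    """Map a (partial) tuple from the local schema to the common schema."""
    return {LSD_TO_CSD[attr]: value for attr, value in tuple_lsd.items()}


def csd_to_lsd(tuple_csd):
    """Map a (partial) tuple from the common schema to the local schema."""
    return {CSD_TO_LSD[attr]: value for attr, value in tuple_csd.items()}


class PeerStore:
    """Keeps every tuple in both schemas, keyed by a tuple identifier."""

    def __init__(self):
        self.lsd = {}
        self.csd = {}

    def update_lsd(self, key, changes):
        """Apply an update written against r-lsd and mirror it into r-csd."""
        self.lsd.setdefault(key, {}).update(changes)
        self.csd.setdefault(key, {}).update(lsd_to_csd(changes))

    def update_csd(self, key, changes):
        """Apply an update written against r-csd and mirror it into r-lsd."""
        self.csd.setdefault(key, {}).update(changes)
        self.lsd.setdefault(key, {}).update(csd_to_lsd(changes))


store = PeerStore()
store.update_lsd(1, {"nom": "Ana", "ville": "Nantes"})
store.update_csd(1, {"age": 41})
print(store.lsd[1])
print(store.csd[1])
```

Whichever schema an update is written against, both representations of the tuple end up with the same values.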
8/27
Data Replication
Reconciliation Properties
  Eventual consistency: when all clients stop submitting update actions, all replicas eventually reach the same values
  Mergeability: it is possible to schedule any arbitrary collection of log operations while respecting constraints
  Eventual decision: a decision is taken for each submitted action
  Eventual propagation: actions and constraints known at peer “p” at time “t” are eventually known by any other peer of the group
  Safe decisions: peers may not make conflicting decisions
9/27
Data Replication
Reconciliation Solutions
  IceCube (Microsoft Research)
    Centralized conflict detection and resolution
    Resolution based on application semantics
    Non-deterministic resolution
  APPA
    Distributed conflict detection and resolution
    Resolution based on application semantics
    Deterministic resolution – enables parallelism
    Considers dynamic connections and disconnections
10/27
Data Replication
APPA Distributed Reconciliation
  Foundation
    Use a common action log (P2P data)
    All tentative actions are stored in the action log
    Actions in the log are grouped by time interval (log unit)
    The (deterministic) resolution is made on demand, covers one log unit, and produces a schedule
    The schedules are available (P2P data) to all peers
  Parallelism and distribution - scalability
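Grouping actions into log units by time interval can be sketched as follows. The Action fields, the interval length, and the sort used to order each unit are assumptions made for this illustration. The sort only shows why a fixed rule lets any peer derive the same unit content; it does not stand for APPA's semantic reconciliation.

```python
from collections import defaultdict
from typing import NamedTuple


class Action(NamedTuple):
    timestamp: float  # submission time in seconds (hypothetical field)
    peer: str         # identifier of the submitting peer (hypothetical field)
    op: str           # description of the tentative update (hypothetical field)


def log_unit_id(timestamp, interval):
    """Number of the time interval (log unit) that a timestamp falls into."""
    return int(timestamp // interval)


def group_into_log_units(actions, interval):
    """Group tentative actions by log unit, ordering each unit by a fixed rule."""
    units = defaultdict(list)
    for action in actions:
        units[log_unit_id(action.timestamp, interval)].append(action)
    # Sorting makes the result independent of the order in which actions arrived.
    return {uid: sorted(acts) for uid, acts in sorted(units.items())}


arrived_at_peer_a = [
    Action(12.0, "p2", "update x"),
    Action(3.5, "p1", "insert y"),
    Action(14.2, "p1", "delete z"),
]
arrived_at_peer_b = list(reversed(arrived_at_peer_a))

units_a = group_into_log_units(arrived_at_peer_a, interval=10)
units_b = group_into_log_units(arrived_at_peer_b, interval=10)
print(units_a == units_b)
```

Because both peers obtain identical log units, either of them could reconcile a given unit and the other could reuse the resulting schedule.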
11/27
Data Replication
Distributed Reconciliation
12/27
Data Replication
Distributed Reconciliation
  Log units ensure a single view over unordered actions
  The log unit life cycle must be managed
  The decision factor eliminates non-determinism
  Several peers can reconcile the same log unit concurrently
  A peer can reuse the reconciliation made by another one
  A peer can finish the reconciliation started by another one
  Reconciliation properties are ensured
  Multi-master replication in a P2P environment is achieved
13/27
Data Replication
Service Architecture
Application
APPA
  Replication Service: Log Unit Manager, Reconciler
  Basic Services: P2P Data Manager, Consensus
  P2P Network: KSR, Peer Communication
Local Persistent Data, Local Log
Internet
15/27
Query Processing
Problem definition
  Consider that
    Each peer has a local schema to describe its data
    Peers agree on a Common Schema Description (CSD)
    Each peer maps its local schema to the CSD
  Given a user query on a peer schema, the problem is
    To find the minimum set of peers that should answer the query
    To execute the query at these peers and return a list of (ranked) answers to the user
  Assumption
    A query answer includes data from several multi-master groups (all those that store relevant data)
16/27
Query Processing
Proposed Solution
17/27
Query Processing
Proposed Solution
  Query reformulation
    p:r(A,B,D) → csd:r1(A,B,C), csd:r2(C,D,E)
    select A,D from r where B=b → select A,D from r1,r2 where B=b and r1.C=r2.C
  Query matching
    P: set of peers in the P2P system
    ps(p,r): peer schema of peer “p” involves relation “r”
    Problem: to find $P' \subseteq P$ where each p in P’ has relevant data
    Result: $P' = \{\, p \mid p \in P \wedge \exists r \in R : ps(p,r) \,\}$, where R is presumably the set of relations referenced by the query
18/27
Query Processing
Proposed Solution
[Figure: query matching. The peers in P hold replicas such as r1, s1, r2, s2, r3, s3, t1, u1, t2, u2 and v; for Q = join (r,s,v), query matching selects the relevant peers P’. Suffixes: 1 – European data, 2 – American data, 3 – African data]
19/27
Query Processing
Proposed Solution
  Query optimization
    Consider P’ a set of relevant peers
    Goal: obtain $P'' \subseteq P'$ such that
      For any two peers in P’’, their relevant data are not replicated
      The relevant data of peers in P’’ are equal to those of peers in P’
      The cost of query execution by peers in P’’ is minimum
    Cost function
      A function of communication, computing power, etc.
    Phases of optimization
      Determining relevant replicas for Q’s relations and their peers
      Determining best peer per replica
20/27
Query Processing
Proposed Solution
[Figure: query optimization. For Q = join (r,s,v), the relevant peers P’ are reduced to P’’, whose peers hold r3,s3; r1,s1; r2; s2; v. Suffixes: 1 – European data, 2 – American data, 3 – African data]
21/27
Query Processing
Proposed Solution
  Algorithms
    Cost parameters
      tcom(r,p): time to send the results of Q concerning replica r from a peer p to the query originator
      tresp(r,p): time that p needs to execute the part of Q concerning replica r and start sending the results to the query originator
      tdjoin(S): time to join the set of replicas S in a distributed way
    Example
      Replicas r, s; peers p1, p2, p3
      tcom(r,p1) + tresp(r,p1) = 4
      tcom(r,p2) + tresp(r,p2) = 6
      tcom(s,p2) + tresp(s,p2) = 7
      tcom(s,p3) + tresp(s,p3) = 5
      tdjoin({r,s}) = 6
      Total Cost = 4 + 5 + 6 = 15 (r from p1, s from p3, plus the distributed join)
      Total Cost = 6 + 7 = 13 (r and s both from p2)
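The two totals in the example can be reproduced by enumerating every way of choosing one peer per replica. This is a minimal sketch that assumes the total cost is the sum of the per-replica labels, and that tdjoin is charged only when the chosen replicas sit on different peers, which is what the two totals on the slide suggest. The names LABEL, TDJOIN, total_cost and best_assignment are illustrative.

```python
from itertools import product

# Labels from the example: tcom(replica, peer) + tresp(replica, peer).
LABEL = {("r", "p1"): 4, ("r", "p2"): 6, ("s", "p2"): 7, ("s", "p3"): 5}
TDJOIN = 6  # tdjoin({r, s}) from the example


def total_cost(assignment):
    """Cost of answering Q when each replica is read from the assigned peer."""
    cost = sum(LABEL[(replica, peer)] for replica, peer in assignment.items())
    # Assumption: the distributed join is needed only across different peers.
    if len(set(assignment.values())) > 1:
        cost += TDJOIN
    return cost


def best_assignment():
    """Enumerate all assignments and return the cheapest one."""
    replicas = sorted({replica for replica, _ in LABEL})
    holders = {
        replica: sorted(peer for rep, peer in LABEL if rep == replica)
        for replica in replicas
    }
    candidates = [
        dict(zip(replicas, peers))
        for peers in product(*(holders[replica] for replica in replicas))
    ]
    return min(candidates, key=total_cost)


print(total_cost({"r": "p1", "s": "p3"}))
print(total_cost({"r": "p2", "s": "p2"}))
print(best_assignment())
```

Exhaustive enumeration is only practical for tiny inputs like this one, which is why the slides go on to branch and bound and a heuristic.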
22/27
Query Processing
Proposed Solution
A non-linear programming model
  Minimize
  Complexity
23/27
Query Processing
Proposed Solution
  Algorithms
    Branch and bound
      Optimal selection of peers
      Complexity (worst case): O( )
    A heuristic solution
      While there is an edge in the graph
        Select the edge with minimum label
        Set the peer p as selected peer for the replica r
        Update the edge labels of other peers that hold the replica r
        Remove the replica r and its edges from the graph
      Complexity: $O((ma)^2)$
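The heuristic loop can be sketched as follows, assuming the graph is given as a dictionary from (replica, peer) edges to labels. The slides do not say how labels are updated, so the update rule is left to the caller as a hypothetical update_label function; the version used below, which adds the join time of the earlier example, is only an assumption for illustration.

```python
def heuristic_select(edges, update_label):
    """Greedy peer selection following the steps of the heuristic.

    edges: dict mapping (replica, peer) to the edge label.
    update_label: caller-supplied rule (an assumption of this sketch) applied
    to the remaining edges of the other peers that hold the selected replica.
    """
    remaining = dict(edges)
    selected = {}
    while remaining:
        # Select the edge with minimum label (ties broken by edge name).
        (replica, peer), _ = min(remaining.items(), key=lambda kv: (kv[1], kv[0]))
        # Set the peer as selected peer for the replica.
        selected[replica] = peer
        # Update the labels of edges of the other peers that hold the replica.
        others = {q for (rep, q) in remaining if rep == replica and q != peer}
        for (rep, q) in list(remaining):
            if q in others and rep != replica:
                remaining[(rep, q)] = update_label(remaining[(rep, q)])
        # Remove the replica and its edges from the graph.
        for edge in [e for e in remaining if e[0] == replica]:
            del remaining[edge]
    return selected


edges = {("r", "p1"): 4, ("r", "p2"): 6, ("s", "p2"): 7, ("s", "p3"): 5}
print(heuristic_select(edges, update_label=lambda label: label + 6))
```

On these labels the greedy choice picks p1 for r and p3 for s, the assignment the earlier example prices at 15. The optimum there is 13, so the heuristic gives up optimality for lower complexity.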
25/27
Implementation
JXTA
  JXTA Applications: JXTA Community Applications, Sun JXTA Applications, JXTA Shell, Peer Commands
  JXTA Services: JXTA Community Services, Sun JXTA Services (Indexing, Discover, Search, Membership)
  JXTA Core: Peer Groups, Peer Pipes, Peer Monitoring, Peer Advertisements, Peer IDs, Security
  Any Connected Device
APPA
  Advanced Services: Replication, Caching, Query Processing, ...
  Basic Services: Consensus, P2P Data Management
  P2P Network: Key-based Storage and Retrieval, Peer Linking, Peer ID Assignment, Peer Communication
GISP
26/27
Simulation
JXTA
  JXTA Applications: JXTA Community Applications, Sun JXTA Applications, JXTA Shell, Peer Commands
  JXTA Services: JXTA Community Services, Sun JXTA Services (Indexing, Discover, Search, Membership)
  JXTA Core: Peer Groups, Peer Pipes, Peer Monitoring, Peer Advertisements, Peer IDs, Security
  Any Connected Device
APPA
  Advanced Services: Replication, Caching, Query Processing, ...
  Basic Services: Consensus, P2P Data Management
  P2P Network: Key-based Storage and Retrieval, Peer Linking, Peer ID Assignment, Peer Communication
GISP
APPA Simulation
  P2P Network: Key-based Storage and Retrieval, Peer Linking, Peer ID Assignment, Peer Communication
  P2P Simulation: Chord Simulator
  Internet Simulation: GT/ITM
27/27
Conclusion
Summary
  Advanced cooperative applications (multi-master replication)
  A new P2P network-independent data management system
  A distributed optimistic multi-master replication solution
  Eventual consistency guarantee
  A query processing solution based on replication
  Validation
Future work
  Consider secondary copies
  Consider replica quality in query optimization
  Data caching
  Implementation over other P2P architectures (e.g., flooding)