Coordinating Peer-to-Peer information sources
Fausto Giunchiglia, University of Trento
AOIS’02 - June 02, 2002
The talk
Intuitions
The underlying theory: The Local Relational Model (an application of the Local Models Semantics [Ghidini and Giunchiglia, AIJ 2001])
Some theoretical results
VERY PRELIMINARY logical architecture
… and agents?
INTUITIONS
Peer to Peer (P2P) Computing
Peers come and go, but must nevertheless be able to interoperate.
There are many examples outside the database field:
  Napster – a shared directory of available music and client software to read/write the directory and import/export files.
  Gnutella – a decentralized group membership and search protocol, mainly used for file sharing.
  Groove – a secure shared space among intermittently connected systems with no central server.
…
Is There a Role for P2P Databases?
There’s hardly any literature
  WebDB ’01 paper (Gribble, Halevy, Ives, Rodrig, Suciu) focuses on data placement
  This implies some control over data placement
  They’re serious about building a system (“Piazza”)
Is it a really new research problem? Or only a new application with a lot of hype around it?
Compare it with the work on data integration (local-as-view, global-as-view approaches). Can’t we just apply the same techniques?
Data integration: a snapshot
Global schema (defined at design time).
Integration defined at design time by mapping local databases into the global database
Global schema as primitive (LAV: local-as-view), or local schemas as primitive (GAV: global-as-view)
In all cases: take one domain of interpretation (as implicitly defined by the global schema) and MAP all individuals, relations and attributes of the databases to integrate into it
Want correctness (query containment)
But:
  What if a new node comes in?
  Can we really deal with completely autonomous nodes?
  What about autonomy at run time (change schema?)
  …
Coordinating P2P databases: is it a new research problem? - 1 -
Domain Characteristics:
  Autonomy: peer databases are largely independent (in their language, contents, in how they answer queries …). They may be incomplete, overlapping, semantically heterogeneous, mutually inconsistent, …
  Dynamicity: nodes come and go … and maybe come again …; schemas, attributes, values may change over time, …
  You know something about the peer databases. You almost never know everything. This knowledge is hard to maintain and may be obsolete.
Is it a new research problem? - 2 -
Solution desiderata:
  Need scalability over number of nodes
  Want “incrementality” as a function of the effort made in developing a solution (design time) and in getting “good” answers (run time)
  (Design or run time) correctness and completeness should be limit cases (most of the time too costly to be implemented)
  Want robustness with respect to autonomy of peer databases
Is it a new research problem? - 3 -
Solution characteristics:
  Keep autonomy, add coordination, as much as can be afforded (see incrementality)
  Notion of good enough answer, as a function of coordination effort
NOTE: Coordination is NOT (data) integration.
  Integration is defined once and for all at design time. Coordination may change at run time.
  Differently from data integration, there is no global schema. By the way, what is a global schema in the P2P domain? How much are we willing to pay to approximate it … and maintain it in time?
…
The Local Relational Model
A Motivating Example - 1 -
Scenario
  Databases of medical patients
  Complete integration is likely to be infeasible
  But dynamic integration of databases relevant to one patient could have high value.
A Motivating Example - 2 -
Consider 3 databases, one table per DB:
  f: family doctor
    f:Prescription(PatID,treatment,disease)
  p: pharmacist
    p:Medication(PID,Prod,PrescriptionID)
  h: hospital
    h:Patients(PATid,disease,in,out)
A given patient may be described in all 3 databases
But the databases might use different patient id formats and disease descriptions.
When a patient is injured on a ski holiday in another country, yet more databases need to get involved.
Domain Relations
Each database DBi has its language Li, with
  a set Ai of unary predicates for Attributes,
  a set of constant symbols DOMi for Elements,
  a set of predicates Ri for Relations
Take a set of such DBi, i in I
Define the domain relation rij as a subset of DOMi x DOMj. rij is the set of pairs <di, dj> where, intuitively, di and dj (usually different constants) stand for the same object in the world
Each row <d1,d2> in domain relation rik specifies that value d1 in database i corresponds to value d2 in database k
Clearly, it’s a simplification to have one domain per database. This is just for notational convenience.
A Motivating Example - 3 -
Consider the previous 3 databases:
  f: family doctor
    f:Prescription(p12,Aspirin,Headache)
  p: pharmacist
    p:Medication(31, Aspirin-Bayer,fd23)
  h: hospital
    h:Patients(r3,car_accid,1/1/01,3/1/01)
We may have:
  <r3,p12> in rhf
  <31,p12> in rpf
  <p12,r3> in rfh, if we have the inverse mapping
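A minimal sketch of how the domain relations in this example could be stored and used, assuming each relation is simply a set of pairs. The dictionary layout and the helper name `translate` are illustrative assumptions, not part of the talk.

```python
# Domain relations from the example, stored as sets of (source, target) pairs.
# The dictionary layout and the helper `translate` are hypothetical.
DOMAIN_RELATIONS = {
    ("h", "f"): {("r3", "p12")},  # hospital id r3 corresponds to family-doctor id p12
    ("p", "f"): {("31", "p12")},  # pharmacist id 31 corresponds to family-doctor id p12
    ("f", "h"): {("p12", "r3")},  # inverse mapping, if it is available
}


def translate(value, source, target):
    """Return every value of `target` that corresponds to `value` in `source`."""
    pairs = DOMAIN_RELATIONS.get((source, target), set())
    return {d_target for d_source, d_target in pairs if d_source == value}
```

For instance, `translate("r3", "h", "f")` looks up the hospital id r3 in rhf, while a lookup through a relation that was never given (such as one from f to p) yields the empty set.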
Domain relations … more
… Suppose we have:
  <r3,p12> in rhf
  <31,p12> in rpf
  <p12,r3> in rfh
  …
NOTE: We do not collapse local domains into a universal domain (as in data integration). We keep them distinct, and introduce mappings between pairs of domains as objects. Domain relations are explicitly manipulated at run time to implement coordination between peer databases.
Domain Relations - Examples
rij may be partial and not surjective (most often the case)
rij, rji need not be symmetric: the round trip need not be the identity, e.g. rji(rij(x)) ≠ x. For example, consider DBi containing length measurements in meters and DBj in kilometers. One can have
  rij(x) = roundToClosestK(x), e.g., rij(653)=1, rij(453)=0
  rji(x) = x*1000, e.g., rji(1)=1000
rij = inverse(rji): different but equivalent representations of the same domain
rij = rji = emptyset: disjoint domains (what if only one is the empty set?)
rik = (rij composed rjk): transitive mappings among domains
rij(ds) = emptyset, with ds a subset of DOMi: keep ds secret
∀d1,d2 in DOMi, d1 <i d2 → ∀d1’ in rij(d1), d2’ in rij(d2). (d1’ <j d2’): preserving order (currency exchange)
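The meters/kilometers example can be sketched as two small functions. This is only an illustration under the assumption that lengths are non-negative integers; the function names `r_ij` and `r_ji` mirror the slide's notation.

```python
def r_ij(meters):
    """Map a length in meters (DBi) to the closest whole kilometer (DBj)."""
    # Integer arithmetic avoids the round-half-to-even behaviour of round().
    return (meters + 500) // 1000


def r_ji(kilometers):
    """Map a length in kilometers (DBj) back to meters (DBi)."""
    return kilometers * 1000


def round_trip_from_i(meters):
    """Go from DBi to DBj and back; information lost by rounding is not recovered."""
    return r_ji(r_ij(meters))
```

Starting from 653 meters, the round trip returns 1000 rather than 653, which is why the two relations are not inverses of each other.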
P2P Coordination
Instead of a global schema, assume each peer has
  pair-wise coordination formulas that specify interdependencies.
  binary domain relations that specify how the symbols used in one database translate to symbols used in another database.
Coordination formulas and domain relations can only refer to acquaintances.
Use domain relations and coordination formulas for query and update processing.
Coordination Formulas - Examples
∀(p:x).∀(p:y).(p:(∃z).medication(x,z,y) → f:treatments(x, home, y))
∀(h:x).∀(h:y).(h:(∃z1,z2).patient(x,y,z1,z2) → f:treatments(x, hospital, y))
“There’s a row in the treatments table in the family doctor database for each row in the pharmacist and hospital databases”
NOTE: see indexing of formulas and variables
Coordination formulas
Coordination formulas are built from atomic formulas i:φ(x), where φ(x) is a First Order formula, and using standard connectives: and, or, →, ∀, ∃.
Variables quantified on one DB may have to be interpreted on other DBs. Mapping is done exploiting domain relations. Consider, e.g.:
  ∀(i:x).j:P(x)
  “for each object di in DOMi, the corresponding object dj = rij(di) in DOMj has the property P”
  ∀(i:x).(i:P(x) → j:Q(x) and k:R(x))
  “for each object di in DOMi, if P holds of di …”
Quantification is always done with respect to the domain of one database. However, notice the difference between
  ∀(i:x).A(x), with A(x) a coordination formula
  i:∀x.B(x), with B(x) a first order formula. It holds iff ∀(i:x).i:B(x) holds
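As a rough illustration of how a quantified variable is carried across databases, the sketch below checks a formula of the shape ∀(i:x).(i:P(x) → j:Q(x)) over finite data. It assumes unary predicates given as sets of constants and a domain relation given as a set of pairs; the function name and the treatment of elements without an image (they satisfy the formula vacuously) are assumptions of this sketch, not statements from the talk.

```python
def holds_forall_implication(dom_i, p_in_i, q_in_j, r_ij):
    """Check a formula of the shape  forall (i:x). (i:P(x) -> j:Q(x)).

    dom_i  : iterable of constants of database i
    p_in_i : set of constants of i that satisfy P
    q_in_j : set of constants of j that satisfy Q
    r_ij   : set of (d_i, d_j) pairs, the domain relation from i to j

    Assumed reading: whenever P holds of d_i in database i, Q must hold
    in database j of every d_j that r_ij associates with d_i.
    """
    for d_i in dom_i:
        if d_i not in p_in_i:
            continue  # the implication is trivially true for this d_i
        images = {d_j for source, d_j in r_ij if source == d_i}
        if not images <= q_in_j:
            return False  # some corresponding object in j violates Q
    return True
```

With the pharmacist id 31 mapped to the family-doctor id p12, the check succeeds only if p12 satisfies the predicate on the family-doctor side.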
Higher Level Correspondences
One can generalize the domain relation to correspondences at higher meta-levels
  constant to constant, e.g., ‘one’ ↔ ‘uno’; or CAN$1.00 ↔ US$0.65
  table to table, e.g., Cust ↔ Customer
  column to column, e.g., name(Cust) ↔ nm(Customer)
This is also captured in coordination formulas.
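One possible way to picture such correspondences is as lookup tables used to rewrite a simple selection before it is sent to an acquaintance. The table contents come from the examples above; the dictionaries and the function `rewrite_selection` are hypothetical names introduced only for this sketch.

```python
# Correspondences at three levels, taken from the examples on this slide.
TABLE_MAP = {"Cust": "Customer"}                      # table to table
COLUMN_MAP = {("Cust", "name"): ("Customer", "nm")}   # column to column
CONSTANT_MAP = {"one": "uno"}                         # constant to constant


def rewrite_selection(table, column, constant):
    """Rewrite the selection `column = constant` on `table` for the other schema.

    Names without a known correspondence are passed through unchanged.
    """
    default = (TABLE_MAP.get(table, table), column)
    new_table, new_column = COLUMN_MAP.get((table, column), default)
    return new_table, new_column, CONSTANT_MAP.get(constant, constant)
```

A selection on `name` in `Cust` is rewritten to `nm` in `Customer`, while a column with no recorded correspondence keeps its name.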
Answering Queries
Local queries. Treated as if there exist no peer databases. They are first order formulas of the form
  A(x) → q(x)
  with A(x) a first order formula, x and q as below
Global queries. They are coordination formulas of the form
  A(x) → i:q(x)
  where
    A(x) is a coordination formula
    x has n variables
    q is a new n-ary predicate symbol
    i is the database which gets the query
The answer to a global query is
  {d ∈ DOMi^n such that ∃(i:x).(A(x) ∧ i:x=d)}
Answering Queries - An example
Consider the query below, submitted to database h:
  ((i:P(x) ∧ j:R(y)) ∨ k:S(x,y)) → h:q(x,y)
Three steps:
1. Evaluate P,R,S in i,j,k (respectively)
2. map results via rih,rjh,rkh to sets si,sj,sk and then
3. compute ((si × sj) ∪ sk)
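The three steps can be sketched as follows. This is an illustration only: it assumes the connectives in the example query are a conjunction of P and R joined by a disjunction with S, so that step 3 is a cross product followed by a union, and it represents domain relations as sets of pairs. All function and parameter names are hypothetical.

```python
from itertools import product


def map_values(values, relation):
    """Translate a set of constants through a domain relation (set of pairs)."""
    return {target for source, target in relation if source in values}


def map_pairs(pairs, relation):
    """Translate a set of (x, y) tuples componentwise through a domain relation."""
    return {
        (tx, ty)
        for x, y in pairs
        for tx in map_values({x}, relation)
        for ty in map_values({y}, relation)
    }


def answer_at_h(rows_p, rows_r, rows_s, r_ih, r_jh, r_kh):
    """Combine peer results for the example query at database h."""
    # Step 1 happened at the peers: rows_p, rows_r, rows_s are their local results.
    # Step 2: map each local result into the domain of h.
    s_i = map_values(rows_p, r_ih)
    s_j = map_values(rows_r, r_jh)
    s_k = map_pairs(rows_s, r_kh)
    # Step 3: "and" over distinct variables becomes a product, "or" a union.
    return set(product(s_i, s_j)) | s_k
```

Values that have no image under the relevant domain relation simply drop out in step 2, which is one way the answer can be "good enough" rather than complete.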
SOME THEORETICAL RESULTS
Theoretical Results – 1 -
Provide a model theory by defining the Local Relational Model in terms of Relational spaces, where a relational space is defined as a pair:
<set of local databases, set of pairwise domain relations>
Provide a notion of satisfiability and logical consequence of coordination formulas with respect to relational frames
Provide inference rules for using coordination formulas.
Prove them sound and complete with respect to the LRM.
Theoretical Results – 2 -
Define a generalized relational theory as a theory with domain closure, distinct domain values, and finite number of possible relation extensions (closed world assumption).
Define relational multi-context system <T,R> as a family of relational languages (one per database) with a generalized relational theory (in T) and set of coordination formulas (in R).
Prove that for any relational multi-context system, there’s a unique maximal relational space that satisfies it. (Generalizes Reiter’s result on CWA and single databases.)
Theoretical Results - 3 -
Given a multi-context system <T,R> that represents it, the answer to a query
  A(x) → i:q(x)
is the set of all d such that
  {i:Ti}i∈I, R |- ∃(i:x).(A(x) ∧ i:x=d)
This result is the basis for a correct and complete query answering mechanism (for a given set of coordination formulas … which may implement something totally different from the data integration approach (LAV, GAV))
VERY PRELIMINARY HINTS OF A LOGICAL ARCHITECTURE
A proposed architecture (prelim.) –1-
Four basic ingredients
1. Interest Group: set of nodes being able to answer queries about a certain topic (e.g., Tourism, medical care). Needed to compute scope of query answering
2. Acquaintance (with respect to a node and a given query): a node which is supposed to have information that can be used to answer the query
3. Coordination rule (with respect to an acquaintance): it says how to propagate query forward and results back
4. Correspondence rule (with respect to an acquaintance): it takes care of semantic heterogeneity problem.
A proposed architecture (prelim.) - 2 -
From theory to practice
1. Interest Group: In LRM, it is the set of databases in a relational frame
2. Acquaintance (of a node n1): In LRM, any node n2 for which there is a coordination formula involving n1 and n2
3. Coordination rule: An implementation of coordination formulas, parametric on correspondence rules.
4. Correspondence rule: A set of rewrite rules which implement the language dependent part of coordination formulas and take care of semantic heterogeneity (domain relations are implemented as special kinds of correspondence rules).
Level 1 architecture - The P2P layer
P2P Layer: P2P functionality is add-on
Local Data Source (LDS): Database, File system, Web site, …
User Interface: User queries, Results, …
Query Manager (QM) and Update Manager (UM): responsible for query and update propagation; manage coordination and correspondence rules, acquaintances, and interest groups
Wrapper: provides a translation layer between QM and UM, and LDS
Level 2 architecture - The Query manager
Propagation Planner: Talks to group-manager
Query Formation: Responsible for the formation of outgoing queries, as well as querying the local data source
Results Handler: Responsible for sending and receiving query results; shows results to user
Executed Query History: Prevents duplicate query execution
Acquaintances
Interest Groups
Group Management: Used only by node-managers for management of groups and query propagation
Coordination and Correspondence Rules
Query propagation strategy
1. Node defines query topic
2. Node sends Group Manager (GM) request of Query Scope (QS)
3. GM computes QS
4. Node 1 sends query to acquaintances in QS, namely 2 and 4, and reports this fact to GM.
5. Nodes 2 and 4 send answer to node 1
6. Node 2 propagates query to its acquaintances in QS, namely 4 and 6, and reports this fact to GM
7. And so on…
8. Nodes which do not propagate any further report this fact to GM
9. Propagation stops when “no more propagation” received from all boundary nodes (reached all reachable acquaintances).
[Figure: network of nodes 1 to 11 and the Group Manager (GM), annotated with the query steps; in the example QS = (2, 4, 6, 8, 9, 11), and results Res2 and Res4 are returned.]
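The propagation strategy can be simulated with a breadth-first traversal. The sketch below is only an approximation under stated assumptions: the Group Manager is reduced to a log of reports, each node's executed-query history is modelled by one shared set, and every acquaintance link other than 1 to (2, 4) and 2 to (4, 6) is invented for the example.

```python
from collections import deque

# Acquaintance links; only those of nodes 1 and 2 come from the slide,
# the rest are hypothetical and chosen so that the whole scope is reachable.
ACQUAINTANCES = {1: [2, 4], 2: [4, 6], 4: [8], 6: [9], 8: [11], 9: [], 11: []}
QUERY_SCOPE = {2, 4, 6, 8, 9, 11}  # QS as computed by the GM in the example


def propagate(start, acquaintances, query_scope):
    """Simulate query propagation from `start`; return (reached nodes, GM log)."""
    executed = {start}  # stands in for the per-node executed-query histories
    gm_log = []         # reports that nodes send to the Group Manager
    queue = deque([start])
    while queue:
        node = queue.popleft()
        targets = [n for n in acquaintances.get(node, []) if n in query_scope]
        if targets:
            gm_log.append((node, "propagated", tuple(targets)))
        else:
            gm_log.append((node, "no more propagation", ()))
        for target in targets:
            if target in executed:
                continue  # duplicate delivery, dropped thanks to the history
            executed.add(target)
            queue.append(target)
    return executed - {start}, gm_log
```

Calling `propagate(1, ACQUAINTANCES, QUERY_SCOPE)` reproduces steps 4 and 6: node 1 reports sending to 2 and 4, node 2 reports sending to 4 and 6, and node 4 ignores the second copy of the query.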
Summary
Coordinating P2P information sources: keep autonomy, add (run-time) coordination. Be content with good enough answers.
Theoretically, model coordination using four notions: set of local databases, domain relations, coordination formulas, global answer to a query
Implementationally, implement coordination using five notions: interest groups, acquaintances, coordination rules, correspondence rules, coordination algorithm
… and agents?
Published work (not much … yet)
Paper on LRM still unpublished, but see project Web page
Paper on basic ideas in WEBDB 2002
Paper on architecture in CIA 2002
These slides soon on my Web page
Project Web page (to be put up soon) will be accessible from my Web page:
http://www.ict.unitn.it/~fausto/