L3S Research Center, University of Hannover
Peer-to-Peer Database Networks
Wolf-Tilo Balke and Wolf Siberski
19.12.2007
Peer-to-Peer Systems and Applications, Springer LNCS 3485
*with slides from J. M. Hellerstein (UC Berkeley), A. Halevy (U Washington), P. Raghavan (Stanford)
Overview
1. Why Peer-to-Peer Databases?
   1. Federation
   2. Information integration
   3. Sensor networks
   4. ‘New’ internet
2. Distributed Databases
3. P2P Databases
   1. Challenges
   2. Design Dimensions
4. Existing P2P Database systems
   1. Edutella: focus on expressivity
   2. Piazza: focus on integration
   3. PIER: focus on scalability
   4. HiSbase: focus on scalability for spatial data
Federation of similar data providers
Examples
  (Digital) Libraries
  Primary Scientific Data Providers (Gene Databases)
  News Providers
All nodes offer the same kind of information
Homogeneous network (fixed schema)
Non-P2P solutions exist, but not open/scalable
Information Integration
Examples
  Find German professors having published at least three papers at the Conference on Very Large Databases
  Find an introductory database book in German, written by a German professor
  Find all recordings of Mozart's 'Magic Flute' with conductors who also once conducted the Berliner Philharmoniker
Very tedious to find with current search engines
Needs database-like querying capabilities
Heterogeneous network
  Information from several databases needs to be combined
Sensor Networks
Examples
  Network Monitoring:
    network maps
    event detections
    ...
  Car Traffic Monitoring
Huge number of nodes
Low amount of data
Homogeneous network
Screenshots from project PHI presentation, J. Hellerstein, Berkeley
Recap: Database basics
Relational model
  Data stored in tables
  Database described by schema

Doc
Identifier  Title                Date  Language
1861978766  Eoite odsifj woifj   1993  en
1394875966  Oewr svonwe          2005  en
1817305606  Psadoifh sdafns dsf  1999  en
1809239086  Vsd sdfokj sfew      2001  en
1345398705  Wdfj vspo sdfp dort  1989  en

Schema
  Doc(Id, Title, Date, Language)
  Author(DocId, PersonId)
  Person(Id, Name, Surname)
Relational Queries
Schema: Doc(Id, Title, Date, Language), Author(DocId, PersonId), Person(Id, Name, Surname)
Selection
  Selects rows (by conditions)
  Notation: $\sigma_{\mathrm{Lang}=\text{"en"}}(\mathrm{Doc})$
Projection
  Selects columns (by name)
  Notation: $\pi_{\mathrm{Id},\,\mathrm{Title}}(\mathrm{Doc})$
Join
  Creates new table (view) by combining existing ones
  Notation: $\mathrm{Doc} \bowtie_{\mathrm{Doc.Id}=\mathrm{Author.DocId}} \mathrm{Author}$
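The three operators can be tried out on in-memory rows. The following is a minimal sketch, assuming relations are represented as lists of dicts; the sample rows and the function names `select`, `project`, and `join` are made up for illustration, and `project` does not remove duplicate rows as a strict relational projection would.

```python
# Toy relations as lists of dicts; the rows are invented for illustration.
doc = [
    {"Id": 1, "Title": "Sample title A", "Lang": "en"},
    {"Id": 2, "Title": "Sample title B", "Lang": "de"},
]
author = [{"DocId": 1, "PersonId": 10}, {"DocId": 2, "PersonId": 10}]

def select(rows, predicate):
    """Selection: keep the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]

def project(rows, columns):
    """Projection: keep only the named columns of every row."""
    return [{c: row[c] for c in columns} for row in rows]

def join(left, right, left_key, right_key):
    """Equi-join: merge every pair of rows whose key values match."""
    return [{**l, **r} for l in left for r in right if l[left_key] == r[right_key]]

english = select(doc, lambda row: row["Lang"] == "en")
titles = project(doc, ["Id", "Title"])
doc_author = join(doc, author, "Id", "DocId")
```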
Relational Query Plans
Schema: Doc(Id, Title, Lang, Subject), Author(DocId, PersonId), Person(Id, Name, Surname)
Algebraic representation of a query
  SELECT Doc.Id, Doc.Title
  FROM Doc, Person, Author
  WHERE Doc.Lang = 'en' AND
        Author.DocId = Doc.Id AND
        Author.PersonId = Person.Id AND
        Person.Name = 'Kant'
[Figure: query plan tree for this query]
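To see the query run end to end, here is a small sketch using Python's built-in `sqlite3` module. The table contents are invented sample data, and the column types are assumptions; only the query text follows the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; the slide only gives the schema and the query.
conn.executescript("""
    CREATE TABLE Doc    (Id INTEGER, Title TEXT, Lang TEXT);
    CREATE TABLE Person (Id INTEGER, Name TEXT, Surname TEXT);
    CREATE TABLE Author (DocId INTEGER, PersonId INTEGER);
    INSERT INTO Doc VALUES (1, 'Sample title A', 'en'), (2, 'Sample title B', 'de');
    INSERT INTO Person VALUES (10, 'Kant', 'Immanuel'), (11, 'Hume', 'David');
    INSERT INTO Author VALUES (1, 10), (2, 10), (1, 11);
""")
# The query from the slide: English documents written by 'Kant'.
rows = conn.execute("""
    SELECT Doc.Id, Doc.Title
    FROM Doc, Person, Author
    WHERE Doc.Lang = 'en' AND
          Author.DocId = Doc.Id AND
          Author.PersonId = Person.Id AND
          Person.Name = 'Kant'
""").fetchall()
```

Only document 1 qualifies here: document 2 is also by 'Kant' but is not in English, and the second author of document 1 has a different name.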
Logical vs. Physical Data Model
Logical
  Relational model
  Queries specified on schema
  No information about storage format
Physical
  Schema mapped to disk/memory structure
  Including index structures
  Queries mapped to execution plans
    Map relational operators to executable database operations
Logical vs. Physical Query Operators
[Figure: logical query operators (left) and the corresponding physical query operators (right)]
Query Execution Components
[Figure: query execution pipeline. Components: Parser and Rewriter/Optimizer (query plan generation), Code Generator, Execution Engine. Intermediate artifacts: Query, Logical Query Plan, Physical Query Plan, Executable Query Plan. Stores: Catalog (Meta Data), Base Data.]
Query Plan Search strategies
Exhaustive (with pruning)
  A fixed set of techniques for each relational operator
  Search space = "all" possible query execution plans (QEPs) with this set of techniques
  Prune search space using heuristics
  Choose minimum cost QEP from rest of search space
Hill climbing (greedy)
Hill Climbing
[Figure: search space with an initial plan and successive improvement steps 1 and 2]
Begin with initial feasible QEP
At each step, generate a set S of new QEPs by applying 'transformations' to current QEP
Evaluate cost of each QEP in S
Stop if no improvement is possible
Otherwise, replace current QEP by the minimum cost QEP from S and iterate
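The loop described on this slide can be written down generically. This is a minimal sketch, not an optimizer: the `neighbours` and `cost` callables are placeholders supplied by the caller, and the integer "plans" in the usage line are a made-up toy problem.

```python
def hill_climb(initial, neighbours, cost):
    """Greedy descent: move to the cheapest neighbour until none is cheaper."""
    current = initial
    while True:
        candidates = neighbours(current)      # the set S of transformed plans
        if not candidates:
            return current
        best = min(candidates, key=cost)
        if cost(best) >= cost(current):       # no improvement: stop at local optimum
            return current
        current = best

# Hypothetical toy "plans": integers, where a transformation adds or subtracts 1.
result = hill_climb(0, lambda x: [x - 1, x + 1], lambda x: (x - 3) ** 2)
```

As with any greedy search, the result is only a local optimum; a real query optimizer would get a different answer from a different initial plan or transformation set.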
Hill-Climbing: Example
Goal: minimize communication cost
[Figure: relations R, S, T, V connected through attributes A, B, C]

Rel.  Site  # of tuples
R     1     10
S     2     20
T     3     30
V     4     40

Initial plan: send all relations to one site
  To site 1: cost = 20+30+40 = 90
  To site 2: cost = 10+30+40 = 80
  To site 3: cost = 10+20+40 = 70
  To site 4: cost = 10+20+30 = 60
Transformation: send a relation to its neighbor
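The four initial-plan costs follow directly from the relation sizes: shipping everything to one site costs the sum of the sizes of all relations stored elsewhere. A short sketch, assuming cost is counted in transferred tuples as in the example (the function name `ship_all_cost` is made up):

```python
# Relation -> (home site, number of tuples), as given in the example.
relations = {"R": (1, 10), "S": (2, 20), "T": (3, 30), "V": (4, 40)}

def ship_all_cost(target_site):
    """Tuples that must be transferred if every relation is sent to target_site."""
    return sum(size for site, size in relations.values() if site != target_site)

costs = {site: ship_all_cost(site) for site in (1, 2, 3, 4)}
best_site = min(costs, key=costs.get)   # cheapest initial plan
```

Site 4 is the cheapest starting point because it already holds the largest relation.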
Distributed Databases
Database fragmentation
  Horizontal (row distribution)
  Vertical (column and/or table distribution)
Assumes central coordinator
Distributing databases (top-down)
  Have a database
  How to split and allocate to individual sites
Integrating databases (bottom-up)
  Combine existing databases
  How to deal with heterogeneity & autonomy
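The difference between the two fragmentation styles is easy to see on a tiny table. The sketch below is illustrative only: the rows, the helper names `horizontal` and `vertical`, and the choice of `Id` as the key repeated in every vertical fragment are assumptions.

```python
rows = [
    {"Id": 1, "Title": "A", "Lang": "de"},
    {"Id": 2, "Title": "B", "Lang": "en"},
    {"Id": 3, "Title": "C", "Lang": "de"},
]

def horizontal(rows, key):
    """Row distribution: group whole rows by the value of key(row)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(key(row), []).append(row)
    return fragments

def vertical(rows, column_groups, primary_key="Id"):
    """Column distribution: each fragment keeps the primary key plus some columns."""
    return [[{c: row[c] for c in [primary_key, *cols]} for row in rows]
            for cols in column_groups]

by_lang = horizontal(rows, lambda row: row["Lang"])
titles, langs = vertical(rows, [["Title"], ["Lang"]])
```

Keeping the key in every vertical fragment is what allows the original table to be rebuilt by a join.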
Fragmentation: Local View
Relational database at each node
Examples (Dublin Core Metadata)
Node1.Doc
Identifier  Title                  Date  Format  Language
521354021   Csdoi sdofi sfi sfdsf  1948  Book    de
593574021   Deor aodfi sdfwe dls   1952  Book    de
534536021   Toid sdofij cvcdova    1937  Book    de
528943021   Csdo asofdi weor       1916  Book    de
529874521   Epodsf csmieo mo       1924  Book    de
526983221   Awer fzwe xhzpwf       1959  Book    de

Node2.Doc
Identifier  Title                Date  Language  Coverage
1861978766  Eoite odsifj woifj   1993  en        Scotland
1394875966  Oewr svonwe          2005  en        Wales
1817305606  Psadoifh sdafns dsf  1999  en        York
1809239086  Vsd sdfokj sfew      2001  en        West Midlands
1345398705  Wdfj vspo sdfp dort  1989  en        London
Fragmentation: Distributed View
Table fragments distributed over nodes
Doc
Identifier  Title                  Date  Format  Language  Coverage
521354021   Csdoi sdofi sfi sfdsf  1948  Book    de
593574021   Deor aodfi sdfwe dls   1952  Book    de
534536021   Toid sdofij cvcdova    1937  Book    de
528943021   Csdo asofdi weor       1916  Book    de
529874521   Epodsf csmieo mo       1924  Book    de
526983221   Awer fzwe xhzpwf       1959  Book    de
1861978766  Eoite odsifj woifj     1993          en        Scotland
1394875966  Oewr svonwe            2005          en        Wales
1817305606  Psadoifh sdafns dsf    1999          en        York
1809239086  Vsd sdfokj sfew        2001          en        West Midlands
1345398705  Wdfj vspo sdfp dort    1989          en        London
[Figure: the rows of this table are split into fragments held by Peer1 to Peer4]
Goal: View network as one logical database
Distributed Query Processing
Parsing
  Given SQL query, generate one or more algebraic query trees
Localization
  Rewrite query trees, replacing relations by fragments
Optimization
  Given cost model + one or more localized query trees
  Produce minimum cost query execution plan
Distributed Databases
[Figure: a mediator (Parser, Rewriter/Optimizer, Code Generator, Query Execution Engine, Catalog with meta data) turns a query into logical, physical, and executable query plans and sends partial query plans to several data providers, each with its own Parser, QEE, Catalog, and Base Data]
Distributed Query Plan
Schema: Document(Id, Title, Lang, Subject), Author(DocId, PersonId), Person(Id, Name, Surname)
[Figure: query plan distributed over the nodes holding these relations]
Issues
Communication costs
  Connection characteristics
  Expected size of (intermediate) answers
Node characteristics
  Size of fragments
  Storage capacity, storage cost at sites
  Processing power at the nodes
Query processing strategy
  How are joins done?
  Where are answers collected?
Fragment replication
  Update cost
  Concurrency control overhead
Query Optimization
Generate query execution plans
Estimate cost of each plan
Choose minimum cost plan
What's different for distributed DB?
  New strategies for some operations (join, sort, aggregation, etc.)
  Many ways to assign and schedule processors
  Some factors besides number of I/Os in the cost model
Cost estimation
In centralized systems: estimate sizes of intermediate relations
For distributed systems
  Transmission cost/time may dominate
  Account for parallelism
  Data distribution and result re-assembly cost/time
[Figure: Plan A and Plan B compared by the work done at each site (T1, T2) and for the answer, labelled with I/O counts of 50, 100, 70, and 20 IOs]
Optimization in distributed DBs
Two levels of optimization
Global optimization
  Given localized query and cost function
  Output optimized (min. cost) query plan that includes relational and communication operations on fragments
Local optimization
  At each site involved in query execution
  Portion of the query plan at a given site optimized using techniques from centralized DB systems
Challenges of Schema-Based Peer-to-Peer Networks
Multi-Dimensional Search Space
  DHTs only work for one dimension (one attribute)
Schema Heterogeneity
  Sources use different database schemas for similar information
Potentially large result sets
  SELECT * FROM Firewalls.BlockedPackets ...
Range and Aggregate Queries
And the usual P2P challenges...
  Trust
  Network Churn
  Unbalanced Popularity
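Why a plain DHT copes with one attribute but struggles with ranges can be shown with a toy model. The sketch below is not any real DHT implementation: `ToyDHT` is a hypothetical class that simply hashes a key to one of a fixed number of in-process buckets.

```python
import hashlib

class ToyDHT:
    """Hypothetical single-attribute DHT: a key is hashed to one of n_nodes buckets."""

    def __init__(self, n_nodes):
        self.nodes = [dict() for _ in range(n_nodes)]

    def _node_for(self, key):
        digest = hashlib.sha1(str(key).encode()).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key).setdefault(key, []).append(value)

    def get(self, key):
        """Exact-match lookup: contacts exactly one node."""
        return self._node_for(key).get(key, [])

    def range_query(self, low, high):
        """Range lookup: hashing destroys key order, so every node is scanned."""
        hits = []
        for node in self.nodes:
            for key, values in node.items():
                if low <= key <= high:
                    hits.extend(values)
        return hits

dht = ToyDHT(n_nodes=4)
for year, title in [(1916, "a"), (1937, "b"), (1959, "c"), (2005, "d")]:
    dht.put(year, title)
```

An exact lookup such as `dht.get(1937)` touches one bucket, while `dht.range_query(1900, 1960)` has to visit all of them; a second attribute would need a second index of the same kind.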
Design Dimensions
Network Properties
  Data Placement
  Topology and Routing
Data Access
  Data Model
  Query Language
Integration Mechanism
  Mapping Representation
  Mapping Creation
  Integration Method
Data Placement
Placement according to ownership
  Data stays at information source
  Full control of data by owner (access policy, availability, etc.)
  More autonomy of single nodes
Placement according to search strategy
  Data is distributed according to later access mechanism (e.g., DHT)
  No control over data access
  More freedom to optimize query routing
Additional caching/replication possible
  Essential for load balancing
Topology and Routing (1)
Unstructured Networks
  Flooding as routing algorithm
  Supports arbitrary expressive queries
  Agnostic to schema heterogeneity
  Inefficient (filtered flooding can help)
Short-cut networks
  Unstructured, but continuously optimize network connections
  Can develop into regular structures like Small-World networks
  Clustering + filtered flooding reduces query distribution
  Fireworks routing
Topology and Routing (2)
Super-peer networks
  Inherits advantages and disadvantages of unstructured network
  Better efficiency and scaling (but still flooding)
  Good match to distributed databases (super-peers become mediators)
DHT Networks
  Create separate overlay for each attribute
  Or use Multidimensional DHTs, e.g. Mercury
  Limited query expressivity
  Suitable for homogeneous schema
  Not all queries are evaluated efficiently
Topology and Routing - Summary
Local indexing
  No knowledge about other peers
  Doesn't scale
Central indexing
  One node holds complete index
  Single point of control (and failure)
Distributed indexing
  Distributed Hash Tables
Also shown: Filtered Flooding, Short-cut networks, Super-peer networks
Data Model
Fixed set of attributes
  Allows for sophisticated topologies
  Inflexible
  Applicability: custom applications
Relational model
  usual database model
  not designed for distribution
XML
RDF
  Semantic Web exchange format
  very suitable for distributed data
Query Language
None
  Fixed set of parameterized queries
Relational query language
  Always subset of SQL
XML query language
  XPath or XQuery
RDF Query Language
  SPARQL or its predecessors
Logic language
Mapping Representation
Declarative
  Translation between schema elements
  Distributed database approaches applicable
Procedural
  Imperative description how to translate/transform queries and data
Mapping characteristics
  Unidirectional or Bidirectional
  Simple (one-to-one) mapping or complex mappings
Mapping of objects
  state equality of objects in different sources
Mapping Creation
Manual
  Users create mappings
  Network distributes mappings and uses them for translation
Semi-automatic
  System proposes mappings, based on heuristics
    attribute name
    similar data
  User feedback used to validate created mappings
Automatic
  e.g., probabilistic mapping
  similar techniques as for semi-automatic mapping
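One possible attribute-name heuristic for the semi-automatic case is plain string similarity. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the function name `propose_mappings`, the threshold of 0.6, and the attribute lists are assumptions made for illustration and are not taken from any of the systems discussed.

```python
from difflib import SequenceMatcher

def propose_mappings(source_attrs, target_attrs, threshold=0.6):
    """Pair each source attribute with its most similar target attribute name."""
    proposals = {}
    for src in source_attrs:
        score, best = max(
            (SequenceMatcher(None, src.lower(), tgt.lower()).ratio(), tgt)
            for tgt in target_attrs
        )
        if score >= threshold:      # weaker matches are left for the user to decide
            proposals[src] = best
    return proposals

proposals = propose_mappings(["Identifier", "Title", "Lang"],
                             ["DocId", "DocTitle", "Language", "Date"])
```

With these inputs the sketch proposes `Title` to `DocTitle` and `Lang` to `Language`, but finds no confident match for `Identifier`, which is exactly the kind of case where user feedback is needed.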
Integration Mechanism
Query Rewriting
  Query is translated to target schema
  Data is translated back to source schema
  Most common approach
Data Rewriting
  Data is replicated to source schema
  Only feasible for small data sets
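Query rewriting with a simple one-to-one declarative mapping can be sketched as two dictionary translations: one for the query on the way out, one for the rows on the way back. The mapping `ATTRIBUTE_MAP`, the query representation as an attribute/value dict, and the sample rows are all hypothetical.

```python
# Hypothetical declarative mapping from a source schema to a target schema.
ATTRIBUTE_MAP = {"Lang": "Language", "Id": "Identifier"}
REVERSE_MAP = {target: source for source, target in ATTRIBUTE_MAP.items()}

def rewrite_query(conditions):
    """Translate attribute names in a {attribute: value} query to the target schema."""
    return {ATTRIBUTE_MAP.get(attr, attr): value for attr, value in conditions.items()}

def translate_results(rows):
    """Translate result rows from the target schema back to the source schema."""
    return [{REVERSE_MAP.get(attr, attr): value for attr, value in row.items()}
            for row in rows]

# Rows as stored at the remote peer (target schema).
target_rows = [
    {"Identifier": 521354021, "Title": "Csdoi sdofi sfi sfdsf", "Language": "de"},
    {"Identifier": 1861978766, "Title": "Eoite odsifj woifj", "Language": "en"},
]
query = rewrite_query({"Lang": "de"})
matches = [row for row in target_rows
           if all(row.get(attr) == value for attr, value in query.items())]
answer = translate_results(matches)
```

Only the query and the matching rows cross the schema boundary, which is why this approach avoids the replication cost of data rewriting.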
Existing Systems - Typology
Focus on network scalability
  homogeneous schema
  low query expressivity
  DHT as underlying network structure
Focus on expressivity
  super-peer or unstructured
  unlimited query complexity
Focus on integration
  typically unstructured
  query routing driven by mappings
Existing Systems – Overview
Focus         Name      Topology        Data Placement  Data Model     Query Language
Scalability   PIER      DHT (Bamboo)    Distributed     Relational     SQL subset
              RDFPeers  DHT (MAAN)      Distributed     RDF            -
              Mercury   DHT (Symphony)  Distributed     Tuples         -
Expressivity  SQPeer    Super-peer      Owner           RDF            RQL
              PeerDB    Unstructured    Owner           Relational     SQL subset
              Edutella  Super-peer      Owner           RDF            datalog (SQL)
Integration   Piazza    Unstructured    Owner           XML            XQuery subset
              GridVine  DHT (P-Grid)    Distributed     RDF            -
              DRAGO     Unstructured    Owner           Descr. Logics  OWL subset
List not complete
Edutella: Introduction
Initial Goal: Achieve interoperability between heterogeneous metadata-driven (e-learning) systems
Provides metadata only, not the resources
  Resources are fetched via HTTP
Query Examples
  "Find software engineering course lecture notes for undergraduates in German language"
  "Find an introduction to Enterprise Java Beans for professionals"
  "Find a distance course in software requirements analysis from a Swedish university"
Query Service
Provides standardized query/retrieval of RDF metadata stored in distributed RDF repositories
Query Exchange Language (QEL)
  Based on Datalog (allows expression of rules)
  RDF syntax
  For exchange only
Adapters to enable QEL query processing on diverse backends
Query processing
Parsers/Formatters convert between query languages
Applications and Backends are shielded from communication layer
Query messages are exchanged in RDF/XML format
[Figure: components EdutellaConsumerInterface, QueryParser, EdutellaProviderInterface, QueryFormatter]
Wrappers available for SQL, RDQL, RQL, and others
Edutella Topology
Super-Peers
Content Providers
Content Consumers
Use filtered flooding in super-peer backbone
HyperCuP topology for backbone
Cayley Graphs
Graph representing a permutation group G, described by a set of generators
Regular, vertex-symmetric, recursively decomposable
Optimal routing and broadcast algorithms exist
[Figure: two example Cayley graphs, a hypercube (edges labelled by dimension) and a star graph (vertices labelled with permutations of 1234)]
Super-peer Topology: HyperCuP
Super-peers are arranged as hypercube
Broadcast needs n-1 messages, log2(n) hops
High connectivity, resilient against node failures
[Figure: eight super-peers SP1 to SP8 arranged as a hypercube, with the minimal spanning tree used for broadcast]
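The n-1 messages and log2(n) hops claimed for broadcast can be checked with a small simulation. This is a sketch of one common dimension-ordered scheme and is not claimed to be the exact HyperCuP protocol: nodes are numbered 0 to n-1, flipping bit k of a node number crosses its dimension-k edge, and a node forwards only over dimensions lower than the one it received the message on.

```python
from collections import deque

def hypercube_broadcast(dimensions, source=0):
    """Broadcast in a hypercube with 2**dimensions nodes.

    Returns (number of messages sent, {node: hop count from source}).
    A node that received the message over dimension k forwards it only over
    dimensions lower than k, so every node is reached exactly once.
    """
    hops = {source: 0}
    messages = 0
    queue = deque([(source, dimensions)])   # the source may use every dimension
    while queue:
        node, received_on = queue.popleft()
        for k in range(received_on):
            neighbour = node ^ (1 << k)     # flip bit k: cross the dimension-k edge
            hops[neighbour] = hops[node] + 1
            messages += 1
            queue.append((neighbour, k))
    return messages, hops

messages, hops = hypercube_broadcast(3)     # n = 8 super-peers
```

For the eight-node case this yields 7 messages and a longest path of 3 hops, matching n-1 and log2(n).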
Super-Peer-based Query Routing
Database fragment summaries
Index structure and maintenance
Query Routing
Peer Fragment Summaries
Peer1.Doc
Identifier  Title                  Date  Format  Language
521354021   Csdoi sdofi sfi sfdsf  1948  Book    de
593574021   Deor aodfi sdfwe dls   1952  Book    de
534536021   Toid sdofij cvcdova    1937  Book    de
528943021   Csdo asofdi weor       1916  Book    de
529874521   Epodsf csmieo mo       1924  Book    de
526983221   Awer fzwe xhzpwf       1959  Book    de

Peer2.Doc
Identifier  Title                Date  Language  Coverage
1861978766  Eoite odsifj woifj   1993  en        Scotland
1394875966  Oewr svonwe          2005  en        Wales
1817305606  Psadoifh sdafns dsf  1999  en        York
1809239086  Vsd sdfokj sfew      2001  en        West Midlands
1345398705  Wdfj vspo sdfp dort  1989  en        London

Peer1 summary
  Doc.Identifier
  Doc.Title
  Doc.Date [1916-1959]
  Doc.Format [Book]
  Doc.Language [de]
Peer2 summary
  Doc.Identifier
  Doc.Title
  Doc.Date [1989-2005]
  Doc.Language [en]
  Doc.Coverage [UK]
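A fragment summary of this kind can be derived mechanically from the rows a peer holds. The sketch below is an assumption about how such a summary might be computed, not Edutella's actual code: numeric attributes are condensed to a (min, max) range and categorical ones to a value set. Generalising Coverage values such as Scotland or Wales to UK would need a region hierarchy and is left out.

```python
def summarise(rows, range_attrs=("Date",), set_attrs=("Format", "Language")):
    """Condense a fragment into value ranges and value sets per attribute."""
    summary = {}
    for attr in range_attrs:
        values = [row[attr] for row in rows if attr in row]
        if values:
            summary[attr] = (min(values), max(values))
    for attr in set_attrs:
        values = {row[attr] for row in rows if attr in row}
        if values:
            summary[attr] = values
    return summary

# Date, Format, and Language columns of the Peer1.Doc fragment.
peer1_rows = [
    {"Date": 1948, "Format": "Book", "Language": "de"},
    {"Date": 1952, "Format": "Book", "Language": "de"},
    {"Date": 1937, "Format": "Book", "Language": "de"},
    {"Date": 1916, "Format": "Book", "Language": "de"},
    {"Date": 1924, "Format": "Book", "Language": "de"},
    {"Date": 1959, "Format": "Book", "Language": "de"},
]
peer1_summary = summarise(peer1_rows)
```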
Super-peer/Peer Indices
Peers forward summary to super-peer
Peer1 Summary
  Doc.Identifier
  Doc.Title
  Doc.Date [1916-1959]
  Doc.Format [Book]
  Doc.Language [de]
Peer2 Summary
  Doc.Identifier
  Doc.Title
  Doc.Date [1989-2005]
  Doc.Language [en]
  Doc.Coverage [UK]
Super-Peer1 SP/P Index
  Doc.Identifier        P1, P2
  Doc.Title             P1, P2
  Doc.Date [1916-1959]  P1
           [1989-2005]  P2
  Doc.Format [Book]     P1
  Doc.Language [de]     P1
               [en]     P2
  Doc.Coverage [UK]     P2
Super-Peer Fragment Summaries
Doc (data reachable via Super-Peer1)
Identifier  Title                  Date  Format  Language  Coverage
521354021   Csdoi sdofi sfi sfdsf  1948  Book    de
593574021   Deor aodfi sdfwe dls   1952  Book    de
534536021   Toid sdofij cvcdova    1937  Book    de
528943021   Csdo asofdi weor       1916  Book    de
529874521   Epodsf csmieo mo       1924  Book    de
526983221   Awer fzwe xhzpwf       1959  Book    de
1861978766  Eoite odsifj woifj     1993          en        Scotland
1394875966  Oewr svonwe            2005          en        Wales
1817305606  Psadoifh sdafns dsf    1999          en        York
1809239086  Vsd sdfokj sfew        2001          en        West Midlands
1345398705  Wdfj vspo sdfp dort    1989          en        London

SP1 Summary
  Doc.Identifier
  Doc.Title
  Doc.Date [1916-2005]
  Doc.Format [Book]
  Doc.Language [de, en]
  Doc.Coverage [UK]
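The super-peer summary is the combination of the summaries of its peers: ranges are widened and value sets are united. A minimal sketch under the assumption that ranges are stored as (low, high) tuples and value lists as sets; the function name `merge_summaries` is made up.

```python
def merge_summaries(summaries):
    """Combine peer summaries: widen ranges (tuples) and union value sets."""
    merged = {}
    for summary in summaries:
        for attr, value in summary.items():
            if attr not in merged:
                merged[attr] = value
            elif isinstance(value, tuple):
                low, high = merged[attr]
                merged[attr] = (min(low, value[0]), max(high, value[1]))
            else:
                merged[attr] = merged[attr] | value
    return merged

peer1 = {"Date": (1916, 1959), "Format": {"Book"}, "Language": {"de"}}
peer2 = {"Date": (1989, 2005), "Language": {"en"}, "Coverage": {"UK"}}
sp1 = merge_summaries([peer1, peer2])
```

Note that the merged range [1916-2005] also covers years such as 1970 for which neither peer has data, so a coarser summary trades precision for size.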
Super-peer/Super-peer Indices
[Figure: four super-peers SP1 to SP4 connected as a 2-dimensional hypercube with edge dimensions 0 and 1]
SP1 Summary
  Doc.Identifier
  Doc.Title
  Doc.Date [1916-2005]
  Doc.Format [Book]
  Doc.Language [de, en]
  Doc.Coverage [UK]
Super-Peer2 SP/SP Index
  Doc.Language [de]  SP1
               [en]  SP1
Super-Peer3 SP/SP Index
  Doc.Language [de]  SP1
               [en]  SP1
Super-Peer4 SP/SP Index
  Doc.Language [de]  SP2, SP3
               [en]  SP2, SP3
Naive forwarding is not optimal
Super-peer/Super-peer Indices
Take edge dimension into account
  Forward SP/SP index entries only along lower edges
[Figure: four super-peers SP1 to SP4 connected as a 2-dimensional hypercube with edge dimensions 0 and 1]
SP1 Summary
  ...
  Doc.Language [de, en]
  ...
Super-Peer2 SP/SP Index
  Doc.Language [de]  SP1 (0)
               [en]  SP1 (0)
Super-Peer3 SP/SP Index
  Doc.Language [de]  SP1 (1)
               [en]  SP1 (1)
Super-Peer4 SP/SP Index
  Doc.Language [de]  SP3 (0)
               [en]  SP3 (0)
Query Routing
Use SP/P and SP/SP indices as filters
SELECT * FROM Doc WHERE Language = 'de' AND …
Super-Peer1 SP/P Index
  Doc.Language [de]  P1
               [en]  P2
Super-Peer3 SP/SP Index
  Doc.Language [de]  SP1 (1)
               [en]  SP1 (1)
Super-Peer4 SP/SP Index
  Doc.Language [de]  SP3 (0)
               [en]  SP3 (0)
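Using the indices as filters amounts to following only those index entries that match the query condition. The sketch below copies the Doc.Language entries from the slide into plain dicts; the dict layout and the `route` function are assumptions for illustration, and edge dimensions are omitted.

```python
# Index contents copied from the slide; the dict layout itself is an assumption.
sp_p_index = {"SP1": {("Language", "de"): ["P1"], ("Language", "en"): ["P2"]}}
sp_sp_index = {
    "SP3": {("Language", "de"): ["SP1"], ("Language", "en"): ["SP1"]},
    "SP4": {("Language", "de"): ["SP3"], ("Language", "en"): ["SP3"]},
}

def route(super_peer, condition, path=()):
    """Follow matching index entries from super_peer; return (peers, path taken)."""
    path = path + (super_peer,)
    peers = list(sp_p_index.get(super_peer, {}).get(condition, []))
    for nxt in sp_sp_index.get(super_peer, {}).get(condition, []):
        if nxt not in path:                   # never send a query backwards
            found, path = route(nxt, condition, path)
            peers.extend(found)
    return peers, path

# A query with Language = 'de' entering the backbone at SP4.
peers, path = route("SP4", ("Language", "de"))
```

The query travels SP4, SP3, SP1 and is delivered to P1 only; P2 and SP2 never see it, and a condition with no index entry stops at the entry super-peer.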
Application: P2P Digital Library Network
Large number of individual DLs
Autonomous institutions
Users have to
  find relevant DLs
  search separately on every found DL
Violates 4th law of Library Science: "Save the time of the reader" (Ranganathan, 1931)
DL Search Engine Solution
Search engine approach
  'Crawl' DLs
  Copy content
  Offer unified collection
Issues
  Search engine controls content
  Proprietary interface (or just Web crawl)
  Difficult to preserve metadata
  Single point of failure
Open Archives Initiative Solution
Standardize metadata 'crawling' interface
  OAI-PMH (Protocol for Metadata Harvesting)
Harvesters
  collect metadata from DLs
  offer search facilities
Issues
  No single entry point
  Harvesters control content
  Points of failure
  Incentive for harvester?
From OAI to P2P
Create ‘peer wrapper’ for existing DLs
[Figure: digital libraries (content providers) attach through their OAI-PMH interface to a super-peer backbone]
OAI-P2P – a Digital Library Network
P2P approach:
  DLs form self-organized network
  User queries are distributed
Advantages
  No dependency on service provider
  Each DL still controls its content
  No single point of failure
5th law of Library Science: "The library is a growing organism" (Ranganathan, 1931)
Edutella – Discussion
Efficiently limits query distribution to relevant peers
Very good scalability in terms of data size
  No data movement required
  Little index maintenance effort
Flooding limits super-peer backbone scalability
  Will never scale to millions of peers
Mainly query forwarding
  Initial extension to full query planning exists
No load-balancing mechanisms