  • Peer-to-Peer Database Networks

    Wolf-Tilo Balke and Wolf Siberski

    L3S Research Center, University of Hannover

    19.12.2007

    Peer-to-Peer Systems and Applications, Springer LNCS 3485

    *with slides from J.M. Hellerstein (UC Berkeley), A. Halevy (U Washington), P. Raghavan (Stanford)

    Overview

    1. Why Peer-to-Peer Databases?
       1. Federation
       2. Information integration
       3. Sensor networks
       4. 'New' internet

    2. Distributed Databases

    3. P2P Databases
       1. Challenges
       2. Design Dimensions

    4. Existing P2P Database systems
       1. Edutella: focus on expressivity
       2. Piazza: focus on integration
       3. PIER: focus on scalability
       4. HiSbase: focus on scalability for spatial data



  • Federation of similar data providers

    Examples
      (Digital) libraries
      Primary scientific data providers (gene databases)
      News providers

    All nodes offer the same kind of information

    Homogeneous network (fixed schema)

    Non-P2P solutions exist, but they are not open or scalable

    Information Integration

    Examples
      Find German professors who have published at least three papers at the Conference on Very Large Databases
      Find an introductory database book in German, written by a German professor
      Find all recordings of Mozart's 'Magic Flute' with conductors who have also conducted the Berliner Philharmoniker

    Very tedious to find with current search engines

    Needs database-like querying capabilities

    Heterogeneous network: information from several databases needs to be combined

  • Sensor Networks

    Examples
      Network monitoring: network maps, event detections, ...
      Car traffic monitoring

    Huge number of nodes

    Small amount of data

    Homogeneous network

    [Figure: screenshots from project PHI presentation, J. Hellerstein, Berkeley]


  • Recap: Database basics

    Relational model
      Data stored in tables
      Database described by schema

    Doc

    Identifier   Title                 Date   Language
    1861978766   Eoite odsifj woifj    1993   en
    1394875966   Oewr svonwe           2005   en
    1817305606   Psadoifh sdafns dsf   1999   en
    1809239086   Vsd sdfokj sfew       2001   en
    1345398705   Wdfj vspo sdfp dort   1989   en

    Schema
      Doc(Id, Title, Date, Language)
      Author(DocId, PersonId)
      Person(Id, Name, Surname)

    Relational Queries

    Selection
      Selects rows (by conditions)
      Notation: $\sigma_{Lang=\text{"en"}}(Doc)$

    Projection
      Selects columns (by name)
      Notation: $\pi_{Id,\,Title}(Doc)$

    Join
      Creates new table (view) by combining existing ones
      Notation: $Doc \bowtie_{Doc.Id=Author.DocId} Author$
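    The three operators can be sketched over plain Python lists of dictionaries. This is only an illustration: the rows and the helper names `select`, `project`, and `join` are made up here, and the join is a naive nested loop rather than anything a real database engine would use.

```python
# Toy tables loosely following the slide's schema; the rows are invented.
doc = [
    {"Id": 1, "Title": "Critique", "Lang": "en"},
    {"Id": 2, "Title": "Kritik", "Lang": "de"},
]
author = [
    {"DocId": 1, "PersonId": 10},
    {"DocId": 2, "PersonId": 10},
]

def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, columns):
    """Projection: keep only the named columns."""
    return [{c: row[c] for c in columns} for row in rows]

def join(left, right, left_key, right_key):
    """Equi-join via nested loops; right-hand columns win on name clashes."""
    return [{**l, **r} for l in left for r in right if l[left_key] == r[right_key]]

english = select(doc, lambda row: row["Lang"] == "en")
titles = project(doc, ["Id", "Title"])
doc_author = join(doc, author, "Id", "DocId")
```

    Each helper returns a new list, so the operators compose in the same way as the algebraic notation.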

  • Relational Query Plans

    Schema
      Doc(Id, Title, Lang, Subject)
      Author(DocId, PersonId)
      Person(Id, Name, Surname)

    Algebraic representation of a query

    SELECT Doc.Id, Doc.Title
    FROM Doc, Person, Author
    WHERE Doc.Lang = 'en' AND
          Author.DocId = Doc.Id AND
          Author.PersonId = Person.Id AND
          Person.Name = 'Kant'
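    One possible algebraic form of this query is shown below. It is a sketch that assumes both selections are pushed below the joins; other join orders give equivalent plans.

```latex
\pi_{Doc.Id,\,Doc.Title}\Big(
  \big(\sigma_{Lang=\text{'en'}}(Doc) \bowtie_{Doc.Id=Author.DocId} Author\big)
  \bowtie_{Author.PersonId=Person.Id}
  \sigma_{Name=\text{'Kant'}}(Person)
\Big)
```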

    Logical vs. Physical Data Model

    Logical
      Relational model
      Queries specified on schema
      No information about storage format

    Physical
      Schema mapped to disk/memory structure, including index structures
      Queries mapped to execution plans: relational operators are mapped to executable database operations

  • Logical vs. Physical Query Operators

    [Figure: logical query operators next to their physical counterparts]

    Query Execution Components

    [Figure: query plan generation. Components: Parser, Rewriter/Optimizer, Code Generator, Query Execution Engine. Artifacts: Logical Query Plan, Physical Query Plan, Executable Query Plan. Data: Catalog (meta data), Base Data]

  • Query Plan Search Strategies

    Exhaustive (with pruning)
      A fixed set of techniques for each relational operator
      Search space = "all" possible query execution plans (QEPs) with this set of techniques
      Prune search space using heuristics
      Choose minimum cost QEP from rest of search space

    Hill climbing (greedy)

    Hill Climbing

    [Figure: search path from the initial plan x through steps 1 and 2]

    Begin with initial feasible QEP

    At each step, generate a set S of new QEPs by applying 'transformations' to current QEP

    Evaluate cost of each QEP in S

    Stop if no improvement is possible

    Otherwise, replace current QEP by the minimum cost QEP from S and iterate
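    The loop described above can be sketched generically. The function name `hill_climb` and the toy problem (integer "plans" whose cost is the distance from 7) are assumptions made for illustration; a real optimizer would plug in plan transformations and a cost model.

```python
def hill_climb(initial_plan, neighbors, cost):
    """Greedy search: move to the cheapest neighbor until none is cheaper."""
    current = initial_plan
    while True:
        candidates = neighbors(current)      # the set S of transformed plans
        if not candidates:
            return current
        best = min(candidates, key=cost)
        if cost(best) >= cost(current):      # no improvement possible: stop
            return current
        current = best                       # otherwise iterate from the best plan

# Hypothetical toy problem: a "plan" is an integer, its cost the distance from 7.
def toy_cost(plan):
    return abs(plan - 7)

def toy_neighbors(plan):
    return [plan - 1, plan + 1]

result = hill_climb(0, toy_neighbors, toy_cost)
```

    As with any greedy search, the result is only a local minimum; on this toy problem the local and global minimum coincide.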

  • Hill-Climbing: Example

    Goal: minimize communication cost

    Rel.   Site   # of tuples
    R      1      10
    S      2      20
    T      3      30
    V      4      40

    [Figure: relations R, S, T, V connected by join attributes A, B, C]

    Initial plan: send all relations to one site
      To site 1: cost = 20+30+40 = 90
      To site 2: cost = 10+30+40 = 80
      To site 3: cost = 10+20+40 = 70
      To site 4: cost = 10+20+30 = 60

    Transformation: send a relation to its neighbor
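    The initial-plan costs can be recomputed in a few lines. The sketch assumes, as the slide's arithmetic suggests, that the cost of a plan is simply the number of tuples shipped to the target site; the names are illustrative.

```python
# Site -> number of tuples stored there (relations R, S, T, V from the table).
tuples_at_site = {1: 10, 2: 20, 3: 30, 4: 40}

def ship_all_cost(target_site):
    """Tuples that must be sent when every relation is moved to target_site."""
    return sum(n for site, n in tuples_at_site.items() if site != target_site)

costs = {site: ship_all_cost(site) for site in tuples_at_site}
best_site = min(costs, key=costs.get)
```

    Shipping everything to site 4 is cheapest because the largest relation then stays where it is.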

    Distributed Databases

    Database fragmentation
      Horizontal (row distribution)
      Vertical (column and/or table distribution)
      Assumes central coordinator

    Distributing databases (top-down)
      Have a database
      How to split and allocate to individual sites

    Integrating databases (bottom-up)
      Combine existing databases
      How to deal with heterogeneity & autonomy
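    A minimal sketch of the two fragmentation styles follows. The helper names and the choice of `Language` as the horizontal split key are assumptions for illustration; vertical fragments repeat the primary key so that the table can be rebuilt by a join.

```python
rows = [
    {"Identifier": 521354021, "Date": 1948, "Language": "de"},
    {"Identifier": 1861978766, "Date": 1993, "Language": "en"},
    {"Identifier": 1394875966, "Date": 2005, "Language": "en"},
]

def horizontal_fragments(rows, key):
    """Row distribution: group whole rows by the value of one column."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments

def vertical_fragment(rows, primary_key, columns):
    """Column distribution: keep the primary key plus the chosen columns."""
    return [{primary_key: r[primary_key], **{c: r[c] for c in columns}} for r in rows]

by_language = horizontal_fragments(rows, "Language")
dates = vertical_fragment(rows, "Identifier", ["Date"])
languages = vertical_fragment(rows, "Identifier", ["Language"])

# Reconstruction: union of horizontal fragments, key join of vertical fragments.
rebuilt_h = [row for fragment in by_language.values() for row in fragment]
language_by_id = {r["Identifier"]: r for r in languages}
rebuilt_v = [{**d, **language_by_id[d["Identifier"]]} for d in dates]
```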

  • Fragmentation: Local View

    Relational database at each node

    Examples (Dublin Core Metadata)

    Node1.Doc

    Identifier   Title                   Date   Format   Language
    521354021    Csdoi sdofi sfi sfdsf   1948   Book     de
    593574021    Deor aodfi sdfwe dls    1952   Book     de
    534536021    Toid sdofij cvcdova     1937   Book     de
    528943021    Csdo asofdi weor        1916   Book     de
    529874521    Epodsf csmieo mo        1924   Book     de
    526983221    Awer fzwe xhzpwf        1959   Book     de

    Node2.Doc

    Identifier   Title                 Date   Language   Coverage
    1861978766   Eoite odsifj woifj    1993   en         Scotland
    1394875966   Oewr svonwe           2005   en         Wales
    1817305606   Psadoifh sdafns dsf   1999   en         York
    1809239086   Vsd sdfokj sfew       2001   en         West Midlands
    1345398705   Wdfj vspo sdfp dort   1989   en         London

    Fragmentation: Distributed View

    Table fragments distributed over nodes

    Doc

    Identifier   Title                   Date   Format   Language   Coverage
    521354021    Csdoi sdofi sfi sfdsf   1948   Book     de
    593574021    Deor aodfi sdfwe dls    1952   Book     de
    534536021    Toid sdofij cvcdova     1937   Book     de
    528943021    Csdo asofdi weor        1916   Book     de
    529874521    Epodsf csmieo mo        1924   Book     de
    526983221    Awer fzwe xhzpwf        1959   Book     de
    1861978766   Eoite odsifj woifj      1993            en         Scotland
    1394875966   Oewr svonwe             2005            en         Wales
    1817305606   Psadoifh sdafns dsf     1999            en         York
    1809239086   Vsd sdfokj sfew         2001            en         West Midlands
    1345398705   Wdfj vspo sdfp dort     1989            en         London

    [Figure: the rows of this table are split into fragments held by Peer1 to Peer4]

    Goal: View network as one logical database

  • Distributed Query Processing

    Parsing
      Given SQL query, generate one or more algebraic query trees

    Localization
      Rewrite query trees, replacing relations by fragments

    Optimization
      Given cost model + one or more localized query trees
      Produce minimum cost query execution plan

    Distributed Databases

    [Figure: a mediator (Parser, Rewriter/Optimizer, Code Generator, Query Execution Engine, Catalog) exchanges partial query plans with several data providers, each with its own Parser, Query Execution Engine (QEE), Catalog, and Base Data]

  • Distributed Query Plan

    [Figure: distributed query plan over Document(Id, Title, Lang, Subject), Author(DocId, PersonId), Person(Id, Name, Surname)]

    Issues

    Communication costs
      Connection characteristics
      Expected size of (intermediate) answers

    Node characteristics
      Size of fragments
      Storage capacity, storage cost at sites
      Processing power at the nodes

    Query processing strategy
      How are joins done?
      Where are answers collected?

    Fragment replication
      Update cost
      Concurrency control overhead

  • Query Optimization

    Generate query execution plans

    Estimate cost of each plan

    Choose minimum cost plan

    What's different for distributed DBs?
      New strategies for some operations (join, sort, aggregation, etc.)
      Many ways to assign and schedule processors
      Some factors besides number of I/Os in the cost model

    Cost estimation

    In centralized systems: estimate sizes of intermediate relations

    For distributed systems
      Transmission cost/time may dominate
      Account for parallelism
      Data distribution and result re-assembly cost/time

    [Figure: Plan A and Plan B compared by work at each site (T1, T2) and answer assembly, with costs of 50, 100, 70, and 20 IOs]
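    Why parallelism matters for cost estimation can be shown with a small sketch. The two plans and all I/O numbers below are hypothetical (they are not the values from the slide's figure), and the model assumes that sites work fully in parallel and that the answer is assembled afterwards.

```python
def total_work(site_ios, assembly_ios):
    """Sum of all I/Os, regardless of where and when they happen."""
    return sum(site_ios) + assembly_ios

def response_time(site_ios, assembly_ios):
    """Elapsed I/Os when all sites run in parallel, then results are assembled."""
    return max(site_ios) + assembly_ios

# Hypothetical plans: A runs on one site, B splits the work over two sites.
plan_a = {"site_ios": [100], "assembly_ios": 0}
plan_b = {"site_ios": [60, 60], "assembly_ios": 20}

work_a, work_b = total_work(**plan_a), total_work(**plan_b)
time_a, time_b = response_time(**plan_a), response_time(**plan_b)
```

    Under these assumptions plan B does more total work but answers sooner, so the choice depends on whether the optimizer minimizes total cost or response time.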

  • Optimization in distributed DBs

    Two levels of optimization

    Global optimization
      Given localized query and cost function
      Output optimized (min. cost) query plan that includes relational and communication operations on fragments

    Local optimization
      At each site involved in query execution
      Portion of the query plan at a given site is optimized using techniques from centralized DB systems


  • Challenges of Schema-Based Peer-to-Peer Networks

    Multi-dimensional search space
      DHTs only work for one dimension (one attribute)

    Schema heterogeneity
      Sources use different database schemas for similar information

    Potentially large result sets
      SELECT * FROM Firewalls.BlockedPackets ...

    Range and aggregate queries

    And the usual P2P challenges...
      Trust
      Network churn
      Unbalanced popularity

    Design Dimensions

    Network properties
      Data placement
      Topology and routing

    Data access
      Data model
      Query language

    Integration mechanism
      Mapping representation
      Mapping creation
      Integration method

  • Data Placement

    Placement according to ownership
      Data stays at information source
      Full control of data by owner (access policy, availability, etc.)
      More autonomy of single nodes

    Placement according to search strategy
      Data is distributed according to later access mechanism (e.g., DHT)
      No control over data access
      More freedom to optimize query routing

    Additional caching/replication possible
      Essential for load balancing

    Topology and Routing (1)

    Unstructured networks
      Flooding as routing algorithm
      Supports arbitrarily expressive queries
      Agnostic to schema heterogeneity
      Inefficient (filtered flooding can help)

    Short-cut networks
      Unstructured, but continuously optimize network connections
      Can develop into regular structures like Small-World networks
      Clustering + filtered flooding reduces query distribution
      Fireworks routing
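    The difference between plain and filtered flooding can be sketched on a small graph. The topology, the `forward` filter, and the set `has_german_docs` are made up for illustration; the sketch also simplifies real flooding by letting a peer send a message back to the peer it received the query from.

```python
from collections import deque

def flood(graph, start, forward=lambda sender, receiver: True):
    """Flood a query from start; return (peers reached, messages sent)."""
    reached = {start}
    messages = 0
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for neighbor in graph[peer]:
            if not forward(peer, neighbor):
                continue                  # filtered flooding: skip this link
            messages += 1
            if neighbor not in reached:
                reached.add(neighbor)
                queue.append(neighbor)
    return reached, messages

# Hypothetical unstructured network given as adjacency lists.
graph = {"a": ["b", "c"], "b": ["a", "c", "d"], "c": ["a", "b"], "d": ["b"]}

all_peers, all_messages = flood(graph, "a")

# Assumed routing knowledge: only these peers can answer the query.
has_german_docs = {"a", "b", "d"}
some_peers, some_messages = flood(graph, "a", lambda s, r: r in has_german_docs)
```

    The filter halves the message count on this toy graph, at the price of maintaining some knowledge about what neighbors can answer.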

  • Topology and Routing (2)

    Super-peer networks
      Inherit advantages and disadvantages of unstructured networks
      Better efficiency and scaling (but still flooding)
      Good match to distributed databases (super-peers become mediators)

    DHT networks
      Create separate overlay for each attribute
      Or use multidimensional DHTs, e.g. Mercury
      Limited query expressivity
      Suitable for homogeneous schema
      Not all queries are evaluated efficiently
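    Why hashing favors exact-match queries over range queries can be seen in a short sketch. It is not a real DHT: there is no overlay routing, `NUM_PEERS` is a hypothetical network size, and simple modulo placement stands in for consistent hashing.

```python
import hashlib

NUM_PEERS = 8  # hypothetical number of peers in the overlay

def responsible_peer(attribute, value):
    """Map an (attribute, value) pair to the peer that stores matching rows."""
    digest = hashlib.sha1(f"{attribute}={value}".encode()).hexdigest()
    return int(digest, 16) % NUM_PEERS

# Exact match on one attribute: a single peer is responsible.
peer_for_de = responsible_peer("Language", "de")

# Range query on Date: neighboring years hash to unrelated peers,
# so every year in the range needs its own lookup.
peers_for_range = {responsible_peer("Date", year) for year in range(1916, 1960)}
```

    Systems such as Mercury avoid this by using order-preserving placement instead of a uniform hash.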

    Topology and Routing - Summary

    Local indexing
      No knowledge about other peers
      Doesn't scale

    Central indexing
      One node holds complete index
      Single point of control (and failure)

    Distributed indexing
      Distributed Hash Tables
      Filtered flooding
      Short-cut networks
      Super-peer networks

  • Data Model

    Fixed set of attributes
      Allows for sophisticated topologies
      Inflexible
      Applicability: custom applications

    Relational model
      usual database model
      not designed for distribution

    XML

    RDF
      Semantic Web exchange format
      very suitable for distributed data

    Query Language

    None
      Fixed set of parameterized queries

    Relational query language
      Always subset of SQL

    XML query language
      XPath or XQuery

    RDF query language
      SPARQL or its predecessors

    Logic language

  • Mapping Representation

    Declarative
      Translation between schema elements
      Distributed database approaches applicable

    Procedural
      Imperative description of how to translate/transform queries and data

    Mapping characteristics
      Unidirectional or bidirectional
      Simple (one-to-one) mapping or complex mappings

    Mapping of objects
      state equality of objects in different sources

    Mapping Creation

    Manual
      Users create mappings
      Network distributes mappings and uses them for translation

    Semi-automatic
      System proposes mappings, based on heuristics: attribute name, similar data
      User feedback used to validate created mappings

    Automatic
      e.g., probabilistic mapping
      similar techniques as for semi-automatic mapping

  • Integration Mechanism

    Query rewriting
      Query is translated to target schema
      Data is translated back to source schema
      Most common approach

    Data rewriting
      Data is replicated to source schema
      Only feasible for small data sets
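    Query rewriting with a simple one-to-one mapping might look as follows. The attribute names, the mapping, and the sample rows are hypothetical; real systems need complex mappings and a proper query language rather than equality conditions in a dictionary.

```python
# Hypothetical one-to-one attribute mapping: source schema -> target schema.
mapping = {"Language": "lang", "Title": "name"}
reverse = {target: source for source, target in mapping.items()}

def rewrite_query(conditions, mapping):
    """Translate equality conditions into the target schema."""
    return {mapping[attribute]: value for attribute, value in conditions.items()}

def rewrite_result(row, reverse):
    """Translate an answer row back into the source schema."""
    return {reverse.get(attribute, attribute): value for attribute, value in row.items()}

# Data held by the remote peer, in its own (target) schema.
target_rows = [
    {"name": "Kritik", "lang": "de"},
    {"name": "Critique", "lang": "en"},
]

query = {"Language": "de"}                       # posed in the source schema
target_query = rewrite_query(query, mapping)     # evaluated at the remote peer
matches = [r for r in target_rows if all(r[a] == v for a, v in target_query.items())]
answers = [rewrite_result(r, reverse) for r in matches]
```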

    Existing Systems - Typology

    Focus on network scalability
      homogeneous schema
      low query expressivity
      DHT as underlying network structure

    Focus on expressivity
      super-peer or unstructured
      unlimited query complexity

    Focus on integration
      typically unstructured
      query routing driven by mappings

  • Existing Systems – Overview

    Focus          Name       Topology         Data Placement   Data Model      Query Language
    Scalability    PIER       DHT (Bamboo)     Distributed      Relational      SQL subset
                   RDFPeers   DHT (MAAN)       Distributed      RDF             -
                   Mercury    DHT (Symphony)   Distributed      Tuples          -
    Expressivity   SQPeer     Super-peer       Owner            RDF             RQL
                   PeerDB     Unstructured     Owner            Relational      SQL subset
                   Edutella   Super-peer       Owner            RDF             datalog (SQL)
    Integration    Piazza     Unstructured     Owner            XML             XQuery subset
                   GridVine   DHT (P-Grid)     Distributed      RDF             -
                   DRAGO      Unstructured     Owner            Descr. Logics   OWL subset

    List not complete


  • Edutella: Introduction

    Initial goal: achieve interoperability between heterogeneous metadata-driven (e-learning) systems

    Provides metadata only, not the resources
      Resources are fetched via HTTP

    Query examples
      "Find software engineering course lecture notes for undergraduates in German language"
      "Find an introduction to Enterprise Java Beans for professionals"
      "Find a distance course in software requirements analysis from a Swedish university"

    Query Service

    Provides standardized query/retrieval of RDF metadata stored in distributed RDF repositories

    Query Exchange Language (QEL)
      Based on Datalog (allows expression of rules)
      RDF syntax
      For exchange only

    Adapters to enable QEL query processing on diverse backends

  • Query processing

    Parsers/Formatters convert between query languages

    Applications and backends are shielded from communication layer

    Query messages are exchanged in RDF/XML format

    [Figure: components Edutella Consumer Interface, Query Parser, Edutella Provider Interface, Query Formatter]

    Wrappers available for SQL, RDQL, RQL, and others

    Edutella Topology

    [Figure: super-peers, content providers, content consumers]

    Use filtered flooding in super-peer backbone

    HyperCuP topology for backbone

  • Cayley Graphs

    Graph representing a permutation group G, described by a set of generators

    Regular, vertex-symmetric, recursively decomposable

    Optimal routing and broadcast algorithms exist

    [Figure: two example Cayley graphs, a hypercube with edges labeled by dimension (0, 1, 2) and a star graph whose nodes are the permutations of 1234]

    Super-peer Topology: HyperCuP

    Super-peers are arranged as a hypercube

    Broadcast needs n-1 messages and $\log_2(n)$ hops

    High connectivity, resilient against node failures

    [Figure: hypercube of super-peers SP1 to SP8 with the minimal spanning tree used for broadcast]
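    The message and hop counts can be checked with a small simulation. It assumes a complete hypercube whose nodes are numbered by bit strings, and a forwarding rule in which a node that received the message along dimension d forwards it only along dimensions lower than d; the function name is made up for this sketch.

```python
def hypercube_broadcast(dimensions, origin=0):
    """Simulate a broadcast in a hypercube with 2**dimensions nodes.

    Returns (messages sent, maximum hop count, hop count per node).
    """
    hops = {origin: 0}
    messages = 0
    # Each entry: (node, number of low dimensions it may still forward along).
    pending = [(origin, dimensions)]
    while pending:
        node, limit = pending.pop()
        for d in range(limit):
            neighbor = node ^ (1 << d)      # flip bit d to cross dimension d
            messages += 1
            hops[neighbor] = hops[node] + 1
            pending.append((neighbor, d))   # receiver forwards along lower dims only
    return messages, max(hops.values()), hops

messages, max_hops, hops = hypercube_broadcast(3)   # n = 8 super-peers
```

    Every node receives the message exactly once, which is why no duplicate suppression is needed in the backbone.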

  • Super-Peer-based Query Routing

    Database fragment summaries

    Index structure and maintenance

    Query routing

    Peer Fragment Summaries

    Peer1.Doc

    Identifier   Title                   Date   Format   Language
    521354021    Csdoi sdofi sfi sfdsf   1948   Book     de
    593574021    Deor aodfi sdfwe dls    1952   Book     de
    534536021    Toid sdofij cvcdova     1937   Book     de
    528943021    Csdo asofdi weor        1916   Book     de
    529874521    Epodsf csmieo mo        1924   Book     de
    526983221    Awer fzwe xhzpwf        1959   Book     de

    Peer1 summary
      Doc.Identifier
      Doc.Title
      Doc.Date [1916-1959]
      Doc.Format [Book]
      Doc.Language [de]

    Peer2.Doc

    Identifier   Title                 Date   Language   Coverage
    1861978766   Eoite odsifj woifj    1993   en         Scotland
    1394875966   Oewr svonwe           2005   en         Wales
    1817305606   Psadoifh sdafns dsf   1999   en         York
    1809239086   Vsd sdfokj sfew       2001   en         West Midlands
    1345398705   Wdfj vspo sdfp dort   1989   en         London

    Peer2 summary
      Doc.Identifier
      Doc.Title
      Doc.Date [1989-2005]
      Doc.Language [en]
      Doc.Coverage [UK]
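    A fragment summary of this kind could be derived roughly as follows. The function `summarize` and its parameters are assumptions for illustration; the sketch keeps value ranges for numeric columns and value sets for categorical ones, and it does not attempt the generalization step that turns individual regions into Doc.Coverage [UK].

```python
def summarize(rows, range_columns=(), set_columns=()):
    """Summarize a fragment: (min, max) per range column, value set per set column."""
    summary = {}
    for column in range_columns:
        values = [row[column] for row in rows if column in row]
        if values:
            summary[column] = (min(values), max(values))
    for column in set_columns:
        values = {row[column] for row in rows if column in row}
        if values:
            summary[column] = values
    return summary

# A subset of the Peer1 rows, reduced to the summarized columns.
peer1_rows = [
    {"Date": 1948, "Format": "Book", "Language": "de"},
    {"Date": 1916, "Format": "Book", "Language": "de"},
    {"Date": 1959, "Format": "Book", "Language": "de"},
]
peer1_summary = summarize(peer1_rows, ["Date"], ["Format", "Language"])
```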

  • Super-peer/Peer Indices

    Peers forward summary to super-peer

    Peer1 Summary
      Doc.Identifier
      Doc.Title
      Doc.Date [1916-1959]
      Doc.Format [Book]
      Doc.Language [de]

    Peer2 Summary
      Doc.Identifier
      Doc.Title
      Doc.Date [1989-2005]
      Doc.Language [en]
      Doc.Coverage [UK]

    Super-Peer1 SP/P Index

    Attribute        Value         Peers
    Doc.Identifier                 P1, P2
    Doc.Title                      P1, P2
    Doc.Date         [1916-1959]   P1
                     [1989-2005]   P2
    Doc.Format       [Book]        P1
    Doc.Language     [de]          P1
                     [en]          P2
    Doc.Coverage     [UK]          P2

    Super-Peer Fragment Summaries

    [Figure: the combined Doc table with all rows of Peer1 and Peer2, as seen by Super-Peer1]

    SP1 Summary
      Doc.Identifier
      Doc.Title
      Doc.Date [1916-2005]
      Doc.Format [Book]
      Doc.Language [de, en]
      Doc.Coverage [UK]

  • Super-peer/Super-peer Indices

    [Figure: four super-peers arranged as a hypercube; edges SP1-SP2 and SP3-SP4 have dimension 0, edges SP1-SP3 and SP2-SP4 have dimension 1]

    SP1 Summary
      Doc.Identifier
      Doc.Title
      Doc.Date [1916-2005]
      Doc.Format [Book]
      Doc.Language [de, en]
      Doc.Coverage [UK]

    SP/SP index entries for Doc.Language [de] and [en] when summaries are forwarded to all neighbors
      Super-Peer2 SP/SP Index: SP1
      Super-Peer3 SP/SP Index: SP1
      Super-Peer4 SP/SP Index: SP2, SP3

    Naive forwarding is not optimal

    Super-peer/Super-peer Indices

    Take edge dimension into account: forward SP/SP index entries only along lower edges

    SP/SP index entries for Doc.Language [de] and [en], with the edge dimension in parentheses
      Super-Peer2 SP/SP Index: SP1 (0)
      Super-Peer3 SP/SP Index: SP1 (1)
      Super-Peer4 SP/SP Index: SP3 (0)

  • Query Routing

    Use SP/P and SP/SP indices as filters

    SELECT * FROM Doc WHERE Language = 'de' AND …

    [Figure: the query is routed along the index entries for Doc.Language]
      Super-Peer4 SP/SP Index: [de], [en] via SP3 (0)
      Super-Peer3 SP/SP Index: [de], [en] via SP1 (1)
      Super-Peer1 SP/P Index: [de] at P1, [en] at P2
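    Using an index as a routing filter can be sketched as a set intersection. The index contents follow the Super-Peer1 SP/P index shown earlier; the `route` function and the restriction to equality conditions are assumptions made for this illustration.

```python
# SP/P index of Super-Peer1: (attribute, value) -> peers holding such data.
sp_p_index = {
    ("Doc.Language", "de"): {"P1"},
    ("Doc.Language", "en"): {"P2"},
    ("Doc.Format", "Book"): {"P1"},
    ("Doc.Coverage", "UK"): {"P2"},
}

def route(index, conditions, all_targets):
    """Return the targets that may hold matches for every equality condition."""
    targets = set(all_targets)
    for attribute, value in conditions.items():
        targets &= index.get((attribute, value), set())
    return targets

german = route(sp_p_index, {"Doc.Language": "de"}, {"P1", "P2"})
nobody = route(sp_p_index, {"Doc.Language": "de", "Doc.Coverage": "UK"}, {"P1", "P2"})
```

    The same function works for an SP/SP index if the targets are neighboring super-peers instead of peers.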

    Application: P2P Digital Library Network

    Large number of individual digital libraries (DLs)

    Autonomous institutions

    Users have to
      find relevant DLs
      search separately on every DL found

    Violates 4th law of Library Science: "Save the time of the reader" (Ranganathan, 1931)

  • DL Search Engine Solution

    Search engine approach
      'Crawl' DLs
      Copy content
      Offer unified collection

    Issues
      Search engine controls content
      Proprietary interface (or just Web crawl)
      Difficult to preserve metadata
      Single point of failure

    Open Archives Initiative Solution

    Standardize metadata 'crawling' interface
      OAI-PMH (Protocol for Metadata Harvesting)

    Harvesters
      collect metadata from DLs
      offer search facilities

    Issues
      No single entry point
      Harvesters control content
      Points of failure
      Incentive for harvester?
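    A harvester request in OAI-PMH is an ordinary HTTP request with a `verb` parameter. The sketch below only builds such a URL; the repository endpoint `https://dl.example.org/oai` and the helper name are hypothetical, while `ListRecords`, `metadataPrefix`, `from`, and the `oai_dc` (Dublin Core) prefix are part of the protocol.

```python
from urllib.parse import urlencode

BASE_URL = "https://dl.example.org/oai"  # hypothetical repository endpoint

def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL for selective harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date       # only records changed since this date
    return f"{base_url}?{urlencode(params)}"

url = list_records_url(BASE_URL, from_date="2007-01-01")
```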

  • From OAI to P2P

    Create 'peer wrapper' for existing DLs

    [Figure: digital libraries (content providers) attach through their OAI-PMH interface to a super-peer backbone]

    OAI-P2P – a Digital Library Network

    P2P approach
      DLs form self-organized network
      User queries are distributed

    Advantages
      No dependency on service provider
      Each DL still controls its content
      No single point of failure

    5th law of Library Science: "The library is a growing organism" (Ranganathan, 1931)

  • Edutella – Discussion

    Efficiently limits query distribution to relevant peers

    Very good scalability in terms of data size
      No data movement required
      Little index maintenance effort

    Flooding limits super-peer backbone scalability
      Will never scale to millions of peers

    Mainly query forwarding
      Initial extension to full query planning exists

    No load-balancing mechanisms
