©2012, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Distributed Data Management Part 3 - Peer-2-Peer Systems (cont)

P2P Systems



Page 2: P2P Systems

Overview

1. P2P Systems and Resource Location
2. Unstructured P2P Overlay Networks
3. Hierarchical P2P Overlay Networks
4. Structured P2P Overlay Networks
5. Small World Graphs
6. P2P Data Management

Page 3: P2P Systems

4. Structured P2P Overlay Networks

• Unstructured overlay networks – what we learned
  – simplicity (simple protocol)
  – robustness (almost impossible to “kill” – no central authority)
• Performance
  – search latency O(log n), n number of peers
  – update and maintenance cost low

Page 4: P2P Systems

©2014, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis

Structured P2P Overlay Networks

• Drawbacks (of unstructured overlay networks)
  – high bandwidth consumption for search
  – free riding

• Can we do better?

Page 5: P2P Systems

Efficient Resource Location

[Figure: resource location approaches compared by search cost, maximal bandwidth and update cost, each ranging from low to high. Approaches shown: unstructured P2P overlay networks (e.g. Gnutella); server, superpeers (e.g. Napster); full replication; structured P2P overlay networks (e.g. prefix routing).]

Page 6: P2P Systems

Structured P2P Overlay Networks

• Goal: efficient search using few messages without designated servers
• Easy: distribution of index information over all peers
  – every peer maintains and provides part of the index information (k, p)
• Difficult: distributing the access structure to support efficient search
  – realized by an “overlay network”

Page 7: P2P Systems

Structured P2P Overlay Networks

• Problem illustration:

[Figure: a server holding the index information I and the access structure for peers that store resources, contrasted with peers that store resources and the index information partitions I1, I2, I3, I4; the search starts at one of these peers.]

• Where to start the search?
• How to locate the index information?
• Overlay network = access structure

Page 8: P2P Systems

Example: Scalable Distributed Tries (P-Grid)

• Search trie: search keys are binary keys

[Figure: binary search trie as access structure. Root ???; inner nodes 0??, 1?? and 00?, 01?, 10?, 11?; leaves 000 to 111 holding the index items. A query for key 101 ("101?") is passed down the trie until the matching leaf is reached ("101!").]

Page 9: P2P Systems

Non-scalable Distribution of Search Tree

• Distribute search tree over peers

[Figure: the trie (root ???; inner nodes 0??, 1??, 00?, 01?, 10?, 11?; leaves 000 to 111) partitioned among peer 1 to peer 4; one part of the tree is marked as a bottleneck.]

Page 10: P2P Systems

Scalable Distribution of Search Tree

• Associate each peer with a complete path

[Figure: the trie (root ???; inner nodes 0??, 1??, 00?, 01?, 10?, 11?; leaves 000 to 111) with peer 1 to peer 4 each covering a complete path; annotation: "Napster" bottleneck.]

Page 11: P2P Systems

Routing Information

[Figure: a path through the trie (???, 1??, 10?, leaves 100 and 101) with peer 1 to peer 4 attached to different subtrees; annotations on the referenced peers: "knows more about this part of the tree".]

Page 12: P2P Systems

Prefix Routing

[Figure: peer 4 (path ???, 1??, 11?, leaves 110 and 111) receives the query 101? and sends a message to peer 3 (path ???, 1??, 10?, leaves 100 and 101), where the key is found (101!).]

Routing table of peer 4:

prefix   peer
0??      peer 1, peer 2
10?      peer 3

Page 13: P2P Systems

P-Grid Routing Tables and Search

Peer with path p = p_1,..,p_l, p_i ∈ {0,1}, stores a routing table with:
• for each prefix p_1,..,p_j, j=1,..,l, a constant number r of references to peers whose path starts with p_1,..,p_(j-1),(1-p_j)
• a constant number r of references to replicas with the same path

search(p, k)
  if k = path(p) then
    return(p)  // found
  else
    find in routing table peer_i with longest prefix matching k
    search(peer_i, k)

Example: routing table of a peer with path 01101

own prefix   complementary prefix   references
0            1                      P1: 100, P2: 1100
01           00                     P3: 00110, P4: 0000
011          010                    P5: 01011, P6: 0100
0110         0111                   P7: 01110, P8: 01111
01101        01100                  P9: 01100, P10: 01100
replicas (same path 01101)          P11: 01101, P12: 01101

Search cost is bounded by the routing table size: log2(n) for a balanced tree
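The search(p, k) procedure on this slide can be sketched in Python as follows. This is a minimal illustration, not the P-Grid implementation: the `Peer` class, `build_toy_network` and the four-peer network with paths 00, 01, 10, 11 are hypothetical, r is fixed to 1, and a peer is treated as responsible when its path is a prefix of the key (which assumes keys are at least as long as paths).

```python
from typing import Dict, List, Optional, Tuple


class Peer:
    """A toy P-Grid peer: a binary path plus references per prefix level."""

    def __init__(self, name: str, path: str) -> None:
        self.name = name
        self.path = path
        # refs[j] lists peers whose path starts with p_1..p_(j-1) followed by
        # the complement of p_j (levels are 1-based, as on the slide).
        self.refs: Dict[int, List["Peer"]] = {}


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def search(peer: Peer, key: str,
           hops: Optional[List[str]] = None) -> Tuple[Peer, List[str]]:
    """Forward the request until a peer whose path is a prefix of key is reached."""
    hops = [] if hops is None else hops
    hops.append(peer.name)
    if key.startswith(peer.path):  # assumption: a prefix match counts as "found"
        return peer, hops
    # First level at which the key leaves this peer's own path.
    level = common_prefix_len(peer.path, key) + 1
    next_peer = peer.refs[level][0]  # any of the r references would do
    return search(next_peer, key, hops)


def build_toy_network() -> Dict[str, Peer]:
    """Hypothetical 4-peer network with paths 00, 01, 10, 11 and r = 1."""
    peers = {name: Peer(name, path) for name, path in
             [("A", "00"), ("B", "01"), ("C", "10"), ("D", "11")]}
    peers["A"].refs = {1: [peers["C"]], 2: [peers["B"]]}
    peers["B"].refs = {1: [peers["D"]], 2: [peers["A"]]}
    peers["C"].refs = {1: [peers["A"]], 2: [peers["D"]]}
    peers["D"].refs = {1: [peers["B"]], 2: [peers["C"]]}
    return peers


if __name__ == "__main__":
    net = build_toy_network()
    found, hops = search(net["A"], "110")
    print(found.name, hops)
```

Starting at A, the request for key 110 is resolved one bit per hop (A, then C, then D), so the number of hops is bounded by the path length, in line with the log2(n) bound for a balanced tree.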

Page 14: P2P Systems

Questions

• The index information in a structured overlay network
  1. Provides references to route a search request within the overlay network
  2. Provides for a given key the reference to the peer that stores the resource
  3. Is replicated in routing tables to support redundant search paths

• For the given routing table, the search request for the key 0101 is routed
  1. Always to peer P5
  2. Either to peer P5 or P6
  3. Either to peer P3, P4, P5 or P6

Routing table of the peer with path 01101:

own prefix   complementary prefix   references
0            1                      P1: 100, P2: 1100
01           00                     P3: 00110, P4: 0000
011          010                    P5: 01011, P6: 0100
0110         0111                   P7: 01110, P8: 01111
01101        01100                  P9: 01100, P10: 01100
replicas (same path 01101)          P11: 01101, P12: 01101

Page 15: P2P Systems

Structured P2P Overlay Network Approaches

• Different strategies
  – P-Grid: distributing a binary search tree
  – Chord: constructing a distributed hash table
  – CAN: routing in a d-dimensional space
  – FreeNet: caching index information along search paths

Page 16: P2P Systems

Structured P2P Overlay Network Approaches

• Commonalities
  – each peer maintains a small part of the index information
  – each peer maintains a small routing table for routing in the overlay network
  – searches are performed by directed message forwarding
• Differences
  – performance and qualitative criteria

Page 17: P2P Systems

Example 2: Distributed Hash Tables (Chord)

• Hashing of search keys AND peer addresses on binary keys of length m
  – e.g. m=5, key("jingle-bells.mp3")=4, key(196.178.0.1)=19
• Data keys are stored at the peer with the next larger peer key:
  for a peer with hashed identifier p and data with hashed identifier k,
  if k ∈ ]predecessor(p), p] then k is stored at p

[Figure: identifier ring for m=5 (32 keys, numbered 0, 1, 2, ...). Key k lies between p1 = predecessor(k) and p2 = successor(k) and is stored at p2; p3 is a further peer on the ring.]

Search strategies
1. every peer knows all others: O(n) routing table size
2. peers know successor only: O(n) search cost
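The storage rule on this slide (a key is stored at the first peer whose identifier is not smaller than the key, wrapping around the ring) can be sketched as below. The peer identifiers in `PEERS` are hypothetical, apart from 19, which is key(196.178.0.1) from the slide; the function name `successor` follows the slide's notation.

```python
from bisect import bisect_left
from typing import Iterable

M = 5            # key length in bits, as in the slide's example
RING = 2 ** M    # 32 identifiers on the ring


def successor(peer_keys: Iterable[int], k: int) -> int:
    """Return the first peer key >= k on the ring, wrapping around after 2^m - 1."""
    keys = sorted(peer_keys)
    i = bisect_left(keys, k % RING)
    return keys[i % len(keys)]


# Hypothetical peer keys; 19 is key(196.178.0.1) from the slide.
PEERS = [1, 8, 19, 27]

if __name__ == "__main__":
    print(successor(PEERS, 4))    # peer storing key("jingle-bells.mp3") = 4
    print(successor(PEERS, 30))   # wraps around the end of the ring
```

With these peers, key 4 falls into ]1, 8] and is stored at peer 8, while key 30 wraps around to peer 1.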

Page 18: P2P Systems

Chord Routing Tables

• Idea: every peer knows m peers at exponentially increasing distance
  Peer p stores its successor(p) and, for i=1,..,m, the first peer with hashed identifier s_i such that s_i = successor(p + 2^(i-1))
  We also write s_i = finger(i, p)

[Figure: ring segment starting at p with marks at p+1, p+2, p+4, p+8, p+16. s_1, s_2, s_3 point to p2, s_4 points to p3, s_5 points to p4.]

i   s_i
1   p2
2   p2
3   p2
4   p3
5   p4

Routing table size: m
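The finger definition s_i = successor(p + 2^(i-1)) can be checked with a few lines of Python. This is a sketch under assumptions: the concrete identifiers (p = 0, p2 = 5, p3 = 11, p4 = 20) are hypothetical values chosen so that the result has the same shape as the table on this slide.

```python
from bisect import bisect_left
from typing import List, Sequence


def successor(peer_keys: Sequence[int], k: int, m: int = 5) -> int:
    """First peer key >= k on a ring of 2^m identifiers, with wrap-around."""
    keys = sorted(peer_keys)
    i = bisect_left(keys, k % 2 ** m)
    return keys[i % len(keys)]


def finger_table(p: int, peer_keys: Sequence[int], m: int = 5) -> List[int]:
    """Entries s_1..s_m with s_i = successor(p + 2^(i-1))."""
    return [successor(peer_keys, p + 2 ** (i - 1), m) for i in range(1, m + 1)]


# Hypothetical identifiers: p = 0, p2 = 5, p3 = 11, p4 = 20.
PEERS = [0, 5, 11, 20]

if __name__ == "__main__":
    for i, s in enumerate(finger_table(0, PEERS), start=1):
        print(i, s)
```

For p = 0 the first three fingers all resolve to 5 (p2), the fourth to 11 (p3) and the fifth to 20 (p4), so the table has m entries even though it names only three distinct peers.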

Page 19: P2P Systems

Search in Chord

search(p, k)
  find in routing table largest (i, p*) such that p* ∈ [p, k[
  if such a p* exists then
    search(p*, k)
  else
    return(successor(p))  // found

[Figure: ring segment starting at p with marks at p+1, p+2, p+4, p+8, p+16, fingers s_1 to s_5 pointing to p2, p3, p4, and two example keys k1 and k2.]

Expected search cost: O(log n)
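A runnable sketch of the search procedure on this slide follows. It is an illustration under assumptions, not the reference Chord code: the peer identifiers are hypothetical, finger tables are recomputed from a global peer list instead of being stored per peer, p itself is excluded from the candidate interval so that every hop makes progress, and the case k = p is answered locally.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple


def successor(peer_keys: Sequence[int], k: int, m: int = 5) -> int:
    """First peer key >= k on a ring of 2^m identifiers, with wrap-around."""
    keys = sorted(peer_keys)
    i = bisect_left(keys, k % 2 ** m)
    return keys[i % len(keys)]


def finger_table(p: int, peer_keys: Sequence[int], m: int = 5) -> List[int]:
    """Entries s_1..s_m with s_i = successor(p + 2^(i-1))."""
    return [successor(peer_keys, p + 2 ** (i - 1), m) for i in range(1, m + 1)]


def search(p: int, k: int, peer_keys: Sequence[int], m: int = 5,
           hops: Optional[List[int]] = None) -> Tuple[int, List[int]]:
    """Return (peer storing k, list of peers that forwarded the request)."""
    ring = 2 ** m
    hops = [] if hops is None else hops
    hops.append(p)
    if p == k:  # assumption: a key equal to the peer's own identifier is local
        return p, hops
    fingers = finger_table(p, peer_keys, m)
    span = (k - p) % ring  # clockwise distance from p to k
    # Fingers strictly between p and k, measured clockwise from p.
    ahead = [f for f in fingers if 0 < (f - p) % ring < span]
    if ahead:
        p_star = max(ahead, key=lambda f: (f - p) % ring)  # largest such finger
        return search(p_star, k, peer_keys, m, hops)
    return fingers[0], hops  # no finger precedes k: successor(p) stores k


# Hypothetical peer identifiers on a ring with m = 5.
PEERS = [0, 5, 11, 20]

if __name__ == "__main__":
    print(search(0, 18, PEERS))
    print(search(20, 4, PEERS))
```

Each hop jumps to the finger closest to, but still before, the key, which roughly halves the remaining distance on the ring; this is the intuition behind the O(log n) expected search cost.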

Page 20: P2P Systems

Length of Search Paths

[Figure: distribution of search path lengths for network size n=2^12 with 100 · 2^12 keys.]

Path length: ½ log2(n)
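As a quick worked instance of the path length formula for the network size on this slide (reading the path length as an average is an assumption, and the symbol \bar{h} is introduced here only for illustration):

```latex
\bar{h} \;\approx\; \tfrac{1}{2}\log_2 n
        \;=\; \tfrac{1}{2}\log_2 2^{12}
        \;=\; \tfrac{1}{2}\cdot 12
        \;=\; 6 \ \text{hops}
```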

Page 21: P2P Systems

Maintenance of Chord

• Maintain the integrity of routing tables if peers join or leave
• Example Chord: new node q joining the network
  – Successor nodes need to be updated: successor(p) = q, successor(q) = p2
  – Finger tables need to be updated: both at new and existing peers

[Figure: ring segment starting at p with marks at p+1, p+2, p+4, p+8, p+16, peers p2, p3, p4 and the new node q between p and p2.]

Routing table of p (after q joins)
i   s_i
1   q
2   q
3   p2
4   p3
5   p4

Routing table of q
i   s_i
1   p2
2   p2
3   p3
4   p3
5   p4

Expected cost: O(log^2 n)
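The effect of the join on the finger tables can be reproduced with a small sketch. The identifiers (p = 0, q = 3, p2 = 5, p3 = 11, p4 = 20) are hypothetical values picked to match the two routing tables on this slide, and tables are simply recomputed from the full peer list, which sidesteps the actual join protocol and its O(log^2 n) message cost.

```python
from bisect import bisect_left
from typing import List, Sequence


def successor(peer_keys: Sequence[int], k: int, m: int = 5) -> int:
    """First peer key >= k on a ring of 2^m identifiers, with wrap-around."""
    keys = sorted(peer_keys)
    i = bisect_left(keys, k % 2 ** m)
    return keys[i % len(keys)]


def finger_table(p: int, peer_keys: Sequence[int], m: int = 5) -> List[int]:
    """Entries s_1..s_m with s_i = successor(p + 2^(i-1))."""
    return [successor(peer_keys, p + 2 ** (i - 1), m) for i in range(1, m + 1)]


def changed_entries(p: int, before: Sequence[int], after: Sequence[int],
                    m: int = 5) -> List[int]:
    """1-based finger indices of p whose entry differs after the join."""
    old = finger_table(p, before, m)
    new = finger_table(p, after, m)
    return [i + 1 for i, (a, b) in enumerate(zip(old, new)) if a != b]


# Hypothetical identifiers for the peers named on the slide.
P, Q, P2, P3, P4 = 0, 3, 5, 11, 20
BEFORE = [P, P2, P3, P4]
AFTER = sorted(BEFORE + [Q])

if __name__ == "__main__":
    print("fingers of p:", finger_table(P, AFTER))
    print("fingers of q:", finger_table(Q, AFTER))
    for peer in BEFORE:
        print(peer, changed_entries(peer, BEFORE, AFTER))
```

In this configuration only p has to change entries (i = 1 and i = 2 now point to q), while q has to build a complete table of its own; in general any peer with a finger target in the interval between q's predecessor and q is affected.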

Page 22: P2P Systems

Question

• When routing in Chord
  1. The next hop is always uniquely determined
  2. The next hop can be chosen among a constant number of possible candidates
  3. The next hop can be chosen among log n possible candidates

Page 23: P2P Systems

Question

• When adding q to the Chord ring: in the routing table of p
  1. Entries for i=1,2,3,4 change
  2. The entry for i=4 changes
  3. The entry for i=5 changes
  4. No entry changes

[Figure: ring segment starting at p with marks at p+1, p+2, p+4, p+8, p+16, peers p2, p3, p4 and the new node q.]

Routing table of p
i   s_i
1   p2
2   p2
3   p2
4   p3
5   p4

Page 24: P2P Systems

Example 3: Topological Routing (CAN)

• Based on hashing of keys into a d-dimensional space (a torus)
  – Each peer is responsible for keys of a subvolume of the space (a zone)
  – Each peer stores the addresses of peers responsible for the neighboring zones for routing
  – Search requests are greedily forwarded to the peers in the closest zones

Page 25: P2P Systems

CAN Zones

• Example: d=2
  – Space is recursively split along each dimension as more peers join
  – Peers maintain references to their neighboring zones: neighbors(p1) = {p2,p3}

[Figure: the space held by p1 alone, then split between p1 and p2, then split among p1, p2 and p3, etc.]

Page 26: P2P Systems

CAN Routing Tables and Search

Each peer p stores a routing table with 2d entries containing the two closest neighbors in each dimension

neighbors(p1) = {p2,p3,p4,p5}
neighbors(p3) = {p1,p6,p7,p4}

Example: search starting at p7, p8 for p6

[Figure: zones of peers p1 to p8 in a 2-dimensional space.]

search(p, k)
  if p = k then
    found
  else
    find among neighbors p* with minimal Euclidean distance to k
    search(p*, k)

Routing table size: 2d
Expected search cost: O(d n^(1/d))
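The greedy search on this slide can be sketched for an idealized CAN. The simplification is an assumption, not the slide's setup: zones form a regular grid with `side` equally sized zones per dimension (n = side^d peers), each peer is identified by its grid coordinate, and its 2d neighbors are the zones one step away in either direction on the torus. Real CAN zones are irregular.

```python
import math
from itertools import product
from typing import List, Tuple

Coord = Tuple[int, ...]


def torus_distance(a: Coord, b: Coord, side: int) -> float:
    """Euclidean distance on a torus with `side` zones per dimension."""
    return math.sqrt(sum(min(abs(x - y), side - abs(x - y)) ** 2
                         for x, y in zip(a, b)))


def neighbors(p: Coord, side: int) -> List[Coord]:
    """The 2d routing table entries: one step back and forth per dimension."""
    result = []
    for dim in range(len(p)):
        for step in (-1, 1):
            q = list(p)
            q[dim] = (q[dim] + step) % side
            result.append(tuple(q))
    return result


def search(p: Coord, k: Coord, side: int) -> List[Coord]:
    """Greedy forwarding: always hand over to the neighbor closest to k."""
    path = [p]
    while p != k:
        p = min(neighbors(p, side), key=lambda q: torus_distance(q, k, side))
        path.append(p)
    return path


def average_hops(d: int, side: int) -> float:
    """Mean hop count from the origin to every zone (including itself)."""
    start = (0,) * d
    targets = list(product(range(side), repeat=d))
    return sum(len(search(start, t, side)) - 1 for t in targets) / len(targets)


if __name__ == "__main__":
    print(search((0, 0), (3, 2), side=4))
    print(average_hops(d=2, side=4))
```

On such a grid with an even `side`, the average works out to d · side / 4, that is, proportional to d · n^(1/d), which is the shape of the expected search cost stated on the slide.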

Page 27: P2P Systems

Network Join in CAN

• Node joining the network
  – Chooses address (coordinate in d-dim. space)
  – Performs search for the address
  – Splits the region with the node currently managing it
  – Updates own and neighboring nodes' routing tables

[Figure: zones of p1 to p8 before the join and zones of p1 to p9 after p9 joins next to p8; some neighbor entries are shown as *.]

Before the join: Neighbors(p8) = {p5,p7,*,*}
After the join:  Neighbors(p8) = {p5,p9,*,*}, Neighbors(p9) = {p8,p7,*,*}

Expected cost: O(d n^(1/d))
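The join steps listed on this slide can be sketched as follows. The `Zone` class and `join` function are hypothetical names, and several details are assumptions made for the illustration: the search for the chosen address is replaced by a linear scan over all zones, the split dimension cycles with the split depth, the previous owner keeps the lower half, and routing table updates are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, ...]


@dataclass
class Zone:
    lo: Point          # inclusive lower corner
    hi: Point          # exclusive upper corner
    owner: str
    depth: int = 0     # number of splits that produced this zone

    def contains(self, pt: Point) -> bool:
        return all(l <= x < h for l, x, h in zip(self.lo, pt, self.hi))

    def volume(self) -> float:
        v = 1.0
        for l, h in zip(self.lo, self.hi):
            v *= h - l
        return v


def join(zones: List[Zone], new_peer: str, point: Point) -> Zone:
    """Split the zone containing `point` and hand one half to `new_peer`."""
    zone = next(z for z in zones if z.contains(point))  # stands in for the search
    dim = zone.depth % len(zone.lo)     # assumption: cycle through dimensions
    mid = (zone.lo[dim] + zone.hi[dim]) / 2
    upper_lo = zone.lo[:dim] + (mid,) + zone.lo[dim + 1:]
    lower_hi = zone.hi[:dim] + (mid,) + zone.hi[dim + 1:]
    # Assumption: the new peer takes the upper half, the old owner keeps the lower.
    new_zone = Zone(upper_lo, zone.hi, new_peer, zone.depth + 1)
    zone.hi, zone.depth = lower_hi, zone.depth + 1
    zones.append(new_zone)
    return new_zone


if __name__ == "__main__":
    zones = [Zone((0.0, 0.0), (1.0, 1.0), "p1")]
    join(zones, "p2", (0.7, 0.2))
    join(zones, "p3", (0.2, 0.8))
    for z in zones:
        print(z.owner, z.lo, z.hi)
```

The run mirrors the d=2 picture of CAN zones: p1 first owns the whole space, p2 takes one half, and p3 then takes half of p1's remaining zone, while the zones always tile the full space.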

Page 28: P2P Systems

Multiple Realities

• r different coordinate spaces
  – Peers hold a zone in each of them
  – Creates r replicas of the (key, value) pairs and increases robustness
  – Reduces path length as search can be continued in the reality where the target is closest

[Figure: the zone of p1 shown in different realities.]

Page 29: P2P Systems


CAN Path Length

Page 30: P2P Systems


Increasing Dimensions and Realities

Page 31: P2P Systems

Question

• When adding n peers to CAN the number of zones
  1. Is exactly n
  2. It depends what the keys of the peers were
  3. It depends on the dimensionality of the key space

Page 32: P2P Systems

Question

• In CAN, for a fixed dimensionality d>2, when moving from 1 to 2 realities
  1. The number of entries in the routing table increases by 2
  2. The number of entries in the routing table increases by d
  3. The number of entries in the routing table doubles