1 Improve search in unstructured P2P overlay

Peer-to-peer Networks

• Peers are connected by an overlay network.
• Users cooperate to share files (e.g., music, videos, etc.)

(Search in) Basic P2P Architectures

• Centralized: central directory server (Napster).
• Decentralized: search is performed by probing peers.
  – Structured (DHTs; CAN, Chord, …): location is coupled with topology, and search is routed by the query. Only exact-match queries; tightly controlled overlay.
  – Unstructured (Gnutella): search is “blind”; probed peers are unrelated to the query.

Topics

• Search strategies
  – Beverly Yang and Hector Garcia-Molina, “Improving Search in Peer-to-Peer Networks”, ICDCS 2002.
  – Arturo Crespo and Hector Garcia-Molina, “Routing Indices For Peer-to-Peer Systems”, ICDCS 2002.
• Shortcuts
  – Kunwadee Sripanidkulchai, Bruce Maggs and Hui Zhang, “Efficient Content Location Using Interest-based Locality in Peer-to-Peer Systems”, INFOCOM 2003.
• Replication
  – Edith Cohen and Scott Shenker, “Replication Strategies in Unstructured Peer-to-Peer Networks”, SIGCOMM 2002.

Improving Search in Peer-to-Peer Networks

ICDCS 2002

Beverly Yang, Hector Garcia-Molina

Motivation

• The purpose of a data-sharing P2P system is to accept queries from users, and to locate and return data (or pointers to the data).

Metrics

• Cost
  – Average aggregate bandwidth
  – Average aggregate processing cost
• Quality of results
  – Number of results
  – Satisfaction: a query is satisfied if Z (a value specified by the user) or more results are returned.
  – Time to satisfaction

Current Techniques

• Gnutella: BFS with depth limit D. Wastes bandwidth and processing resources.
• Freenet: DFS with depth limit D. Poor response time.

Broadcast policies

• Iterative deepening
• Directed BFS
• Local indices

Iterative Deepening

• In systems where satisfaction is the metric of choice, iterative deepening is a good technique.
• It runs under a policy P = {a, b, c} and a waiting time W.
• A source node S first initiates a BFS of depth a.
• The query is processed and then becomes frozen at all nodes that are a hops from the source.
• S waits for a time period W.

Iterative Deepening

• If the query is not satisfied, S starts the next iteration, initiating a BFS of depth b.
• S sends a “Resend” message with a TTL of a.
• A node that receives a Resend message simply forwards it; if the node is at depth a, it instead drops the Resend message and unfreezes the corresponding query by forwarding the query message with a TTL of b − a to all its neighbors.
• A node need only freeze a query for slightly more than W time units before deleting it.
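The iterations described above can be sketched as a centralized simulation. This is only an illustration under simplifying assumptions: the graph is an in-memory adjacency dict, the Resend messages, TTLs, and waiting time W are not modeled, and the names `bfs_levels`, `iterative_deepening`, and `has_result` are hypothetical.

```python
from collections import deque


def bfs_levels(graph, source, max_depth):
    """Return {node: depth} for every node within max_depth hops of source."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for nbr in graph[node]:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return depth


def iterative_deepening(graph, source, has_result, policy, z):
    """Search to the successive depths in `policy` until z results are found.

    Nodes processed in an earlier iteration are not processed again, which
    mirrors the effect of freezing the query at the previous depth.
    Returns (results, depth_reached).
    """
    processed = {source}  # the source does not query itself
    results = []
    for depth_limit in policy:
        for node in bfs_levels(graph, source, depth_limit):
            if node not in processed:
                processed.add(node)
                if has_result(node):
                    results.append(node)
        if len(results) >= z:
            return results, depth_limit
    return results, policy[-1]


if __name__ == "__main__":
    # Hypothetical 5-node chain; nodes 2 and 4 hold matching files.
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    holders = {2, 4}
    print(iterative_deepening(chain, 0, lambda n: n in holders, [1, 2, 4], z=1))
```

With a small Z the search stops at a shallow depth and never touches the far end of the chain, which is the bandwidth saving the technique aims for.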

Directed BFS

• If minimizing response time is important to an application, iterative deepening may not be appropriate.
• A source sends query messages to just a subset of its neighbors.
• A node maintains simple statistics on its neighbors:
  – Number of results received from each neighbor
  – Latency of the connection

Directed BFS (cont)

Candidate neighbors:

• The neighbor that has returned the highest number of results
• The neighbor whose response messages have taken the lowest average number of hops
• The neighbor with a high message count
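A minimal sketch of how a node might pick the subset of neighbors from such statistics. The field names, the heuristic labels, and the `pick_neighbors` helper are all hypothetical; the slides only name the kinds of statistics, not a data layout.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    results_returned: int = 0  # results received through this neighbor
    avg_hops: float = 0.0      # average hops taken by its response messages
    messages_seen: int = 0     # messages forwarded by this neighbor


def pick_neighbors(stats, k, heuristic="most_results"):
    """Return the k neighbors ranked best under the chosen heuristic."""
    sort_keys = {
        "most_results": lambda n: -stats[n].results_returned,
        "fewest_hops": lambda n: stats[n].avg_hops,
        "most_messages": lambda n: -stats[n].messages_seen,
    }
    return sorted(stats, key=sort_keys[heuristic])[:k]


if __name__ == "__main__":
    # Hypothetical statistics for three neighbors.
    stats = {
        "A": NeighborStats(results_returned=5, avg_hops=3.0, messages_seen=10),
        "B": NeighborStats(results_returned=9, avg_hops=4.5, messages_seen=2),
        "C": NeighborStats(results_returned=1, avg_hops=1.5, messages_seen=40),
    }
    print(pick_neighbors(stats, 1))
    print(pick_neighbors(stats, 2, "most_messages"))
```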

Local Indices

Each node n maintains an index over the data of all nodes within a radius of r hops.

All nodes at depths not listed in the policy simply forward the query.

Example: policy P= { 1, 5}

Experimental result

Routing Indices For Peer-to-Peer Systems

Arturo Crespo, Hector Garcia-Molina Stanford University {crespo,hector}@db.Stanford.edu

Motivation

• A key part of a P2P system is document discovery.
• The goal is to help users find documents with content of interest across potential P2P sources efficiently.
• The mechanisms for searching can be classified in three categories:
  – Mechanisms without an index
  – Mechanisms with specialized index nodes (centralized search)
  – Mechanisms with indices at each node (distributed search)

Motivation (cont.)

• Gnutella uses a mechanism where nodes do not have an index.
  – Queries are propagated from node to node until matching documents are found.
  – Although this approach is simple and robust, it has the disadvantage of the enormous cost of flooding the network every time a query is generated.
• Centralized-search systems, like Napster, use specialized nodes that maintain an index of the documents available in the P2P system.
  – The user queries an index node to identify nodes having documents with the requested content.
  – A centralized system is vulnerable to attack, and it is difficult to keep the indices up to date.

Motivation (cont.)

• A distributed-index mechanism: Routing Indices (RIs)
  – Give a “direction” towards the document, rather than its actual location.
  – By using “routes”, the index size is proportional to the number of neighbors.

Peer-to-peer Systems

• A P2P system is formed by a large number of nodes that can join or leave the system at any time.
• Each node has a local document database that can be accessed through a local index.
• The local index receives content queries and returns pointers to the documents with the requested content.

Query Processing in a Distributed Search P2P System

• In a distributed-search P2P system, users submit queries to any node along with a stop condition.
• A node receiving a query first evaluates the query against its own database and returns to the user pointers to any results.
• If the stop condition has not been reached, the node selects one or more of its neighbors and forwards the query to them.
• Queries can be forwarded to the best neighbors in parallel or sequentially.
• A parallel approach yields better response time, but generates higher traffic and may waste resources.

Routing indices

• The objective of a Routing Index (RI) is to allow a node to select the “best” neighbors to send a query to.
• An RI is a data structure that, given a query, returns a list of neighbors, ranked according to their goodness for the query.
• Each node has a local index for quickly finding local documents when a query is received.
• Nodes also have a compound RI (CRI) containing:
  – the number of documents along each path
  – the number of documents on each topic

Routing indices (cont.)

Thus, the number of results in a path can be estimated as :

CRI(si) is the value for the cell at the column for topic si and at the row for a neighbor

The goodness of the neighbors: B: 6, C: 0, D: 75

Note that these numbers are just estimates and they are subject to overcounts and/or undercounts

A limitation of using CRIs is that they do not take into account the difference in cost due to the number of “hops” necessary to reach a document
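A minimal sketch of a CRI estimator. It assumes the estimate for a path is NumberOfDocuments × ∏ᵢ CRI(sᵢ)/NumberOfDocuments, i.e. query topics are treated as independent; that formula is not stated in the text above. The per-neighbor counts below are illustrative values chosen so that the estimator reproduces the goodness values 6, 0, and 75 quoted above; they, the topic labels, and the function names are assumptions.

```python
def cri_goodness(total_docs, topic_counts, query_topics):
    """Estimate matching documents along a path, assuming independent topics."""
    if total_docs == 0:
        return 0.0
    estimate = float(total_docs)
    for topic in query_topics:
        estimate *= topic_counts.get(topic, 0) / total_docs
    return estimate


def rank_neighbors(cri, query_topics):
    """Return (neighbor, goodness) pairs, best neighbor first."""
    scores = {
        nbr: cri_goodness(total, counts, query_topics)
        for nbr, (total, counts) in cri.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical CRI rows: neighbor -> (documents on path, per-topic counts).
    cri = {
        "B": (100, {"DB": 20, "L": 30}),
        "C": (1000, {"DB": 0, "L": 50}),
        "D": (200, {"DB": 100, "L": 150}),
    }
    for neighbor, goodness in rank_neighbors(cri, ["DB", "L"]):
        print(neighbor, round(goodness, 2))
```

Under these assumed counts a query would be forwarded to D first, then B, and C last.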

Using Routing Indices

Using Routing Indices (cont.)

• The storage space required by an RI in a node is modest, as we are only storing index information for each neighbor.
• t is the counter size in bytes, c is the number of categories, N the number of nodes, and b the branching factor.
  – A centralized index would require t × (c + 1) × N bytes.
  – The total for the entire distributed system is t × (c + 1) × b × N bytes.
• Although the RIs require more storage space overall than a centralized index, the cost of the storage space is shared among the network nodes.
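As a worked instance of the two storage formulas, the sketch below plugs in hypothetical values (t = 4 bytes, c = 10 categories, N = 1000 nodes, b = 4 neighbors per node); none of these numbers come from the slides, and the function names are illustrative.

```python
def centralized_index_bytes(t, c, n):
    """Storage for one centralized index: t * (c + 1) * N bytes."""
    return t * (c + 1) * n


def total_ri_bytes(t, c, n, b):
    """Storage summed over all nodes' routing indices: t * (c + 1) * b * N bytes."""
    return t * (c + 1) * b * n


def per_node_ri_bytes(t, c, b):
    """Storage one node pays for its own routing index: t * (c + 1) * b bytes."""
    return t * (c + 1) * b


if __name__ == "__main__":
    t, c, n, b = 4, 10, 1000, 4  # hypothetical parameters
    print(centralized_index_bytes(t, c, n))  # 44000
    print(total_ri_bytes(t, c, n, b))        # 176000
    print(per_node_ri_bytes(t, c, b))        # 176
```

The total is b times larger than the centralized index, but each node stores only a small share of it.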

Creating Routing Indices

Maintaining Routing Indices

Maintaining RIs is identical to the process used for creating them

For efficiency, we may delay exporting an update for a short time so we can batch several updates, thus, trading RI freshness for a reduced update cost

We can also choose not to send minor updates, at the cost of reduced RI accuracy

Hop-count Routing Indices

Hop-count Routing Indices (cont.)

The estimator of a hop-count RI needs a cost model to compute the goodness of a neighbor

We assume that document results are uniformly distributed across the network and that the network is a regular tree with fanout F

We define the goodness (goodness_hc) of Neighbor i with respect to query Q for a hop-count RI as:

If we assume F = 3, the goodness of X for a query about “DB” documents would be 13+10/3 = 16.33 and for Y would be 0+31/3 = 10.33
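A minimal sketch of the hop-count estimator, assuming the goodness is the sum over hops j = 1..h of (estimated results j hops away) / F^(j−1). That form is inferred from the worked numbers above (13 + 10/3 and 0 + 31/3), not quoted from the slide, and the function name is hypothetical.

```python
def hop_count_goodness(per_hop_estimates, fanout):
    """Discount the estimate for hop j (1-based) by fanout ** (j - 1) and sum."""
    return sum(
        estimate / fanout ** hop
        for hop, estimate in enumerate(per_hop_estimates)  # hop 0 == one hop away
    )


if __name__ == "__main__":
    # Per-hop "DB" estimates taken from the example: X = (13, 10), Y = (0, 31).
    print(round(hop_count_goodness([13, 10], 3), 2))  # 16.33
    print(round(hop_count_goodness([0, 31], 3), 2))   # 10.33
```

X ranks above Y even though Y reaches more documents in total, because Y's documents are one hop further away.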

Exponentially aggregated RI

Each entry of the ERI for node N contains a value computed as:

th is the height and F the fanout of the assumed regular tree, goodness() is the Compound RI estimator, N[j] is the summary of the local index of neighbor j of N, and T is the topic of interest of the entry

Problems?!

Exponentially aggregated RI (cont.)

Cycles in the P2P Network

There are three general approaches for dealing with cycles:

• No-op solution: no changes are made to the algorithms.
• Cycle avoidance solution: we do not allow nodes to create an “update” connection to other nodes if such a connection would create a cycle.
  – Absence of global information
• Cycle detection and recovery: this solution detects cycles sometime after they are formed and then takes recovery actions to eliminate the effect of the cycles.

Experimental Results

Modeling search mechanisms in a P2P system:

• We consider three kinds of network topologies:
  – a tree, because it does not have cycles;
  – a tree with added cycles: we start with a tree and add extra edges at random (creating cycles);
  – a power-law graph, which is considered a good model for P2P systems and allows us to test our algorithms against a “realistic” topology.
• We model the location of document results using two distributions: uniform and an 80/20 biased distribution.
  – 80/20 assigns 80% of the document results uniformly to 20% of the nodes.
• In this paper we focus on the network, and we use the number of messages generated by each algorithm as a measure of cost.

Experimental Results (cont.)

Experimental Results (cont.)

• In particular, CRI uses all nodes in the network, HRI uses nodes within a predefined horizon, and ERI uses nodes until the exponentially decayed value of an index entry reaches a minimum value.
• In the case of the No-RI approach, an 80/20 document distribution penalizes performance, as the search mechanism needs to visit a number of nodes until it finds a content-loaded node.

Experimental Results (cont.)

• RIs perform better in a power-law network than in a tree network (Query).
• In a power-law network, a few nodes have a significantly higher connectivity than the rest.
• Power-law distributions generate network topologies where the average path length between two nodes is lower than in tree topologies.

Experimental Results (cont.)

• The tradeoff between query and update costs for RIs
• The cost of CRI is much higher when compared with HRI and ERI.
• ERI only propagates the update to a subset of the network.

Conclusions

• Greater efficiency is achieved by placing Routing Indices in each node. Three possible RIs: compound RIs, hop-count RIs, and exponential RIs.
• In the experiments, ERIs and HRIs offer significant improvements versus not using an RI, while keeping update costs low.

Efficient Content Location Using Interest-based Locality in Peer-to-Peer Systems

Background

• Peers are connected randomly, and searching is done by flooding.
• Keyword search is allowed.
• Figure: example of searching for an MP3 file in a Gnutella network. The query is flooded across the network.

Background

DHT (Chord):

• Given a key, Chord maps the key to a node.
• Each node needs to maintain O(log N) information.
• Each query uses O(log N) messages.
• Key search means searching by exact name.

Figure: a Chord ring with about 50 nodes. The black lines point to adjacent nodes, while the red lines are “finger” pointers that allow a node to find a key in O(log N) time.

Interest-based Locality

Peers that have similar interests tend to share similar content

Architecture

• Shortcuts are modular.
• Shortcuts are performance enhancement hints.

Creation of shortcuts

• The peer uses the underlying topology (e.g., Gnutella) for the first few searches.
• One of the peers that returned results is selected at random and added to the shortcut list.
• Shortcuts are ordered by a metric, e.g., success rate or path latency.
• Subsequent queries go through the shortcut list first. If that fails, the lookup goes through the underlying topology.
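The lookup order described above can be sketched as follows. This is an illustration only: the `ShortcutList` class, the `locate` function, the callbacks `peer_has` and `flood`, the success-rate ranking, and the capacity of 10 are all assumptions, and the flood over the underlying topology is abstracted into a single call.

```python
import random


class ShortcutList:
    """Bounded set of shortcut peers, ranked by observed success rate."""

    def __init__(self, capacity=10):  # capacity is a hypothetical default
        self.capacity = capacity
        self.stats = {}  # peer -> [successes, attempts]

    def add(self, peer):
        if peer not in self.stats and len(self.stats) < self.capacity:
            self.stats[peer] = [0, 0]

    def ranked(self):
        def success_rate(peer):
            ok, tried = self.stats[peer]
            return ok / tried if tried else 0.0

        return sorted(self.stats, key=success_rate, reverse=True)


def locate(item, shortcuts, peer_has, flood, rng):
    """Try shortcuts first; fall back to the underlying topology on failure."""
    for peer in shortcuts.ranked():
        shortcuts.stats[peer][1] += 1
        if peer_has(peer, item):
            shortcuts.stats[peer][0] += 1
            return peer, "shortcut"
    holders = flood(item)  # e.g., a Gnutella flood
    if not holders:
        return None, "miss"
    chosen = rng.choice(holders)  # one responding peer, picked at random
    shortcuts.add(chosen)
    return chosen, "flood"


if __name__ == "__main__":
    content = {"p1": {"a", "b"}, "p2": {"c"}}  # hypothetical peers and files
    has = lambda peer, item: item in content[peer]
    flood = lambda item: sorted(p for p in content if item in content[p])
    shortcuts, rng = ShortcutList(), random.Random(0)
    for item in ["a", "b", "c"]:
        print(item, locate(item, shortcuts, has, flood, rng))
```

Because shortcuts are only hints, a stale or useless entry costs one extra probe and the query still succeeds through the flood.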

Performance Evaluation

Performance metrics:

• success rate
• load characteristics (query packets each peer in the system processes)
• query scope (the fraction of peers involved in each query)
• minimum reply path length
• additional state kept in each node

Methodology – query workload

Create traffic traces from real application traffic:

• Boeing firewall proxies
• Microsoft firewall proxies
• Passively collected web traffic between CMU and the Internet
• Passively collected typical P2P traffic (Kazaa, Gnutella)

Use exact matching rather than keyword matching in the simulation: “song.mp3” and “my artist – song.mp3” are treated as different.

Methodology – Underlying peers topology

• Based on the Gnutella connectivity graph in 2001, in which 95% of nodes are within about 7 hops.
• The search TTL is set to 7.
• For each kind of traffic (Boeing, Microsoft, etc.), run 8 simulations, each covering 1 hour.

Simulation Results – success rate

Simulation Results – load and path length

-- Query load for Boeing and Microsoft Traffic:

-- Average path length of the traces:

Enhancement of Interest-based Locality

Increase Number of Shortcuts

• Add all shortcuts at a time, with no limit on the shortcut list size.
• Add k shortcuts at a time; only 100 shortcuts are used.
• 7 ~ 12% performance gain
• Diminishing returns

Enhancement of Interest-based Locality

Using Shortcuts’ Shortcuts

• Idea: add the shortcut’s shortcuts.
• Performance gain of 7% on average.

Interest-based Structures

When viewed as an undirected graph:

• In the first 10 minutes, there are many connected components, each containing only a few peers.
• At the end of the simulation, there are few connected components, each containing several hundred peers. Each component is well connected.
• The clustering coefficient is about 0.6 ~ 0.7, which is higher than that of the Web graph.

Sensitivity of Shortcuts

• Run interest-based shortcuts over a DHT (Chord) instead of Gnutella.
• Query load is reduced by a factor of 2 – 4.
• Query scope is reduced from 7/N to 1.5/N.

Conclusion

• Interest-based shortcuts are modular, performance-enhancing hints layered over an existing P2P topology.
• Shortcuts are shown to improve search efficiency.
• Shortcuts form clusters within a P2P topology, and the clusters are well connected.

Replication Strategies in Unstructured Peer-to-Peer Networks

Edith Cohen, AT&T Labs-Research

Scott Shenker, ICIR

(replication in) P2P architectures

• No proactive replication (Gnutella):
  – Hosts store and serve only what they requested.
  – A copy can be found only by probing a host with a copy.

Question: how to use replication to improve search efficiency in unstructured networks with a proactive replication mechanism ?

Search and replication model

• Search: probe hosts, uniformly at random, until the query is satisfied (or the maximum search size is exceeded).
• Goal: minimize the average search size (number of probes until the query is satisfied).
• Replication: each host can store up to $\rho$ copies of items.
• Unstructured networks with replication of keys or copies: peers probed (in the search and replication process) are unrelated to the query/item.

Search size

What is the search size of a query?

• Soluble queries: number of probes until an answer is found.
• We look at the Expected Search Size (ESS) of each item. The ESS is inversely proportional to the fraction of peers with a copy of the item.

Search Example

2 probes 4 probes

Expected Search Size (ESS)

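• $n$ nodes, each with capacity $\rho$; total capacity $R = n\rho$.
• $m$ items with relative query rates $q_1 > q_2 > q_3 > \ldots > q_m$, with $\sum_i q_i = 1$.
• $r_i$ = number of copies of the $i$-th item.
• Allocation: $p_1 (= r_1/R), p_2, p_3, \ldots, p_m$, with $\sum_i p_i = 1$; the $i$-th item is allocated a $p_i$ fraction of storage.
• Search size for the $i$-th item is a geometric r.v. with mean $A_i = 1/(\rho p_i)$.
• ESS is $\sum_i q_i A_i = \left(\sum_i q_i / p_i\right)/\rho$.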
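The ESS formula can be evaluated directly. The sketch below assumes the definitions above (mean search size 1/(ρ pᵢ), ESS = (Σ qᵢ/pᵢ)/ρ); the function names and the example rates and allocation are hypothetical.

```python
def mean_search_size(p_i, rho):
    """Mean of the geometric search size for an item with allocation p_i."""
    return 1.0 / (rho * p_i)


def expected_search_size(q, p, rho):
    """ESS = sum_i q_i * A_i = (sum_i q_i / p_i) / rho."""
    if len(q) != len(p):
        raise ValueError("q and p must have the same length")
    return sum(qi * mean_search_size(pi, rho) for qi, pi in zip(q, p))


if __name__ == "__main__":
    q = [0.5, 0.25, 0.25]  # hypothetical query rates, summing to 1
    p = [0.5, 0.25, 0.25]  # hypothetical allocation, summing to 1
    print(expected_search_size(q, p, rho=2))
```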

Uniform and Proportional Replication

Two natural strategies:

• Uniform Allocation: $p_i = 1/m$
  – Simple, resources are divided equally
• Proportional Allocation: $p_i = q_i$
  – “Fair”, resources per item proportional to demand
  – Reflects current P2P practices

Example: 3 items, $q_1 = 1/2$, $q_2 = 1/3$, $q_3 = 1/6$ (figure: Uniform vs. Proportional allocation)

Basic Questions

How do Uniform and Proportional allocations perform/compare ?

Which strategy minimizes the Expected Search Size (ESS) ?

Is there a simple protocol that achieves optimal replication in decentralized unstructured networks ?

ESS under Uniform and Proportional Allocations (soluble queries)

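• Lemma: The ESS under either Uniform or Proportional allocation is $m/\rho$
  – Independent of query rates (!!!)
  – Same ESS for Proportional and Uniform (!!!)
• Proof:
  – Proportional ($p_i = q_i$): ESS is $\left(\sum_i q_i/p_i\right)/\rho = \left(\sum_i q_i/q_i\right)/\rho = m/\rho$
  – Uniform ($p_i = (R/m)/R = 1/m$): ESS is $\left(\sum_i q_i/p_i\right)/\rho = \left(\sum_i m\,q_i\right)/\rho = (m/\rho)\sum_i q_i = m/\rho$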

Space of Possible Allocations

• Definition: Allocation $p_1, p_2, p_3, \ldots, p_m$ is “in-between” Uniform and Proportional if for $1 \le i < m$, $q_{i+1}/q_i < p_{i+1}/p_i < 1$
• Theorem 1: All (strictly) in-between strategies are (strictly) better than Uniform and Proportional
• Theorem 2: $p$ is worse than Uniform/Proportional if
  – for all $i$, $p_{i+1}/p_i > 1$ (more popular gets less), OR
  – for all $i$, $q_{i+1}/q_i > p_{i+1}/p_i$ (less popular gets less than “fair share”)
• Proportional and Uniform are the worst “reasonable” strategies (!!!)

So, what is the best strategy for soluble queries ?

Square-Root Allocation

• $p_i$ is proportional to $\sqrt{q_i}$:

$$p_i = \frac{\sqrt{q_i}}{\sum_{j=1}^{m} \sqrt{q_j}}$$

• Lies “in-between” Uniform and Proportional
• Theorem: Square-Root allocation minimizes the ESS (on soluble queries): minimize $\sum_i q_i / p_i$ such that $\sum_i p_i = 1$
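A small numeric check of the three allocations, using the example rates q = (1/2, 1/3, 1/6) from the earlier slide and assuming ρ = 1. The helper names are hypothetical. For Square-Root allocation the sum Σ qᵢ/pᵢ collapses to (Σ √qᵢ)², which the sketch also verifies.

```python
import math


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def uniform(q):
    return [1.0 / len(q)] * len(q)


def proportional(q):
    return normalize(q)


def square_root(q):
    return normalize([math.sqrt(qi) for qi in q])


def ess(q, p, rho=1.0):
    """Expected search size: (sum_i q_i / p_i) / rho."""
    return sum(qi / pi for qi, pi in zip(q, p)) / rho


if __name__ == "__main__":
    q = [1 / 2, 1 / 3, 1 / 6]
    for name, alloc in [("uniform", uniform), ("proportional", proportional),
                        ("square-root", square_root)]:
        print(f"{name:13s} ESS = {ess(q, alloc(q)):.4f}")
```

Uniform and Proportional both give ESS = m = 3 here, while Square-Root comes out lower, consistent with the theorem.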

How much can we gain by using SR?

Zipf-like query rates: $q_i \propto 1/i^{w}$

Replication Algorithms

Desired properties of the algorithm:

• Fully distributed, where peers communicate through random probes; minimal bookkeeping; and no more communication than what is needed for search.
• Converges to/obtains the SR allocation when query rates remain steady.

Uniform and Proportional are “easy”:

– Uniform: when an item is created, replicate its key in a fixed number of hosts.
– Proportional: for each query, replicate the key in a fixed number of hosts.

Model for Copy Creation/Deletion

• Creation: after a successful search, C(s) new copies are created at random hosts.
• Deletion: independent of the identity of the item; copy survival chances are non-decreasing with creation time (i.e., FIFO at each node).

Property of the process:

• $\langle C_i \rangle$: average value of C used to replicate the $i$-th item.
• Claim: If $\langle C_i \rangle / \langle C_j \rangle$ remains fixed over time, and $\langle C_i \rangle, \langle C_j \rangle > 0$, then $p_i/p_j \to q_i \langle C_i \rangle / (q_j \langle C_j \rangle)$

Creation/Deletion Process

Corollary: If $\langle C_i \rangle \propto 1/\sqrt{q_i}$ then $p_i/p_j \to \sqrt{q_i}/\sqrt{q_j}$

An algorithm for square-root allocation needs to have $\langle C_i \rangle$ equal to, or converging to, a value inversely proportional to $\sqrt{q_i}$

SR Replication Algorithms

• Path replication: the number of new copies C(s) is proportional to the size of the search.
• Probe memory: each peer records the number and combined search size of the probes it sees for each item. C(s) is determined by collecting this information from a number of peers proportional to the search size. This requires extra communication (proportional to that needed for the search).

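Path Replication

• The number of new copies produced per query, $\langle C_i \rangle$, is proportional to the search size $1/p_i$.
• The creation rate is proportional to $q_i \langle C_i \rangle$.
• Steady state: the creation rate is proportional to the allocation $p_i$, thus

$$q_i \langle C_i \rangle \propto \frac{q_i}{p_i} \propto p_i \quad\Longrightarrow\quad p_i \propto \sqrt{q_i}$$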
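The steady-state argument can be checked with a deterministic, mean-field sketch. It is an assumption-laden model, not the protocol itself: in each step a fraction `step` of all copies is deleted uniformly and replaced by new copies created in proportion to qᵢ/pᵢ. The function name, the step size of 0.1, and the iteration count are all hypothetical choices.

```python
import math


def path_replication_steady_state(q, step=0.1, iterations=2000):
    """Iterate the mean-field creation/deletion dynamics from a uniform start."""
    m = len(q)
    p = [1.0 / m] * m
    for _ in range(iterations):
        # Creation rate of item i: query rate q_i times copies per query (~ 1/p_i).
        creation = [qi / pi for qi, pi in zip(q, p)]
        total = sum(creation)
        # Delete a `step` fraction uniformly and refill it with the new copies.
        p = [(1 - step) * pi + step * ci / total for pi, ci in zip(p, creation)]
    return p


if __name__ == "__main__":
    q = [0.5, 0.3, 0.2]  # hypothetical query rates
    p = path_replication_steady_state(q)
    root_sum = sum(math.sqrt(qi) for qi in q)
    for qi, pi in zip(q, p):
        print(f"q={qi:.2f}  p={pi:.4f}  sqrt(q)/sum={math.sqrt(qi) / root_sum:.4f}")
```

Under this model the allocation settles at pᵢ ∝ √qᵢ, matching the steady-state derivation.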

Summary

• Random search/replication model: probes to “random” hosts
• Soluble queries:
  – Proportional and Uniform allocations are two extremes with the same average performance
  – Square-Root allocation minimizes the Average Search Size
• OPT (all queries) lies between SR and Uniform
• SR/OPT allocation can be realized by simple algorithms.