Large Scale Sharing

Marco F. Duarte

COMP 520: Distributed Systems
September 19, 2004

Introduction
• P2P sharing systems are very popular.
• In P2P, all nodes have identical capabilities and responsibilities.
• Popular approaches are partially centralized, do not scale well, or do not provide the desired anonymity.
• Scalability of these systems is critical.
• Need for decentralized, load-balancing architectures.

Features desired in a P2P sharing system
• Decentralized architecture – no single point of failure
• Scalability – bandwidth and load balancing
• Fault tolerance – content replication
• Anonymity for users – posters, readers, storers
• Resilience against DoS attacks

Freenet
• Provides anonymity:
  • No requester or provider information is implicit in communication.
  • Presence of a file in a node does not imply authorship.
• Popular files are replicated to improve locality.
• Does not intend to provide permanent storage.

Freenet Queries
• Files receive FileIDs (160-bit SHA-1 hash of a “file identifier”).
• Queries have pseudo-unique random identifiers (QueryIDs) and a hops-to-live count.
• Routing tables contain previously retrieved FileIDs and their locations.
• Queries are routed to the location with the closest FileID at each stage; loops are detected with the QueryID.

FileID      Node Address
00231311    192.168.3.24
11310231    192.168.52.111
20130102    192.168.122.38
23102312    192.168.213.231
30002312    192.168.58.47
32302132    192.168.33.241
32320303    192.168.194.28
33103123    192.168.12.242

Query: 31302313?
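The closest-FileID step can be sketched in a few lines. This is a minimal illustration rather than Freenet's actual implementation: it assumes the eight-digit FileIDs in the table are base-4 numerals and that "closest" means the smallest absolute numeric difference. The helper name `closest_entry` is hypothetical.

```python
# Routing table from the slide: FileID -> node address.
ROUTING_TABLE = {
    "00231311": "192.168.3.24",
    "11310231": "192.168.52.111",
    "20130102": "192.168.122.38",
    "23102312": "192.168.213.231",
    "30002312": "192.168.58.47",
    "32302132": "192.168.33.241",
    "32320303": "192.168.194.28",
    "33103123": "192.168.12.242",
}


def closest_entry(table, query_id, base=4):
    """Return (file_id, address) for the FileID numerically closest to query_id.

    Assumption: IDs are base-4 numerals compared by absolute difference.
    """
    target = int(query_id, base)
    best = min(table, key=lambda fid: abs(int(fid, base) - target))
    return best, table[best]


file_id, address = closest_entry(ROUTING_TABLE, "31302313")
print(file_id, address)  # 32302132 192.168.33.241
```

Under these assumptions the query for 31302313 is forwarded to 192.168.33.241, the location recorded for FileID 32302132.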

Freenet Queries: Lookups and Stores
• Copies of the file are stored at all nodes along the return path.
• A file record for a is added to the routing tables.
• Writes perform a lookup and insert the file along the path if no match is found.

[Figure: example network with nodes a, b, and e]
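The lookup behavior can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not Freenet code: the `Node` class and its fields are hypothetical, IDs are short hexadecimal strings, and a routing entry points at the neighbor that supplied the file rather than at the original source.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}    # file_id -> data held locally
        self.routes = {}   # file_id -> neighbor Node believed to lead to it
        self.seen = set()  # QueryIDs already handled (loop detection)

    def lookup(self, query_id, file_id, hops_to_live):
        # Drop the query if it has looped back here or run out of hops.
        if query_id in self.seen or hops_to_live <= 0:
            return None
        self.seen.add(query_id)
        if file_id in self.store:
            return self.store[file_id]
        # Try neighbors in order of how close their known FileID is.
        target = int(file_id, 16)
        ordered = sorted(self.routes, key=lambda k: abs(int(k, 16) - target))
        for known_id in ordered:
            neighbor = self.routes[known_id]
            data = neighbor.lookup(query_id, file_id, hops_to_live - 1)
            if data is not None:
                self.store[file_id] = data        # cache on the return path
                self.routes[file_id] = neighbor   # remember where it came from
                return data
        return None


a, b, e = Node("a"), Node("b"), Node("e")
e.store["3a"] = b"payload"
a.routes["30"] = b
b.routes["3b"] = a  # closest entry loops back to a; the QueryID check stops it
b.routes["3f"] = e
result = a.lookup("q1", "3a", hops_to_live=5)
print(result, sorted(a.store), sorted(b.store))
```

After the lookup, both a and b hold a copy of the file and a routing entry for it, which is the replication-along-the-path effect the slide describes.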

Freenet Properties
• FileID-based clustering allows for improved routing as usage increases.
• LRU-like capacity management: rarely used files are purged from the system.
• The random nature of FileIDs allows for diversity of information at nodes.
• Attempts to supplant existing files will lead to propagation of the real file.
• Anonymity features:
  • File ownership is assumed randomly by other nodes.
  • Minimal routing information is necessary at each hop.
  • A hops-to-live count of 1 is updated randomly.
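As an illustration of LRU-like capacity management, here is a minimal sketch of a node datastore that purges the least recently used file when full. The class name `LRUStore` is hypothetical, and capacity is counted in number of files for simplicity, which is an assumption; a real node would track bytes.

```python
from collections import OrderedDict


class LRUStore:
    """Toy datastore: least recently used files are purged first."""

    def __init__(self, capacity):
        self.capacity = capacity    # assumed: maximum number of files
        self.files = OrderedDict()  # oldest access first, newest last

    def get(self, file_id):
        if file_id not in self.files:
            return None
        self.files.move_to_end(file_id)  # a hit makes the file "recent"
        return self.files[file_id]

    def put(self, file_id, data):
        self.files[file_id] = data
        self.files.move_to_end(file_id)
        while len(self.files) > self.capacity:
            self.files.popitem(last=False)  # evict the least recently used


store = LRUStore(capacity=2)
store.put("a", b"1")
store.put("b", b"2")
store.get("a")        # "a" is now more recently used than "b"
store.put("c", b"3")  # exceeds capacity, so "b" is purged
print(list(store.files))  # ['a', 'c']
```

This also shows why such a store cannot promise permanence: a file that nobody requests eventually falls off the end.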

Freenet Problems
• Files that are stored in the network may not be found.
• Freenet does not provide reliable storage.
• No notion of locality in routing.
• Simulations do not involve file insertion or node discovery.

PAST: Reliable Distributed Storage
• Customizable file persistence
• High availability and load balancing
• Efficient routing and storage allocation
• Uses FileIDs generated from hashes, as in Freenet
• Uses owner credentials to verify the identity of authors
• Interface: Insert, Lookup, Reclaim

PAST Architecture
• FileID is computed from a hash of the filename, the owner’s public key, and a random salt.
• Each node receives a pseudorandom NodeID, independent of the node's properties.
• The owner specifies the number k of replicas of a file to store in the system on insert.
• The file is stored in the k nodes with NodeIDs closest to the FileID.
• Routing is provided by Pastry.
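FileID generation and replica placement can be sketched as follows. Several details here are assumptions made for illustration: SHA-1 as the hash, simple concatenation of the three inputs, NodeIDs drawn from the same 160-bit space as FileIDs, and "closest" measured as plain absolute difference. The function names are hypothetical.

```python
import hashlib
import random


def make_file_id(filename, owner_public_key, salt):
    """Hash filename, owner's public key, and salt into a 160-bit integer."""
    h = hashlib.sha1()
    h.update(filename.encode("utf-8"))
    h.update(owner_public_key)  # bytes; the key format is not modeled here
    h.update(salt)
    return int.from_bytes(h.digest(), "big")


def replica_nodes(file_id, node_ids, k):
    """Return the k NodeIDs numerically closest to file_id."""
    return sorted(node_ids, key=lambda nid: abs(nid - file_id))[:k]


rng = random.Random(0)  # fixed seed so the example is repeatable
nodes = [rng.getrandbits(160) for _ in range(50)]  # pseudorandom NodeIDs

fid = make_file_id("report.txt", b"owner-public-key", b"salt-1")
chosen = replica_nodes(fid, nodes, k=3)
print(hex(fid))
print([hex(n) for n in chosen])
```

Because NodeIDs are pseudorandom and unrelated to node properties, the k chosen nodes are likely to be diverse in location and ownership, which is what makes the replicas useful for availability.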

Pastry: Routing for P2P Networks
• Paths with fewer than $\lceil \log_{2^b} N \rceil$ hops.
• Delivery guaranteed under at most $l/2$ node failures.
• Flexible proximity metric.
• Each node contains:
  • Leaf set – l nodes with closest NodeIDs
  • Routing table – set of neighbors organized by NodeIDs
  • Neighborhood set – l closest nodes (by the proximity metric)
• Each NodeID is paired with its network address.
• Direct routes to neighbors and the l closest NodeIDs.
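To get a feel for the hop bound $\lceil \log_{2^b} N \rceil$, the short sketch below evaluates it for a few network sizes. It uses integer arithmetic instead of floating-point logarithms to avoid rounding errors at exact powers; the function name `max_hops` and the choice b = 4 in the example are assumptions for illustration.

```python
def max_hops(num_nodes, b):
    """Smallest h with (2**b)**h >= num_nodes, i.e. ceil(log_{2^b} N)."""
    hops, reach = 0, 1
    while reach < num_nodes:
        reach *= 2 ** b  # each hop resolves one more base-2^b digit
        hops += 1
    return hops


for n in (1_000, 100_000, 1_000_000):
    print(n, max_hops(n, b=4))
# 1000 3
# 100000 5
# 1000000 5
```

With b = 4 (hexadecimal digits), even a million-node network stays within five routing steps, since each hop fixes one more digit of the destination NodeID.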

Pastry: Example
• The routing table is organized by similarity to the node's own NodeID.
• The neighborhood set is used for node addition and recovery.
• Queries are forwarded to a numerically closer node (chosen by shared NodeID prefix and by NodeID proximity).
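The forwarding rule can be sketched as a single next-hop decision. This is a simplified stand-in for Pastry's real procedure, which consults the leaf set and routing table separately: here every known node is ranked first by the length of the NodeID prefix it shares with the key, then by numeric distance. The four-digit base-4 IDs follow the slides' figures; the function names are hypothetical.

```python
def shared_prefix_len(a, b):
    """Number of leading digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(current, key, known_nodes, base=4):
    """Return the known node that improves on `current` for `key`, else None."""
    def rank(node):
        # Longer shared prefix wins; ties go to the numerically closer node.
        distance = abs(int(node, base) - int(key, base))
        return (-shared_prefix_len(node, key), distance)

    best = min(known_nodes, key=rank)
    return best if rank(best) < rank(current) else None


# Route a message with key 3130, starting at node 2300.
print(next_hop("2300", "3130", ["0302", "1033", "3013", "3321", "2121"]))  # 3013
print(next_hop("3013", "3130", ["3133", "3321", "2300"]))                  # 3133
print(next_hop("3133", "3130", ["3013", "3321"]))                          # None
```

Each hop lengthens the matched prefix (none, then "3", then "313"), and routing stops at 3133 because no node it knows is a better match for the key.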

Pastry Routing Table
[Figure: ring of NodeIDs (origin labeled 0 = 2^M) with nodes 0231, 0302, 1033, 1123, 1202, 1311, 2031, 2121, 2300, 3013, 3133, and 3321, annotated with the Leaf Set and Neighborhood Set.]

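Pastry Routing Example
[Figure: ring of NodeIDs (origin labeled 0 = 2^M) with nodes 0231, 0302, 1033, 1123, 1202, 1311, 2031, 2121, 2300, 3013, 3133, and 3321, showing a query (marked "?") being routed. Other nodes exist but are not shown.]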

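Pastry Node Insertion Example
[Figure: new node 3130 joins the ring of nodes 0231, 0302, 1033, 1123, 1202, 1311, 2031, 2121, 2300, 3013, 3133, and 3321 (origin labeled 0 = 2^M); the Neighborhood Set and Leaf Set involved in the insertion are marked.]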

Pastry Node Removal Example
[Figure: ring segment (origin labeled 0 = 2^M) showing nodes 3013, 3133, and 3321.]

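PAST Insertions
fileID = Insert(name, owner-credentials, k, file)
[Figure: the owner inserts a file with FileID 3130 into the ring of nodes 0231, 0302, 1033, 1123, 1202, 1311, 2031, 2121, 2300, 3013, 3133, and 3321 (origin labeled 0 = 2^M). The message "3130: File, Certificate" is routed through the ring and the file is inserted k times.]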

PAST Insertions (continued)
[Figure: same ring; the storing nodes send Store Receipts back toward the owner, who receives k Store Receipts.]

PAST Semantics
• file = Lookup(fileID)
  • Routed to NodeID = FileID (the node whose NodeID is closest to the FileID).
  • The first of the k closest nodes found returns the file and credentials.
• Reclaim(fileID, owner-credentials)
  • Same semantics as Insert.
  • The owner issues a Reclaim Certificate.
  • Storing nodes issue Reclaim Receipts.
• Changes in leaf sets will trigger changes in replica locations.
• A new node creates “pointers” to files it should contain; migration is gradual.
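The three operations Insert, Lookup, and Reclaim can be tied together in a toy in-memory model. Everything below is a sketch under strong assumptions: the class `ToyPAST` is hypothetical, owner credentials are reduced to a plain string instead of certificates and signatures, FileIDs are shortened to 16 bits for readability, routing is replaced by a global sort, and a "receipt" is just the storing node's ID.

```python
import hashlib


class ToyPAST:
    def __init__(self, node_ids):
        # node_id -> {file_id: (owner, data)}
        self.nodes = {nid: {} for nid in node_ids}

    def _closest(self, file_id, k):
        return sorted(self.nodes, key=lambda nid: abs(nid - file_id))[:k]

    def insert(self, name, owner, k, data, salt=b"0"):
        digest = hashlib.sha1(name.encode() + owner.encode() + salt).digest()
        file_id = int.from_bytes(digest[:2], "big")  # assumed 16-bit ID space
        receipts = []
        for nid in self._closest(file_id, k):
            self.nodes[nid][file_id] = (owner, data)
            receipts.append(nid)  # stands in for a Store Receipt
        return file_id, receipts

    def lookup(self, file_id):
        # Ask nodes in order of closeness; the first holder answers.
        for nid in sorted(self.nodes, key=lambda n: abs(n - file_id)):
            if file_id in self.nodes[nid]:
                return self.nodes[nid][file_id][1]
        return None

    def reclaim(self, file_id, owner):
        receipts = []
        for nid, store in self.nodes.items():
            entry = store.get(file_id)
            if entry is not None and entry[0] == owner:  # owner check only
                del store[file_id]
                receipts.append(nid)  # stands in for a Reclaim Receipt
        return receipts


past = ToyPAST(range(0, 65536, 4096))  # 16 evenly spaced NodeIDs
fid, store_receipts = past.insert("notes.txt", "alice", 3, b"hello")
print(len(store_receipts), past.lookup(fid))
print(past.reclaim(fid, "mallory"))      # wrong owner: nothing reclaimed
print(len(past.reclaim(fid, "alice")), past.lookup(fid))
```

The owner check in `reclaim` is the reason the interface carries credentials at all: only the party that inserted a file should be able to release its storage.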

Load Balancing in PAST: Replica Diversion
[Figure: leaf sets labeled 3130 and 3201.]

Load Balancing in PAST: File Diversion
[Figure: leaf sets labeled 3130 and 3201.]
• Change the FileID by changing the salt.
• Policies are needed for acceptance of replicas and diverted replicas, and for selection of the node that receives a diverted replica.
• t_pri, t_div: maximum ratio of file size to free space for insertion of primary and diverted replicas, respectively.
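The acceptance thresholds can be made concrete with a small sketch. It assumes a node accepts a replica when file size divided by free space is at most the relevant threshold, and that the diverted replica is offered to the leaf-set node with the most free space; that selection rule, the function names, and the chosen threshold values (taken from the range shown in the performance slides) are illustrative assumptions.

```python
T_PRI = 0.1   # assumed threshold for primary replicas
T_DIV = 0.05  # assumed threshold for diverted replicas


def accepts(file_size, free_space, diverted):
    """True if file_size / free_space does not exceed the threshold."""
    if free_space <= 0:
        return False
    threshold = T_DIV if diverted else T_PRI
    return file_size / free_space <= threshold


def place_replica(file_size, primary_free, leaf_set_free):
    """Decide where one replica goes.

    leaf_set_free maps leaf-set NodeIDs to their free space.
    Returns ("primary", None), ("diverted", node_id), or
    ("file_diversion", None) when the caller should pick a new salt.
    """
    if accepts(file_size, primary_free, diverted=False):
        return ("primary", None)
    if leaf_set_free:
        # Assumed policy: try the leaf-set node with the most free space.
        candidate = max(leaf_set_free, key=leaf_set_free.get)
        if accepts(file_size, leaf_set_free[candidate], diverted=True):
            return ("diverted", candidate)
    return ("file_diversion", None)


print(place_replica(5, 100, {}))                             # ('primary', None)
print(place_replica(20, 100, {"3201": 1000, "3133": 200}))   # ('diverted', '3201')
print(place_replica(20, 100, {"3201": 200}))                 # ('file_diversion', None)
```

Setting the diverted threshold lower than the primary one makes nodes more reluctant to take on replicas that are not naturally theirs, so diversion absorbs imbalance without filling up the neighbors.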

Caching in PAST
• Highly popular files might demand more replicas than specified.
• Files located “far away” only need to be fetched once locally.
• Unused disk space is allocated as cache.
• Caching performance degrades gradually with increased utilization.
• Cache insertion policy is similar to the diversion policies.

PAST Performance: t_pri comparison, t_div = 0.05
[Figure: "Succeed" and "Utilization" series plotted for t_pri = 0.05, 0.1, 0.2, and 0.5; vertical axis "Percentage", from 82% to 100%.]

PAST Performance: Ratio of File Diversions

PAST Performance: Ratio of Replica Diversions

PAST Performance: Failed Insertions

PAST Performance: Cache Hits

Conclusions
• Content-based routing improves the scalability of distributed storage systems.
• Need for user authentication in distributed systems.
• Caching is crucial for system performance.
• Diversion allows for graceful performance degradation.
• Need file mutability, file search, or indexing services.
