Large Scale Sharing Marco F. Duarte COMP 520: Distributed Systems September 19, 2004

Large Scale Sharing



Page 1: Large Scale Sharing

Large Scale Sharing
Marco F. Duarte
COMP 520: Distributed Systems
September 19, 2004

Page 2: Large Scale Sharing

Introduction
• P2P sharing systems are very popular
• In P2P, all nodes have identical capabilities and responsibilities
• Popular approaches are partially centralized, do not scale well, or do not provide desired anonymity
• Scalability of systems is critical
• Need for decentralized, load-balancing architectures

Page 3: Large Scale Sharing

Features desired in a P2P sharing system
• Decentralized architecture: no single point of failure
• Scalability: bandwidth and load balancing
• Fault tolerance: content replication
• Anonymity for users: posters, readers, storers
• Resilient against DoS attacks

Page 4: Large Scale Sharing

Freenet
• Provides anonymity
• No requester or provider information is implicit in communication
• Presence of a file in a node does not imply authorship
• Popular files are replicated to improve locality
• Does not intend to provide permanent storage

Page 5: Large Scale Sharing

Freenet Queries
• Files receive FileIDs (160-bit SHA-1 hash of “file identifier”)
• Queries have pseudo-unique random identifiers (QueryIDs) and a hops-to-live count
• Routing tables contain previously retrieved FileIDs and their locations
• Queries are routed to the location with the closest FileID at each stage; loops are detected with the QueryID

FileID      Node Address
00231311    192.168.3.24
11310231    192.168.52.111
20130102    192.168.122.38
23102312    192.168.213.231
30002312    192.168.58.47
32302132    192.168.33.241
32320303    192.168.194.28
33103123    192.168.12.242

Query: 31302313?
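The "closest FileID" step can be sketched as a nearest-key lookup over the routing table from the slide. This is only an illustration: the function name `closest_entry` is hypothetical, and comparing FileIDs as base-4 integers is an assumption (the slide's IDs use only the digits 0 to 3; the slide does not define the distance measure).

```python
# Routing table copied from the slide: FileID -> node address.
ROUTING_TABLE = {
    "00231311": "192.168.3.24",
    "11310231": "192.168.52.111",
    "20130102": "192.168.122.38",
    "23102312": "192.168.213.231",
    "30002312": "192.168.58.47",
    "32302132": "192.168.33.241",
    "32320303": "192.168.194.28",
    "33103123": "192.168.12.242",
}


def closest_entry(query_id, table, base=4):
    """Return the (FileID, address) pair whose FileID is numerically closest.

    Assumption: "closest" means smallest absolute difference when the
    IDs are read as integers in the given base.
    """
    target = int(query_id, base)
    best = min(table, key=lambda fid: abs(int(fid, base) - target))
    return best, table[best]


file_id, address = closest_entry("31302313", ROUTING_TABLE)
print(file_id, address)  # 32302132 192.168.33.241
```

Under this assumption the query for 31302313 would be forwarded to the node holding the entry for 32302132.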

Page 6: Large Scale Sharing

Freenet Queries: Lookups and Stores

• Copies of the file are stored at all nodes along the path
• File record for a is added to routing tables
• Writes perform a lookup, then insert the file along the path if no match is found

[Figure: example network with nodes labeled a, b, and e]

Page 7: Large Scale Sharing

Freenet Properties
• FileID-based clustering allows for improved routing as usage increases
• LRU-like capacity management: rarely used files are purged from the system
• Random nature of FileIDs allows for diversity of information at nodes
• Attempts to supplant existing files will lead to real file propagation
• Anonymity features:
  • File ownership assumed randomly by other nodes
  • Minimal routing information necessary at each hop
  • Hops-to-live count of 1 updated randomly
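The LRU-like capacity management mentioned above can be illustrated with a small fixed-capacity store. This is a minimal sketch, not Freenet's actual datastore: the class name `LruStore` and the capacity measured in number of files are assumptions made for the example.

```python
from collections import OrderedDict


class LruStore:
    """Fixed-capacity file store that purges the least recently used file."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of files kept (assumed unit)
        self.files = OrderedDict()    # oldest entry first, newest last

    def get(self, file_id):
        """Return the file data, or None; a hit marks the file as recently used."""
        if file_id not in self.files:
            return None
        self.files.move_to_end(file_id)
        return self.files[file_id]

    def put(self, file_id, data):
        """Store a file, evicting least recently used files when over capacity."""
        self.files[file_id] = data
        self.files.move_to_end(file_id)
        while len(self.files) > self.capacity:
            self.files.popitem(last=False)


store = LruStore(capacity=2)
store.put("a", b"first")
store.put("b", b"second")
store.get("a")             # "a" is now the most recently used file
store.put("c", b"third")   # evicts "b", the least recently used file
```

A file that stops being requested drifts to the front of the ordering and is the first to be dropped, which matches the slide's point that rarely used files are purged.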

Page 8: Large Scale Sharing

Freenet Problems
• Files that are stored in the network may not be found
• Freenet does not provide reliable storage
• No notion of locality in routing
• Simulations do not involve file insertion or node discovery

Page 9: Large Scale Sharing

PAST: Reliable Distributed Storage
• Customizable file persistence
• High availability and load balancing
• Efficient routing and storage allocation
• Uses FileIDs generated from hashes, as in Freenet
• Uses owner credentials to verify identity of authors
• Interface: Insert, Lookup, Reclaim

Page 10: Large Scale Sharing

PAST Architecture
• FileID computed from hash of filename, owner’s public key and a random salt.
• Each node receives a pseudorandom NodeID, independent of the node properties.
• Owner specifies number k of replicas of a file to store in the system on insert.
• File is stored in the k nodes with NodeIDs closest to the FileID.
• Routing provided by Pastry.
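The two placement rules above (hash to a FileID, then store on the k numerically closest nodes) can be sketched as follows. Several details are assumptions for illustration only: SHA-1 as the hash, the concatenation order of the inputs, the helper names, and the toy 4-digit base-4 ID space borrowed from the ring figures in these slides.

```python
import hashlib


def compute_file_id(filename, owner_public_key, salt):
    """Hash filename, owner's public key and salt into a hex FileID.

    Assumptions: SHA-1 and this concatenation order are illustrative.
    """
    digest = hashlib.sha1(filename.encode("utf-8") + owner_public_key + salt)
    return digest.hexdigest()


def ring_distance(a, b, space):
    """Distance between two IDs on a circular ID space of the given size."""
    d = abs(a - b)
    return min(d, space - d)


def replica_nodes(file_id, node_ids, k, base=4):
    """Return the k node IDs closest to file_id (IDs are digit strings)."""
    space = base ** len(file_id)
    target = int(file_id, base)
    return sorted(
        node_ids,
        key=lambda nid: ring_distance(int(nid, base), target, space),
    )[:k]


# Toy NodeIDs taken from the ring figures in these slides.
NODE_IDS = ["0302", "1033", "1123", "1202", "1311", "2031",
            "2121", "0231", "3321", "3133", "2300", "3013"]

print(replica_nodes("3130", NODE_IDS, k=3))  # ['3133', '3013', '3321']
```

Changing the salt changes the hash and therefore the set of nodes responsible for the file, which is the mechanism the file diversion slide relies on.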

Page 11: Large Scale Sharing

Pastry: Routing for P2P Networks
• Paths with less than $\lceil \log_{2^b} N \rceil$ hops
• Delivery guaranteed under at most $l/2$ node failures
• Flexible proximity metric.
• Each node contains:
  • Leaf set: l nodes with closest NodeIDs
  • Routing table: set of neighbors organized by NodeIDs
  • Neighborhood set: l closest nodes
• Each NodeID is paired with its network address
• Direct routes to neighbors and l closest NodeIDs
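To get a feel for Pastry's hop bound of $\lceil \log_{2^b} N \rceil$, the sketch below evaluates it with integer arithmetic, which avoids floating-point rounding at exact powers. The function name `hop_bound` and the sample values of N and b are illustrative assumptions, not figures from the slides.

```python
def hop_bound(n_nodes, b):
    """Smallest h such that (2**b)**h >= n_nodes, i.e. ceil(log_{2^b} N)."""
    hops = 0
    reach = 1                 # number of IDs distinguishable after `hops` digits
    while reach < n_nodes:
        reach *= 2 ** b       # each hop resolves one more base-2^b digit
        hops += 1
    return hops


for n in (1_000, 100_000, 1_000_000):
    print(n, hop_bound(n, b=4))
# 1000 3
# 100000 5
# 1000000 5
```

With b = 4, each hop fixes one hexadecimal digit of the key, so a hundredfold growth in network size adds only a couple of hops.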

Page 12: Large Scale Sharing

Pastry: Example
• Routing table organized by similarity to NodeID.
• Neighborhood set used for node addition/recovery.
• Queries are forwarded to a numerically closer node (by shared NodeID prefix, and NodeID proximity).
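The forwarding rule can be sketched as a single routing step: prefer a known node that shares a longer prefix with the key, otherwise fall back to a numerically closer node with an equally long prefix. This is a simplified illustration, not Pastry's full algorithm (it ignores the separate leaf-set check and the routing table layout); the function names and the choice of the longest available prefix are assumptions.

```python
def shared_prefix_len(a, b):
    """Number of leading digits that two ID strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(current, key, known_nodes, base=4):
    """Choose where to forward a query for `key` from node `current`."""
    here = shared_prefix_len(current, key)

    # Case 1: some known node matches the key in more leading digits.
    longer = [n for n in known_nodes if shared_prefix_len(n, key) > here]
    if longer:
        return max(longer, key=lambda n: shared_prefix_len(n, key))

    # Case 2: same prefix length but numerically closer to the key.
    def dist(n):
        return abs(int(n, base) - int(key, base))

    closer = [n for n in known_nodes
              if shared_prefix_len(n, key) >= here and dist(n) < dist(current)]
    # If nobody is closer, the current node is the destination.
    return min(closer, key=dist) if closer else current


print(next_hop("2300", "3130", ["0302", "1033", "3013", "3321", "3133"]))  # 3133
print(next_hop("3133", "3130", ["3013", "3321", "2300"]))                  # 3133
```

In the second call no known node improves on 3133, so the query stops there; each successful step lengthens the matched prefix, which is why the hop count grows only logarithmically.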

Page 13: Large Scale Sharing

Pastry Routing Table

[Figure: NodeID ring (0 = 2^M) with nodes 2300, 0302, 1033, 1123, 1202, 1311, 2031, 2121, 0231, 3013, 3321, 3133; labels: Leaf Set, Neighborhood Set]

Page 14: Large Scale Sharing

Pastry Routing Example

[Figure: NodeID ring (0 = 2^M) with nodes 0302, 1033, 1123, 1202, 1311, 2031, 2121, 0231, 3321, 3133, 2300, 3013; a lookup labeled 3133 is marked "?". Other nodes exist but are not shown.]

Page 15: Large Scale Sharing

Pastry Node Insertion Example

[Figure: NodeID ring (0 = 2^M) with nodes 0302, 1033, 1123, 1202, 1311, 2031, 2121, 0231, 3321, 3133, 2300, 3013 and new node 3130; labels: Neighborhood Set, Leaf Set]

Page 16: Large Scale Sharing

Pastry Node Removal Example

[Figure: NodeID ring (0 = 2^M) showing nodes 3321, 3133, 3013]

Page 17: Large Scale Sharing

PAST Insertions

fileID = Insert(name, owner-credentials, k, file)

[Figure: NodeID ring (0 = 2^M) with nodes 0302, 1033, 1123, 1202, 1311, 2031, 2121, 0231, 3321, 3133, 2300, 3013. The Owner inserts a file with FileID 3130; three nodes are labeled "3130: File, Certificate". Caption: Insert File k times.]

Page 18: Large Scale Sharing

PAST Insertions

fileID = Insert(name, owner-credentials, k, file)

[Figure: NodeID ring (0 = 2^M) with nodes 0302, 1033, 1123, 1202, 1311, 2031, 2121, 0231, 3321, 3133, 2300, 3013; labels: Owner, k Store Receipts]

Page 19: Large Scale Sharing

PAST Semantics
• file = Lookup(fileID)
  • Routed to NodeID = FileID
  • First of the k closest nodes found returns file, credentials
• Reclaim(fileID, owner-credentials)
  • Same semantics as Insert
  • Owner issues Reclaim Certificate
  • Storing nodes issue Reclaim Receipt
• Changes in leaf sets will trigger changes in replica locations
  • A new node creates “pointers” to files it should contain; migration is gradual

Page 20: Large Scale Sharing

Load Balancing in PAST: Replica Diversion

[Figure: diagram with regions labeled "3130 Leaf Set" and "3201 Leaf Set"]

Page 21: Large Scale Sharing

Load Balancing in PAST: File Diversion

[Figure: diagram with regions labeled "3130 Leaf Set" and "3201 Leaf Set". Caption: Change ID by changing salt.]

• Policies for acceptance of replicas and diverted replicas, and selection of diverted replica node.
• Maximum ratio of file size to free space for insertion: t_pri, t_div
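The acceptance thresholds can be read as a simple test: a node refuses a replica when the ratio of file size to its free space exceeds t_pri (for a primary replica) or t_div (for a diverted replica). The sketch below strings the two diversion levels together. It is an illustration under assumptions: the function names, the rule of trying the leaf-set candidate with the most free space, and the threshold values (picked from the ranges shown in the performance charts) are not taken verbatim from the slides.

```python
T_PRI = 0.1    # assumed threshold for primary replicas
T_DIV = 0.05   # assumed threshold for diverted replicas


def accepts(file_size, free_space, threshold):
    """A node accepts a replica only if file_size / free_space <= threshold."""
    if free_space <= 0:
        return False
    return file_size / free_space <= threshold


def place_replica(file_size, primary_free, leaf_set_free):
    """Decide where one replica goes.

    primary_free:  free space on the node that should hold the replica.
    leaf_set_free: free space on candidate nodes in its leaf set.
    Returns "primary", "diverted:<index>", or "file-diversion".
    """
    if accepts(file_size, primary_free, T_PRI):
        return "primary"
    if leaf_set_free:
        # Replica diversion: try the candidate with the most free space.
        best = max(range(len(leaf_set_free)), key=lambda i: leaf_set_free[i])
        if accepts(file_size, leaf_set_free[best], T_DIV):
            return f"diverted:{best}"
    # File diversion: the owner retries with a new salt, giving a new FileID.
    return "file-diversion"


print(place_replica(10, 1000, []))              # primary
print(place_replica(200, 1000, [10000, 500]))   # diverted:0
print(place_replica(200, 1000, [1000]))         # file-diversion
```

Keeping t_div below t_pri makes nodes pickier about diverted replicas than about their own primary replicas, so local space is reserved for files the node is directly responsible for.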

Page 22: Large Scale Sharing

Caching in PAST
• Highly popular files might demand more replicas than specified.
• Files located “far away” only need to be fetched once locally
• Unused disk space is allocated as cache.
• Caching performance degrades gradually with increased utilization
• Cache insertion policy similar to diversion policies.

Page 23: Large Scale Sharing

PAST Performance: t_pri comparison, t_div = 0.05

[Figure: chart of Percentage (y-axis, 82% to 100%) against t_pri (x-axis: 0.05, 0.1, 0.2, 0.5); series: Succeed, Utilization]

Page 24: Large Scale Sharing

PAST Performance: t_pri comparison, t_div = 0.05

Page 25: Large Scale Sharing

PAST Performance: Ratio of File Diversions

Page 26: Large Scale Sharing

PAST Performance: Ratio of Replica Diversions

Page 27: Large Scale Sharing

PAST Performance: Failed Insertions

Page 28: Large Scale Sharing

PAST Performance: Cache Hits

Page 29: Large Scale Sharing

Conclusions
• Content-based routing improves scalability of distributed storage systems.
• Need for user authentication in distributed systems.
• Caching is crucial for system performance.
• Diversion allows for graceful performance degradation.
• Need file mutability, file search or indexing services