Peer-to-Peer Systems (Chapter 25)

  • Published on 14-Dec-2015

Transcript

Slide 1: Peer-to-Peer Systems, Chapter 25.

Slide 2: What is Peer-to-Peer (P2P)? Napster? Gnutella? Most people think of P2P as music sharing.

Slide 3: What is a peer? Contrast with the client-server model: servers are centrally maintained and administered, and a client has fewer resources than a server.

Slide 4: What is a peer? A peer's resources are similar to the resources of the other participants. P2P peers communicate directly with other peers and share resources.

Slide 5: P2P Concepts. Client-client, as opposed to client-server. File sharing: I get a copy from someone and then make it available for others to download, so the copies, and the workload, are spread out. Advantages: scalable, stable, self-repairing. Process: a peer joins the system when a user starts the application, contributes some resources while making use of the resources provided by others, and leaves the system when the user exits the application. Session: one such join-participate-leave cycle. Churn: the independent arrival and departure of thousands or millions of peers creates the collective effect we call churn. The user-driven dynamics of peer participation must be taken into account in both the design and evaluation of any P2P application. For example, the distribution of session lengths can affect the overlay structure, the resiliency of the overlay, and the selection of key design parameters.

Slide 6: Types of clients. Based on client behavior, there are three types: true clients (not active participants; take but don't give; short duration of stay), peers (clients that stay long enough, and are well-connected enough, to participate actively; take and give), and servers (give, but don't take). Safe vs. probabilistic protocols; mostly logarithmic order of performance/cost.

Slide 7: Levels of P2P-ness. P2P as a mindset: Slashdot. P2P as a model: Gnutella. P2P as an implementation choice: application-layer multicast. P2P as an inherent property: ad-hoc networks.

Slide 8: P2P Goals/Benefits. Cost sharing, resource aggregation, improved scalability/reliability, increased autonomy, anonymity/privacy, dynamism, ad-hoc communication.

Slide 9: P2P File Sharing. Content exchange: Gnutella. File systems: OceanStore. Filtering/mining: OpenCola.

Slide 10: P2P File Sharing Benefits. Cost sharing, resource aggregation, improved scalability/reliability, anonymity/privacy, dynamism.

Slide 11: P2P Application Taxonomy. P2P systems: distributed computing (SETI@home), file sharing (Gnutella), collaboration (Jabber), platforms (JXTA).

Slide 12: Management/Placement Challenges. Per-node state, bandwidth usage, search time, fault tolerance/resiliency.

Slide 13: Approaches. Centralized, flooding, document routing.

Slide 14: Centralized (Napster model). Benefits: efficient search, limited bandwidth usage, no per-node state. Drawbacks: central point of failure, limited scale. (Diagram: peers Bob, Alice, Jane, Judy.)

Slide 15: Flooding (Gnutella model). Benefits: no central point of failure, limited per-node state. Drawbacks: slow searches, bandwidth intensive. (Diagram: peers Bob, Alice, Jane, Judy, Carl; a minimal flooding sketch follows below.)
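To make the flooding model concrete, here is a minimal sketch of a hop-limited (TTL-style) search over an overlay graph, in the spirit of the expanding-ring probing described on slide 18. The overlay, the peer names, and the function name are illustrative assumptions, not Gnutella's actual protocol.

```python
# Hop-limited flooding search over an overlay graph (Gnutella-style sketch).
# The overlay is a dict of neighbor lists and each peer holds a set of file
# names. Real Gnutella suppresses duplicate messages via message IDs; here
# a shared `visited` set plays that role.

def flood_search(overlay, files, start, wanted, ttl):
    """Return the set of peers holding `wanted` within `ttl` hops of `start`."""
    hits = set()
    visited = {start}
    frontier = [start]
    for _ in range(ttl):               # probe 1 hop, then 2 hops, and so on
        next_frontier = []
        for peer in frontier:
            for neighbor in overlay[peer]:
                if neighbor in visited:
                    continue
                visited.add(neighbor)
                next_frontier.append(neighbor)
                if wanted in files[neighbor]:
                    hits.add(neighbor)
        frontier = next_frontier
    return hits

overlay = {"bob": ["alice", "jane"], "alice": ["bob", "judy"],
           "jane": ["bob", "carl"], "judy": ["alice"], "carl": ["jane"]}
files = {"bob": set(), "alice": {"song.mp3"}, "jane": set(),
         "judy": {"song.mp3"}, "carl": set()}
print(flood_search(overlay, files, "bob", "song.mp3", 2))  # {'alice', 'judy'}
```

Note how the bandwidth cost grows with the frontier: every extra hop of TTL can multiply the number of peers probed, which is exactly the scaling drawback the slides point out.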
Slide 16: Connectivity. (Diagram only.)

Slide 17: Napster. Uses a centralized directory mechanism, both to control the selection of peers and to generate other revenue-generating activities. In addition, it has several regional servers: users first connect to Napster's centralized server, which directs them to one of the regional servers. Basically, each client system runs a Napster proxy that keeps track of the locally shared files and informs the regional server. Napster uses heuristic evaluation mechanisms to judge the reliability of a client before it starts using that client as a shared workspace. A sketch of this directory model follows.
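Below is a minimal sketch of the centralized-directory idea from slides 14 and 17: clients report their shared files to an index server, and searches touch only the index, never the peers. The class, method names, and addresses are invented for illustration; this is not Napster's wire protocol.

```python
# Centralized directory (Napster model): the server only indexes who has
# what; the file transfer itself happens peer-to-peer. All names here are
# hypothetical.

class DirectoryServer:
    def __init__(self):
        self.index = {}            # file name -> set of peer addresses

    def register(self, peer, shared_files):
        for name in shared_files:  # the client-side proxy reports its files
            self.index.setdefault(name, set()).add(peer)

    def unregister(self, peer):
        for holders in self.index.values():
            holders.discard(peer)  # churn: a departing peer leaves the index

    def search(self, name):
        return self.index.get(name, set())

server = DirectoryServer()
server.register("alice:6699", ["song.mp3", "talk.ogg"])
server.register("judy:6699", ["song.mp3"])
print(server.search("song.mp3"))   # {'alice:6699', 'judy:6699'}
```

The efficiency and the fragility of the model are both visible here: one dictionary lookup answers any search, and losing that one dictionary takes the whole system down.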
Slide 18: Gnutella and Kazaa. Unlike Napster, Gnutella is a pure P2P system with no centralized component; all peers are completely equal. The protocol ensures that each user's system deals with only a few Gnutella nodes. Search for files: if the specified distance is 4, then all machines within 4 hops of the client are probed (first all machines within 1 hop, then 2 hops, and so on). This anycast-style mechanism becomes extremely costly as the system scales up. Kazaa likewise has no centralized control; it uses Plaxton trees.

Slide 19: CAN (Content Addressable Network). Each object is expected to have a unique, system-wide name or identifier. The name is hashed into a d-tuple: the identifier is converted into a random-looking number using some cryptographic hash function. In a 2-dimensional CAN, the id is hashed to a 2-dimensional tuple (x, y). The same scheme is used to convert machine IDs. The space of possible d-dimensional identifiers is recursively subdivided, and each object is stored at the node owning the part of the space (the zone) that the object's ID falls in. When a new node is added, an existing node shares its zone with the new node; similarly, when a node leaves, its zone is taken over by a nearby node. Once a user provides a search key, it is converted to (x, y); the receiving CAN node then finds a path from itself to the node owning the zone containing (x, y). If d is the number of dimensions and N is the number of nodes, the number of hops is (d/4)*N^(1/d). To handle node failures there are backups. Cost is high when joins and leaves are frequent.

Slide 20: Document Routing. The FreeNet, Chord, CAN, Tapestry, Pastry model. Benefits: more efficient searching, limited per-node state. Drawbacks: limited fault tolerance vs. redundancy. (Diagram: nodes 001, 012, 212, 305, 332 routing a query for 212.)

Slide 21: Document Routing: CAN. Associate to each node and each item a unique id in a d-dimensional space. Goals: scale to hundreds of thousands of nodes; handle rapid arrival and failure of nodes. Properties: routing table size O(d); guarantees that a file is found in at most d*n^(1/d) steps, where n is the total number of nodes. (Slide modified from another presentation.) A routing sketch follows.
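Here is a minimal sketch of CAN-style placement and greedy forwarding in the 2-D space of slides 22-31. Zone splitting is simplified to a fixed set of node points, and each hop goes to the neighbor closest to the key, stopping when no neighbor improves. The hash function, the 8x8 grid, and the neighbor lists are illustrative assumptions; the node and item coordinates follow the slides' example.

```python
# CAN sketch: hash names into a 2-D grid, then route greedily through
# zone neighbors toward the key's point.

import hashlib

GRID = 8  # the slides draw an 8x8 coordinate space

def key_to_point(name):
    """Hash an object name to an (x, y) grid point, as on slide 19."""
    digest = hashlib.sha1(name.encode()).digest()
    return (digest[0] % GRID, digest[1] % GRID)

def route(points, neighbors, start, target):
    """Greedy CAN forwarding: hop to the neighbor closest to `target`."""
    def dist2(name):
        x, y = points[name]
        return (x - target[0]) ** 2 + (y - target[1]) ** 2
    current, path = start, [start]
    while True:
        best = min(neighbors[current], key=dist2)
        if dist2(best) >= dist2(current):
            return path       # no neighbor is closer: `current` owns the zone
        current = best
        path.append(current)

# Node points from slide 26; the zone adjacency below is assumed.
points = {"n1": (1, 2), "n2": (4, 2), "n3": (3, 5), "n4": (5, 5), "n5": (6, 6)}
neighbors = {"n1": ["n2", "n3"], "n2": ["n1", "n3", "n4"],
             "n3": ["n1", "n2", "n4"], "n4": ["n2", "n3", "n5"], "n5": ["n4"]}
print(route(points, neighbors, "n1", (7, 5)))  # n1 queries f4: ['n1', 'n3', 'n4', 'n5']
```

Each node keeps only its neighbor list, which is why the routing table is O(d) while paths stay around (d/4)*N^(1/d) hops.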
Slides 22-27: CAN Example: Two-Dimensional Space. The space is divided between the nodes, and together the nodes cover the entire space; each node covers either a square or a rectangle with a 1:2 or 2:1 aspect ratio. Node n1:(1, 2), the first node to join, initially covers the entire space. When node n2:(4, 2) joins, the space is divided between n1 and n2; node n3:(3, 5) joins and splits the space again; then nodes n4:(5, 5) and n5:(6, 6) join. Items f1:(2, 3), f2:(5, 1), f3:(2, 1), and f4:(7, 5) are then placed, and each item is stored by the node that owns its mapping in the space. (Slides modified from another presentation; diagrams omitted.)

Slides 28-31: CAN: Query Example. Each node knows its neighbors in the d-space and forwards a query to the neighbor that is closest to the query id; for example, assume n1 queries f4. CAN can route around some failures; other failures require local flooding. (Four slides step through the same example; see the routing sketch above. Slides modified from another presentation.)

Slide 32: CFS and PAST. Files are replicated prior to storage; the copies are stored at adjacent locations in the hashed-id space. Both systems use indexing systems to locate the nodes on which they store objects or from which they retrieve copies. IDs are hashed to a 1-dimensional space. Leaves and joins result in several file copies, which can become a bottleneck.

Slide 33: OceanStore. Focused on long-term archival storage (rather than file sharing), e.g., digital libraries. Uses erasure codes: a class of error-correcting codes that can reconstruct a valid copy of a file given some percentage of its copies.

Slide 34: Distributed Indexing in P2P. Two requirements: a lookup mechanism to track down a node holding an object, and a superimposed file system that knows how to store and retrieve files. DNS is a distributed object locator, mapping machine names to IP addresses. P2P indexing tools let users store (key, value) pairs: a distributed hash system.

Slide 35: Chord. A major DHT architecture. It forms a massive virtual ring of which every node in the distributed system is a member, each owning part of the periphery. If the hash value of a node is h, the next-lower node value is hL, and the next-higher is hH, then the node with value h owns the objects whose keys fall in the range hL < k <= h. A sketch of this ownership rule follows.
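To make slide 35's ownership rule concrete, here is a minimal sketch that hashes node names and keys onto one ring and assigns each key to its successor node, so the node with id h owns keys k with hL < k <= h. The sorted-list lookup stands in for Chord's finger tables, which achieve the same lookup in O(log N) hops; the node names and ring size are illustrative assumptions.

```python
# Chord-style ring ownership: node names and object keys are hashed onto
# the same identifier ring; a key belongs to its successor, the first node
# id >= hash(key), wrapping around the top of the ring.

import bisect
import hashlib

M = 16                               # ring of size 2**M (illustrative)

def ring_hash(name):
    """Hash a node name or object key onto the identifier ring."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % (2 ** M)

class ChordRing:
    def __init__(self, node_names):
        self.ids = sorted(ring_hash(n) for n in node_names)
        self.owner = {ring_hash(n): n for n in node_names}

    def successor(self, key):
        """Owner of `key`: the first node id >= hash(key), with wraparound."""
        k = ring_hash(key)
        i = bisect.bisect_left(self.ids, k)
        h = self.ids[i % len(self.ids)]   # implements hL < k <= h on the ring
        return self.owner[h]

ring = ChordRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.successor("song.mp3"))         # the peer responsible for this key
```

Because ownership depends only on adjacent positions on the ring, a join or leave reshuffles just the keys in one arc, which is what makes DHTs of this kind tolerant of churn.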
