Peer-to-Peer Structured Overlay Networks
Antonino Virgillito



Slide 1: Peer-to-Peer Structured Overlay Networks (Antonino Virgillito)

Slide 2: Background
- Peer-to-peer systems: distribution, symmetry (communication, node roles), decentralized control, self-organization, dynamicity

Slide 3: Data Lookup in P2P Systems
- Data items are spread over a large number of nodes: which node stores which data item?
- A lookup mechanism is needed
- Centralized directory -> bottleneck / single point of failure
- Query flooding -> scalability concerns
- Need more structure!

Slide 4: More Issues
- Organizing and maintaining the overlay network: node arrivals, node failures
- Resource allocation / load balancing
- Resource location
- Network proximity routing

Slide 5: What is a Distributed Hash Table?
- Exactly that: a service, distributed over multiple machines, with hash-table semantics: put(key, value), value = get(key)
- Designed to work in a peer-to-peer (P2P) environment: no central control, nodes under different administrative control
- But it can of course also operate in an infrastructure setting

Slide 6: What is a DHT?
- Hash-table semantics: put(key, value), value = get(key)
- The key is a single flat string: limited semantics compared to keyword search
- put() causes the value to be stored at one (or more) peer(s); get() retrieves the value from a peer
- put() and get() are accomplished with unicast routed messages; in other words, it scales
- Other API calls support the application, e.g. notification when neighbors come and go

Slide 7: Distributed Hash Tables (DHT)
- Operations: put(k, v), get(k)
- The P2P overlay maps keys to nodes
- Completely decentralized and self-organizing; robust and scalable
- (figure: key/value pairs (k1,v1) ... (k6,v6) spread over the overlay nodes)

Slide 8: Popular DHTs
- Tapestry (Berkeley): based on Plaxton trees, similar to hypercube routing; the first DHT; complex and hard to maintain (hard to understand, too!)
- CAN (ACIRI), Chord (MIT), and Pastry (Rice/MSR Cambridge): the second wave of DHTs (contemporary with and independent of each other)

Slide 9: DHT Basics
- Node IDs can be mapped to the hash key space
- Given a hash key as a destination address, you can route through the network to a given node
- Always routes to the same node no matter where you start from
- Requires no centralized control (completely distributed)
- Small per-node state, independent of the number of nodes in the system (scalable)
- Nodes can route around failures (fault-tolerant)

Slide 10: Things to Look At
- What is the structure?
- How does routing work in the structure?
- How does it deal with node joins and departures (structure maintenance)?
- How does it scale?
- How does it deal with locality?
- What are the security issues?

Slide 11: The Chord Approach
- Consistent hashing
- Logical ring
- Finger pointers

Slide 12: The Chord Protocol
- Provides a mapping successor: key -> node
- To look up key K, go to node successor(K)
- successor is defined using consistent hashing: keys and nodes hash into the same (circular) identifier space
- successor(K) = the first node whose hash ID is equal to or greater than hash(K)

Slide 13: Example: The Logical Ring
- Nodes 0, 1, 3; keys 1, 2, 6

Slide 14: Consistent Hashing [Karger et al. 97]
Some nice properties:
- Smoothness: minimal key movement on node join/leave
- Load balancing: keys are equitably distributed over nodes

Slide 15: Mapping Details
- Range of the hash function: circular ID space modulo 2^m
- Compute a 160-bit SHA-1 hash and truncate it to m bits
- The chance of collision is rare if m is large enough
- Deterministic, but hard for an adversary to subvert

Slide 16: Chord State
- Successor/predecessor pointers in the ring
- Finger pointers: n.finger[i] = successor(n + 2^(i-1))
- Each node knows more about the portion of the circle close to it!

Slide 17: Example: Finger Tables

Slide 18: Chord Routing Protocol
- A sequence of nodes progressively closer to id is contacted remotely
- Each node is queried for the known node closest to id
- The process stops at a node whose successor is > id
- Notation: a call written on a node, n.procedure(), stands for a remote call to node n

Slide 19: Example: Chord Routing
- Finger pointers for node 1

Slide 20: Lookup Complexity
- With high probability: O(log N)
- Proof intuition: let p be the successor of the targeted key; the distance to p reduces by at least half in each step, so in m steps we would reach p
- Stronger claim: in O(log N) steps the distance shrinks to 2^m / N; thereafter even linear advance suffices, giving O(log N) lookup complexity

Slide 21: Chord Invariants
- Every key in the network can be located as long as the following invariants are preserved across joins and leaves:
- Each node's successor is correctly maintained
- For every key k, node successor(k) is responsible for k

Slide 22: Chord: Node Joins
- A new node B learns of at least one existing node A by external means
- B asks A to look up its finger-table information
- Given that B's hash ID is b, A looks up B.finger[i] = successor(b + 2^(i-1)), if the interval is not already covered by finger[i-1]
- B stores all finger information and sets up its predecessor/successor pointers

Slide 23: Node Joins (contd.)
- Update the finger tables of existing nodes p such that: (1) p precedes b by at least 2^(i-1), and (2) the i-th finger of node p succeeds b
- Start from p = predecessor(b - 2^(i-1)) and proceed counter-clockwise while condition (2) holds
- Transferring keys: only from successor(b) to b; a notification must be sent to the application

Slide 24: Example: Finger Table Update
- Node 6 joins

Slide 25: Example: Transferring Keys
- Node 1 leaves

Slide 26: Concurrent Joins/Leaves
- A stabilization protocol is needed to guard against inconsistency
- Note: incorrect finger pointers may only increase latency, but incorrect successor pointers may cause lookup failures!
- Nodes periodically run the stabilization protocol: each node finds its successor's predecessor and repairs its successor pointer if that node isn't itself
- The same algorithm is also run at join

Slide 27: Example: node 25 joins

Slide 28: Example: node 28 joins before 20 stabilizes (1)

Slide 29: Example: node 28 joins before 20 stabilizes (2)

Slide 30: CAN
- Virtual d-dimensional Cartesian coordinate space on a d-torus; example: 2-d [0,1] x [0,1]
- Dynamically partitioned among all nodes
- A pair (K,V) is stored by mapping key K to a point P in the space with a uniform hash function and storing (K,V) at the node whose zone contains P
- To retrieve (K,V), apply the same hash function to map K to P and fetch the entry from the node whose zone contains P
- If P is not in the zone of the requesting node or its neighboring zones, route the request to the neighbor node whose zone is nearest P

Slide 31: Routing in a CAN
- Follow a straight-line path through the Cartesian space from source to destination coordinates
- Each node maintains a table with the IP address and virtual coordinate zone of each of its neighbors
- Greedy routing: forward to the neighbor closest to the destination
- For a d-dimensional space partitioned into n equal zones, nodes maintain 2d neighbors
- Average routing path length: (d/4)(n^(1/d)) hops

Slide 32: CAN Construction
- A joining node locates a bootstrap node using the CAN DNS entry
- The bootstrap node provides the IP addresses of random member nodes
- The joining node sends a JOIN request to a random point P in the Cartesian space
- The node whose zone contains P splits the zone and allocates half to the joining node
- The (K,V) pairs in the allocated half are transferred to the joining node
- The joining node learns its neighbor set from the previous zone occupant, which in turn updates its own neighbor set

Slide 33: Departure, Recovery and Maintenance
- Graceful departure: the node hands over its zone and (K,V) pairs to a neighbor
- Node failure: unreachable nodes trigger an immediate takeover algorithm that allocates the failed node's zone to a neighbor
- Failure is detected through the lack of periodic refresh messages
- Each neighbor starts a takeover timer initialized in proportion to its zone volume
- On expiry, it sends a TAKEOVER message containing its zone volume to all of the failed node's neighbors
- If a received TAKEOVER volume is smaller, the timer is killed; if not, the node replies with its own TAKEOVER message
- The nodes thus agree on the live neighbor with the smallest volume

Slide 34: Pastry
- Generic P2P location and routing substrate
- Self-organizing overlay network
- Lookup/insert of an object in < log_16(N) routing steps (expected)
- O(log N) per-node state
- Network proximity routing

Slide 35: Pastry: Object Distribution
- Consistent hashing over a 128-bit circular ID space (0 to 2^128 - 1)
- nodeIds (uniform random), objIds (uniform random)
- Invariant: the node with the numerically closest nodeId maintains the object

Slide 36: Pastry: Object Insertion/Lookup
- A message with key X is routed to the live node with the nodeId closest to X
- Problem: a complete routing table is not feasible

Slide 37: Pastry: Routing Table (# 65a1fc)
- log_16(N) rows (figure shows rows 0-3 of node 65a1fc's table)

Slide 38: Pastry: Leaf Sets
- Each node maintains the IP addresses of the nodes with the L/2 numerically closest larger and smaller nodeIds, respectively
- Used for routing efficiency/robustness, fault detection (keep-alive), and application-specific local coordination

Slide 39: Pastry: Routing Procedure
  if (destination is within range of our leaf set)
    forward to the numerically closest member
  else
    let l = length of the shared prefix
    let d = value of the l-th digit in D's address
    if (R[l][d] exists) forward to R[l][d]
    else forward to a known node that (a) shares at least as long a prefix and (b) is numerically closer than this node

Slide 40: Pastry: Routing Properties
- log_16(N) steps, O(log N) state
- (figure: example route for Route(d46a1c) starting at node 65a1fc, involving nodes d13da3, d4213f, d462ba, d467c4, d471f1)

Slide 41: Pastry: Performance
- Integrity of overlay message delivery: guaranteed unless L/2 nodes with adjacent nodeIds fail simultaneously
- Number of routing hops with no failures: < log_16(N) expected, 128/b + 1 max
- During failure recovery: O(N) worst case, average case much better

Slide 42: Pastry Join
- X = new node, A = bootstrap node, Z = nearest existing node
- A finds Z for X; in the process A, Z, and all nodes on the path send their state tables to X
- X settles on its own table, possibly after contacting other nodes
- X tells everyone who needs to know about itself

Slide 43: Pastry Leave
- Noticed by leaf-set neighbors when the leaving node doesn't respond; the neighbors ask the highest and lowest nodes in their leaf set for a new leaf set
- Noticed by routing neighbors when a message forward fails; they can immediately route to another neighbor
- The stale entry is fixed by asking another neighbor in the same row for its neighbor; if this fails, ask somebody a level up
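The Chord machinery described in the slides above (consistent hashing onto a circular ID space, successor(), finger tables, and closest-preceding-finger routing) can be sketched as a small, self-contained simulation. This is an illustrative sketch rather than the real protocol: integer IDs stand in for truncated SHA-1 hashes, and finger tables are computed from global knowledge instead of being built by the join and stabilization procedures. The ring {0, 1, 3} with m = 3 is the one from the logical-ring example slide.

```python
# Illustrative sketch of Chord: consistent hashing onto a ring plus
# finger-table routing. Integer IDs stand in for truncated SHA-1 hashes,
# and finger tables are precomputed globally (a simulation shortcut).

M = 3                    # identifier bits: the ID space is [0, 2^M) = [0, 8)
RING = 2 ** M

def in_half_open(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b          # the interval wraps past zero

def strictly_between(x, a, b):
    """True if x lies in the circular interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b

class Node:
    def __init__(self, nid):
        self.id = nid
        self.fingers = []            # fingers[i] = successor(id + 2^i)

def build_ring(ids):
    """Create nodes with the finger tables that join/stabilize would build."""
    ids = sorted(ids)
    nodes = {i: Node(i) for i in ids}

    def succ(k):                     # first node at or after k on the ring
        k %= RING
        return nodes[next((i for i in ids if i >= k), ids[0])]

    for n in nodes.values():
        n.fingers = [succ(n.id + 2 ** i) for i in range(M)]
    return nodes

def lookup(node, key):
    """Route toward successor(key); returns the list of node IDs visited."""
    key %= RING
    hops = [node.id]
    while not in_half_open(key, node.id, node.fingers[0].id):
        # closest preceding finger: highest finger strictly between us and key
        nxt = node.fingers[0]
        for f in reversed(node.fingers):
            if strictly_between(f.id, node.id, key):
                nxt = f
                break
        if nxt.id == node.id:        # no progress possible (degenerate case)
            break
        node = nxt
        hops.append(node.id)
    hops.append(node.fingers[0].id)  # the node responsible for key
    return hops

nodes = build_ring([0, 1, 3])        # the ring from the logical-ring example
print(lookup(nodes[0], 6))           # -> [0, 3, 0]: key 6 wraps to node 0
print(lookup(nodes[0], 2))           # -> [0, 1, 3]: key 2 lives on node 3
```

The lookup loop mirrors the complexity argument on the slides: each closest-preceding-finger hop at least halves the remaining distance to the key's successor.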

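CAN's greedy coordinate routing can also be sketched on a 2-d torus. For simplicity this sketch assumes the space is divided into a uniform 4x4 grid of zones; real CAN zones result from successive splits at joins and need not be equal. `GRID`, `zone_of`, and the zone-center heuristic are names invented for this illustration.

```python
# Illustrative sketch of CAN-style greedy routing on a 2-d unit torus
# divided into a uniform 4x4 grid of zones (a simplification: real CAN
# zones arise from successive splits and need not be uniform).

GRID = 4

def torus_delta(a, b):
    """Shortest signed distance from coordinate a to b, wrapping at 1.0."""
    d = (b - a) % 1.0
    return d if d <= 0.5 else d - 1.0

def dist(p, q):
    """Euclidean distance between points p and q on the torus."""
    return sum(torus_delta(a, b) ** 2 for a, b in zip(p, q)) ** 0.5

def zone_of(p):
    """Grid zone (i, j) whose region contains point p."""
    return (int(p[0] * GRID) % GRID, int(p[1] * GRID) % GRID)

def center(z):
    return ((z[0] + 0.5) / GRID, (z[1] + 0.5) / GRID)

def neighbors(z):
    """The 2d = 4 zones adjacent along each coordinate axis."""
    i, j = z
    return [((i + 1) % GRID, j), ((i - 1) % GRID, j),
            (i, (j + 1) % GRID), (i, (j - 1) % GRID)]

def route(src_zone, point):
    """Greedily forward to the neighbor whose zone center is closest to the
    destination point, until the current zone contains the point."""
    path = [src_zone]
    while zone_of(point) != path[-1]:
        path.append(min(neighbors(path[-1]), key=lambda z: dist(center(z), point)))
    return path

# A key hashing to point (0.9, 0.3) lands in zone (3, 1); routing from
# zone (0, 0) wraps across the torus edge.
print(route((0, 0), (0.9, 0.3)))     # -> [(0, 0), (3, 0), (3, 1)]
```

With d = 2 and n = 16 equal zones, the average path length formula (d/4)(n^(1/d)) predicts 2 hops on average, consistent with the short paths produced here.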

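Pastry's routing procedure (leaf set first, then the prefix-matching routing table, then the numerically-closer fallback) can be sketched with short hex nodeIds. The 4-digit IDs, the tiny leaf set, and the globally computed routing tables are simulation shortcuts assumed for this illustration; the IDs loosely echo the routing example in the slides.

```python
# Illustrative sketch of Pastry prefix routing with a leaf set.
# Simplifications: 4-hex-digit nodeIds instead of 128 bits, leaf sets of
# size L = 2 (HALF_L on each side), routing tables derived from global
# knowledge, and a leaf-range test that ignores wrap-around at zero.

DIGITS = 4
HALF_L = 1

def shared_prefix(a, b):
    """Number of leading hex digits a and b have in common."""
    n = 0
    while n < DIGITS and a[n] == b[n]:
        n += 1
    return n

def num(x):
    return int(x, 16)

class PastryNode:
    def __init__(self, nid, ring):
        self.id = nid
        i = ring.index(nid)
        # leaf set: the HALF_L numerically closest nodeIds on each side
        self.leaf = [ring[(i + k) % len(ring)] for k in range(-HALF_L, HALF_L + 1)]
        # routing table R[(l, d)]: some node sharing l digits whose next digit is d
        self.table = {}
        for o in ring:
            if o != nid:
                l = shared_prefix(nid, o)
                self.table.setdefault((l, o[l]), o)

    def next_hop(self, key):
        if key == self.id:
            return self.id
        if num(self.leaf[0]) <= num(key) <= num(self.leaf[-1]):
            # destination is within the leaf-set range: go numerically closest
            return min(self.leaf, key=lambda x: abs(num(x) - num(key)))
        l = shared_prefix(self.id, key)
        entry = self.table.get((l, key[l]))
        if entry:
            return entry                      # extends the shared prefix
        # rare case: a known node with at least as long a prefix, numerically closer
        known = set(self.leaf) | set(self.table.values())
        cands = [o for o in known if shared_prefix(o, key) >= l
                 and abs(num(o) - num(key)) < abs(num(self.id) - num(key))]
        return min(cands, key=lambda x: abs(num(x) - num(key))) if cands else self.id

def route(ids, start, key):
    """Follow next_hop until the message stops moving; returns the hop list."""
    ring = sorted(ids)
    nodes = {i: PastryNode(i, ring) for i in ring}
    path = [start]
    while True:
        nxt = nodes[path[-1]].next_hop(key)
        if nxt == path[-1]:
            return path
        path.append(nxt)

ids = ["65a1", "7c22", "9e05", "a3f1", "d13d",
       "d421", "d462", "d467", "d471", "f00d"]
print(route(ids, "65a1", "d46a"))   # -> ['65a1', 'd13d', 'd421', 'd462', 'd467']
```

Each hop either lands in the destination's leaf-set range or extends the prefix shared with the key by at least one digit, which is where the log_16(N) step bound comes from.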