Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
P2P NetworkStructured Networks:
Distributed Hash Tables
Pedro García López
Universitat Rovira I Virgili
Index
• Description of CHORD’s Location and routing mechanisms
• Symphony: Distributed Hashing in a Small World
Description of CHORD’sLocation and routing mechanisms
Vincent MatossianOctober 12th 2001
ECE 579
Overview
• Maps keys onto nodes in a 1D circular space• Uses consistent hashing –D.Karger, E.Lehman
• Aimed at large-scale peer-to-peer applications
• Consistent hashing• Algorithm for key location• Algorithm for node joining• Algorithm for stabilization• Failures and replication
Chord:
Talk
Consistent hashing
• Distributed caches to relieve hotspots on the web
• Node identifier hash = hash(IP address)• Key identifier hash = hash(key)• Designed to let nodes enter and leave the
network with minimal disruption
In Chord hash function is Secure Hash SHA-1
A key is stored at its successor: node with next higher ID
Key Location
• Finger tables allow faster location by providing additional routing information than simply successor node
Notation Definitionfinger[k].start (n+2k-1)mod 2m, 1=<k=<m.interval [finger[k].start,finger[k+1].start).node first node>=n.finger[k].startsuccessor the next node on the identifier circle;
finger[1].nodepredecessor the previous node on the identifier circle
k is the finger table index
Lookup(id)
Finger tables and key locations with nodes 0,1,3 and keys 1,2 and 6
Finger table for Node 1
Lookup PseudoCode
To find the successor of an id :
Chord returns the successor of the closest preceding finger to this id.
Finding successor of identifier 1
Lookup cost
• The finger pointers at repeatedly doubling distances around the circle cause each iteration of the loop in find_predecessorto halve the distance to the target identifier.In an N node Network the number of messages is of
O(Log N)
Node Join/Leave
Finger Tables and key locations after Node 6 joins After Node 3 leaves
Changed values are in black, unchanged in gray
Join PseudoCode
Three steps:
1- Initialize finger and predecessor of new node n
2- Update finger and predecessor of existing nodes to reflect the addition of n
n becomes ith finger of node p if:
• p precedes n by at least 2i-1
• ith finger of node p succeeds n
3- Transfer state associated with keys that node n is now responsible for
New node n only needs to contact node that immediately forwards it to transfer responsibility for all relevant keys
Join/leave cost
Number of nodes that need to be updated when a node joins is
O(Log N)
Finding and updating those nodes takes
O(Log2 N)
Stabilization
• If nodes join and stabilization not completed 3 cases are possible– finger tables are current lookup successful– successors valid, fingers not lookup successful
(because find_successor succeeds) but slower – successors are invalid or data hasn’t migrated
lookup fails
Stabilization cont’d
n
np
Node n joins
n acquires ns as successor
np runs stabilize:
• asks ns for its predecessor (n)
• np acquires n as its successor
• np notifies n which acquires npas predecessor
ns
Predecessors and successors are correct
Failures and replication
• Key step in failure recovery is correct successor pointers
• Each node maintains a successor-list of rnearest successors
• Knowing r allows Chord to inform the higher layer software when successors come and go when it should propagate new replicas
SymphonySymphonyDistributed Hashing in a Small WorldDistributed Hashing in a Small World
Gurmeet Singh MankuStanford University
with Mayank Bawa and Prabhakar Raghavan
DHTs: The Big PictureLoad BalanceLoad Balance
““How do we splice the hash table evenly?How do we splice the hash table evenly?””Nodes choose their ID in the hash spaceuniformly at random”.
Caching, Hotspots, Fault Tolerance, Caching, Hotspots, Fault Tolerance, Replication, ...Replication, ...
x x ------ xx
Topology EstablishmentTopology Establishment
““How do we route with small state per node?How do we route with small state per node?””
Deterministic Randomized(CAN/Chord) (Pastry/ Tapestry) (Symphony)
Spectrum of DHT Protocols
Protocol #links latency
CAN O(log n) O(log n)Chord O(log n) O(log n)
Viceroy O(1) O(log n)Tapestry O(log n) O(log n)Pastry O(log n) O(log n)
DeterministicTopology
PartlyRandomizedTopology
CompletelyRandomizedTopology
Symphony 2k+2 O((log2 n)/k)
Symphony in a NutshellNodes arranged in a unit circle (perimeter = 1)
Arrival --> Node chooses position along circleuniformly at random
Each node has 1 short link (next node on circle)and k long links
node long link short link
A typical Symphony network
?
Fault Tolerance:No backups for long links! Only short linksare fortified for fault tolerance.
Adaptation of Small World Idea: [Kleinberg00]Long links chosen from a probability distributionfunction: p(x) = 1 / x log n where n = #nodes.
Simple greedy routing:“Forward along that link that minimizes the absolute distance to the destination.”
Average lookup latency = O((log2 n) / k) hops
Network Size Estimation ProtocolProblem: What is the current value of n, the total number of nodes?
x = Length of arc1/x = Estimate of n
(Idea from Viceroy)
- 3 arcs are enough.- Re-linking Protocol not worthwhile.
Intuition Behind Symphony’s PDF
Distance to long distance neighbour
Prob
abili
ty D
istr
ibut
ion
0 ¼ ½ 1
Chord
Distance to long distance neighbour
Prob
abili
ty D
istr
ibut
ion
0 ¼ ½ 1
Distance to long distance neighbour
Prob
abili
ty D
istr
ibut
ion
0 ¼ ½ 1
0 ¼ ½ 1
Prob
abili
ty
Dis
trib
utio
n
Symphony
Distance to long distance neighbour
Step 0: Symphony
0 ¼ ½ 1
p(x) = 1 / (x log n)
Symphony:“Draw from the PDF log n times”
Prob
abili
ty D
istr
ibut
ion
Distance to long distance neighbour
Step 1: Step-Symphony
0 ¼ ½ 1
p(x) = 1 / x log n
Step-Symphony:“Draw from the discretized PDF log n times”
Prob
abili
ty D
istr
ibut
ion
Distance to long distance neighbour
Step 2: Divide PDF into log n Equal Bins
Prob
abili
ty D
istr
ibut
ion
Step-Partitioned-Symphony:“Draw exactly once from each of log n bins”
0 ¼ ½ 1Distance to long distance neighbour
Step 3: Discrete PDFPr
obab
ility
Dis
trib
utio
n
Chord:“Draw exactly once from each of log n bins”Each bin is essentially a point.
0 ¼ ½ 1Distance to long distance neighbour
Two Optimizations
Bi-directional Routing
- Exploit both outgoing and incoming links!- Route to the neighbor that
minimizes absolute distance to destination- Reduces avg latency by 25-30%
1-Lookahead
- List of neighbor’s neighbors- Reduces avg latency by 40%
Latency vs State MaintenanceA
vera
ge L
aten
cy5
10
15
0 10 20 30 40 50 60
Viceroyx x CAN
Pastryx x Chord
TapestryX Pastry
Network size: n=2Network size: n=21515 nodesnodes
Symphony x x x x
xx x x
x
x
+ Bidirectional Links+ 1-Lookahead
# TCP ConnectionsMany more graphs in the paper.
Why Symphony?1. Low state maintenance
Low degree --> Fewer pings/keep-alives, less control trafficLow degree --> Distributed locking and coordination overhead
over smaller sets of nodesLow degree --> Smaller bootstrapping time when a node joins
Smaller recovery time when a node leaves
2. Fault toleranceOnly short links are bolstered. No backups for long links !
3. Smooth out-degree vs latency tradeoff Only protocol that offers this tuning knob even at run time!Out-degree is not fixed at runtime, or as a function of network size.
4. Flexibility and support for heterogeneityDifferent nodes can have different #links !