Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications
吳俊興, Department of Computer Science and Information Engineering, National University of Kaohsiung
Spring 2006
EEF582 – Internet Applications and Services
Reference: the "Peer-to-Peer Computing Networks and Their Applications" course
Chord
Chord provides support for just one operation: given a key, it maps the key onto a node.
Applications can easily be implemented on top of Chord, e.g., the Cooperative File System (CFS) and DNS.
Chord-based distributed storage system: CFS
[Figure: the File System layer sits on top of a Block Store layer, which sits on top of Chord; the Block Store/Chord stack is replicated on each peer.]
The Block Store Layer
A CFS file-structure example:
The root block is identified by a public key and signed by the corresponding private key.
Other blocks are identified by cryptographic hashes of their contents.
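The content-hash naming of non-root blocks can be sketched in a few lines. This is an illustrative sketch, not CFS code: `block_id` is a hypothetical helper, and the root block's public-key signing is omitted.

```python
import hashlib

def block_id(content: bytes) -> str:
    """Content-addressed ID: the SHA-1 digest of the block's bytes.
    Changing the content changes the ID, so fetched blocks are
    self-verifying."""
    return hashlib.sha1(content).hexdigest()

data = b"some file block"
bid = block_id(data)
# A reader verifies integrity by re-hashing the fetched block:
assert block_id(data) == bid
```

Because the ID is derived from the content, any node can check a block it receives without trusting the sender.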
Chord properties
Efficient: O(log N) messages per lookup, where N is the total number of servers.
Scalable: O(log N) state per node.
Robust: survives massive changes in membership.
Hashing
Hashing is generally used to distribute objects evenly over a set of servers, e.g., the linear congruential function h(x) = ax + b (mod p), or SHA-1.
When the number of servers changes (p in the above case), almost every item is hashed to a new location.
Cached objects in each server become useless when a server is removed from or introduced to the system.
[Figure: objects 001, 012, 103, 303, 637, 044 hashed into buckets 0–4 with mod 5; after two new buckets are added (now mod 7), nearly every object lands in a different bucket.]
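The relocation problem in the figure can be checked directly. A minimal sketch, using the slide's object keys as integers and plain modular hashing:

```python
# With plain modular hashing h(x) = x mod p, growing the bucket
# count from 5 to 7 relocates almost every object.
objects = [1, 12, 103, 303, 637, 44]   # keys from the slide's example

before = {x: x % 5 for x in objects}   # bucket under mod 5
after  = {x: x % 7 for x in objects}   # bucket under mod 7

moved = [x for x in objects if before[x] != after[x]]
print(f"{len(moved)} of {len(objects)} objects moved")  # 5 of 6 objects moved
```

Only key 1 happens to stay put; the other five all change bucket, which is exactly why plain hashing invalidates nearly all cached objects on a membership change.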
Consistent Hashing
Load is balanced.
Relocation is minimal: when the N-th server joins or leaves the system, with high probability only an O(1/N) fraction of the data objects need to be relocated.
A possible implementation
[Figure: an interval 0–3 with objects 001, 012, 103, 303, 637, 044 and servers S0–S3 mapped onto it.]
Objects and servers are first mapped (hashed) to points in the same interval.
Then each object is placed on the server closest to it w.r.t. the mapped points in the interval, e.g., D001→S0, D012→S1, D303→S3.
When server 4 joins
[Figure: the same interval with a new server S4 added near D103.]
Only D103 needs to be moved, from S3 to S4. The rest remains unchanged.
When server 3 leaves
[Figure: the same interval with server S3 removed.]
Only D303 and D044 need to be moved from S3 to S4.
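The join/leave behavior above can be simulated on a small ring. This is an illustrative sketch: the server and key points are hypothetical, and it uses the clockwise-successor placement rule (as Chord does) rather than the slide's "closest point" rule.

```python
import bisect

RING = 64  # small identifier space for illustration (2**6)

def successor(points, k):
    """First server point clockwise from k on the ring (with wraparound)."""
    pts = sorted(points)
    i = bisect.bisect_left(pts, k % RING)
    return pts[i % len(pts)]

servers = [8, 21, 42, 56]          # hypothetical server points
keys    = [10, 24, 30, 38, 54]     # hypothetical object points

before = {k: successor(servers, k) for k in keys}
after  = {k: successor(servers + [32], k) for k in keys}  # server 32 joins

moved = [k for k in keys if before[k] != after[k]]
# Only keys 24 and 30, in the arc (21, 32], change owner; the rest stay put.
```

This is the consistent-hashing property in action: a join only takes over the arc between the new point and its predecessor, so an O(1/N) fraction of objects moves.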
Consistent Hashing in Chord
Node's ID = SHA-1(IP address); key's ID = SHA-1(object's key/name).
Chord views the IDs as uniformly distributed, occupying a circular identifier space.
Keys are placed at the node whose ID is closest to the key's ID in the clockwise direction: successor(k) is the first node clockwise from k, and object k is placed at successor(k).
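The ID derivation can be sketched with the standard library. A minimal sketch: real Chord keeps all 160 bits of the SHA-1 digest, while here the digest is reduced to a small M-bit ring so IDs match the 2^6 examples on the following slides; the IP address and object name are hypothetical.

```python
import hashlib

M = 6  # bits in the identifier space (a 2**6-slot ring, as in the slides)

def chord_id(name: str) -> int:
    """Map a name (an IP address or an object key) onto the 2**M ring
    via SHA-1. Truncating to M bits is only for readability here."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

node_id = chord_id("10.0.0.1")    # hypothetical node IP
key_id  = chord_id("report.pdf")  # hypothetical object name
```

Because SHA-1 output is effectively uniform, both node and key IDs spread evenly around the circle, which is what balances load.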
An ID ring of size 2^6 (IDs 0 to 2^6 − 1)
[Figure: circular ID space with nodes N1, N8, N14, N21, N32, N38, N42, N48, N51, N56 and keys k10, k24, k30, k38, k54.]
Simple Lookup
Lookup is correct if the successor pointers are correct.
A lookup takes n/2 message exchanges on average.
[Figure: lookup(k54) issued at N8 travels clockwise around the ring, node by node, until it reaches N56.]
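Simple lookup can be sketched as a walk along successor pointers. An illustrative sketch using the node IDs from the figure; `simple_lookup` and its helpers are hypothetical names, not Chord's API:

```python
def simple_lookup(ring, start, key_id):
    """Walk successor pointers clockwise until the key falls between a
    node and its successor. O(N) hops worst case, N/2 on average."""
    nodes = sorted(ring)

    def succ(n):                  # immediate successor on the ring
        return nodes[(nodes.index(n) + 1) % len(nodes)]

    def owned(x, a, b):           # x in (a, b], going clockwise
        return (a < x <= b) if a < b else (x > a or x <= b)

    hops, n = 0, start
    while not owned(key_id, n, succ(n)):
        n = succ(n)               # forward the query one node clockwise
        hops += 1
    return succ(n), hops

ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # node IDs from the slide
owner, hops = simple_lookup(ring, 8, 54)
# k54 is owned by N56; starting at N8, the query crosses most of the ring.
```

The owner is correct but the hop count grows linearly with the ring size, which motivates the finger table on the next slides.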
Scalable Lookup
The i-th entry in the finger table of node n points to successor(n + 2^(i−1) (mod 2^6)).
Finger table at N8:
N8+1  → N14
N8+2  → N14
N8+4  → N14
N8+8  → N21
N8+16 → N32
N8+32 → N42
[Figure: the ring with N8's fingers drawn at distances +1, +2, +4, +8, +16, +32.]
Scalable Lookup
Look in the local finger table for the largest n such that my_id < n < key_id.
If such an n exists, call n.lookup(key_id); else return successor(my_id).
[Figure: lookup(k54) at N8 consults N8's finger table (N8+1→N14, N8+2→N14, N8+4→N14, N8+8→N21, N8+16→N32, N8+32→N42) and forwards the query to N42.]
Scalable Lookup
[Figure: the query for k54 arrives at N42, whose finger table (N42+1→N48, N42+2→N48, N42+4→N48, N42+8→N51, N42+16→N1, N42+32→N14) forwards it to N51.]
Scalable Lookup
Each node can forward a query at least halfway along the remaining distance between itself and the target identifier.
A lookup therefore takes O(log N) steps.
[Figure: the query for k54 arrives at N51, whose finger table (N51+1→N56, N51+2→N56, N51+4→N56, N51+8→N1, N51+16→N8, N51+32→N21) shows that its successor N56 owns k54.]
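The three-slide walkthrough (N8 → N42 → N51 → N56) can be reproduced with a small simulation. An illustrative sketch, not Chord's actual code: `build_fingers`, `between`, and `lookup` are hypothetical names, and all nodes share one global view of the ring for simplicity.

```python
def build_fingers(nodes, n, m=6):
    """Finger i of node n points to successor(n + 2**i mod 2**m)."""
    space = 2 ** m

    def succ(x):                      # first node clockwise from point x
        for nd in sorted(nodes):
            if nd >= x % space:
                return nd
        return min(nodes)             # wrap around the ring

    return [succ((n + 2 ** i) % space) for i in range(m)]

def between(x, a, b):                 # x in (a, b), going clockwise
    return (a < x < b) if a < b else (x > a or x < b)

def lookup(nodes, start, key_id, m=6):
    """Forward the query to the farthest finger preceding the key; each
    hop at least halves the remaining distance, so O(log N) hops."""
    path, n = [start], start
    while True:
        fingers = build_fingers(nodes, n, m)
        succ_n = fingers[0]           # n's immediate successor
        if between(key_id, n, succ_n) or key_id == succ_n:
            return succ_n, path       # the successor owns the key
        nxt = n
        for f in reversed(fingers):   # farthest finger first
            if between(f, n, key_id):
                nxt = f
                break
        if nxt == n:                  # no closer finger: hand to successor
            return succ_n, path
        n = nxt
        path.append(n)

ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
owner, path = lookup(ring, 8, 54)
# Mirrors the slides: N8 -> N42 -> N51, then N51's successor N56 owns k54.
```

Compare with the simple lookup: the same query that took seven successor hops now takes three finger hops, and the gap widens logarithmically as the ring grows.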
Node joins
When a node i joins the system via any existing node j:
Node j finds successor(i) for i, say k.
i sets its successor to k and informs k to set its predecessor to i.
k's old predecessor learns of the existence of i by periodically running a stabilization algorithm that checks whether it is still k's predecessor.
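The join-then-stabilize sequence can be sketched with plain objects standing in for remote nodes. An illustrative sketch under simplifying assumptions: the new node is handed its successor directly (real Chord finds it with a lookup), and stabilization rounds run in-process instead of periodically over RPC.

```python
def between(x, a, b):
    """x in (a, b), going clockwise on the ring."""
    return (a < x < b) if a < b else (x > a or x < b)

class Node:
    def __init__(self, nid):
        self.id = nid
        self.successor = self          # a one-node ring points at itself
        self.predecessor = None

    def join(self, successor):
        # Real Chord finds this via find_successor(); passed in directly here.
        self.successor = successor

    def stabilize(self):
        # Ask our successor for its predecessor; if that node sits between
        # us and the successor, it has become our new successor.
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        # Adopt the caller as predecessor if it is closer than the old one.
        if self.predecessor is None or between(
                candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

# Two-node ring N21 <-> N32; N25 joins between them.
n21, n32, n25 = Node(21), Node(32), Node(25)
n21.successor, n32.successor = n32, n21
n21.predecessor, n32.predecessor = n32, n21
n25.join(n32)
for node in (n25, n21, n32, n21):      # a few stabilization rounds
    node.stabilize()
# Afterwards: n21.successor is n25, and n25.successor is n32.
```

Note that no node had to broadcast the join: the pointers converge lazily as each node runs stabilize(), which is exactly why joins stay cheap.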
Node joins (cont.)
[Figure: N25 joins via N8 on the circular ID space; N8's finger table (N8+1→N14, N8+2→N14, N8+4→N14, N8+8→N21, N8+16→N32, N8+32→N42) is used to find N25's place in the ring, near keys k24 and k30.]
Node Fails
Node failure can be handled simply as the inverse of a node join, i.e., by running the stabilization algorithm.
Handling Failures
Use a successor list: each node knows its r immediate successors.
After a failure, a node will know its first live successor; correct successors guarantee correct lookups.
The guarantee holds with some probability; r can be chosen to make the probability of lookup failure arbitrarily small.
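The effect of r is easy to quantify under a simple model. An illustrative calculation, assuming each successor fails independently with probability p before the list is repaired (the failure probability is a made-up example value):

```python
# A lookup can get stuck only if all r successors in the list are dead.
# Under independent failures with per-node probability p, that happens
# with probability p**r, which shrinks geometrically in r.
p = 0.5  # assumed per-node failure probability (illustrative)
for r in (1, 4, 8, 16):
    print(f"r={r:2d}: all successors dead with probability {p**r:.6f}")
```

Even with half the nodes failing, r = 16 already pushes the chance of losing every successor below 0.002%, which is why r = O(log N) successors suffice in practice.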
Weakness
Not that simple (compared to CAN):
Member joining is complicated: aggressive mechanisms require too many messages and updates, and there is no analysis of convergence for the lazy finger mechanism.
The key-management mechanism is mixed between layers: the upper layer does insertion and handles node failures, while Chord transfers keys when a node joins (there is no leave mechanism!).
The routing table grows with the number of members in the group.
Worst-case lookup can be slow.
Chord Summary
Advantages:
Files are guaranteed to be found in O(log N) steps.
Routing table size is O(log N).
Robust: handles large numbers of concurrent joins and leaves.
Disadvantages:
Performance: routing in the overlay network can be more expensive than in the underlying network. There is no correlation between node IDs and their locality; a query can repeatedly jump from Taiwan to America even though both the initiator and the node that stores the item are in Taiwan!
Partial solution: weight neighbor nodes by round-trip time (RTT). When routing, choose the neighbor closer to the destination with the lowest RTT from me; this reduces path latency.