Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications
吳俊興, Department of Computer Science and Information Engineering, National University of Kaohsiung
Spring 2006
EEF582 – Internet Applications and Services
Reference: the "Peer-to-Peer Computing Networks and Their Applications" course
Chord
Chord provides support for just one operation: given a key, it maps the key onto a node.
Applications can easily be implemented on top of Chord, e.g., the Cooperative File System (CFS) and DNS.
Chord-based distributed storage system: CFS
[Figure: the File System layer sits on top of a Block Store layer, which sits on top of Chord; the Block Store/Chord stack is replicated on each peer.]
The Block Store Layer
A CFS file-structure example:
The root block is identified by a public key and signed by the corresponding private key.
Other blocks are identified by cryptographic hashes of their contents.
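The content-hash naming of non-root blocks can be sketched in a few lines. This is an illustrative sketch, not CFS code: `block_id` is a hypothetical helper, and the root block's public-key signing is omitted.

```python
import hashlib

def block_id(content: bytes) -> str:
    """Content-addressed ID: the SHA-1 digest of the block's bytes.
    Changing the content changes the ID, so fetched blocks are
    self-verifying."""
    return hashlib.sha1(content).hexdigest()

data = b"some file block"
bid = block_id(data)
# A reader verifies integrity by re-hashing the fetched block:
assert block_id(data) == bid
```

Because the ID is derived from the content, any node can check a block it receives without trusting the sender.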
Chord properties
Efficient: O(log N) messages per lookup, where N is the total number of servers.
Scalable: O(log N) state per node.
Robust: survives massive changes in membership.
Hashing
Hashing is generally used to distribute objects evenly over a set of servers, e.g., the linear congruential function h(x) = ax + b (mod p), or SHA-1.
When the number of servers changes (p in the above case), almost every item is hashed to a new location.
Cached objects in each server become useless when a server is removed from or introduced to the system.
[Figure: objects 001, 012, 103, 303, 637, 044 hashed into buckets 0–4 with mod 5; after two new buckets are added (now mod 7), nearly every object lands in a different bucket.]
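The relocation problem in the figure can be checked directly. A minimal sketch, using the slide's object keys as integers and plain modular hashing:

```python
# With plain modular hashing h(x) = x mod p, growing the bucket
# count from 5 to 7 relocates almost every object.
objects = [1, 12, 103, 303, 637, 44]   # keys from the slide's example

before = {x: x % 5 for x in objects}   # bucket under mod 5
after  = {x: x % 7 for x in objects}   # bucket under mod 7

moved = [x for x in objects if before[x] != after[x]]
print(f"{len(moved)} of {len(objects)} objects moved")  # 5 of 6 objects moved
```

Only key 1 happens to stay put; the other five all change bucket, which is exactly why plain hashing invalidates nearly all cached objects on a membership change.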
Consistent Hashing
Load is balanced.
Relocation is minimal: when the N-th server joins or leaves the system, with high probability only an O(1/N) fraction of the data objects need to be relocated.
A possible implementation
[Figure: an interval 0–3 with objects 001, 012, 103, 303, 637, 044 and servers S0–S3 mapped onto it.]
Objects and servers are first mapped (hashed) to points in the same interval.
Then each object is placed on the server closest to it w.r.t. the mapped points in the interval, e.g., D001→S0, D012→S1, D303→S3.
When server 4 joins
[Figure: the same interval with a new server S4 added near D103.]
Only D103 needs to be moved, from S3 to S4. The rest remains unchanged.
When server 3 leaves
[Figure: the same interval with server S3 removed.]
Only D303 and D044 need to be moved from S3 to S4.
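The join/leave behavior above can be simulated on a small ring. This is an illustrative sketch: the server and key points are hypothetical, and it uses the clockwise-successor placement rule (as Chord does) rather than the slide's "closest point" rule.

```python
import bisect

RING = 64  # small identifier space for illustration (2**6)

def successor(points, k):
    """First server point clockwise from k on the ring (with wraparound)."""
    pts = sorted(points)
    i = bisect.bisect_left(pts, k % RING)
    return pts[i % len(pts)]

servers = [8, 21, 42, 56]          # hypothetical server points
keys    = [10, 24, 30, 38, 54]     # hypothetical object points

before = {k: successor(servers, k) for k in keys}
after  = {k: successor(servers + [32], k) for k in keys}  # server 32 joins

moved = [k for k in keys if before[k] != after[k]]
# Only keys 24 and 30, in the arc (21, 32], change owner; the rest stay put.
```

This is the consistent-hashing property in action: a join only takes over the arc between the new point and its predecessor, so an O(1/N) fraction of objects moves.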
Consistent Hashing in Chord
Node's ID = SHA-1(IP address); key's ID = SHA-1(object's key/name).
Chord views the IDs as uniformly distributed, occupying a circular identifier space.
Keys are placed at the node whose ID is closest to the key's ID in the clockwise direction: successor(k) is the first node clockwise from k, and object k is placed at successor(k).
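The ID derivation can be sketched with the standard library. A minimal sketch: real Chord keeps all 160 bits of the SHA-1 digest, while here the digest is reduced to a small M-bit ring so IDs match the 2^6 examples on the following slides; the IP address and object name are hypothetical.

```python
import hashlib

M = 6  # bits in the identifier space (a 2**6-slot ring, as in the slides)

def chord_id(name: str) -> int:
    """Map a name (an IP address or an object key) onto the 2**M ring
    via SHA-1. Truncating to M bits is only for readability here."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

node_id = chord_id("10.0.0.1")    # hypothetical node IP
key_id  = chord_id("report.pdf")  # hypothetical object name
```

Because SHA-1 output is effectively uniform, both node and key IDs spread evenly around the circle, which is what balances load.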
An ID ring of size 2^6 (IDs 0 to 2^6 − 1)
[Figure: circular ID space with nodes N1, N8, N14, N21, N32, N38, N42, N48, N51, N56 and keys k10, k24, k30, k38, k54.]
Simple Lookup
Lookup is correct if the successor pointers are correct.
A lookup takes n/2 message exchanges on average.
[Figure: lookup(k54) issued at N8 travels clockwise around the ring, node by node, until it reaches N56.]
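Simple lookup can be sketched as a walk along successor pointers. An illustrative sketch using the node IDs from the figure; `simple_lookup` and its helpers are hypothetical names, not Chord's API:

```python
def simple_lookup(ring, start, key_id):
    """Walk successor pointers clockwise until the key falls between a
    node and its successor. O(N) hops worst case, N/2 on average."""
    nodes = sorted(ring)

    def succ(n):                  # immediate successor on the ring
        return nodes[(nodes.index(n) + 1) % len(nodes)]

    def owned(x, a, b):           # x in (a, b], going clockwise
        return (a < x <= b) if a < b else (x > a or x <= b)

    hops, n = 0, start
    while not owned(key_id, n, succ(n)):
        n = succ(n)               # forward the query one node clockwise
        hops += 1
    return succ(n), hops

ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # node IDs from the slide
owner, hops = simple_lookup(ring, 8, 54)
# k54 is owned by N56; starting at N8, the query crosses most of the ring.
```

The owner is correct but the hop count grows linearly with the ring size, which motivates the finger table on the next slides.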
Scalable Lookup
The i-th entry in the finger table of node n points to successor(n + 2^(i−1) (mod 2^6)).
Finger table at N8:
N8+1  → N14
N8+2  → N14
N8+4  → N14
N8+8  → N21
N8+16 → N32
N8+32 → N42
[Figure: the ring with N8's fingers drawn at distances +1, +2, +4, +8, +16, +32.]
Scalable Lookup
Look in the local finger table for the largest n such that my_id < n < key_id.
If such an n exists, call n.lookup(key_id); else return successor(my_id).
[Figure: lookup(k54) at N8 consults N8's finger table (N8+1→N14, N8+2→N14, N8+4→N14, N8+8→N21, N8+16→N32, N8+32→N42) and forwards the query to N42.]
Scalable Lookup
[Figure: the query for k54 arrives at N42, whose finger table (N42+1→N48, N42+2→N48, N42+4→N48, N42+8→N51, N42+16→N1, N42+32→N14) forwards it to N51.]
Scalable Lookup
Each node can forward a query at least halfway along the remaining distance between itself and the target identifier.
A lookup therefore takes O(log N) steps.
[Figure: the query for k54 arrives at N51, whose finger table (N51+1→N56, N51+2→N56, N51+4→N56, N51+8→N1, N51+16→N8, N51+32→N21) shows that its successor N56 owns k54.]
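The three-slide walkthrough (N8 → N42 → N51 → N56) can be reproduced with a small simulation. An illustrative sketch, not Chord's actual code: `build_fingers`, `between`, and `lookup` are hypothetical names, and all nodes share one global view of the ring for simplicity.

```python
def build_fingers(nodes, n, m=6):
    """Finger i of node n points to successor(n + 2**i mod 2**m)."""
    space = 2 ** m

    def succ(x):                      # first node clockwise from point x
        for nd in sorted(nodes):
            if nd >= x % space:
                return nd
        return min(nodes)             # wrap around the ring

    return [succ((n + 2 ** i) % space) for i in range(m)]

def between(x, a, b):                 # x in (a, b), going clockwise
    return (a < x < b) if a < b else (x > a or x < b)

def lookup(nodes, start, key_id, m=6):
    """Forward the query to the farthest finger preceding the key; each
    hop at least halves the remaining distance, so O(log N) hops."""
    path, n = [start], start
    while True:
        fingers = build_fingers(nodes, n, m)
        succ_n = fingers[0]           # n's immediate successor
        if between(key_id, n, succ_n) or key_id == succ_n:
            return succ_n, path       # the successor owns the key
        nxt = n
        for f in reversed(fingers):   # farthest finger first
            if between(f, n, key_id):
                nxt = f
                break
        if nxt == n:                  # no closer finger: hand to successor
            return succ_n, path
        n = nxt
        path.append(n)

ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
owner, path = lookup(ring, 8, 54)
# Mirrors the slides: N8 -> N42 -> N51, then N51's successor N56 owns k54.
```

Compare with the simple lookup: the same query that took seven successor hops now takes three finger hops, and the gap widens logarithmically as the ring grows.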
Node joins
When a node i joins the system via any existing node j:
Node j finds successor(i) for i, say k.
i sets its successor to k and informs k to set its predecessor to i.
k's old predecessor learns of the existence of i by periodically running a stabilization algorithm that checks whether it is still k's predecessor.
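The join-then-stabilize sequence can be sketched with plain objects standing in for remote nodes. An illustrative sketch under simplifying assumptions: the new node is handed its successor directly (real Chord finds it with a lookup), and stabilization rounds run in-process instead of periodically over RPC.

```python
def between(x, a, b):
    """x in (a, b), going clockwise on the ring."""
    return (a < x < b) if a < b else (x > a or x < b)

class Node:
    def __init__(self, nid):
        self.id = nid
        self.successor = self          # a one-node ring points at itself
        self.predecessor = None

    def join(self, successor):
        # Real Chord finds this via find_successor(); passed in directly here.
        self.successor = successor

    def stabilize(self):
        # Ask our successor for its predecessor; if that node sits between
        # us and the successor, it has become our new successor.
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        # Adopt the caller as predecessor if it is closer than the old one.
        if self.predecessor is None or between(
                candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

# Two-node ring N21 <-> N32; N25 joins between them.
n21, n32, n25 = Node(21), Node(32), Node(25)
n21.successor, n32.successor = n32, n21
n21.predecessor, n32.predecessor = n32, n21
n25.join(n32)
for node in (n25, n21, n32, n21):      # a few stabilization rounds
    node.stabilize()
# Afterwards: n21.successor is n25, and n25.successor is n32.
```

Note that no node had to broadcast the join: the pointers converge lazily as each node runs stabilize(), which is exactly why joins stay cheap.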
Node joins (cont.)
[Figure: N25 joins via N8 on the circular ID space; N8's finger table (N8+1→N14, N8+2→N14, N8+4→N14, N8+8→N21, N8+16→N32, N8+32→N42) is used to find N25's place in the ring, near keys k24 and k30.]
Node Fails
Node failure can be handled simply as the inverse of a node join, i.e., by running the stabilization algorithm.
Handling Failures
Use a successor list: each node knows its r immediate successors.
After a failure, a node will know its first live successor; correct successors guarantee correct lookups.
The guarantee holds with some probability; r can be chosen to make the probability of lookup failure arbitrarily small.
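The effect of r is easy to quantify under a simple model. An illustrative calculation, assuming each successor fails independently with probability p before the list is repaired (the failure probability is a made-up example value):

```python
# A lookup can get stuck only if all r successors in the list are dead.
# Under independent failures with per-node probability p, that happens
# with probability p**r, which shrinks geometrically in r.
p = 0.5  # assumed per-node failure probability (illustrative)
for r in (1, 4, 8, 16):
    print(f"r={r:2d}: all successors dead with probability {p**r:.6f}")
```

Even with half the nodes failing, r = 16 already pushes the chance of losing every successor below 0.002%, which is why r = O(log N) successors suffice in practice.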
Weakness
Not that simple (compared to CAN):
Member joining is complicated: aggressive mechanisms require too many messages and updates, and there is no analysis of convergence for the lazy finger mechanism.
The key-management mechanism is mixed between layers: the upper layer does insertion and handles node failures, while Chord transfers keys when a node joins (there is no leave mechanism!).
The routing table grows with the number of members in the group.
Worst-case lookup can be slow.
Chord Summary
Advantages:
Files are guaranteed to be found in O(log N) steps.
Routing table size is O(log N).
Robust: handles large numbers of concurrent joins and leaves.
Disadvantages:
Performance: routing in the overlay network can be more expensive than in the underlying network. There is no correlation between node IDs and their locality; a query can repeatedly jump from Taiwan to America even though both the initiator and the node that stores the item are in Taiwan!
Partial solution: weight neighbor nodes by round-trip time (RTT). When routing, choose the neighbor closer to the destination with the lowest RTT from me; this reduces path latency.