Structured P2P Networks


Guo Shuqiao, Yao Zhen, Rakesh Kumar Gupta

CS6203 Advanced Topics in Database Systems

Introduction: P2P Network

A peer-to-peer (P2P) network is a distributed system in which peers employ distributed resources to perform a critical function in a decentralized fashion [LW2004].

Classification of P2P networks
  Unstructured and structured
  Centralized and decentralized
  Hierarchical and non-hierarchical

Structured P2P network

Distributed hash table (DHT)
  A DHT is a structured overlay that offers extreme scalability and a hash-table-like lookup interface
  Examples: CAN, Chord, Pastry

Other techniques
  Skip list: Skip graph, SkipNet

Outline

Hash-based techniques in P2P
  Hash-based structured P2P systems: Pastry, P-Grid
  Two important issues: load balancing, neighbor table consistency preserving
  Comparison of DHT techniques
Skip-list based system: SkipNet
Conclusion

Pastry [RD2001]

Pastry is a hash-based P2P object location and routing scheme.

Properties
  Completely decentralized
  Scalable
  Self-organized
  Fault-resilient
  Efficient search

Design of Pastry

nodeID: each node has a unique 128-bit numeric identifier
  Assigned randomly
  Nodes with adjacent nodeIDs are diverse in geography, ownership, etc.
  Assumption: nodeIDs are uniformly distributed in the ID space
  Presented as a sequence of digits with base 2^b
  b is a configuration parameter (typically 4)

Design of Pastry (cont'd)

A message or query has a numeric key of the same length as the nodeIDs
  The key is presented as a sequence of digits with base 2^b

Route: a message is routed to the node whose nodeID is numerically closest to the key

[Figure: destination of routing. Among nodes 03, 12, 20, 23 and 31, a message with key 10 is delivered to node 12, whose nodeID is numerically closest to the key.]

Pastry Scheme

Given a message of key k, a node A forwards the message to a node whose ID is numerically closest to k among all nodes known to A

Each node maintains some routing state

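Pastry Node State

Each node maintains a leaf set L, a routing table, and a neighborhood set M.

State of the node with nodeID 10233102 (b = 2, l = 8; routing table entries are written prefix-digit-rest):

Leaf set
  SMALLER: 10233033  10233021  10233001  10233000
  LARGER:  10233120  10233122  10233230  10233232

Routing table (row i shares the first i digits with the nodeID; column j is the next digit; "own" marks the node's own digit, "empty" an entry with no known node)
  Row  j=0          j=1          j=2          j=3
  0    -0-2212102   own          -2-2301203   -3-1203203
  1    own          1-1-301233   1-2-230203   1-3-021022
  2    10-0-31203   10-1-32102   own          10-3-23302
  3    102-0-0230   102-1-1302   102-2-2302   own
  4    1023-0-322   1023-1-000   1023-2-120   own
  5    10233-0-01   own          10233-2-32   empty
  6    own          empty        102331-2-0   empty
  7    empty        empty        own          empty

Neighborhood set
  13021022  10200230  11301233  31301233
  02212102  22301203  31203203  33213321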

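Meanings of 'Close'

Closest according to the proximity metric (real distance): the nearest neighbor
Closest according to numerical value: the node with the closest nodeID

[Figure: nodes 03, 12, 20, 23 and 31, contrasting the nearest neighbor with the node that has the numerically closest nodeID.]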

Pastry Node State

A leaf set L
  The |L| nodes with the closest nodeIDs: |L|/2 larger ones and |L|/2 smaller ones
  Useful in message routing
A neighborhood set M
  The |M| nearest neighbors
  Useful in maintaining locality properties

Leaf Set and Neighborhood Set

In this example b = 2 and l = 8, so
  |L| = 2 × 2^b = 8
  |M| = 2 × 2^b = 8

Routing Table

l rows and 2^b columns
  Row i: entries share an i-digit prefix with the local nodeID
  Column j: the next digit after the prefix is j
With b = 2 and l = 8: 8 rows and 4 columns

Example: the 2nd row of the routing table of node A (nodeID 10233102), with the 2-digit prefix 10
  j = 0: 10-0-31203
  j = 1: 10-1-32102
  j = 3: 10-3-23302

Routing

Step 1: If k falls within the range of nodeIDs covered by A's leaf set, forward the message to the node in the leaf set whose nodeID is closest to k.
  E.g. k = 10233022 falls in the range (10233000, 10233232), so it is forwarded to node 10233021.
  If k is not covered by the leaf set, go to step 2.

Routing

Step 2: The routing table is used, and the message is forwarded to a node whose nodeID shares a longer prefix with k than A's nodeID does.
  E.g. k = 10223220 is forwarded to node 10222302 (routing table entry 102-2-2302).
  If the appropriate entry in the routing table is empty, go to step 3.

Routing

Step 3: The message is forwarded to a node in the leaf set whose nodeID shares the same prefix with k as A's nodeID does, but is numerically closer to k than A.
  E.g. k = 10233320 is forwarded to node 10233232.
  If such a node does not exist, A is the destination node.

Routing

The routing procedure always converges, since each step chooses a node that either
  shares a longer prefix with the key, or
  shares a prefix of the same length but is numerically closer to the key

Routing performance
  The expected number of routing steps is log_{2^b} N
  Assumption: accurate routing tables and no recent node failures
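The three routing steps can be sketched as a single next-hop function. This is a minimal illustration, not Pastry's reference implementation: the function and variable names (`next_hop`, `build_table`, `LEAF`, `TABLE`) are hypothetical, IDs are handled as strings of base-4 digits (b = 2), and the node state is the example state of node 10233102 from the slides.

```python
from typing import Dict, List, Tuple

BASE = 4  # digits are base 2^b with b = 2, as in the slides' example


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def dist(a: str, b: str) -> int:
    """Numerical distance between two IDs written in base-4 digits."""
    return abs(int(a, BASE) - int(b, BASE))


def build_table(node_id: str, known: List[str]) -> Dict[Tuple[int, int], str]:
    """Place each known node at (row, column) = (shared prefix length, next digit)."""
    table = {}
    for other in known:
        if other == node_id:
            continue
        row = shared_prefix_len(node_id, other)
        table[(row, int(other[row]))] = other
    return table


def next_hop(key: str, node_id: str, leaf_set: List[str],
             table: Dict[Tuple[int, int], str]) -> str:
    """Return the node the message is forwarded to (node_id itself = deliver here)."""
    if key == node_id:
        return node_id
    ids = [int(n, BASE) for n in leaf_set]
    # Step 1: key within the leaf set range -> numerically closest node.
    if min(ids) <= int(key, BASE) <= max(ids):
        return min(leaf_set + [node_id], key=lambda n: dist(n, key))
    # Step 2: routing table entry that extends the shared prefix by one digit.
    row = shared_prefix_len(key, node_id)
    entry = table.get((row, int(key[row])))
    if entry is not None:
        return entry
    # Step 3: leaf-set node with at least the same prefix, numerically closer.
    closer = [n for n in leaf_set
              if shared_prefix_len(n, key) >= row and dist(n, key) < dist(node_id, key)]
    return min(closer, key=lambda n: dist(n, key)) if closer else node_id


A = "10233102"
LEAF = ["10233033", "10233021", "10233001", "10233000",
        "10233120", "10233122", "10233230", "10233232"]
TABLE = build_table(A, [
    "02212102", "22301203", "31203203", "11301233", "12230203", "13021022",
    "10031203", "10132102", "10323302", "10200230", "10211302", "10222302",
    "10230322", "10231000", "10232120", "10233001", "10233232", "10233120"])

print(next_hop("10233022", A, LEAF, TABLE))  # step 1 example
print(next_hop("10223220", A, LEAF, TABLE))  # step 2 example
print(next_hop("10233320", A, LEAF, TABLE))  # step 3 example
```

The three calls correspond to the examples given for steps 1, 2 and 3; the sketch ignores the neighborhood set, which the slides describe as a locality aid rather than a routing input.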

Performance

[Figure: average number of routing hops versus number of Pastry nodes; b = 4, |L| = 16, |M| = 32, 200,000 lookups.]

Discussion of Pastry

The parameters make Pastry flexible
  b is the most important parameter; it determines the power of the system
  Trade-off between routing efficiency (log_{2^b} N hops) and routing table size (log_{2^b} N × 2^b entries)
  Each node can choose its own |L| and |M| based on its own situation
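To make the trade-off concrete, the two expressions from the slide can be evaluated for a few values of b. This is only an arithmetic sketch; the function name `pastry_costs` and the network size of one million nodes are illustrative assumptions.

```python
import math


def pastry_costs(n_nodes: int, b: int):
    """Expected hops log_{2^b} N and routing table size log_{2^b} N x 2^b, per the slide."""
    hops = math.log(n_nodes, 2 ** b)
    table_entries = hops * 2 ** b
    return hops, table_entries


# Assumed network size, chosen only for illustration.
for b in (1, 2, 4):
    hops, entries = pastry_costs(1_000_000, b)
    print(f"b={b}: about {hops:.1f} hops, about {entries:.0f} routing table entries")
```

A larger b shortens routes but widens every routing table row, which is the trade-off stated above.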

Discussion of Pastry: routing scheme

Is each routing step locally optimal? Example: k = 10233200 at node A (nodeID 10233102), where the LARGER half of A's leaf set is now 10233120, 10233122, 10233132, 10233133 and the rest of A's state is unchanged.
  X's nodeID = 10233232: Dis(k, X's ID) = Dis(10233200, 10233232) = 32
  Y's nodeID = 10233133: Dis(k, Y's ID) = Dis(10233200, 10233133) = 1
  (distances are written in base-4 digits)
  The locally optimal node is Y, but Pastry forwards the message to node X: k lies outside the leaf set range, so the routing table entry 10233-2-32 is used.

P-Grid [Aberer2001]

P-Grid is a scalable access structure for P2P
  Hash-based, organized as a virtual binary search tree
  Randomized algorithms are used for constructing the access structure

[Figure: virtual binary tree with key prefixes 0, 1 and leaves 00, 01, 10, 11, maintained by peers 1 to 6. Each peer stores references of the form prefix:peer, e.g. 1:3 and 01:2. Example query: k = 100.]

P-Grid (cont'd)

Properties
  Completely decentralized
  Scalable with the total number of nodes and data items
  Fault-resilient: search is robust against failures of nodes
  Efficient search

Discussion of Pastry and P-Grid

Both systems make a uniformity assumption
  Pastry: ID space
  P-Grid: data distribution and peer behavior

If the data, message or query distribution is skewed, Pastry and P-Grid are not able to balance the load







Load Balancing

Consider a DHT P2P system with N nodes
  Θ(log N) imbalance factor if item IDs are uniformly distributed [SMKKB2001]
  Even worse if applications associate semantics with the item IDs, since the IDs would no longer be uniformly distributed

How to minimize the load imbalance?
How to minimize the amount of load moved?

Load Balancing

Challenges
  Data items are continuously inserted and deleted
  Nodes join and depart continuously
  The distribution of data item IDs and item sizes can be skewed

Solution: [GLSKS2004]

Load Balancing

Virtual server
  Represents a peer in the DHT, rather than a physical node
  A physical node hosts one or more virtual servers
  Load of a node = total load of the virtual servers it hosts
  E.g., in Chord

[Figure: a Chord ring with identifiers 0 to 7; virtual servers on the ring, with finger tables FT1 and FT3, are hosted by physical nodes.]

Load Balancing

Basic idea: directories
  Store load information of the peer nodes
  Periodically schedule reassignments of virtual servers

The distributed load balancing problem is reduced to a centralized problem at each directory

Load Balancing

Load balancing algorithm

Node
  Randomly chooses a directory (the directory IDs are known to all nodes)
  Sends to the directory: (1) the loads of all virtual servers that it is responsible for, (2) its capacity
  Does so in each new cycle, or when its utilization > Ke (emergency load balancing)

Directory
  Receives information from nodes
  After a delay of T time, computes a schedule of virtual server transfers among the nodes contacting it, in order to reduce their maximal utilization

Load Balancing

Load balancing algorithm (cont'd)
  Computing the optimal reassignment is NP-complete
  Greedy algorithm, O(m log m)
    For each heavily loaded node, move the least loaded virtual server to a pool
    For each virtual server in the pool, from heaviest to lightest, assign it to the node n that minimizes the resulting load
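The greedy step can be sketched as follows. This is a simplified illustration rather than the algorithm of [GLSKS2004] verbatim: the function name `greedy_reassign`, the data layout, and the rule that a node counts as heavily loaded when its utilization (load / capacity) exceeds a `target` threshold are assumptions made for the example. "Resulting load" is interpreted here as the resulting utilization of the receiving node.

```python
from typing import Dict, List, Tuple


def greedy_reassign(capacity: Dict[str, float],
                    servers: Dict[str, Dict[str, float]],
                    target: float) -> List[Tuple[str, str, str]]:
    """Return (virtual_server, from_node, to_node) transfers; `servers` is updated in place.

    capacity: node -> capacity
    servers:  node -> {virtual server -> load}; must have an entry for every node
    target:   assumed utilization threshold above which a node is "heavily loaded"
    """
    load = {n: sum(vs.values()) for n, vs in servers.items()}
    pool = []  # entries are (load, virtual server, original node)
    # Phase 1: overloaded nodes shed their least loaded virtual servers.
    for n, vs in servers.items():
        while vs and load[n] / capacity[n] > target:
            v = min(vs, key=vs.get)
            pool.append((vs[v], v, n))
            load[n] -= vs.pop(v)
    # Phase 2: place pooled servers, heaviest first, where utilization stays lowest.
    moves = []
    for v_load, v, src in sorted(pool, reverse=True):
        dst = min(capacity, key=lambda n: (load[n] + v_load) / capacity[n])
        servers[dst][v] = v_load
        load[dst] += v_load
        if dst != src:
            moves.append((v, src, dst))
    return moves


capacity = {"n1": 10.0, "n2": 10.0}
servers = {"n1": {"a": 6.0, "b": 5.0}, "n2": {"c": 2.0}}
print(greedy_reassign(capacity, servers, target=0.9))
```

Sorting the pool dominates the cost, which is consistent with the O(m log m) bound quoted above when m is the number of virtual servers considered.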

Load Balancing

Performance
  Trade-off: load movement vs. load balancing
    Load balancing is measured by the maximum node utilization
    When T decreases, the maximum node utilization decreases and the load movement increases
  Effective in achieving load balancing for system utilization as high as 90%
    Transfers only 8% of the load that arrives in the system
  Emergency load balancing is necessary

Consistency Preserving

Neighbor table
  A table of neighbor pointers
  Used for efficient routing in a P2P system

Challenge
  How to maintain consistent neighbor tables in a dynamic network where nodes may join, leave and fail concurrently and frequently?


Consistent network
  For every entry in the neighbor tables, if there exists at least one qualified node in the network, then the entry stores at least one qualified node; otherwise, the entry is empty
  Qualified node for an entry of a node's neighbor table: a node whose ID has the suffix required by that entry


K-consistent network
  For every entry in the neighbor tables, if there exist H qualified nodes in the network, then the entry stores at least min(K, H) qualified nodes; otherwise, the entry is empty
  For K > 0, K-consistency implies consistency
  1-consistency is the same as consistency
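The definition can be turned into a small checker. This is a sketch under stated assumptions: IDs are equal-length digit strings, and the required suffix of entry (level, digit) of node x is assumed to be the digit followed by the rightmost `level` digits of x (suffix-matching tables; the slides do not spell this rule out). All names (`required_suffix`, `is_k_consistent`, `full_tables`) are hypothetical, and a node is allowed to qualify for its own entries.

```python
from typing import Dict, Iterable, Set, Tuple

Table = Dict[Tuple[int, str], Set[str]]  # (level, digit) -> stored neighbor IDs


def required_suffix(node_id: str, level: int, digit: str) -> str:
    """Assumed rule: `digit` followed by the node's rightmost `level` digits."""
    return digit + node_id[len(node_id) - level:]


def qualified_nodes(nodes: Iterable[str], node_id: str, level: int, digit: str) -> Set[str]:
    """All nodes in the network whose ID ends with the entry's required suffix."""
    suffix = required_suffix(node_id, level, digit)
    return {y for y in nodes if y.endswith(suffix)}


def is_k_consistent(tables: Dict[str, Table], k: int, digits: str = "01") -> bool:
    """Check every entry: at least min(K, H) qualified nodes, and nothing unqualified."""
    nodes = set(tables)
    for x, table in tables.items():
        for level in range(len(x)):
            for d in digits:
                qualified = qualified_nodes(nodes, x, level, d)  # H = len(qualified)
                stored = table.get((level, d), set())
                if not stored <= qualified or len(stored) < min(k, len(qualified)):
                    return False
    return True


def full_tables(nodes: Iterable[str], digits: str = "01") -> Dict[str, Table]:
    """Helper for experiments: tables that store every qualified node."""
    nodes = set(nodes)
    return {x: {(lvl, d): qualified_nodes(nodes, x, lvl, d)
                for lvl in range(len(x)) for d in digits}
            for x in nodes}


tables = full_tables({"00", "01", "11"})
print(is_k_consistent(tables, k=2))
```

Emptying a single entry for which qualified nodes exist makes the check fail even for K = 1, which matches the statement that 1-consistency is plain consistency.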


General strategy
  Identify a consistent subnet that is as large as possible
  Only replace a neighbor with a closer one if both of them belong to the subnet
  Expand the consistent subnet after new nodes join
  Maintain consistency of the subnet when nodes fail


Approach of [LL2004b]
  Design a join protocol such that
    An initially K-consistent network remains K-consistent after the join processes of a set of nodes terminate
    The termination of a join implies that the joined node belongs to the consistent subnet
  Design a failure recovery protocol that
    Recovers K-consistency of the subnet by repairing holes left by failed neighbors with qualified nodes in the subnet
    The protocol is presented in [LL2004a], and is integrated with the join protocol in the experiments of [LL2004b]


Join protocol
  Each node has a status: copying, waiting, notifying, cset_waiting, or in_system
    S-node: a node in status in_system
    T-node: any other node
  All S-nodes form a consistent subnet

Consistency Preserving

Status transitions of a joining node x
  copying: copy neighbor information from S-nodes to fill in most entries of its table, level by level
    -> waiting, when it cannot find a qualified S-node for a level i >= 1
  waiting: try to find an S-node that shares at least the rightmost i-1 digits with x and stores x as a neighbor
    -> notifying, when such a node, say y, is found
  notifying: seek and notify nodes that share the rightmost j digits with x, where j is the lowest level at which x is stored in y's table
    -> cset_waiting, when notifying is finished
  cset_waiting: wait for the nodes that are joining concurrently and are likely to be in the same consistent subnet
    -> in_system, when it is confirmed that all of those nodes have exited the notifying status

Consistency Preserving

Performance
  p-ratio
    In x's table, the primary neighbor of an entry is y, while the true primary neighbor should be z
    p-ratio = (delay from x to y) / (delay from x to z)
  K-consistency is always maintained in all experiments
  When K increases, the p-ratio decreases
    More neighbor information is stored, at the cost of more messages
  Even with massive joins and failures, tables are still optimized greatly







Comparing DHTs [DGPR2003]

Each DHT algorithm has many details, which makes the algorithms difficult to compare. We use a component-based analysis approach:
  Break the DHT design into independent components
  Analyze the impact of each component choice separately

Two types of components
  Routing-level: neighbor and route selection
  System-level: caching, replication, querying policy, latency

Metrics Used

Metrics used in the comparison
  Flexibility: options in choosing neighbors and routes
  Resilience: does it still route when nodes go down?
  Load balancing: is the content distributed?
  Proximity and latency: is the content stored nearby?

Aspects of a DHT
  Geometry: a structure that inspires a DHT design
  Distance function: the distance between two nodes
  Algorithm: rules for selecting neighbors and routes using the distance function

Algorithm & Geometry

What are routing algorithm and geometry?
  Routing algorithm: the exact rules for selecting neighbors and routes (e.g. Chord, CAN, PRR, Tapestry, Pastry)
  Geometry: the algorithm's underlying structure, derived from the way in which neighbors and routes are chosen (e.g. Chord routes on a ring)

Why is geometry important?
  Geometry captures the flexibility in the selection of neighbors and routes
  Neighbor selection: does the geometry allow neighbors to be chosen based on proximity? This leads to shorter paths.
  Route selection: the number of options for selecting the next hop. This leads to shorter, more reliable paths.

DHT Algorithms Analysis

The table summarizes the geometries and algorithms.

  Geometry    Algorithm
  Tree        PRR
  Hypercube   CAN
  Butterfly   Viceroy
  Ring        Chord
  XOR         Kademlia
  Hybrid      Pastry

We examine the flexibility metric in two aspects:
  Flexibility in neighbor selection
  Flexibility in route selection

[Figures: a binary tree with leaves 00, 01, 10, 11; a 3-bit hypercube with nodes 000 to 111; a ring with nodes 0 to 7.]
Tree Geometry

PRR uses a tree geometry.
  The distance between two nodes is the height of their smallest common subtree in the binary tree (at most log N in a well-balanced tree)
  Neighbor selection flexibility: a node has 2^(i-1) options for choosing a neighbor at distance i
  No flexibility in route selection

[Figure: binary tree with leaves 00, 01, 10, 11, showing subtrees of height 1 and height 2 and the leaf set.]

Hypercube Geometry

CAN uses a d-torus (hypercube geometry).
  Each node has log n neighbors; neighbors differ in exactly one bit
  Routing proceeds greedily by correcting bits in any order
  No flexibility in choosing neighbors
  Routing from a source to a destination at distance log n: the first hop has log n next-hop choices, the second hop has (log n) - 1 choices, and so on, hence (log n)! possible routes

[Figure: 3-bit hypercube with nodes 000 to 111.]

Butterfly Geometry

Viceroy uses a butterfly geometry.
  Nodes are organized in a series of log n “stages”, where all the nodes at stage i are capable of correcting the i-th bit
  Routing consists of 3 phases and is done in O(log n) hops
  No flexibility in route selection or neighbor selection

Ring Geometry

Chord uses a ring geometry.
  Each node maintains log n neighbors and routes to an arbitrary destination in O(log n) hops
  Flexibility in neighbor selection: a node has 2^(i-1) possible options for picking its i-th neighbor, giving approximately n^((log n)/2) possible routing tables for each node
  Flexibility in route selection: (log n)! possible routes from a source to a destination at distance log n

[Figure: ring with nodes 0 to 7.]

Ring Geometry

[Figure: ring with nodes 000 to 111.]

To route from 000 to 110, we have two routes:
  Route to 100 and then to 110
  Route to 010 and then to 110
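The two routes can be enumerated mechanically. The sketch below assumes an idealized ring in which every node has a pointer at each power-of-two distance and in which an optimal route uses each required power-of-two hop exactly once, in any order; the function name `ring_routes` is hypothetical.

```python
from itertools import permutations
from typing import List


def ring_routes(src: int, dst: int, bits: int) -> List[List[int]]:
    """Enumerate shortest routes on a 2^bits ring using power-of-two hops."""
    size = 1 << bits
    gap = (dst - src) % size
    # One hop per set bit of the clockwise gap between source and destination.
    hops = [1 << i for i in range(bits) if gap & (1 << i)]
    routes = []
    for order in permutations(hops):
        path, cur = [src], src
        for h in order:
            cur = (cur + h) % size
            path.append(cur)
        routes.append(path)
    return routes


for path in ring_routes(0b000, 0b110, bits=3):
    print(" -> ".join(format(n, "03b") for n in path))
```

With k hops to take there are k! orderings, which is where the (log n)! figure for a destination at distance log n comes from.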

XOR

Kademlia uses an XOR geometry.
  The distance between two nodes is the XOR of their identifiers
  A node has 2^(i-1) options for choosing a neighbor at distance i, yielding approximately n^((log n)/2) possible routing tables
  Route flexibility: lower-order bits can be fixed before higher-order bits if an optimal path is not available. This may result in longer paths, as the lower-order bits that were fixed need not be preserved by later routing.

Hybrid

Pastry is a hybrid. Its nodes are regarded both as leaves of a binary tree and as points on a one-dimensional circle.
  The distance between two nodes is either their tree distance or their cyclic distance
  A node has 2^(i-1) options for choosing a neighbor at distance i, yielding approximately n^((log n)/2) possible routing tables
  Route selection freedom: hops on the ring are allowed, but these paths might not retain the O(log n) bound on route length

[Figure: binary tree with leaves 00, 01, 10, 11.]

Flexibility Overview

  Property                                    Tree           Hypercube   Ring           Butterfly  XOR            Hybrid
  Neighbor selection                          n^((log n)/2)  1           n^((log n)/2)  1          n^((log n)/2)  n^((log n)/2)
  Route selection (optimal)                   1              c1(log n)   c1(log n)      1          1              1
  Natural support for sequential neighbors?   no             no          yes            no         no             Default: no; Fallback: yes

Ring and Hypercube have twice the routing flexibility of the Hybrid and XOR geometries

Resilience

Two aspects of robust routing
  Static resilience measures how well the algorithm can route in a dynamic environment before the recovery algorithms take effect
  Dynamic recovery measures how quickly state is recovered after failures

With 30% of the nodes failed
  Tree: 90% of routes failed (no route selection flexibility)
  Ring, Hypercube: 7% of routes failed (most route selection flexibility)
  Hybrid, XOR: 20% of routes failed (half the flexibility of the ring)

Route selection flexibility affects static resilience

[Figure: % failed paths (0 to 100) versus % failed nodes (0 to 90) for Ring, Hybrid, XOR, Tree and Hypercube.]

Path Latency

The goal is to minimise the end-to-end latency of overlay networks. Two proximity methods are considered:
  Proximity Neighbor Selection (PNS): neighbors are chosen based on their proximity
  Proximity Route Selection (PRS): routes are selected depending on the proximity of the neighbors

PNS achieves an improvement over PRS, which in turn achieves an improvement over the plain version.

Geometry does not affect the performance of PNS or PRS. It is therefore important to choose a routing algorithm whose geometry accommodates PNS.

Local Convergence

Do messages sent from two nodes to the same destination converge at a node near the two sources?

Local convergence leads to low latencies in:
  Overlay multicast
  Caching
  Server selection

Measured by the number of exit points in the network. In the best case, only one node sends a message off-domain.

Limitations & Findings

Limitations
  The authors have not considered all geometries
  Other factors and performance metrics are not considered

Findings
  Routing geometry is important
  Flexibility improves resilience and proximity

Why not the ring?
  Great flexibility in choosing neighbors and routes; both proximity methods, PNS and PRS, can be implemented
  Highest performance in the resilience tests, and as good as the other geometries in path length and local convergence







Skip List [PSL1990]

Skip lists are data structures that can be used in place of balanced trees. They use probabilistic balancing techniques, so the algorithms are simpler and faster.

A skip list can be described as a sorted linked list in which some nodes are supplemented with pointers that skip over many list elements.

[Figure: skip list from HDR to NIL containing the nodes 2, 5, 9, 16, 23, 25, 27, 29.]

Perfect Skip List

A perfect skip list is one where the height of the i-th node is the exponent of the largest power of two that divides i. Pointers at level h have length 2^h. A perfect skip list supports searches in O(log N).

Because insertions and deletions are expensive in a perfect skip list, a probabilistically balanced skip list is proposed instead, in which node heights are chosen by consulting a random number generator.

[Figure: the skip list with nodes 2, 5, 9, 16, 23, 25, 27, 29. A node of height 2 sits at a position divisible by 2^2, a node of height 3 at a position divisible by 2^3; a level 2 pointer skips over 2^2 nodes.]
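The height rule of the perfect skip list can be computed directly. A small sketch, with positions counted from 1 and a hypothetical function name:

```python
def perfect_height(i: int) -> int:
    """Exponent of the largest power of two dividing i (i >= 1)."""
    # i & -i isolates the lowest set bit of i, i.e. the largest power of two dividing i.
    return (i & -i).bit_length() - 1


print([perfect_height(i) for i in range(1, 9)])
```

For the eight-element list in the figure, the 4th element (16) gets height 2 and the 8th element (29) gets height 3 under this rule.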

Examples

[Figure: starting from an empty list (HDR, NIL), the nodes 10, 5, 8, 12 and 2 are added one after another. Each node's height is chosen randomly, e.g. node 10 gets height 1.]

Search Skip List

[Figure: the skip list with nodes 2, 5, 9, 16, 23, 25, 27, 29.]

• Search for Node 30. From HDR to Node 29. Then stop and the search fails. (illustrated)

• Search for Node 23. From HDR to Node 16. Drop two levels, from Node 16 to Node 23. Found.

• Search for Node 27. From HDR to Node 16. Drop one level, from Node 16 to Node 25. Drop one level, from Node 25 to Node 27. Found.
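The search walk above, together with randomized insertion, can be sketched as a small probabilistic skip list. This is an illustrative implementation in the spirit of [PSL1990], not a transcription of it; the class and method names, the fixed `MAX_HEIGHT`, and the promotion probability p = 0.5 are assumptions.

```python
import random
from typing import List, Optional


class Node:
    def __init__(self, key, height: int):
        self.key = key
        self.next: List[Optional["Node"]] = [None] * height  # one pointer per level


class SkipList:
    MAX_HEIGHT = 16  # assumed cap on node height

    def __init__(self, p: float = 0.5, seed=None):
        self.p = p
        self.rng = random.Random(seed)
        self.head = Node(None, self.MAX_HEIGHT)  # HDR; None pointers play the role of NIL

    def _random_height(self) -> int:
        # Flip a biased coin: each extra level is added with probability p.
        h = 1
        while h < self.MAX_HEIGHT and self.rng.random() < self.p:
            h += 1
        return h

    def insert(self, key) -> None:
        # Record, per level, the last node whose key is smaller than the new key.
        update = [self.head] * self.MAX_HEIGHT
        cur = self.head
        for lvl in reversed(range(self.MAX_HEIGHT)):
            while cur.next[lvl] is not None and cur.next[lvl].key < key:
                cur = cur.next[lvl]
            update[lvl] = cur
        node = Node(key, self._random_height())
        for lvl in range(len(node.next)):
            node.next[lvl] = update[lvl].next[lvl]
            update[lvl].next[lvl] = node

    def contains(self, key) -> bool:
        # Start at the top level; move right while the next key is smaller, then drop a level.
        cur = self.head
        for lvl in reversed(range(self.MAX_HEIGHT)):
            while cur.next[lvl] is not None and cur.next[lvl].key < key:
                cur = cur.next[lvl]
        nxt = cur.next[0]
        return nxt is not None and nxt.key == key


sl = SkipList(seed=1)
for k in [2, 5, 9, 16, 23, 25, 27, 29]:
    sl.insert(k)
print(sl.contains(23), sl.contains(27), sl.contains(30))
```

The sketch does not reject duplicate keys and omits deletion, which follows the same pattern as insertion.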

Skip List

  Worst-case performance occurs when the list is significantly unbalanced
  Space efficient: can use 1.33 pointers per element
  Maintains O(log N) searches with high probability

Comparison with AVL trees, recursive 2-3 trees and self-adjusting trees
  Skip lists perform more comparisons than the other methods
  Skip lists are slightly slower than AVL trees in searches, but insertions and deletions in a skip list are faster
  Skip lists are faster than self-adjusting trees when a uniform distribution is encountered, but slower for highly skewed distributions

SkipNet Introduction [SNL2003]

In DHTs, we cannot control where the data will be stored
  Data might be stored far away from its administrative domain, which makes privileges hard to administer. Can we adapt?
  This gives rise to denial-of-service attacks and traffic analysis

Solution: SkipNet, a scalable overlay network that provides controlled data placement and guarantees routing locality by organizing data by string names
  Content can be placed on a pre-defined node or distributed uniformly across the nodes of a hierarchical naming subtree

Motivation

Disadvantages of Chord, CAN, Tapestry and Pastry:
  No content locality
    Content locality: the ability to explicitly place data on a specific overlay node, or to distribute it across the nodes in a specified domain
    Without it, data is prone to traffic analysis and denial-of-service attacks
  No path locality
    Path locality: the guarantee that the routing path between two overlay nodes in a domain does not leave the domain
    Additional security: the traffic is not passed on to another domain, which could be a competitor

SkipNet provides both content and path locality.

How does SkipNet do it?

SkipNet employs both a string name ID space and a numeric ID space
  Node names and content identifier strings are mapped into the name ID space
  Hashes of the node names and content identifiers are mapped into the numeric ID space

By arranging content in name ID order rather than dispersing it, we can achieve content and path locality.

Advantages of locality

Improved availability
  Data is stored within the organisation and can be searched even if the network becomes disconnected
Resilience against Internet failures
  Nodes within a cluster gracefully survive failures that disconnect the cluster from the rest of the Internet (a useful property of SkipNet)
Performance
  Searches are faster, as data is stored near the nodes
Manageability
  Facilitates control and maintenance within an administrative domain
Security
  Can deal with traffic analysis and denial-of-service attacks

SkipNet Structure

SkipNet adapts the skip list structure
  Traversals can start from any node
  State and processing costs should be the same for all nodes
  A ring and doubly linked lists are used

Other enhancements
  Each node stores 2 log N pointers rather than a highly variable number of pointers

SkipNet variants
  Perfect: pointers at level h point to the nodes that are exactly 2^h nodes to the left and right
  Probabilistic: a node at level h probabilistically determines which ring it belongs to

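SkipNet Structure

[Figure: eight SkipNet nodes A, D, M, O, T, V, X, Z arranged on a ring in name ID order, with 3-bit numeric IDs 000 to 111.]

Routing table of node A (clockwise, counter-clockwise)
  Level 2: T, T
  Level 1: M, X
  Level 0: D, Z

Routing table of node V (clockwise, counter-clockwise)
  Level 2: D, D
  Level 1: Z, O
  Level 0: X, T

SkipNet nodes are ordered by name ID. The routing tables of nodes A and V are shown.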

SkipNet Structure

  Level L = 0, root ring: A, D, M, O, T, V, X, Z
  Level L = 1: Ring 0 = {A, M, T, X}; Ring 1 = {D, O, V, Z}
  Level L = 2: Ring 00 = {A, T}; Ring 01 = {M, X}; Ring 10 = {O, Z}; Ring 11 = {D, V}
  Level L = 3: Ring 000 to Ring 111, one node each

The full SkipNet routing infrastructure for an 8-node system, including the ring labels.

Routing By Name ID

Similar to search in skip lists
  The message is routed along the highest-level pointer, in either the clockwise or the counter-clockwise direction, that points to a node whose name ID is not past the destination value
  Routing terminates when the message arrives at the node whose name ID is closest to the destination
  Because nodes are doubly linked, the scheme routes along either left or right pointers, depending on the name IDs
  The number of hops is O(log N)

Example

Routing a message from node A to node V. Path:
  A (level 2, clockwise) -> T, since "T" < "V"
  T (level 2, clockwise): failed
  T (level 1, clockwise): failed
  T (level 0, clockwise) -> V (destination)

Routing tables (clockwise, counter-clockwise)
  Node A: level 2: T, T; level 1: M, X; level 0: D, Z
  Node T: level 2: A, A; level 1: X, M; level 0: V, O
  Node V: level 2: D, D; level 1: Z, O; level 0: X, T

Routing Algorithm

SendMsg(nameID, msg) {
  if (LongestPrefix(nameID, localNode.nameID) == 0)
    msg.dir = RandomDirection();
  else if (nameID < localNode.nameID)
    msg.dir = counterClockwise;
  else
    msg.dir = clockwise;
  msg.nameID = nameID;
  RouteByNameID(msg);
}

// Invoked at all nodes (including the source and
// destination nodes) along the routing path.
RouteByNameID(msg) {
  // Forward along the longest pointer
  // that is between us and msg.nameID.
  h = localNode.maxHeight;
  while (h >= 0) {
    nbr = localNode.RouteTable[msg.dir][h];
    if (LiesBetween(localNode.nameID, nbr.nameID, msg.nameID, msg.dir)) {
      SendToNode(msg, nbr);
      return;
    }
    h = h - 1;
  }
  // h < 0 implies we are the closest node.
  DeliverMessage(msg.msg);
}
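The pseudocode can be exercised on the 8-node example. The sketch below is a simplified simulation, not SkipNet's implementation: it routes clockwise only, assumes the destination is the name of an existing node, and derives each node's clockwise pointers from the level-2 ring labels in the figure. The names `RING2`, `clockwise_nbr`, `gap` and `route_by_name` are hypothetical.

```python
from typing import List

# Level-2 ring label of each node, as in the 8-node figure
# (Ring 00 = {A, T}, Ring 01 = {M, X}, Ring 10 = {O, Z}, Ring 11 = {D, V}).
RING2 = {"A": "00", "T": "00", "M": "01", "X": "01",
         "O": "10", "Z": "10", "D": "11", "V": "11"}
ROOT = sorted(RING2)  # root ring: nodes in name ID order
MAX_LEVEL = 2


def clockwise_nbr(node: str, level: int) -> str:
    """Next node in name order within the node's ring at the given level."""
    ring = [n for n in ROOT if RING2[n][:level] == RING2[node][:level]]
    return ring[(ring.index(node) + 1) % len(ring)]


def gap(a: str, b: str) -> int:
    """Clockwise distance from a to b along the root ring."""
    return (ROOT.index(b) - ROOT.index(a)) % len(ROOT)


def route_by_name(src: str, dst: str) -> List[str]:
    """Follow the highest-level clockwise pointer that does not pass the destination."""
    path, cur = [src], src
    while cur != dst:
        for level in range(MAX_LEVEL, -1, -1):
            nbr = clockwise_nbr(cur, level)
            if 0 < gap(cur, nbr) <= gap(cur, dst):  # plays the role of LiesBetween
                cur = nbr
                path.append(cur)
                break
    return path


print(route_by_name("A", "V"))
```

At level 0 the clockwise neighbor is always one step away, so the inner loop is guaranteed to make progress; the walk from A to V takes the level 2 pointer to T and then the level 0 pointer to V, as in the example.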

Routing By Numeric ID

Routing begins in the level 0 ring and proceeds until a node is found whose numeric ID matches the destination numeric ID in the first digit.

Messages are forwarded from a ring at level h, R_h, to a ring at level h+1, R_{h+1}, such that the nodes in R_{h+1} share h+1 digits with the destination numeric ID.

Routing terminates when
  the message is delivered to the node whose numeric ID equals the key, or
  none of the nodes in R_h share h+1 digits with the destination numeric ID; the message is then delivered to the node whose numeric ID is closest to the destination numeric ID

The number of message hops is O(log N)

Routing By Numeric ID

E.g. let Z = 1000 and O = 1001, and route from A to numeric ID 1011.
Path: A (0000) -> D (1100, move up a level) -> O (1001, move up a level) -> Z (1000) -> O (1001, closest match for 1011; deliver).

[Figure: the SkipNet rings for this example, from the root ring through Ring 0 and Ring 1 up to the rings labeled 0000, 0001, 0100, 0101, 1000, 1001, 1100, 1101; the message is delivered at node O.]

Routing Algorithm

// Invoked at all nodes (including the source and destination nodes) along the routing path.
// Initially: msg.ringLvl = -1, msg.startNode = msg.bestNode = null & msg.finalDestination = false
RouteByNumericID(msg) {
  if (msg.numID == localNode.numID || msg.finalDestination) {
    DeliverMessage(msg.msg);
    return;
  }
  if (localNode == msg.startNode) {
    // Done traversing current ring.
    msg.finalDestination = true;
    SendToNode(msg, msg.bestNode);
    return;
  }
  h = CommonPrefixLen(msg.numID, localNode.numID);
  if (h > msg.ringLvl) {
    // Found a higher ring.
    msg.ringLvl = h;
    msg.startNode = msg.bestNode = localNode;
  } else if (abs(localNode.numID - msg.numID) < abs(msg.bestNode.numID - msg.numID)) {
    // Found a better candidate for current ring.
    msg.bestNode = localNode;
  }
  // Forward along current ring.
  nbr = localNode.RouteTable[clockWise][msg.ringLvl];
  SendToNode(msg, nbr);
}

Benefits

SkipNet supports routing by name ID and by numeric ID with the same data structure.

The bottom (root) ring is sorted by name ID; membership in the higher-level rings is determined by numeric ID.

For a given node, the SkipNet rings to which it belongs form precisely a skip list that is circular and doubly linked.

Node Joins & Departure

Node joins
  A new node finds the top-level ring that matches its numeric ID
  It finds a neighbor in the top ring using a name ID search
  Starting from one of these neighbors, it searches for its name ID at the next lower level and thus finds its neighbors at the lower level
  This is repeated until it reaches the root ring
  The existing nodes point to the new node only after it has joined the root ring
  Insertion traverses O(log N) hops with high probability

Node departure
  SkipNet can route correctly as long as the root ring is maintained
  The other levels are regarded as optimization hints; upper-ring membership is maintained through a background repair process

Example

Join: insert node O (numeric ID 101)
  Search by numeric ID 101
    The highest attainable level is 2
    O joins the ring containing Z at level 2
    Z forwards the join message to D at the next lower level, level 1
  Proceed by searching by name ID at the next lower levels
    D and V are neighbors at level 1
    M and T are neighbors at level 0

[Figure: the SkipNet rings of the 8-node system at levels L = 0 (root ring) to L = 3 (Ring 000 to Ring 111), illustrating the join of node O.]

Properties of SkipNet

Content & path locality
  Nodes are named like DNS entries, with the components reversed, e.g. john.microsoft.com becomes com.microsoft.john
  Path locality holds for groups in which the nodes share a single DNS suffix
  Incorporating a node's name ID into a content name guarantees that the content will be hosted on that node, e.g. com.microsoft.john/doc-name
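The name reversal is a one-line transformation. A minimal sketch, with hypothetical helper names `name_id` and `content_name`:

```python
def name_id(dns_name: str) -> str:
    """Reverse the components of a DNS name so that the shared suffix becomes a shared prefix."""
    return ".".join(reversed(dns_name.split(".")))


def content_name(dns_name: str, doc: str) -> str:
    """Prefix a document name with the hosting node's name ID."""
    return f"{name_id(dns_name)}/{doc}"


print(name_id("john.microsoft.com"))
print(content_name("john.microsoft.com", "doc-name"))
```

After the reversal, all nodes of one organisation are contiguous in name ID order, which is what makes path locality within a shared prefix possible.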

Constrained load balancing (CLB)
  A name is stored in two parts: a CLB domain and a CLB suffix, e.g. a document named msn.com/DataCenter!TopStories.html
  Searching for the responsible node
    Search for a node in the CLB domain using a name ID search
    Then search by numeric ID for the hash of the CLB suffix, constrained to the CLB domain
    Because the search is constrained by a name ID prefix, it uses the doubly linked list; this type of search degrades performance by a factor of 2
  CLB can be performed over a naming subtree, but not over an arbitrary subset of nodes

Properties of SkipNet

Fault tolerance
  Only the neighbors at level 0 need to be maintained correctly
    Each node has 16 neighbors at level 0
    Level 0 is repaired easily by contacting live nodes
    Background stabilization mechanisms are employed after failures
  A failure across organizational boundaries only segments the overlay, which survives gracefully

Security
  Nodes cannot create global names containing the suffix of a registered domain
  Path locality avoids traffic analysis; however, outbound traffic is still prone to analysis

Range queries
  Queries can be performed over contiguous ring segments

Enhancements

Sparse and dense routing tables
  Use a density parameter k and non-binary random digits to the base k for the numeric ID

Duplicate pointer elimination
  Remove duplicate pointers in the routing table; an improvement of 25% can be achieved

Network proximity for routing by name ID
  Introduce a P-table for proximity routing. The goal of the P-table is to maintain routing in O(log N) hops while ensuring that each hop has low latency
  The P-table keeps track of nodes that are close to the local node in terms of network distance

Enhancements

Network proximity for routing by numeric ID
  Add a C-table to incorporate network proximity when searching by numeric ID
  The C-table keeps track of nodes that are close and within the CLB domain

Design Alternative

IP routing & DNS
  o Content placement by routing using IP and DNS lookup.

Single overlay network
  o For content locality, name the node with the hash of the data object's name. Requires a separate routing table for each object.
  o Use a two-part naming scheme: the content name consists of the node address concatenated with a node-relative name. Does not support guaranteed path locality.
  o Add constraints to messages to restrict the routing path. However, this prevents routing from being consistent.
  o Use two-part segments with a numeric ID and a name ID, like SkipNet. The result is a static form of constrained load balancing.

Design Alternative

Multiple overlay networks
o Multiple overlays with membership could be considered.
o Requires that access to other overlays be through gateways.
o Access to data is constrained and load balanced within a single overlay, and the data is not accessible to outside clients except via gateways.

SkipNet provides explicit content placement, allows clients to dynamically define new DHTs over any name prefix scope, and guarantees path locality within a shared name prefix, all within a single infrastructure.

Experiments

The authors ran experiments against the following:
  Basic SkipNet, using only the R-Table
  Full SkipNet, using the R-Table, P-Table and C-Table
  Pastry
  Chord

The following lookup performance metrics are used:
  Relative Delay Penalty (RDP): latency of the overlay path compared to that of the IP path
  Physical network hops: length of the overlay path measured in IP hops
  Number of failed lookups

Other metrics (refer to paper):
  Format of node name
  Organisation size
  Models for distribution of nodes and data
  Using host or organisation generated node names
  Simulation of domain isolation by failing an organisation's link
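The RDP metric can be written out as a small sketch, assuming it is computed as the summed latency of the overlay hops divided by the latency of the direct IP path between the same endpoints. The function name and the sample latencies are made up for illustration.

```python
def relative_delay_penalty(overlay_hop_latencies_ms, direct_ip_latency_ms):
    """Ratio of total overlay path latency to the direct IP path latency."""
    if direct_ip_latency_ms <= 0:
        raise ValueError("direct IP latency must be positive")
    return sum(overlay_hop_latencies_ms) / direct_ip_latency_ms


# Three overlay hops of 10, 20 and 30 ms against a 20 ms direct IP path.
rdp = relative_delay_penalty([10, 20, 30], 20)
print(rdp)  # 3.0
```

An RDP of 1 means the overlay route is as fast as plain IP routing; larger values quantify the cost of the detour through overlay nodes.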

Experiment Results

Basic routing costs

Full SkipNet and Pastry are locality aware, while basic SkipNet and Chord are not. Hence the former two perform better.

Non-uniform distribution of data does not affect performance.

Routing Entries per Node

  Chord            16.3
  Basic SkipNet    41.7
  Full SkipNet    102.2
  Pastry           63.2

Locality of Placement

Measures physical network hops.

Chord and Pastry show a constant number of physical hops because they are oblivious to the locality of data: they diffuse data throughout the network.

SkipNet shows performance improvements as the locality of the data references increases.

Experiment Results

Fault tolerance (when an organisation is disconnected)
  Locality improves fault tolerance.
  Chord and Pastry fail totally for local lookups because data is diffused.
  SkipNet keeps functioning and performs local lookups.

Constrained load balancing (within a domain)
  Studies the Relative Delay Penalty (RDP) as the number of nodes increases.
  Basic CLB using the R-Table causes higher delay penalties.
  Full CLB causes intermediate delay penalties.
  Pastry has low delay penalties.

Network proximity
  Studies the effect on RDP of the density k, which controls the P-Table entries.
  RDP levels off after k=8 because of the increase of pointers in the P-Table.

SkipNet Summary

SkipNet is the first P2P system that achieves both path and content locality. It provides content locality at the desired degree and granularity.

Clustering node names allows SkipNet to perform gracefully in the face of link failures.

Performance is similar to other P2P systems such as Chord and Pastry under uniform access patterns.

Under access patterns where intra-organisation traffic predominates, SkipNet performs better.

SkipNet is also more resilient to network partitions than other P2P systems.

Conclusion

Looked at hash-based techniques in P2P
  Pastry
  P-Grid

Two important issues
  Load balancing
  Neighbor table consistency preserving

Comparison of DHT techniques

SkipNet: a skip list adaptation

References

[CAN2001] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker. A Scalable Content-Addressable Network. SIGCOMM’01, August 27-31, 2001.

[CPLS2001] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. SIGCOMM’01, August 27-31, 2001.

[CSWH2000] I. Clarke, O. Sandberg, B. Wiley, and T. W. Hong. Freenet: A distributed anonymous information storage and retrieval system. Proc. of ICSI Workshop on Design Issues in Anonymity and Unobservability, 2000.

[DRGR2003] K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker, I. Stoica. The Impact of DHT Routing Geometry on Resilience and Proximity. SIGCOMM’03, August 25-29, 2003.

[LL2004a] S. S. Lam and H. Liu. Failure recovery for structured P2P networks: Protocol design and performance evaluation. In Proc. of ACM SIGMETRICS, June 2004.

[LL2004b] Consistency-preserving Neighbor Table Optimization for P2P Networks. Technical Report TR-04-01, Dept. of CS, Univ. of Texas at Austin, January 2004.

References (cont.)

[GLSKS2004] Load Balancing in Dynamic Structured P2P Systems. Proc. of IEEE INFOCOM, Portland, Oregon, USA, 2004.

[PSL1990] William Pugh. Skip lists: A probabilistic alternative to balanced trees. Communications of the ACM, June 1990.

[RD2001] A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In Proc. of the 18th IFIP/ACM International Conf. on Distributed Systems Platforms, November 2001.

[SMKKB2001] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. Proc. of SIGCOMM ’01, San Diego, California, USA.

[SML+2004] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. Proc. of the 2001 ACM Annual Conference of the Special Interest Group on Data Communication (ACM SIGCOMM’01), 2001.

[SNL2003] Nicholas J.A. Harvey, Michael B. Jones, Stefan Saroiu, Marvin Theimer, Alec Wolman. SkipNet: A Scalable Overlay Network with Practical Locality Properties. Proceedings of the Fourth USENIX Symposium on Internet Technologies and Systems (USITS '03), Seattle, WA, March 2003.
