1
Detecting and Reducing Partition Nodes in Limited-routing-hop
Overlay Networks
Zhenhua Li and Guihai Chen
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, P. R. China
2
Background
• Overlay networks
- the base infrastructure of many Internet applications
• Limited routing hops
- routing one hop in the overlay network is much more expensive than routing one hop in the underlying network
- many overlays use flooding or a flooding-based routing mechanism
- so routing has a hop limit called TTL (time to live)
3
Motivation
• Overlay partition
- seriously degrades system performance
• Existence of topologically-critical nodes
- the failure of some nodes causes overlay partition with much higher probability than the failure of others
4
Related work (1)
• Proactive avoidance, event-driven
- uses a centralized server to direct nodes’ joins and leaves
- but the server becomes a single point of failure
• Proactive avoidance, periodic detection
- CAM: actively detects cut nodes and then neutralizes them into normal nodes
- but the cut-node concept is not applicable to limited-routing-hop overlay networks
5
Related work (2)
• Reactive recovery, event-driven
- ring partition detection and repair on Pastry and SkipNet
- but they can only be used on ring topologies
• Reactive recovery, periodic detection
- cross-check method: a node asks other nodes to run random queries and compares their results with its own
- but detection involves much randomness and uncertainty
6
Our proposed ideas
• The concept of partition node
- the topologically-critical nodes of limited-routing-hop overlay networks
• Partition node detection and reduction
- a distributed proactive method to detect partition nodes
- reduces partition nodes by changing them into normal nodes
- greatly enhances the connectivity and fault tolerance of overlay networks
7
Outline
• Partition node concept
• Partition node detection
• Partition node reduction
• Performance evaluation
8
Partition node concept (1)
• Cut node vs. partition node.
- (a) (b) C is a cut node because when C fails, the overlay network is partitioned;
- (c) (d) C is a partition node because when C fails, the overlay network is not partitioned, but C’s neighbors 1, 3, 5, 7 can no longer find each other.
9
Partition node concept (2)
Definition 1 (Locatability) In a limited-routing-hop overlay network, node A can locate node B only if A can find B by sending routing messages. It is denoted by A→B.
Definition 2 (Reachability) In a limited-routing-hop overlay network, node A can reach node C if A can locate C, or A can locate some node B and B can locate C. It is denoted by A→→C.
10
Partition node concept (3)
• Example: Node 1 can locate only nodes 2, 3, and 4; it can reach nodes 5, 6, and 7, but cannot reach node 8.
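Definitions 1 and 2 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that flooding with TTL t lets a node locate exactly the nodes within t overlay hops, and it reads Definition 2 literally, allowing at most one intermediate node. The names locate, reach, and the sample chain topology are hypothetical.

```python
from collections import deque


def locate(adj, src, ttl):
    """Nodes that src can locate: assumed to be those within ttl overlay hops."""
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == ttl:
            continue  # TTL exhausted, the message is not forwarded further
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    seen.discard(src)
    return seen


def reach(adj, src, ttl):
    """Definition 2 read literally: locate directly, or via one located node."""
    direct = locate(adj, src, ttl)
    result = set(direct)
    for mid in direct:
        result |= locate(adj, mid, ttl)
    result.discard(src)
    return result


# Hypothetical chain 1-2-3-4-5-6 (not the topology from the slide's figure).
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
located = locate(chain, 1, ttl=2)
reached = reach(chain, 1, ttl=2)
```

On this chain with TTL 2, node 1 locates nodes 2 and 3, reaches nodes 2 to 5, and cannot reach node 6 even though the chain is connected.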
11
Partition node concept (4)
• Definition 3 (Partition Node) Node C is a partition node if C’s neighbor set would be partitioned into two or more unreachable subsets S1, S2, . . . , Sn (n≥2) when C fails.
• Example:
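A centralized sketch of Definition 3 on a hypothetical topology (not the one from the slide's figure): node C is removed, and its former neighbors are grouped so that neighbors in different groups cannot reach each other. The assumptions are the same as before (a node locates everything within TTL hops; reachability allows one intermediate node), and grouping by chains of mutually reachable neighbors is an additional assumption of this sketch. All function names are illustrative.

```python
from collections import deque


def locate(adj, src, ttl):
    """Nodes within ttl overlay hops of src (assumed flooding model)."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == ttl:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return seen - {src}


def reach(adj, src, ttl):
    """Locate directly, or via one located intermediate node."""
    direct = locate(adj, src, ttl)
    result = set(direct)
    for mid in direct:
        result |= locate(adj, mid, ttl)
    return result - {src}


def neighbor_subsets(adj, c, ttl):
    """Group c's neighbors by reachability in the overlay with c removed."""
    pruned = {u: {v for v in nbrs if v != c}
              for u, nbrs in adj.items() if u != c}
    pending = set(adj[c])
    subsets = []
    while pending:
        seed = pending.pop()
        group, stack = {seed}, [seed]
        while stack:
            linked = reach(pruned, stack.pop(), ttl) & pending
            pending -= linked
            group |= linked
            stack.extend(linked)
        subsets.append(group)
    return subsets


def is_partition_node(adj, c, ttl):
    """True if c's failure splits its neighbors into two or more subsets."""
    return len(neighbor_subsets(adj, c, ttl)) >= 2


# Hypothetical overlay: a 12-node ring plus a hub "C" linked to 0, 3, 6, 9.
overlay = {i: {(i - 1) % 12, (i + 1) % 12} for i in range(12)}
overlay["C"] = {0, 3, 6, 9}
for v in overlay["C"]:
    overlay[v].add("C")
```

In this overlay the ring stays connected when C fails, so C is not a cut node. With TTL 1, however, C's four neighbors can no longer reach one another, so C is a partition node; with TTL 2 they can, and C is a normal node.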
12
Partition node detection (1)
4 steps:
• Initialize detection (0)(1)
• Probe reachability (2a)(2b)
• Partition subsets (3)
• Make decision (4)
13
Partition node detection (2)
14
Partition node reduction (1)
• Add edges to reduce partition nodes
- choose an appropriate delegate node Ni from each subset Si,
- and then connect all the delegate nodes in some way.
- To improve the system’s fault tolerance, we try to keep every node’s degree above a constant lower bound as far as possible.
15
Partition node reduction (2)
• Linear chain connection vs. Chordal ring connection
- the chordal ring uses more edges, but is much more resilient
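The trade-off between the two ways of connecting delegates can be illustrated with a small sketch. The slides do not specify the chord layout, so the chord length used here (each delegate also links to the delegate two positions ahead) is an assumption, as are the function names and the delegate labels N1 to N6.

```python
def linear_chain(delegates):
    """Connect delegates in a line: N1-N2-...-Nk."""
    return {frozenset(pair) for pair in zip(delegates, delegates[1:])}


def chordal_ring(delegates, chord=2):
    """Ring over the delegates plus a chord of assumed length `chord` per node."""
    k = len(delegates)
    edges = set()
    for i in range(k):
        for step in (1, chord):
            a, b = delegates[i], delegates[(i + step) % k]
            if a != b:
                edges.add(frozenset((a, b)))
    return edges


def connected_without(nodes, edges, failed):
    """Check whether the surviving delegates stay connected if `failed` dies."""
    alive = [n for n in nodes if n != failed]
    if not alive:
        return True
    adj = {n: set() for n in alive}
    for edge in edges:
        if failed not in edge:
            a, b = tuple(edge)
            adj[a].add(b)
            adj[b].add(a)
    seen, stack = {alive[0]}, [alive[0]]
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(alive)


delegates = ["N1", "N2", "N3", "N4", "N5", "N6"]
chain_edges = linear_chain(delegates)
ring_edges = chordal_ring(delegates)
```

With six delegates the chain uses 5 edges and this chordal ring uses 12. Losing N3 splits the chain, whereas the chordal ring stays connected after any single delegate failure.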
16
Partition node reduction (3)
• Remove edges to limit node degree
- the new edges added to reduce a partition node cannot be removed;
- remove the edge whose corresponding node has the highest load factor.
• Total cost of partition node detection and reduction
- n: total number of nodes, t: TTL, c: average node degree
- total cost is $\min(O(nc^t),\, O(cn^2))$
17
Performance evaluation (1)
• Partition nodes’ significance to overlay topology.
18
Performance evaluation (2)
• Effectiveness of our method
19
Performance evaluation (3)
• Fault tolerance improvement
20
The End
Thanks!