SplitStream: High-Bandwidth Multicast in Cooperative Environments
Marco Barreno
Peer-to-peer systems
9/22/2003
Background
Tree-based multicast
High demand on few internal nodes
Cooperative environments
Peers contribute resources
We don't assume dedicated infrastructure
Different peers may have different limitations
Goals of SplitStream
Balance load over peers
Accommodate different limitations
Each node has a desired indegree and a forwarding capacity (max outdegree)
Be robust to failures
The SplitStream approach
Split data into stripes, each over its own tree
Each node is internal to only one tree
Built on Pastry and Scribe
Recall that Pastry uses prefix routing
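As a reminder of what prefix routing means, here is a minimal sketch. It assumes IDs are short hex strings and that a node simply picks, from a list of nodes it knows, one whose ID shares a longer prefix with the key than its own ID does. The names shared_prefix_len and next_hop are made up for this illustration and are not Pastry's API; real Pastry keeps a routing table and a leaf set, which this sketch leaves out.

```python
def shared_prefix_len(a, b):
    """Number of leading digits that two ID strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(current, key, known):
    """Pick a known node that is strictly closer to key in prefix terms.

    Returns None when no known node improves on the current node.
    (Hypothetical helper: a flat list stands in for the routing table.)
    """
    have = shared_prefix_len(current, key)
    better = [n for n in known if shared_prefix_len(n, key) > have]
    return max(better, key=lambda n: shared_prefix_len(n, key), default=None)
```

Each hop lengthens the prefix shared with the key, which is why routes are short and why nodes near the destination tend to have IDs that resemble the key.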
Scribe background
Built on top of Pastry
Any Scribe node may create a group
Other nodes may join group or send multicast
Node with nodeId numerically closest to groupId is the rendezvous point
Root of multicast tree for the group
Joins handled locally
But it's only a single tree
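The rendezvous-point rule can be sketched in a few lines. This assumes a tiny 16-bit ID space for readability (Pastry IDs are 128-bit) and assumes that "numerically closest" is measured around a circular ID space; rendezvous_point and ring_distance are names invented for this sketch.

```python
ID_BITS = 16                # assumption: small ID space, for illustration only
ID_SPACE = 1 << ID_BITS


def ring_distance(a, b):
    """Distance between two IDs on a circular ID space (an assumption here)."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def rendezvous_point(group_id, node_ids):
    """Return the node ID numerically closest to group_id.

    Ties are broken by the smaller node ID, an arbitrary choice for this sketch.
    """
    return min(node_ids, key=lambda n: (ring_distance(n, group_id), n))
```

For example, with nodes 0x1000, 0x7ff0 and 0x9000, the group 0x8000 is rooted at 0x7ff0.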
Stripes
SplitStream divides data into stripes
Each stripe uses one Scribe multicast tree
Prefix routing ensures that each node is internal to only one tree
Inbound bandwidth: can achieve desired indegree while this property holds
Outbound bandwidth: this is harder—we'll have to look at the node join algorithm to see how this works
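Why prefix routing gives this property: SplitStream picks stripe IDs that differ in their first digit, and the interior nodes of a Scribe tree share a prefix with the group ID. The sketch below assumes 4-digit hex IDs and 16 stripes; stripe_ids and interior_stripe are hypothetical names used only here.

```python
def stripe_ids(digits="0123456789abcdef", id_len=4):
    """One stripe ID per possible first digit, e.g. '0000', '1000', ..., 'f000'."""
    return [d + "0" * (id_len - 1) for d in digits]


def interior_stripe(node_id, stripes):
    """Return the only stripe whose tree this node can be interior to.

    A node can be interior only where its ID shares the first digit with the
    stripe ID; in every other stripe tree it is a leaf.
    """
    for s in stripes:
        if s[0] == node_id[0]:
            return s
    return None
```

Node a3f1 would be interior only in the tree for stripe a000 and a leaf in the other fifteen trees.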
Respecting forwarding capacity
The tree structure described may not respect maximum capacities
Scribe's push-down fails to resolve the problem because a leaf node in one tree may have children in another tree
Compare this to Overcast
Overcast also creates an overlay to spread multicast work around, but...
Overcast is single-source, while SplitStream is multi-source
Overcast uses a single tree, while SplitStream uses multiple trees
Overcast is designed to maximize bandwidth between root and leaves, while SplitStream is designed to spread load evenly to all nodes (including leaves)
Parent location algorithm
1 Node adopts prospective child
2 If too many children, choose one to reject:
i. First, look for one in stripe without shared prefix
ii. Otherwise, select node with shortest prefix match
3 Orphan locates new parent in up to two steps:
i. Tries former siblings with stripe prefix match
I. Adopts or rejects using same criteria; continue push-down
ii. Use the spare capacity group
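Step 2 above (choosing which child to reject) can be sketched as follows. This is an illustration under assumptions, not the authors' code: IDs are short hex strings, Child and choose_reject are invented names, and the tie-breaking (prefer the prospective child, otherwise the first candidate) is a choice made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Child:
    node_id: str    # ID of the child node
    stripe_id: str  # stripe the child receives from us


def shared_prefix_len(a, b):
    """Number of leading digits that two ID strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def choose_reject(local_id, capacity, children, prospective):
    """Adopt the prospective child, then return the child to orphan (or None)."""
    candidates = list(children) + [prospective]
    if len(candidates) <= capacity:
        return None  # still within forwarding capacity, nobody is rejected
    # Rule i: children in stripes that share no prefix with our own ID.
    foreign = [c for c in candidates
               if shared_prefix_len(c.stripe_id, local_id) == 0]
    if foreign:
        return prospective if prospective in foreign else foreign[0]
    # Rule ii: the child whose ID has the shortest prefix match with its stripe.
    return min(candidates,
               key=lambda c: shared_prefix_len(c.node_id, c.stripe_id))
```

The rejected child becomes the orphan of step 3 and repeats the process at its former siblings.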
The spare capacity group
If orphan hasn't found parent yet, anycasts to spare capacity group
Group contains all SplitStream nodes with fewer children than their forwarding capacity
Anycast returns nearby node, which starts a DFS of the spare capacity group tree, sending first to a child...
Spare capacity group (cont.)
At each node in the search:
If node has no children left to search, check whether it receives a stripe the orphan seeks
If so, verify that the orphan is not an ancestor (which would create a cycle)
If both tests succeed, the node adopts the orphan
May leave spare capacity group
If either test fails, back up to parent (more DFS...)
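The per-node checks above amount to a depth-first search that visits children before testing the node itself. A minimal sketch, assuming each group member knows which stripes it receives and who its ancestors are in each stripe tree; GroupNode and find_parent are hypothetical names. The sketch searches only the subtree below the starting node; continuing the search from the starting node's parent is left out.

```python
from dataclasses import dataclass, field


@dataclass
class GroupNode:
    name: str
    stripes: set      # stripes this node receives
    ancestors: dict   # stripe -> set of ancestor names in that stripe tree
    children: list = field(default_factory=list)  # children in the group tree


def find_parent(node, orphan, wanted):
    """Return (parent_name, stripe) for the orphan, or None if the search fails."""
    # Search the children first (depth-first).
    for child in node.children:
        found = find_parent(child, orphan, wanted)
        if found is not None:
            return found
    # No children left to search: apply both tests at this node.
    for stripe in sorted(wanted):
        receives = stripe in node.stripes
        makes_cycle = orphan in node.ancestors.get(stripe, set())
        if receives and not makes_cycle:
            return node.name, stripe
    return None  # either test failed: back up to the caller (the parent)
```

A None result at the top of the search corresponds to the anycast failing.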
A spare capacity example
Consequences
Parent is likely to be physically near orphan due to locality of Pastry and Scribe
However, it is possible for the parent already to be an internal node for another stripe
If this parent fails it will bring down two stripes
Anycast can still fail
Adding the orphan may cause a cycle (fixable)
No node with spare capacity provides stripe sought
Declare failure and notify the application
Correctness and complexity
Big assumptions:
All nodes join at the same time and communication is reliable
Nodes do not leave the system either voluntarily or due to failures
SplitStream can deal with violations of either, but problems may arise that prevent the forest from being constructed
Simulation shows this isn't problematic in practice
Correctness and complexity (2)
A fairly lengthy analysis reveals this rough upper bound on the probability that the algorithm fails to build a feasible forest:
But when the desired indegree of all nodes equals the total number of stripes, the algorithm never fails
Correctness and complexity (3)
Expected state maintained by each node is O(log|N|)
Expected number of messages to build forest is O(|N| log|N|) if trees are well balanced and O(|N|^2) in the worst case
Trees should be well balanced if each node forwards its own stripe to two other nodes
Experiments
Conclusions
So what are the major points? =)