Gossip Protocols
from Epidemic Algorithms for Replicated Database Maintenance, Alan Demers et al.
Russell Greenspan
CS523
Spring, 2006
Gossip Protocols
Useful for routing information through an unreliable network
Great for replicating database updates from one site to many
Introduce randomness leading to “probabilistic” reliability
Epidemic Algorithms for Replicated Database Maintenance, Alan Demers et al.
  Gossip protocols in replicated domain name service on Clearinghouse Servers for Xerox Corporate Internet
    Direct Mail
    Anti-Entropy
    Rumor Mongering
Demers, A., Greene, D., Hauser, C., Irish, W., Larson, J., Shenker, S., Sturgis, H., Swinehart, D., and Terry, D. 1987. Epidemic algorithms for replicated database maintenance. In Proceedings of the Sixth Annual ACM Symposium on Principles of Distributed Computing (Vancouver, British Columbia, Canada, August 10 - 12, 1987). F. B. Schneider, Ed. PODC '87. ACM Press, New York, NY, 1-12. DOI= http://doi.acm.org/10.1145/41840.41841
Direct Mail (What not to do!)
Timestamp each operation
Notify all sites of every operation when it occurs
Sending pseudocode:
  foreach (site s in Sites)
    send(to: s, item: i);
Receiving pseudocode:
  if (localItem.timeStamp < i.timeStamp) {
    localItem.value = i.value;
    localItem.timeStamp = i.timeStamp;  // keep the timestamp with the value
  }
Problems
  No guarantee of message delivery
    Unreachable destinations
    Incorrect destination list
  Generates n messages (1 per site s in Sites)
  Each message traverses the entire network between source and destination
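The sending and receiving pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the Clearinghouse implementation: the `Site` and `Item` classes and the `reachable` flag (which stands in for an undeliverable message) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Item:
    value: str
    timestamp: int


class Site:
    def __init__(self, name):
        self.name = name
        self.item = Item(value="", timestamp=0)
        self.reachable = True  # hypothetical flag: False models a lost message

    def receive(self, item):
        # Accept the update only if it is newer than the local copy.
        if self.item.timestamp < item.timestamp:
            self.item = Item(item.value, item.timestamp)


def direct_mail(sites, item):
    """Mail the item to every site; return the number of messages sent."""
    sent = 0
    for s in sites:
        sent += 1
        if s.reachable:  # an undelivered message is never retried
            s.receive(item)
    return sent


sites = [Site(f"s{i}") for i in range(5)]
sites[3].reachable = False
messages = direct_mail(sites, Item("new-address", timestamp=1))
stale = [s.name for s in sites if s.item.timestamp < 1]
```

Here `messages` counts one message per site, and `stale` lists the sites that silently missed the update, which is the failure mode the Problems list describes.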
Anti-Entropy
Based on epidemic theory
  Single infected site eventually infects entire population of susceptible sites
  In database replication, infected site is the one with the latest update, susceptible sites are those needing the update
  Total time to update population is proportional to log of population size
Periodically sync entire copy of DB with one or more remote copies
Pseudocode:
  forsome (site s in Sites)
    resolveDifferences(localDB, s);
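To get a feel for the logarithmic spreading time, the sketch below simulates rounds in which every site syncs with one uniformly chosen partner. It assumes synchronous rounds, a single update, and a two-way sync that leaves both partners up to date; the function name and parameters are illustrative only.

```python
import random


def anti_entropy_rounds(n, rng):
    """Rounds until all n sites hold the update, starting from one infected site."""
    infected = [False] * n
    infected[0] = True
    rounds = 0
    while not all(infected):
        rounds += 1
        before = infected[:]  # decisions use the state at the start of the round
        for s in range(n):
            # Pick a partner other than s, uniformly at random.
            partner = rng.randrange(n - 1)
            if partner >= s:
                partner += 1
            # If either side had the update, both have it after the sync.
            if before[s] or before[partner]:
                infected[s] = True
                infected[partner] = True
    return rounds


rounds = anti_entropy_rounds(1024, random.Random(0))
```

Running this for several values of `n` and comparing `rounds` against `math.log2(n)` is a quick way to see the growth rate the slide describes.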
Anti-Entropy: Resolving Differences
Push
  If local item is more recent, tell remote copy to update
  Site susceptible at time t remains susceptible at time t + 1 if no infected site contacted it at time t + 1
  Good when most sites need the update
Pull
  Ask remote copy if its copy is more recent; if so, update local copy
  Site susceptible at time t remains susceptible at time t + 1 if it did not contact an infected site at time t + 1
  Good when most sites are already up-to-date
Push-Pull
  Do both:
    if (localItem.timeStamp < i.timeStamp) {         // pull
      localItem.value = i.value;
      localItem.timeStamp = i.timeStamp;
    } else if (localItem.timeStamp > i.timeStamp) {  // push
      i.value = localItem.value;
      i.timeStamp = localItem.timeStamp;
    }
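The three variants can be written as one function over whole replicas. This is a sketch under the assumption that each replica is a dictionary mapping a key to a `(value, timestamp)` pair; the `mode` parameter is a name invented for the example.

```python
def resolve_differences(local, remote, mode="push-pull"):
    """Reconcile two replicas in place.

    Each replica maps key -> (value, timestamp).
    mode is "push", "pull", or "push-pull".
    """
    if mode not in ("push", "pull", "push-pull"):
        raise ValueError(f"unknown mode: {mode}")
    for key in set(local) | set(remote):
        l = local.get(key)
        r = remote.get(key)
        # A missing entry is treated as older than any real timestamp.
        l_ts = l[1] if l is not None else float("-inf")
        r_ts = r[1] if r is not None else float("-inf")
        if l_ts < r_ts and mode in ("pull", "push-pull"):
            local[key] = r   # pull: remote copy is newer
        elif l_ts > r_ts and mode in ("push", "push-pull"):
            remote[key] = l  # push: local copy is newer
```

With `mode="pull"` only `local` is ever modified, with `mode="push"` only `remote` is, and with `"push-pull"` both replicas end up identical.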
Anti-Entropy: Performance Enhancements
Compare DB checksums (recomputed after each update) before exchanging full copy of the DB
  Only exchange DB if checksums differ
  Need to use time window that allows updates to propagate before checksums are compared
Keep inverted index by timestamp and only exchange recent updates
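One way to combine the two enhancements is sketched below: compare checksums, exchange only recent updates if they differ, and fall back to a full exchange if the checksums still disagree. The replica layout (key -> `(value, timestamp)`), the SHA-256 checksum, and all function names are assumptions for illustration; `recent_updates` is a linear scan standing in for the inverted index.

```python
import hashlib


def checksum(db):
    """Digest of a replica (key -> (value, timestamp)), independent of insertion order."""
    h = hashlib.sha256()
    for key in sorted(db):
        value, ts = db[key]
        h.update(repr((key, value, ts)).encode("utf-8"))
    return h.hexdigest()


def recent_updates(db, now, window):
    """Entries updated within `window` time units of `now`."""
    return {k: v for k, v in db.items() if now - v[1] <= window}


def merge(dst, updates):
    """Copy entries into dst when they are missing there or newer."""
    for k, (value, ts) in updates.items():
        if k not in dst or dst[k][1] < ts:
            dst[k] = (value, ts)


def sync(a, b, now, window):
    """Bring a and b into agreement; report which path was needed."""
    if checksum(a) == checksum(b):
        return "none"
    ra, rb = recent_updates(a, now, window), recent_updates(b, now, window)
    merge(a, rb)
    merge(b, ra)
    if checksum(a) == checksum(b):
        return "recent"
    # Checksums still differ: fall back to exchanging everything.
    merge(a, dict(b))
    merge(b, dict(a))
    return "full"
```

The `window` argument plays the role of the time window on the slide: it should be long enough for a typical update to reach both sites, otherwise most syncs take the expensive "full" path.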
Rumor Mongering
Like a game of network-telephone, infected site s1 shares update with susceptible site s2
s2 then becomes infected and spreads the update to n other susceptible sites (exponential growth)
At a certain point, infected site realizes that rumor has spread sufficiently and stops sharing it via:
  Feedback-based probability (stop with probability 1/k if the contacted site was already infected)
  Blind probability (always stop with probability 1/k)
  Fixed count (stop after k sites report they are already infected)
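The three stopping rules can be compared with a small simulation. The sketch below assumes push-only spreading, uniform partner selection, and rounds in which every active site contacts one partner; the rule names `"feedback"`, `"blind"`, and `"counter"` are labels chosen for the example.

```python
import random


def rumor_mongering(n, k, rule, rng):
    """Spread one rumor by push; return the fraction of sites that never hear it.

    rule: "feedback" -> stop with probability 1/k after contacting an infected site
          "blind"    -> stop with probability 1/k after every contact
          "counter"  -> stop after k contacts with already-infected sites
    """
    if rule not in ("feedback", "blind", "counter"):
        raise ValueError(f"unknown rule: {rule}")
    knows = [False] * n
    knows[0] = True
    hot = {0}           # sites still actively spreading the rumor
    misses = [0] * n    # unnecessary contacts per site (used by "counter")
    while hot:
        for s in list(hot):
            target = rng.randrange(n - 1)
            if target >= s:
                target += 1  # never contact yourself
            already = knows[target]
            if not already:
                knows[target] = True
                hot.add(target)  # starts spreading in the next round
            if rule == "feedback":
                stop = already and rng.random() < 1 / k
            elif rule == "blind":
                stop = rng.random() < 1 / k
            else:
                misses[s] += already
                stop = misses[s] >= k
            if stop:
                hot.discard(s)
    return knows.count(False) / n


residue = {
    rule: rumor_mongering(1000, 3, rule, random.Random(42))
    for rule in ("feedback", "blind", "counter")
}
```

The returned fraction is the share of sites left without the update when the rumor dies out, so increasing `k` trades extra traffic for a smaller leftover population.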
Rumor Mongering: Connection Limit
Consider a limit on the number of rumor requests a server can process at once
Effects of Push and Pull methods
  On mostly-stale database, Push excels since if two sites try to push to the same recipient, one is rejected; as epidemic spreads, this has the effect of canceling unnecessary update network traffic
  With Pull method, connection limit introduces slight chance that a site might miss an update
Replication Strategy Recap
Direct mail creates high volume of network traffic
Anti-Entropy is only practical as a backup to some other mechanism since it can be so expensive
Rumor Mongering introduces nonzero probability of failure
So... use Rumor Mongering with infrequent Anti-Entropy backup
Deleted Data
Need to use “Death Certificate” to notify sites of deleted data
Similar to imagining a “Deleted” flag on every piece of data, with delete operations updating this flag instead of actually deleting data
Compare timestamp of Death Certificate in same way update timestamps are compared
Store Death Certificates:
  For a fixed period of time
  In “dormant” state at small number of sites and activate it when site s1 realizes site s2 has not seen it; keep “activation timestamp” to determine dormancy
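A death certificate can be modeled as an ordinary timestamped entry whose value is empty. The sketch below covers only the fixed-retention option and does not model dormant certificates; the `Entry` class and the function names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    value: Optional[str]  # None marks a death certificate
    timestamp: int

    @property
    def is_death_certificate(self):
        return self.value is None


def apply(db, key, incoming):
    """Merge one entry; deletes and updates use the same newest-wins rule."""
    current = db.get(key)
    if current is None or current.timestamp < incoming.timestamp:
        db[key] = incoming


def delete(db, key, now):
    apply(db, key, Entry(None, now))


def lookup(db, key):
    entry = db.get(key)
    if entry is None or entry.is_death_certificate:
        return None
    return entry.value


def expire_certificates(db, now, retention):
    """Discard death certificates older than the fixed retention period."""
    expired = [k for k, e in db.items()
               if e.is_death_certificate and now - e.timestamp > retention]
    for k in expired:
        del db[k]
```

While the certificate is held, a stale copy of the item arriving from another site is rejected by `apply`. Once `expire_certificates` has discarded it, the same stale copy would be accepted again, which is the motivation for keeping dormant certificates at a few sites.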
Network Node Distribution
Consider extremes
  Only nearest neighbors
    Traffic per link per cycle = O(1)
    Cycles to spread update = O(n)
  Random
    Traffic per link per cycle = O(n)
    Cycles to spread update = O(log n)
Probability of choosing a partner at distance d proportional to d^-2
  Traffic per link per cycle = O(log n)
  Cycles to spread update = O(log n)
  Great... but achieving d^-2 in real-world network not immediately obvious; suggests rectilinear meshes of sites
Network Node Distribution: Achieving Rectilinear Grids
Xerox’s topology contains hundreds of nodes in US and tens of nodes in Europe connected by a pair of transatlantic links
Each site keeps list of other sites sorted by distance and chooses nearer recipients with greater probability
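A simple way to favor nearby partners is to weight each candidate by d^-a and sample from the normalized weights. This is a sketch of that idea only, assuming every site knows a positive distance to each other site; it is not the exact selection rule used on the Xerox network, and the function names and the `distances` mapping are invented for the example.

```python
import random


def partner_weights(distances, a=2.0):
    """Probabilities proportional to d**-a for each candidate site (d > 0)."""
    raw = {site: d ** -a for site, d in distances.items()}
    total = sum(raw.values())
    return {site: w / total for site, w in raw.items()}


def choose_partner(distances, rng, a=2.0):
    """Pick one partner, favoring sites at small distance."""
    weights = partner_weights(distances, a)
    sites = list(weights)
    return rng.choices(sites, weights=[weights[s] for s in sites], k=1)[0]


distances = {"near": 1, "mid": 2, "far": 10}
weights = partner_weights(distances)
partner = choose_partner(distances, random.Random(0))
```

Setting `a=0` recovers uniform selection, and larger values of `a` push the choice toward nearest neighbors, so the exponent moves between the two extremes listed earlier.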
Network Node Distribution: Performance by Distribution Factor
Method and Distribution            | Connection Limit | Average Convergence Time (after 250 runs) | Compare Traffic: Average | Compare Traffic: Transatlantic Link | Update Traffic: Average | Update Traffic: Transatlantic Link
Anti-Entropy, Uniform              | None             | 5.27                                      | 5.87                     | 75.74                               | 5.85                    | 74.43
Anti-Entropy, Spatial              | None             | 7.76                                      | 1.36                     | 2.38                                | 1.89                    | 5.87
Anti-Entropy, Uniform              | 1                | 6.97                                      | 3.71                     | 47.54                               | 5.83                    | 75.17
Anti-Entropy, Spatial              | 1                | 14.14                                     | 0.72                     | 0.94                                | 1.94                    | 4.85
Push-pull Rumor Mongering, Uniform | n/a              | 5.32                                      | 8.87                     | 114.0                               | 5.84                    | 75.87
Push-pull Rumor Mongering, Spatial | n/a              | 7.74                                      | 1.99                     | 3.44                                | 1.90                    | 5.94
Network Node Distribution: Push and Pull
Push alone and Pull alone extremely sensitive to network topology
Consider a network with two isolated sites s1 and s2
  With Push, if the update is introduced at s1 or s2 and these two sites continuously select each other as partners, the update will not propagate
  With Pull, if the update is introduced in the main part of the network, high likelihood that the update is no longer hot when s1 or s2 finally go to pull
References
Demers, A., Greene, D., Hauser, C., Irish, W., Larson, J., Shenker, S., Sturgis, H., Swinehart, D., and Terry, D. 1987. Epidemic algorithms for replicated database maintenance. In Proceedings of the Sixth Annual ACM Symposium on Principles of Distributed Computing (Vancouver, British Columbia, Canada, August 10 - 12, 1987). F. B. Schneider, Ed. PODC '87. ACM Press, New York, NY, 1-12. DOI= http://doi.acm.org/10.1145/41840.41841
M.-J. Lin and K. Marzullo. Directional gossip: gossip in a wide area network. To be published in Proceedings of the Third European Dependable Computing Conference (Springer-Verlag LNCS).
D. Agrawal, A. El Abbadi, and R.C. Steinke, "Epidemic Algorithms in Replicated Databases," Proc. 16th Symp. Principles of Database Systems, pp. 161-172, May 1997.
Chandra, R., Ramasubramanian, V., Birman, K.P. Anonymous Gossip: Improving multicast reliability in mobile ad-hoc networks. International Conference on Distributed Computing Systems (2001) 275-283.