4
REACHING AGREEMENT A Fundamental Task— Even in Distributed Computer Systems by Fred B. Schneider, Ozalp Babaoglu, Kenneth P. Birman, and Sam Toueg 19 Coordinating computers can be as difficult as coordinating the actions of people. The problem arises in working with a distributed computing system, which consists of a collection of computers interconnected by communication channels. The computers are usually physically separated, and therefore if they fail, they do so independently. This makes it possible for such a system to be fault-tolerant, for data and tasks can be replicated so that if one computer fails, the others can assume its work. Unfortunately, coordinating the actions of a collection of computers when failures must be tolerated can be dif- ficult, if not (provably) impossible. A provably impossible form of co- ordination is illustrated in the metaphor recounted on the facing page. In the Coordinated Attack Problem, each of the generals can be thought of as a computer and the unreliable messenger as an imperfect communications chan- nel. In terms of the metaphor, we prove that if the communications channel connecting two fault-tolerant computers can lose messages, it is impossible for the computers to agree on whether or not to perform an action suggested by one of them. Is the Coordinated Attack Problem of practical interest? Unfortunately, it is. If information is to be replicated at computers so that it can remain avail- able despite possible failure of some of those computers, then some arrange- ment must be made for the replicated data to be kept consistent. This man- dates that when one copy of the data is updated, all available copies be updated. Performing the update is an instance of the Coordinated Attack Problem: either all or none of the available copies must be updated. ATTACKING THE BYZANTINE GENERALS PROBLEM A second example, the Byzantine Generals Problem, demonstrates an- other practical problem in distributed computing systems. This problem differs from the Coordinated Attack Problem in two ways. First, Byzantine generals can be traitors and exhibit arbitrary and malicious behavior; in the Coordinated THE BYZANTINE GENERALS PROBLEM The generals of the Byzantine army are preparing a final campaign. There are N generals, of whom /, at most, are traitors. The traitorous generals are not known to the others, but may collude and attempt to foil a coordinated attack by the rest. As long as the armies controlled by the nontraitorous generals all attack or all retreat, the campaign will be successful. The Commanding General, known to all the others, decides whether the army should attack. Generals communicate by means of a reliable messenger service. A protocol is needed that will achieve Byzantine Agreement: Agreement. All nontraitorous generals execute the same action. Validity. If the Commanding General is not a traitor, then all nontraitorous generals execute her command.

REACHING AGREEMENT - cs.cornell.edu · REACHING AGREEMENT A Fundamental Task— ... Generals communicate by means of a reliable messenger service. A protocol is needed that will achieve

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: REACHING AGREEMENT - cs.cornell.edu · REACHING AGREEMENT A Fundamental Task— ... Generals communicate by means of a reliable messenger service. A protocol is needed that will achieve

REACHING AGREEMENTA Fundamental Task—Even in Distributed Computer Systems

by Fred B. Schneider, Ozalp Babaoglu,Kenneth P. Birman, and Sam Toueg

19

Coordinating computers can be asdifficult as coordinating the actions ofpeople.

The problem arises in working with adistributed computing system, whichconsists of a collection of computersinterconnected by communicationchannels. The computers are usuallyphysically separated, and therefore ifthey fail, they do so independently. Thismakes it possible for such a system tobe fault-tolerant, for data and tasks canbe replicated so that if one computerfails, the others can assume its work.Unfortunately, coordinating the actionsof a collection of computers whenfailures must be tolerated can be dif-ficult, if not (provably) impossible.

A provably impossible form of co-ordination is illustrated in the metaphorrecounted on the facing page. In theCoordinated Attack Problem, each ofthe generals can be thought of as acomputer and the unreliable messengeras an imperfect communications chan-nel. In terms of the metaphor, we provethat if the communications channelconnecting two fault-tolerant computerscan lose messages, it is impossible for

the computers to agree on whether ornot to perform an action suggested byone of them.

Is the Coordinated Attack Problemof practical interest? Unfortunately, itis. If information is to be replicated atcomputers so that it can remain avail-able despite possible failure of some ofthose computers, then some arrange-ment must be made for the replicateddata to be kept consistent. This man-dates that when one copy of the data isupdated, all available copies be updated.Performing the update is an instance of

the Coordinated Attack Problem: eitherall or none of the available copies mustbe updated.

ATTACKING THE BYZANTINEGENERALS PROBLEMA second example, the ByzantineGenerals Problem, demonstrates an-other practical problem in distributedcomputing systems. This problem differsfrom the Coordinated Attack Problemin two ways. First, Byzantine generalscan be traitors and exhibit arbitrary andmalicious behavior; in the Coordinated

THE BYZANTINE GENERALS PROBLEMThe generals of the Byzantine army are preparing a final campaign. There are Ngenerals, of whom /, at most, are traitors. The traitorous generals are not known tothe others, but may collude and attempt to foil a coordinated attack by the rest. Aslong as the armies controlled by the nontraitorous generals all attack or all retreat,the campaign will be successful.

The Commanding General, known to all the others, decides whether the armyshould attack. Generals communicate by means of a reliable messenger service. Aprotocol is needed that will achieve Byzantine Agreement:

Agreement. All nontraitorous generals execute the same action.Validity. If the Commanding General is not a traitor, then all nontraitorous

generals execute her command.

Page 2: REACHING AGREEMENT - cs.cornell.edu · REACHING AGREEMENT A Fundamental Task— ... Generals communicate by means of a reliable messenger service. A protocol is needed that will achieve

"The generals correspond to processors,traitors... to faulty processors, and

the Commanding General's order...to the valueof the sensor being read. "

Attack Problem, generals always ex-hibited correct behavior. Second,Byzantine generals have access toreliable communication; in the Co-ordinated Attack Problem, communi-cation was not reliable.

The Byzantine Generals Problemmust be solved whenever processing isreplicated in order to counteract theeffects of failures in a computing system.This is the basis for triple-modular-redundancy (TMR), which is used bythe computing system on board thespace shuttle, as well as in otherapplications in which fault-tolerance isrequired. The idea is simple. If allprocessors read the same input fromsensors and process it using the sameprogram, then all non-faulty processorswill produce identical results. Thus, aslong as a majority of the processors arenon-faulty, simply voting on the outputsproduced by the processors will producethe correct action. A key premise,however, is that all non-faulty pro-cessors read the same input. Simplyreading from sensors does not ensurethat all correct processors will agree onthese inputs—a faulty sensor might

furnish different values to differentprocessors.

A protocol is required that willpermit the non-faulty processors toagree on the value of the sensor. Atrivial solution to this problem wouldbe for processors always to use the samevalue, say 0. This is clearly unaccep-table. An agreement protocol must alsoensure that if the sensor is non-faulty,processors actually use its value in theircomputation.

A solution to the Byzantine GeneralsProblem is exactly the needed protocol.The generals correspond to processors,traitors correspond to faulty processors,and the Commanding General's ordercorresponds to the value of the sensorbeing read.

PROCEEDINGBEYOND BYZANTIUMResearch aimed at developing andunderstanding the Coordinated AttackProblem and the Byzantine GeneralsProblem is actively being pursued atCornell, M.I.T., and IBM's researchlaboratory at San Jose. By identifyingthose aspects of the environment that

influence the cost of achieving agree-ment, researchers learn how to constructagreement protocols that are well suitedfor particular applications. For ex-ample, suppose that the communicationchannels link only pairs of processorsand that up to t processors might befaulty; in this case, achieving agreementcan take as many as t+\ rounds ofmessage exchange. This is because afaulty processor can send conflictinginformation along disjoint routes toother processors and can generatespurious messages and then pretend tobe forwarding them on behalf of otherprocessors.

One of us (Babaoglu) has recentlyshown that as few as two rounds sufficeto establish Byzantine Agreement if allprocessors share broadcast channels. Infact, there exists a continuum ofByzantine Agreement protocols thatrequire between 2 and t+l rounds,depending on the number of broadcastchannels available and the intercon-nection topology of processors. Fasteragreement protocols are possible whenmore processors share more broadcastchannels. Clearly, there are trade-offs 20

Page 3: REACHING AGREEMENT - cs.cornell.edu · REACHING AGREEMENT A Fundamental Task— ... Generals communicate by means of a reliable messenger service. A protocol is needed that will achieve

involving delay, fault-tolerance, andhardware cost. Broadcast channels helpto speed up Byzantine Agreementbecause when a faulty processor broad-casts a message on such a channel, ithas witnesses to its action. Subsequentattempts to send conflicting informationcan then be detected and ignored byother processors. The Xerox Ethernetand most other local-area communi-cation networks support the requisitebroadcast property for these protocols,making them quite practical.

Byzantine Agreement can be madepractical even when broadcast channelsare not available. It turns out that t+\rounds are necessary to reach agreementonly if t failures actually occur duringexecution of the agreement protocol. If/failures occur, where/</, agreementcan be achieved in fewer rounds. Thus,the cost of execution is proportional tothe amount of fault-tolerance actuallyneeded. Another of us (Toueg) recentlywas involved in the development of suchan early-stopping protocol. It cantolerate as many as N/3 faulty pro-cessors and terminates in 2/+3 rounds.Since most executions of the protocolare failure-free, agreement is typicallyachieved in three rounds. And when tfailures occur, the protocol is as efficientas the best previously known pro-tocols—even those that do not supportearly stopping.

INEXACT AGREEMENTSAND FAULT TOLERANCEOne can devise fast agreement protocolsthat do not achieve agreement andvalidity, but come close. For example,an Inexact Agreement (formulated bySchneider) allows processors, each of

21 which has its own value, to converge on

some value. While this would not bevery useful when the Byzantine generalsmust decide whether to attack orretreat, it is perfectly appropriate forapplications such as clock synchron-ization. Clocks on computers typicallyrun at different rates and are difficult tosynchronize because delays in messagedelivery are variable—receipt of themessage "It is 9:00", for example, tellsthe receiver very little about what timeit is now; it tells only at what time themessage was sent. On the other hand,having synchronized clocks can be quiteuseful, especially in a system intendedto be fault-tolerant, since it is easy todetect that a processor has halted bynoting that it has taken too long for anexpected reply to arrive. An InexactAgreement can form the basis for aclock synchronization protocol: period-ically, processors use Inexact Agree-ment to agree on a clock value and thenreset their local clocks accordingly.

Byzantine Agreement and other pro-tocols that underlie the construction offault-tolerant systems are complex andsubtle. The ISIS Project (underBirman's direction) is concerned withpackaging such protocols and buildingtools that a programmer can use toconvert a fault-intolerant program intoa fault-tolerant one without worryingabout the details of agreement andreplication. ISIS programmers haveaccess to a broadcast (agreement)routine that can be used to disseminateinformation to copies of programmodules. The system also providesfailure-monitoring facilities so that anISIS program can reconfigure itself inresponse to failures. A prototype ISISis now running on the computer sciencedepartment's network of DEC VAX

SELECTED READING

Babaoglu, O., and R. Drummond.1985. Streets of Byzantium: Net-work Architectures for fast reliablebroadcasts. IEEE Transactions onSoft ware Engineering S E-11 (6):546-54.

Birman, K. P., T. A. Joseph, T.Raeuchle, and A. El Abbadi. 1985.Implementing fault-tolerant distri-buted objects. IEEE Transactionson Software Engineering SE-11(6):502-08.

Mahaney, S., and F. B. Schneider.1985. Inexact agreement: Accuracy,precision, and graceful degradation.In Proceedings of 4th annualSIGACT-SIGOPS symposium onprinciples of distributed computing.In press.

Toueg, S., K. Perry, and T. K.Srikanth. 1985. Fast distributedagreement. In Proceedings of 4thannual SIGACT-SIGOPS sym-posium on principles of distributedcomputing. In press.

Page 4: REACHING AGREEMENT - cs.cornell.edu · REACHING AGREEMENT A Fundamental Task— ... Generals communicate by means of a reliable messenger service. A protocol is needed that will achieve

Schneider Babaoglu

11/780's and SUN Workstations. It hasbeen used to build a number of fault-tolerant application systems and hasperformed surprisingly well.

Reaching agreements and toleratingfaults are modes of conduct forcomputer-to-computer as well asperson-to-person communication. Atleast in the world of computers, we arefinding that logic and imagination arethe keys to implementation.

The four authors of this article are all facultymembers of Cornell's Department ofComputer Science.

Fred B. Schneider, an associate professor,studied at Cornell for his undergraduatedegree, awarded in 1975, and received hisdoctorate from the State University of NewYork at Stonybrook in 1978. His maininterests are in programming methodology,concurrency, and distributed systems; he iscurrently writing a text on concurrentprogramming.

Ozalp Babaoglu, an assistant professor,came to Cornell in 1981 from the LawrenceBerkeley Laboratories, where he was a staffscientist. He holds the B.Sc. degree fromGeorge Washington University and thePh.D., granted in 1981, from the Universityof California at Berkeley. While in graduateschool, he spent a summer at the IBMResearch Laboratory in San Jose, and asemester as a foreign scholar at the National

Research Council in Pavia, Italy. In 1982 hewas a co-recipient of the Sakrison MemorialA ward for his contributions to the BerkeleyUNIX operating system. His specialties areoperating systems, performance evaluationand modeling, and distributed systems.

Kenneth P. Birman received his under-graduate degree from Columbia Universityin 1978 and his doctorate from the Uni-versity of California at Berkeley in 1981. AtColumbia he helped develop computersystems for analyzing 24-hour electro-cardiogram recordings, and after earning hisdoctorate he spent six months in thecardiology department of the Vienna state

hospital in Austria, developing a databasesystem for clinical use. At Berkeley hedeveloped a network version of the UNIXoperating system. He joined the Cornellfaculty as an assistant professor in 1982.

Sam Toueg, an assistant professor, studiedat the Israel Institute of Technology for hisB.S. degree, awarded in 1977, and went toPrinceton University for graduate work.After receiving the Ph.D. in 1979, he was apostdoctoral fellow at IBM's T. J. WatsonResearch Center and then came to Cornellin 1981. A specialist in computer networksand distributed computing, he has been areferee for several professional journals. 22