Ch4: Can We Trust the Computer?

CSC309, Miller, 10/5/08




Pregnant!

The computer-generated diagnosis of "pregnant" on discharge papers for a visit triggered by "agonizing abdominal pain" came as a real surprise to 71-year-old John Grady Pippen.

Probably a data-entry coding error.

Associated Press 9/27/08


It Was a Hardware Failure

On June 3, 1980, two submarine-launched ballistic missiles were detected heading toward the US. Eighteen seconds later, a number of additional launches were detected.


Forgot That a Full Moon Rises

The BMEWS defense system in Greenland, built to protect the US from a sneak attack from across the North Pole, reported with 99% certainty that we were under a massive attack when its software saw its first moonrise.


Dial zero first

The general in charge of NORAD combat operations was unable to demonstrate to a group of reporters how a direct phone link to the World-Wide Military Command and Control System kept the communication lines open in times of crisis because no one had told him that he needed to dial a zero first.


War Games?

On November 9, 1979, a test tape containing simulated attack data was fed into a NORAD computer; the simulated launches were briefly treated as a real attack, and interceptors were scrambled before the mistake was recognized.


Problems for Individuals

1. The ubiquitous billing-error problem.
2. Records mixed up with someone else's.
3. Data entered wrong (same code, different meanings).
4. Computer errors making it into a credit report.
5. Having to prove that you are not dead.
6. The 101-year-old who saw a rise in insurance rates because the system figured he was under age.
7. Lots of examples of false arrest.
8. Errors in payroll.


System Failures

1. In January 1990 AT&T experienced 9 hours of voice and data disruption due to a typo in a three-line change to a two-million-line program. Fifty million phone calls didn't get through. (A hedged sketch of the widely reported cause follows this list.)

2. The Denver airport baggage system delayed the opening of the airport by more than a year, at a cost of more than $1,000,000 per day. (They had not allowed sufficient time for testing and development, and significant changes were made after the project began.)
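The widely reported trigger for the 1990 AT&T outage was a misplaced "break" statement in the switching software's recovery code. The following is a minimal hypothetical C sketch, not AT&T's actual 4ESS code: it shows only the language pitfall, where a "break" placed inside an "if" that is nested in a "switch" exits the whole switch and silently skips the bookkeeping after the "if".

    /* Hypothetical sketch, not AT&T's actual 4ESS code.  It shows only the C
     * pitfall widely reported as the trigger: a "break" placed inside an "if"
     * that is nested in a "switch" exits the whole switch, so the statements
     * after the "if" are silently skipped and the element is left in a bad
     * state the next time a message arrives. */
    #include <stdio.h>

    enum event { PEER_RECOVERED, CALL_SETUP };

    struct element {
        int peer_busy;     /* a second message arrived while the first was handled */
        int status_clean;  /* bookkeeping that later processing relies on          */
    };

    static void on_event(struct element *e, enum event ev)
    {
        switch (ev) {
        case PEER_RECOVERED:
            if (e->peer_busy) {
                e->peer_busy = 0;
                break;              /* BUG: leaves the whole switch here...        */
            }
            e->status_clean = 1;    /* ...so this update is skipped when busy      */
            break;
        case CALL_SETUP:
            /* routine traffic handling elided */
            break;
        }
    }

    int main(void)
    {
        struct element e = { .peer_busy = 1, .status_clean = 0 };
        on_event(&e, PEER_RECOVERED);
        printf("status_clean = %d (expected 1)\n", e.status_clean);
        return 0;
    }

The defense is structural rather than clever: keep case bodies short, or move them into functions, so an early exit cannot skip shared cleanup.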


System Failures (Cont.)

3. A computer error created 800,000 winning numbers in a Pepsi Cola-sponsored contest where only 18 winners were intended.

4. Japan reported the first death caused by a robot in July 1981. The worker had not taken advantage of an automatic power shut-off triggered by opening a gate. There have been 19 additional deaths reported, of which 6 have been attributed to stray electromagnetic interference.


System Failures (Cont.)

5. Between March 1986 and January 1987 the Therac-25, a machine for administering X-rays for the treatment of cancer, killed three patients and injured three more. Normal therapeutic doses are in the 100-200 rad range, but it hit patients with up to 25,000 rads. (A sketch of the overflow pattern described by investigators follows this list.)

6. The A320 Airbus was the first plane to be fully "fly-by-wire". In a 1993 crash the plane failed to detect it had landed and retracted the wheels.
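The published investigation of the Therac-25 (Leveson and Turner) attributes one of the overdoses to a one-byte flag that a set-up routine incremented on every pass instead of simply setting; every 256th pass the flag wrapped to zero and the position check it gated was skipped. The sketch below illustrates only that overflow pattern; the names and structure are invented, not the machine's actual software.

    /* A minimal sketch of the counter-overflow pattern described by the
     * Therac-25 investigators (Leveson and Turner): a one-byte flag was
     * incremented rather than set, so every 256th pass it wrapped to zero and
     * the position check it was supposed to trigger was skipped.  The names
     * and structure here are invented for illustration. */
    #include <stdint.h>
    #include <stdio.h>

    static uint8_t class3;               /* one-byte shared flag */

    static int position_check_performed(void)
    {
        class3++;                        /* BUG: incremented instead of set to 1    */
        return class3 != 0;              /* zero (every 256th pass) skips the check */
    }

    int main(void)
    {
        for (long pass = 1; pass <= 512; pass++)
            if (!position_check_performed())
                printf("pass %ld: safety check skipped (flag wrapped to 0)\n", pass);
        return 0;
    }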


Computer Failures Are Probed in Jet Crash

Wall Street Journal (06/27/09) P. A1; Pasztor, Andy; Michaels, Daniel

Aviation investigators looking for a cause of the crash of Air France Flight 447 believe that a rapid chain of computer and equipment failures may have stripped the flight crew of the airplane's automation technology, which pilots generally rely on to control large jets. Regardless of the final findings, the crash is already prompting some flight safety experts to question whether pilots are trained enough to handle widespread flight-computer failures.


Metrorail Crash May Exemplify Automation Paradox

Washington Post (06/29/09) P. A9; Vedantam, Shankar

The fatal collision of two trains on Washington, D.C., Metro's Red Line may come to symbolize the core problem of automation: the relationship between humans and their automated control systems. The more reliable such systems become, the greater the likelihood that the supervising humans will lose focus, which makes it increasingly probable that unanticipated variables will tangle up the algorithm and lead to disaster.


Bhopal 3 DEC 84

The Union Carbide plant produced methyl isocyanate, an intermediate product in the production of pesticides, which was both volatile and toxic (considered 100 times more dangerous than cyanide). Methyl isocyanate boils at about 100F, but the air conditioning at the plant had been shut off because it was winter. It also reacts with water, releasing heat. When roughly 1,000 gallons of water were added accidentally, the temperature rose and the reaction rate increased until the pressure buildup blew the release valves on the tank.


Bhopal 3 DEC 84 (Cont.)

At 1:00 in the morning a cloud of deadly gas moved toward the shanty town that had sprung up around the plant and then on into the city. Three to four thousand people were killed immediately, and years later it was estimated that there were still two deaths per day among the 400,000 injured in the accident.


Setting the Stage

1. Union Carbide was in downsizing mode because of competition from DuPont and others.

2. Partly because of the Indian government's 1977 push for the use of more insecticides, the company started to convert the plant to the manufacture of the volatile methyl isocyanate.

3. The plant was not moved to a safer area, in part because of the loss it would have meant for the local politicians who had brought the plant in.


Setting the Stage (Cont.)

4. The market dried up because of overproduction, and this resulted in cutbacks in operators, supervisors, and maintenance.

5. A weak union complained and pointed out problems (but it was a weak union).

6. Workers died and newspaper articles pointed out the potential for disaster.


Setting the Stage (Cont.)

7. A three-member safety team from the main office in Connecticut identified all the problems that later led to the disaster but nothing was done.

8. There were six buses allocated to evacuate residents in case of an accident, and only one of them did not have a flat tire.


How did Bhopal happen?

1. Operators were given little or no training concerning safety and health hazards.

2. All signs concerning operating and safety procedures were in English while many operators spoke only Hindi.

3. There were frequent management changes, with little or no training for the new managers. Management at the time of the accident had transferred in from a battery plant.


How did Bhopal happen? (Cont.)

4. The pressure gauge that would have alerted control room operators that there was a problem was not readable from the control room.

5. Gauges in the control room were considered unreliable.

6. The critical gauge was reading low (20 psi was reported as 2 psi) and the problem was identified only when the operators were gassed.


How did Bhopal happen? (Cont.)

7. Since operators didn't have gas masks they had to flee instead of staying and helping solve the problem.

8. Because of the late identification of the problem the major safety device was activated too late.

9. The flare tower, which when ignited would have burned off the escaping gas, was not operational.


How did Bhopal happen? (Cont.)

10. There was not much of a plan for alerting the city (an alarm was sounded a couple of hours after the accident) or for treating victims (doctors took four hours to arrive).


Military Problems

1. President Bush (2/15/91): "Patriot is forty-one for forty-two." Later the estimate was twenty-four for eighty-five, then 10%, and finally maybe we didn't hit any. A software bug was discovered that could have caused the missile to turn back and dive into the ground. The Patriot was designed for aircraft intercept.

Patriot clock drift over a 100-hour period resulted in a 678-meter tracking error. Problems included a design assuming 14-hour missions, clock precision, operational conditions, inadequate risk analysis, and the use of both a 24-bit and a 48-bit representation of 0.1. Final totals were 29 dead and 97 injured.
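The arithmetic behind the clock drift is documented in the GAO report on the Dhahran failure and is easy to reproduce. The sketch below is a back-of-the-envelope reconstruction: it chops 0.1 to a 24-bit fixed-point value, accumulates the per-tick error over 100 hours of uptime, and converts the roughly 0.34 s of drift into a range error using an assumed Scud closing speed of about 2,000 m/s (the speed is my assumption for illustration; published analyses use a similar figure).

    /* Back-of-the-envelope reconstruction of the published Patriot timing
     * analysis.  0.1 s has no exact binary representation; chopping it to a
     * 24-bit fixed-point value (23 fractional bits kept here) loses about
     * 9.5e-8 s per 0.1 s tick, and the error grows with uptime. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double chopped        = floor(0.1 * pow(2.0, 23)) / pow(2.0, 23);
        double per_tick_error = 0.1 - chopped;            /* ~9.54e-8 s          */

        double hours    = 100.0;                          /* continuous uptime   */
        double ticks    = hours * 3600.0 * 10.0;          /* one tick per 0.1 s  */
        double time_err = ticks * per_tick_error;         /* ~0.34 s of drift    */

        /* ASSUMED closing speed (~2000 m/s, roughly Mach 5) used to turn the
         * time error into a range-gate error; this number is an illustration. */
        double range_err = time_err * 2000.0;

        printf("per-tick error : %.3e s\n", per_tick_error);
        printf("after %.0f h   : %.4f s of clock drift\n", hours, time_err);
        printf("range-gate off : about %.0f m\n", range_err);
        return 0;
    }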


Military Problems (Cont.)

2. Missiles from the USS Vincennes took down Iran Air Flight 655, an Airbus, with the loss of 290 people. Probable crew error: the plane was not properly identified, nor was it properly determined whether it was ascending or descending. The crew was probably spooked by an earlier incident with a gunboat. The tests of this system, which was designed for use on battleships, had been conducted in a corn field.

3. In the 1986 raid on Libya we lost an F-111 to our own jamming, which also prevented hitting certain targets.


More Examples

4. Two Royal Air Force jets collided, killing four crewmen, when the identical pre-programmed cassettes that controlled their on-board computers put them at the same location at the same instant.

5. In the Falklands War a Sea Dart missile shot down the wrong helicopter. We did the same during Desert Storm.

6. During the breakup of Russia a NASA probe was mistaken for a missile attack.

7. We have failed to identify properly a flock of swans, a satellite breaking up, a forest fire, etc.


More Bugs

8. Firing the missile in the wrong direction (the Russians have done it also).

9. Simulator bugs caused a virtual F-16 to flip over when it crossed the equator and to continue flying upside down because the program deadlocked on whether to roll left or right.

10. A Scottish trawler sank when it was run into by a submarine whose equipment said that the trawler was three miles away.


Current Military Applications

1. A robot surgeon is currently being designed to perform battlefield surgery. Here a surgeon located miles away would use video and a robotic device controlled by radio waves to deal with emergency conditions. (Surely the bad guys would not do anything that might jam or corrupt the signals to the robot.)

2. AI applications where the computer selects the final target. We have a case on record where, during Desert Storm, the exhaust fan of a bathroom facility was selected even though it was identified as "low priority."


Ethics for Battlefield Robots

Noel Sharkey noted in an interview that there is "a headlong rush" to develop battlefield drones capable of making their own decisions about when to strike but that lethal autonomous robots should be prohibited until they can successfully demonstrate ethical behavior, a standard he doubted could be met.

A Soldier, Taking Orders From Its Ethical Judgment Center

New York Times (11/25/08) P. D1; Dean, Cornelia


Ethics for Battlefield Robots

Georgia Tech computer scientist Ronald Arkin thinks smart machines can behave more ethically than humans in battlefield conditions, in part because they could be programmed without a self-preservation instinct, thus eliminating the danger that they will attack out of fear. He also wrote that the machines could be built to exhibit no recklessness or anger and would not fall prey to "the psychological problem of 'scenario fulfillment,'" which prompts people to digest new information more easily if it conforms with their pre-existing ideas.

A Soldier, Taking Orders From Its Ethical Judgment Center

New York Times (11/25/08) P. D1; Dean, Cornelia


Terminology

The term mishap is used to denote an unplanned event or series of events that can result in death, injury, occupational illness, or damage to or loss of equipment or property. Mishap is a slightly broader term than "accident," which has traditionally been defined by safety engineers as 'an unwanted and unexpected release of energy.'


Causes of Mishaps

1. Mishaps are almost always caused by multiple factors, where the relative contribution of each factor is usually not clear. A mishap can be thought of as a set of events combining in a random fashion.

2. The failure of safety devices is often a contributing factor (sometimes the causal factor) in mishaps. An example would be redundant channels where a lack of synchronization between them causes a shutdown. (A hypothetical sketch follows this list.)

3. Mishaps often involve problems in subsystem interfaces. It is often easier to deal with the failure of components than failures in the interfaces between those components.
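A hypothetical illustration of point 2, invented for this slide rather than drawn from any specific system: two redundant channels sample the same accelerating quantity, but one channel's clock runs slightly ahead. Once the skew-induced disagreement exceeds the voter's tolerance, the "safety" logic itself shuts the system down.

    /* Hypothetical illustration, not any real system: two redundant channels
     * sample the same accelerating quantity, but channel B's clock runs 0.1 s
     * ahead of channel A's.  Once the skew-induced disagreement exceeds the
     * voter's tolerance, the "safety" logic itself shuts the system down. */
    #include <math.h>
    #include <stdio.h>

    #define TOLERANCE 0.5                  /* max allowed channel disagreement */

    static double sample(double t) { return 5.0 * t * t; }   /* accelerating signal */

    int main(void)
    {
        double skew = 0.1;                 /* channel B clock skew, in seconds */

        for (double t = 0.0; t < 1.0; t += 0.2) {
            double a = sample(t);          /* channel A, reference clock */
            double b = sample(t + skew);   /* channel B, skewed clock    */

            if (fabs(a - b) > TOLERANCE) {
                printf("t=%.1f: channels disagree (%.2f vs %.2f) -> failsafe shutdown\n",
                       t, a, b);
                return 1;
            }
            printf("t=%.1f: channels agree (%.2f vs %.2f)\n", t, a, b);
        }
        return 0;
    }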


Why Replace Hardware with Software?

1. Software makes it practical to build more logic into a system.

2. Software is less expensive than hard-wired logic.

3. Software-controlled systems can perform more checking.

4. Easier to change logic. (modifications to a program)

5. Easier to make changes. (change a controller on Hubble)

6. Provide more information in a more meaningful form.

7. Uses less space and power.


Safety-Critical Software

"As a rule, software systems do not work well until they have been used, and have failed repeatedly, in real applications."

"Software exhibits weak-link behavior." With hardware there are numerous "almost right" states that still work reasonably well.

We are currently building systems in which manual intervention ("Trust the Force, Luke.") is no longer practical.

Redundancy can add to the problem.


Factors Associated with Failures

1. Complexity of real-time, multitasking systems.

2. Small errors can cause major failures. (A wrong comma causing a failure is an example of "software exhibiting weak-link behavior." A hypothetical one-character illustration follows this list.)

3. Failing to design and plan for unexpected inputs or conditions. (correlated failures)

4. Interaction with physical systems that don't work as anticipated.
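A hypothetical one-character example of point 2, invented for illustration rather than taken from any incident report: a stray semicolon after an "if" terminates the statement, so the guarded action runs unconditionally.

    /* Hypothetical one-character bug, not taken from any incident report: the
     * stray semicolon after the "if" terminates the statement, so vent_tank()
     * runs unconditionally instead of only when the pressure is high. */
    #include <stdio.h>

    static void vent_tank(void) { puts("venting!"); }

    static void control_step(double pressure_psi)
    {
        if (pressure_psi > 40.0);   /* BUG: ';' makes this if do nothing         */
            vent_tank();            /* indented like a body, but always executes */
    }

    int main(void)
    {
        control_step(12.0);         /* pressure is low, yet the tank is vented   */
        return 0;
    }

Modern compilers can warn about this pattern (for example GCC's -Wempty-body), which is one reason safety-critical coding standards typically treat warnings as errors.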


Factors Associated with Failures

5. Incompatibility of hardware and software.

6. Insufficient testing. (hard to plan, hardware reused)

7. Inadequate training. (lack of professional standards)

8. Overconfidence in software.


Interface Design for Safety

1. If a user is ever going to be put in the situation where he needs to take over control from the computer, then he must be given feedback so that he understands what the computer is doing at all times.

2. The system should behave as an experienced user would expect.

3. A workload for the operator that is too low can be as dangerous as one that is too high in that it can lead to boredom and inattention that prevents the operator from taking control when the need arises.


Miller's Three Systems

When evaluating a real situation there are three systems that need to be evaluated.

1. The system that is documented will often hold the key to solving the problem, and where reality and design differ is a good place to look for problems.

2. The system that is implemented is usually the one experiencing the problem. Finding out how it really works involves talking to both staff and users.

3. The system that management thinks they have will often differ greatly from 1 and 2, and it becomes either the source of the problem or an impediment to getting the problem solved.


Don't Skimp!

On programming projects there is the normal pressure to start putting code down and finish the project. This is a particularly dangerous approach when we deal with safety-critical software. Safety needs to be "designed in," and considerably more time and effort needs to be placed on the planning, specification, design, and testing phases.


Don't Skimp! (Cont.)

1. It is very important to bring everyone's expertise into designing and implementing a solution.

2. There is no reason to repeat the mistakes of others, so study project failures as well as project successes.

3. Use hardware as the backup for software.


Seven Deadly Sins of safety-critical software development

1. Use for effect: It is wrong if a method is selected to follow a trend, a management whim, or peer pressure.

2. Exaggeration: Methods depend on the quality and suitability of the people applying them.

3. Too trusting: We tend to put too much faith in techniques and methods.


Seven Deadly Sins (Cont.)

4. Rule by a few: Don’t let the experts push their method. It will take time for others to gain experience.

5. Transitory solutions: Some techniques are short-lived. Use well-established techniques.


Seven Deadly Sins (Cont.)

6. Documentation: Too much is as bad as too little. Formal documentation should be accompanied by informal documentation.

7. Meandering in design: The early use of formalism in specification catches errors more quickly and reduces costs.

CACM, April 2000, Vol. 43, No. 4, pp. 91-97.