Monday, November 21, 2016

About the danger of programming errors

What is an error? According to Wikipedia: an unintentional deviation from right actions, deeds and thoughts; the difference between the expected or measured value and the real one. We make errors every day. Some inconvenience only us; others have far more serious consequences. This article describes programming errors that could have been avoided if the code had been analyzed more thoroughly.

About the human factor

The human brain is still far from being fully explored. Many books and articles have been written about its capabilities, but most scientists agree on one thing: we aren't using 100% of our abilities. A human being is not just logic, erudition and intelligence, but also feelings, emotions and upbringing. Even the most highly qualified specialist with an IQ above 140 (the average is about 100) can get tired, get upset or simply be inattentive. The result of such a coincidence of circumstances can be a mistake.
Programmers are pedantic, thorough and definitely very smart people. Still, they make mistakes when writing code. Many of these errors are caught thanks to compiler warnings (such as -Wall), asserts, tests, meticulous code review, IDE warnings, building the project with different compilers for different operating systems, running it on different hardware and so on. But even with all these measures, errors often go unnoticed.
Someone with no connection to programming may think there is nothing critical about a program error: when a surgeon makes a mistake during an operation, that is dangerous, but a misplaced character is nothing to worry about. That person would be badly wrong. Here are some examples that show why flawless code matters.

About money

On June 4, 1996, four 2,600 lb satellites of the Cluster scientific program (a study of the interaction between solar radiation and Earth's magnetic field) and Ariane 5, a European heavy-lift launch vehicle used to deliver payloads into geostationary transfer orbit (GTO), turned into "confetti". The accident attracted the attention of the public, politicians and the heads of the responsible organizations.

Conclusion of the commission:
The investigation showed that one of the key causes of the accident was a software module that Ariane 5 had inherited from the previous models. Ariane 5, in contrast to its predecessor, had a fundamentally different sequence of pre-flight actions, so different that running the fateful software module after lift-off made no sense at all. The module was not modified for Ariane 5, so the analysis the developers had carried out on all of its operations did not protect the launch vehicle from the crash.
Later on, other issues were found that could also have been avoided by a more thorough analysis of the launcher software.
The price of such carelessness: $370,000,000. The consequence: increased investment in research aimed at improving the reliability of safety-critical systems. The subsequent automated analysis of the Ariane code (written in Ada) was the first time static analysis based on abstract interpretation was applied to a large project.
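To make this kind of failure more tangible, here is a minimal sketch. The inquiry board's published report traces the crash to a conversion of a 64-bit floating-point value to a 16-bit signed integer that went out of range and raised an exception nobody handled. The code below is Python, not the original Ada, and the function names and the sample value are hypothetical; it only illustrates the difference between a conversion that fails on overflow and one that is explicitly protected.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def convert_to_int16(value: float) -> int:
    """Range-checked conversion: raises OverflowError when the value does not fit."""
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit into a 16-bit signed integer")
    return result


def convert_to_int16_saturating(value: float) -> int:
    """Protected conversion: out-of-range values are clamped to the nearest limit."""
    clamped = max(float(INT16_MIN), min(float(INT16_MAX), value))
    return int(clamped)


if __name__ == "__main__":
    reading = 40000.0  # hypothetical value, larger than the old model ever produced
    print(convert_to_int16_saturating(reading))  # 32767
    try:
        convert_to_int16(reading)
    except OverflowError as error:
        # Without a handler like this one, the error propagates to the caller.
        print("conversion failed:", error)
```

Clamping is only one possible protection, and whether it is the right one depends on what the value is used for. The point is that the decision has to be made explicitly for every input range the new system can produce.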

About the human toll

The Therac-25 was a radiation therapy machine, a medical accelerator. Atomic Energy of Canada Limited, a Canadian government organization, released three versions: Therac-6, Therac-20 and Therac-25. The Therac-6 and Therac-20 were produced in conjunction with the French company CGR.
The software of the Therac-20 was based on the code of the Therac-6. All three machines had a PDP-11 computer installed. The earlier models did not require it, as they were designed as stand-alone devices: the radiotherapy technician set up the various options manually, including the position of the rotating disk that configured the operating mode of the machine.

The hardware interlocks of the Therac-6 and Therac-20 did not allow the operator to do anything dangerous, such as selecting a high-power electron beam without the X-ray target in place.
In the Therac-25 the hardware protection was removed and all safety functions were handed over to the software. Gradual but inconsistent improvements to the software led to fatal mistakes. From June 1985 to January 1987 this machine caused six radiation overdoses; some patients received doses of several thousand rads (a typical therapeutic dose is up to 200 rads, and 1,000 rads is a lethal dose). At least two patients died directly from the overdoses.
At least four errors that could lead to radiation overexposure were found in the Therac-25 software.
The investigation made it clear that the software had been tested only minimally on a simulator; most of the time the system was tested as a whole. In other words, module testing was neglected and only integration testing was done.
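Here is a sketch of the kind of defect that a simple module test can catch. Published analyses of the accidents describe a one-byte variable that was incremented on every pass through a setup routine, with zero meaning "no check needed", so it silently wrapped around to zero on every 256th pass. The class and function names below are hypothetical and the code is Python rather than the machine's own software; it is an illustration under those assumptions, not a reconstruction of the real program.

```python
class IncrementedFlag:
    """One-byte flag that is incremented each time a check becomes necessary."""

    def __init__(self):
        self.value = 0

    def mark_check_needed(self):
        # Emulates an 8-bit variable: 255 + 1 wraps around to 0.
        self.value = (self.value + 1) & 0xFF

    def check_required(self) -> bool:
        return self.value != 0


class FixedFlag(IncrementedFlag):
    """Same flag, but set to a constant non-zero value instead of incremented."""

    def mark_check_needed(self):
        self.value = 1


def passes_skipping_check(flag, passes=1000):
    """Module test: return the pass numbers on which the safety check is skipped."""
    skipped = []
    for number in range(1, passes + 1):
        flag.mark_check_needed()
        if not flag.check_required():
            skipped.append(number)
    return skipped


if __name__ == "__main__":
    print(passes_skipping_check(IncrementedFlag()))  # [256, 512, 768]
    print(passes_skipping_check(FixedFlag()))        # []
```

A test like this takes seconds to write for an isolated module, while hitting exactly the 256th pass during whole-system testing is a matter of luck.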
By now you will probably agree that the price of an error is sometimes intolerably high.

When in doubt, trust the program

A programmer can improve their coding skills and become a real professional. But even then, errors cannot be ruled out. The examples above show that "trusting to luck" is dangerous, which is why programmers act as cautiously as possible and use a wide range of methods and tools to control code quality. One such method is static analysis. Static analyzers detect many errors in the source code of programs written in various programming languages: they analyze the code and generate a report that helps the programmer find and eliminate the errors.
The best way to show the benefits of such a product is to demonstrate its abilities by checking open-source projects. For example, more than 10,000 bugs have already been detected with the help of the PVS-Studio static analyzer. You can find them all here: http://www.viva64.com/en/examples/.
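To give a feel for what "analyzing the code" means, here is a toy check written in Python with the standard ast module. It flags comparisons whose left and right sides are identical, a typical copy-paste slip. This is an illustrative sketch only: the function name and the sample code are made up, and it says nothing about how PVS-Studio or any other real analyzer is implemented.

```python
import ast


def find_self_comparisons(source: str) -> list:
    """Return the line numbers of comparisons such as `a.x == a.x`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Only simple two-operand comparisons are considered in this sketch.
        if isinstance(node, ast.Compare) and len(node.comparators) == 1:
            # ast.dump() ignores positions by default, so equal dumps
            # mean both operands have the same structure.
            if ast.dump(node.left) == ast.dump(node.comparators[0]):
                findings.append(node.lineno)
    return sorted(findings)


# Hypothetical code under analysis: the first comparison should have used b.width.
SAMPLE = """\
def same_size(a, b):
    return a.width == a.width and a.height == b.height
"""

if __name__ == "__main__":
    for line in find_self_comparisons(SAMPLE):
        print(f"line {line}: both sides of the comparison are identical")
```

The check never runs the code it inspects. It only looks at its structure, which is why this kind of analysis can reach branches that tests rarely exercise.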
Yes, you can program without the help of analyzers. You can check the code yourself or ask your colleagues to review it. But do not forget that a programmer is first and foremost a human being. Using a static code analyzer is not a sign of unprofessionalism. On the contrary, it shows a desire to bring the results of our work as close to the ideal as possible. If an error is detected during development, only you will know it was ever there; otherwise your blunder may end up in an article titled "The dumbest bugs of the decade".

You may find the full versions of the articles, the abstracts from which were used to write this one, here:
