A selection of the best articles about programming from the site www.viva64.com/en/pvs-studio/
Friday, September 9, 2016
Thursday, September 8, 2016
Do you sleep well, John - a serial programmer?
A bad programmer John made an error in the code, because of which each user had to spend at least 15 minutes to look for the workaround of this problem. The number of users was 10 million people. Which means, that 150 million minutes were wasted, which equals 2.5 million hours. If a person sleeps 8 hours a day, then he has just 16 hours of conscious activities. So, John destroyed 156250 man-days, which is about 427.8 man-years. The average persons lives 64 years, which means that John killed about 6.68 people.
Do you sleep well, John - a serial programmer?
Do you sleep well, John - a serial programmer?
Tuesday, September 6, 2016
Rechecking Apache HTTP Server
Apache HTTP Server project continues to develop, and so does PVS-Studio analyzer, growing even more powerful with every new version. Let's see what we've got this time.
Introduction
Apache HTTP Server is an open-source cross-platform project consisting of multiple modules. The HTTP Server kernel is written in C and developed completely by the Apache Software Foundation company. The other components were created by a number of third-party developers from the open-source community.
The project authors used Coverity to check the earlier versions of Apache HTTP Server. The recent check, however, hasn't revealed any signs of the code being analyzed by other tools. The project's code is of high quality, though PVS-Studio still managed to find a few interesting errors.
We already checked the project in 2011. For information about the bugs found during that check, see the article "Leo Tolstoy and static code analysis".
The recent analysis was done with PVS-Studio, version 6.08.
Incorrect check for an empty string
typedef struct {
....
ap_regmatch_t *re_pmatch;
apr_size_t re_nmatch;
const char **re_source;
....
} ap_expr_eval_ctx_t;
static const char *ap_expr_eval_re_backref(
ap_expr_eval_ctx_t *ctx, ....)
{
int len;
if (!ctx->re_pmatch ||
!ctx->re_source ||
*ctx->re_source == '\0' || // <=
ctx->re_nmatch < n + 1)
return "";
....
}
Diagnostic message:
V528 It is odd that pointer to 'char' type is compared with the '\0' value. Probably meant: ** ctx->re_source == '\0'. util_expr_eval.c 199
When handling pointers, programmers sometimes mix up pointers and values they point to. In the example above, the programmer forgot to dereference the pointer when checking the third subexpression in the condition. They wanted to check if the string was empty by comparing the first character of the string with the null terminator, but instead compared the pointer itself with the null character. After fixing this expression, we can see that another subexpression should be added to check if there is a pointer to the string.
The analyzer has already caught this error once, as indicated by an error description on our page with examples of errors found by the V528 diagnostic rule. Since the bug is still there, we should report it again. It can be fixed by changing the code in the following way:
if (!ctx->re_pmatch ||
!ctx->re_source ||
!*ctx->re_source ||
**ctx->re_source == '\0' ||
ctx->re_nmatch < n + 1)
return "";
Incrementing a pointer instead of the value
apr_status_t iconv_uc_conv(...., apr_size_t *res)
{
....
*res = (apr_size_t)(0);
if (data == NULL) {
*res = (apr_size_t) -1;
return APR_EBADF;
}
....
if (size < 0) {
....
if (size)
*res ++; // <=
}
....
}
Diagnostic message:
V532 Consider inspecting the statement of '*pointer++' pattern. Probably meant: '(*pointer)++'. iconv_uc.c 114
The pointer is dereferenced, but the returned value is not used. The code of the function, however, indicates that it is the value that the authors intended to work with, so the precedence of the dereferencing operation should be increased by adding a pair of parentheses: (*res) ++;.
Incorrect password clearing
int get_password(struct passwd_ctx *ctx)
{
....
if (strcmp(ctx->passwd, buf) != 0) {
ctx->errstr = "password verification error";
memset(ctx->passwd, '\0', strlen(ctx->passwd));
memset(buf, '\0', sizeof(buf));
return ERR_PWMISMATCH;
}
....
memset(buf, '\0', sizeof(buf)); // <=
return 0;
....
}
Diagnostic message:
V597 The compiler could delete the 'memset' function call, which is used to flush 'buf' buffer. The RtlSecureZeroMemory() function should be used to erase the private data. passwd_common.c 165
Any program, handling private, data must clear passwords and other critical data when they are no longer needed. In the fragment above, the programmer is trying to clear a buffer storing a password. The way they have chosen to do that seemed reliable, but the memset function can do its job properly only when the buffer is used in subsequent code after the cleanup. Otherwise, the compiler is allowed to delete the call to the memset function during the building process. As a result, the critical information that should have been deleted will remain in the memory. What will happen to this memory block and where that information will get is unknown. To clear the storage, use special functions such asRtlSecureZeroMemory() and memset_s().
Those were probably the most critical defects found in Apache HTTP Server project.
A few more errors found by this diagnostic:
- V597 The compiler could delete the 'memset' function call, which is used to flush 'x' buffer. The RtlSecureZeroMemory() function should be used to erase the private data. apr_md4.c 362
- V597 The compiler could delete the 'memset' function call, which is used to flush 'tmpbuf' buffer. The RtlSecureZeroMemory() function should be used to erase the private data. apr_md5.c 436
- V597 The compiler could delete the 'memset' function call, which is used to flush 'final' buffer. The RtlSecureZeroMemory() function should be used to erase the private data. apr_md5.c 662
Uninitialized variable
static int warrsztoastr(...., const wchar_t * arrsz, int args)
{
const apr_wchar_t *wch;
apr_size_t totlen;
apr_size_t newlen;
apr_size_t wsize;
char **env;
char *pstrs;
char *strs;
int arg;
if (args < 0) {
for (args = 1, wch = arrsz; wch[0] || wch[1]; ++wch)
if (!*wch)
++args;
}
wsize = 1 + wch - arrsz;
newlen = totlen = wsize * 3 + 1;
....
(void)apr_conv_ucs2_to_utf8(arrsz, &wsize, strs, &newlen);
....
return args;
}
Diagnostic message:
V614 Potentially uninitialized pointer 'wch' used. start.c 58
The function prepares the information necessary for converting a string from Wide Unicode into UTF-8. If the value of the args variable is negative, the number of characters in the string is unknown and needs to be counted.
Then, the value of wsize is computed based on the address of the string's last character, stored in thewch variable, and the address of the string's first character, stored in arrsz. The wsize variable is used to create a buffer for the new string. The wch variable is initialized inside a loop that executes only if the value of args is negative. Otherwise, the variable won't be initialized, which will lead to undefined behavior as the buffer size will be computed incorrectly.
As for now, the function is used only once, with the value of args being -1. This would have let the error stay unnoticed for a long time until someone passed a positive value for args. I don't know what the authors wanted the function to do in such a situation. It is strange, to say the least, that this function receives as an argument the same value that it returns, while the presence of the conditional statement before it makes its execution absolutely pointless when args is a positive value.
Suspicious expression
static int is_quoted_pair(const char *s)
{
int res = -1;
int c;
if (((s + 1) != NULL) && (*s == '\\')) { // <=
c = (int) *(s + 1);
if (apr_isascii(c)) {
res = 1;
}
}
return (res);
}
Diagnostic message:
V694 The condition ((s + 1) != ((void *) 0)) is only false if there is pointer overflow which is undefined behaviour anyway. mod_mime.c 531
Quite a strange condition. The first expression can be false only when adding up a pointer with one results in an overflow. And a pointer overflow is undefined behavior, so this code is incorrect anyway.
Incorrect check of HRESULT
#define SHSTDAPI EXTERN_C DECLSPEC_IMPORT HRESULT STDAPICALLTYPE
SHSTDAPI SHGetMalloc(_Outptr_ IMalloc **ppMalloc);
CoGetMalloc(MEMCTX_TASK,ppMalloc)
LRESULT CALLBACK ConnectDlgProc(....)
{
....
if (SHGetMalloc(&pMalloc)) { // <=
pMalloc->lpVtbl->Free(pMalloc, il);
pMalloc->lpVtbl->Release(pMalloc);
}
....
}
Diagnostic message:
V545 Such conditional expression of 'if' operator is incorrect for the HRESULT type value 'SHGetMalloc(& pMalloc)'. The SUCCEEDED or FAILED macro should be used instead. apachemonitor.c 915
SHGetMalloc is a system function that returns a result of type HRESULT. HRESULT is a 32-bit valuelogically divided into three fields. You can't use it as a value of bool type; instead, use the SUCCEEDEDmacro.
Superfluous operation?
static const char *process_resource_config_fnmatch(....)
{
apr_status_t rv;
....
rv = apr_dir_open(&dirp, path, ptemp);
if (rv != APR_SUCCESS) {
return apr_psprintf(p,
"Could not open config directory %s: %pm",
path, &rv);
}
candidates = apr_array_make(ptemp, 1, sizeof(fnames));
while (apr_dir_read(....) == APR_SUCCESS) {
....
if (rest && (rv == APR_SUCCESS) && // <=
(dirent.filetype != APR_DIR)) {
continue;
}
fnew = (fnames *) apr_array_push(candidates);
fnew->fname = full_path;
}
....
}
Diagnostic message:
V560 A part of conditional expression is always true: (rv == 0). config.c 2029
The analyzer found a redundant check inside the condition. It may seem just unnecessary code at first, but if you look closer, you'll see that the loop wouldn't start if the check of the rv variable were true. It's also not clear why the programmer uses the value resulting from the previous operations if it's not used elsewhere in the loop body.
The code logic suggests that the rv = apr_dir_open(...) function should be used before the condition: then the check of the rv variable would make sense. Perhaps I'm wrong and it's just a superfluous check, but I do advise the authors to examine this code and fix the error if there's one.
Two more errors of this kind:
- V560 A part of conditional expression is always true: status == 0. mod_ident.c 217 (project mod_ident)
- V560 A part of conditional expression is always true: j == 0. mod_ident.c 217 (project mod_ident)
Redundant condition
static int uldap_connection_init(....)
{
....
if (ldc->ChaseReferrals==AP_LDAP_CHASEREFERRALS_ON){
if ((ldc->ReferralHopLimit != AP_LDAP_HOPLIMIT_UNSET) &&
ldc->ChaseReferrals == AP_LDAP_CHASEREFERRALS_ON) {
....
}
}
....
}
Diagnostic message:
V571 Recurring check. The 'ldc->ChaseReferrals == 1' condition was already verified in line 399. util_ldap.c 400
This example contains a redundant condition: there is no need to check the same expression both in the inner and outer conditional statements, as the inner statement can be executed only when the conditions of the outer one are true. The entire code within these statements requires that all the conditions in both if statements should be checked, so a better way would be to leave the outer statement out and amend the expression of the inner one to keep the checks in the same order.
if (ldc->ChaseReferrals == AP_LDAP_CHASEREFERRALS_ON &&
(ldc->ReferralHopLimit != AP_LDAP_HOPLIMIT_UNSET)) {
....
}
Incorrect pragma directive
#ifdef _MSC_VER
#pragma warning(disable: 4032)
#include <conio.h>
#pragma warning(default: 4032)
#else
#include <conio.h>
#endif
Diagnostic message:
V665 Possibly, the usage of '#pragma warning(default: X)' is incorrect in this context. The '#pragma warning(push/pop)' should be used instead. Check lines: 38, 40. apr_getpass.c 40
In the code above, the authors set a directive to its default value instead of the value it had before. This is a bad approach. Instead, save the previously used value using the #pragma warning(push) directive and then return it with the help of #pragma warning(pop):
#ifdef _MSC_VER
#pragma warning(push)
#pragma warning(disable: 4032)
#include <conio.h>
#pragma warning(pop)
#else
#include <conio.h>
#endif
Conclusion
The defects we have found in this application prove that even the most high-quality and well-tested projects are likely to contain bugs. Static analysis should be applied regularly; one-time checks are not enough. No matter how good you are at programming, typos and other defects are inevitable. PVS-Studio analyzer will allow you to catch errors and defects before they have a chance to slip into the release and cause trouble. We encourage you to download and try the analyzer on your own projects.
Monday, September 5, 2016
Type Conversion in C++ and C# Arithmetic Expressions
In arithmetic expressions, the types of operands can be converted to a common type. Such conversions are described in the language standard, and in C# they are much simpler than in C++. However, I'm not sure that many programmers know all the details.
Perhaps you had situations when the type of an arithmetic expression turned out to be something different from what you had expected. How well do you know the language standard? Test yourself by replacing auto and var with appropriate types in the expressions below and evaluating these expressions:
C++ (we're assuming that an LP64 data model is used):
void Test()
{
unsigned char c1 = std::numeric_limits<unsigned char>::max();
unsigned char c2 = std::numeric_limits<unsigned char>::max();
int i1 = std::numeric_limits<int>::max();
int i2 = std::numeric_limits<int>::max();
unsigned int u1 = std::numeric_limits<unsigned int>::max();
auto x = c1 + c2;
auto y = i1 + i2;
auto z = i1 + u1;
}
C#:
void Test()
{
byte b1 = byte.MaxValue;
byte b2 = byte.MaxValue;
int i1 = int.MaxValue;
int i2 = int.MaxValue;
uint u1 = uint.MaxValue;
var x = b1 + b2;
var y = i1 + i2;
var z = i1 + u1;
}
The answer is below the picture
C++ (LP64):
int x = c1 + c2; // = 510
int y = i1 + i2; // = -2
unsigned int z = i1 + u1; // = 2147483646
C#:
int x = b1 + b2; // = 510
int y = i1 + i2; // = -2
long z = i1 + u1; // = 6442450942
Here is what follows from this test - or rather the C++ and C# standards:
1. Evaluation of x. In an arithmetic expression, all the variables whose values can be represented with type int will be converted to this type, so when adding two variables of type char, unsigned char , short int , or unsigned short int in C++, or variables of type byte, sbyte, short, or ushort in C#, the resulting value will be of type int and no overflow will occur. In our examples, the x variable will take the value 510.
2. Evaluation of . If both arguments are of type int, no further type promotion will take place and an overflow is possible. In C++, an overflow leads to undefined behavior. In C#, the application will continue running by default. You can use the checked keyword or /checked compiler switch to change its behavior so that it raises an OverflowException in the case of an overflow. In our test, the y variable will take the value -2 both in C++ and C#. However, remember that in C++ we'll be dealing with undefined behavior, which may manifest itself in any way - for example writing the number 100500 to y or ending up with a stack overflow.
3. Evaluation of . The situation when one of the arguments is of type int and the other is of typeunsigned int in C++ or uint in C# is handled differently by each standard! In C++, both arguments will be converted to type unsigned int . By the way, if an overflow occurs, it wouldn't be an undefined behavior. In C#, both arguments will be converted to type long and no overflow will be ever possible. It is the reason why we got different values for the z variable in our programs in different languages.
Now let's see what errors can be found in code written without taking type-conversion rules into account.
C++ example:
typedef unsigned int Ipp32u;
typedef signed int Ipp32s;
Ipp32u m_iCurrMBIndex;
VC1EncoderMBInfo* VC1EncoderMBs::GetPevMBInfo(Ipp32s x, Ipp32s y)
{
Ipp32s row = (y > 0) ? m_iPrevRowIndex : m_iCurrRowIndex;
return ((m_iCurrMBIndex - x < 0 || row < 0)
? 0 : &m_MBInfo[row][m_iCurrMBIndex - x]);
}
This code fragment is taken from IPP Samples project. When comparing an expression result with zero, one should keep in mind that int may be cast to unsigned int , and long to unsigned long . In our case, the result of the m_iCurrMBIndex - x expression will be of type unsigned int , so it is always nonnegative - PVS-Studio will warn you about this issue: V547 Expression 'm_iCurrMBIndex - x < 0' is always false. Unsigned type value is never < 0.
C# example:
public int Next(int minValue, int maxValue)
{
long num = maxValue - minValue;
if (num <= 0x7fffffffL)
{
return (((int)(this.Sample() * num)) + minValue);
}
return (((int)((long)(this.GetSampleForLargeRange() * num)))
+ minValue);
}
This sample is taken from SpaceEngineers project. In C#, you should always keep in mind that when adding two variables of type int, their type will never be promoted to long, unlike the situation when you add a variable of type int and a variable of type uint. Therefore, what will be written to the num variable is an int value, which always meets the num <= 0x7fffffffL condition. PVS-Studio knows about this issue and generates the message V3022 Expression 'num <= 0x7fffffffL' is always true.
It's great when you know the standard and know how to avoid errors like those discussed above, but in real life remembering all the intricacies of language is difficult - and totally impossible in the case of C++. And here's where static analyzers like PVS-Studio may be of help.
Labels:
article,
cplusplus,
cpp,
csharp,
developer,
developers,
opensource,
pvsstudio,
tutorial
Friday, September 2, 2016
A space error: 370.000.000 $ for an integer overflow
Start. 37 seconds of flight. KaBOOM! 10 years and 7 billion dollars are turning into dust.
Four satellites, 2,600 lb, of the Cluster scientific program (study of the solar radiation and Earth's magnetic field interaction) and a heavy-lift launch vehicle Ariane 5 turned into "confetti" June 4, 1996.
The programmers were to blame for everything.
The previous model-rocket Ariane 4 has been successfully launched more than 100 times. What could go wrong?
Apparently, to conquer space, one should know Ada language well.
Dossier
Ariane 5 is a European expendable heavy lift launch vehicle that is part of the Ariane rocket family. It is used to deliver payloads into geostationary transfer orbit (GTO) or low Earth orbit (LEO), can launch two-three satellites, and up to eight micro satellites at a time.
The project history
It was created in 1984-1995 by a European Space Agency (EKA, ESA), the main developer - French Centre National d'Etudes Spatiales (CNES). The program participants' were 10 European countries, the project cost was 7 billion US dollars (46.2% - contribution of France).
About a thousand industrial firms took part in the creation of the rocket. The prime contractor is a European company, Airbus Defence and Space (Airbus Group unit, "Airbus Group", Paris). The marketing for Ariane 5 was done by a French company, Arianespace (Evry), with which ESA signed an agreement November 25, 1997.
Vehicle description
Ariane 5 is a two-stage heavy class booster rocket. Length — 52-53 m, maximum diameter — 5.4 m, starting weight: 775-780 tonnes (depending on the configuration).
The first stage is equipped with a liquid rocket engine Vulcain 2 ("Volcano-2"; the first three versions of the missile were made of Vulcain), and the second is HM7B (for the version of Ariane 5 ECA) or Aestus (for Ariane 5 ES). Vulcain 2 and HM7B engines run on a mixture of hydrogen and oxygen, and are manufactured by a French company Snecma (a part of "Safran" group, Paris)
Aestus uses non volatile fuel - a mixture of the MMH propellants with Nitrogen tetroxide oxidizer. The engine was developed by a German company Daimler Chrysler Aerospace AG (DASA, Munich).
In addition, attached to the sides were two solid rocket booster accelerators (manufacturer-Europropulsion, Suresnes, France; a joint venture between Safran Group and the Italian company Avio), which provide more than 90% of torque starting at the beginning, delivering 90% of the thrust during the first launch phases. In the version of the Ariane 5 ES, the second stage may not be available when outputting the payloads into low anchor orbit.
On-board computers
www.ruag.com/space/products/digital-electronics-for-satellites-launchers/on-board-computers
Investigation
www.ruag.com/space/products/digital-electronics-for-satellites-launchers/on-board-computers
Investigation
The day after the catastrophe, the General Director of the European Space Agency (ESA), and Chairman of the French National Centre for space research (CNES) issued a decree on the formation of an independent Commission to investigate the circumstances and causes of this emergency, which included well-known experts and scholars from all interested European countries.
The Commission began its work on June 13, 1996 and on 19 July they released its exhaustive report (PDF), which immediately became available on the net.
The Commission had telemetry data, trajectory data, as well as recorded optical observations of the course of the flight.
The explosion occurred at an altitude of approximately 4 km, and the debris was scattered over an area of about 12 square km in the savanna and the surrounding swamps. The Comission studied the testimonies of numerous specialists and examined the production and operational documentation.
Technical details of the accident
The position and orientation of the booster in space were measured by an Inertial Reference Systems — IRS, a part of which is a built-in computer, which evaluates the angles and speeds based on the information provided by the onboard Inertial Platform, equipped with laser gyroscopes and accelerometers. The data from IRS were passed by a special bus for the onboard computer, which provided the necessary information for the implementation of the flight program and managed directly - through the hydraulic and servo mechanism - the solid booster accelerators and cryogenic engines.
Duplication of the equipment was used to ensure the reliability of Flight Control Systems. Therefore, two IRS systems (one - active and the other is its hot standby) with identical hardware and software were operating in parallel. As soon as the onboard computer detected that the "active" IRS withdrew from a regular mode, it immediately switched to another. There were also two on-board computers.
Significant phases of development process
7 minutes before the scheduled launch there was detected an infringement of "visibility criterion". Therefore, the start was postponed by an hour.
LT (Launch Time) = 9 o'clock. At 33 min. 59 sec. local time, the "launch window" was "caught" again and finally, the vehicle launched and was running in a normal mode until LT+37 seconds.
In the following several seconds there was a dramatic deviation from the given missile trajectory that ended in an explosion.
At LT+39 seconds, because of high aerodynamic load due to the "angle of attack" exceeding 20 degrees, the starting accelerators separated from its main stage, which triggered the missile Autodestruct System.
The change of the angle of attack happened because of a malfunction in the nozzle rotation of the solid accelerators, which was caused by a command from an on-board computer based on the information from the active Navigation System (IRS 2).
Some of this information was incorrect in principle: what has been interpreted as flight details was actually diagnostic information from the IRS 2 firmware.
The built-in computer IRS2 passed incorrect data, because it diagnosed a contingency, having "caught" an exception that was thrown by one of the software modules.
At the same time the on-board computer could not switch to the backup system IRS 1 because it had already ceased to function during the previous cycle (which took 72 milliseconds) - for the same reason as the IRS 2.
An exception "thrown" by an IRS program, resulted from the conversion of data from a 64-bit floating point format to a 16-bit signed integer, which led to "Operand Error".
The error occurred in a component that is meant only for performing "adjustment" of the Inertial Platform. This software module generates significant results only until the moment LT+7 seconds of the detachment from the launch pad. After the rocket soared up, the module could no longer affect the module.
"The adjustment function" had to be active (according to the established results) for 50 seconds after the initiation of the "flight mode" on the Navigation System bus (the moment LT-3 seconds), was performed.
The "Operand Error" occurred because of an unexpectedly large magnitude of BH (Horizontal Bias — a horizontal skew), evaluated by the internal function based on the value of "horizontal speed" measured by the Platform sensors.
The BH magnitude served as an indicator of the precision of the Platform positioning. The BH magnitude turned out to be much greater than it was expected, because the trajectory of the Ariane 5 at the early stage was significantly different from the flight path of the Ariane 4 (where this software module was previously used), which led to a much higher "horizontal velocity".
The final action that had fatal consequences was the processor work termination. Thus, the whole Navigation System ceased to function. It was technically impossible to resume its actions.
The researchers were able to reproduce this chain of events using computer modeling, combined with other research materials and experiments this allowed them to conclude that the causes and the circumstances of the accident are fully identified.
The causes and origins of the accident
The initial requirement to continue the adjustment after the rocket takeoff, was embedded for more than 10 years before the fateful events, when they designed the early Ariane models.
The flight could be cancelled just several seconds before the flight, for example, in the interval of LT-9, for example, when the IRS started the "flight mode", and LT-5 seconds, when there was a command to perform several operations with the rocket equipment.
In the case of an unexpected cancellation of the takeoff, it was necessary to quickly return to the countdown mode - and not to repeat all the installation operations from the beginning, including the bringing of the Inertial Platform (an operation, requiring 45 min. - the time when the "launch window" would be lost).
It was stated that in case the launch was cancelled, 50 seconds after the LT-9 would be enough for the equipment on the Earth to regain full control over the Inertial Platform without data loss - the Platform could stop the transference that was initiated and the corresponding software module would register all the information about its condition, which will help to return to the original position (in case the rocket is still on the launch pad). Once, in 1989, during start number 33 of the Ariane 4 rocket, this peculiarity was successfully activated.
However, the Ariane 5, in contrast to the previous model had a fundamentally different scenario of pre-flight actions — so different that the work of the fateful software module after the launch time made no sense at all. However, the module was used again without any modifications.
ADA Language
The investigation revealed that this software module contained seven variables involved in type conversion operations. It turned out that the developers performed the analysis for the vulnerability of all operations, capable of throwing an exception.
It was their conscious action - to add adequate protection to four variables, and leave three of them - including BH - unprotected. The ground for this decision was the certainty that overflow is not possible in these variables in general.
This confidence was supported by the evaluations, showing that the expected range of physical parameters that was taken as the basis for the determination of the values of the mentioned variables can never lead to an undesirable situation. And it was true — but for the trajectory evaluated for Ariane 4.
The new generation Ariane 5 rocket launched on an entirely different trajectory, for which no evaluations were carried out. Meanwhile, it turned out that the "horizontal velocity" (together with the initial acceleration) exceeded the estimated (for Ariane 4) more than five times.
The protection of all 7 (including BH) variables wasn't provided because the maximum workload for the IRS computer was declared as 80%. The developers had to look for ways to reduce unnecessary evaluation expenses, and they weakened the protection in that fragment where theoretically the accident could not happen. When it occurred, then the exception handling mechanism was activated, which turned out to be completely inadequate.
This mechanism supposes three main steps.
- The information about the contingency should be transmitted via the bus to the onboard computer OBC.
- In parallel it was written - together with the whole context - to the reprogramming memory EEPROM (during the investigation it was possible to restore it and read the contents)
- The work of IRS processor should have been aborted.
The last action was a fatal one; it led to the accident despite the fact that the situation was quite normal (even though there was an exception generated due to unsecured overflow).
Conclusion
The defect on the Ariane 5was the result of several factors. There were many stages during development and testing when the defect could have been detected.
- The programming module was reused in a new environment where the conditions of functioning were significantly different from the requirements of the program module. These requirements have not been revised.
- The system identified and detected an error. Unfortunately, the specification of the error-handling mechanism was inappropriate and caused the final destruction.
- The erroneous module was never properly tested in the new environment - neither the hardware, nor the level of system integration. Therefore, the flaws in the development and implementation were not detected.
From the report of the commission:
The main task during the development of Ariane 5 was the reducing of the occasional accident. The exception thrown was not a random accident, but an error in the structure. The exception was detected, but handled incorrectly, because of the point of view that a program should be considered correct, until the opposite is shown. The Commission holds the opposite view, that the software should be considered erroneous, until the best practical current methods demonstrate its correctness.
Happy ending
Despite this failure, there were 4 more satellites, Cluster II built and put into orbit on the rocket Soyuz-U/Fregat in the year 2000.
This accident attracted the attention of the public, politicians, and the heads of organizations to the high risks connected with the usage of complex computational systems, which increased investment into research aimed at improving the reliability of life-critical systems. The following automatic analysis of the Ariane code (written in Ada) was the first case when the static analysis was used in the scope of a large project using the abstract interpretation technique.
Sources
- Report Ariane 501 — Presentation of Inquiry Board report
- Telles, Matt The Science of Debugging
- Class 25: Software Disasters
- Ariane 5 - Chronicle of a failure
- ARIANE 5 — The Software Reliability Verification Process
- Safety in Software — now more important than ever
- Static Analysis and Verification of Aerospace Software by Abstract Interpretation
- ADA source code
This article was originally published (in Russian) at the website habrahabr.ru. The article was translated and published at our blog by the author's permission.
By Aleksey Statsenko
Labels:
Ariane,
Arianespace,
article,
bugs,
CNES,
codeanalysis,
ESA,
programming,
ЕКА
Subscribe to:
Posts (Atom)