Thursday, May 4, 2017

Checking the code of Valgrind dynamic analyzer by a static analyzer

I should say right away, that the article was not written to show that static analysis works better than dynamic. This statement would be incorrect, as well as the reverse idea. The tools of static and dynamic analysis complement each other, they do not compete with each other. Both of these methods have strengths and weaknesses. Some errors cannot be detected by dynamic analyzers, some - by static ones. That's why I suggest treating this post as another demonstration of the abilities of PVS-Studio, not the comparison of two methodologies.
Picture 1

The methodologies of dynamic and static analysis

The source code contains some hints that can help you detect errors. Let's look at a simple example:


char *str = foo();
if (str == '\0')
It is strange to compare the pointer not with nullptr, NULL or at least with 0, but with a character literal '\0'. On the basis of this strange thing, a static analyzer can assume that they wanted to check not the fact that the pointer is 0, but that the string is empty, i.e. there was an intention to check if there is a terminal null in the beginning of the string, but a programmer accidentally forgot to dereference the pointer. Most likely, this will really be an error and the correct code will be as follows:
char *str = foo();
if (*str == '\0')
This information gets lost during the compilation and the dynamic analyzer is not able to detect this bug. From the point of view of the dynamic analyzer, the pointer is verified against NULL, so there is nothing to worry about.
Another weakness of a dynamic analyzer is in the necessity to execute the code containing an error. Sometimes it is very hard to do for a big number of code fragments. I'll explain this using a code fragment taken from a real application:
ADOConnection* piTmpConnection = NULL;
hr = CoCreateInstance(
              CLSID_DataLinks,
              NULL,
              CLSCTX_INPROC_SERVER, 
              IID_IDataSourceLocator,
              (void**)&dlPrompt
              );
if( FAILED( hr ) )
{
  piTmpConnection->Release();
  dlPrompt->Release( );
  return connstr;
}
If the function CoCreateInstance was executed with an error, then we'll get the dereference of the piTmpConnection null pointer. In fact, the string piTmpConnection->Release(); is just redundant here, because no connection was created here.
It could be rather troublesome to detect such a situation with the help of a dynamic analyzer, as we'll have to simulate a situation when the function CoCreateInstance returns the error status. It's not easy to do this.
Theoretically, the static analyzer has information about the code, and therefore is capable of finding more bugs than a dynamic analyzer. In practice, the possibility of static analyzers are limited by the available memory and the acceptable time of work. In other words, a static analyzer can consider how code will work under all possible variants of input data. But it will take a little less than 150 years on the cluster, where it will need an incredible amount of installed memory.
As a result, in practice, static analyzers aren't able to detect a lot of error types. For example, they do not notice leaks, if the pointer is passed between a lot of functions. In turn, the dynamic analyzers greatly cope with such tasks, regardless of the code complexity.

The analysis results

We regularly check various projects to spread the word about static analysis methodology in general, and about our tool PVS-Studio in particular, so I couldn't miss a chance to check the Valgrind project. It is a kind of a challenge to us to find errors in it. This is a high-quality, well-tested project that is already being checked by Coverity. In general, I am sure that this code was checked by enthusiasts and various tools. Even several errors found would be a great success.
Let's see if there was anything interesting that PVS-Studio managed to find in the code of Valgrind.
static void lk_fini(Int exitcode)
{
  ....
  VG_(umsg)("  taken:         %'llu (%.0f%%)\n",
            taken_Jccs, taken_Jccs * 100.0 / total_Jccs ?: 1);
  ....
}
PVS-Studio warning: V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '/' operator. lk_main.c 1014
The operator ?: is very treacherous and it should be used very carefully. I have discussed this topic in the fourth chapter of my small e-book, which I recommend to have a look at. Let's see why this code is suspicious.
I think the programmer wanted to protect himself from dividing by zero. That's why, if the variable total_Jccs is 0, then the division should be by 1. The code was intended to work like this:
taken_Jccs * 100.0 / (total_Jccs ?: 1)
However, the precedence of the ?: operator is lower than of the division and multiplication operators. Therefore, the expression is evaluated as follows:
(taken_Jccs * 100.0 / total_Jccs) ?: 1
However, perhaps the code works exactly as intended. Even if it so, it's better to add brackets, so that other programmers don't get confused in the future, if there is an error here or not.
static Bool doHelperCall (....)
{
  ....
  UInt nVECRETs = 0;
  ....
  vassert(nVECRETs ==
           (retTy == Ity_V128 || retTy == Ity_V256) ? 1 : 0);
  ....
}
PVS-Studio warning: V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '==' operator. host_arm_isel.c 795
In fact, this is an interesting case. The ?: operator is used incorrectly, but still, the code is correct.
The check was meant to work like this:
nVECRETs == ((retTy == Ity_V128 || retTy == Ity_V256) ? 1 : 0)
But it works like this:
(nVECRETs == (retTy == Ity_V128 || retTy == Ity_V256)) ? 1 : 0
The funny thing is that if you look closely, you can see that these checks are equivalent. The result will be the same.
Similar issues can be found here:
  • V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '==' operator. host_arm64_isel.c 737
  • V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '==' operator. host_mips_isel.c 611
typedef  ULong  DiOffT;
typedef
   struct {
      Bool   fromC;
      DiOffT off;
      SizeT  size;
      SizeT  used;
      UChar  data[];
   }
   CEnt;
static Bool is_sane_CEnt (....)
{
  ....
  CEnt* ce = img->ces[i];
  ....
  if (!(ce->size == CACHE_ENTRY_SIZE)) goto fail;
  if (!(ce->off >= 0)) goto fail;                         // <=
  if (!(ce->off + ce->used <= img->real_size)) goto fail;
  ....
}
PVS-Studio warning: V547 Expression 'ce->off >= 0' is always true. Unsigned type value is always >= 0. image.c 147
The off member is a variable of an unsigned type, which means it is always greater than or equal to zero. Thus, the condition (!(ce->off >= 0)) is always false.
static void sdel_Counts ( Counts* cts )
{
   memset(cts, 0, sizeof(Counts));
   free(cts);
}
PVS-Studio warning: V597 The compiler could delete the 'memset' function call, which is used to flush 'cts' object. The memset_s() function should be used to erase the private data. cg_merge.c 324
Perhaps, to simplify the error search in Valgrind, the memory is filled with zeros before freeing. However, in the release-version the compiler will likely to remove the call of the memset function, as the buffer is not used anymore before the call of the free function.
Similar fragments where the memory may not be zeroed:
  • V597 The compiler could delete the 'memset' function call, which is used to flush 'ffn' object. The memset_s() function should be used to erase the private data. cg_merge.c 263
  • V597 The compiler could delete the 'memset' function call, which is used to flush 'cts' object. The memset_s() function should be used to erase the private data. cg_merge.c 332
  • V597 The compiler could delete the 'memset' function call, which is used to flush 'cpf' object. The memset_s() function should be used to erase the private data. cg_merge.c 394
static
Bool dis_AdvSIMD_scalar_shift_by_imm(DisResult* dres, UInt insn)
{
  ....
  ULong nmask = (ULong)(((Long)0x8000000000000000ULL) >> (sh-1));
  ....
}
PVS-Studio warning: V610 Unspecified behavior. Check the shift operator '>>'. The left operand '((Long) 0x8000000000000000ULL)' is negative. guest_arm64_toIR.c 9428
If the shifted operand has a negative value, the resulting is implementation-defined. Thus, we are dealing with dangerous code.
Now let's consider a situation when the dereference of the pointer is written before its check against NULL:
PRE(xsm_op)
{
   struct vki_xen_flask_op *op = (struct vki_xen_flask_op *)ARG1;

   PRINT("__HYPERVISOR_xsm_op ( %u )", op->cmd);            // <=

   PRE_MEM_READ("__HYPERVISOR_xsm_op", ARG1,
                sizeof(vki_uint32_t) + sizeof(vki_uint32_t));

   if (!op)                                                 // <=
      return;
  ....
}
PVS-Studio warning: V595 The 'op' pointer was utilized before it was verified against nullptr. Check lines: 350, 360. syswrap-xen.c 350
Similar cases:
  • V595 The 'sysctl' pointer was utilized before it was verified against nullptr. Check lines: 568, 578. syswrap-xen.c 568
  • V595 The 'domctl' pointer was utilized before it was verified against nullptr. Check lines: 710, 722. syswrap-xen.c 710
Bool ML_(read_elf_debug_info) ( struct _DebugInfo* di )
{
  ....
  if (inrw && sdynbss_present) {
    vg_assert(di->sbss_present);
    sdynbss_present = False;
    vg_assert(di->sbss_svma + di->sbss_size == svma);
    di->sbss_size += size;
    ....
  } else                                                // <=
  
  if (inrw && !di->sbss_present) {
    di->sbss_present = True;
    di->sbss_svma = svma;
    di->sbss_avma = svma + inrw->bias;
  ....
}
PVS-Studio warning: V705 It is possible that 'else' block was forgotten or commented out, thus altering the program's operation logics. readelf.c 2231
The else keyword in the code looks very suspicious. The code is not aligned in accordance with the logic of his work. In addition, there is an empty line after else. This suggests that we see the consequences of sloppy refactoring and else is not needed here.
static
Bool doHelperCallWithArgsOnStack (....)
{
  ....
   if (guard) {
      if (guard->tag == Iex_Const
          && guard->Iex.Const.con->tag == Ico_U1
          && guard->Iex.Const.con->Ico.U1 == True) {
         /* unconditional -- do nothing */
      } else {
         goto no_match; //ATC
         cc = iselCondCode( env, guard );
      }
   }
  ....
}
PVS-Studio warning: V779 Unreachable code detected. It is possible that an error is present. host_arm_isel.c 461
The code line
cc = iselCondCode( env, guard );
will never get executed.
void reset_valgrind_sink(const char *info)
{
   if (VG_(log_output_sink).fd != initial_valgrind_sink.fd
       && initial_valgrind_sink_saved) {
      VG_(log_output_sink).fd = initial_valgrind_sink.fd;
      VG_(umsg) ("Reset valgrind output to log (%s)\n",
                 (info = NULL ? "" : info));
   }
}
PVS-Studio warning: V547 Expression '((void *) 0)' is always false. server.c 110
The analyzer warning may look strange and needs clarification.
We are interested in the following statement:
(info = NULL ? "" : info))
The macro NULL expands to ((void *) 0) and we get:
(info = ((void *) 0) ? "" : info))
The precedence of the ?: operator is higher than of the = operator, that's why the evaluations are done in the following way:
(info = (((void *) 0) ? "" : info)))
Perhaps, you would agree that the condition ((void *) 0) for the operator ?: looks strange; PVS-Studio also warns us about it. Apparently, we are dealing with a typo and the code should be as follows:
(info == NULL ? "" : info))
And the last code fragment for today:
void genReload_TILEGX ( /*OUT*/ HInstr ** i1,
                        /*OUT*/ HInstr ** i2, HReg rreg,
                        Int offsetB )
{
  TILEGXAMode *am;
  vassert(!hregIsVirtual(rreg));
  am = TILEGXAMode_IR(offsetB, TILEGXGuestStatePointer());

  switch (hregClass(rreg)) {
  case HRcInt64:
    *i1 = TILEGXInstr_Load(8, rreg, am);
    break;
  case HRcInt32:
    *i1 = TILEGXInstr_Load(4, rreg, am);
    break;
  default:
    ppHRegClass(hregClass(rreg));
    vpanic("genReload_TILEGX: unimplemented regclass");
    break;
  }
}
PVS-Studio warning: V751 Parameter 'i2' is not used inside function body. host_tilegx_defs.c 1223
I think the programmer forgot to write NULL by the address i2, as it was done in other similar functions:
*i1 = *i2 = NULL;
There is a similar bug here:
V751 Parameter 'i2' is not used inside function body. host_mips_defs.c 2000

Conclusion

Thank you for attention. Try PVS-Studio static code analyzer for Linux.

Here is information for Windows developers: PVS-Studio for Windows. For them everything is a little easier. They can simply install the plugin for Visual Studio and check their C, C++ and C# projects using a demo-version.

No comments:

Post a Comment