Thursday, February 16, 2017

Porting is a Delicate Matter: Checking Far Manager under Linux

Far Manager, which takes over from Norton Commander, created back in the times of DOS, is one of the most popular file managers on Microsoft Windows. Far Manager facilitates the file system management (file creation, editing, viewing, copying, moving, search, and deletion) and provides means to extend the standard feature set (handling of the network, archives, backup copies, and so on). Far Manager was recently ported to Linux, and there is currently an alpha version available. The PVS-Studio team couldn't ignore that event and miss the opportunity to test the quality of the ported code.
Picture 24

About Far Manager

Far Manager is a console keyboard-oriented file manager for operating systems of the Microsoft Windows family. The project inherits the two-panel layout, the standard (default) color scheme, and the set of keyboard shortcuts from a popular file manager Norton Commander and provides a convenient user interface for handling files and directories (creating, viewing, editing, copying, renaming, deleting, and so on).

Figure 1 - Far Manager 2 on Windows (click to enlarge)
Figure 1 - Far Manager 2 on Windows (click to enlarge)
Far Manager was created by Eugene Roshal. The first version was released on September 10, 1996. The last version (1.65) in whose development Roshal took part is dated June 23, 2000. After that, the Far Group in fact took over the project. It's not until March 29, 2006, that the next version, v1.70, was released. On December 13, 2008, version 2.0 was released and the program became open source. It has been distributed under the revised BSD license ever since. Far Manager versions 1.70 through 2.0 look almost the same, so users can move to newer versions without having to adapt from scratch. Unicode support was added in version 1.80. The latest release, v3.0, is dated November 4, 2016.
On August 10, 2016, the development group released the first test build of the Linux port, Far2l. This build currently features a built-in usable terminal and plugins Align, AutoWrap, Colorer, DrawLine, Editcase, FarFTP, FarLng, MultiArc, NetBox, SimpleIndent, TmpPanel. The source code is distributed under the GPLv2 license.
Figure 2 - Far Manager 2 on Linux (click to enlarge)
Figure 2 - Far Manager 2 on Linux (click to enlarge)

Enough talking

The analyzer output a total of 1038 General Analysis warnings for Far2l project. The chart below shows how the warnings are distributed across the severity levels:
Figure 1 - Warning distribution across the severity levels
Figure 1 - Warning distribution across the severity levels
Let me comment on this diagram briefly. The analyzer output 153 High-level, 336 Medium-level, and 549 Low-level warnings.
This number is relatively large, but we should keep in mind that not each warning is a real bug. Having studied High- and Medium-level messages, I found 250 cases that were very likely to be errors.
For the High and Medium levels, the rate of false positives is about 49%. In other words, every second warning points to a real defect in the code.
Now let's estimate the relative error density. The total number of source lines of code (SLOC) of this project is 538,675. Therefore, the error density is 0.464 errors per 1000 SLOC. One day, I believe, we will gather all these statistical data together and write a summary article about the code quality of different projects.
It should be noted that the error density parameter we have calculated does not reflect the general error density across the whole project: it may be both greater (if the analyzer failed to notice a real bug) and less (if the analyzer reported correct code as faulty). Other projects usually show higher error density, so you can call it a successful port from the code quality viewpoint. However, we strongly recommend that the authors fix the errors found by the analyzer, as they are far from harmless.

Analysis results

One thing you should know before reading on is that the examples discussed below were refactored to make them easier to read. Remember also that these are just the most interesting examples out of all the numerous errors found by PVS-Studio in this project.

Copy-Paste

Picture 28
PVS-Studio diagnostic message: V501 There are identical sub-expressions 'Key == MCODE_F_BM_GET' to the left and to the right of the '||' operator. macro.cpp 4819
int KeyMacro::GetKey()
{
  ....
  DWORD Key = !MR ? MCODE_OP_EXIT : GetOpCode(MR, Work.ExecLIBPos++);
  ....
  switch (Key)
  {
  ....
  case MCODE_F_BM_POP:
  {
    TVar p1, p2;

    if (Key == MCODE_F_BM_GET)
      VMStack.Pop(p2);

    if (   Key == MCODE_F_BM_GET    // <=
        || Key == MCODE_F_BM_DEL 
        || Key == MCODE_F_BM_GET    // <=
        || Key == MCODE_F_BM_GOTO)
    {
      VMStack.Pop(p1);
    }
    ....
  }
  }
}
The Key variable is compared with the MCODE_F_BM_GET constant twice. This must be a typo and the programmer actually intended to compare Key with some other constant. The analyzer found 3 more issues of this kind:
  • V501 There are identical sub-expressions '!StrCmpN(CurStr, L"!/", 2)' to the left and to the right of the '||' operator. fnparce.cpp 291
  • V501 There are identical sub-expressions '!StrCmpN(CurStr, L"!=/", 3)' to the left and to the right of the '||' operator. fnparce.cpp 291
  • V501 There are identical sub-expressions 'KEY_RCTRL' to the left and to the right of the '|' operator. keyboard.cpp 1830
PVS-Studio diagnostic message: V581 The conditional expressions of the 'if' operators situated alongside each other are identical. Check lines: 267, 268. APIStringMap.cpp 268
static BOOL WINPORT(GetStringType)( DWORD type,
                                    LPCWSTR src,
                                    INT count,
                                    LPWORD chartype )
{
  ....
  while (count--)
  {
    int c = *src;
    WORD type1, type3 = 0; /* C3_NOTAPPLICABLE */
    ....
    if ((c>=0xFFE0)&&(c<=0xFFE6)) type3 |= C3_FULLWIDTH; // <=
    if ((c>=0xFFE0)&&(c<=0xFFE6)) type3 |= C3_SYMBOL;    // <=
    ....
  }
  ....
}
The second condition looks like it was written using Copy-Paste and is identical to the first one. However, if this is a conscious decision, the code can be simplified by removing the second condition:
....
if ((c>=0xFFE0)&&(c<=0xFFE6)) type3 |= C3_FULLWIDTH | C3_SYMBOL; 
....
It was not the only error of this type:
  • V581 The conditional expressions of the 'if' operators situated alongside each other are identical. Check lines: 272, 273. APIStringMap.cpp 273
  • V581 The conditional expressions of the 'if' operators situated alongside each other are identical. Check lines: 274, 275. APIStringMap.cpp 275
  • V581 The conditional expressions of the 'if' operators situated alongside each other are identical. Check lines: 6498, 6503. macro.cpp 6503
  • V581 The conditional expressions of the 'if' operators situated alongside each other are identical. Check lines: 1800, 1810. vmenu.cpp 1810
  • V581 The conditional expressions of the 'if' operators situated alongside each other are identical. Check lines: 3353, 3355. wrap.cpp:3355
PVS-Studio diagnostic message: V523 The 'then' statement is equivalent to the 'else' statement. Queque.cpp 358
void FTP::AddToQueque(FAR_FIND_DATA* FileName, 
                      LPCSTR Path, 
                      BOOL Download)
{
  ....
  char *m;
  ....
  if(Download)
    m = strrchr(FileName->cFileName, '/'); // <=
  else
    m = strrchr(FileName->cFileName, '/'); // <=
  ....
}
The second condition in this example must have been also written with the help of "Copy-Paste": whatever the value of Download (TRUEFALSE), the 'm' pointer will be assigned the position of the last occurrence of the '/' character.

Undefined behavior

Picture 37
PVS-Studio diagnostic message: V567 Undefined behavior. The 'Item[FocusPos]->Selected' variable is modified while being used twice between sequence points. dialog.cpp 3827
int Dialog::Do_ProcessSpace()
{
  ....
  if (Item[FocusPos]->Flags & DIF_3STATE)
    (++Item[FocusPos]->Selected) %= 3;       // <=
  else
    Item[FocusPos]->Selected = !Item[FocusPos]->Selected;
  ....
}
We are obviously dealing with undefined behavior here: the Item[FocusPos]->Selected variable is modified twice in one sequence point (an increment and dividing of modulo 3 followed by an assignment).
There was one more fragment with similar undefined behavior:
  • V567 Undefined behavior. The '::ViewerID' variable is modified while being used twice between sequence points. viewer.cpp 117
PVS-Studio diagnostic message: V610 Undefined behavior. Check the shift operator '<<'. The right operand 'sizeof (wchar_t) * 8' is greater than or equal to the length in bits of the promoted left operand. RegExp.cpp 4467
#define rechar wchar_t
#define RE_CHAR_COUNT (1 << sizeof(rechar) * 8)

int RegExp::Optimize()
{
  ....
  for (op=code; ; op=op->next)
  {
    switch (OP.op)
    {
    ....
    case opType:
    {
      for (int i = 0; i < RE_CHAR_COUNT; i++)    // <=
      {
        if (ISTYPE(i, OP.type))
        {
          first[i]=1;
        }
      }
      
      break;
    }
    }
    ....
  }
  ....
}
The error has to do with the fact that the type wchar_t is 4 bytes long on Linux, so signed int (4 bytes) is shifted by 32 bits to the left. As specified by the C++11 standard, when the left operand has a signed type and a positive value, a left-shift by N bytes causes undefined behavior, if N is greater than or equal to the length in bits of the left operand. This is what the fixed version of the code should look like:
#define rechar wchar_t
#define RE_CHAR_COUNT (static_cast<int64_t>(1) << sizeof(rechar) * 8)

int RegExp::Optimize()
{
  ....
  for (int64_t i = 0; i < RE_CHAR_COUNT; i++)
  {
    ....
  }
  ....
}
The analyzer found a few more defects leading to undefined behavior related to the left-shift:
  • V610 Undefined behavior. Check the shift operator '<<'. The right operand 'sizeof (wchar_t) * 8' is greater than or equal to the length in bits of the promoted left operand. RegExp.cpp 4473
  • V610 Undefined behavior. Check the shift operator '<<'. The right operand 'sizeof (wchar_t) * 8' is greater than or equal to the length in bits of the promoted left operand. RegExp.cpp 4490
  • V610 Undefined behavior. Check the shift operator '<<'. The right operand 'sizeof (wchar_t) * 8' is greater than or equal to the length in bits of the promoted left operand. RegExp.cpp 4537
  • V610 Undefined behavior. Check the shift operator '<<'. The right operand 'sizeof (wchar_t) * 8' is greater than or equal to the length in bits of the promoted left operand. RegExp.cpp 4549
  • V610 Undefined behavior. Check the shift operator '<<'. The right operand 'sizeof (wchar_t) * 8' is greater than or equal to the length in bits of the promoted left operand. RegExp.cpp 4561

Incorrect memory handling

Picture 41
Let's start the new section with a little warm-up. Try to spot the bug in the code below by yourself (Hint: it's in the TreeItem::SetTitle function).
class UnicodeString
{
  ....
  UnicodeString(const wchar_t *lpwszData) 
  { 
    SetEUS(); 
    Copy(lpwszData); 
  }
  ....
  const wchar_t *CPtr() const { return m_pData->GetData(); }
  operator const wchar_t *() const { return m_pData->GetData(); }
  ....
}

typedef UnicodeString FARString;

struct TreeItem
{
  FARString strName;
  ....
}

TreeItem **ListData;
void TreeList::SetTitle()
{
  ....
  if (GetFocus())
  {
    FARString strTitleDir(L"{");
    const wchar_t *Ptr = ListData 
                         ? ListData[CurFile]->strName
                         : L""; 
    ....
  }
  ....
}
PVS-Studio diagnostic message: V623 Consider inspecting the '?:' operator. A temporary object of the 'UnicodeString' type is being created and subsequently destroyed. Check third operand. treelist.cpp 2093
Pretty subtle, isn't it? In this example, the ListData[CurFile]->strName variable is an instance of class UnicodeString, which contains an overloaded implicit conversion operator to type const wchar_t*. Now pay attention to the ternary operator in the function TreeList::SetTitle: the second and third operands have different types (UnicodeString and const char [1], respectively). The idea was that if the first operand returns false, then the pointer Ptr will be pointing to an empty string. Since the constructor UnicodeString is not declared as an explicit, in fact, the third operand is chosen as a temporary object, (which, in its turn, will be cast to type const wchar_t*). Further, the temporary object is destroyed and Ptr will point to invalid data. This is what the fixed code looks like:
....
const wchar_t *Ptr = ListData 
                     ? ListData[CurFile]->strName.CPtr()
                     : L"";
....
An interesting thing about the next example is that it triggered two diagnostics at once.
PVS-Studio diagnostic messages:
  • V779 Unreachable code detected. It is possible that an error is present. 7z.cpp 203
  • V773 The function was exited without releasing the 't' pointer. A memory leak is possible. 7z.cpp 202
BOOL WINAPI _export SEVENZ_OpenArchive(const char *Name,
                                       int *Type)
{
  Traverser *t = new Traverser(Name);
  if (!t->Valid()) 
  {
    return FALSE;
    delete t;
  }
  
  delete s_selected_traverser;
  s_selected_traverser = t;
  return TRUE;
}
Well, what do we have here? Firstly, there is, indeed, unreachable code in the if statement's body: if the condition is true, the function exits, returning FALSE. Secondly, that unreachable code simply caused a memory leak: the object pointed to by the pointer is not deleted. To fix these errors, the two statements within the if block need to be swapped.
The next example demonstrates how you can make a mistake when evaluating the size of an object of a class (struct) using a pointer.
PVS-Studio diagnostic messages:
  • V568 It's odd that 'sizeof()' operator evaluates the size of a pointer to a class, but not the size of the 'PInfo' class object. filelist.cpp 672
  • V568 It's odd that 'sizeof()' operator evaluates the size of a pointer to a class, but not the size of the 'PInfo' class object. filelist.cpp 673
int64_t FileList::VMProcess(int OpCode,
                            void *vParam,
                            int64_t iParam)
{
  switch (OpCode)
  {
  ....
  case MCODE_V_PPANEL_PREFIX:           // PPanel.Prefix
  {
    PluginInfo *PInfo = (PluginInfo *)vParam;
    memset(PInfo, 0, sizeof(PInfo));          // <=
    PInfo->StructSize = sizeof(PInfo);        // <=
    ....
  }
  ....
  }
}
Both errors have to do with the function sizeof(PInfo) returning the pointer size (4 or 8 bytes) instead of the expected structure size. Therefore, memset will fill with zeros only the first 4 (8) bytes of the structure, and the PInfo->StructSize field will be assigned the pointer size. Here's the fixed version:
....
PluginInfo *PInfo = (PluginInfo*)vParam;
memset(PInfo, 0, sizeof(*PInfo));
PInfo->StructSize = sizeof(*PInfo);
....
The analyzer found two more defects of this type:
  • V568 It's odd that 'sizeof()' operator evaluates the size of a pointer to a class, but not the size of the 'HistoryItem' class object. history.cpp 594
  • V568 It's odd that 'sizeof()' operator evaluates the size of a pointer to a class, but not the size of the 'handle' class object. plugins.cpp 682

Strange conditions

Picture 39
Another warm-up. Try to find the bug in the code below:
int FTP::ProcessKey(int Key, unsigned int ControlState)
{
  ....
  if(   !ShowHosts 
     && (ControlState == 0 || ControlState == PKF_SHIFT) 
     && Key == VK_F6)
  {
    FTP *ftp = OtherPlugin(this);
    int  rc;

    if(   !ftp 
       && ControlState == 0 
       && Key == VK_F6)
    {
      return FALSE;
    }
    ....
  }
  ....
}
PVS-Studio diagnostic message: V560 A part of conditional expression is always true: Key == 0x75. Key.cpp 493
Note the external and internal conditions: the Key variable is compared with the constant VK_F6. If the execution reaches the internal condition, Key is guaranteed to be equal to VK_F6, making the second check redundant. The improved code will look as follows:
....
if(   !ftp 
   && ControlState == 0)
{
  return FALSE;
}
....
This diagnostic found a few more defects like that:
  • V560 A part of conditional expression is always true: !cps. DString.cpp 47
  • V560 A part of conditional expression is always true: !ShowHosts. FGet.cpp 139
  • V560 A part of conditional expression is always false: !wsz. cnDownload.cpp 190
  • V560 A part of conditional expression is always true: !UserReject. extract.cpp 485
  • And 8 additional diagnostic messages.
PVS-Studio diagnostic message: V503 This is a nonsensical comparison: pointer <= 0. fstd_exSCPY.cpp 8
char *WINAPI StrCpy(char *dest, LPCSTR src, int dest_sz)
{
  if(dest <= 0)   // <=
    return NULL;
  ....
}
This code contains a meaningless comparison of a pointer with a negative value (pointers don't work with memory areas that have negative addresses). This is what the fixed version could look like:
....
if(dest == nullptr)
  return NULL;
....
PVS-Studio diagnostic message: V584 The 'FADC_ALLDISKS' value is present on both sides of the '==' operator. The expression is incorrect or it can be simplified. findfile.cpp 3116
enum FINDASKDLGCOMBO
{
  FADC_ALLDISKS,
  FADC_ALLBUTNET,
  ....
};

FindFiles::FindFiles()
{
  
  ....
  if (   FADC_ALLDISKS + SearchMode == FADC_ALLDISKS     // <=
      || FADC_ALLDISKS + SearchMode == FADC_ALLBUTNET)
  {
    ....
  }
  ....
}
The analyzer detected a strange condition in the first part of a compound conditional expression. Based on the FINDASKDLGCOMBO enumeration, the FADC_ALLDISKS constant has the value 0 and FADC_ALLBUTNET has the value 1. If we use the numerical values in the conditional expression, we'll get the following:
if (   0 + SearchMode == 0
    || 0 + SearchMode == 1)
{
  ....
}
Judging by this code, the whole condition can be simplified:
if (   SearchMode == FADC_ALLDISKS
    || SearchMode == FADC_ALLBUTNET)
{
  ....
}

Incorrect format string handling

Picture 40
PVS-Studio diagnostic message: V576 Incorrect format. Consider checking the fourth actual argument of the 'swprintf' function. The char type argument is expected. FarEditor.cpp 827
void FarEditor::showOutliner(Outliner *outliner)
{
  ....
  wchar_t cls = 
    Character::toLowerCase((*region)[region->indexOf(':') + 1]);
  
  si += swprintf(menuItem+si, 255-si, L"%c ", cls); // <=
  ....
}
This might be a porting error. It has to do with the fact that in Visual C++, the format-string specifiers in the functions printing wide strings are interpreted in a non-standard way: the %c specifier is expecting a wide character (wide char, wchar_t), while on Linux, as specified by the standard, %c is expecting a multibyte character (multibyte symbol, char). The fixed code should look as follows:
si += swprintf(menuItem+si, 255-si, L"%lc ", cls);
 
PVS-Studio diagnostic message: V576 Incorrect format. Consider checking the fourth actual argument of the 'swprintf' function. The pointer to string of char type symbols is expected. cmddata.cpp 257
void CommandData::ReadConfig()
{
  ....
  wchar Cmd[16];
  ....
  wchar SwName[16+ASIZE(Cmd)];
  swprintf(SwName,ASIZE(SwName), L"switches_%s=",Cmd);  // <=
  ....
}
This case is similar to the previous one: the format string contains specifier %s, so a multibyte string (char*) is expected. However, what it receives is a wide string (wchar_t*). This is the fixed code:
swprintf(SwName,ASIZE(SwName), L"switches_%ls=",Cmd);
The analyzer also reported two other instances with incorrectly passed format-string parameters:
  • V576 Incorrect format. Consider checking the third actual argument of the 'fprintf' function. The char type argument is expected. vtansi.cpp 1033
  • V576 Incorrect format. Consider checking the third actual argument of the 'fprintf' function. The char type argument is expected. vtansi.cpp 1038

Conclusion

What conclusions can we draw about the Linux port of Far Manager? True, there are many defects, but it's just an alpha version after all, and the project is still developing. Static code analysis can help you find bugs at the earliest development stage and prevent them from making it to the repository, but to fully feel its advantages, you should run it regularly (or at least during night builds).
I offer you all to try PVS-Studio and evaluate its usefulness for yourself: the analyzer can run on Microsoft Windows and supports deb/rpm-based Linux distributions, allowing you to scan projects quickly and on a regular basis. What's more, if you are a student, an individual developer, or a developer of open-source, non-commercial software, you can use PVS-Studio for free.

In this video tutorial you may see how install PVS-Studio for Linux and check your project (using Far Manager as an example). If you know an interesting project worth checking, you may suggest it on GitHub. Here are more details about that: "Propose a project for analysis by PVS-Studio: now on GitHub".

No comments:

Post a Comment