The Evil within the Comparison Functions
Perhaps, readers remember my article titled "Last line effect". It describes a pattern I've once noticed: in most cases programmers make an error in the last line of similar text blocks. Now I want to tell you about a new interesting observation. It turns out that programmers tend to make mistakes in functions comparing two objects. This statement looks implausible; however, I'll show you a great number of examples of errors that may be shocking to a reader. So, here is a new research, it will be quite amusing and scary.
Problematics
Here is my statement: programmers quite often make mistakes in rather simple functions that are meant to compare two objects. This claim is based on the experience of our team in checking a large number of open source projects in C, C++ and C#.
The functions we are going to consider here are IsEqual, Equals, Compare, AreEqual and so on or overloaded operators as ==, !=.
I noticed that when writing articles, very often I come across errors related to the comparison functions. I decided to explore this question in detail and examined the base of errors we found. I did a search of functions throughout the base containing words Cmp, Equal, Compare and such. The result was very impressive and shocking.
In fact this story is similar to the one we had when writing the article "Last line effect". Similarly, I noticed an anomaly and decided to explore it more carefully. Unfortunately, unlike the aforementioned article, I don't know how to bring statistics here and which figures to provide. Perhaps, later I'll come up with a solution with the statistics. At this point I am guided by intuition and can only share my feelings. They see that there are a lot of errors in the comparison functions and I am sure, you will get the same feeling when you see that huge amount of truly impressive examples.
Psychology
For a moment let's go back to the article "Last line effect". By the way, if you haven't read it, I suggest taking a break and looking at it. There is a more detailed analysis of this topic: "The last line effect explained"
In general, we can conclude that the cause of the errors in the last lined is related to the fact that the developer has already mentally moved to the new lines/tasks instead of focusing on the completion of the current fragment. As a result - when writing similar blocks of text, there is a higher probability that a programmer will make an error in the last one.
I believe that in the case of writing a comparison function, a developer in general often don't focus on it, considering it to be too trivial. In other words, he writes the code automatically, without thinking over it. Otherwise, it is not clear how one can make an error like this:
bool IsLuidsEqual(LUID luid1, LUID luid2) { return (luid1.LowPart == luid2.LowPart) && (luid2.HighPart == luid2.HighPart); }
PVS-Studio analyzer detected this error in the code of RunAsAdmin Explorer Shim (C++) project: V501 There are identical sub-expressions to the left and to the right of the '==' operator: luid2.HighPart == luid2.HighPart RAACommon raacommonfuncs.cpp 1511
A typo. In the second line it should be: luid1.HighPart == luid2.HighPart.
The code is very simple. Apparently, the simplicity of code spoils everything. A programmer immediately thinks of the task to write such a function as standard and uninteresting. He instantly thinks of the way to write the function and he has just to implement the code. This is a routine, but unfortunately an inevitable process to start writing more important, complex and interesting code. He is already thinking about the new task... and as a result - makes an error.
In addition, programmers rarely write unit tests for such functions. Again the simplicity of these functions prevents from it. It seems that it would be too much to test them, as these functions are simple and repetitive. A person has written hundreds of such functions in his life, can he make an error in another function? Yes, he can and he does.
I would also like to note that we aren't talking about code of students who are just learning to program. We are talking about bugs in the code of such projects as GCC, Qt, GDB, LibreOffice, Unreal Engine, CryEngine 4 V Chromium, MongoDB, Oracle VM Virtual Box, FreeBSD, WinMerge, the CoreCLR, MySQL, Mono, CoreFX, Roslyn, MSBuild, etc. It's all very serious.
We are going to have a look at so many diverse examples that it would be scary to sleep at night.
Erroneous Patterns in Comparison Functions
All errors in comparison functions will be divided into several patterns. In the article we'll be talking about errors in projects in C, C++ and C#, but it makes no sense to separate these languages, as most of the patterns are similar for different languages.
Pattern: A < B, B > A
Very often in the comparison functions there is a need to make such checks:
- A < B
- A > B
Sometimes programmers think that is more elegant to use the same operator <, but to switch the variables.
- A < B
- B < A
However, due to the inattentiveness, we get such checks:
- A < B
- B > A
In fact, one and the same comparison is done twice here. Perhaps, it's not clear what it is about here, but we'll get to the practical examples and it'll all become clearer.
string _server; .... bool operator<( const ServerAndQuery& other ) const { if ( ! _orderObject.isEmpty() ) return _orderObject.woCompare( other._orderObject ) < 0; if ( _server < other._server ) return true; if ( other._server > _server ) return false; return _extra.woCompare( other._extra ) < 0; }
PVS-Studio analyzer detected this error in the code of MongoDB (C++): V581 The conditional expressions of the 'if' operators situated alongside each other are identical. Check lines: 44, 46. parallel.h 46
This condition:
if ( other._server > _server )
Will always be false, as the same check was done two lines before. Correct code variant:
if ( _server < other._server ) return true; if ( other._server < _server ) return false;
This error was detected in the code of Chromium project (C++):
enum ContentSettingsType; struct EntryMapKey { ContentSettingsType content_type; ... }; bool OriginIdentifierValueMap::EntryMapKey::operator<( const OriginIdentifierValueMap::EntryMapKey& other) const { if (content_type < other.content_type) return true; else if (other.content_type > content_type) return false; return (resource_identifier < other.resource_identifier); }
PVS-Studio warning: V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 61, 63. browser content_settings_origin_identifier_value_map.cc 61
That was a C++ example, now it's C# turn. The next error was found in the code of IronPython and IronRuby (C#).
public static int Compare(SourceLocation left, SourceLocation right) { if (left < right) return -1; if (right > left) return 1; return 0; }
PVS-Studio warning (C#): V3021 There are two 'if' statements with identical conditional expressions. The first 'if' statement contains method return. This means that the second 'if' statement is senseless. SourceLocation.cs 156
I think there is no need in explanation.
Note. For C# there was just one example of an error, but for C++ - two. In general, there will be less bugs in the C# code, than for C/C++. But I do not recommend rushing to the conclusion that C# is much safer. The thing is that PVS-Studio analyzer has only recently learned to check C# code relatively recently, and we have just checked less projects written in C#, than in C and C++.
Pattern: a Member of the Class is Compared with itself
The comparison functions usually consist of successive comparisons of structure/class members. This code tends to be more erronreous, when the member of the class starts being compared with itself. I can specify two subtypes of errors.
In the first case, a programmer forgets to specify the name of the object and writes in the following way:
return m_x == foo.m_x && m_y == m_y && // <= m_z == foo.m_z; In the second case, the same name of the object is written. return zzz.m_x == foo.m_x && zzz.m_y == zzz.m_y && // <= zzz.m_z == foo.m_z;
Let's take a closer look at practical examples of this pattern. Pay attention that incorrect comparison often occurs in the last block of similar code blocks, which reminds us of the "last line effect" again.
The error is found in the code of Unreal Engine 4 (C++) project:
bool Compare(const FPooledRenderTargetDesc& rhs, bool bExact) const { .... return Extent == rhs.Extent && Depth == rhs.Depth && bIsArray == rhs.bIsArray && ArraySize == rhs.ArraySize && NumMips == rhs.NumMips && NumSamples == rhs.NumSamples && Format == rhs.Format && LhsFlags == RhsFlags && TargetableFlags == rhs.TargetableFlags && bForceSeparateTargetAndShaderResource == rhs.bForceSeparateTargetAndShaderResource && ClearValue == rhs.ClearValue && AutoWritable == AutoWritable; // <= }
PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '==' operator: AutoWritable == AutoWritable rendererinterface.h 180
The code of Samba (C) project:
static int compare_procids(const void *p1, const void *p2) { const struct server_id *i1 = (struct server_id *)p1; const struct server_id *i2 = (struct server_id *)p2; if (i1->pid < i2->pid) return -1; if (i2->pid > i2->pid) return 1; return 0; }
PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '>' operator: i2->pid > i2->pid brlock.c 1901
The code of MongoDB (C++) project:
bool operator==(const MemberCfg& r) const { .... return _id==r._id && votes == r.votes && h == r.h && priority == r.priority && arbiterOnly == r.arbiterOnly && slaveDelay == r.slaveDelay && hidden == r.hidden && buildIndexes == buildIndexes; // <= }
PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '==' operator: buildIndexes == buildIndexes rs_config.h 101
The code of Geant4 Software (C++) project:
inline G4bool G4FermiIntegerPartition:: operator==(const G4FermiIntegerPartition& right) { return (total == right.total && enableNull == enableNull && // <= partition == right.partition); }
PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '==' operator: enableNull == enableNull G4hadronic_deex_fermi_breakup g4fermiintegerpartition.icc 58
The code of LibreOffice (C++) project:
class SvgGradientEntry { .... bool operator==(const SvgGradientEntry& rCompare) const { return (getOffset() == rCompare.getOffset() && getColor() == getColor() // <= && getOpacity() == getOpacity()); // <= } .... }
PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '==' operator: getColor() == getColor() svggradientprimitive2d.hxx 61
The code of Chromium (C++) project:
bool FileIOTest::MatchesResult(const TestStep& a, const TestStep& b) { .... return (a.data_size == a.data_size && // <= std::equal(a.data, a.data + a.data_size, b.data)); }
PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '==' operator: a.data_size == a.data_size cdm_file_io_test.cc 367
The code of FreeCAD (C++) project:
bool FaceTypedBSpline::isEqual(const TopoDS_Face &faceOne, const TopoDS_Face &faceTwo) const { .... if (surfaceOne->IsURational() != surfaceTwo->IsURational()) return false; if (surfaceTwo->IsVRational() != // <= surfaceTwo->IsVRational()) // <= return false; if (surfaceOne->IsUPeriodic() != surfaceTwo->IsUPeriodic()) return false; if (surfaceOne->IsVPeriodic() != surfaceTwo->IsVPeriodic()) return false; if (surfaceOne->IsUClosed() != surfaceTwo->IsUClosed()) return false; if (surfaceOne->IsVClosed() != surfaceTwo->IsVClosed()) return false; if (surfaceOne->UDegree() != surfaceTwo->UDegree()) return false; if (surfaceOne->VDegree() != surfaceTwo->VDegree()) return false; .... }
PVS-Studio warning: V501 There are identical sub-expressions 'surfaceTwo->IsVRational()' to the left and to the right of the '!=' operator. modelrefine.cpp 780
The code of Serious Engine (C++) project:
class CTexParams { public: inline BOOL IsEqual( CTexParams tp) { return tp_iFilter == tp.tp_iFilter && tp_iAnisotropy == tp_iAnisotropy && // <= tp_eWrapU == tp.tp_eWrapU && tp_eWrapV == tp.tp_eWrapV; }; .... };
PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '==' operator: tp_iAnisotropy == tp_iAnisotropy gfx_wrapper.h 180
The code of Qt (C++) project:
inline bool qCompare(QImage const &t1, QImage const &t2, ....) { .... if (t1.width() != t2.width() || t2.height() != t2.height()) { .... }
PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '!=' operator: t2.height() != t2.height() qtest_gui.h 101
The code of FreeBSD (C) project:
static int compare_sh(const void *_a, const void *_b) { const struct ipfw_sopt_handler *a, *b; a = (const struct ipfw_sopt_handler *)_a; b = (const struct ipfw_sopt_handler *)_b; .... if ((uintptr_t)a->handler < (uintptr_t)b->handler) return (-1); else if ((uintptr_t)b->handler > (uintptr_t)b->handler) // <= return (1); return (0); }
PVS-Studio warning: V501 There are identical sub-expressions '(uintptr_t) b->handler' to the left and to the right of the '>' operator. ip_fw_sockopt.c 2893
The code of Mono (C#) project:
static bool AreEqual (VisualStyleElement value1, VisualStyleElement value2) { return value1.ClassName == value1.ClassName && // <= value1.Part == value2.Part && value1.State == value2.State; }
PVS-Studio warning: V3001 There are identical sub-expressions 'value1.ClassName' to the left and to the right of the '==' operator. ThemeVisualStyles.cs 2141
The code of Mono (C#) project:
public int ExactInference (TypeSpec u, TypeSpec v) { .... var ac_u = (ArrayContainer) u; var ac_v = (ArrayContainer) v; .... var ga_u = u.TypeArguments; var ga_v = v.TypeArguments; .... if (u.TypeArguments.Length != u.TypeArguments.Length) // <= return 0; .... }
PVS-Studio warning: V3001 There are identical sub-expressions 'u.TypeArguments.Length' to the left and to the right of the '!=' operator. generic.cs 3135
The code of MonoDevelop (C#) project:
Accessibility DeclaredAccessibility { get; } bool IsStatic { get; } private bool MembersMatch(ISymbol member1, ISymbol member2) { if (member1.Kind != member2.Kind) { return false; } if (member1.DeclaredAccessibility != // <=1 member1.DeclaredAccessibility // <=1 || member1.IsStatic != member1.IsStatic) // <=2 { return false; } if (member1.ExplicitInterfaceImplementations().Any() || member2.ExplicitInterfaceImplementations().Any()) { return false; } return SignatureComparer .HaveSameSignatureAndConstraintsAndReturnTypeAndAccessors( member1, member2, this.IsCaseSensitive); }
PVS-Studio warning: V3001 There are identical sub-expressions 'member1.IsStatic' to the left and to the right of the '!=' operator. CSharpBinding AbstractImplementInterfaceService.CodeAction.cs 545
The code of Haiku (C++) project:
int __CORTEX_NAMESPACE__ compareTypeAndID(....) { int retValue = 0; .... if (lJack && rJack) { if (lJack->m_jackType < lJack->m_jackType) // <= { return -1; } if (lJack->m_jackType == lJack->m_jackType) // <= { if (lJack->m_index < rJack->m_index) { return -1; } else { return 1; } } else if (lJack->m_jackType > rJack->m_jackType) { retValue = 1; } } return retValue; }
PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '<' operator: lJack->m_jackType < lJack->m_jackType MediaJack.cpp 783
Just below there is exactly the same error. As I understand, in both cases a programmer forgot to replace lJack with rJack.
The code of CryEngine V (C++) project:
bool CompareRotation(const Quat& q1, const Quat& q2, float epsilon) { return (fabs_tpl(q1.v.x - q2.v.x) <= epsilon) && (fabs_tpl(q1.v.y - q2.v.y) <= epsilon) && (fabs_tpl(q2.v.z - q2.v.z) <= epsilon) // <= && (fabs_tpl(q1.w - q2.w) <= epsilon); }
PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '-' operator: q2.v.z - q2.v.z entitynode.cpp 93
Pattern: Evaluating the Size of a Pointer Instead of the Size of the Structure/Class
This type of error occurs in programs written in C and C++ and is caused by incorrect use of the sizeof operator. The error in evaluating not the size of the object, but the size of the pointer. Example:
T *a = foo1(); T *b = foo2(); x = memcmp(a, b, sizeof(a));
Instead of the size of the T structure, a size of the pointer gets evaluated. The size of the pointer depends on the used data model, but usually it is 4 or 8. As a result, more or less bites in the memory get compared than take the structure.
Correct variant of the code:
x = memcmp(a, b, sizeof(T));
or
x = memcmp(a, b, sizeof(*a));
Now let's move on to the practical part. Here is how such a bug looks in the code of CryEngine V (C++) code:
bool operator==(const SComputePipelineStateDescription& other) const { return 0 == memcmp(this, &other, sizeof(this)); }
PVS-Studio warning: V579 The memcmp function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. graphicspipelinestateset.h 58
The code of Unreal Engine 4 project (C++):
bool FRecastQueryFilter::IsEqual( const INavigationQueryFilterInterface* Other) const { // @NOTE: not type safe, should be changed when // another filter type is introduced return FMemory::Memcmp(this, Other, sizeof(this)) == 0;
}
PVS-Studio warning: V579 The Memcmp function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. pimplrecastnavmesh.cpp 172
Pattern: Repetitive Arguments of Cmp(A, A) Type
Comparison functions usually call other comparison functions. At the same time one of the possible errors is that the reference/pointer is passed to the same object twice. Example:
x = memcmp(A, A, sizeof(T));
Here the object A will be compared with itself, which, is of course, has no sense.
We'll start with an error, found in the debugger GDB (C):
static int psymbol_compare (const void *addr1, const void *addr2, int length) { struct partial_symbol *sym1 = (struct partial_symbol *) addr1; struct partial_symbol *sym2 = (struct partial_symbol *) addr2; return (memcmp (&sym1->ginfo.value, &sym1->ginfo.value, // <= sizeof (sym1->ginfo.value)) == 0 && sym1->ginfo.language == sym2->ginfo.language && PSYMBOL_DOMAIN (sym1) == PSYMBOL_DOMAIN (sym2) && PSYMBOL_CLASS (sym1) == PSYMBOL_CLASS (sym2) && sym1->ginfo.name == sym2->ginfo.name); }
PVS-Studio warning: V549 The first argument of 'memcmp' function is equal to the second argument. psymtab.c 1580
The code of CryEngineSDK project (C++):
inline bool operator != (const SEfResTexture &m) const { if (stricmp(m_Name.c_str(), m_Name.c_str()) != 0 || // <= m_TexFlags != m.m_TexFlags || m_bUTile != m.m_bUTile || m_bVTile != m.m_bVTile || m_Filter != m.m_Filter || m_Ext != m.m_Ext || m_Sampler != m.m_Sampler) return true; return false; }
PVS-Studio warning: V549 The first argument of 'stricmp' function is equal to the second argument. ishader.h 2089
The code of PascalABC.NET (C#):
private List<string> enum_consts = new List<string>(); public override bool IsEqual(SymScope ts) { EnumScope es = ts as EnumScope; if (es == null) return false; if (enum_consts.Count != es.enum_consts.Count) return false; for (int i = 0; i < es.enum_consts.Count; i++) if (string.Compare(enum_consts[i], this.enum_consts[i], true) != 0) return false; return true; }
PVS-Studio warning: V3038 The 'enum_consts[i]' argument was passed to 'Compare' method several times. It is possible that other argument should be passed instead. CodeCompletion SymTable.cs 2206
I'll give some explanation here. The error in the factual arguments of the Compare function:
string.Compare(enum_consts[i], this.enum_consts[i], true)
The thing is that enum_consts[i] and this.enum_consts[i are the same things. As I understand, a correct call should be like this:
string.Compare(es.enum_consts[i], this.enum_consts[i], true)
or
string.Compare(enum_consts[i], es.enum_consts[i], true)
Pattern: Repetitive Checks A==B && A==B
Quite a common error in programming is when the same check is done twice. Example:
return A == B && C == D && // <= C == D && // <= E == F;
Two variants are possible in this case. The first is quite harmless: one comparison is redundant and can be simply removed. The second is worse: some other variables were to be compared, but a programmer made a typo.
In any case, such code deserves close attention. Let me scare you a little more, and show that this error can be found even in the code of GCC compiler (C):
static bool dw_val_equal_p (dw_val_node *a, dw_val_node *b) { .... case dw_val_class_vms_delta: return (!strcmp (a->v.val_vms_delta.lbl1, b->v.val_vms_delta.lbl1) && !strcmp (a->v.val_vms_delta.lbl1, b->v.val_vms_delta.lbl1)); .... }
PVS-Studio warning: V501 There are identical sub-expressions '!strcmp(a->v.val_vms_delta.lbl1, b->v.val_vms_delta.lbl1)' to the left and to the right of the '&&' operator. dwarf2out.c 1428
The function strcmp is called twice with the same set of arguments.
The code of Unreal Engine 4 project (C++):
FORCEINLINE bool operator==(const FShapedGlyphEntryKey& Other) const { return FontFace == Other.FontFace && GlyphIndex == Other.GlyphIndex // <= && FontSize == Other.FontSize && FontScale == Other.FontScale && GlyphIndex == Other.GlyphIndex; // <= }
PVS-Studio warning: V501 There are identical sub-expressions 'GlyphIndex == Other.GlyphIndex' to the left and to the right of the '&&' operator. fontcache.h 139
The code of Serious Engine project (C++):
inline BOOL CValuesForPrimitive::operator==(....) { return ( (....) && (vfp_ptPrimitiveType == vfpToCompare.vfp_ptPrimitiveType) && .... (vfp_ptPrimitiveType == vfpToCompare.vfp_ptPrimitiveType) && .... );
PVS-Studio warning: V501 There are identical sub-expressions '(vfp_ptPrimitiveType == vfpToCompare.vfp_ptPrimitiveType)' to the left and to the right of the '&&' operator. worldeditor.h 580
The code of Oracle VM Virtual Box project (C++):
typedef struct SCMDIFFSTATE { .... bool fIgnoreTrailingWhite; bool fIgnoreLeadingWhite; .... } SCMDIFFSTATE; /* Pointer to a diff state. */ typedef SCMDIFFSTATE *PSCMDIFFSTATE; /* Compare two lines */ DECLINLINE(bool) scmDiffCompare(PSCMDIFFSTATE pState, ....) { .... if (pState->fIgnoreTrailingWhite // <= || pState->fIgnoreTrailingWhite) // <= return scmDiffCompareSlow(....); .... }
PVS-Studio warning: V501 There are identical sub-expressions 'pState->fIgnoreTrailingWhite' to the left and to the right of the '||' operator. scmdiff.cpp 238
Pattern: Incorrect Use of the Value, Returned by memcmp Function
The memcmp function returns the following values of int type:
- < 0 - buf1 less than buf2;
- 0 - buf1 identical to buf2;
- > 0 - buf1 greater than buf2;
Please note that '>0' can be any number, not only 1. These numbers can be: 2, 3, 100, 256, 1024, 5555, 65536 and so on. This means that this result cannot be placed to a variable of the char and short type. The high bits can be lost, which might violate the logic of program execution.
Also this means that the result cannot be compared with constants 1 or -1. In other words, it is wrong to write this:
if (memcmp(a, b, sizeof(T)) == 1) if (memcmp(x, y, sizeof(T)) == -1)
Correct comparisons:
if (memcmp(a, b, sizeof(T)) > 0) if (memcmp(a, b, sizeof(T)) < 0)
The danger of this code is that it may successfully work for a long time. The errors may start showing up when moving to a new platform or with the change of the compiler version.
The code of ReactOS project (C++):
HRESULT WINAPI CRecycleBin::CompareIDs(....) { .... return MAKE_HRESULT(SEVERITY_SUCCESS, 0, (unsigned short)memcmp(pidl1->mkid.abID, pidl2->mkid.abID, pidl1->mkid.cb)); }
PVS-Studio warning: V642 Saving the 'memcmp' function result inside the 'unsigned short' type variable is inappropriate. The significant bits could be lost breaking the program's logic. recyclebin.cpp 542
The code of Firebird project (C++):
SSHORT TextType::compare(ULONG len1, const UCHAR* str1, ULONG len2, const UCHAR* str2) { .... SSHORT cmp = memcmp(str1, str2, MIN(len1, len2)); if (cmp == 0) cmp = (len1 < len2 ? -1 : (len1 > len2 ? 1 : 0)); return cmp; }
PVS-Studio warning: V642 Saving the 'memcmp' function result inside the 'short' type variable is inappropriate. The significant bits could be lost breaking the program's logic. texttype.cpp 338
The code of CoreCLR project (C++):
bool operator( )(const GUID& _Key1, const GUID& _Key2) const { return memcmp(&_Key1, &_Key2, sizeof(GUID)) == -1; }
PVS-Studio warning: V698 Expression 'memcmp(....) == -1' is incorrect. This function can return not only the value '-1', but any negative value. Consider using 'memcmp(....) < 0' instead. sos util.cpp 142
The code of OpenToonz project (C++):
bool TFilePath::operator<(const TFilePath &fp) const { .... char differ; differ = _wcsicmp(iName.c_str(), jName.c_str()); if (differ != 0) return differ < 0 ? true : false; .... }
PVS-Studio warning: V642 Saving the '_wcsicmp' function result inside the 'char' type variable is inappropriate. The significant bits could be lost, breaking the program's logic. tfilepath.cpp 328
Pattern: Incorrect Check of Null References
This error pattern is typical for C# programs. Sometimes in the comparison functions programmers write the type casting with the help of the as operator. The error is that inadvertently a programmer verifies against null not the new reference, but the original one. Let's take a look at a synthetic example:
ChildT foo = obj as ChildT; if (obj == null) return false; if (foo.zzz()) {}
The check if (obj == null) protects from the situation, if the obj variable contains a null reference. However, there is no protection from the case if it turns out that the as operator returns a null reference. The correct code should be like this:
ChildT foo = obj as ChildT; if (foo == null) return false; if (foo.zzz()) {}
Typically, this error occurs due to negligence of the programmer. Similar bugs are possible in the programs in C and C++, but I haven't found such a case in our error base.
The code of MonoDevelop project (C#):
public override bool Equals (object o) { SolutionItemReference sr = o as SolutionItemReference; if (o == null) return false; return (path == sr.path) && (id == sr.id); }
PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'o', 'sr'. MonoDevelop.Core SolutionItemReference.cs 81
The code of CoreFX (C#):
public override bool Equals(object comparand) { CredentialHostKey comparedCredentialKey = comparand as CredentialHostKey; if (comparand == null) { // This covers also the compared == null case return false; } bool equals = string.Equals(AuthenticationType, comparedCredentialKey.AuthenticationType, .... .... }
PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'comparand', 'comparedCredentialKey'. CredentialCache.cs 4007
The code of Roslyn project (C#):
public override bool Equals(object obj) { var d = obj as DiagnosticDescription; if (obj == null) return false; if (!_code.Equals(d._code)) return false; .... }
PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'obj', 'd'. DiagnosticDescription.cs 201
The code of Roslyn (C#):
protected override bool AreEqual(object other) { var otherResourceString = other as LocalizableResourceString; return other != null && _nameOfLocalizableResource == otherResourceString._nameOfLocalizableResource && _resourceManager == otherResourceString._resourceManager && _resourceSource == otherResourceString._resourceSource && .... }
PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'other', 'otherResourceString'. LocalizableResourceString.cs 121
The code of MSBuild project (C#):
public override bool Equals(object obj) { AssemblyNameExtension name = obj as AssemblyNameExtension; if (obj == null) // <= { return false; } .... }
PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'obj', 'name'. AssemblyRemapping.cs 64
The code of Mono project (C#):
public override bool Equals (object o) { UrlMembershipCondition umc = (o as UrlMembershipCondition); if (o == null) // <= return false; .... return (String.Compare (u, 0, umc.Url, ....) == 0); // <= }
PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'o', 'umc'. UrlMembershipCondition.cs 111
The code of Media Portal 2 project (C#):
public override bool Equals(object obj) { EpisodeInfo other = obj as EpisodeInfo; if (obj == null) return false; if (TvdbId > 0 && other.TvdbId > 0) return TvdbId == other.TvdbId; .... }
PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'obj', 'other'. EpisodeInfo.cs 560
The code of NASA World Wind project (C#):
public int CompareTo(object obj) { RenderableObject robj = obj as RenderableObject; if(obj == null) // <= return 1; return this.m_renderPriority.CompareTo(robj.RenderPriority); }
PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'obj', 'robj'. RenderableObject.cs 199
Pattern: Incorrect Loops
In some functions, collections of items are compared. Of course, different variant of the loops are used for its comparison. If a programmer writes the code inattentively, it's easy to mix something up, as it is with the comparison functions. Let's look at a few of these situations.
The code of Trans-Proteomic Pipeline (C++):
bool Peptide::operator==(Peptide& p) { .... for (i = 0, j = 0; i < this->stripped.length(), j < p.stripped.length(); i++, j++) { .... }
PVS-Studio warning: V521 Such expressions using the ',' operator are dangerous. Make sure the expression is correct. tpplib peptide.cpp 191
Note that the comma operator is used in the condition. The code is clearly incorrect, because the condition, written to the left of the coma is ignored. That is, the condition on the left is evaluated, but its result is not used in any way.
The code of Qt project (C++):
bool equals( class1* val1, class2* val2 ) const { ... size_t size = val1->size(); ... while ( --size >= 0 ){ if ( !comp(*itr1,*itr2) ) return false; itr1++; itr2++; } ... }
PVS-Studio warning: V547 Expression '-- size >= 0' is always true. Unsigned type value is always >= 0. QtCLucene arrays.h 154
The code of CLucene project (C++):
class Arrays { .... bool equals( class1* val1, class2* val2 ) const{ static _comparator comp; if ( val1 == val2 ) return true; size_t size = val1->size(); if ( size != val2->size() ) return false; _itr1 itr1 = val1->begin(); _itr2 itr2 = val2->begin(); while ( --size >= 0 ){ if ( !comp(*itr1,*itr2) ) return false; itr1++; itr2++; } return true; } .... }
PVS-Studio warning: V547 Expression '-- size >= 0' is always true. Unsigned type value is always >= 0. arrays.h 154
The code of Mono project (C#):
public override bool Equals (object obj) { .... for (int i=0; i < list.Count; i++) { bool found = false; for (int j=0; i < ps.list.Count; j++) { // <= if (list [i].Equals (ps.list [j])) { found = true; break; } } if (!found) return false; } return true; }
PVS-Studio warning: V3015 It is likely that a wrong variable is being compared inside the 'for' operator. Consider reviewing 'i' corlib-net_4_x PermissionSet.cs 607
Apparently, there is a typo here, and the variable j instead of i should be used in the nested loop:
for (int j=0; j < ps.list.Count; j++)
Pattern: A = getA(), B = GetA()
Quite often in the comparison functions a programmer has to write code of this kind:
if (GetA().x == GetB().x && GetA().y == GetB().y)
Intermediate variables are used to reduce the size of the conditions or for optimization:
Type A = GetA(); Type B = GetB(); if (A.x == B.x && A.y == B.y)
But inadvertently, a person sometimes makes a mistake and initializes temporary variables with the same value:
Type A = GetA(); Type B = GetA();
Now let's take a look at these errors in the code of real applications.
The code of LibreOffice project (C++):
bool CmpAttr( const SfxPoolItem& rItem1, const SfxPoolItem& rItem2) { .... bool bNumOffsetEqual = false; ::boost::optional<sal_uInt16> oNumOffset1 = static_cast<const SwFmtPageDesc&>(rItem1).GetNumOffset(); ::boost::optional<sal_uInt16> oNumOffset2 = static_cast<const SwFmtPageDesc&>(rItem1).GetNumOffset(); if (!oNumOffset1 && !oNumOffset2) { bNumOffsetEqual = true; } else if (oNumOffset1 && oNumOffset2) { bNumOffsetEqual = oNumOffset1.get() == oNumOffset2.get(); } else { bNumOffsetEqual = false; } .... }
PVS-Studio warning: V656 Variables 'oNumOffset1', 'oNumOffset2' are initialized through the call to the same function. It's probably an error or un-optimized code. Check lines: 68, 69. findattr.cxx 69
The code of Qt project (C++):
AtomicComparator::ComparisonResult IntegerComparator::compare(const Item &o1, const AtomicComparator::Operator, const Item &o2) const { const Numeric *const num1 = o1.as<Numeric>(); const Numeric *const num2 = o1.as<Numeric>(); if(num1->isSigned() || num2->isSigned()) .... }
PVS-Studio warning: V656 Variables 'num1', 'num2' are initialized through the call to the same function. It's probably an error or un-optimized code. Consider inspecting the 'o1.as < Numeric > ()' expression. Check lines: 220, 221. qatomiccomparators.cpp 221
Pattern: Sloppy Copying of the Code
A large amount of errors, cited previously can be called the consequences of sloppy Copy-Paste. They fell under some categories of the erroneous pattern and I decided that it would be logical to describe them in corresponding sections. However, I have several errors that have clearly appeared because of sloppy code copying, but I have no idea how to classify them. That's why I collected these errors here.
The code of CoreCLR project (C++):
int __cdecl Compiler::RefCntCmp(const void* op1, const void* op2) { .... if (weight1) { .... if (varTypeIsGC(dsc1->TypeGet())) { weight1 += BB_UNITY_WEIGHT / 2; } if (dsc1->lvRegister) { weight1 += BB_UNITY_WEIGHT / 2; } } if (weight1) { .... if (varTypeIsGC(dsc2->TypeGet())) { weight1 += BB_UNITY_WEIGHT / 2; // <= } if (dsc2->lvRegister) { weight2 += BB_UNITY_WEIGHT / 2; } } .... }
PVS-Studio warning: V778 Two similar code fragments were found. Perhaps, this is a typo and 'weight2' variable should be used instead of 'weight1'. clrjit lclvars.cpp 2702
The function was long that's why it is shortened for the article. If we examine the code of the function, we'll see that a part of the code was copied, but in one fragment a programmer forgot to replace the variable weight1 with weight2.
The code of WPF samples by Microsoft project (C#):
public int Compare(GlyphRun a, GlyphRun b) { .... if (aPoint.Y > bPoint.Y) // <= { return -1; } else if (aPoint.Y > bPoint.Y) // <= { result = 1; } else if (aPoint.X < bPoint.X) { result = -1; } else if (aPoint.X > bPoint.X) { result = 1; } .... }
PVS-Studio warning: V3003 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 418, 422. txtserializerwriter.cs 418
The code of PascalABC.NET project (C#):
public void CompareInternal(....) { .... else if (left is int64_const) CompareInternal(left as int64_const, right as int64_const); .... else if (left is int64_const) CompareInternal(left as int64_const, right as int64_const); .... }
PVS-Studio warning: V3003 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 597, 631. ParserTools SyntaxTreeComparer.cs 597
The code of SharpDevelop project (C#):
public int Compare(SharpTreeNode x, SharpTreeNode y) { .... if (typeNameComparison == 0) { if (x.Text.ToString().Length < y.Text.ToString().Length) return -1; if (x.Text.ToString().Length < y.Text.ToString().Length) return 1; } .... }
PVS-Studio warning: V3021 There are two 'if' statements with identical conditional expressions. The first 'if' statement contains method return. This means that the second 'if' statement is senseless NamespaceTreeNode.cs 87
The code of Coin3D (C++):
int SbProfilingData::operator == (const SbProfilingData & rhs) const { if (this->actionType != rhs.actionType) return FALSE; if (this->actionStartTime != rhs.actionStopTime) return FALSE; if (this->actionStartTime != rhs.actionStopTime) return FALSE; .... }
PVS-Studio warning: V649 There are two 'if' statements with identical conditional expressions. The first 'if' statement contains function return. This means that the second 'if' statement is senseless. Check lines: 1205, 1206. sbprofilingdata.cpp 1206
The code of Spring (C++):
bool operator < (const aiFloatKey& o) const {return mTime < o.mTime;} bool operator > (const aiFloatKey& o) const {return mTime < o.mTime;}
PVS-Studio warning: V524 It is odd that the body of '>' function is fully equivalent to the body of '<' function. assimp 3dshelper.h 470
And here is the last, particularly interesting code fragment that PVS-Studio analyzer found in MySQL project (C++).
static int rr_cmp(uchar *a,uchar *b) { if (a[0] != b[0]) return (int) a[0] - (int) b[0]; if (a[1] != b[1]) return (int) a[1] - (int) b[1]; if (a[2] != b[2]) return (int) a[2] - (int) b[2]; if (a[3] != b[3]) return (int) a[3] - (int) b[3]; if (a[4] != b[4]) return (int) a[4] - (int) b[4]; if (a[5] != b[5]) return (int) a[1] - (int) b[5]; // <= if (a[6] != b[6]) return (int) a[6] - (int) b[6]; return (int) a[7] - (int) b[7]; }
PVS-Studio warning: V525 The code containing the collection of similar blocks. Check items '0', '1', '2', '3', '4', '1', '6' in lines 680, 682, 684, 689, 691, 693, 695. sql records.cc 680
Most likely, a programmer wrote the first comparison, then the second and got bored. So he copied to the buffer a text block:
if (a[1] != b[1]) return (int) a[1] - (int) b[1];
A pasted it to the text of the program as many times as he needed. Then he changed indexes, but made a mistake in one place and got an incorrect comparison:
if (a[5] != b[5]) return (int) a[1] - (int) b[5];
Note. I discuss this error in more detail in my mini-book "The Ultimate Question of Programming, Refactoring, and Everything" (see a chapter "Don't do the compiler's job").
Pattern: Equals Method Incorrectly Processes a Null Reference
In C# the accepted practice is to implement the Equals methods in such a way, so that they correctly process a situation, if a null reference is passed as an argument. Unfortunately, not all the methods are implemented according to this rule.
The code of GitExtensions (C#):
public override bool Equals(object obj) { return GetHashCode() == obj.GetHashCode(); // <= }
PVS-Studio warning: V3115 Passing 'null' to 'Equals(object obj)' method should not result in 'NullReferenceException'. Git.hub Organization.cs 14
The code of PascalABC.NET project (C#):
public override bool Equals(object obj) { var rhs = obj as ServiceReferenceMapFile; return FileName == rhs.FileName; }
PVS-Studio warning: V3115 Passing 'null' to 'Equals' method should not result in 'NullReferenceException'. ICSharpCode.SharpDevelop ServiceReferenceMapFile.cs 31
Miscellaneous Errors
The code of G3D Content Pak project (C++):
bool Matrix4::operator==(const Matrix4& other) const { if (memcmp(this, &other, sizeof(Matrix4) == 0)) { return true; } ... }
PVS-Studio warning: V575 The 'memcmp' function processes '0' elements. Inspect the 'third' argument. graphics3D matrix4.cpp 269
One closing bracket is put incorrectly. As a result, the amount of bites compared is evaluated by the statement sizeof(Matrix4) == 0. The size of any class is more than 0, which means that the result of the expression is 0. Thus, 0 bites get compared.
Correct variant:
if (memcmp(this, &other, sizeof(Matrix4)) == 0) {
The code of Wolfenstein 3D project (C++):
inline int operator!=( quat_t a, quat_t b ) { return ( ( a.x != b.x ) || ( a.y != b.y ) || ( a.z != b.z ) && ( a.w != b.w ) ); }
PVS-Studio warning: V648 Priority of the '&&' operation is higher than that of the '||' operation. math_quaternion.h 167
Apparently, in one fragment the && operator was accidentally written instead of ||.
The code of FlightGear project (C):
static int tokMatch(struct Token* a, struct Token* b) { int i, l = a->strlen; if(!a || !b) return 0; .... }
PVS-Studio warning: V595 The 'a' pointer was utilized before it was verified against nullptr. Check lines: 478, 479. codegen.c 478
If we pass NULL as the first argument to the function, we'll get null pointer dereference, although the programmer wanted the function to return 0.
The code of WinMerge project (C++):
int TimeSizeCompare::CompareFiles(int compMethod, const DIFFITEM &di) { UINT code = DIFFCODE::SAME; ... if (di.left.size != di.right.size) { code &= ~DIFFCODE::SAME; code = DIFFCODE::DIFF; } ... }
PVS-Studio warning: V519 The 'code' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 79, 80. Merge timesizecompare.cpp 80
The code of ReactOS project (C++):
#define IsEqualGUID(rguid1, rguid2) \ (!memcmp(&(rguid1), &(rguid2), sizeof(GUID))) static int ctl2_find_guid(....) { MSFT_GuidEntry *guidentry; ... if (IsEqualGUID(guidentry, guid)) return offset; ... }
PVS-Studio warning: V512 A call of the 'memcmp' function will lead to underflow of the buffer 'guidentry'. oleaut32 typelib2.c 320
A pointer is written here as the first argument. As a result, the address of the pointer gets evaluated, which has no sense.
Correct variant:
if (IsEqualGUID(*guidentry, guid)) return offset;
The code of IronPython and IronRuby project (C#):
public static bool Equals(float x, float y) { if (x == y) { return !Single.IsNaN(x); } return x == y; }
PVS-Studio warning: V3024 An odd precise comparison: x == y. Consider using a comparison with defined precision: Math.Abs(A - B) < Epsilon. FloatOps.cs 1048
It's not clear what is the point of a special check against NaN here. If the condition (x == y) is true, it means that both x and y and different from NaN, because NaN isn't equal to any other value, including itself. It seems that the check against NaN is just not necessary, and the code can be shortened to:
public static bool Equals(float x, float y) { return x == y; }
The code of Mono project (C#):
public bool Equals (CounterSample other) { return rawValue == other.rawValue && baseValue == other.counterFrequency && // <= counterFrequency == other.counterFrequency && // <= systemFrequency == other.systemFrequency && timeStamp == other.timeStamp && timeStamp100nSec == other.timeStamp100nSec && counterTimeStamp == other.counterTimeStamp && counterType == other.counterType; }
PVS-Studio warning: V3112 An abnormality within similar comparisons. It is possible that a typo is present inside the expression 'baseValue == other.counterFrequency'. System-net_4_x CounterSample.cs 139
How Do these Programs Work at all?
Looking through all the errors, it seems miraculous that all these programs generally work. Indeed, the comparison functions do a very important and responsible task in program.
There are several explanations of why these programs work despite these errors:
- In a lot of functions, only a part of the object is compared incorrectly. The partial comparison is enough for most of the tasks in this program.
- There are no situations (yet) when the function works incorrectly. For example, this applies to the functions that aren't protected from null pointers or those, where the result of the memcmp function call is placed into the variable of char type. The program is simply lucky.
- The reviewed comparison function is used very rarely or not used at all.
- Who said that the program is working? A lot of programs really do something wrong!
Recommendations
I demonstrated how many errors can be found in the comparison functions. It follows that the efficiency of these functions should be checked with unit-tests by all means.
It is really necessary to write unit-tests for the comparison operators, for Equals functions and so on.
I am quite sure that there was such an understanding among programmers before reading this article, that unit tests for such functions is extra work and they won't detect any errors anyway: the comparison functions are just so simple at the first glance... Well, now I showed the horror that can hide in them.
Code reviews and using static analysis tools would also be a great help.
Conclusion
In this article we mentioned a large amount of big-name projects that are developed by highly qualified experts. These projects are thoroughly tested using different methodologies. Still, it didn't stop PVS-Studio from finding errors in them. This shows that PVS-Studio can become a nice complement to other methodologies used to improve the quality and reliability of the code.
Visit our site and try PVS-Studio yourself.
by Andrey Karpov
No comments:
Post a Comment