Tuesday, March 28, 2017

An interesting bug in Entity Framework

Recently, we started a new hobby that is also a way to spread the word about our static code analyzer PVS-Studio. We check open-source projects and release patches with fixes. Today I would like to speak about one interesting bug that I found in Entity Framework project. I have already sent a patch to fix this error, you may find it here. But enough talking. The analyzer issued 2 warnings for one string:
  • V3014 It is likely that a wrong variable is being incremented inside the 'for' operator. Consider reviewing 'i'. EFCore ExpressionEqualityComparer.cs 214
  • V3015 It is likely that a wrong variable is being compared inside the 'for' operator. Consider reviewing 'i' EFCore ExpressionEqualityComparer.cs 214
This is actually not a rare case when the analyzer issues 2 or even 3 warnings for one line. The thing is that the incorrect code can be anomalous from several points of view at the same time.

Let us consider the code:
var memberInitExpression = (MemberInitExpression)obj;
....
for (var i = 0; i < memberInitExpression.Bindings.Count; i++)
{
  var memberBinding = memberInitExpression.Bindings[i];
  .... 
  switch (memberBinding.BindingType)
  {
    case ....
    case MemberBindingType.ListBinding:
      var memberListBinding = (MemberListBinding)memberBinding;
      for(var j=0; i < memberListBinding.Initializers.Count; i++)
      {
        hashCode += (hashCode * 397) ^
          GetHashCode(memberListBinding.Initializers[j].Arguments);
      }
      break;
    ....
   }
}
What's going on here? As we can see, we have 2 loops. In the first we see a counter to iterate the list memberInitExpression.Bindings, in the second - a counter to iterate the list memberListBinding.Initializers. But for some reason the second loop uses the counter from the first loop. It seemed very suspicious to me, so I decided to write a small unit test to check if it really is an error or just a tricky algorithm of the program.
The code of the unit test:
[ConditionalFact]
public void Compare_member_init_expressions_by_hash_code()
{
    MethodInfo addMethod = typeof(List<string>).GetMethod("Add");

    MemberListBinding bindingMessages = Expression.ListBind(
        typeof(Node).GetProperty("Messages"),
        Expression.ElementInit(addMethod, Expression.Constant(
          "Greeting from PVS-Studio developers!"))
    );

    MemberListBinding bindingDescriptions = Expression.ListBind(
        typeof(Node).GetProperty("Descriptions"),
        Expression.ElementInit(addMethod, Expression.Constant(
          "PVS-Studio is a static code analyzer for C, C++ and C#."))
    );

    Expression query1 = Expression.MemberInit(
        Expression.New(typeof(Node)),
        new List<MemberBinding>() {
          bindingMessages                    // One member
        }
    );

    Expression query2 = Expression.MemberInit(
        Expression.New(typeof(Node)),
        new List<MemberBinding>() {
          bindingMessages,                  // Two members
          bindingDescriptions
        }
    );

    var comparer = new ExpressionEqualityComparer();
    var key1Hash = comparer.GetHashCode(query1);
    var key2Hash = comparer.GetHashCode(query2);

    // The hash codes for both expressions 
    // were the same before my edit
    Assert.NotEqual(key1Hash, key2Hash);      // <=
}
My expectations were confirmed. It is a real error. The thing is that when comparing 2 expressions, only 2 first elements of the collections have always been compared, which led to incorrect results for different expressions with identical first elements. Taking into account that Entity Framework works with expressions very closely, and its main aim is transforming lambdas and Linq requests to the SQL requests, I think it should not be hard to guess what results could have such a serious bug.
According to Common Weakness Enumeration, the bug found can be classified as CWE-670 (Always-Incorrect Control Flow Implementation). It is not clear if this code weakness can be exploited as a vulnerability, but the bug is quite serious. This is a good demonstration that the PVS-Studio analyzer can be used to search for potential vulnerabilities. In fact, it has always been able to do this, we just haven't focused on this aspect of our analyzer. More details on this topic can be found in the article "PVS-Studio: searching security defects".

No comments:

Post a Comment