Monday, August 29, 2016

How to not shoot yourself in the foot when working with serialization

Despite the fact that it's quite easy and comfortable to use the serialization mechanism in C#, there are some points that are worth taking note of. This article is about the ways in which you might shoot yourself in the foot working with serialization, code examples, where the main pitfalls are, and also about the way PVS-Studio can help you avoid getting into trouble.
Picture 1

Who is this article for?

This article will be especially useful to those who are only starting to familiarize themselves with the serialization mechanism. More experienced programmers may also learn something interesting, or just be reassured that even professionals make mistakes.
However, it is assumed that the reader is already somewhat familiar with the serialization mechanism.
But what has PVS-Studio got to do with it? In the 6.05 release we have added 6 diagnostic rules which detect suspicious code, using the serialization mechanism. These diagnostics look mainly for problem areas related to the [Serializable] attribute, or the implementation of the ISerializable interface.
Note.
We should understand that the statements described in the article are relevant for some serializers, for example - BinaryFormatter and SoapFormatter; for others, which are manually written serializers, the behavior can be different. For example, the absence of the attribute [Serializable] for the class may not prevent serialization and deserialize it with a custom serializer.
By the way, if you're working with serialization, I advise you to download the trial version of the analyzer, and check your code to see suspicious fragments.

Implementing ISerializable, don't forget about the serialization constructor

The implementation of the type of the ISerializable interface helps control the serialization, choosing which members need to be serialized, which of them - not, which values should be written during the serialization of the members, and so on.
Picture 2
ISerializable interface contains a declaration of one method - GetObjectData, that will be called upon the object serialization. But together with this method, we should always have a constructor implemented that will be called when the object is deserialized. As the interface cannot oblige you to implement a constructor in the class, this task goes to the programmer who is doing the serialization of the serializable type. The serialization constructor has the following signature:
Ctor(SerializationInfo, StreamingContext)
Without this constructor, the serialization of the object will be successful, (assuming that theGetObjectData method is implemented correctly), but it will be impossible to restore (deserialize) it - we'll have the exception SerializationException thrown.
Let's look at an example of such code from a Glimpse project:
[Serializable]
internal class SerializableTestObject : ISerializable
{
  public string TestProperty { get; set; }

  public void GetObjectData(SerializationInfo info, 
                            StreamingContext context)
  {
    info.AddValue("TestProperty", this.TestProperty);
  }
}
PVS-Studio warning: V3094 Possible exception when deserializing. The SerializableTestObject(SerializationInfo, StreamingContext) constructor is missing. Glimpse.Test.AspNet SessionModelConverterShould.cs 111
The serialization of the item of this class will be successful, but during the deserialization we'll have an exception, because there is no appropriate constructor. This is most likely not an error (judging by the class and file name), but as an illustration of the situation, it works well.
The serialization constructor for this class might look like this:
protected SerializableTestObject(SerializationInfo info, 
                                 StreamingContext context)
{
  TestProperty = info.GetString(nameof(TestProperty));
}

Pay attention to the access modifier of the serialization constructor

When writing a type that implements ISerializable interface it's very important to define the access modifier for the serialization constructor. There are several possible ways:
  • the serialization constructor is declared with the private modifier in an unsealed class;
  • the serialization constructor is declared with an access modifier public or internal;
  • the serialization constructor is declared with the protected modifier in a sealed class.
The first variant is of the greatest interest to us, as it can be the most dangerous one. Let's briefly look at the second point, the third one isn't that useful - the compiler won't declare the member with theprotected modifier in the structure (compilation error), if this class is declared in the sealed class, the compiler will issue a warning.

The serialization constructor in an unsealed class has an access modifier 'private'.

This is the most dangerous type of situation, where the access modifiers are applied incorrectly to the serialization constructors. If the type is unsealed, it is implied that it may have descendants. However, if the serialization constructor has a private access modifier, it cannot be called from a child class.
In this case, the developer of the child class has 2 options - either not use the parent class at all, or deserialize the members of a base class manually. It is worth noting that the second case can hardly be considered a solution to the problem:
  • there is no certainty that a trivial member deserialization is provided in the base class;
  • the developer of the child class may forget to deserialize a base class member;
  • In spite of wanting to do so, it will be impossible to deserialize private members of the base class.
Therefore, when writing an unsealed serializable class, pay attention to the access modifier that has the serialization constructor.
During the analysis we found several projects where this rule wasn't complied with.
NHibernate
[Serializable]
public class ConnectionManager : ISerializable, 
                                 IDeserializationCallback
{
  ....
  private ConnectionManager(SerializationInfo info, 
                            StreamingContext context)
  {
    ....
  }
  ....
}
PVS-Studio warning: V3103 A private Ctor(SerializationInfo, StreamingContext) constructor in unsealed type will not be accessible when deserializing derived types. NHibernate ConnectionManager.cs 276
Roslyn
[Serializable]
private class TestDiagnostic : Diagnostic, ISerializable
{
  ....
  private TestDiagnostic (SerializationInfo info, 
                          StreamingContext context)
  {
    ....
  }
  ....
}
PVS-Studio warning: V3103 A private TestDiagnostic(SerializationInfo, StreamingContext) constructor in unsealed type will not be accessible when deserializing derived types. DiagnosticAnalyzerTests.cs 100
In both examples, given above, the developer should have set the access modifier protected for the serialization constructor, so that the child classes could call it during the deserialization.

Do not declare the serialization constructor with modifiers 'public' or 'internal'

This is a "good coding style" tip. The declaration of the serialization constructor with the modifier publicor internal won't lead to an error, but there is no point in doing this - this constructor is not intended to be used externally, and there is no difference for the serializer, which access modifier has the constructor.
When checking open source projects we saw several cases where this rule wasn't taken into account.
MSBuild
[Serializable]
private sealed class FileState : ISerializable
{
  ....
  internal SystemState(SerializationInfo info, 
                       StreamingContext context)
  {
    ....
  }
  ....
}
PVS-Studio warning: V3103 The Ctor(SerializationInfo, StreamingContext) constructor should be used for deserialization. Making it internal is not recommended. Consider making it private. Microsoft.Build.Tasks SystemState.cs 218
[Serializable]
private sealed class FileState : ISerializable
{
  ....
  internal FileState(SerializationInfo info, StreamingContext context)
  {
    ....
  }
  ....
}
PVS-Studio warning: V3103 The Ctor(SerializationInfo, StreamingContext) constructor should be used for deserialization. Making it internal is not recommended. Consider making it private. Microsoft.Build.Tasks SystemState.cs 139
In both cases, the access modifier private should have been set for the serialization constructor, because both classes are sealed.
NHibernate
[Serializable]
public class StatefulPersistenceContext : IPersistenceContext,   
                                          ISerializable, 
                                          IDeserializationCallback
{
  ....
  internal StatefulPersistenceContext(SerializationInfo info, 
                                      StreamingContext context)
  {
    ....
  }
  ....
}
PVS-Studio warning: V3103 The Ctor(SerializationInfo, StreamingContext) constructor should be used for deserialization. Making it internal is not recommended. Consider making it protected. NHibernate StatefulPersistenceContext.cs 1478
[Serializable]
public class Configuration : ISerializable
{
  ....
  public Configuration(SerializationInfo info, 
                       StreamingContext context)
  {
   ....
  }
  ....
}
PVS-Studio warning: V3103 The Ctor(SerializationInfo, StreamingContext) constructor should be used for deserialization. Making it public is not recommended. Consider making it protected. NHibernate Configuration.cs 84
Considering the fact that both classes are unsealed, we should have set protected as an access modifier for the serialization constructors.

Implement GetObjectData virtual method in unsealed classes

The rule is simple - when you are writing an unsealed class, implementing the ISerializable interface, declare the method GetObjectData with the virtual modifier. This will allow the child classes perform correct serialization of the object when using polymorphism.
Picture 3
To see the situation more clearly, I suggest having a look at several examples.
Suppose we have the following declarations of the parent and child classes.
[Serializable]
class Base : ISerializable
{
  ....
  public void GetObjectData(SerializationInfo info, 
                            StreamingContext context)
  {
    ....
  }
}

[Serializable]
sealed class Derived : Base
{
  ....
  public new void GetObjectData(SerializationInfo info, 
                                StreamingContext context)
  {
    ....
  }
}
Suppose we have a method of serialization and deserialization of an object:
void Foo(BinaryFormatter bf, MemoryStream ms)
{
  Base obj = new Derived();
  bf.Serialize(ms, obj);
  ms.Seek(0, SeekOrigin.Begin);
  Derived derObj = (Derived)bf.Deserialize(ms);
}
In this case, the serialization will be done incorrectly because the GetObjectData method will be called not for the parent, but for the child class. Consequently, members of the child class will not be serialized. If during the deserialization from the object of SerializationInfo we get member values, added in the method GetObjectData of the child class, we'll have an exception thrown, as the object ofSerializationInfo type won't contain required keys.
To correct an error in the parent class to the GetObjectData method, we should add the virtual modifier, in a derived class - override.
But, if in the parent class there is only explicit implementation of ISerializable interface, you won't be able to add a virtual modifier. However, leaving everything as it is, you run the risk of complicating the lives of developers of the child classes.
Let's look at an example of implementation of the parent and child classes:
[Serializable]
class Base : ISerializable
{
  ....
  void ISerializable.GetObjectData(SerializationInfo info, 
                                   StreamingContext context)
  {
    ....
  }
}

[Serializable]
sealed class Derived : Base, ISerializable
{
  ....
  public void GetObjectData(SerializationInfo info, 
                            StreamingContext context)
  {
    ....
  }
}
In this case, we won't be able to access the GetObjectData method of the parent class from the child class. Also, if we have private members serialized in the base method, it won't be possible to access them from a child class, which means that we won't be able to have correct serialization as well. To fix this error, we should add implicit implementation to a base class of a virtual method GetObjectData, besides the explicit implementation. Then the corrected code might look like this:
[Serializable]
class Base : ISerializable
{
  ....
  void ISerializable.GetObjectData(SerializationInfo info, 
                                    StreamingContext context)
  {
    GetObjectData(info, context);
  }

  public virtual void GetObjectData(SerializationInfo info, 
                                    StreamingContext context)
  {
    ....
  }
}

[Serializable]
sealed class Derived : Base
{
  ....
  public override void GetObjectData(SerializationInfo info, 
                                     StreamingContext context)
  {
    ....
    base.GetObjectData(info, context);
  }
}
Or, if we don't mean to do the inheritance of this class, we should make it sealed, by adding a sealedmodifier to the class declaration.
Roslyn
[Serializable]
private class TestDiagnostic : Diagnostic, ISerializable
{
  private readonly string _kind;
  ....
  private readonly string _message;
  ....
  void ISerializable.GetObjectData(SerializationInfo info,  
                                   StreamingContext context)
  {
    info.AddValue("id", _descriptor.Id);
    info.AddValue("kind", _kind);
    info.AddValue("message", _message);
    info.AddValue("location", _location, typeof(Location));
    info.AddValue("severity", _severity, typeof(DiagnosticSeverity));
    info.AddValue("defaultSeverity", _descriptor.DefaultSeverity,
                   typeof(DiagnosticSeverity));
    info.AddValue("arguments", _arguments, typeof(object[]));
  }
  ....
}
PVS-Studio warning: V3104 'GetObjectData' implementation in unsealed type 'TestDiagnostic' is not virtual, incorrect serialization of derived type is possible. CSharpCompilerSemanticTest DiagnosticAnalyzerTests.cs 112
TestDiagnostic is unsealed (although it is private, so there can be inheritance from it in the frames of the same class), but with that, it has only explicit implementation of ISerializable interfacein which we have the private members serialized. This means the following: the developer of the child class won't be able to serialize the necessary members: the method GetObjectData is not available, and the access modifier won't allow access to the members directly.
It would be better to move the whole serialization code, given above, to the virtual methodGetObjectData, and to use it from the explicit interface implementation.
void ISerializable.GetObjectData(SerializationInfo info, 
                                 StreamingContext context)
{
  GetObjectData(info, context);
}

public virtual void GetObjectData(SerializationInfo info,
                                  StreamingContext context)
{
  info.AddValue("id", _descriptor.Id);
  info.AddValue("kind", _kind);
  info.AddValue("message", _message);
  info.AddValue("location", _location, typeof(Location));
  info.AddValue("severity", _severity, typeof(DiagnosticSeverity));
  info.AddValue("defaultSeverity", _descriptor.DefaultSeverity,
                typeof(DiagnosticSeverity));
  info.AddValue("arguments", _arguments, typeof(object[]));
}

All the serializable members must have a serializable type

This condition is mandatory for the proper serialization of an object, regardless of whether it is an automatic serialization (when the type is annotated with the [Serializable] attribute, and when it does not implement the ISerializable interface), or the serialization is performed manually (ISerializableimplemented).
Otherwise, if during the serialization we have a member that is not annotated with the [Serializable]attribute, we'll have the exception thrown of the SerializationException type.
If you want to serialize an object without the members having a non-serializable type, there are several possible variants:
  • make a non-serializable type serializable;
  • if there is automatic serialization, annotate the fields that aren't meant to be serialized with an attribute [NonSerialized];
  • if you do manual serialization, just ignore those members, that you don't need.
Pay attention to the fact that the [NonSerialized] attribute can only be applied to fields. Thus, you will not be able to prevent the serialization of a property, but, if it has a non-serializable type - you will get an exception. For example, when trying to serialize SerializedClass, the definition is given below:
sealed class NonSerializedType { }

[Serializable]
sealed class SerializedClass
{
  private Int32 value;
  public NonSerializedType NSProp { get; set; }
}
We work around this situation by implementing a property through a field, annotated by an attribute[NonSerialized]:
[Serializable]
sealed class SerializedClass
{
  private Int32 value;

  [NonSerialized]
  private NonSerializedType nsField;

  public NonSerializedType NSProp
  {
    get { return nsField; }
    set { nsField = value; }
  }
}
The diagnostic rule V3097 of the PVS-Studio static analyzer is able to detect errors such as the serializable type having members of non-serializable types, not annotated by the [NonSerialized]attribute.
But again, I should mention that this warning doesn't always detect a real error - everything will depend on the serializer that is being used.
Lets' have a look at the code fragments where this condition was violated.
Subtext
public class BlogUrlHelper
{
  ....
}

[Serializable]
public class AkismetSpamService : ICommentSpamService
{
  ....
  readonly BlogUrlHelper _urlHelper;
  ....
}
PVS-Studio warning: V3097 Possible exception: the 'AkismetSpamService' type marked by [Serializable] contains non-serializable members not marked by [NonSerialized]. Subtext.Framework AkismetSpamService.cs 31
The type BlogUrlHelper of the filed _urlHelper isn't serializable, so if you try to serialize the instance ofAkismetSpamService with some serializers, we'll get the exception of SerializationException type thrown. We should solve the problem based on the situation. If you use serializers of BinaryFormatter orSoapFormatter type - it's necessary to annotate the field with the attribute [NonSerialized] or annotate the BlogUrlHepler type with the [Serializable] attributeIf you use other serializers which do not require the [Serializable] attribute in the serializable fields, then it's much simpler.
NHibernate
public class Organisation
{
 ....
}

[Serializable]
public class ResponsibleLegalPerson  
{
  ....
  private Organisation organisation;
  ....
}
PVS-Studio warning: V3097 Possible exception: the 'ResponsibleLegalPerson' type marked by [Serializable] contains non-serializable members not marked by [NonSerialized]. NHibernate.Test ResponsibleLegalPerson.cs 9
The situation is the same as above - it's all or nothing. It all depends on the serializer.

Don't forget the [Serializable] attribute when implementing ISerializable interface

This advice applies to those who are just starting to work with serialization. Controlling the serialization manually, by implementing the ISerializable interface, it is easy to forget to annotate the type with[Serializable], which may potentially lead to the exception of SerializationException type. Serializers likeBinaryFormatter require such an attribute.
Picture 4
SharpDevelop
An interesting example of this error in the SharpDevelop project.
public class SearchPatternException : Exception, ISerializable
{
  ....
  protected SearchPatternException(SerializationInfo info, 
                                   StreamingContext context) 
    : base(info, context)
  {
  }
}
PVS-Studio warning: V3096 Possible exception when serializing 'SearchPatternException' type. [Serializable] attribute is missing. ICSharpCode.AvalonEdit ISearchStrategy.cs 80
public class DecompilerException : Exception, ISerializable
{
  ....
  protected DecompilerException(SerializationInfo info, 
                                StreamingContext context) 
    : base(info, context)
  {
  }
}
PVS-Studio warning: V3096 Possible exception when serializing 'DecompilerException' type. [Serializable] attribute is missing. ICSharpCode.Decompiler DecompilerException.cs 28
To pass the exception object between the application domains, we have its serialization and deserialization. Accordingly, the types of exception should be serializable. In the examples given above the types SearchPatternException, and DecompilerException, are inherited from Exception and implement serialization constructors, but at the same time are not annotated by the [Serializable]attribute, which means that when attempting to serialize objects of these types (for example, to transfer between the domains), we'll have an exception of SerializationException type generated. Thus, for example, by throwing an exception in another application domain, you will catch not the thrown exception, but SerializationException.

Make sure that in the GetObjectData, all the required type members get serialized

By implementing the ISerializable interface, and defining the GetObjectData method, you are taking responsibility for the members of the type that will be serialized, and the values that will be written there. In this case, the developers are offered a great scope in managing serialization: as the serializable value, associated with the member (to be honest - with any string), you can write the actual value of the serialized object, the result of work of some method, constant, or literal value - anything you want.
However, in this case great responsibility falls on the shoulders of the developer, because he should remember all the members that are meant to serialized, even if they are in the base class. We are all just human beings, so sometimes some members remain forgotten.
There is a special rule V3099 in PVS-Studio analyzer to detect such situations. I suggest looking at some code examples that were detected by this rule.
SharpDevelop
[Serializable]
public abstract class XshdElement
{
  public int LineNumber { get; set; }
  
  public int ColumnNumber { get; set; }
  
  public abstract object AcceptVisitor(IXshdVisitor visitor);
}

[Serializable]
public class XshdColor : XshdElement, ISerializable
{
  ....
  public virtual void GetObjectData(SerializationInfo info,        
                                    StreamingContext context)
  {
    if (info == null)
      throw new ArgumentNullException("info");
    info.AddValue("Name", this.Name);
    info.AddValue("Foreground", this.Foreground);
    info.AddValue("Background", this.Background);
    info.AddValue("HasUnderline", this.Underline.HasValue);
    if (this.Underline.HasValue)
      info.AddValue("Underline", this.Underline.Value);
    info.AddValue("HasWeight", this.FontWeight.HasValue);
    if (this.FontWeight.HasValue)
      info.AddValue("Weight", this.FontWeight
                                  .Value
                                  .ToOpenTypeWeight());
    info.AddValue("HasStyle", this.FontStyle.HasValue);
    if (this.FontStyle.HasValue)
      info.AddValue("Style", this.FontStyle.Value.ToString());
    info.AddValue("ExampleText", this.ExampleText);
  }
}
PVS-Studio warning: V3099 Not all the members of 'XshdColor' type are serialized inside 'GetObjectData' method: LineNumber, ColumnNumber. ICSharpCode.AvalonEdit XshdColor.cs 101
In this code there are no problems described above, like incorrect access modifiers in the serialization constructor, or missing [Serializable] attribute, or virtual modifier for the GetObjectData method.
Alas, there is still an error here. In the GetObjectData method, the properties of the base class aren't taken into account, which means that some data will be lost during the serialization. As a result, during the deserialization, an object will be restored with a different state.
In this case, the solution is to manually add the necessary values, as follows, for example:
info.AddValue(nameof(LineNumber), LineNumber);
info.AddValue(nameof(ColumnNumber), ColumnNumber);
If the base class had also implemented the ISerializable interface, the solution would have been more elegant - the call in the derived method GetObjectData of the base one.
NHibernate
[Serializable]
public sealed class SessionImpl : AbstractSessionImpl, 
                                  IEventSource, 
                                  ISerializable, 
                                  IDeserializationCallback
{
  ....
  void ISerializable.GetObjectData(SerializationInfo info, 
                                   StreamingContext context)
  {
    log.Debug("writting session to serializer");

    if (!connectionManager.IsReadyForSerialization)
    {
      throw new InvalidOperationException("Cannot serialize a Session 
                                           while connected");
    }

    info.AddValue("factory", Factory, typeof(SessionFactoryImpl));
    info.AddValue("persistenceContext", persistenceContext, 
                   typeof(StatefulPersistenceContext));
    info.AddValue("actionQueue", actionQueue, typeof(ActionQueue));
    info.AddValue("timestamp", timestamp);
    info.AddValue("flushMode", flushMode);
    info.AddValue("cacheMode", cacheMode);

    info.AddValue("interceptor", interceptor, typeof(IInterceptor));

    info.AddValue("enabledFilters", enabledFilters, 
                   typeof(IDictionary<string, IFilter>));
    info.AddValue("enabledFilterNames", enabledFilterNames, 
                   typeof(List<string>));

    info.AddValue("connectionManager", connectionManager, 
                   typeof(ConnectionManager));
  }
  .... 
  private string fetchProfile;
  ....
}
PVS-Studio warning: V3099 Not all the members of 'SessionImpl' type are serialized inside 'GetObjectData' method: fetchProfile. NHibernate SessionImpl.cs 141
This time the field of the current class (fetchProfile) has been forgotten to be serialized. As you can see in the declaration, it is not annotated by the [NonSerialized] attribute (in contrast to the other fields, that are not serializable in the GetObjectData method).
There were two more similar fragments in the project:
  • V3099 Not all the members of 'Configuration' type are serialized inside 'GetObjectData' method: currentDocumentName, preMappingBuildProcessed. NHibernate Configuration.cs 127
  • V3099 Not all the members of 'ConnectionManager' type are serialized inside 'GetObjectData' method: flushingFromDtcTransaction. NHibernate ConnectionManager.cs 290
There is quite an interesting thing about errors of this kind - they either lead to throwing of an exception, or to logical errors that are really hard to detect.
The exception will be thrown in the case where in the serialization constructor, the programmer attempts to get the value of the field that has just been added (and access by the missing key). If the member was forgotten altogether (both in the GetObjectData and in the serialization constructor), then the state of the object will get damaged.

Summary

Briefly summarizing all the information, we can formulate several tips and rules:
  • Annotate the types, implementing the ISerializable interface with the [Serializable] attribute.
  • Make sure that all members annotated by the [Serializable] attribute get correctly serialized;
  • Implementing the ISerializable interface, don't forget to implement the serialization constructor (Ctor(SerializationInfo, StreamingContext));
  • In the sealed types, set the access modifier private for a serialization constructor, in the unsealed -protected;
  • In the unsealed types implementing the ISerializable interface, make the GetObjectData method virtual;
  • Check that in the GetObjectData all the necessary members get serialized, including members of the base class if there are such.
Picture 5

Conclusion

I hope you learned something new from this article, and have become a expert in the sphere of serialization. Sticking to the rules and following the tips that we have given above, you will save time debugging the program, and make life easier for yourself, and other developers working with your classes. PVS-Studio analyzer will also be of great help, allowing you to detect such errors right after they appear in your code.

Additional information

  • V3094. Possible exception when deserializing type. The Ctor(SerializationInfo, StreamingContext) constructor is missing
  • V3096. Possible exception when serializing type. [Serializable] attribute is missing
  • V3097. Possible exception: type marked by [Serializable] contains non-serializable members not marked by [NonSerialized]
  • V3099. Not all the members of type are serialized inside 'GetObjectData' method
  • V3103. A private Ctor(SerializationInfo, StreamingContext) constructor in unsealed type will not be accessible when deserializing derived types
  • V3104. 'GetObjectData' implementation in unsealed type is not virtual, incorrect serialization of derived type is possible
  • MSDN. Serialization in the .NET Framework
  • MSDN. Custom Serialization

No comments:

Post a Comment