Sunday, February 12, 2017

Incremental analysis in PVS-Studio: now on the build server

Introduction

When a team introduces a static analyzer into an existing development process, it may run into certain difficulties. For example, it is very useful to check modified code before it gets into the version control system. However, static analysis at that point may take quite a long time, especially for projects with a large code base. In this article we look at the incremental analysis mode of the PVS-Studio analyzer, which checks only modified files and thereby significantly reduces analysis time. As a result, developers can run static analysis as often as necessary to minimize the risk of erroneous code getting into the version control system. There were two reasons for writing this article. Firstly, we wanted to tell once more about this useful feature of our analyzer; secondly, we have completely rewritten the incremental analysis mechanism and added support for this mode to the command line version of the analyzer.

Scenarios of using a static analyzer

Any approach to improving software quality assumes that defects must be detected as early as possible. Ideally, we would write code without errors from the very beginning, but this practice works only in one corporation:

Picture 1
Why is it so important to detect and fix bugs as soon as possible? I will not dwell on such banal things as the reputational risks that inevitably arise if your users start reporting defects in your software en masse. Let's focus specifically on the economic side of fixing errors in code. We have no statistics on the average price of an error. Errors vary greatly: they may be detected at various stages of the software life cycle, and the software may be used in various subject areas, where errors may or may not be critical. Although we do not know the average cost of eliminating a defect at each stage of the life cycle for the industry as a whole, we can estimate how this cost changes using the widely known "1-10-100" rule. As applied to software development, this rule states that if eliminating a defect during the development phase costs $1, then the cost grows to $10 once the code has been passed to the testers, and rises to $100 once the defective code is in production.
Picture 2
There are many reasons for such rapid growth in the cost of fixing defects, for example:
  • Changes in one fragment of the code can affect many other parts of the application.
  • Tasks that were already closed have to be redone: changes in the design, coding, editing the documentation and so on.
  • The fixed version has to be delivered to the users, who must then be persuaded to update.
Understanding the importance of fixing bugs at the earliest stages of the software life cycle, we suggest that our clients use a two-level scheme of checking code with a static analyzer. The first level is to check the code on the developer's computer before it gets into the version control system. That is, the developer writes some code and immediately inspects it with the static analyzer. For this purpose we have a plugin for Microsoft Visual Studio (it supports all versions from 2010 to 2015 inclusive). The plugin allows you to check one or more source code files, projects or the entire solution.
The second level of protection is to run the static analyzer during the overnight project build. This helps to make sure that no new errors were added to the version control system, or to take the necessary actions to correct them. To minimize the risk of errors surviving into the later stages of the software life cycle, we suggest using both levels of static analysis: locally on the developers' machines and on a centralized continuous integration server.
This approach is not flawless. It does not guarantee that errors detected by the static analyzer will stay out of the version control system or, at worst, that they will be fixed before the build goes to the testing phase. A requirement to run static analysis manually before committing code will probably meet strong resistance. Firstly, if the project has a large code base, no one wants to sit and wait until the whole project is checked. Secondly, a developer who decides to save time and check only the files they modified has to keep track of the edited files, which, of course, no one will do. As for the build server, where, besides the overnight builds, a build is triggered whenever modifications are detected in the version control system, analyzing the whole code base during numerous daily builds is not feasible either, because it takes too much time.
The incremental analysis mode, which scans only recently changed files, helps to solve these problems. Let's consider in more detail what advantages the incremental analysis mode brings when used on the developers' computers and on the build server.

Incremental analysis on the computer of a developer - a barrier against bugs before they get into a version control system

If a development team decides to use static code analysis and runs it only on the build server, for example during the overnight builds, then sooner or later the developers will start treating this tool as an enemy. No wonder, because all the team members will see what mistakes their colleagues make. We strive to ensure that all stakeholders perceive the static analyzer as a friend and as a useful tool that helps to improve the quality of the code. If you do not want the errors detected by the analyzer to get into the version control system and become public, static analysis should be performed on the developers' machines, so that possible issues are detected as early as possible.
As I have already mentioned, manually starting static analysis on the whole code base may take quite a long time. Having to remember which files were edited quickly becomes annoying too.
Incremental analysis both reduces the time needed for static analysis and removes the need to start the analysis manually. It starts automatically after a project or a solution is built. We consider this trigger the most appropriate one: it is logical to check that the project builds before committing changes to the version control system. Incremental analysis works in the background, so the developer can continue working on the code without waiting for the analysis to finish.
You can enable the after-build incremental analysis mode in the menu PVS-Studio/Incremental Analysis After Build (Modified Files Only). This option is enabled in PVS-Studio by default:
Picture 3
Once the incremental analysis mode is enabled, PVS-Studio checks for modified files after every build of the project. If it detects such modifications, incremental analysis is launched automatically in the background, and an animated PVS-Studio icon appears in the notification area:
Picture 4
You may find more details on the use of incremental analysis in the article "PVS-Studio's incremental analysis mode".

Incremental analysis on the continuous integration server - an additional barrier against bugs

Continuous integration presupposes that the project is built on the build server after every commit to the version control system. As a rule, the existing set of unit tests is executed in addition to the build itself. Many teams that practice continuous integration also use the build server for continuous quality control. Besides running unit and integration tests, these processes may include static and dynamic analysis, performance measurement and so on.
One of the important requirements for tasks performed on the continuous integration server is that the project build and any additional actions must be very fast, so that the team can quickly respond to detected problems. Performing static analysis on a large code base after each commit to the version control system contradicts this requirement, because it may take a very long time. We couldn't put up with such a limitation. After reviewing our Visual Studio plugin, which has had the incremental analysis mode for quite a long time, we asked ourselves: why not implement the same mode in the command line module PVS-Studio_Cmd.exe?
No sooner said than done: our command line module, which is designed to integrate the static analyzer into various build systems, now has an incremental analysis mode. This mode works just like incremental analysis in the plugin.
Thus, with the added support of incremental analysis in PVS-Studio_Cmd.exe, it is possible to use our static analyzer in a continuous integration system during numerous daily builds. Because only the files modified since the last update of the code from the version control system are checked, static analysis completes very quickly, and the duration of the project build stays practically the same.
To activate the incremental analysis mode in the command line module PVS-Studio_Cmd.exe, specify the --incremental key and set one of the following modes:
  • Scan - analyze all dependencies to determine which files will be analyzed incrementally. No analysis is performed at this step.
  • Analyze - perform incremental analysis. This step should be done after Scan and can be performed either before or after the build of the solution or the project. Static analysis is performed only for the files that have been modified since the previous build.
  • ScanAndAnalyze - analyze all dependencies to determine which files should be analyzed incrementally, and immediately perform incremental analysis of the modified source files.
To get more detailed information about the incremental analysis mode in the command line module PVS-Studio_Cmd.exe, read the articles "PVS-Studio's incremental analysis mode" and "Analyzing Visual C++ (.vcxproj) and Visual C# (.csproj) projects from the command line".
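As a rough illustration of how a build script might invoke these modes, here is a minimal C# sketch that assembles the command line and launches the module. The class PvsStudioCmd and its methods are hypothetical names introduced for this example. Only the --incremental key and the three mode names come from this article; the --target and --output keys are assumed from the command line documentation and should be checked against your version of the analyzer.

```csharp
using System;
using System.Diagnostics;

public static class PvsStudioCmd
{
    // Modes of the --incremental key listed in the article.
    private static readonly string[] KnownModes =
        { "Scan", "Analyze", "ScanAndAnalyze" };

    // Builds the argument string for PVS-Studio_Cmd.exe.
    // Assumption: --target takes the solution or project file and
    // --output takes the path of the analyzer report.
    public static string BuildArguments(
        string solutionPath, string outputLogPath, string incrementalMode)
    {
        if (Array.IndexOf(KnownModes, incrementalMode) < 0)
            throw new ArgumentException(
                "Unknown incremental mode: " + incrementalMode);

        return $"--target \"{solutionPath}\" " +
               $"--output \"{outputLogPath}\" " +
               $"--incremental {incrementalMode}";
    }

    // Starts the analyzer, waits for it to finish and returns its exit code.
    public static int Run(string exePath, string arguments)
    {
        var startInfo = new ProcessStartInfo(exePath, arguments)
        {
            UseShellExecute = false
        };

        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

A per-commit build step could call BuildArguments with the ScanAndAnalyze mode and pass the result to Run, while the overnight build would start the analyzer without the --incremental key to check the whole code base.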
I should also note that the BlameNotifier utility, which is included in the PVS-Studio distribution, greatly complements the functionality of incremental analysis. This utility interacts with popular version control systems (Git, Svn and Mercurial are currently supported) to find out which developers committed the erroneous code and to send notifications to them.
Thus, we recommend the following scenario for using the analyzer on the continuous integration server:
  • perform incremental analysis during the numerous daily builds to control the quality of the newly modified files;
  • analyze the whole code base during the overnight builds to get complete information about the defects in the code.

Peculiarities of the incremental analysis implementation in PVS-Studio

As I have already noted, the incremental analysis mode has existed in the PVS-Studio plugin for Visual Studio for a long time. In the plugin, the modified files to be analyzed incrementally were detected with the help of Visual Studio COM wrappers. Such an approach is impossible in the command line version of our analyzer, because it is completely independent of the internal infrastructure of Visual Studio. Maintaining different implementations of the same functionality in different components is not the best idea, so we decided right away that the Visual Studio plugin and the command line utility PVS-Studio_Cmd.exe would share a common code base.
In theory, detecting the files modified since the last build of a project is not a hard task. We need to get the modification time of the target binary file and the modification times of all the files involved in building it. The source files that were modified later than the target file should be added to the list of files for incremental analysis. However, things are more complicated in the real world. In particular, for projects written in C or C++ it is very difficult to identify all the files involved in building the target file, for example, header files that are included directly in the code but are absent from the project file. Here I should note that under Windows, both our Visual Studio plugin (which is obvious) and the command line version PVS-Studio_Cmd.exe support only the analysis of MSBuild projects. This fact greatly simplified our task. It is also worth mentioning that incremental analysis is available in the Linux version of PVS-Studio as well, where it works "out of the box": when you use compilation monitoring, only the files that were actually built are analyzed. Accordingly, an incremental build results in incremental analysis; the same is true for direct integration into the build system (for example, into makefiles).
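The naive timestamp comparison described above can be sketched in a few lines of C#. This is only an illustration of the idea, not the code used in PVS-Studio: the class and method names are hypothetical, and the sketch deliberately ignores the hard part mentioned above, namely header files that are not listed in the project.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class NaiveIncrementalScan
{
    // Returns the source files whose last write time is later than that of
    // the target binary. If the target does not exist yet, nothing has been
    // built, so every source file is returned.
    public static List<string> FindModifiedSources(
        string targetBinaryPath, IEnumerable<string> sourceFiles)
    {
        if (!File.Exists(targetBinaryPath))
            return sourceFiles.ToList();

        DateTime targetTime = File.GetLastWriteTimeUtc(targetBinaryPath);

        return sourceFiles
            .Where(path => File.GetLastWriteTimeUtc(path) > targetTime)
            .ToList();
    }
}
```

The result is only as good as the sourceFiles list passed in: a change in a header that is missing from that list goes unnoticed, which is exactly why a dependency-aware mechanism is needed.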
MSBuild provides a mechanism for tracking accesses to the file system (File Tracking). For the incremental build of projects written in C and C++, the correspondence between the source files (for example, cpp files and header files) and the target files is written to *.tlog files. For example, for the CL task, the paths to all the source files read by the compiler are written to the file CL.read.{ID}.tlog, and the paths to the target files are saved in the file CL.write.{ID}.tlog.
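To give a feeling for what such a file contains, here is a small C# sketch that groups the lines of a CL.read.{ID}.tlog file by source file. It relies on an assumption that is not stated in this article: that a line starting with '^' names a root source file and the lines that follow list the files read while compiling it. Check this layout against the tlog files of your own build before relying on it. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class TlogSketch
{
    // Groups the lines of a read tlog by root source file.
    // Assumed layout: "^<root source path>" followed by the files read for it.
    public static Dictionary<string, List<string>> ParseReadTlog(
        IEnumerable<string> lines)
    {
        var dependencies = new Dictionary<string, List<string>>(
            StringComparer.OrdinalIgnoreCase);
        List<string> current = null;

        foreach (var rawLine in lines)
        {
            var line = rawLine.Trim();
            if (line.Length == 0)
                continue;

            if (line[0] == '^')
            {
                // A new root marker: start (or reuse) its dependency list.
                var root = line.Substring(1);
                if (!dependencies.TryGetValue(root, out current))
                {
                    current = new List<string>();
                    dependencies[root] = current;
                }
            }
            else if (current != null)
            {
                // A file read while compiling the current root.
                current.Add(line);
            }
        }

        return dependencies;
    }
}
```

The lines could be obtained with File.ReadLines on one of the tlog files. In practice there is no need to parse these files by hand, because MSBuild ships classes that do it.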
So the CL.*.tlog files already contain all the information about the source files that have been compiled and about the target files. The task is gradually getting simpler. However, we still have to traverse all the source and target files and compare their modification dates. Can we simplify it even more? Of course! In the Microsoft.Build.Utilities namespace we find the classes CanonicalTrackedInputFiles and CanonicalTrackedOutputFiles, which are responsible for working with the files CL.read.*.tlog and CL.write.*.tlog respectively. Having created instances of these classes and called the method CanonicalTrackedInputFiles.ComputeSourcesNeedingCompilation, we get the list of source files that need compilation, based on the analysis of the target files and the dependency graph of the source files.
Let's have a look at a code example that uses this approach to get the list of files for which incremental analysis should be performed. In this example, sourceFiles is a collection of full normalized paths to all the source files of the project, and tlogDirectoryPath is the path to the directory where the *.tlog files are located.
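using Microsoft.Build.Framework;   // ITaskItem
using Microsoft.Build.Utilities;   // TaskItem, CanonicalTracked*Files

// Wrap every source file path in a task item.
var sourceFileTaskItems =
    new ITaskItem[sourceFiles.Count];

for (var index = 0; index < sourceFiles.Count; index++)
    sourceFileTaskItems[index] =
        new TaskItem(sourceFiles[index]);

// Collect the tracking logs written by the CL task.
// GetTaskItemsFromPath is a helper that is not shown in this article.
var tLogWriteFiles =
    GetTaskItemsFromPath("CL.write.*", tlogDirectoryPath);
var tLogReadFiles =
    GetTaskItemsFromPath("CL.read.*", tlogDirectoryPath);

var trackedOutputFiles =
    new CanonicalTrackedOutputFiles(tLogWriteFiles);
var trackedInputFiles =
    new CanonicalTrackedInputFiles(tLogReadFiles,
        sourceFileTaskItems, trackedOutputFiles, false, false);

// Source files that are out of date relative to their outputs.
ITaskItem[] sourcesForIncrementalBuild =
    trackedInputFiles.ComputeSourcesNeedingCompilation(true);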
Thus, using standard MSBuild tools, we made the mechanism that identifies files for incremental analysis identical to MSBuild's own mechanism for incremental builds, which ensures the high reliability of this approach.
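The example above calls GetTaskItemsFromPath, whose body is not shown. One plausible way to write such a helper, assuming it simply wraps every file in the directory that matches the search pattern in a TaskItem, is sketched below. The containing class name is hypothetical, the actual helper in PVS-Studio may differ, and the code needs a reference to the MSBuild assemblies that provide Microsoft.Build.Framework and Microsoft.Build.Utilities.

```csharp
using System.IO;
using System.Linq;
using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

public static class TlogFiles
{
    // Hypothetical helper: returns a task item for every file in
    // directoryPath whose name matches searchPattern (for example,
    // "CL.read.*" matches CL.read.1.tlog).
    public static ITaskItem[] GetTaskItemsFromPath(
        string searchPattern, string directoryPath)
    {
        return Directory
            .GetFiles(directoryPath, searchPattern)
            .Select(path => (ITaskItem)new TaskItem(path))
            .ToArray();
    }
}
```

The parameter order matches the calls in the example: the search pattern first, then the directory with the *.tlog files.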

Conclusion


In this article we looked at the advantages of using incremental analysis on the developers' machines and on the build server. We also "looked under the hood" and learned how to detect files for incremental analysis using the capabilities of MSBuild. To everyone who is interested, I suggest downloading the trial version of our PVS-Studio analyzer and seeing what it can detect in your projects. Bugless code to you!
