Sunday, February 12, 2017

Incremental analysis in PVS-Studio: now on the build server

Introduction

When introducing a static analyzer into an existing development process, a team may encounter certain difficulties. For example, it is very useful to check modified code before it gets into the version control system, but static analysis in this case may take quite a long time, especially for projects with a large code base. In this article we look at the incremental analysis mode of the PVS-Studio analyzer, which checks only modified files and thus significantly reduces the analysis time. Developers can therefore run static analysis as often as needed to minimize the risk of erroneous code getting into the version control system. There were several reasons for writing this article: first, a wish to tell once more about this useful feature of our analyzer; second, the fact that we have completely rewritten the incremental analysis mechanism and added support for this mode to the command-line version of our analyzer.

Scenarios of using a static analyzer

Any approach to improving software quality assumes that defects must be detected as early as possible. Ideally, we would write error-free code from the very beginning, but this practice works only in one corporation:

Picture 1
Why is it so important to detect and fix bugs as soon as possible? I will not dwell on such obvious things as the reputational risks that inevitably arise when your users start massively reporting defects in your software. Let's focus on the economic side of fixing errors in your code. We have no statistics on the average cost of an error: errors vary greatly, they are detected at different stages of the software life cycle, and software is used in subject areas where errors may or may not be critical. Although we do not know the industry-wide average cost of eliminating a defect at each stage of the software life cycle, we can estimate the dynamics of this cost using the widely known "1-10-100" rule. As applied to software development, this rule states that if eliminating a defect during the development phase costs $1, the cost grows to $10 after the code is passed to the testers and rises to $100 once the defective code is in production.
Picture 2
There are many reasons for such a rapid growth in the cost of defect correction, for example:
  • Changes in one fragment of the code can affect a lot of other parts of the application.
  • Redoing tasks that were already closed: changes in the design, coding, updating the documentation, and so on.
  • The delivery of the fixed version to the users and the necessity to persuade them to update.
Understanding the importance of fixing bugs at the earliest stages of the software life cycle, we recommend that our clients use a two-level scheme of checking the code with a static analyzer. The first level is checking the code on the developer's computer before it gets into the version control system. That is, the developer writes some code and immediately inspects it with the static analyzer. For this purpose we have a plugin for Microsoft Visual Studio (all versions from 2010 to 2015 are supported). The plugin allows you to check one or more source files, projects, or the entire solution.
The second level of protection is running the static analyzer during the overnight project build. This helps to make sure that no new errors got into the version control system, or to take the necessary actions to correct those errors. To minimize the risk of errors reaching the later stages of the software life cycle, we suggest using both levels of static analysis: locally on the developers' machines and on a centralized continuous integration server.
This approach cannot be called completely flawless: it doesn't guarantee that errors detected by a static analyzer won't get into the version control system, or at worst will be fixed before the build goes to the testing phase. Forcing developers to run static analysis manually before committing code will probably face strong resistance. Firstly, if the project has a large code base, no one wants to sit and wait until the whole project is checked. And if a developer decides to save time and check only the files he has modified, he has to keep track of the edited files, which, of course, no one will do. As for the build server, where besides the overnight builds there are builds triggered by modifications detected in the version control system, analyzing the whole code base during numerous daily builds is not possible either, because static analysis would take too long.
The incremental analysis mode, which scans only recently changed files, helps to solve these problems. Let's consider in more detail what advantages the incremental analysis mode brings when used on developers' computers and on the build server.

Incremental analysis on the developer's computer - a barrier against bugs before they get into the version control system

If a development team decides to use static code analysis only on the build server, during the overnight builds for example, then sooner or later the developers will start treating this tool as an enemy. No wonder, because all the team members will see what mistakes their colleagues make. We strive to ensure that all stakeholders perceive the static analyzer as a friend and a useful tool that helps to improve the quality of the code. If you want the errors detected by the analyzer not to get into the version control system and become publicly visible, static analysis should be performed on the developers' machines, so that possible issues are detected as early as possible.
As I have already mentioned, manually running static analysis on the whole code base may take quite a long time. And having to remember which files one has worked on can become quite annoying too.
Incremental analysis both decreases the analysis time and removes the need to start the analysis manually. It starts automatically after the build of a project or a solution, and we consider this trigger the most appropriate: it's logical to check that the project builds before committing changes to the version control system. Thus, the incremental analysis mode frees the developer from the annoying need to run static analysis by hand. It works in the background, so the developer can continue working on the code without waiting for the analysis to finish.
You can enable the after-build incremental analysis mode via the menu item PVS-Studio/Incremental Analysis After Build (Modified Files Only); this option is enabled in PVS-Studio by default:
Picture 3
Once the incremental analysis mode is enabled, PVS-Studio will automatically analyze all the modified files in the background after the project is built; when such modifications are detected and the analysis starts, an animated PVS-Studio icon appears in the notification area:
Picture 4
You may find more details related to the use of incremental analysis in the article PVS-Studio's incremental analysis mode.

Incremental analysis on the continuous integration server - an additional barrier against bugs

Continuous integration presupposes that the project is built on the build server after every commit to the version control system. As a rule, the existing set of unit tests is executed besides the project build. In addition to unit tests, many teams practicing continuous integration use the build server for continuous quality control: besides running unit and integration tests, these processes may include static and dynamic analysis, performance measurement, and so on.
One of the important requirements for the tasks performed on a continuous integration server is that the project build and any additional actions must be very fast, so that the team can quickly respond to detected problems. Performing static analysis on a large code base after each commit to the version control system contradicts this requirement, because it may take a very long time. We couldn't put up with such a limitation, so, looking at our Visual Studio plugin, where incremental analysis had existed for quite a long time, we asked ourselves: why not implement the same mode in the command-line module PVS-Studio_Cmd.exe?
No sooner said than done: our command-line module, designed to integrate the static analyzer into various build systems, now has an incremental analysis mode that works just like the one in the plugin.
Thus, with incremental analysis support added to PVS-Studio_Cmd.exe, it is possible to use our static analyzer in a continuous integration system during numerous daily builds. Because only the files modified since the last update of the code from the version control system are checked, the analysis is done very quickly, and the duration of the project build stays practically the same.
To activate the incremental analysis mode of the command-line module PVS-Studio_Cmd.exe, specify the --incremental flag and set one of the following modes:
  • Scan - analyze all dependencies to determine which files will be analyzed incrementally; no analysis is performed at this step.
  • Analyze - perform incremental analysis. This step should be done after Scan and can be performed either before or after the build of the solution or project; static analysis will be performed only for the files that have been modified since the previous build.
  • ScanAndAnalyze - analyze all dependencies to determine which files should be analyzed incrementally, and immediately perform incremental analysis of the modified source files.
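For example, a minimal CI sequence could look as follows (a sketch: the --incremental modes are the ones described above, while the other flags and paths are assumptions to be checked against the PVS-Studio_Cmd.exe documentation):
rem Record the dependency state before the build (no analysis yet).
PVS-Studio_Cmd.exe --target "MySolution.sln" --incremental Scan
rem Build the solution as usual.
msbuild MySolution.sln /t:Build
rem Analyze only the files modified since the previous build.
PVS-Studio_Cmd.exe --target "MySolution.sln" --incremental Analyze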
More detailed information about the incremental analysis mode of the command-line module PVS-Studio_Cmd.exe can be found in the articles PVS-Studio's incremental analysis mode and Analyzing Visual C++ (.vcxproj) and Visual C# (.csproj) projects from the command line.
I should also note that the BlameNotifier utility included in the PVS-Studio distribution greatly complements incremental analysis. It interacts with popular VCSs (currently Git, SVN, and Mercurial) to identify the developers who committed erroneous code and sends them notifications.
Thus, we recommend the following scenario for using the analyzer on a continuous integration server:
  • perform incremental analysis during the numerous daily builds to control the quality of the newly modified files;
  • perform analysis of the whole code base during the overnight builds to get full information about the defects in the code.

Peculiarities of the incremental analysis implementation in PVS-Studio

As I have already noted, the incremental analysis mode has existed in the PVS-Studio plugin for Visual Studio for a long time. In the plugin, detection of the modified files to be analyzed incrementally was implemented with the help of the COM wrappers of Visual Studio. Such an approach is impossible for the command-line version of our analyzer, as it is completely independent of the inner infrastructure of Visual Studio. Supporting different implementations of the same functionality in different components is not the best idea, so we decided right away that the Visual Studio plugin and the command-line utility PVS-Studio_Cmd.exe would share a common code base.
Theoretically, detecting the files modified since the last build of a project is not a hard task: we need the modification time of the target binary file and of all the files involved in building it, and the source files modified later than the target file should be added to the list of files for incremental analysis (a minimal sketch of this naive approach follows below). However, things are more complicated in the real world. In particular, for projects written in C or C++ it is very difficult to identify all of the files involved in the build of the target file, for example, the header files that are included directly in the code and are absent from the project file. Here I should note that under Windows both our Visual Studio plugin (which is obvious) and the command-line version PVS-Studio_Cmd.exe support only the analysis of MSBuild projects, which greatly simplified our task. It is also worth mentioning that incremental analysis is available in the Linux version of PVS-Studio as well, where it works "out of the box": with compile monitoring, only the files that were actually built are analyzed, so the incremental analysis runs whenever you do an incremental build; the same holds for direct integration into the build system (for example, into makefiles).
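Here is that naive timestamp comparison as a minimal sketch (the names are illustrative; this is not how PVS-Studio is actually implemented):
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class NaiveChangeDetector
{
    // Deliberately naive: compares only file timestamps and knows nothing
    // about header files included directly in the code, which is exactly
    // the problem described above.
    public static IEnumerable<string> GetModifiedSources(
        string targetBinaryPath, IEnumerable<string> sourceFiles)
    {
        DateTime targetTime = File.GetLastWriteTimeUtc(targetBinaryPath);
        return sourceFiles.Where(
            path => File.GetLastWriteTimeUtc(path) > targetTime);
    }
}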
MSBuild provides a mechanism for tracking accesses to the file system (File Tracking). During the incremental build of projects written in C and C++, the correspondence between the source files (for example, cpp files and header files) and the target files is written to *.tlog files. For example, for the CL task, the paths to all the source files read by the compiler are written to the CL.read.{ID}.tlog file, and the paths to the target files are saved in the CL.write.{ID}.tlog file.
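To give an idea of what these files contain, here is a simplified sketch of a CL.read.*.tlog fragment (the paths are invented; real tlog files store upper-case absolute paths, and the line starting with "^" is the rooting marker naming the compiled source):
^C:\PROJECT\SRC\MAIN.CPP
C:\PROJECT\SRC\MAIN.CPP
C:\PROJECT\INCLUDE\UTILS.H
C:\PROGRAM FILES (X86)\MICROSOFT VISUAL STUDIO 14.0\VC\INCLUDE\VECTOR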
So the CL.*.tlog files already contain all the information about the compiled source files and the target files, and the task gets simpler. However, do we still have to traverse all the source and target files ourselves and compare their modification dates? Can we simplify it even more? Of course! In the Microsoft.Build.Utilities namespace we find the classes CanonicalTrackedInputFiles and CanonicalTrackedOutputFiles, which are responsible for working with the CL.read.*.tlog and CL.write.*.tlog files respectively. Having created instances of these classes and called the method CanonicalTrackedInputFiles.ComputeSourcesNeedingCompilation, we get the list of source files to compile, based on the analysis of the target files and the dependency graph of the source files.
Let's have a look at an example of code that obtains the list of files for incremental analysis using this approach. In this example, sourceFiles is a collection of full normalized paths to all the source files of the project, and tlogDirectoryPath is the path to the directory where the *.tlog files are located.
using System.IO;
using System.Linq;
using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

// Wrap the paths of all the project's source files into MSBuild task items.
var sourceFileTaskItems = new ITaskItem[sourceFiles.Count];
for (var index = 0; index < sourceFiles.Count; index++)
    sourceFileTaskItems[index] = new TaskItem(sourceFiles[index]);

// Collect the tracking logs written by the CL task during the build.
var tLogWriteFiles = GetTaskItemsFromPath("CL.write.*", tlogDirectoryPath);
var tLogReadFiles = GetTaskItemsFromPath("CL.read.*", tlogDirectoryPath);

// Let MSBuild compare the target files against the dependency graph of the
// source files and compute which sources are out of date.
var trackedOutputFiles = new CanonicalTrackedOutputFiles(tLogWriteFiles);
var trackedInputFiles = new CanonicalTrackedInputFiles(tLogReadFiles,
    sourceFileTaskItems, trackedOutputFiles, false, false);

ITaskItem[] sourcesForIncrementalBuild =
    trackedInputFiles.ComputeSourcesNeedingCompilation(true);

// A sketch of the GetTaskItemsFromPath helper used above (assuming the
// fragment lives inside a method of the same class): it wraps every
// *.tlog file matching the pattern into a TaskItem.
static ITaskItem[] GetTaskItemsFromPath(string pattern, string directory)
{
    return Directory.GetFiles(directory, pattern)
                    .Select(path => (ITaskItem)new TaskItem(path))
                    .ToArray();
}
Thus, using the standard facilities of MSBuild, we made our mechanism for identifying the files for incremental analysis the same as MSBuild's own mechanism for incremental build, which ensures the quality of this approach.

Conclusion


In this article we looked at the advantages of using incremental analysis on the developers' machines and on the build server. We also "looked under the hood" to learn how to detect files for incremental analysis using the facilities of MSBuild. If you are interested, I suggest downloading the trial version of our PVS-Studio analyzer and seeing what it can detect in your projects. Bugless code to you!

Wednesday, February 8, 2017

Propose a project for analysis by PVS-Studio: now on GitHub

Checking projects with a static analyzer and writing review articles about the bugs found is not the easiest of tasks; it is almost always the work of a team rather than one person. The choice of a project plays quite a significant role here, because it directly affects how interesting the resulting article will be to readers. In this post I want to explain how you can suggest an interesting project for analysis via GitHub.
Picture 5

Introduction

PVS-Studio is a tool for bug detection in the source code of programs written in C, C++, and C#.

Recently the capabilities of the analyzer and its spheres of use have grown considerably. It is now a powerful tool for improving code quality, available both for Windows and for Linux. The analyzer has several operating modes that let you easily run the analysis regardless of the build system, so a user can explore the analyzer effortlessly and then fully integrate it for regular use.
We also use the analyzer ourselves when we check selected projects to write articles. Our site has more than 270 articles about checks of open-source projects, and the total number of errors found exceeds 10,000. All the reviews go into the always up-to-date list of articles on the site. Now you can suggest projects not only via the feedback form, but also via GitHub.

How to propose a project

In a repository on GitHub we have posted a list of projects that we will review before choosing a topic for an article. For now it is a common list for C/C++ and C# projects.
Anyone can add an interesting open source project and send a Pull Request.
Required information about the project:
  • Project title;
  • Brief description in English;
  • A link to the official website (if it is available);
  • A link to the source code (a repository or a website page with the information, where the source code can be downloaded).
Important. It makes no sense to add your personal small projects to the list: to check such projects, you can use the analyzer for free, and even regularly (see How to use PVS-Studio for free). Also, please avoid adding projects that have already been checked; we sometimes recheck projects, but usually only on a special occasion. Source code that got to the net illegally isn't welcome either.

If you are a developer of an open source project, we are open to closer cooperation: a joint article, discussion of the bugs found, and so on.

Tuesday, February 7, 2017

Moving from CruiseControl.NET to Jenkins in the PVS-Studio development team

Nowadays it's hard to imagine software development without automated project builds and testing. There are various ready-made solutions that minimize the time spent on integrating modifications into a project. In this article I am going to describe how the PVS-Studio team switched its continuous integration server from CruiseControl.NET to Jenkins: the motives behind this decision, the goals we pursued, and the issues we had to deal with in the process.
Picture 1

Introduction

Continuous integration is an automated process of building, deploying, and testing software. This practice is popular both among large development teams and among individual developers, and there are quite a number of solutions that support it. In this article we are going to talk about the free open-source projects CruiseControl.NET and Jenkins.

CruiseControl.NET (CCNet) is a continuous integration tool implemented on the .NET Framework. There are also a Java variant (CruiseControl) and a version for Ruby environments (CruiseControl.rb). Information about the builds can be viewed and managed through a web interface or a desktop utility, and the tool integrates with different version control systems. It is an open-source project but, unfortunately, has not been developed since 2013.
Jenkins is an open-source continuous integration tool written in Java. It was forked from the Hudson project after a dispute with Oracle. Providing continuous integration functions, it allows automating the part of the development process that does not require human involvement. The capabilities of Jenkins can be extended through plugins. At the moment, the project is actively developed and supported both by its developers and by the community.
Although this article may look like a review in the "CCNet vs. Jenkins" style, the choice had already been made in favor of Jenkins. The main reason we changed our continuous integration tool is that the CruiseControl.NET project is no longer developed. There were also other issues with CCNet, which we will cover later.
Our project has quite a long history - we recently celebrated our 10-year anniversary; you can read the story in the article "PVS-Studio project - 10 years of failures and successes". We had been using CCNet for more than 5 years and got so used to its interface, settings, and functions that Jenkins seemed really uncomfortable at first. We first started using it with the release of PVS-Studio for Linux. The switch to the Jenkins platform was preceded by a long study of the tool, and some time was spent looking for counterparts of the familiar CCNet features. Further on, I will describe the most interesting moments of this work.

Our complaints about CCNet

  • CCNet is no longer developed. It can still be used, but if you want to expand the functionality or fix existing or potential errors, you will have to do it yourself.
  • The Source Code Management mode (namely, the automatic launch on changes in the version control system) works unstably. In case of network problems, a project in this mode gets the status "Failed" even if it wasn't run. In practice the problem occurred so often (sadly, our office doesn't have the most stable Internet access) that the mode became impossible to use.
  • When querying the SCM for changes, an error in the version control system (for example, a directory removed from the repository specified in the settings, causing a tree conflict) immediately interrupts the execution of the project, yet the status remains "Success": the project stops working, but it stays "green" in the web interface and the desktop utility. In such a mode the tests may not run for weeks, and there is a risk that no one will notice, assuming that the tests pass correctly.
  • The general server log is too verbose and unstructured: it is hard to understand which step of the build failed and to find the log of that particular step. When multiple projects run in parallel, their build logs get "mixed". The XML log of an individual project build is available in the web interface, but it is often not very detailed and doesn't contain all the commands that were run.
  • Inefficient parallelization of subtasks inside a project. Subtasks are run in parallel in groups sized by the number of processor cores. If both long and quick commands get into a group, new tasks are not started until all the tasks from the previous group have completed.

Comparison of usage scenarios

Server settings

The settings of the CCNet projects (the server configuration) were stored in one XML file, with the various passwords in another. Although the settings file reached ~4500 lines, it was still quite convenient to use: by pressing Alt+2 in Notepad++, the whole list of projects can be folded so that you can edit just the one you need (figure 1).
Figure 1 - Editing the CCNet settings in Notepad++
Although the file contained some duplicated code, there were no difficulties with supporting the server.
Here is how the SCM block was configured:
<svn>
  <username>&SVN_USERNAME;</username>
  <password>&SVN_PASSWORD;</password>
  <trunkUrl>&SVN_ROOT;...</trunkUrl>
  <workingDirectory>&PROJECT_ROOT;...</workingDirectory>
  <executable>&SVN_FOLDER;</executable>
  <deleteObstructions>true</deleteObstructions>
  <cleanUp>true</cleanUp>
  <revert>true</revert>
  <timeout units="minutes">30</timeout>
</svn>
Here is how the MSBuild block was configured:
<msbuild>
  <description>PVS-Studio 2015</description>
  <workingDirectory>&PROJECT_ROOT;...</workingDirectory>
  <projectFile>...\PVS-Studio-vs2015.sln</projectFile>
  <buildArgs>/p:Configuration=Release</buildArgs>
  <targets>Build</targets>
  <timeout>600</timeout>
  <executable>&MSBUILD14_PATH;</executable>
</msbuild>
Here is how the block for common tasks was configured:
<exec>
  <description>PVS-Studio 2015 sign</description>      
  <executable>&PROJECT_ROOT;...\SignToolWrapper.exe</executable>
  <baseDirectory>&PROJECT_ROOT;...</baseDirectory>
  <buildArgs>"&SIGNTOOL;" ... \PVS-Studio-vs2015.dll"</buildArgs>
  <buildTimeoutSeconds>600</buildTimeoutSeconds>
</exec>
On the basis of such a project file, CCNet displays all the executed steps very conveniently (albeit only in the desktop tray utility; for some reason the web interface didn't support this). In Jenkins we had some trouble with such a "high-level" display of the stages of an integration project, but we will get to that later.
In Jenkins you have to store quite a number of settings files: a server configuration, settings files for some plugins, and a configuration file for each project. Although all of these files are in XML format, they aren't very convenient to view (at least compared with CCNet), because all the commands inside the tags are written as plain text. Admittedly, this is largely a consequence of the tools' ideologies: in CCNet the config is written manually, so it can be formatted "nicely", while Jenkins suggests editing project settings through its web interface and generates the configs automatically.
Here is an example of how the commands look in the Jenkins configs:
<hudson.tasks.BatchFile>
  <command>CD &quot;%BUILD_FOLDERS%\Builder&quot;&#xd;
PVS-Studio_setup.exe /VERYSILENT /SUPPRESSMSGBOXES ...&#xd;
Publisher_setup.exe /VERYSILENT /SUPPRESSMSGBOXES</command>
</hudson.tasks.BatchFile>
And this is quite a small example.

Viewing the task statuses

As I wrote before, in CCNet projects are configured with Task blocks. Here is how a successfully completed project looks with all its steps (figure 2).
Figure 2 - viewing the status of tasks in CCTray (a desktop client for CCNet)
An error in any of the blocks is clearly visible in the hierarchy of subtasks. It is a very convenient and illustrative visualization of the integration process: we almost never had to search through logs, since the description made it clear what needed checking on the local computer. We could not find such a feature in Jenkins, so we had to study this question really thoroughly before the move.
We can draw the following analogy between CCNet and Jenkins: in CCNet there is a project (Project), and a 'step' in this project is called a Task (as you can see in the figure above). In Jenkins there is also a project (Job), and its 'steps' are called just that - Steps (figure 3).
Figure 3 - Corresponding names in CCNet and Jenkins
Unfortunately, the web interface of Jenkins cannot visualize individual steps: a Job has only a full console log of all the steps together. A big disadvantage is that you cannot see in Jenkins which of the steps failed; you have to read the full log of the Job build. And as people get used to good things very quickly, we didn't want to abandon the old usage scenario. That is when we discovered the Multijob Plugin.
This plugin allowed us to make the following improvements:
1. Using Jobs as steps in other Jobs. Thus we got universal Jobs, which allowed us to separate the logs of individual subtasks and, most importantly, to view the statuses of individual subtasks. The Jenkins web interface visualizes the completion of individual Jobs within a Multijob quite well - exactly what we were looking for. Figure 4 shows an example of a completed Multijob.
Figure 4 - viewing a completed Multijob.
2. We managed to get rid of duplicate code by using universal Jobs. For example, some utility is compiled for the distribution, for the tests, and for the code analysis. In CCNet these were identical Task blocks in 3 different projects; in Jenkins there is one Job compiling this utility, and it is used by several Multijobs.
3. When creating projects in Jenkins, we follow this workflow: we divide all the Jobs into Multijob "projects" and universal "steps". The names of the universal Jobs have the prefix "job_"; they are not meant to be used as stand-alone projects and do not load the source code from the repository. The names of the Multijobs have the prefix "proj_"; they load the source code and only run other Jobs. (We try to avoid Steps, because they are not visualized.)
The universal Jobs are run with the following parameter:
WORKSPACE=$WORKSPACE
It means that the Job will be run in the working directory of the Multijob.
Thus we get the log of the source file updates and the logs of all the build steps separately. It would be hard and pointless to follow this ideology for all the projects; it is done only for the several biggest and most important projects, which need to be easy to examine in detail in case of problems.
4. You can configure conditional and parallel launches of Jobs in a Multijob, and Multijobs can run other Multijobs in the same way. Thus you can combine launches of different projects, for example, to start the builds of all the installers or all the tests.

Viewing the build logs

In CCNet it was extremely inconvenient to view the build logs of the projects, because they were mixed with the server output and had a special markup. There is no such problem in Jenkins, and for some projects it is possible to view the logs of subtasks separately.

Getting the revision of the source code

In Jenkins, a separate revision number is determined for every SVN repository link added to a project. That is, if we add several directories, the numbers can vary greatly, but we need only the single maximum number.
According to the documentation, we should work with this as follows:
If you have multiple modules checked out, you can use the svnversion command to get the revision information, or you can use the SVN_REVISION_<n> environment variables, where <n> is a 1-based index matching the locations configured.
That is exactly what we did: of all the obtained SVN_REVISION_<n> values, the maximum is chosen and embedded into the built programs.
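For illustration, choosing the maximum could look like this (a sketch; only the SVN_REVISION_<n> naming comes from the documentation quoted above):
using System;
using System.Linq;

static class SvnRevisionHelper
{
    // Walk SVN_REVISION_1, SVN_REVISION_2, ... until a variable is missing
    // and return the largest value found (0 if none are set).
    public static int GetMaxRevision()
    {
        return Enumerable.Range(1, 100)
            .Select(i => Environment.GetEnvironmentVariable("SVN_REVISION_" + i))
            .TakeWhile(value => value != null)
            .Select(int.Parse)
            .DefaultIfEmpty(0)
            .Max();
    }
}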

Useful plugins for Jenkins

Extending the tool with plugins allows flexible configuration of the server. The plugins for running Visual Studio tests are perhaps the only ones we refused to use: they had additional mandatory launch parameters that we didn't need, so it was easier to create a universal Job that simply starts the tests from the command line. Out of the box, Jenkins couldn't do much of what we had gotten used to in CCNet; however, we managed to "get back" all the necessary functionality.
Here is a list of those plugins that we found useful, with a small description:
  • Multijob Plugin - allows using other Jobs as stages of the build, with the possibility of sequential and parallel execution.
  • Environment Injector Plugin - used, among other things, to specify global passwords; they are exposed as environment variables, and the plugin hides their values in the log.
  • Pre SCM BuildStep Plugin - adding extra steps before executing the commands of the version control system.
  • MSBuild Plugin - a handy plugin to build projects using MSBuild. In the settings you specify the paths to different versions of MSBuild only once, later in the project you can easily add build steps.
  • Parameterized Trigger Plugin - adds launch parameters to a project; for example, you can choose a trunk/stable branch when building a distribution.
  • Post Build Script Plug-in - performing additional steps after the build.
  • Throttle Concurrent Builds Plug-in - adjusts the number of concurrently running builds, either globally or within a specific category. It provides functionality in Jenkins similar to the queues in CCNet: the ability to execute several projects from different categories (queues) in parallel while keeping the execution of projects within a single queue sequential. For example, we have the Installers (distributions) and Tests queues: we want to be able to build a distribution in parallel with running the tests, but the tests themselves shouldn't run in parallel - there wouldn't be enough cores on the server.
  • Build Name Setter Plugin - allows specifying the name in a necessary format: Major.Minor.Revision.Build.
  • Dashboard View - allows adding a custom view of Jobs in the browser. Since we have universal Jobs that are of no use to run manually, we used this plugin to create a list without them.
  • ZenTimestamp Plugin - a handy plugin that adds timestamps to the build logs.

Overview of desktop clients

To receive notifications from CCNet we used its Windows client, CCTray.
Here are the clients that are now available for working with Jenkins:
1. CCTray - this program can be used with a Jenkins server as well. The projects look approximately the same as they did before (figure 5).
Figure 5 - CCTray screenshot
The description of CCTray as a Jenkins client:
  • It is no longer developed, just like CCNet;
  • Can't show subtasks (works only for CCNet);
  • Cannot run the projects;
  • You can go to the project page by clicking on the title;
  • Configurable display of projects (Icons, List, Details);
  • Open source.
2. CatLight (figure 6)
Figure 6 - a screenshot of CatLight
Client description:
  • At the moment it is a beta version; the final version will be paid;
  • There are crashes during setup and use, and glitches in the interface;
  • When the PC wakes up from hibernation, the status on the dashboard is not updated automatically;
  • Can't show subtasks for Multijobs;
  • Cannot run the projects;
  • The project display is not customizable (the only possible display - figure 6);
  • You can go to the project page by clicking on the title;
  • Ability to view the status of last 5 launches and jump-click to them;
  • Ability to view the progress of the running project;
  • If several servers are added, they are conveniently separated by a line;
  • Versions for Windows, Linux and Mac.
3. Kato (figure 7)
Figure 7 - screenshot of Kato
Client description:
  • Can't show subtasks for Multijobs;
  • Can run the projects; unfortunately, it does not support projects with a parameterized launch - the utility crashes when trying to run such a project;
  • Projects from different servers are displayed in one list and are not separated in any way (not always convenient);
  • Configurable project display (List, Grid);
  • You can go to the project page by clicking on the title;
  • You can view the latest log directly in the client, but due to the lack of monospaced text it is not very convenient;
  • Open source;
  • Only for Windows.
4. CCMenu - a client only for Mac, open source. It is not relevant for us, but perhaps some may need it.

Conclusion

Using CI is useful in any project. There is the great free tool Jenkins, reviewed in this article, and there are also a lot of other free and paid CI solutions. It is quite pleasant to use an actively developed product: a large number of updates are released for Jenkins and its plugins, and new solutions are being created, such as the Blue Ocean project, which is still in beta. You can find it on the main page of the Jenkins site.
The clients for monitoring Jenkins projects, though, didn't impress me much; a lot of obvious features are missing. Perhaps desktop clients just aren't particularly in demand, and it is more correct to use only the web interface.
When moving to the new server, we couldn't run Jenkins as a Windows service, because in this mode the UI tests do not run. We found a way out by setting up the server as a console application with a hidden window.
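For reference, starting Jenkins as a console application boils down to launching the war file directly, for example (the port value is just an illustration):
java -jar jenkins.war --httpPort=8080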

If you have any comments on this material or interesting solutions to the problems mentioned here, we'll be glad to see them in the comment section or receive them via the feedback form.

Monday, February 6, 2017

Why I Dislike Synthetic Tests

I don't like it when people use artificial code examples to evaluate the diagnostic capabilities of static code analyzers. There is one particular example I'm going to discuss to explain my negative attitude to synthetic tests.
Picture 1
Bill Torpey recently wrote a blog post entitled "Even Mo' Static", where he shared his view on the results of testing Cppcheck and PVS-Studio analyzers on the itc-benchmarks project, which is a set of static analysis benchmarks by Toyota ITC.

That post upset me because it leaves you with the impression that the capabilities of Cppcheck and PVS-Studio are very similar. It follows from the article that one analyzer is better at diagnosing some types of errors and the other at diagnosing other types, but that their capabilities are generally the same.
I think it's a wrong conclusion. My opinion is that our analyzer, PVS-Studio, is several times more powerful than Cppcheck. Well, it's not even an "opinion" - it's what I know for sure!
However, since it's not obvious to an outside observer that PVS-Studio is ten times better than Cppcheck, there must be a reason for that. I decided to take a look at that project, itc-benchmarks, and figure out why PVS-Studio didn't perform at its best on that code base.
The more I dug, the more irritated I became. There was one particular example that drove me really crazy, and I'm going to tell you about it in a moment. What I have to say as a conclusion is this: I have no complaints against Bill Torpey. He wrote a good, honest article. Thank you, Bill! But I do have complaints against Toyota ITC. I personally think their code base is crap. Yes, it's a blunt statement, but I believe I have enough competence and experience to debate about static code analyzers and ways of evaluating them. In my opinion, itc-benchmarks can't be used to adequately evaluate tools' diagnostic capabilities.
Now, here's the test that killed me.
It's a test for null pointer dereference:
void null_pointer_001 ()
{
  int *p = NULL;
  *p = 1; /*Tool should detect this line as error*/
          /*ERROR:NULL pointer dereference*/
}
The Cppcheck analyzer reports an error in this code:
Null pointer dereference: p
The PVS-Studio analyzer keeps silent, although it does have the V522 diagnostic for cases like that.
So, does it mean that PVS-Studio is worse at diagnosing this example than Cppcheck? No, it's just the opposite: it's better!
PVS-Studio understands that this code was written on purpose and there is no error there.
In certain cases, programmers write code like that intentionally, to make the program throw an exception when a null pointer dereference occurs. This trick is used in tests and in specific code fragments, and I have seen it more than once. Here, for example, is how it may look in a real-life project:
void GpuChildThread::OnCrash() {
  LOG(INFO) << "GPU: Simulating GPU crash";
  // Good bye, cruel world.
  volatile int* it_s_the_end_of_the_world_as_we_know_it = NULL;
  *it_s_the_end_of_the_world_as_we_know_it = 0xdead;
}
That's why we have included a number of exceptions into PVS-Studio's V522 diagnostic rule so that it doesn't go mad over code like that. The analyzer understands that null_pointer_001 is an artificial function: in real functions there are simply no errors where zero is assigned to a pointer that is then immediately dereferenced. The function name itself is also a hint to the analyzer that the "null pointer" here is not an accident.
For such cases the V522 diagnostic has exception A6, and it is this exception that the synthetic function null_pointer_001 falls under. Here is the description of the A6 exception:
The variable is dereferenced in the body of a function whose name contains one of the following words:
  • error
  • default
  • crash
  • null
  • test
  • violation
  • throw
  • exception
Before being dereferenced, the variable is assigned 0 one line earlier.
The synthetic test in question fits this description perfectly. Firstly, the function name contains the word "null". Secondly, the variable is assigned zero exactly one line earlier. The exception revealed unreal code, which it really is, because it's a synthetic test.
It's for these subtle details that I dislike synthetic tests!
It's not the only complaint I have against itc-benchmarks. For example, there is another test in the same file:
void null_pointer_006 ()
{
  int *p;
  p = (int *)(intptr_t)rand();
  *p = 1; /*Tool should detect this line as error*/
          /*ERROR:NULL pointer dereference*/
}
The rand function can return 0, which will then turn into NULL. The PVS-Studio analyzer doesn't yet know what exactly rand can return, so it has no suspicions about this code.
I asked my colleagues to teach the analyzer to better understand how exactly the rand function works. There's no choice: we have to tune the tool manually so that it performs better on the test base in question. We are forced to do it, since people use test suites like that to evaluate analyzers.
But don't worry. I promise that we will keep working on real-life, useful diagnostics as before, instead of adapting the analyzer to tests. We might polish PVS-Studio slightly for itc-benchmarks, but not as a top-priority task and only for those cases that do make at least some sense.
I want developers to understand that the example with rand does not actually show anything. It's synthetic, totally far-fetched. No one writes programs that way; there are no real errors like that.
By the way, if the rand function returned 1400 instead of 0, it wouldn't be any better: a pointer like that can't be dereferenced in any case. So, this null pointer dereference is some strange special case of completely incorrect code, which was simply made up by the suite authors and which you are never going to see in reality.
I know what the real programming problems are. Among others, they are typos, and our tool regularly catches hundreds of them using, say, the V501 diagnostic. It's funny, but I haven't found a test in itc-benchmarks that checks whether tools can spot the "if (a.x == a.x)" typo pattern. Not a single test!
It turns out that itc-benchmarks ignores analyzers' typo-searching capabilities, while our readers surely know how widespread defects of this type are. And what the project does have are test cases that I find stupid and that are never found in real programs. I can't imagine stumbling upon code like the one below, resulting in an array overrun, in a real, serious project:
void overrun_st_014 ()
{
  int buf[5];
  int index;
  index = rand();
  buf[index] = 1; /*Tool should detect this line as error*/
                  /*ERROR: buffer overrun */
  sink = buf[index];
}
The only place you could probably find something like that is students' programming exercises.
At the same time, I do know that you are very likely to come across the following typo in a serious project:
return (!strcmp (a->v.val_vms_delta.lbl1,
                 b->v.val_vms_delta.lbl1)
        && !strcmp (a->v.val_vms_delta.lbl1,
                    b->v.val_vms_delta.lbl1));
This error was found by PVS-Studio in the GCC compiler's code: the same strings are compared twice.
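Judging by the repeated lbl1, the second comparison was presumably meant to use the lbl2 field; here is a guess at the intended code:
return (!strcmp (a->v.val_vms_delta.lbl1,
                 b->v.val_vms_delta.lbl1)
        && !strcmp (a->v.val_vms_delta.lbl2,
                    b->v.val_vms_delta.lbl2));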
So, the suite includes tests for diagnosing exotic code with rand but zero tests for classic typos.
I could go on and on, but I'd rather stop. I've let off steam and feel better now. Thank you for reading. Now I have an article to support my opinion about synthetic error bases.
You are welcome to download and try our powerful code analyzer, PVS-Studio.