I recently and unintentionally sparked a large debate over whether it is legal in C/C++ to use the &P->m_foo expression when P is a null pointer. The programming community split into two camps: one claimed with confidence that it isn't legal, while the other was just as sure that it is. Both sides offered various arguments and links, and it occurred to me that at some point I had to settle the matter. To do so, I contacted Microsoft MVP experts and the Microsoft Visual C++ development team through a closed mailing list. They helped me prepare this article, and now everyone interested is welcome to read it. For those who can't wait to learn the answer: that code is NOT correct.
Debate history
It all started with an article about checking the Linux kernel with the PVS-Studio analyzer. But the issue doesn't have anything to do with the check itself. The point is that in that article I cited the following fragment from the Linux code:
static int podhd_try_init(struct usb_interface *interface,
                          struct usb_line6_podhd *podhd)
{
  int err;
  struct usb_line6 *line6 = &podhd->line6;
  if ((interface == NULL) || (podhd == NULL))
    return -ENODEV;
  ....
}
I called this code dangerous because I believed it causes undefined behavior.
After that, I got a pile of emails and comments from readers objecting to that idea, and I was even close to giving in to their convincing arguments. For instance, as proof that the code is correct, they pointed to the implementation of the offsetof macro, which typically looks like this:
#define offsetof(st, m) ((size_t)(&((st *)0)->m))
This looks like dereferencing a null pointer, yet the code still works. Other emails reasoned that since nothing is actually accessed through the null pointer, there is no problem.
Although I tend to be gullible, I still try to double-check any information I may doubt. I started investigating the subject, and eventually wrote a small article: "Reflections on the Null Pointer Dereferencing Issue".
Everything suggested that I had been right: one cannot write code like that. But I didn't manage to provide convincing proof for my conclusions or cite the relevant excerpts from the standard.
After publishing that article, I was again bombarded by protesting emails, so I thought I should figure it all out once and for all. I put the question to language experts to find out their opinions. This article is a summary of their answers.
About C
The '&podhd->line6' expression is undefined behavior in the C language when 'podhd' is a null pointer.
The C99 standard says the following about the '&' address-of operator (6.5.3.2 "Address and indirection operators"):
The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier.
The expression 'podhd->line6' is clearly neither a function designator nor the result of a [] or unary * operator. It is an lvalue expression. However, when the 'podhd' pointer is NULL, the expression does not designate an object, since 6.3.2.3 "Pointers" says:
If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
And when an lvalue does not designate an object at the time it is evaluated, the behavior is undefined (C99 6.3.2.1 "Lvalues, arrays, and function designators"):
An lvalue is an expression with an object type or an incomplete type other than void; if an lvalue does not designate an object when it is evaluated, the behavior is undefined.
So, the same idea in brief:
When -> is applied to a null pointer, it yields an lvalue that designates no object, and as a result the behavior is undefined.
About C++
In the C++ language, things are exactly the same. The '&podhd->line6' expression is undefined behavior here when 'podhd' is a null pointer.
The discussion at WG21 (232. Is indirection through a null pointer undefined behavior?), to which I referred in the previous article, introduces some confusion. The programmers participating in it insist that this expression is not undefined behavior. However, no one has found any clause in the C++ standard permitting the use of "podhd->line6" with "podhd" being a null pointer.
The "podhd" pointer fails the basic constraint (5.2.5/4, second bullet) that it must designate an object. No C++ object has nullptr as its address.
Summing it all up
struct usb_line6 *line6 = &podhd->line6;
This code is incorrect in both C and C++ when the podhd pointer is null: in that case, undefined behavior occurs.
If the program runs well, that is pure luck. Undefined behavior may take different forms, including program execution in just the way the programmer expected. It's just one of the special cases of undefined behavior, and that's all.
You cannot write code like that. The pointer must be checked before being dereferenced.
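As a minimal sketch of what "checked first" could look like, the fragment below reorders the function so that the address is taken only after both pointers have been validated. The structure definitions, the function name podhd_try_init_checked, and the assignment to state are hypothetical stand-ins invented for illustration; the real kernel types and the elided body are not shown in this article.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types (assumption: the real
   definitions are much larger and are not reproduced here). */
struct usb_interface { int id; };
struct usb_line6 { int state; };
struct usb_line6_podhd { struct usb_line6 line6; };

/* Sketch: validate the pointers first, take the member address afterwards. */
static int podhd_try_init_checked(struct usb_interface *interface,
                                  struct usb_line6_podhd *podhd)
{
  struct usb_line6 *line6;

  if ((interface == NULL) || (podhd == NULL))
    return -ENODEV;

  /* podhd is known to be non-null here, so podhd->line6 designates an object. */
  line6 = &podhd->line6;
  line6->state = 1; /* placeholder for the elided initialization */
  return 0;
}
```

With this ordering, the null check can no longer be reasoned away on the grounds that the pointer has already been used.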
Additional ideas and links
- When considering the idiomatic implementation of the 'offsetof()' macro, keep in mind that a compiler implementation is permitted to implement its own functionality with techniques that would be non-portable in user code. The fact that a compiler's library uses the null pointer constant in its implementation of 'offsetof()' doesn't make it OK for user code to use '&podhd->line6' when 'podhd' is a null pointer.
- GCC can, and does, optimize on the assumption that undefined behavior never occurs, and it would remove the null checks here; the kernel is compiled with a set of switches that tell the compiler not to do this. As an example, the experts refer to the article "What Every C Programmer Should Know About Undefined Behavior #2/3".
- You may also find it interesting that a similar use of a null pointer was involved in a kernel exploit with the TUN/TAP driver. See "Fun with NULL pointers". The major difference that might lead some people to think the similarity doesn't apply is that in the TUN/TAP driver bug, the structure field accessed through the null pointer was explicitly read as a value to initialize a variable, instead of simply having its address taken. However, as far as standard C goes, taking the address of the field through a null pointer is still undefined behavior.
- Is there any case when writing &P->m_foo where P == nullptr is OK? Yes, for example when it is the operand of the sizeof operator, which is not evaluated: sizeof(&P->m_foo).
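The first and last points can be illustrated with a short C sketch. It assumes a hypothetical structure named sample and two hypothetical helper functions, none of which come from the Linux code. User code that needs a member offset can rely on the standard offsetof macro from <stddef.h>, and sizeof only inspects the type of its operand here, so P is never dereferenced.

```c
#include <stddef.h>

/* Hypothetical structure, used only for this illustration. */
struct sample {
  char tag;
  int  m_foo;
};

/* Offset of m_foo obtained through the standard macro rather than a
   hand-written null pointer cast in user code. */
static size_t m_foo_offset(void)
{
  return offsetof(struct sample, m_foo);
}

/* The operand of sizeof is not evaluated here, so the null pointer P is
   never dereferenced; the result is the size of a pointer to int. */
static size_t m_foo_address_size(void)
{
  struct sample *P = NULL;
  return sizeof(&P->m_foo);
}
```

Both helpers stay within defined behavior: the first delegates the technique to the implementation, and the second never evaluates the expression at all.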
Acknowledgements
This article was made possible thanks to the experts whose competence I can see no reason to doubt. I want to thank the following people for helping me in writing it:
- Michael Burr is a C/C++ enthusiast who specializes in systems level and embedded software including Windows services, networking, and device drivers. He can often be found in the StackOverflow community answering questions about C and C++ (and occasionally fielding the easier C# questions). He has 6 Microsoft MVP awards for Visual C++.
- Billy O'Neal is a (mostly) C++ developer, and contributor to StackOverflow. He is a Microsoft Software Development Engineer on the Trustworthy Computing Team. He has worked at several security-related places previously, including Malware Bytes and PreEmptive Solutions.
- Giovanni Dicanio is a computer programmer, specializing in Windows operating system development. Giovanni wrote computer programming articles on C++, OpenGL, and other programming subjects for Italian computer magazines. He contributed code to some open-source projects as well. Giovanni likes helping people solve C and C++ programming problems on Microsoft MSDN forums, and recently on StackOverflow. He has 8 Microsoft MVP awards for Visual C++.
- Gabriel Dos Reis is a Principal Software Development Engineer at Microsoft. He is also a researcher and a longtime member of the C++ community. His research interests include programming tools for dependable software. Prior to joining Microsoft, he was Assistant Professor at Texas A&M University. Dr. Dos Reis was a recipient of the 2012 National Science Foundation CAREER award for his research in compilers for dependable computational mathematics and educational activities. He is a member of the C++ standardization committee.
References
- Wikipedia. Undefined Behavior.
- A Guide to Undefined Behavior in C and C++. Part 1, 2, 3.
- Wikipedia. offsetof.
- LLVM Blog. What Every C Programmer Should Know About Undefined Behavior #2/3.
- LWN. Fun with NULL pointers. Part 1, 2.