C++26 – How to Understand Requirements for Initializing Uninitialized Variables to a Fixed Byte Pattern in C++

c++ c++26 language-lawyer

In C++26, reading an uninitialized variable is no longer undefined behavior; it is now "erroneous behavior" (see: What is erroneous behavior? How is it different from undefined behavior?).

However, the wording for this confuses me:

[basic.indet]/1.2

otherwise, the bytes have erroneous values, where each value is determined by the implementation **independently of the state of the program**.

(bold mine)

To me, this reads like the implementation must overwrite the values with something (e.g. 0xBEBEBEBE), because leaving them truly uninitialized might make them dependent on the "state of the program", contradicting the bold part.

Is my interpretation correct? Are implementations forced to overwrite uninitialized variables now?

Best Answer

The linked P2795R5 says under Performance and security implications:

The automatic storage for an automatic variable is always fully initialized, which has potential performance implications. P2723R1 discusses the costs in some detail. Note that this cost even applies when a class-type variable is constructed that has no padding and whose default constructor initializes all members.

In particular, unions are fully initialized. ...

It also points out that although automatic locals can be annotated [[indeterminate]] to suppress this initialization, there's no way to avoid it for any temporaries.
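As a rough illustration of that opt-out, here is a minimal sketch. It assumes a compiler that reports support for the C++26 attribute through `__has_cpp_attribute`; the `SKIP_FILL` macro and the `sum_below` function are hypothetical names made up for this example. The variable is written before it is read, so the code has no erroneous behaviour whether or not the attribute is honoured.

```cpp
// Hypothetical portability shim: expands to [[indeterminate]] only when the
// compiler reports support for the C++26 attribute, and to nothing otherwise.
#ifdef __has_cpp_attribute
#  if __has_cpp_attribute(indeterminate)
#    define SKIP_FILL [[indeterminate]]
#  endif
#endif
#ifndef SKIP_FILL
#  define SKIP_FILL
#endif

// Hypothetical helper: sums 0 + 1 + ... + (n - 1) through a scratch variable.
int sum_below(int n) {
    int scratch SKIP_FILL;  // opted out: the implementation need not fill this
    scratch = 0;            // written before any read, so nothing erroneous
    for (int i = 0; i < n; ++i) {
        scratch += i;
    }
    return scratch;
}
```

The attribute only helps for named locals and parameters like `scratch`; as the paper notes, a temporary has no declaration to attach it to.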

So it seems like your interpretation is correct.

Oddly, it doesn't seem important what this magic value is - or even whether this initialization really happens - except that it can't be a trap pattern. As already pointed out, there's no magic value of a byte that is unambiguously erroneous at runtime and still safe to load, copy, and compare.

Edit - why do I say it doesn't seem to matter what the magic value is, or even whether this initialization really happens?

The motivation is to stop evaluation (i.e. the lvalue-to-rvalue conversion of a glvalue to a prvalue) of uninitialized automatic variables being Undefined Behaviour. Instead it will be Erroneous Behaviour, which implementations are encouraged to diagnose.
- If an implementation doesn't diagnose the erroneous behaviour, the result of the evaluation is valid.
Diagnosis can't be contingent on a specific bit pattern if a valid expression could ever produce that pattern, because the diagnostic would then risk misfiring.
- No usual primitives have such magic bit patterns, except for the now-uncommon trap representation.
- E.g. you couldn't use either a quiet or a signalling NaN to mark erroneous values, because if
```
double fine = std::numeric_limits<double>::quiet_NaN();
double errn;

std::isnan(fine); // not erroneous
std::isnan(errn); // erroneous behaviour
```
  needs to treat the two values differently, the distinction can't be based on the bit pattern.
- The same is trivially true for integer types, and anyway [basic.indet]/2 says
  
  Except in the following cases, ... if an erroneous value is produced by an evaluation, the behavior is erroneous and the result of the evaluation is the value so produced but is not erroneous
  
  where all the exclusions are related to "unsigned ordinary character type" and std::byte, so in:
```
int errn;      // erroneous value
foo(errn ^ 0); // 1, 2
foo(errn);     // 3
```
  1. the XOR has erroneous behaviour, but if not diagnosed must produce a non-erroneous value with exactly the same bit-pattern as the erroneous input
  2. the call to foo with the non-erroneous value must not be diagnosed
  3. the call to foo with exactly the same bit-pattern may be diagnosed
If the only goal is to prevent evaluation of uninitialized (automatic) variables escaping to UB, it's sufficient to require this kind of initialization only for types with trap representations.

It may also be required to disable (or guard with diagnostic checks) some optimizations previously allowed by UB, but it's neither necessary nor sufficient for that to depend on a specific bit pattern.
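To make the bit-pattern argument concrete, here is a minimal sketch of why a detector keyed on a fill value would misfire. The fill value `0xBEBEBEBE` is borrowed from the question, and `looks_like_fill` and `valid_value` are hypothetical names, not anything an implementation is known to provide.

```cpp
#include <cstdint>
#include <cstring>

// Assumed fill pattern, taken from the question's example; a real
// implementation chooses its own value.
constexpr std::uint32_t kFill = 0xBEBEBEBEu;

// Hypothetical detector: flags any 32-bit object whose bytes equal the fill.
bool looks_like_fill(const void* p) {
    std::uint32_t bits;
    std::memcpy(&bits, p, sizeof bits);  // inspect the object representation
    return bits == kFill;
}

// A perfectly valid computation that happens to produce the same bits.
std::uint32_t valid_value() {
    return 0xBE000000u | 0x00BE0000u | 0x0000BE00u | 0x000000BEu;
}
```

Passing the address of a variable holding `valid_value()` to `looks_like_fill` reports a match even though nothing was ever left uninitialized, which is the false positive described above. Any diagnostic therefore has to track where a value came from rather than what its bits are.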

Related Solutions

C++ – Difference Between Undefined Behavior and Ill-formed, No Diagnostic Message Required

The standard is not always as coherent as we would like, since it is a very large document, written (in practice) by a number of different people, and despite all of the proof-reading that does occur, inconsistencies slip through. In the case of undefined behavior (and errors in general), I think there is an additional problem in that for many of the most basic things (pointers, etc.), the C++ standard draws on C. But the C standard takes the point of view that all errors are undefined behavior, unless stated otherwise, whereas the C++ standard tries to take the point of view that all errors require a diagnostic, unless stated otherwise. (Although they still have to allow for the case where the standard omits to specify a behavior.) I think this accounts for a lot of the inconsistency in the wording.

Globally, the inconsistency is regrettable, but on the whole, if the standard says that something is erroneous, or ill-formed, then it requires a diagnostic, unless the standard says that it doesn't, or that it is undefined behavior. In something like "ill-formed; no diagnostic required", the "no diagnostic required" is important, because otherwise, it would require a diagnostic. As for the difference between "ill-formed; no diagnostic required" and "undefined behavior", there isn't any. The first is probably more frequent in cases where the code is incorrect, the second where it is a run-time issue, but it's not systematic. (The specification of the one definition rule—clearly a compile time issue—ends with "then the behavior is undefined".)

C++ – Why Calling Main Function Is Undefined Behavior

I think your analysis is correct: calls to main are ill-formed.

You have to pass the -pedantic flag to make GCC and Clang conform. In that case, Clang says

warning: ISO C++ does not allow 'main' to be used by a program [-Wmain]

and GCC says

warning: ISO C++ forbids taking address of function '::main' [-Wpedantic]

But they allow calls to main as an extension. The standard permits such an extension, since it doesn't change the meaning of any conforming programs.
