Inline Variables in C++17 – How They Work

c++c++17inline-variable

At the 2016 Oulu ISO C++ Standards meeting, a proposal called Inline Variables was voted into C++17 by the standards committee.

In layman's terms, what are inline variables, how do they work and what are they useful for? How should inline variables be declared, defined and used?

Best Answer

The first sentence of the proposal:

” The inline specifier can be applied to variables as well as to functions.

The ¹guaranteed effect of inline as applied to a function, is to allow the function to be defined identically, with external linkage, in multiple translation units. In practice that means defining the function in a header, that can be included in multiple translation units. The proposal extends this possibility to variables.

So, in practical terms the (now accepted) proposal allows you to use the inline keyword to define an external linkage const namespace scope variable, or any static class data member, in a header file, so that the multiple definitions that result when that header is included in multiple translation units are OK with the linker – it just chooses one of them.

Up until and including C++14 the internal machinery for this has been there, in order to support static variables in class templates, but there was no convenient way to use that machinery. One had to resort to tricks like

template< class Dummy >
struct Kath_
{
    static std::string const hi;
};

template< class Dummy >
std::string const Kath_<Dummy>::hi = "Zzzzz...";

using Kath = Kath_<void>;    // Allows you to write `Kath::hi`.

From C++17 and onwards I believe one can write just

struct Kath
{
    static std::string const hi;
};

inline std::string const Kath::hi = "Zzzzz...";    // Simpler!

… in a header file.

The proposal includes the wording

” An inline static data member can be defined in the class definition and may s‌pecify a brace-or-equal-initializer. If the member is declared with the constexpr specifier, it may be redeclared in namespace scope with no initializer (this usage is deprecated; see‌ D.X). Declarations of other static data members shall not specify a brace-or-equal-in‌itializer

… which allows the above to be further simplified to just

struct Kath
{
    static inline std::string const hi = "Zzzzz...";    // Simplest!
};

… as noted by T.C in a comment to this answer.

Also, the constexpr specifier implies inline for static data members as well as functions.

^{Notes:
¹ For a function `inline` also has a hinting effect about optimization, that the compiler should prefer to replace calls of this function with direct substitution of the function's machine code. This hinting can be ignored.}

Related Solutions

C++11 – Understanding the Standardized Memory Model

First, you have to learn to think like a Language Lawyer.

The C++ specification does not make reference to any particular compiler, operating system, or CPU. It makes reference to an abstract machine that is a generalization of actual systems. In the Language Lawyer world, the job of the programmer is to write code for the abstract machine; the job of the compiler is to actualize that code on a concrete machine. By coding rigidly to the spec, you can be certain that your code will compile and run without modification on any system with a compliant C++ compiler, whether today or 50 years from now.

The abstract machine in the C++98/C++03 specification is fundamentally single-threaded. So it is not possible to write multi-threaded C++ code that is "fully portable" with respect to the spec. The spec does not even say anything about the atomicity of memory loads and stores or the order in which loads and stores might happen, never mind things like mutexes.

Of course, you can write multi-threaded code in practice for particular concrete systems – like pthreads or Windows. But there is no standard way to write multi-threaded code for C++98/C++03.

The abstract machine in C++11 is multi-threaded by design. It also has a well-defined memory model; that is, it says what the compiler may and may not do when it comes to accessing memory.

Consider the following example, where a pair of global variables are accessed concurrently by two threads:

           Global
           int x, y;

Thread 1            Thread 2
x = 17;             cout << y << " ";
y = 37;             cout << x << endl;

What might Thread 2 output?

Under C++98/C++03, this is not even Undefined Behavior; the question itself is meaningless because the standard does not contemplate anything called a "thread".

Under C++11, the result is Undefined Behavior, because loads and stores need not be atomic in general. Which may not seem like much of an improvement... And by itself, it's not.

But with C++11, you can write this:

           Global
           atomic<int> x, y;

Thread 1                 Thread 2
x.store(17);             cout << y.load() << " ";
y.store(37);             cout << x.load() << endl;

Now things get much more interesting. First of all, the behavior here is defined. Thread 2 could now print 0 0 (if it runs before Thread 1), 37 17 (if it runs after Thread 1), or 0 17 (if it runs after Thread 1 assigns to x but before it assigns to y).

What it cannot print is 37 0, because the default mode for atomic loads/stores in C++11 is to enforce sequential consistency. This just means all loads and stores must be "as if" they happened in the order you wrote them within each thread, while operations among threads can be interleaved however the system likes. So the default behavior of atomics provides both atomicity and ordering for loads and stores.

Now, on a modern CPU, ensuring sequential consistency can be expensive. In particular, the compiler is likely to emit full-blown memory barriers between every access here. But if your algorithm can tolerate out-of-order loads and stores; i.e., if it requires atomicity but not ordering; i.e., if it can tolerate 37 0 as output from this program, then you can write this:

           Global
           atomic<int> x, y;

Thread 1                            Thread 2
x.store(17,memory_order_relaxed);   cout << y.load(memory_order_relaxed) << " ";
y.store(37,memory_order_relaxed);   cout << x.load(memory_order_relaxed) << endl;

The more modern the CPU, the more likely this is to be faster than the previous example.

Finally, if you just need to keep particular loads and stores in order, you can write:

           Global
           atomic<int> x, y;

Thread 1                            Thread 2
x.store(17,memory_order_release);   cout << y.load(memory_order_acquire) << " ";
y.store(37,memory_order_release);   cout << x.load(memory_order_acquire) << endl;

This takes us back to the ordered loads and stores – so 37 0 is no longer a possible output – but it does so with minimal overhead. (In this trivial example, the result is the same as full-blown sequential consistency; in a larger program, it would not be.)

Of course, if the only outputs you want to see are 0 0 or 37 17, you can just wrap a mutex around the original code. But if you have read this far, I bet you already know how that works, and this answer is already longer than I intended :-).

So, bottom line. Mutexes are great, and C++11 standardizes them. But sometimes for performance reasons you want lower-level primitives (e.g., the classic double-checked locking pattern). The new standard provides high-level gadgets like mutexes and condition variables, and it also provides low-level gadgets like atomic types and the various flavors of memory barrier. So now you can write sophisticated, high-performance concurrent routines entirely within the language specified by the standard, and you can be certain your code will compile and run unchanged on both today's systems and tomorrow's.

Although to be frank, unless you are an expert and working on some serious low-level code, you should probably stick to mutexes and condition variables. That's what I intend to do.

For more on this stuff, see this blog post.

C++17 Copy Elision – How Guaranteed Copy Elision Works

Copy elision was permitted to happen under a number of circumstances. However, even if it was permitted, the code still had to be able to work as if the copy were not elided. Namely, there had to be an accessible copy and/or move constructor.

Guaranteed copy elision redefines a number of C++ concepts, such that certain circumstances where copies/moves could be elided don't actually provoke a copy/move at all. The compiler isn't eliding a copy; the standard says that no such copying could ever happen.

Consider this function:

T Func() {return T();}

Under non-guaranteed copy elision rules, this will create a temporary, then move from that temporary into the function's return value. That move operation may be elided, but T must still have an accessible move constructor even if it is never used.

Similarly:

T t = Func();

This is copy initialization of t. This will copy initialize t with the return value of Func. However, T still has to have a move constructor, even though it will not be called.

Guaranteed copy elision redefines the meaning of a prvalue expression. Pre-C++17, prvalues are temporary objects. In C++17, a prvalue expression is merely something which can materialize a temporary, but it isn't a temporary yet.

If you use a prvalue to initialize an object of the prvalue's type, then no temporary is materialized. When you do return T();, this initializes the return value of the function via a prvalue. Since that function returns T, no temporary is created; the initialization of the prvalue simply directly initilaizes the return value.

The thing to understand is that, since the return value is a prvalue, it is not an object yet. It is merely an initializer for an object, just like T() is.

When you do T t = Func();, the prvalue of the return value directly initializes the object t; there is no "create a temporary and copy/move" stage. Since Func()'s return value is a prvalue equivalent to T(), t is directly initialized by T(), exactly as if you had done T t = T().

If a prvalue is used in any other way, the prvalue will materialize a temporary object, which will be used in that expression (or discarded if there is no expression). So if you did const T &rt = Func();, the prvalue would materialize a temporary (using T() as the initializer), whose reference would be stored in rt, along with the usual temporary lifetime extension stuff.

One thing guaranteed elision permits you to do is return objects which are immobile. For example, lock_guard cannot be copied or moved, so you couldn't have a function that returned it by value. But with guaranteed copy elision, you can.

Guaranteed elision also works with direct initialization:

new T(FactoryFunction());

If FactoryFunction returns T by value, this expression will not copy the return value into the allocated memory. It will instead allocate memory and use the allocated memory as the return value memory for the function call directly.

So factory functions that return by value can directly initialize heap allocated memory without even knowing about it. So long as these function internally follow the rules of guaranteed copy elision, of course. They have to return a prvalue of type T.

Of course, this works too:

new auto(FactoryFunction());

In case you don't like writing typenames.

It is important to recognize that the above guarantees only work for prvalues. That is, you get no guarantee when returning a named variable:

T Func()
{
   T t = ...;
   ...
   return t;
}

In this instance, t must still have an accessible copy/move constructor. Yes, the compiler can choose to optimize away the copy/move. But the compiler must still verify the existence of an accessible copy/move constructor.

So nothing changes for named return value optimization (NRVO).

Best Answer

Related Solutions

C++11 – Understanding the Standardized Memory Model

C++17 Copy Elision – How Guaranteed Copy Elision Works

Related Question