C++ Core Guidelines – Is Rule F19 Incomplete?

c++c++20cpp-core-guidelines

Flag a function that takes a TP&& parameter (where TP is a template type parameter name) and does anything with it other than std::forwarding it exactly once on every static path.

Literally, that means something like

template<typename R, typename V> bool contains(R&& range, const V& value)
{
  return std::find(range, value) != range.end();
}

Should be flagged because std::forward was not called (and even if it was called, it'd call .end()).

But to my understanding, this contains-implementation is correct and sensible.
I need the universal reference R&& because in real world, I need it to bind to const& values (1, see below) but I also need it to bind to values
that cannot become const (because begin() and end() aren't const) (2).
If I'd change the signature to contains(const R& range, const V& value), it wouldn't be compatible with most views.

int main()
{
  const std::vector<int> vs{1, 2, 3, 4};
  std::cout << contains(vs, 7);  // (1)
  std::cout << contains(vs | std::views::transform(squared), 7);  // (2)
}

Question: Does F19 miss some edge case or am I missing some detail?
The std::contains possible implementation also passes R as a universal reference without calling std::forward.

Alternative contains implementation using std::forward (but also .end(), hence not compliant with F19, either):

template<typename R, typename V> bool contains(R&& range, const V& value)
{
  const auto& end = range.end();
  return std::find(std::forward<R>(range), value) != end;
}

Best Answer

Yes.

Arguably the problem here is that forwarding references were originally introduced to serve one specific purpose: forwarding. That's why we ended up calling them forwarding references. In that context, it certainly makes sense to question using a forwarding reference that isn't forwarded. And indeed this is by far the most common usage of forwarding references.

However, forwarding references also end up serving a different purpose - one more inline with their original name: a universal reference. If you want to write a function that can accept either an lvalue or an rvalue today, you only have three choices:

template <class T> void f(T a); - this may be what you want, but it requires copying lvalues and forces a move of xvalues.
template <class T> void g(T const& b); - this is a good default choice, no move or copy happens, but b is always const
template <class T> void h(T&& c); - no move or copy happens, and c can be mutable

If you want to avoid a copy/move and you need to preserve const-ness, the only option is to use a forwarding reference. Which in this context is better named a universal reference. But it's the same syntax either way. Ranges make use of this pattern heavily - and there's nothing wrong with the pattern. It's the tool available at our disposal to solve this problem.

You might imagine that maybe if we had something like a parameter label, we might be able to differentiate between the contexts where we just want to have a universal reference and where we want to actually forward. But barring something like that, the "universal reference" use-case of forwarding references is perfectly valid. We just cannot syntactically distinguish between the cases where we intended to forward but forgot to (which should be flagged) and the cases where we did not intend to forward, and should not (which should not be flagged).

Introduction

C++ treats variables of user-defined types with value semantics. This means that objects are implicitly copied in various contexts, and we should understand what "copying an object" actually means.

Let us consider a simple example:

class person
{
    std::string name;
    int age;

public:

    person(const std::string& name, int age) : name(name), age(age)
    {
    }
};

int main()
{
    person a("Bjarne Stroustrup", 60);
    person b(a);   // What happens here?
    b = a;         // And here?
}

(If you are puzzled by the name(name), age(age) part, this is called a member initializer list.)

Special member functions

What does it mean to copy a person object? The main function shows two distinct copying scenarios. The initialization person b(a); is performed by the copy constructor. Its job is to construct a fresh object based on the state of an existing object. The assignment b = a is performed by the copy assignment operator. Its job is generally a little more complicated because the target object is already in some valid state that needs to be dealt with.

Since we declared neither the copy constructor nor the assignment operator (nor the destructor) ourselves, these are implicitly defined for us. Quote from the standard:

The [...] copy constructor and copy assignment operator, [...] and destructor are special member functions. [ Note: The implementation will implicitly declare these member functions for some class types when the program does not explicitly declare them. The implementation will implicitly define them if they are used. [...] end note ] [n3126.pdf section 12 §1]

By default, copying an object means copying its members:

The implicitly-defined copy constructor for a non-union class X performs a memberwise copy of its subobjects. [n3126.pdf section 12.8 §16]

The implicitly-defined copy assignment operator for a non-union class X performs memberwise copy assignment of its subobjects. [n3126.pdf section 12.8 §30]

Implicit definitions

The implicitly-defined special member functions for person look like this:

// 1. copy constructor
person(const person& that) : name(that.name), age(that.age)
{
}

// 2. copy assignment operator
person& operator=(const person& that)
{
    name = that.name;
    age = that.age;
    return *this;
}

// 3. destructor
~person()
{
}

Memberwise copying is exactly what we want in this case: name and age are copied, so we get a self-contained, independent person object. The implicitly-defined destructor is always empty. This is also fine in this case since we did not acquire any resources in the constructor. The members' destructors are implicitly called after the person destructor is finished:

After executing the body of the destructor and destroying any automatic objects allocated within the body, a destructor for class X calls the destructors for X's direct [...] members [n3126.pdf 12.4 §6]

Managing resources

So when should we declare those special member functions explicitly? When our class manages a resource, that is, when an object of the class is responsible for that resource. That usually means the resource is acquired in the constructor (or passed into the constructor) and released in the destructor.

Let us go back in time to pre-standard C++. There was no such thing as std::string, and programmers were in love with pointers. The person class might have looked like this:

class person
{
    char* name;
    int age;

public:

    // the constructor acquires a resource:
    // in this case, dynamic memory obtained via new[]
    person(const char* the_name, int the_age)
    {
        name = new char[strlen(the_name) + 1];
        strcpy(name, the_name);
        age = the_age;
    }

    // the destructor must release this resource via delete[]
    ~person()
    {
        delete[] name;
    }
};

Even today, people still write classes in this style and get into trouble: "I pushed a person into a vector and now I get crazy memory errors!" Remember that by default, copying an object means copying its members, but copying the name member merely copies a pointer, not the character array it points to! This has several unpleasant effects:

Changes via a can be observed via b.
Once b is destroyed, a.name is a dangling pointer.
If a is destroyed, deleting the dangling pointer yields undefined behavior.
Since the assignment does not take into account what name pointed to before the assignment, sooner or later you will get memory leaks all over the place.

Explicit definitions

Since memberwise copying does not have the desired effect, we must define the copy constructor and the copy assignment operator explicitly to make deep copies of the character array:

// 1. copy constructor
person(const person& that)
{
    name = new char[strlen(that.name) + 1];
    strcpy(name, that.name);
    age = that.age;
}

// 2. copy assignment operator
person& operator=(const person& that)
{
    if (this != &that)
    {
        delete[] name;
        // This is a dangerous point in the flow of execution!
        // We have temporarily invalidated the class invariants,
        // and the next statement might throw an exception,
        // leaving the object in an invalid state :(
        name = new char[strlen(that.name) + 1];
        strcpy(name, that.name);
        age = that.age;
    }
    return *this;
}

Note the difference between initialization and assignment: we must tear down the old state before assigning it to name to prevent memory leaks. Also, we have to protect against the self-assignment of the form x = x. Without that check, delete[] name would delete the array containing the source string, because when you write x = x, both this->name and that.name contain the same pointer.

Exception safety

Unfortunately, this solution will fail if new char[...] throws an exception due to memory exhaustion. One possible solution is to introduce a local variable and reorder the statements:

// 2. copy assignment operator
person& operator=(const person& that)
{
    char* local_name = new char[strlen(that.name) + 1];
    // If the above statement throws,
    // the object is still in the same state as before.
    // None of the following statements will throw an exception :)
    strcpy(local_name, that.name);
    delete[] name;
    name = local_name;
    age = that.age;
    return *this;
}

This also takes care of self-assignment without an explicit check. An even more robust solution to this problem is the copy-and-swap idiom, but I will not go into the details of exception safety here. I only mentioned exceptions to make the following point: Writing classes that manage resources is hard.

Noncopyable resources

Some resources cannot or should not be copied, such as file handles or mutexes. In that case, simply declare the copy constructor and copy assignment operator as private without giving a definition:

private:

    person(const person& that);
    person& operator=(const person& that);

Alternatively, you can inherit from boost::noncopyable or declare them as deleted (in C++11 and above):

person(const person& that) = delete;
person& operator=(const person& that) = delete;

The rule of three

Sometimes you need to implement a class that manages a resource. (Never manage multiple resources in a single class, this will only lead to pain.) In that case, remember the rule of three:

If you need to explicitly declare either the destructor, copy constructor or copy assignment operator yourself, you probably need to explicitly declare all three of them.

(Unfortunately, this "rule" is not enforced by the C++ standard or any compiler I am aware of.)

The rule of five

From C++11 on, an object has 2 extra special member functions: the move constructor and move assignment. The rule of five states to implement these functions as well.

An example with the signatures:

class person
{
    std::string name;
    int age;

public:
    person(const std::string& name, int age);        // Ctor
    person(const person &) = default;                // 1/5: Copy Ctor
    person(person &&) noexcept = default;            // 4/5: Move Ctor
    person& operator=(const person &) = default;     // 2/5: Copy Assignment
    person& operator=(person &&) noexcept = default; // 5/5: Move Assignment
    ~person() noexcept = default;                    // 3/5: Dtor
};

The rule of zero

The rule of 3/5 is also referred to as the rule of 0/3/5. The zero part of the rule states that you are allowed to not write any of the special member functions when creating your class.

Advice

Most of the time, you do not need to manage a resource yourself, because an existing class such as std::string already does it for you. Just compare the simple code using a std::string member to the convoluted and error-prone alternative using a char* and you should be convinced. As long as you stay away from raw pointer members, the rule of three is unlikely to concern your own code.

C++ Strict Aliasing Rule Explained

A typical situation where you encounter strict aliasing problems is when overlaying a struct (like a device/network msg) onto a buffer of the word size of your system (like a pointer to uint32_ts or uint16_ts). When you overlay a struct onto such a buffer, or a buffer onto such a struct through pointer casting you can easily violate strict aliasing rules.

So in this kind of setup, if I want to send a message to something I'd have to have two incompatible pointers pointing to the same chunk of memory. I might then naively code something like this:

typedef struct Msg
{
    unsigned int a;
    unsigned int b;
} Msg;

void SendWord(uint32_t);

int main(void)
{
    // Get a 32-bit buffer from the system
    uint32_t* buff = malloc(sizeof(Msg));
    
    // Alias that buffer through message
    Msg* msg = (Msg*)(buff);
    
    // Send a bunch of messages    
    for (int i = 0; i < 10; ++i)
    {
        msg->a = i;
        msg->b = i+1;
        SendWord(buff[0]);
        SendWord(buff[1]);   
    }
}

The strict aliasing rule makes this setup illegal: dereferencing a pointer that aliases an object that is not of a compatible type or one of the other types allowed by C 2011 6.5 paragraph 7¹ is undefined behavior. Unfortunately, you can still code this way, maybe get some warnings, have it compile fine, only to have weird unexpected behavior when you run the code.

(GCC appears somewhat inconsistent in its ability to give aliasing warnings, sometimes giving us a friendly warning and sometimes not.)

To see why this behavior is undefined, we have to think about what the strict aliasing rule buys the compiler. Basically, with this rule, it doesn't have to think about inserting instructions to refresh the contents of buff every run of the loop. Instead, when optimizing, with some annoyingly unenforced assumptions about aliasing, it can omit those instructions, load buff[0] and buff[1] into CPU registers once before the loop is run, and speed up the body of the loop. Before strict aliasing was introduced, the compiler had to live in a state of paranoia that the contents of buff could change by any preceding memory stores. So to get an extra performance edge, and assuming most people don't type-pun pointers, the strict aliasing rule was introduced.

Keep in mind, if you think the example is contrived, this might even happen if you're passing a buffer to another function doing the sending for you, if instead you have.

void SendMessage(uint32_t* buff, size_t size32)
{
    for (int i = 0; i < size32; ++i) 
    {
        SendWord(buff[i]);
    }
}

And rewrote our earlier loop to take advantage of this convenient function

for (int i = 0; i < 10; ++i)
{
    msg->a = i;
    msg->b = i+1;
    SendMessage(buff, 2);
}

The compiler may or may not be able to or smart enough to try to inline SendMessage and it may or may not decide to load or not load buff again. If SendMessage is part of another API that's compiled separately, it probably has instructions to load buff's contents. Then again, maybe you're in C++ and this is some templated header only implementation that the compiler thinks it can inline. Or maybe it's just something you wrote in your .c file for your own convenience. Anyway undefined behavior might still ensue. Even when we know some of what's happening under the hood, it's still a violation of the rule so no well defined behavior is guaranteed. So just by wrapping in a function that takes our word delimited buffer doesn't necessarily help.

So how do I get around this?

Use a union. Most compilers support this without complaining about strict aliasing. This is allowed in C99 and explicitly allowed in C11.
```
  union {
      Msg msg;
      unsigned int asBuffer[sizeof(Msg)/sizeof(unsigned int)];
  };
```
You can disable strict aliasing in your compiler (f[no-]strict-aliasing in gcc))
You can use char* for aliasing instead of your system's word. The rules allow an exception for char* (including signed char and unsigned char). It's always assumed that char* aliases other types. However this won't work the other way: there's no assumption that your struct aliases a buffer of chars.

Beginner beware

This is only one potential minefield when overlaying two types onto each other. You should also learn about endianness, word alignment, and how to deal with alignment issues through packing structs correctly.

Footnote

¹ The types that C 2011 6.5 7 allows an lvalue to access are:

a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.