First, you have to learn to think like a Language Lawyer.
The C++ specification does not make reference to any particular compiler, operating system, or CPU. It makes reference to an abstract machine that is a generalization of actual systems. In the Language Lawyer world, the job of the programmer is to write code for the abstract machine; the job of the compiler is to actualize that code on a concrete machine. By coding rigidly to the spec, you can be certain that your code will compile and run without modification on any system with a compliant C++ compiler, whether today or 50 years from now.
The abstract machine in the C++98/C++03 specification is fundamentally single-threaded. So it is not possible to write multi-threaded C++ code that is "fully portable" with respect to the spec. The spec does not even say anything about the atomicity of memory loads and stores or the order in which loads and stores might happen, never mind things like mutexes.
Of course, you can write multi-threaded code in practice for particular concrete systems – like pthreads or Windows. But there is no standard way to write multi-threaded code for C++98/C++03.
The abstract machine in C++11 is multi-threaded by design. It also has a well-defined memory model; that is, it says what the compiler may and may not do when it comes to accessing memory.
Consider the following example, where a pair of global variables are accessed concurrently by two threads:
Global:
    int x, y;

Thread 1:
    x = 17;
    y = 37;

Thread 2:
    cout << y << " ";
    cout << x << endl;
What might Thread 2 output?
Under C++98/C++03, this is not even Undefined Behavior; the question itself is meaningless because the standard does not contemplate anything called a "thread".
Under C++11, the result is Undefined Behavior, because loads and stores need not be atomic in general. Which may not seem like much of an improvement... And by itself, it's not.
But with C++11, you can write this:
Global:
    atomic<int> x, y;

Thread 1:
    x.store(17);
    y.store(37);

Thread 2:
    cout << y.load() << " ";
    cout << x.load() << endl;
Now things get much more interesting. First of all, the behavior here is defined. Thread 2 could now print "0 0" (if it runs before Thread 1), "37 17" (if it runs after Thread 1), or "0 17" (if it runs after Thread 1 assigns to x but before it assigns to y).

What it cannot print is "37 0", because the default mode for atomic loads/stores in C++11 is to enforce sequential consistency. This just means all loads and stores must be "as if" they happened in the order you wrote them within each thread, while operations among threads can be interleaved however the system likes. So the default behavior of atomics provides both atomicity and ordering for loads and stores.
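For concreteness, here is one way to assemble those fragments into a complete program (a sketch assuming C++11's <thread> and <atomic>; the two lambdas play the roles of Thread 1 and Thread 2):

    #include <atomic>
    #include <iostream>
    #include <thread>

    std::atomic<int> x(0), y(0);

    int main() {
        std::thread t1([] {
            x.store(17);   // sequentially consistent by default
            y.store(37);
        });
        std::thread t2([] {
            std::cout << y.load() << " ";   // also seq_cst by default
            std::cout << x.load() << std::endl;
        });
        t1.join();
        t2.join();
        return 0;
    }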
Now, on a modern CPU, ensuring sequential consistency can be expensive. In particular, the compiler is likely to emit full-blown memory barriers between every access here. But if your algorithm can tolerate out-of-order loads and stores; i.e., if it requires atomicity but not ordering; i.e., if it can tolerate "37 0" as output from this program, then you can write this:
Global:
    atomic<int> x, y;

Thread 1:
    x.store(17, memory_order_relaxed);
    y.store(37, memory_order_relaxed);

Thread 2:
    cout << y.load(memory_order_relaxed) << " ";
    cout << x.load(memory_order_relaxed) << endl;
The more modern the CPU, the more likely this is to be faster than the previous example.
Finally, if you just need to keep particular loads and stores in order, you can write:
Global:
    atomic<int> x, y;

Thread 1:
    x.store(17, memory_order_release);
    y.store(37, memory_order_release);

Thread 2:
    cout << y.load(memory_order_acquire) << " ";
    cout << x.load(memory_order_acquire) << endl;
This takes us back to the ordered loads and stores – so "37 0" is no longer a possible output – but it does so with minimal overhead. (In this trivial example, the result is the same as full-blown sequential consistency; in a larger program, it would not be.)
Of course, if the only outputs you want to see are "0 0" or "37 17", you can just wrap a mutex around the original code. But if you have read this far, I bet you already know how that works, and this answer is already longer than I intended :-).
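But for completeness, here is a minimal sketch of that mutex version (assuming C++11's <mutex> and <thread>; because each thread holds the lock for its whole body, "0 17" is no longer possible either):

    #include <iostream>
    #include <mutex>
    #include <thread>

    int x, y;          // zero-initialized (static storage duration)
    std::mutex m;

    int main() {
        std::thread t1([] {
            std::lock_guard<std::mutex> lock(m);
            x = 17;
            y = 37;
        });
        std::thread t2([] {
            std::lock_guard<std::mutex> lock(m);
            std::cout << y << " " << x << std::endl;  // "0 0" or "37 17"
        });
        t1.join();
        t2.join();
        return 0;
    }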
So, bottom line. Mutexes are great, and C++11 standardizes them. But sometimes for performance reasons you want lower-level primitives (e.g., the classic double-checked locking pattern). The new standard provides high-level gadgets like mutexes and condition variables, and it also provides low-level gadgets like atomic types and the various flavors of memory barrier. So now you can write sophisticated, high-performance concurrent routines entirely within the language specified by the standard, and you can be certain your code will compile and run unchanged on both today's systems and tomorrow's.
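As an illustration of what those low-level gadgets buy you, here is one common shape of the double-checked locking pattern mentioned above, written with C++11 atomics (a sketch only; the Widget type and get_widget function are made-up placeholders, not anything from the standard):

    #include <atomic>
    #include <mutex>

    struct Widget { /* ... */ };

    std::atomic<Widget*> g_widget(nullptr);
    std::mutex g_widget_mutex;

    Widget* get_widget() {
        Widget* p = g_widget.load(std::memory_order_acquire);  // fast path: no lock
        if (p == nullptr) {
            std::lock_guard<std::mutex> lock(g_widget_mutex);
            p = g_widget.load(std::memory_order_relaxed);       // re-check under the lock
            if (p == nullptr) {
                p = new Widget;
                g_widget.store(p, std::memory_order_release);   // publish the fully built object
            }
        }
        return p;
    }

The acquire load on the fast path pairs with the release store, so a thread that sees a non-null pointer also sees the fully constructed Widget; the relaxed re-load is safe because the mutex already orders it.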
Although to be frank, unless you are an expert and working on some serious low-level code, you should probably stick to mutexes and condition variables. That's what I intend to do.
For more on this stuff, see this blog post.
Let's get pedantic, because there are differences that can actually affect your code's behavior. Much of the following is taken from comments made to an Old New Thing article.
Sometimes the memory returned by the new operator will be initialized and sometimes it won't, depending on whether the type you're newing up is a POD (plain old data), or a class that contains POD members and uses a compiler-generated default constructor.
- In C++1998 there are two kinds of initialization: zero-initialization and default-initialization.
- In C++2003 a third kind, value-initialization, was added.
Assume:
struct A { int m; }; // POD
struct B { ~B(); int m; }; // non-POD, compiler-generated default ctor
struct C { C() : m() {} ~C(); int m; }; // non-POD, ctor explicitly zero-initializes m
In a C++98 compiler, the following should occur:
- new A: indeterminate value
- new A(): zero-initialize
- new B: default construct (B::m is uninitialized)
- new B(): default construct (B::m is uninitialized)
- new C: default construct (C::m is zero-initialized)
- new C(): default construct (C::m is zero-initialized)
In a C++03-conformant compiler, things should work like so:
- new A: indeterminate value
- new A(): value-initialize A, which is zero-initialization since it's a POD
- new B: default-initialize (leaves B::m uninitialized)
- new B(): value-initialize B, which zero-initializes all fields, since its default ctor is compiler-generated as opposed to user-defined
- new C: default-initialize C, which calls the default ctor
- new C(): value-initialize C, which calls the default ctor
So in all versions of C++ there's a difference between new A and new A(), because A is a POD. And there's a difference in behavior between C++98 and C++03 for the case new B().
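To see that C++03 change concretely, here is a small test (a sketch; it gives B a defined destructor so it links, but is otherwise the B from above):

    #include <iostream>

    struct B { ~B() {} int m; };  // non-POD, compiler-generated default ctor

    int main() {
        B* b1 = new B;    // default-initialized: b1->m holds an indeterminate value
        B* b2 = new B();  // value-initialized in C++03: b2->m is zero
        std::cout << b2->m << std::endl;  // prints 0 under C++03
        // (reading b1->m would be undefined behavior, so we don't)
        delete b1;
        delete b2;
        return 0;
    }

Under C++98 that same line could print garbage, since new B() merely default-constructs and leaves m uninitialized.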
This is one of the dusty corners of C++ that can drive you crazy. When constructing an object, sometimes you want/need the parens, sometimes you absolutely cannot have them, and sometimes it doesn't matter.
C++ does not really deal with physical machines. It targets an abstract machine that just has some sort of uniform memory. As far as the C++ standard is concerned, there is no such thing as physical or virtual memory, so the standard cannot specify either one. So this is the answer to your question as asked.
Now, if you were to ask "will my new[]-allocated memory be allocated in physical memory in reality?", the answer would be "it depends on the environment":
- Your C++ program could target a microcontroller like the ARM Cortex-M3, which has no MMU to speak of. There, new[] gives you a chunk of contiguous physical memory at the address you are given (and there is no concept of virtual memory at all, so your question makes no sense in that environment).
- You could target DOS, in which case you also get contiguous physical memory at the address you are given (unless your program runs under a VDM of some sort, as in Windows, in which case it doesn't get physical memory).
- You could target an OS that has an MMU but no paging or on-demand allocation. In that case, you'll get contiguous virtual memory which is backed by physical memory, but that physical memory might not be contiguous, and you won't know its address.
- Finally, you could target a "usual" OS environment with an MMU and paging, like Linux. In that case, you'll get virtual memory which might not be backed by physical memory at all; with Linux's on-demand paging, you'll definitely not get physical memory for most of that returned block of virtual memory until you touch it.