I recently asked this question:
Why is iterating an std::array much faster than iterating an std::vector?
As people quickly pointed out, my benchmark had many flaws. So as I was trying to fix my benchmark, I noticed that std::vector
wasn't slower than std::array
and, in fact, it was quite the opposite.
#include <vector>
#include <array>
#include <stdio.h>
#include <chrono>
using namespace std;
constexpr int n = 100'000'000;
vector<int> v(n);
//array<int, n> v;
int main()
{
int res = 0;
auto start = chrono::steady_clock::now();
for(int x : v)
res += x;
auto end = chrono::steady_clock::now();
auto diff = end - start;
double elapsed =
std::chrono::duration_cast<
std::chrono::duration<double, std::milli>
>(end - start).count();
printf("result: %d\ntime: %f\n", res, elapsed);
}
Things I've tried to improve from my previous benchmark:
- Made sure I'm using the result, so the whole loop is not optimized away
- Using
-O3
flag for speed - Use
std::chrono
instead of thetime
command. That's so we can isolate the part we want to measure (just the for loop). Static initialization of variables and things like that won't be measured.
The measured times:
array:
$ g++ arrVsVec.cpp -O3
$ ./a.out
result: 0
time: 99.554109
vector:
$ g++ arrVsVec.cpp -O3
$ ./a.out
result: 0
time: 30.734491
I'm just wondering what I'm doing wrong this time.
Best Answer
The difference is due to memory pages of
array
not being resident in process address space (global scope array is stored in.bss
section of the executable that hasn't been paged in, it is zero-initialized). Whereasvector
has just been allocated and zero-filled, so its memory pages are already present.If you add
as the first line of
main
to bring the pages in (pre-fault), that makesarray
time the same as that ofvector
.On Linux, instead of that, you can do
mlock(v.data(), v.size() * sizeof(v[0]));
to bring the pages into the address space. Seeman mlock
for full details.