I was wondering whether the compiler would use different padding on 32-bit and 64-bit systems, so I wrote the code below in a simple VS2019 C++ console project:
#include <iostream>

struct Z
{
    char s;
    __int64 i;
};

int main()
{
    std::cout << sizeof(Z) << "\n";
}
What I expected on each "Platform" setting:
x86: 12
X64: 16
Actual result:
x86: 16
X64: 16
Since the memory word size on x86 is 4 bytes, this means it has to store the bytes of i in two different words. So I thought the compiler would do padding this way:
struct Z
{
    char s;
    char _pad[3];
    __int64 i;
};
So may I know what the reason behind this is?
- For forward-compatibility with the 64-bit system?
- Due to the limitation of supporting 64-bit numbers on the 32-bit processor?
Best Answer
Size and `alignof()` (the minimum alignment that any object of that type must have) for each primitive type are ABI design choices (see Footnote 1), separate from the register width of the architecture.
Struct-packing rules can also be more complicated than just aligning each struct member to its minimum alignment inside the struct; that's another part of the ABI.
MSVC targeting 32-bit x86 gives `__int64` a minimum alignment of 4, but its default struct-packing rules align types within structs to `min(8, sizeof(T))` relative to the start of the struct. (For non-aggregate types only). That's not a direct quote, that's my paraphrase of the MSVC docs link from @P.W's answer, based on what MSVC seems to actually do. (I suspect the "whichever is less" in the text is supposed to be outside the parens, but maybe they're making a different point about the interaction on the pragma and the command-line option?)
(An 8-byte struct containing a `char[8]` still only gets 1-byte alignment inside another struct, or a struct containing an `alignas(16)` member still gets 16-byte alignment inside another struct.)
Note that ISO C++ doesn't guarantee that primitive types have `alignof(T) == sizeof(T)`. Also note that MSVC's definition of `alignof()` doesn't match the ISO C++ standard: MSVC says `alignof(__int64) == 8`, but some `__int64` objects have less than that alignment (see Footnote 2).
So surprisingly, we get extra padding even though MSVC doesn't always bother to make sure the struct itself has any more than 4-byte alignment, unless you specify that with `alignas()` on the variable, or on a struct member to imply that for the type. (e.g. a local `struct Z tmp` on the stack inside a function will only have 4-byte alignment, because MSVC doesn't use extra instructions like `and esp, -8` to round the stack pointer down to an 8-byte boundary.)
However, `new` / `malloc` does give you 8-byte-aligned memory in 32-bit mode, so this makes a lot of sense for dynamically-allocated objects (which are common). Forcing locals on the stack to be fully aligned would add cost to align the stack pointer, but by setting struct layout to take advantage of 8-byte-aligned storage, we get the advantage for static and dynamic storage.
This might also be designed to get 32 and 64-bit code to agree on some struct layouts for shared memory. (But note that the default for x86-64 is `min(16, sizeof(T))`, so they still don't fully agree on struct layout if there are any 16-byte types that aren't aggregates (struct/union/array) and don't have an `alignas`.)
The minimum absolute alignment of 4 comes from the 4-byte stack alignment that 32-bit code can assume. In static storage, compilers will choose natural alignment up to maybe 8 or 16 bytes for vars outside of structs, for efficient copying with SSE2 vectors.
In larger functions, MSVC may decide to align the stack by 8 for performance reasons, e.g. for `double` vars on the stack which actually can be manipulated with single instructions, or maybe also for `int64_t` with SSE2 vectors. See the Stack Alignment section in this 2006 article: Windows Data Alignment on IPF, x86, and x64. So in 32-bit code you can't depend on an `int64_t*` or `double*` being naturally aligned.
(I'm not sure if MSVC will ever create even less aligned `int64_t` or `double` objects on its own. Certainly yes if you use `#pragma pack(1)` or `-Zp1`, but that changes the ABI. But otherwise probably not, unless you carve space for an `int64_t` out of a buffer manually and don't bother to align it. But assuming `alignof(int64_t)` is still 8, that would be C++ undefined behaviour.)
If you use `alignas(8) int64_t tmp`, MSVC emits extra instructions to `and esp, -8`. If you don't, MSVC doesn't do anything special, so it's luck whether or not `tmp` ends up 8-byte aligned.
Other designs are possible, for example the i386 System V ABI (used on most non-Windows OSes) has `alignof(long long) = 4` but `sizeof(long long) = 8`.
Outside of structs (e.g. global vars or locals on the stack), modern compilers in 32-bit mode do choose to align `int64_t` to an 8-byte boundary for efficiency (so it can be loaded / copied with MMX or SSE2 64-bit loads, or x87 `fild` to do int64_t -> double conversion).
This is one reason why modern versions of the i386 System V ABI maintain 16-byte stack alignment: so 8-byte and 16-byte aligned local vars are possible.
When the 32-bit Windows ABI was being designed, Pentium CPUs were at least on the horizon. Pentium has 64-bit wide data busses, so its FPU really can load a 64-bit `double` in a single cache access if it's 64-bit aligned.
Or for `fild` / `fistp`, load/store a 64-bit integer when converting to/from `double`. Fun fact: naturally aligned accesses up to 64 bits are guaranteed atomic on x86, since Pentium: Why is integer assignment on a naturally aligned variable atomic on x86?
Footnote 1: An ABI also includes a calling convention (or in the case of MS Windows, a choice of various calling conventions which you can declare with function attributes like `__fastcall`), but the sizes and alignment-requirements for primitive types like `long long` are also something that compilers have to agree on to make functions that can call each other. (The ISO C++ standard only talks about a single "C++ implementation"; ABI standards are how "C++ implementations" make themselves compatible with each other.)
Note that struct-layout rules are also part of the ABI: compilers have to agree with each other on struct layout to create compatible binaries that pass around structs or pointers to structs. Otherwise `s.x = 10; foo(&s);` might write to a different offset relative to the base of the struct than separately-compiled `foo()` (maybe in a DLL) was expecting to read it at.
Footnote 2:
GCC had this C++ `alignof()` bug, too, until it was fixed in 2018 for g++8 some time after being fixed for C11 `_Alignof()`. See that bug report for some discussion based on quotes from the standard which conclude that `alignof(T)` should really report the minimum guaranteed alignment you can ever see, not the preferred alignment you want for performance. i.e. that using an `int64_t*` with less than `alignof(int64_t)` alignment is undefined behaviour.
(It will usually work fine on x86, but vectorization that assumes a whole number of `int64_t` iterations will reach a 16 or 32-byte alignment boundary can fault. See Why does unaligned access to mmap'ed memory sometimes segfault on AMD64? for an example with gcc.)
The gcc bug report discusses the i386 System V ABI, which has different struct-packing rules than MSVC: based on minimum alignment, not preferred. But modern i386 System V maintains 16-byte stack alignment, so it's only inside structs (because of struct-packing rules that are part of the ABI) that the compiler ever creates `int64_t` and `double` objects that are less than naturally aligned. Anyway, that's why the GCC bug report was discussing struct members as the special case.
Kind of the opposite from 32-bit Windows with MSVC where the struct-packing rules are compatible with an `alignof(int64_t) == 8` but locals on the stack are always potentially under-aligned unless you use `alignas()` to specifically request alignment.
32-bit MSVC has the bizarre behaviour that `alignas(int64_t) int64_t tmp` is not the same as `int64_t tmp;`, and emits extra instructions to align the stack. That's because `alignas(int64_t)` is like `alignas(8)`, which is more aligned than the actual minimum.
(32-bit) x86 MSVC 19.20 -O2 compiles it like so (on Godbolt, also includes 32-bit GCC and the struct test-case):
But without the `alignas()`, or with `alignas(4)`, we get much simpler code:
It could just `push esp` instead of LEA/push; that's a minor missed optimization.
Passing a pointer to a non-inline function proves that it's not just locally bending the rules. Some other function that just gets an `int64_t*` as an arg has to deal with this potentially under-aligned pointer, without having gotten any information about where it came from.
If `alignof(int64_t)` was really 8, that function could be hand-written in asm in a way that faulted on misaligned pointers. Or it could be written in C with SSE2 intrinsics like `_mm_load_si128()` that require 16-byte alignment, after handling 0 or 1 elements to reach an alignment boundary.
But with MSVC's actual behaviour, it's possible that none of the `int64_t` array elements are aligned by 16, because they all span an 8-byte boundary.
BTW, I wouldn't recommend using compiler-specific types like `__int64` directly. You can write portable code by using `int64_t` from `<cstdint>`, aka `<stdint.h>`.
In MSVC, `int64_t` will be the same type as `__int64`.
On other platforms, it will typically be `long` or `long long`. `int64_t` is guaranteed to be exactly 64 bits with no padding, and 2's complement, if provided at all. (It is by all sane compilers targeting normal CPUs. C99 and C++ require `long long` to be at least 64-bit, and on machines with 8-bit bytes and registers that are a power of 2, `long long` is normally exactly 64 bits and can be used as `int64_t`. Or if `long` is a 64-bit type, then `<cstdint>` might use that as the typedef.)
I assume `__int64` and `long long` are the same type in MSVC, but MSVC doesn't enforce strict-aliasing anyway so it doesn't matter whether they're the exact same type or not, just that they use the same representation.