GCC – Why Extra Space in Stack Allocation?

assemblygccmemory-alignmentstackx86

I was playing around a bit to get a better grip on calling conventions and how the stack is handled, but I can't figure out why main allocates three extra double words when setting up the stack (at <main+0>). It's neither aligned to 8 bytes nor 16 bytes, so that's not why as far as I know. As I see it, main requires 12 bytes for the two parameters to func and the return value.

What am I missing?

The program is C code compiled with "gcc -ggdb" on a x86 architecture.

Edit: I removed the -O0 flag from gcc, and it made no difference to the output.

(gdb) disas main
Dump of assembler code for function main:
    0x080483d1 <+0>:    sub    esp,0x18
    0x080483d4 <+3>:    mov    DWORD PTR [esp+0x4],0x7
    0x080483dc <+11>:   mov    DWORD PTR [esp],0x3
    0x080483e3 <+18>:   call   0x80483b4 <func>
    0x080483e8 <+23>:   mov    DWORD PTR [esp+0x14],eax
    0x080483ec <+27>:   add    esp,0x18
    0x080483ef <+30>:   ret    
End of assembler dump.

Edit: Of course I should have posted the C code:

int func(int a, int b) {
    int c = 9;
    return a + b + c;
}

void main() {
    int x;
    x = func(3, 7);
}

The platform is Arch Linux i686.

Best Answer

The parameters to a function (including, but not limited to main) are already on the stack when you enter the function. The space you allocate inside the function is for local variables. For functions with simple return types such as int, the return value will normally be in a register (eax, with a typical 32-bit compiler on x86).

If, for example, main was something like this:

int main(int argc, char **argv) { 
   char a[35];

   return 0;
}

...we'd expect to see at least 35 bytes allocated on the stack as we entered main to make room for a. Assuming a 32-bit implementation, that would normally be rounded up to the next multiple of 4 (36, in this case) to maintain 32-bit alignment of the stack. We would not expect to see any space allocated for the return value. argc and argv would be on the stack, but they'd already be on the stack before main was entered, so main would not have to do anything to allocate space for them.

In the case above, after allocating space for a, a would typicaly start at [esp-36], argv would be at [esp-44] and argc would be at [esp-48] (or those two might be reversed -- depending on whether arguments were pushed left to right or right to left). In case you're wondering why I skipped [esp-40], that would be the return address.

Edit: Here's a diagram of the stack on entry to the function, and after setting up the stack frame:

enter image description here

Edit 2: Based on your updated question, what you have is slightly roundabout, but not particularly hard to understand. Upon entry to main, it's allocating space not only for the variables local to main, but also for the parameters you're passing to the function you call from main.

That accounts for at least some of the extra space being allocated (though not necessarily all of it).

Related Solutions

C Stack Allocation – Understanding Padding and Alignment

It's a gcc feature controlled by -mpreferred-stack-boundary=n where the compiler tries to keep items on the stack aligned to 2^n. If you changed n to 2, it would only allocate 8 bytes on the stack. The default value for n is 4 i.e. it will try to align to 16-byte boundaries.

Why there's the "default" 8 bytes and then 24=8+16 bytes is because the stack already contains 8 bytes for leave and ret, so the compiled code must adjust the stack first by 8 bytes to get it aligned to 2^4=16.

GCC Assembly – Why Extra Return Address on Stack?

Update: gcc8 simplifies this at least for normal use-cases (-fomit-frame-pointer, and no alloca or C99 VLAs that require variable-size allocation). Perhaps motivated by increasing usage of AVX leading to more functions wanting a 32-byte aligned local or array.

Except for main in 32-bit code, then it still does the full return address+frame-pointer backtrace-friendly version even with -O3 -fomit-frame-pointer. https://gcc.godbolt.org/z/6cehMP774

Also, probably a duplicate of What's up with gcc weird stack manipulation when it wants extra stack alignment?

This complicated prologue is fine if it only ever runs a couple times (e.g. at the start of main in 32-bit code), but the more it appears the more worthwhile it is to optimize it. GCC sometimes still over-aligns the stack in functions where all >16-byte aligned objects are optimized into registers, which is a missed optimization already but less bad when the stack alignment is cheaper.

gcc makes some clunky code when aligning the stack within a function, even with optimization enabled. I have a possible theory (see below) on why gcc might be copying the return address to just above where it saves ebp to make a stack frame (and yes, I agree that's what gcc is doing). It doesn't look necessary in this function, and clang doesn't do anything like that.

Besides that, the nonsense with ecx is probably just gcc not optimizing away unneeded parts of its align-the-stack boilerplate. (The pre-alignment value of esp is needed to reference args on the stack, so it makes sense that it puts the address of the first would-be arg into a register).

You see the same thing with optimization in 32-bit code (where gcc makes a main that doesn't assume 16B stack alignment, even though the current version of the ABI requires that at process startup, and the CRT code that calls main either aligns the stack itself or preserves the initial alignment provided by the kernel, I forget). You also see this in functions that align the stack to more than 16B (e.g. functions that use __m256 types, sometimes even if they never spill them to the stack. Or functions with an array declared with C++11 alignas(32), or any other way of requesting alignment.) In 64-bit code, gcc always seems to use r10 for this, not rcx.

There's nothing required for ABI compliance about the way gcc does it, because clang does something much simpler.

I added an aligned variable (with volatile as a simple way to force the compiler to actually reserve aligned space for it on the stack, instead of optimizing it away). I put your code on the Godbolt compiler explorer, to look at the asm with -O3. I see the same behaviour from gcc 4.9, 5.3, and 6.1, but different behaviour with clang.

int main(){
    __attribute__((aligned(32))) volatile int v = 1;
    return 0;
}

Clang3.8's -O3 -m32 output is functionally identical to its -m64 output. Note that -O3 enables -fomit-frame-pointer, but some functions make stack frames anyway.

    push    ebp
    mov     ebp, esp                # make a stack frame *before* aligning, so ebp-relative addressing can only access stack args, not aligned locals.
    and     esp, -32
    sub     esp, 32                 # esp is 32B aligned with 32 or 48B above esp reserved (depending on incoming alignment)
    mov     dword ptr [esp], 1      # store v
    xor     eax, eax                # return 0
    mov     esp, ebp                # leave
    pop     ebp
    ret

gcc's output is nearly the same between -m32 and -m64, but it puts v in the red-zone with -m64 so the -m32 output has two extra instructions:

    # gcc 6.1 -m32 -O3 -fverbose-asm.  Most of gcc's comment lines are empty.  I guess that means it has no idea why it's emitting those insns :P
    lea     ecx, [esp+4]      #,   get a pointer to where the first arg would be
    and     esp, -32  #,          align
    xor     eax, eax  #           return 0
    push    DWORD PTR [ecx-4]       #  No clue WTF this is for; this looks batshit insane, but happens even in 64bit mode.
    push    ebp     #             make a stackframe, even though -fomit-frame-pointer is on by default and we can already restore the original esp from ecx (unlike clang)
    mov     ebp, esp  #,
    push    ecx     #             save the old esp value (even though this function doesn't clobber ecx...)
    sub     esp, 52   #,          reserve space for v  (not present with -m64)
    mov     DWORD PTR [ebp-56], 1     # v,
    add     esp, 52   #,          unreserve (not present with -m64)
    pop     ecx       #           restore ecx (even though nothing clobbered it)
    pop     ebp       #           at least it knows it can just pop instead of `leave`
    lea     esp, [ecx-4]      #,  restore pre-alignment esp
    ret

It seems that gcc wants to make its stack frame (with push ebp) after aligning the stack. I guess that makes sense, so it can reference locals relative to ebp. Otherwise it would have to use esp-relative addressing, if it wanted aligned locals.

My theory on why gcc does this:

The extra copy of the return address after aligning but before pushing ebp means that the return address is copied to the expected place relative to the saved ebp value (and the value that will be in ebp when child functions are called). So this does potentially help code that wants to unwind the stack by following the linked list of stack frames, and looking at return-addresses to find out what function is involved.

I'm not sure whether this matters with modern stack-unwind info that allows stack-unwinding (backtraces / exception handling) with -fomit-frame-pointer. (It's metadata in the .eh_frame section. This is what the .cfi_* directives around every modification to esp are for.) I should look at what clang does when it has to align the stack in a non-leaf function.

The original value of esp would be needed inside the function to reference function args on the stack. I think gcc doesn't know how to optimize away unneeded parts of its align-the-stack method. (e.g. out main doesn't look at its args (and is declared not to take any))

This kind of code-gen is typical of what you see in a function that needs to align the stack; it's not extra weird because of using a volatile with automatic storage.

Best Answer

Related Solutions

C Stack Allocation – Understanding Padding and Alignment

GCC Assembly – Why Extra Return Address on Stack?

My theory on why gcc does this:

Related Question