I am currently learning the basics of assembly and came across something odd when looking at the instructions generated by GCC(6.1.1).
Here is the source:
#include <stdio.h>
int foo(int x, int y){
return x*y;
}
int main(){
int a = 5;
int b = foo(a, 0xF00D);
printf("0x%X\n", b);
return 0;
}
Command used to compile: gcc -m32 -g test.c -o test
When examining the functions in GDB I get this:
(gdb) set disassembly-flavor intel
(gdb) disas main
Dump of assembler code for function main:
0x080483f7 <+0>: lea ecx,[esp+0x4]
0x080483fb <+4>: and esp,0xfffffff0
0x080483fe <+7>: push DWORD PTR [ecx-0x4]
0x08048401 <+10>: push ebp
0x08048402 <+11>: mov ebp,esp
0x08048404 <+13>: push ecx
0x08048405 <+14>: sub esp,0x14
0x08048408 <+17>: mov DWORD PTR [ebp-0xc],0x5
0x0804840f <+24>: push 0xf00d
0x08048414 <+29>: push DWORD PTR [ebp-0xc]
0x08048417 <+32>: call 0x80483eb <foo>
0x0804841c <+37>: add esp,0x8
0x0804841f <+40>: mov DWORD PTR [ebp-0x10],eax
0x08048422 <+43>: sub esp,0x8
0x08048425 <+46>: push DWORD PTR [ebp-0x10]
0x08048428 <+49>: push 0x80484d0
0x0804842d <+54>: call 0x80482c0 <printf@plt>
0x08048432 <+59>: add esp,0x10
0x08048435 <+62>: mov eax,0x0
0x0804843a <+67>: mov ecx,DWORD PTR [ebp-0x4]
0x0804843d <+70>: leave
0x0804843e <+71>: lea esp,[ecx-0x4]
0x08048441 <+74>: ret
End of assembler dump.
(gdb) disas foo
Dump of assembler code for function foo:
0x080483eb <+0>: push ebp
0x080483ec <+1>: mov ebp,esp
0x080483ee <+3>: mov eax,DWORD PTR [ebp+0x8]
0x080483f1 <+6>: imul eax,DWORD PTR [ebp+0xc]
0x080483f5 <+10>: pop ebp
0x080483f6 <+11>: ret
End of assembler dump.
The part that confuses me is what it is trying to do with the stack.
From my understanding this is what it does:
- It takes a reference to some memory address 4 bytes higher in the stack which from my knowledge should be the variables passed to main since
esp
currently pointed to the return address in memory. - It aligns the stack to a 0 boundary for performance reasons.
- It pushes onto the new stack area
ecx+4
which should translate to pushing the address we are suppose to be returning to on the stack. - It pushes the old frame pointer onto the stack and sets up the new one.
- It pushes
ecx
(which is still pointing to would should be an argument tomain
) onto the stack.
Then the program does what it should and begins the process of returning:
- It restores
ecx
by using a-0x4
offset onebp
which should access the first local variable. - It executes the leave instruction which really just sets
esp
toebp
and then popsebp
from the stack.
So now the next thing on the stack is the return address and the esp and ebp registers should be back to what they need to be to return right?
Well evidently not because the next thing it does is load esp
with ecx-0x4
which since ecx
is still pointing to that variable passed to main
should put it at the address of return address on the stack.
This works just fine but raises the question: why did it bother to put the return address onto the stack in step 3 since it returned the stack to the original position at the end just before actually returning from the function?
Best Answer
Update: gcc8 simplifies this at least for normal use-cases (
-fomit-frame-pointer
, and noalloca
or C99 VLAs that require variable-size allocation). Perhaps motivated by increasing usage of AVX leading to more functions wanting a 32-byte aligned local or array.Except for
main
in 32-bit code, then it still does the full return address+frame-pointer backtrace-friendly version even with-O3 -fomit-frame-pointer
. https://gcc.godbolt.org/z/6cehMP774Also, probably a duplicate of What's up with gcc weird stack manipulation when it wants extra stack alignment?
This complicated prologue is fine if it only ever runs a couple times (e.g. at the start of
main
in 32-bit code), but the more it appears the more worthwhile it is to optimize it. GCC sometimes still over-aligns the stack in functions where all >16-byte aligned objects are optimized into registers, which is a missed optimization already but less bad when the stack alignment is cheaper.gcc makes some clunky code when aligning the stack within a function, even with optimization enabled. I have a possible theory (see below) on why gcc might be copying the return address to just above where it saves
ebp
to make a stack frame (and yes, I agree that's what gcc is doing). It doesn't look necessary in this function, and clang doesn't do anything like that.Besides that, the nonsense with
ecx
is probably just gcc not optimizing away unneeded parts of its align-the-stack boilerplate. (The pre-alignment value ofesp
is needed to reference args on the stack, so it makes sense that it puts the address of the first would-be arg into a register).You see the same thing with optimization in 32-bit code (where gcc makes a
main
that doesn't assume 16B stack alignment, even though the current version of the ABI requires that at process startup, and the CRT code that callsmain
either aligns the stack itself or preserves the initial alignment provided by the kernel, I forget). You also see this in functions that align the stack to more than 16B (e.g. functions that use__m256
types, sometimes even if they never spill them to the stack. Or functions with an array declared with C++11alignas(32)
, or any other way of requesting alignment.) In 64-bit code, gcc always seems to user10
for this, notrcx
.There's nothing required for ABI compliance about the way gcc does it, because clang does something much simpler.
I added an aligned variable (with
volatile
as a simple way to force the compiler to actually reserve aligned space for it on the stack, instead of optimizing it away). I put your code on the Godbolt compiler explorer, to look at the asm with-O3
. I see the same behaviour from gcc 4.9, 5.3, and 6.1, but different behaviour with clang.Clang3.8's
-O3 -m32
output is functionally identical to its-m64
output. Note that-O3
enables-fomit-frame-pointer
, but some functions make stack frames anyway.gcc's output is nearly the same between
-m32
and-m64
, but it putsv
in the red-zone with-m64
so the-m32
output has two extra instructions:It seems that gcc wants to make its stack frame (with
push ebp
) after aligning the stack. I guess that makes sense, so it can reference locals relative toebp
. Otherwise it would have to useesp
-relative addressing, if it wanted aligned locals.My theory on why gcc does this:
The extra copy of the return address after aligning but before pushing
ebp
means that the return address is copied to the expected place relative to the savedebp
value (and the value that will be inebp
when child functions are called). So this does potentially help code that wants to unwind the stack by following the linked list of stack frames, and looking at return-addresses to find out what function is involved.I'm not sure whether this matters with modern stack-unwind info that allows stack-unwinding (backtraces / exception handling) with
-fomit-frame-pointer
. (It's metadata in the.eh_frame
section. This is what the.cfi_*
directives around every modification toesp
are for.) I should look at what clang does when it has to align the stack in a non-leaf function.The original value of
esp
would be needed inside the function to reference function args on the stack. I think gcc doesn't know how to optimize away unneeded parts of its align-the-stack method. (e.g. outmain
doesn't look at its args (and is declared not to take any))This kind of code-gen is typical of what you see in a function that needs to align the stack; it's not extra weird because of using a
volatile
with automatic storage.