I am trying to understand the assembly-level code for a simple C program by inspecting it with gdb's disassembler.
Following is the C code:
#include <stdio.h>

void function(int a, int b, int c) {
    char buffer1[5];
    char buffer2[10];
}

void main() {
    function(1,2,3);
}
Following is the disassembly of both main and function:
(gdb) disass main
Dump of assembler code for function main:
0x08048428 <main+0>: push %ebp
0x08048429 <main+1>: mov %esp,%ebp
0x0804842b <main+3>: and $0xfffffff0,%esp
0x0804842e <main+6>: sub $0x10,%esp
0x08048431 <main+9>: movl $0x3,0x8(%esp)
0x08048439 <main+17>: movl $0x2,0x4(%esp)
0x08048441 <main+25>: movl $0x1,(%esp)
0x08048448 <main+32>: call 0x8048404 <function>
0x0804844d <main+37>: leave
0x0804844e <main+38>: ret
End of assembler dump.
(gdb) disass function
Dump of assembler code for function function:
0x08048404 <function+0>: push %ebp
0x08048405 <function+1>: mov %esp,%ebp
0x08048407 <function+3>: sub $0x28,%esp
0x0804840a <function+6>: mov %gs:0x14,%eax
0x08048410 <function+12>: mov %eax,-0xc(%ebp)
0x08048413 <function+15>: xor %eax,%eax
0x08048415 <function+17>: mov -0xc(%ebp),%eax
0x08048418 <function+20>: xor %gs:0x14,%eax
0x0804841f <function+27>: je 0x8048426 <function+34>
0x08048421 <function+29>: call 0x8048340 <__stack_chk_fail@plt>
0x08048426 <function+34>: leave
0x08048427 <function+35>: ret
End of assembler dump.
I am seeking answers to the following:
- How does the addressing work? I mean the (main+0), (main+1), (main+3) labels.
- In main, why is and $0xfffffff0,%esp being used?
- In function, why are mov %gs:0x14,%eax and mov %eax,-0xc(%ebp) being used?
- If someone could explain what happens step by step, that would be greatly appreciated.
Best Answer
The reason for the "strange" addresses such as main+0, main+1, main+3, main+6 and so on is that each instruction takes up a variable number of bytes. For example, push %ebp is a one-byte instruction, so the next instruction is at main+1. On the other hand, and $0xfffffff0,%esp is a three-byte instruction, so the instruction after it is at main+6.

And, since you ask in the comments why movl seems to take a variable number of bytes, the explanation is as follows. Instruction length depends not only on the opcode (such as movl) but also on the addressing modes of the operands (the things the opcode is operating on). In your dump, movl $0x1,(%esp) is shorter (7 bytes, from main+25 to main+32) because there's no offset involved: it just uses esp as the address. Whereas something like movl $0x2,0x4(%esp) (8 bytes, from main+17 to main+25) requires everything that movl $0x1,(%esp) does, plus an extra byte for the offset 0x4. In fact, here's a debug session showing what I mean:
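You can see that the second instruction, the one with an offset, is actually different from the first one without it. It's one byte longer (5 bytes instead of 4, to hold the offset) and has a different encoding, c745 instead of c705. Those particular sizes suggest a 16-bit session; in your 32-bit dump the same pattern shows up as c7 44 24 04 ... (8 bytes) for movl $0x2,0x4(%esp) versus c7 04 24 ... (7 bytes) for movl $0x1,(%esp). You can also see that the first and third instructions can be encoded in two different ways, but they basically do the same thing.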
The and $0xfffffff0,%esp instruction is a way to force esp onto a specific boundary. This is used to ensure proper alignment of variables. Many memory accesses on modern processors are more efficient if they follow the alignment rules (such as a 4-byte value having to be aligned to a 4-byte boundary), and some processors will even raise a fault if you don't follow these rules (GCC keeps the stack 16-byte aligned by default, which SSE's aligned loads and stores require). Clearing the low four bits rounds esp down to a multiple of 16, so after this instruction you're guaranteed that esp is both less than or equal to its previous value and aligned to a 16-byte boundary.

The gs: prefix simply means to use the gs segment register to access memory rather than the default (ds for most accesses, ss for accesses based on esp or ebp). On Linux with glibc, gs points at the current thread's control block, and offset 0x14 in it holds the stack-protector guard value. The instruction mov %eax,-0xc(%ebp) means: take the contents of the ebp register, subtract 12 (0xc), and store the value of eax at that memory location.

Re the explanation of the code: your function function is basically one big no-op. The assembly generated is limited to stack frame setup and teardown, along with some stack frame corruption checking that uses the aforementioned %gs:0x14 memory location. It loads the value from that location (a sentinel, often called a stack canary; think of something like 0xdeadbeef, although in practice it is usually a random value chosen at program start) into the stack frame, does its job, then checks the stack to ensure it hasn't been corrupted. Its job, in this case, is nothing, so all you see is the function administration stuff.
Stack setup, including storing the canary, occupies function+0 through function+12. The xor %eax,%eax at function+15 clears the canary out of eax; everything after that is the corruption check and the teardown of the stack frame. Similarly, main consists of stack frame setup, placing the parameters for function on the stack, calling function, tearing down the stack frame, and exiting. Comments have been inserted into the code below:
I think the reason for the %gs:0x14 may be evident from the above but, just in case, I'll elaborate here. The function puts this value (a sentinel) into the current stack frame so that, should something in the function do something silly like write 1024 bytes to a 20-byte array created on the stack or, in your case, write well past the end of buffer1 or buffer2, the sentinel will be overwritten. The check at the end of the function will detect that, call the failure function (__stack_chk_fail) to let you know, and then probably abort so as to avoid any other problems.
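If it placed 0xdeadbeef onto the stack and this was changed to something else, then an xor with 0xdeadbeef would produce a non-zero value, which is detected in the code with the je instruction: je jumps over the call to __stack_chk_fail only when the xor result is zero, that is, only when the stored value is unchanged. The relevant bit is paraphrased here: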