Stack Alignment in x64 Assembly – Best Practices

assembly, calling-convention, memory-alignment, stack-pointer, x86-64

How is the value 28h (decimal 40) that is subtracted from rsp calculated in the following code?

    option casemap:none

    includelib kernel32.lib
    includelib user32.lib

externdef MessageBoxA : near
externdef ExitProcess : near

    .data

text    db 'Hello world!', 0
caption db 'Hello x86-64', 0

    .code

main proc
    sub rsp, 28h        ; space for 4 arguments + 16byte aligned stack
    xor r9d, r9d        ; 4. argument: r9d = uType = 0
    lea r8, [caption]   ; 3. argument: r8  = caption
    lea rdx, [text]     ; 2. argument: rdx = window text
    xor rcx, rcx        ; 1. argument: rcx = hWnd = NULL
    call MessageBoxA
    xor ecx, ecx        ; ecx = exit code
    call ExitProcess
main endp

    end

from: http://www.japheth.de/JWasm/Win64_1.html

By my understanding, I would only have to subtract 20h, since each of the four values I'm using takes 8 bytes (4 * 8 = 32 = 20h). So why is 28h being subtracted, and how does that result in 16-byte alignment?

See also: "Is reserving stack space necessary for functions less than four arguments?"

Best Answer

I believe it's because the stack is 16-byte aligned just before main is called. The call instruction then pushes an 8-byte return address (the address of the instruction right after the call in the caller) onto the stack, so on entry to main, rsp is 8 bytes off 16-byte alignment. The 20h you computed is the space the Windows x64 calling convention expects the caller to reserve for the four register arguments, but subtracting only that would leave rsp misaligned at the call to MessageBoxA. Subtracting 28h instead brings the total to 28h + 8h (from the call) = 30h, which is a multiple of 16, so the stack is aligned again. :)
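To check the arithmetic, here is a minimal sketch in Python that models rsp as a plain integer. It is not taken from the linked tutorial. The function names and the starting address are made up for illustration, and the model assumes the Windows x64 rules described above: rsp is 16-byte aligned just before the call into main, and the call pushes an 8-byte return address.

```python
# Toy model of the stack pointer around the calls in the question.
# Assumption: rsp is 16-byte aligned immediately before `call main`.
# The concrete start address is hypothetical; only its value mod 16 matters.

SHADOW_SPACE = 0x20    # 4 register-argument slots * 8 bytes
RETURN_ADDRESS = 8     # pushed by the `call` that entered main


def rsp_at_call_site(rsp_before_call_to_main: int, reserved: int) -> int:
    """Return rsp as it would be just before `call MessageBoxA` in main."""
    assert rsp_before_call_to_main % 16 == 0, "caller's stack must be aligned"
    rsp_on_entry = rsp_before_call_to_main - RETURN_ADDRESS  # 8 off alignment
    return rsp_on_entry - reserved                           # `sub rsp, reserved`


def is_aligned(rsp: int) -> bool:
    """True if rsp is a multiple of 16."""
    return rsp % 16 == 0


if __name__ == "__main__":
    start = 0x7FFF_FFFF_F000  # hypothetical aligned address
    for reserved in (SHADOW_SPACE, SHADOW_SPACE + 8):
        rsp = rsp_at_call_site(start, reserved)
        print(f"sub rsp, {reserved:02X}h -> aligned: {is_aligned(rsp)}")
```

With 20h reserved, the total displacement is 28h and the check fails; with 28h reserved, the total is 30h and the check passes. The same reasoning means any reservation of the form 16n + 8 bytes keeps the stack aligned in a function that pushes nothing else.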