assembly: Is the caller or the callee responsible for freeing the shadow space in x64 assembly (Windows)?

oyt4ldly  posted on 2022-11-13  in Windows

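Coming from C and C++, I recently started learning x86-64 assembly to better understand how my programs work.
I know the convention in x64 assembly is to reserve 32 bytes of "shadow space" on the stack before calling a function (by doing: subq $0x20, %rsp).
What I'm not sure about is: is the callee responsible for incrementing %rsp again, or is the caller?
In other words (using printf as an example), is number 1 or number 2 correct (or neither :P)?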
1.

subq $0x20, %rsp
movabsq $msg, %rcx
callq printf

2.

subq $0x20, %rsp
movabsq $msg, %rcx
callq printf
addq $0x20, %rsp

(... where msg is an ASCII string stored in the .data section that I'm passing to printf.)
I'm on Windows 10, using GAS as my assembler.
Any help would be appreciated, cheers.

fslejnso1#

Deallocating shadow space is the caller's responsibility.
But normally you'd do it once per function, not once per call-site within a function. Usually you just move RSP once (maybe after some pushes) and leave it alone until you're ready to return. That includes making room to store stack args if any for functions with more than 4 args.
In the Windows x64 calling convention (and x86-64 System V), the callee must return without changing the caller's RSP, i.e. with ret, not ret 32, and without having copied the return address somewhere else.
MS has some examples in https://learn.microsoft.com/en-us/cpp/build/prolog-and-epilog?view=msvc-170#epilog-code
And specifically documents that RSP mustn't be changed by functions:
"The x64 ABI considers registers RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15, and XMM6-XMM15 nonvolatile. They must be saved and restored by a function that uses them."
(You also need to emit unwind metadata for every instruction that moves the stack pointer, and about where you saved non-volatile aka call-preserved registers, if you want to be fully compliant with the ABI, including for SEH and C++ exception unwinding. Toy programs still work fine without, as long as you don't expect C++ exceptions to work, or debuggers to unwind the stack back to the stack frame of a caller.)
You can see this if you look at MSVC compiler output, e.g. https://godbolt.org/z/xh38jxWqT. For AT&T syntax, use gcc -O2 -mabi=ms, which tells GCC to treat all the functions it sees as __attribute__((ms_abi)) by default. That doesn't override the fact that it's targeting Linux, so with -fPIE (to make it use LEA instead of 32-bit absolute addressing for symbol addresses) we also get call printf@plt, not Windows-style calls to DLL functions.
But the stack management from GCC matches what MSVC -O2 also does.

#include <stdio.h>

void bar();
int foo(){
    printf("%d\n", 1);
    bar();
    return 1;  // make sure this isn't a tailcall
}

# gcc -O2 -mabi=ms  (but still sort of targeting Linux as far as dynamic linking)
.LC0:
        .string "%d\n"      ## in .rodata

foo():
        subq    $40, %rsp
        movl    $1, %edx
        movl    $.LC0, %ecx      # with -fPIE, uses    leaq    .LC0(%rip), %rcx  like you'd want for Windows x64
        call    printf
        call    bar()
        movl    $1, %eax
        addq    $40, %rsp
        ret

See also How to remove "noise" from GCC/clang assembly output? for more about looking at compiler output - you can answer most questions about how things normally work by looking at what compilers do in practice. Sometimes things compilers do are just a coincidence, especially with optimization disabled (which is why I constructed an example that couldn't inline the functions, so I could still see the calls with optimization enabled). But here we can rule out your alternate hypothesis.
I also constructed this example to show two calls using the same allocation of shadow space, not pointlessly deallocating / reallocating with add/sub. Even with optimization disabled, compilers don't do that.
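As a rough sketch of where the 40 in subq $40, %rsp comes from: it's 32 bytes of shadow space plus 8 bytes of padding, so that RSP is 16-byte aligned at each call (on function entry it's off by 8, because the caller's call pushed a return address). The helper below, win64_frame_bytes, is a hypothetical name made up for illustration, not something from the ABI docs or any toolchain. It assumes any pushes happen before the single sub, and it ignores things like alloca or over-aligned locals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (illustration only): how many bytes a non-leaf
 * function would subtract from RSP once in its prologue under the
 * Windows x64 convention.
 *
 *   max_call_args : largest argument count of any call the function makes
 *   local_bytes   : bytes of locals / spill slots
 *   pushed_regs   : number of 8-byte registers pushed before the sub
 */
size_t win64_frame_bytes(size_t max_call_args, size_t local_bytes,
                         size_t pushed_regs)
{
    /* Outgoing area: 4 shadow slots, or more if some call passes > 4 args. */
    size_t slots = max_call_args > 4 ? max_call_args : 4;
    size_t bytes = slots * 8 + local_bytes;

    /* Everything below the caller's aligned RSP: return address (8),
     * pushed registers, then our allocation. Pad it to a multiple of 16. */
    size_t below_entry = 8 + pushed_regs * 8 + bytes;
    bytes += (16 - below_entry % 16) % 16;

    /* Postcondition: RSP is 16-byte aligned at every call site. */
    assert((8 + pushed_regs * 8 + bytes) % 16 == 0);
    return bytes;
}
```

Under these assumptions, win64_frame_bytes(2, 0, 0) is 40, matching the GCC output above. A function that makes a 6-arg call gets 56: 48 bytes for the six outgoing slots plus 8 of padding, still allocated once and reused by every call in the function.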
Re: putting symbol addresses into registers, see How to load address of function or label into register. RIP-relative LEA is the go-to option: it's position-independent, and works in any executable or library smaller than 2GiB of static code+data. It's also more efficient than movabs.
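The 2GiB limit comes from the signed 32-bit displacement in a RIP-relative addressing mode, which is measured from the end of the instruction. A minimal sketch of that range check follows; fits_rip_rel32 is a hypothetical helper name for illustration, not an assembler or linker API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): can an instruction whose END
 * is at address next_insn reach target with a signed 32-bit RIP-relative
 * displacement? */
bool fits_rip_rel32(uint64_t next_insn, uint64_t target)
{
    /* Unsigned subtraction wraps; reinterpreting as signed gives the
     * displacement (assumes the usual two's-complement conversion). */
    int64_t disp = (int64_t)(target - next_insn);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

In practice the linker does this check for you and reports a relocation overflow if a symbol is out of range, so this is only meant to show why the limit is 2GiB.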
