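There's a massive difference between MSVC inline asm and GNU C inline asm. GCC syntax is designed for optimal output without wasted instructions, for wrapping a single instruction or something. MSVC syntax is designed to be fairly simple, but AFAICT it's impossible to use without the latency and extra instructions of a round trip through memory for your inputs and outputs.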
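(MSVC `__asm{ ... }` syntax is also supported by `clang -fasm-blocks`. One difference is that MSVC supports leaving a value in EAX and falling off the end of a non-void function; `clang -fasm-blocks` doesn't, and presumably neither does clang-cl.)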
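If you're using inline asm for performance reasons, this makes MSVC inline asm only viable if you write a whole loop entirely in asm, not for wrapping short sequences in an inline function. The example below (wrapping `idiv` with a function) is the kind of thing MSVC is bad at: ~8 extra store/load instructions.
MSVC inline asm (used by MSVC and probably icc, maybe also available in some commercial compilers):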

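- Looks at your asm to figure out which registers your code steps on.
- Can only transfer data via memory. Data that was live in registers is stored by the compiler to prepare for your `mov ecx, shift_count`, for example. So using a single asm instruction that the compiler won't generate for you involves a round trip through memory on the way in and on the way out.
- More beginner-friendly, but often impossible to avoid overhead getting data in/out. Even besides the syntax limitations, the optimizer in current versions of MSVC isn't good at optimizing around inline asm blocks either.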

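GNU C inline asm is not a good way to learn asm. You have to understand asm very well so you can tell the compiler about your code, and you have to understand what compilers need to know. That answer also has links to other inline-asm guides and Q&As. The x86 tag wiki has lots of good stuff for asm in general, but just links to that answer for GNU inline asm. (The stuff in that answer is applicable to GNU inline asm on non-x86 platforms, too.)
GNU C inline asm syntax is used by gcc, clang, icc, and maybe some commercial compilers which implement GNU C: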

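- You have to tell the compiler what you clobber. Failure to do this will lead to breakage of surrounding code in non-obvious, hard-to-debug ways.
- Powerful but hard to read, learn, and use syntax for telling the compiler how to supply inputs and where to find outputs. For example, `"c" (shift_count)` will get the compiler to put the `shift_count` variable into `ecx` before your inline asm runs.
- Extra clunky for large blocks of code, because the asm has to be inside a string constant, so you typically need something like this on every line:

```c
  "insn   %[inputvar], %%reg\n\t"       // comment
  "insn2  %%reg, %[outputvar]\n\t"
```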

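- Very unforgiving / hard to use, but allows lower overhead, especially for wrapping single instructions. (Wrapping single instructions was the original design intent, which is why you have to specially tell the compiler about early clobbers to stop it from using the same register for an input and an output if that's a problem.)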
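A minimal sketch of two of the constraint features mentioned in the list above, assuming GCC or clang targeting x86 with the default AT&T syntax (the helper names `rotl32` and `add_via_scratch` are hypothetical; other targets fall back to plain C). `rotl32` uses the `"c"` constraint so the compiler loads the count into `ecx` for `roll %cl`. `add_via_scratch` writes its output before reading its inputs, so the output needs the early-clobber `&`.

```c
#include <stdint.h>

/* Hypothetical helper: rotate x left by n bits.
   On x86 the "c" constraint makes the compiler put n in ecx before the asm
   runs, so the template can name %%cl as the count register roll requires. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("roll %%cl, %0"
            : "+r"(x)     /* x is read and written in some register */
            : "c"(n)      /* n goes in ecx; roll uses cl masked to 5 bits */
            : "cc");      /* roll modifies flags */
    return x;
#else
    n &= 31;              /* portable fallback with the same masking */
    return (x << n) | (x >> ((32 - n) & 31));
#endif
}

/* Hypothetical helper: a + b, computed by zeroing the output register first.
   The output is written before the inputs are read, so it needs "=&r":
   without the '&', the compiler may give sum the same register as a or b,
   and the xorl would destroy that input before it is used. */
static uint32_t add_via_scratch(uint32_t a, uint32_t b)
{
    uint32_t sum;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("xorl %0, %0\n\t"
            "addl %1, %0\n\t"
            "addl %2, %0"
            : "=&r"(sum)          /* early-clobber output */
            : "r"(a), "r"(b)
            : "cc");
#else
    sum = a + b;
#endif
    return sum;
}
```

Dropping the `&` from `add_via_scratch` may still appear to work at `-O0`; the bug only shows up when the register allocator happens to reuse an input register, which is exactly the "non-obvious, hard-to-debug" breakage described above.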

Example: full-width integer division (`div`)

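On a 32-bit CPU, dividing a 64-bit integer by a 32-bit integer, or doing a full multiply (32x32->64), can benefit from inline asm. gcc and clang don't take advantage of `idiv` for `(int64_t)a / (int32_t)b`, probably because the instruction faults if the result doesn't fit in a 32-bit register. So unlike the Q&A about getting the quotient and remainder from one `div`, this is a use case for inline asm (unless there's a way to inform the compiler that the result will fit, so `idiv` won't fault).
We'll use calling conventions that put some args in registers (with `hi` even in the *right* register), to show a situation that's closer to what you'd see when inlining a tiny function like this.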
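The fault condition can be written down in portable C. This is a sketch (the helper name is hypothetical) of what a compiler would have to prove before it could use a single 64/32 `idiv` for `(int64_t)a / (int32_t)b`: the divisor is nonzero and the truncated quotient fits in a signed 32-bit register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: would idiv with edx:eax = dividend and a 32-bit
   divisor raise #DE?  It does for a zero divisor, and whenever the quotient
   (truncated toward zero) doesn't fit in eax as a signed 32-bit value. */
static bool idiv64_32_would_fault(int64_t dividend, int32_t divisor)
{
    if (divisor == 0)
        return true;
    /* INT64_MIN / -1 overflows int64_t itself (undefined behaviour in C),
       and its quotient obviously doesn't fit in 32 bits either. */
    if (dividend == INT64_MIN && divisor == -1)
        return true;
    int64_t q = dividend / divisor;   /* C99 division truncates, like idiv */
    return q < INT32_MIN || q > INT32_MAX;
}
```

Note that even `INT32_MIN / -1`, with a dividend that fits in 32 bits, is a faulting case, so a compiler can't use the narrow `idiv` based on the operand ranges alone.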

MSVC

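Be careful with register-arg calling conventions when using inline asm. Apparently the inline-asm support is so badly designed/implemented that the compiler might not save/restore arg registers around the inline asm, if those args aren't used in the inline asm. Thanks to @RossRidge for pointing this out.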

```c
// MSVC.  Be careful with _vectorcall & inline-asm: see above
// we could return a struct, but that would complicate things
int _vectorcall div64(int hi, int lo, int divisor, int *premainder) {
    int quotient, tmp;
    __asm {
        mov   edx, hi;
        mov   eax, lo;
        idiv   divisor
        mov   quotient, eax
        mov   tmp, edx;
        // mov ecx, premainder   // Or this I guess?
        // mov   [ecx], edx
    }
    *premainder = tmp;
    return quotient;     // or omit the return with a value in eax
}
```

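- Update: apparently leaving a value in `eax` or `edx:eax` and then falling off the end of a non-void function (without a `return`) is supported, even when inlining. I assume this works only if there's no code after the `asm` statement. See Does __asm{}; return the value of eax? This avoids the store/reload for the output (at least for `quotient`), but we can't do anything about the inputs. In a non-inline function with stack args they will already be in memory, but in this use case we're writing a tiny function that could usefully inline.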

Compiled with MSVC 19.00.23026 `/O2` on rextester (with a `main()` that finds the directory of the exe and dumps the compiler's asm output to stdout).

```asm
## My added comments use ##
; ... define some symbolic constants for stack offsets of parameters
; 48   : int ABI div64(int hi, int lo, int divisor, int *premainder) {
    sub esp, 16                 ; 00000010H
    mov DWORD PTR _lo$[esp+16], edx      ## these symbolic constants match up with the names of the stack args and locals
    mov DWORD PTR _hi$[esp+16], ecx

    ## start of __asm {
    mov edx, DWORD PTR _hi$[esp+16]
    mov eax, DWORD PTR _lo$[esp+16]
    idiv    DWORD PTR _divisor$[esp+12]
    mov DWORD PTR _quotient$[esp+16], eax  ## store to a local temporary, not *premainder
    mov DWORD PTR _tmp$[esp+16], edx
    ## end of __asm block

    mov ecx, DWORD PTR _premainder$[esp+12]
    mov eax, DWORD PTR _tmp$[esp+16]
    mov DWORD PTR [ecx], eax               ## I guess we should have done this inside the inline asm so this would suck slightly less
    mov eax, DWORD PTR _quotient$[esp+16]  ## but this one is unavoidable
    add esp, 16                 ; 00000010H
    ret 8
```

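There are a ton of extra `mov` instructions, and the compiler doesn't even come close to optimizing any of them away. I thought maybe it would see and understand the `mov tmp, edx` inside the inline asm, and make that a store to `premainder`. But that would require loading `premainder` from the stack into a register before the inline asm block, I guess.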

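This function is actually worse with `_vectorcall` than with the normal everything-on-the-stack ABI. With two inputs in registers, it stores them to memory so the inline asm can load them from named variables. If this were inlined, even more of the parameters could potentially be in registers, and it would have to store them all so the asm has memory operands! So unlike with gcc, we don't gain much from inlining this.

Doing `*premainder = tmp` inside the asm block means more code written in asm, but it does avoid the totally braindead store/load/store path for the remainder. This reduces the instruction count by 2 total, down to 11 (not including the `ret`).

I'm trying to get the best possible code out of MSVC, not "use it wrong" and create a straw-man argument. But AFAICT it's horrible for wrapping very short sequences. Presumably there's an intrinsic function for 64/32 -> 32 division that allows the compiler to generate good code for this particular case, so the entire premise of using inline asm for this on MSVC could be a straw-man argument. But it does show you that intrinsics are *much* better than inline asm for MSVC.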
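GNU C (gcc/clang/icc)
Gcc does even better than the output shown here when inlining `div64`, because it can typically arrange for the preceding code to generate the 64-bit integer in `edx:eax` in the first place.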
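On 32-bit x86, GCC's `"A"` constraint names the `edx:eax` pair for a 64-bit operand, which is one way to hand the compiler the whole dividend and let it produce the value there directly. A sketch under those assumptions (the name `div64_A` is hypothetical; other targets, including x86-64 where `"A"` means something different, use the plain C fallback). As with any `idiv`, the caller must guarantee the quotient fits in 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper: 64/32 -> 32 signed division.
   Precondition: d != 0 and n / d fits in int32_t, otherwise idiv faults. */
static int32_t div64_A(int64_t n, int32_t d, int32_t *premainder)
{
    int32_t quotient, rem;
#if defined(__GNUC__) && defined(__i386__)
    __asm__("idivl %[d]"
            : "=a"(quotient), "=d"(rem)   /* eax = quotient, edx = remainder */
            : "A"(n),                     /* 64-bit n in edx:eax (i386 only) */
              [d] "rm"(d)
            : "cc");
#else
    quotient = (int32_t)(n / d);          /* portable fallback */
    rem = (int32_t)(n % d);
#endif
    *premainder = rem;
    return quotient;
}
```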
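I can't get gcc to compile for the 32-bit vectorcall ABI. Clang can, but it sucks at inline asm with `"rm"` constraints (try it on the godbolt link). The 64-bit MS calling convention is close to 32-bit vectorcall, with the first two args in `ecx` and `edx`. The difference is that 2 more args go in registers before the stack is used (and that the callee doesn't pop the args off the stack, which is what the `ret 8` was about in the MSVC output).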

```c
// GNU C
// change everything to int64_t to do 128b/64b -> 64b division
// MSVC doesn't do x86-64 inline asm, so we'll use 32bit to be comparable
int div64(int lo, int hi, int *premainder, int divisor) {
    int quotient, rem;
    asm ("idivl  %[divsrc]"
          : "=a" (quotient), "=d" (rem)    // a means eax,  d means edx
          : "d" (hi), "a" (lo),
            [divsrc] "rm" (divisor)        // Could have just used %4 instead of naming divsrc
            // note the "rm" to allow the src to be in a register or not, whatever gcc chooses.
            // "rmi" would also allow an immediate, but unlike adc, idiv doesn't have an immediate form
          : // no clobbers
        );
    *premainder = rem;
    return quotient;
}
```
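Following the comment at the top of that function, here is a sketch of the x86-64 variant (the name `div128` is hypothetical): the same constraints with 64-bit types give `idivq`, dividing `rdx:rax` by a 64-bit divisor. On non-x86-64 compilers that provide `__int128` it falls back to plain C. The same precondition applies: the quotient must fit in 64 bits.

```c
#include <stdint.h>

/* Hypothetical helper: signed 128/64 -> 64 division of hi:lo by divisor.
   Precondition: divisor != 0 and the quotient fits in int64_t (else #DE). */
static int64_t div128(int64_t lo, int64_t hi, int64_t *premainder, int64_t divisor)
{
    int64_t quotient, rem;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__("idivq  %[divsrc]"
            : "=a"(quotient), "=d"(rem)   /* rax = quotient, rdx = remainder */
            : "d"(hi), "a"(lo),           /* dividend in rdx:rax */
              [divsrc] "rm"(divisor)
            : "cc");
#elif defined(__SIZEOF_INT128__)
    /* Build hi:lo with unsigned arithmetic to avoid shifting a negative value. */
    unsigned __int128 u = ((unsigned __int128)(uint64_t)hi << 64) | (uint64_t)lo;
    __int128 n = (__int128)u;             /* GCC/clang: modular conversion */
    quotient = (int64_t)(n / divisor);
    rem = (int64_t)(n % divisor);
#else
#error "no 128-bit division available in this sketch"
#endif
    *premainder = rem;
    return quotient;
}
```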

Compiled with `gcc -m64 -O3 -mabi=ms -fverbose-asm`. With `-m32` you just get 3 loads, `idiv`, and a store, as you can see in the godbolt link.

```asm
mov     eax, ecx  # lo, lo
idivl  r9d      # divisor
mov     DWORD PTR [r8], edx       # *premainder_7(D), rem
ret
```

For 32-bit vectorcall, gcc would do something like:

```asm
## Not real compiler output, but probably similar to what you'd get
mov     eax, ecx               # lo, lo
mov     ecx, [esp+4]           # premainder
idivl   [esp+8]                # divisor
mov     DWORD PTR [ecx], edx   # *premainder_7(D), rem
ret   8
```

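MSVC uses 13 instructions (not including `ret`), compared to gcc's 4. With inlining, as I said, it potentially compiles to just one, while MSVC would still probably use 9. (It won't need to reserve stack space or load `premainder`; I'm assuming it still has to store about 2 of the 3 inputs, reload them inside the asm, run `idiv`, store two outputs, and reload them outside the asm. So that's 4 loads/stores for input, and another 4 for output.)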


zaqlnxep1#

2023-02-11

x759pob22#

Which one you use depends on your compiler. This isn't standardized the way the C language is.

sauutmhj3#

`asm` vs `__asm__` in GCC

`asm` does not work with `-std=c99`. You have two alternatives:

- use `__asm__`
- use `-std=gnu99`

More details: error: 'asm' undeclared (first use in this function)
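For example, this hypothetical `wipe` helper uses an empty `__asm__` statement as an optimization barrier after clearing a buffer, similar in spirit to the Linux kernel's `barrier_data()` macro. Because it is spelled `__asm__`, it should build with `-std=c99` as well as `-std=gnu99` on GCC and clang, for any target (the template is empty, so no instructions are emitted for it).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: clear a buffer, then pass the pointer into an asm
   statement with a "memory" clobber, so the compiler must assume the asm
   may read the zeroed bytes. This is meant to keep the memset from being
   discarded as a dead store even if C code never reads the buffer again. */
static void wipe(void *buf, size_t len)
{
    memset(buf, 0, len);
    __asm__ __volatile__("" : : "r"(buf) : "memory");
}
```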

I could not find documentation for `__asm` (notably it is not mentioned at https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/Alternate-Keywords.html#Alternate-Keywords ), but from the GCC 8.1 source they are exactly the same:

```c
  { "__asm",        RID_ASM,    0 },
  { "__asm__",      RID_ASM,    0 },
```

So I would just use `__asm__`, which is documented.
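A quick sketch of that equivalence, assuming GCC or clang (the function names are hypothetical): the same empty statement is written with each spelling. `__asm` and `__asm__` are accepted in every mode, while plain `asm` is guarded by `__STRICT_ANSI__`, which GCC and clang define for ISO modes such as `-std=c99`. The `"+r"` operand makes `x` opaque to the optimizer, so the compiler can't constant-fold through these calls.

```c
/* Empty templates: no instructions are emitted, but "+r" tells the compiler
   that x might have been modified, so it must keep the value in a register
   and can't assume what the asm statement leaves there. */
static int opaque_dunder(int x) { __asm__("" : "+r"(x)); return x; }
static int opaque_single(int x) { __asm("" : "+r"(x)); return x; }

#ifndef __STRICT_ANSI__
/* Plain asm is only a keyword in GNU modes (e.g. -std=gnu99). */
static int opaque_plain(int x) { asm("" : "+r"(x)); return x; }
#endif
```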

luaexgnf4#

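For the gcc compiler it's not a big difference. `asm`, `__asm` and `__asm__` are the same; the underscore versions just exist to avoid namespace conflicts (e.g. a user-defined function named `asm`).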

C: What is the difference between "asm", "__asm" and "__asm__"?
