Hello, world in assembly language with Linux system calls?

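How does $ work in NASM, exactly? explains how $ - msg gets NASM to calculate the string length as an assemble-time constant for you, instead of hard-coding it.
System calls are made by putting args into registers, then running int 0x80 (32-bit mode) or syscall (64-bit mode). See What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64 and The Definitive Guide to Linux System Calls.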

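**Think of int 0x80 as a way to "call" into the kernel, across the user/kernel privilege boundary.** The kernel acts according to the values that were in registers when int 0x80 executed, then eventually returns. The return value is in EAX.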

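When execution reaches the kernel's entry point, it looks at EAX and dispatches to the right system call based on the call number in EAX. Values from the other registers are passed as function args to the kernel's handler for that system call. (For example, eax=4 / int 0x80 gets the kernel to call its sys_write kernel function, which implements the POSIX write system call.)
See also What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? That answer includes a look at the asm in the kernel entry point that is "called" by int 0x80. (It also applies to 32-bit user-space, not just to 64-bit code, where you shouldn't be using int 0x80.)
If you don't already know low-level Unix systems programming, you might want to just write functions in asm that take args and return a value (or update arrays via a pointer arg) and call them from C or C++ programs. Then you can concentrate on learning how to handle registers and memory, without also learning the POSIX system-call API and the ABI for using it. That also makes it very easy to compare your code with compiler output for a C implementation. Compilers usually do a good job of generating efficient code, but are rarely perfect.
libc provides wrapper functions for system calls, so compiler-generated code would use call write rather than invoking the kernel directly with int 0x80 (or, if you care about performance, sysenter). (In x86-64 code, use syscall for the 64-bit ABI.)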
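To make the wrapper-versus-direct distinction concrete, here is a minimal C sketch. The function names hello_wrapper and hello_raw are made up for illustration. It assumes a Linux libc such as glibc, which exposes the generic syscall() helper and the SYS_write constant; the helper takes the call number first and then the same args the asm version would put in registers.

```c
#define _GNU_SOURCE          /* for the syscall() declaration in glibc */
#include <sys/syscall.h>     /* SYS_write: the call number for this architecture */
#include <unistd.h>          /* write(), syscall() */

static const char msg[] = "Hello, world!\n";

/* sizeof counts the terminating 0 byte, so subtract 1.  Like `len equ $ - msg`
   in NASM, this is a build-time constant, not a value stored in memory. */
enum { MSG_LEN = sizeof msg - 1 };

/* Hypothetical helper: go through the libc wrapper, which is what
   compiler-generated code normally calls. */
long hello_wrapper(int fd)
{
    return write(fd, msg, MSG_LEN);
}

/* Hypothetical helper: pass the call number explicitly, followed by the
   three args in the order the write(2) prototype lists them. */
long hello_raw(int fd)
{
    return syscall(SYS_write, fd, msg, MSG_LEN);
}
```

Calling either function with fd 1 should write the string to stdout and return the number of bytes written. SYS_write expands to a different number depending on the target (4 for the i386 ABI, 1 for x86-64), which is one reason the wrapper or the constant is preferable to a hard-coded number in C.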
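System calls are documented in section 2 manual pages, like write(2). See the NOTES section for differences between the libc wrapper function and the underlying Linux system call. Note that the wrapper for sys_exit is _exit(2), not the exit(3) ISO C function, which flushes stdio buffers and does other cleanup first. In fact _exit() uses the exit_group system call, which ends all threads. exit(3) uses it as well, because there is no downside in a single-threaded process.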
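The difference between exit(3) and _exit(2) is easy to observe from C. The sketch below is only an illustration, and bytes_delivered is a hypothetical helper name: it forks a child that leaves 8 bytes in a stdio buffer attached to a pipe, terminates the child one way or the other, and reports how many bytes actually reached the parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>      /* fdopen, fputs, fflush */
#include <stdlib.h>     /* exit */
#include <sys/types.h>  /* pid_t, ssize_t */
#include <sys/wait.h>   /* waitpid */
#include <unistd.h>     /* pipe, fork, read, close, _exit */

/* Hypothetical helper: returns the number of bytes the parent receives from
   a child that buffered 8 bytes with stdio, or -1 if setup failed. */
long bytes_delivered(int use_raw_exit)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    fflush(NULL);                 /* don't let the child inherit pending output */

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {               /* child */
        close(fds[0]);
        FILE *out = fdopen(fds[1], "w");  /* fully buffered: a pipe is not a terminal */
        if (!out)
            _exit(2);
        fputs("buffered", out);   /* stays in the user-space stdio buffer */
        if (use_raw_exit)
            _exit(0);             /* straight to the kernel: the buffer is never written */
        exit(0);                  /* flushes stdio buffers, then exits */
    }

    close(fds[1]);                /* so read() sees EOF once the child is gone */
    long total = 0;
    char buf[64];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

With exit(3) the parent should receive all 8 bytes; with _exit(2) it should receive none, because nothing ever flushed the buffer. Hand-written asm that calls the kernel directly has no stdio buffers of its own, so this only matters once you mix raw exit system calls with libc I/O.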
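Given the system-call calling convention and the C man pages, you can see which args go in which register. There are web pages with tables of system calls and registers, but you don't really need them.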
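This code makes 2 system calls: write(1, msg, len) and _exit(0).

I commented it heavily (to the point where it starts to obscure the actual code, without color syntax highlighting). This is an attempt to point things out to total beginners, not an example of how you should normally comment your code.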

section .text             ; Executable code goes in the .text section
global _start             ; The linker looks for this symbol to set the process entry point, so execution starts here
;;;a name followed by a colon defines a symbol.  The global _start directive modifies it so it's a global symbol, not just one that we can CALL or JMP to from inside the asm.
;;; note that _start isn't really a "function".  You can't return from it, and the kernel passes argc, argv, and env differently than main() would expect.
 _start:
    ;;; write(1, msg, len);
    ; Start by moving the arguments into registers, where the kernel will look for them
    mov     edx,len       ; 3rd arg goes in edx: buffer length
    mov     ecx,msg       ; 2nd arg goes in ecx: pointer to the buffer
    ;Set output to stdout (goes to your terminal, or wherever you redirect or pipe)
    mov     ebx,1         ; 1st arg goes in ebx: Unix file descriptor. 1 = stdout, which is normally connected to the terminal.

    mov     eax,4         ; system call number (from SYS_write / __NR_write from unistd_32.h).
    int     0x80          ; generate an interrupt, activating the kernel's system-call handling code.  64-bit code uses a different instruction, different registers, and different call numbers.
    ;; eax = return value, all other registers unchanged.

    ;;;Second, exit the process.  There's nothing to return to, so we can't use a ret instruction (like we could if this was main() or any function with a caller)
    ;;; If we don't exit, execution continues into whatever bytes are next in the memory page,
    ;;; typically leading to a segmentation fault because the padding 00 00 decodes to  add [eax],al.

    ;;; _exit(0);
    xor     ebx,ebx       ; first arg = exit status = 0.  (will be truncated to 8 bits).  Zeroing registers is a special case on x86, and mov ebx,0 would be less efficient.
                      ;; leaving out the zeroing of ebx would mean we exit(1), i.e. with an error status, since ebx still holds 1 from earlier.
    mov     eax,1         ; put __NR_exit into eax
    int     0x80          ;Execute the Linux function

section     .rodata       ; Section for read-only constants

             ;; msg is a label, and in this context doesn't need to be msg:.  It could be on a separate line.
             ;; db = Data Bytes: assemble some literal bytes into the output file.
msg     db  'Hello, world!',0xa     ; ASCII string constant plus a newline (0xa = decimal 10)

             ;;  No terminating zero byte is needed, because we're using write(), which takes a buffer + length instead of an implicit-length string.
             ;; To make this a C string that we could pass to puts or strlen, we'd need a terminating 0 byte. (e.g. "...", 0xa, 0)

len     equ $ - msg       ; Define an assemble-time constant (not stored by itself in the output file, but will appear as an immediate operand in insns that use it)
                          ; Calculate len = string length.  subtract the address of the start
                          ; of the string from the current position ($)
  ;; equivalently, we could have put a str_end: label after the string and done   len equ str_end - msg

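Note that we don't store the string length in data memory anywhere. It's an assemble-time constant, so having it as an immediate operand is more efficient than a load from memory. We could also have pushed the string data onto the stack with a few push imm32 instructions (four, for these 14 bytes), but bloating the code size too much isn't a good thing.
On Linux, you can save this file as Hello.asm and build a 32-bit executable from it with these commands: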

nasm -felf32 Hello.asm                  # assemble as 32-bit code.  Add -Worphan-labels -g -Fdwarf  for debug symbols and warnings
gcc -static -nostdlib -m32 Hello.o -o Hello     # link without CRT startup code or libc, making a static binary

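See this answer for more details on building assembly into 32-bit or 64-bit, static or dynamically linked Linux executables, for NASM/YASM syntax or GNU AT&T syntax with GNU as directives. (Key point: make sure to use -m32 or equivalent when building 32-bit code on a 64-bit host, or you will have confusing problems at run time.)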

You can trace its execution with strace to see the system calls it makes:

$ strace ./Hello 
execve("./Hello", ["./Hello"], [/* 72 vars */]) = 0
[ Process PID=4019 runs in 32 bit mode. ]
write(1, "Hello, world!\n", 14Hello, world!
)         = 14
_exit(0)                                = ?
+++ exited with 0 +++

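Compare this to the trace for a dynamically linked process (like the one gcc makes from hello.c, or from running strace /bin/ls) to get an idea of how much happens under the hood for dynamic linking and C library startup.
The trace on stderr and the regular output on stdout both go to the terminal here, so they interfere with each other on the line with the write system call. Redirect or trace to a file if you care. Notice how this lets us easily see the syscall return values without having to add code to print them. It is actually even easier than using a regular debugger (like gdb) to single-step and look at eax. See the bottom of the x86 tag wiki for gdb asm tips. (The rest of the tag wiki is full of links to good resources.)
The x86-64 version of this program would be extremely similar, passing the same args to the same system calls, just in different registers and with syscall instead of int 0x80. See the bottom of What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? for a working example of writing a string and exiting in 64-bit code.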

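Related: A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux. It builds the smallest binary file you can run that just makes an exit() system call. That is about minimizing the binary size, not the source size or even just the number of instructions that actually run.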

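I originally wrote this for SO Docs (topic ID: 1164, example ID: 19078) (apart from the first sentence), rewriting a basic, less-well-commented example by @runner. This looks like a better place to put it than as part of my answer to another question, where I had previously moved it after the SO Docs experiment ended.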
