assembly: How do I compile and run assembly code?

eqfvzcg8  posted on 2022-11-13  in  Other

I came across some assembly code that is supposed to create a boot sector. The code is:

jmp $
times 510 - ($ - $$) db 0
db 0x55, 0xaa




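Assembly language is specific to the tool, not to the target. x86 is especially troublesome because the list of mutually incompatible assembly languages is too long to count (this is not the Intel vs. AT&T syntax question; there are countless incompatible Intel-syntax x86 assemblers).
(Some modern assemblers, NASM among them, still support generating flat binaries, filling in symbol addresses themselves with no linker needed. That is how the NASM source in your question would normally be built into a 512-byte legacy BIOS MBR boot sector, with no metadata, using nasm -f bin foo.asm. The GNU assembler, GAS, does not support that, and this answer only covers how it is done with a GNU toolchain.)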
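As a rough illustration of what that flat 512-byte image contains, here is a minimal Python sketch that builds the same bytes by hand. The function name build_boot_sector is hypothetical, and the sketch assumes jmp $ assembles to the two-byte short jump eb fe, which is the encoding shown in the disassembly later in this answer.

```python
def build_boot_sector(code: bytes = b"\xeb\xfe") -> bytes:
    """Hypothetical helper: mimic the three NASM lines from the question."""
    if len(code) > 510:
        raise ValueError("code does not fit in a 512-byte sector")
    padding = bytes(510 - len(code))      # times 510 - ($ - $$) db 0
    return code + padding + b"\x55\xaa"   # db 0x55, 0xaa

image = build_boot_sector()
print(len(image), image[:2].hex(), image[-2:].hex())
```

The result is pure machine code and data: no headers, no symbol table, nothing a loader would need to parse.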

jmp $


as so.s
so.s: Assembler messages:
so.s:1: Error: missing or invalid immediate expression `'


jmp .

as so.s -o so.o
objdump -d so.o

so.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <.text>:
   0:   eb fe                   jmp    0x0


jmp here
jmp here
jmp here
jmp here
jmp here

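A note on the GNU assembler: understand that GNU supports many "targets" (x86, ARM, MIPS, etc.), and not necessarily by a unified design, since each target may have been created by a different person or team of authors. It is more a matter of borrowing an existing target and turning it into a new one. In jmp . the dot means "this address"; it is a shortcut for here: jmp here without having to type that much text. With some of these GNU assembly languages you can also write jmp .+2.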

as so.s -o so.o
ld so.o -o so.elf
ld: warning: cannot find entry symbol _start; defaulting to 0000000000401000
objdump -d so.elf

so.elf:     file format elf64-x86-64

Disassembly of section .text:

0000000000401000 <__bss_start-0x1000>:
  401000:   eb fe                   jmp    401000 <__bss_start-0x1000>

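So we link it into a final binary, and the load/entry point is 0x401000. But we did not give the linker any memory address, so how did it decide that this is where we wanted the code? Because the toolchain was built for my operating system (oh yeah, assume that no two operating systems support the same binary file formats, or have the same rules for how the operating system loads a file of said format, or share system calls, and so on), and it was built with a C library, some bootstrap code for that library, and a linker script for that target that is married to that bootstrap code... there is a default. That default linker script, written in the linker script language of the GNU linker, ld (assume no two toolchains use the same linker script language), contains a directive naming the entry symbol.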


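It tells the linker to mark the entry point in the binary at the label _start. That does not have to be the first instruction in the program. It can be almost anywhere, subject to the rules of the operating system's loader. Since I did not specify a linker script, the default one was used.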

ld -Ttext=0x1000 so.o -o so.elf
ld: warning: cannot find entry symbol _start; defaulting to 0000000000001000

objdump -d so.elf

so.elf:     file format elf64-x86-64

Disassembly of section .text:

0000000000001000 <__bss_start-0x1000>:
    1000:   eb fe                   jmp    1000 <__bss_start-0x1000>


jmp .
.data
.byte 0x55

ld -Ttext=0x1000 so.o -o so.elf

Disassembly of section .text:

0000000000001000 <.text>:
    1000:   eb fe                   jmp    1000 <__bss_start-0x1001>

Disassembly of section .data:

0000000000002000 <__bss_start-0x1>:
    2000:   55                      push   %rbp


MEMORY {
    one : ORIGIN = 0x00000000, LENGTH = 0x1000
    two : ORIGIN = 0x80000000, LENGTH = 0x1000
}
SECTIONS {
    .text   : { *(.text*)   } > one
    .bss    : { *(.bss*)    } > two
}


as so.s -o so.o
ld  -Tso.ld so.o -o so.elf
objdump -d so.elf

so.elf:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <.text>:
   0:   eb fe                   jmp    0x0


MEMORY {
    one : ORIGIN = 0x00000000, LENGTH = 0x1000
    two : ORIGIN = 0x80000000, LENGTH = 0x1000
}
SECTIONS {
    .text   : { *(.text*)   } > one
    .bss    : { *(.bss*)    } > two
}

ld: warning: cannot find entry symbol banana; defaulting to 0000000000000000

So it is easy enough to install the GNU binutils tools, and you can see how easy they are to use...

here:
jmp here
jmp here
jmp here
jmp here
jmp here
jmp here


MEMORY { one : ORIGIN = 0x00001000, LENGTH = 0x1000 }
SECTIONS { .text   : { *(.text*)   } > one }

objdump -d so.o

so.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <here>:
   0:   eb fe                   jmp    0 <here>
   2:   eb fc                   jmp    0 <here>
   4:   eb fa                   jmp    0 <here>
   6:   eb f8                   jmp    0 <here>
   8:   eb f6                   jmp    0 <here>
   a:   eb f4                   jmp    0 <here>

objdump -d so.elf

so.elf:     file format elf64-x86-64

Disassembly of section .text:

0000000000001000 <here>:
    1000:   eb fe                   jmp    1000 <here>
    1002:   eb fc                   jmp    1000 <here>
    1004:   eb fa                   jmp    1000 <here>
    1006:   eb f8                   jmp    1000 <here>
    1008:   eb f6                   jmp    1000 <here>
    100a:   eb f4                   jmp    1000 <here>
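The two listings above have identical machine code even though the addresses differ, because the byte after eb is a signed displacement measured from the end of the instruction. A minimal Python sketch of that arithmetic follows; encode_jmp_short is a hypothetical helper name, not part of any toolchain.

```python
def encode_jmp_short(insn_addr: int, target: int) -> bytes:
    """Encode an x86 short jump: opcode eb plus a signed 8-bit displacement."""
    disp = target - (insn_addr + 2)   # relative to the end of the 2-byte instruction
    if not -128 <= disp <= 127:
        raise ValueError("target out of range for a short jump")
    return bytes([0xEB, disp & 0xFF])

# Six "jmp here" instructions in a row, linked at two different base addresses.
results = {}
for base in (0x0000, 0x1000):
    results[base] = [encode_jmp_short(base + 2 * i, base).hex() for i in range(6)]
    print(hex(base), results[base])
```

Both bases produce eb fe, eb fc, eb fa, eb f8, eb f6, eb f4, which is why the linker had nothing to patch when it moved the code to 0x1000.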


.globl one
one:
jmp two


.globl two
two:
jmp one


MEMORY {
    one : ORIGIN = 0x00001000, LENGTH = 0x1000
    two : ORIGIN = 0x00002000, LENGTH = 0x1000
}
SECTIONS {
    .one   : { so.o(.text)   } > one
    .two   : { x.o(.text)   } > two
}

as so.s -o so.o
as x.s -o x.o
ld -Tso.ld -o so.elf

objdump -d  so.o

so.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <one>:
   0:   e9 00 00 00 00          jmpq   5 <one+0x5>

objdump -d  x.o

x.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <two>:
   0:   e9 00 00 00 00          jmpq   5 <two+0x5>

objdump -d so.elf

so.elf:     file format elf64-x86-64

Disassembly of section .one:

0000000000001000 <one>:
    1000:   e9 fb 0f 00 00          jmpq   2000 <two>

Disassembly of section .two:

0000000000002000 <two>:
    2000:   e9 fb ef ff ff          jmpq   1000 <one>

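At the object level (so.o, x.o) the assembler does not know where these labels are. so.s has no label two, so the assembler assumes it is external, has to assume the worst case for distance, and encodes the long form of the jump accordingly. (If it produced eb 00 instead and the label turned out to be too far away once linked, the code would be in trouble and possibly unusable, so toolchains are designed with rules to handle this.) As we saw above with jmp here, the operand is clearly a PC-relative offset and not an absolute address. Each of those instructions is two bytes, and the offsets step by two going straight down the listing. If that code had a long jump to an external label mixed in, but the assembler had encoded it as a short jump, then the linker would have to change the opcode and insert three bytes to resolve it, and every jmp here after it would be thrown off, because those are already finished machine code.
So either way, the assembler fills in zeros for the offset, and the linker replaces those zeros with the real offset when it does its job. Sometimes you will instead see the assembler encode a jump to self (jmp ., or here: jmp here) as the placeholder, which the linker then patches.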
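The patching step can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the five-byte e9 form with a 32-bit little-endian displacement measured from the end of the instruction; encode_jmp_near is a hypothetical name.

```python
import struct

def encode_jmp_near(insn_addr: int, target: int) -> bytes:
    """Encode an x86 near jump: opcode e9 plus a signed 32-bit displacement."""
    disp = target - (insn_addr + 5)   # relative to the end of the 5-byte instruction
    return b"\xe9" + struct.pack("<i", disp)

placeholder = encode_jmp_near(0x0, 0x5)        # what the object files contain
one_to_two = encode_jmp_near(0x1000, 0x2000)   # after linking
two_to_one = encode_jmp_near(0x2000, 0x1000)   # after linking
print(placeholder.hex(" "))
print(one_to_two.hex(" "))
print(two_to_one.hex(" "))
```

A zero displacement means "the next instruction", which is why objdump shows the unresolved object-file jumps as going to offset 5.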

b .
b two

00000000 <.text>:
   0:   eafffffe    b   0 <.text>
   4:   eafffffe    b   0 <two>

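That is at the object level, and the b 0 shown for the second branch, which really encodes b 4, is wrong... but because this is only an object, who cares. It is a fragment of a program, not a complete program.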
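To check the claim that the second word really encodes a branch to 4, here is a minimal decoding sketch. It assumes the classic 32-bit ARM B encoding (a signed 24-bit word offset in the low bits, relative to the instruction address plus 8); decode_arm_branch_target is a hypothetical helper.

```python
def decode_arm_branch_target(word: int, insn_addr: int) -> int:
    """Decode the target of a 32-bit ARM B instruction (assumed encoding)."""
    imm24 = word & 0x00FFFFFF
    if imm24 & 0x800000:              # sign-extend the 24-bit field
        imm24 -= 1 << 24
    return insn_addr + 8 + (imm24 << 2)   # PC reads as instruction address + 8

first = decode_arm_branch_target(0xEAFFFFFE, 0)
second = decode_arm_branch_target(0xEAFFFFFE, 4)
print(first, second)
```

The same word, eafffffe, is a branch-to-self wherever it sits, so at address 4 it targets 4. The linker overwrites the offset once it knows where two ends up.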

0000000000001000 <one>:
    1000:   e9 fb 0f 00 00          jmpq   2000 <two>

Disassembly of section .two:

0000000000002000 <two>:
    2000:   e9 fb ef ff ff          jmpq   1000 <one>


one: jmp two


0:   eb fe                   jmp    0x0

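Execution starts with the eb instruction. Those two bytes are the whole instruction, and it says to jump back two bytes. Having jumped back two bytes, the (dumb) processor finds an eb and an fe, which tell it to jump back two bytes, where it finds an eb fe, and so on... forever, or until interrupted.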
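A toy model makes the loop visible. This sketch is not an x86 emulator; it understands only the eb short jump, and the names step and memory are made up for the illustration.

```python
def step(memory: bytes, ip: int) -> int:
    """Execute one instruction of a toy CPU that only knows the eb short jump."""
    if memory[ip] != 0xEB:
        raise NotImplementedError("this toy model only handles eb")
    disp = memory[ip + 1]
    if disp >= 0x80:                  # sign-extend the 8-bit displacement
        disp -= 0x100
    return ip + 2 + disp              # next instruction address plus displacement

memory = bytes([0xEB, 0xFE])
ip = 0
trace = []
for _ in range(4):
    ip = step(memory, ip)
    trace.append(ip)
print(trace)
```

The instruction pointer advances by two to fetch the instruction, then the displacement of -2 takes it straight back, so it never leaves address 0.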

And since I do not mess with x86, I would not be surprised if that jmp is never actually executed and what you are looking at is a construct defined by the BIOS to mark something related to booting. BIOS (x86) is a whole other very long book, or set of books: how an x86 boots now and historically, and so on. If my guess is right, then someone probably said, hey, let's put a jump-to-self up front just in case someone tries to execute this. I could be very wrong on that; your questions were about tools and execution, not about the boot sector itself. While it has "boot" in the name, the boot sector was designed as something the BIOS uses to get that media started. Booting is often a series of steps: the processor finds its first instruction on the single, often very limited, medium (or set of media) it can support, then that code brings up more of the processor or peripherals to find other media, and so on (BIOS on flash, to boot sector on hard drive, to file system, to a bootloader that then maybe loads a kernel that has its own drivers for the peripherals and finds a file system, and so on). You are looking at just one step in the ladder.
Oh yeah... and "how does it get run"? Well, setting aside the steps in the ladder above:
Normal (compiled/assembled/linked) programs are put into a binary file format supported by the operating system (.exe, .coff, .elf, etc.). That file has to conform to the rules of that operating system. When you try to run it from a command line or by point and click, the operating system has code that, these days, sets up a virtual environment/address space that protects you from others and others from you, loads the portions of the file that are actually code and data into that virtual address space, switches the processor from superuser level into a user/application mode, and jumps to the entry point defined by the binary.
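To make "the entry point defined by the binary" concrete, here is a minimal sketch that reads the entry address out of an ELF64 header. It assumes the standard layout (a 16-byte identification block, with the 8-byte entry field at offset 24) and handles only little-endian 64-bit files. The function name is hypothetical, and the header built at the bottom is a fake stand-in carrying the 0x401000 address from the ld output earlier, not a real executable.

```python
import struct

def elf64_entry_point(data: bytes) -> int:
    """Return the entry address from a little-endian ELF64 header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS = 64-bit, EI_DATA = little-endian
        raise ValueError("only 64-bit little-endian ELF is handled here")
    (entry,) = struct.unpack_from("<Q", data, 24)
    return entry

# Fake 64-byte header for illustration only.
header = bytearray(64)
header[:4] = b"\x7fELF"
header[4] = 2
header[5] = 1
struct.pack_into("<Q", header, 24, 0x401000)
entry = elf64_entry_point(bytes(header))
print(hex(entry))
```

A real loader does far more (program headers, permissions, relocation), but this field is where it gets the address it finally jumps to.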
Toolchains like GNU, LLVM, and others can be used to generate programs that do not conform to the specific host operating system. The tools are somewhat generic. You can make C programs without a main() and without support for the C library; it is somewhat trivial. You could, for example, create some x86 code that, if you knew how and where to put it on the flash your motherboard's processor boots from, would run instead of the BIOS. BUT you are now conforming to the rules of a different environment, and many components of the binary file format that is the toolchain's default output may go unused. If you are creating the first instructions the processor boots, there is no file system and no operating system (sometimes no memory), and nothing that can parse an exe or elf file; it is pure data and machine code. You need some tool(s) that take the elf file, for example, extract the bytes, and program them into the chip using some hardware, probes, or a magic box you put the chip in. Or some tool takes the elf file and makes another file format that the chip-programming hardware knows how to use, per the rules of that hardware.
x86 is pretty much the worst first place to try this, and "but I have one" is the worst excuse: you have four to 100 times as many ARM processors as you have x86 processors. If you want to work at this level, start with a better and/or simpler instruction set/processor and a simulator/emulator. You will not brick anything, you will not let the smoke out of anything, and your odds of success are significantly higher because you have better visibility and debug access into what is going on. You can't see or debug the BIOS trying to read and use the boot sector on some media (unless you have an emulator for that, or special tools and knowledge for that specific motherboard).
