assembly: How are memory addresses placed in a binary file?

gcxthw6b  posted on 2022-12-04 in Other

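I am having difficulty understanding how the sections in an elf file are loaded into memory, and how the addresses are chosen. Embedded systems usually assign specific addresses to the code, but where is that information put?
Basically, how and when do the addresses get put into the sections, and how is the result loaded, both by an operating system and into ROM or RAM on an embedded system?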

cedebl8k #1

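A particular operating system has a specific set of rules, or perhaps several sets, for where a compatible program can be loaded. A toolchain made for that platform, including its default linker script (think gcc hello.c -o hello), conforms to those rules.
For example, say I decide to create an operating system for a platform that has an MMU. Because it has an MMU, I can design the operating system so that every program sees the same (virtual) address space. So I can decide that for applications on my operating system the memory space starts at 0x00000000, but the entry point must be 0x00001000. The supported binary file format is Motorola S-record.
So take a simple program and a simple linker script: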

MEMORY
{
    ram : ORIGIN = 0x1000, LENGTH = 0x10000
}
SECTIONS
{
    .text : { *(.text*) } > ram
}

The disassembly of the simple program:

00001000 <_start>:
    1000:   e3a0d902    mov sp, #32768  ; 0x8000
    1004:   eb000001    bl  1010 <main>
    1008:   e3a00000    mov r0, #0
    100c:   ef000000    svc 0x00000000

00001010 <main>:
    1010:   e3a00000    mov r0, #0
    1014:   e12fff1e    bx  lr

And the "binary" file happens to be human readable:

S00F00006E6F746D61696E2E737265631F
S3150000100002D9A0E3010000EB0000A0E3000000EF1E
S30D000010100000A0E31EFF2FE122
S70500001000EA

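You may or may not notice that the addresses are indeed in the binary file, describing where things go. In each S3 record the eight hex digits after the two-digit byte count are the load address (00001000 and 00001010), and the S7 record carries the entry point (00001000).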
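To make that concrete, here is a minimal C sketch of pulling the address out of one of those records. It only handles the S3 and S7 record types seen above, and the names srec_parse32, hex_digit and hex_byte are mine for illustration, not part of any toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Read two hex digits at s as one byte, or -1 on error. */
static int hex_byte(const char *s)
{
    int hi = hex_digit(s[0]);
    int lo = (hi < 0) ? -1 : hex_digit(s[1]);
    return (hi < 0 || lo < 0) ? -1 : ((hi << 4) | lo);
}

/*
 * Parse an S3 (data, 32-bit address) or S7 (entry point, 32-bit address)
 * record. On success returns 0 and stores the address and the number of
 * data bytes. Returns -1 on a malformed record or a checksum mismatch.
 */
int srec_parse32(const char *line, uint32_t *addr, size_t *data_len)
{
    assert(line != NULL && addr != NULL && data_len != NULL);

    if (line[0] != 'S' || (line[1] != '3' && line[1] != '7'))
        return -1;

    /* The count covers the address, data and checksum bytes. */
    int count = hex_byte(line + 2);
    if (count < 5)                       /* 4 address bytes + 1 checksum */
        return -1;
    if (strlen(line) < (size_t)(4 + 2 * count))
        return -1;

    unsigned sum = (unsigned)count;
    uint32_t a = 0;
    for (int i = 0; i < count; i++) {
        int b = hex_byte(line + 4 + 2 * i);
        if (b < 0)
            return -1;
        sum += (unsigned)b;
        if (i < 4)                       /* first four bytes: big-endian address */
            a = (a << 8) | (uint32_t)b;
    }

    /* Count, address, data and checksum must add up to 0xFF modulo 256. */
    if ((sum & 0xFFu) != 0xFFu)
        return -1;

    *addr = a;
    *data_len = (size_t)count - 5;
    return 0;
}
```

Fed the first S3 line above, it reports address 0x1000 with 16 data bytes; the S7 line gives the entry point 0x1000 with no data. A loader for my made-up operating system would then copy the data bytes of each S3 record to the address it names.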
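Being an operating-system-based program that is loaded into RAM, we do not have to play many games with memory. We can assume one flat region that is all read/write, so if there were .data, .bss, and so on, it could all be packed in there.
For a real operating system it would be desirable for the binary to carry additional information, such as the size of the program. You can look up the various common file formats and see how this is done: either a simple up-front "I need this much", or one or more individually defined sections. And yes, a "binary" is more than just opcodes and data; I assume you understand that.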
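As a sketch of the simple "I need this much" approach: the header layout below is made up for illustration (struct image_header and header_fits are hypothetical, not any real file format), but it shows the kind of information a loader wants before it copies anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header a loader could read ahead of the code and data. */
struct image_header {
    uint32_t entry;      /* address where execution starts */
    uint32_t load_addr;  /* address the image bytes are copied to */
    uint32_t file_size;  /* bytes of code and initialized data in the file */
    uint32_t mem_size;   /* bytes to reserve in memory, including zeroed .bss */
};

/*
 * Return 1 if the image fits inside the region
 * [region_base, region_base + region_len) and the entry point lies within
 * the bytes that actually get loaded, otherwise 0.
 */
int header_fits(const struct image_header *h,
                uint32_t region_base, uint32_t region_len)
{
    assert(h != NULL);

    if (h->mem_size < h->file_size)
        return 0;
    if (h->load_addr < region_base)
        return 0;

    uint32_t offset = h->load_addr - region_base;
    if (offset > region_len || h->mem_size > region_len - offset)
        return 0;

    if (h->entry < h->load_addr)
        return 0;
    return (h->entry - h->load_addr) < h->file_size;
}
```

With the first linker script above (region at 0x1000, length 0x10000), a 0x18-byte image loaded at 0x1000 with entry 0x1000 passes the check. The elf format carries the same sort of fields per segment, plus a single entry address.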
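The toolchain I am using outputs elf format files by default, but objcopy can be used to create a number of different formats. One of them is a raw memory image (which contains no address/location information). Many/most of the rest contain the machine code and data, plus labels for debuggers/disassemblers, or the addresses in the memory space where chunks of data want to live, and so on.
Now, when you mention embedded along with ROM and RAM, I assume you mean bare metal, such as a microcontroller, but even if you mean booting an x86 or a full-sized ARM or whatever, the same things apply. In the case of an MCU, the chip designers have determined the rules for the memory space, possibly following the rules of the processor or by their own choice, just as an operating system dictates its rules. We are cheating a little, since many of the tools we use today are aimed at operating-system targets, but because generic compilers are generic compilers, and more importantly because the toolchains lend themselves to this kind of portability, we can use such tools. Ideally you use a cross compiler, meaning the output machine code is not necessarily meant to run on the computer that produced it. The main difference that matters is that we want to control the linking and the libraries: do not link in host-operating-system-based libraries, and either let us control the linker script or have the toolchain supply a default linker script that targets our MCU.
So say I have an ARM7TDMI-based MCU, and the chip designers say I need a binary such that ROM starts at address 0x00000000 and has some size, and RAM starts at 0x40000000 and has some size. Being an ARM7, the processor starts execution by fetching the instruction at address 0x00000000, and the chip designers have mapped 0x00000000 to the ROM.
So now my simple program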

unsigned int xyz;
int notmain ( void )
{
    xyz=5;
    return(0);
}

linked like this

MEMORY
{
    bob : ORIGIN = 0x00000000, LENGTH = 0x1000
    ted : ORIGIN = 0x40000000, LENGTH = 0x1000
}
SECTIONS
{
    .text : { *(.text*) } > bob
    .bss : { *(.bss*) } > ted
}

gives this disassembly

Disassembly of section .text:

00000000 <_start>:
   0:   e3a0d101    mov sp, #1073741824 ; 0x40000000
   4:   e38dda01    orr sp, sp, #4096   ; 0x1000
   8:   eb000000    bl  10 <notmain>
   c:   eafffffe    b   c <_start+0xc>

00000010 <notmain>:
  10:   e3a02005    mov r2, #5
  14:   e59f3008    ldr r3, [pc, #8]    ; 24 <notmain+0x14>
  18:   e3a00000    mov r0, #0
  1c:   e5832000    str r2, [r3]
  20:   e12fff1e    bx  lr
  24:   40000000    andmi   r0, r0, r0

Disassembly of section .bss:

40000000 <xyz>:
40000000:   00000000    andeq   r0, r0, r0

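This would be a perfectly valid program. It does not do anything interesting, but it is still a perfectly valid program.
First and foremost, the toolchain will normally warn if you leave out _start, but the result still works fine. (Hmm, it actually did not warn this time, interesting.)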

arm-none-eabi-as --warn --fatal-warnings vectors.s -o vectors.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -c notmain.c -o notmain.o
arm-none-eabi-ld vectors.o notmain.o -T memmap -o notmain.elf
arm-none-eabi-objdump -D notmain.elf > notmain.list
arm-none-eabi-objcopy --srec-forceS3 notmain.elf -O srec notmain.srec
arm-none-eabi-objcopy notmain.elf -O binary notmain.bin

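Now you have the loading problem. Every MCU is different in how you load it, what tools are available, and/or whether you make your own tools. ihex and srec were popular with PROM programmers, where you had, say, a separate ROM next to the processor and/or a through-hole MCU that got plugged into the PROM programmer. Raw binary images work too, but they can get large quickly, as will be shown in a second. As written above there is .bss but no .data, so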

ls -al notmain.bin
-rwxr-xr-x 1 user user 40 Oct 21 22:05 notmain.bin

40 bytes. But if I do this for demonstration purposes, even though it will not work correctly:

unsigned int xyz=5;
int notmain ( void )
{
    return(0);
}

MEMORY
{
    bob : ORIGIN = 0x00000000, LENGTH = 0x1000
    ted : ORIGIN = 0x40000000, LENGTH = 0x1000
}
SECTIONS
{
    .text : { *(.text*) } > bob
    .bss : { *(.bss*) } > ted
    .data : { *(.data*) } > ted
}

gives

Disassembly of section .text:

00000000 <notmain-0x10>:
   0:   e3a0d101    mov sp, #1073741824 ; 0x40000000
   4:   e38dda01    orr sp, sp, #4096   ; 0x1000
   8:   eb000000    bl  10 <notmain>
   c:   eafffffe    b   c <notmain-0x4>

00000010 <notmain>:
  10:   e3a00000    mov r0, #0
  14:   e12fff1e    bx  lr

Disassembly of section .data:

40000000 <xyz>:
40000000:   00000005    andeq   r0, r0, r5

-rwxr-xr-x  1 user user 1073741828 Oct 21 22:08 notmain.bin

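OUCH! 0x40000004 bytes, which is expected: I asked for a memory image, and I defined something at one address (the machine code at 0x00000000) and some bytes at another address (0x40000000), so the raw memory image has to cover the entire range.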
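The arithmetic behind that file size can be sketched in a few lines of C. This assumes a flat image simply runs from the lowest loaded address to the end of the highest loaded chunk, which matches what the listing above shows; struct segment and raw_image_size are names I made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One chunk of bytes that has to appear in the image. */
struct segment {
    uint32_t addr;  /* address of the first byte */
    uint32_t size;  /* number of bytes */
};

/*
 * Size in bytes of a flat image spanning every non-empty segment,
 * from the lowest start address to the highest end address.
 * Returns 0 if there is nothing to store.
 */
uint64_t raw_image_size(const struct segment *segs, size_t n)
{
    assert(segs != NULL || n == 0);

    uint64_t lo = UINT64_MAX;
    uint64_t hi = 0;
    for (size_t i = 0; i < n; i++) {
        if (segs[i].size == 0)
            continue;
        uint64_t start = segs[i].addr;
        uint64_t end = start + segs[i].size;
        if (start < lo) lo = start;
        if (end > hi) hi = end;
    }
    return (hi > lo) ? hi - lo : 0;
}
```

With 0x18 bytes of .text at 0x00000000 and 4 bytes of .data at 0x40000000 this gives 1073741828, the size ls reported. With only the 40 bytes of .text from the earlier build (the .bss takes no space in the file) it gives 40.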

hexdump notmain.bin 
0000000 d101 e3a0 da01 e38d 0000 eb00 fffe eaff
0000010 0000 e3a0 ff1e e12f 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
40000000 0005 0000                              
40000004

Instead, you can use the elf file the toolchain generates, or ihex, or srecord.

S00F00006E6F746D61696E2E737265631F
S3150000000001D1A0E301DA8DE3000000EBFEFFFFEA79
S30D000000100000A0E31EFF2FE132
S3094000000005000000B1
S70500000000FA

All the information I need, but not a huge file for so few bytes.

It is not a hard and fast rule, but moving data around is easier today (than carrying a floppy from one computer to another one with the PROM programmer on it). In particular, if you have a bundled IDE, the vendor likely uses the toolchain's default format; even if not, elf and other similar formats are supported, so you do not have to go the route of a raw binary, ihex, or srec. It still depends on the tool that takes the "binary" and programs it into the ROM (or flash) on the MCU.
Now, I cheated to demonstrate the large-file problem above; you have to do more work when it is not a RAM-only system. If you feel the need to have .data, or want .bss zeroed, then you need to write or use a more complicated linker script that helps you with the locations and boundaries. That linker script is married to the bootstrap, which uses linker-generated information to perform those tasks. Basically, a copy of .data needs to be preserved in non-volatile memory (ROM/flash), but it can't live there at runtime because .data is read/write. So ideally/typically you use the linker script's language/magic to state that the .data read/write space is here, and that its flash copy sits at this address with this size, so the bootstrap can copy that amount of data from flash at that address to RAM. For .bss, the linker script generates variables that we save into flash and that tell the bootstrap to zero RAM from this address to this address.
So the operating system defines the memory space, and the linker script matches that if you want the program to work. For something embedded, the system designers or chip designers determine the address space, and the linker script matches that. The bootstrap is married to the linker script for that build and target.
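The bootstrap half of that marriage can be sketched in C as follows. This is a minimal sketch, not any vendor's startup code: init_ram is a made-up name, and on a real target the five pointers would come from symbols that your own linker script defines (whatever you choose to call them) rather than being passed in by hand.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Copy the initial values of .data from where they are stored (flash)
 * to where the program expects them at runtime (RAM), then zero .bss.
 * Each range is [start, end), word aligned.
 */
void init_ram(const uint32_t *data_load,
              uint32_t *data_start, uint32_t *data_end,
              uint32_t *bss_start, uint32_t *bss_end)
{
    assert(data_start <= data_end && bss_start <= bss_end);

    /* .data: the flash copy becomes the live read/write copy in RAM. */
    for (uint32_t *dst = data_start; dst < data_end; dst++)
        *dst = *data_load++;

    /* .bss: nothing is stored in flash, the range is just cleared. */
    for (uint32_t *dst = bss_start; dst < bss_end; dst++)
        *dst = 0;
}
```

Taking the addresses as parameters keeps the sketch testable on a host with plain arrays standing in for flash and RAM. On the MCU the bootstrap would run this (or the assembly equivalent) before calling into the C entry point.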

Edit

toolchain basics...

mov sp,#0x40000000
orr sp,sp,#0x1000
bl notmain
b .

unsigned int xyz;
int notmain ( void )
{
    xyz=5;
    return(0);
}

MEMORY
{
    bob : ORIGIN = 0x1000, LENGTH = 0x1000
    ted : ORIGIN = 0x2000, LENGTH = 0x1000
}
SECTIONS
{
    .text : { *(.text*) } > bob
    .bss : { *(.bss*) } > ted
}

My bootstrap, main program and linker script

arm-none-eabi-as --warn --fatal-warnings vectors.s -o vectors.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -save-temps -c notmain.c -o notmain.o
arm-none-eabi-ld vectors.o notmain.o -T memmap -o notmain.elf
arm-none-eabi-objdump -D notmain.elf > notmain.list
arm-none-eabi-objcopy --srec-forceS3 notmain.elf -O srec notmain.srec
arm-none-eabi-objcopy notmain.elf -O binary notmain.bin

Some folks will argue, and it is sometimes true, that compilers don't generate assembly any more. It is still the sane way to do it, and you will find it more often than not, as in this case...
The bootstrap makes an object which we can disassemble.

00000000 <.text>:
   0:   e3a0d101    mov sp, #1073741824 ; 0x40000000
   4:   e38dda01    orr sp, sp, #4096   ; 0x1000
   8:   ebfffffe    bl  0 <notmain>
   c:   eafffffe    b   c <.text+0xc>

It's not "linked", so the addresses this disassembler uses are zero based, and you can see the call to notmain is incomplete, not yet linked.
The compiler-generated assembly for the C code:

    .cpu arm7tdmi
    .fpu softvfp
    .eabi_attribute 20, 1
    .eabi_attribute 21, 1
    .eabi_attribute 23, 3
    .eabi_attribute 24, 1
    .eabi_attribute 25, 1
    .eabi_attribute 26, 1
    .eabi_attribute 30, 2
    .eabi_attribute 34, 0
    .eabi_attribute 18, 4
    .file   "notmain.c"
    .text
    .align  2
    .global notmain
    .type   notmain, %function
notmain:
    @ Function supports interworking.
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    mov r2, #5
    ldr r3, .L2
    mov r0, #0
    str r2, [r3]
    bx  lr
.L3:
    .align  2
.L2:
    .word   xyz
    .size   notmain, .-notmain
    .comm   xyz,4,4
    .ident  "GCC: (15:4.9.3+svn231177-1) 4.9.3 20150529 (prerelease)"

That gets assembled into an object, which we can also disassemble.

Disassembly of section .text:

00000000 <notmain>:
   0:   e3a02005    mov r2, #5
   4:   e59f3008    ldr r3, [pc, #8]    ; 14 <notmain+0x14>
   8:   e3a00000    mov r0, #0
   c:   e5832000    str r2, [r3]
  10:   e12fff1e    bx  lr
  14:   00000000    andeq   r0, r0, r0

Not shown here, but that object also contains information about the global variable xyz and its size.
The linker's job is perhaps part of your confusion. It links the objects together such that the result will be sane, or will work, on the final destination (bare metal or operating system).

Disassembly of section .text:

00001000 <notmain-0x10>:
    1000:   e3a0d101    mov sp, #1073741824 ; 0x40000000
    1004:   e38dda01    orr sp, sp, #4096   ; 0x1000
    1008:   eb000000    bl  1010 <notmain>
    100c:   eafffffe    b   100c <notmain-0x4>

00001010 <notmain>:
    1010:   e3a02005    mov r2, #5
    1014:   e59f3008    ldr r3, [pc, #8]    ; 1024 <notmain+0x14>
    1018:   e3a00000    mov r0, #0
    101c:   e5832000    str r2, [r3]
    1020:   e12fff1e    bx  lr
    1024:   00002000    andeq   r2, r0, r0

Disassembly of section .bss:

00002000 <xyz>:
    2000:   00000000    andeq   r0, r0, r0

I made this linker script so that you can see both .text and .bss moving around. The linker has placed all of .text in the 0x1000 address space, and has patched in the call to notmain() as well as the address used to reach xyz. It has also allocated/defined the space for the xyz variable in the 0x2000 address space.
And then to your next question or confusion. It is very much up to the tools that load the system, be it the operating system loading a program into memory to be run, or programming the flash of an MCU, or programming the RAM of some other embedded system (a mouse, for example: you might not know that for some of them the firmware is downloaded by the operating system, from /lib/firmware or other locations, rather than all of it being burned into a flash).
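If you want to check the linker's patching by hand, the two address calculations involved can be sketched in C. This covers only the ARM-state B/BL and "ldr rX, [pc, #imm]" encodings that appear in the listings here; arm_branch_target and arm_literal_address are names I picked for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Destination of an ARM-state B or BL instruction located at address pc. */
uint32_t arm_branch_target(uint32_t pc, uint32_t insn)
{
    assert((insn & 0x0E000000u) == 0x0A000000u);  /* B/BL encoding */

    uint32_t imm24 = insn & 0x00FFFFFFu;
    /* Sign-extend the 24-bit word offset (wraps modulo 2^32). */
    uint32_t offset = (imm24 ^ 0x00800000u) - 0x00800000u;
    /* The pc reads as the instruction address plus 8 in ARM state. */
    return pc + 8u + (offset << 2);
}

/* Address read by "ldr rX, [pc, #+/-imm12]" located at address pc. */
uint32_t arm_literal_address(uint32_t pc, uint32_t insn)
{
    assert((insn & 0x0F7F0000u) == 0x051F0000u);  /* LDR, base register pc */

    uint32_t imm12 = insn & 0x00000FFFu;
    uint32_t base = pc + 8u;
    /* Bit 23 selects whether the offset is added or subtracted. */
    return (insn & 0x00800000u) ? base + imm12 : base - imm12;
}
```

For the linked listing, eb000000 at 0x1008 resolves to 0x1010 (notmain), and e59f3008 at 0x1014 reads the word at 0x1024, which is where the linker stored 0x00002000, the address of xyz.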
