Linux: why do the bp and sp values in pt_regs differ so much, and why is bp smaller than sp?

t9aqgxwy posted on 2023-08-03 in Linux

While observing process creation when running 'ls' in a terminal, I used gdb to set a breakpoint at copy_thread in arch/x86/kernel/process.c and then printed the value of pt_regs.

{bx = 0x1200011, cx = 0x0, dx = 0x0, si = 0x0, di = 0xa0f38e8, bp = 0x8266000,
  ax = 0xffffffda, ds = 0x7b, __dsh = 0x0, es = 0x7b, __esh = 0x0, fs = 0x0, __fsh = 0x0,
  gs = 0x33, __gsh = 0x0, orig_ax = 0x78, ip = 0xb7f29549, cs = 0x73, __csh = 0x0, flags = 0x206,
  sp = 0xbfab35f0, ss = 0x7b, __ssh = 0x0}

The bp in pt_regs is 0x8266000 and the sp in pt_regs is 0xbfab35f0. I have found where they are assigned. The sp in pt_regs is assigned in do_SYSENTER_32 in arch/x86/entry/common.c:

__visible noinstr long do_SYSENTER_32(struct pt_regs *regs)
{
    /* SYSENTER loses RSP, but the vDSO saved it in RBP. */
    regs->sp = regs->bp;

    /* SYSENTER clobbers EFLAGS.IF.  Assume it was set in usermode. */
    regs->flags |= X86_EFLAGS_IF;

    return do_fast_syscall_32(regs);
}


The bp in pt_regs is assigned by get_user in __do_fast_syscall_32. It appears to be a value read from user space.

static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
{
    // do other stuff...

    /* Fetch EBP from where the vDSO stashed it. */
    if (IS_ENABLED(CONFIG_X86_64)) {
        /*
         * Micro-optimization: the pointer we're following is
         * explicitly 32 bits, so it can't be out of range.
         */
        res = __get_user(*(u32 *)&regs->bp,
             (u32 __user __force *)(unsigned long)(u32)regs->sp);
    } else {
        res = get_user(*(u32 *)&regs->bp,
               (u32 __user __force *)(unsigned long)(u32)regs->sp);
    }

    // do other stuff...
    return true;
}


The backtrace shows the order of the function calls:

#0  copy_thread (clone_flags=clone_flags@entry=18874368, sp=0, arg=0, p=0xc31c0a00, tls=0)
    at arch/x86/kernel/process.c:133
#1  0xc1058722 in copy_process (pid=pid@entry=0x0, trace=trace@entry=0, node=node@entry=-1, 
    args=<optimized out>) at kernel/fork.c:2122
#2  0xc10593cc in kernel_clone (args=args@entry=0xc68e9f38) at kernel/fork.c:2500
#3  0xc1059807 in __do_sys_clone (child_tidptr=0xa0f38e8, tls=0, parent_tidptr=0x0, newsp=0, 
    clone_flags=<optimized out>) at kernel/fork.c:2617
#4  __se_sys_clone (child_tidptr=168769768, tls=0, parent_tidptr=0, newsp=0, 
    clone_flags=<optimized out>) at kernel/fork.c:2585
#5  __ia32_sys_clone (regs=<optimized out>) at kernel/fork.c:2585
#6  0xc1b04b85 in do_syscall_32_irqs_on (nr=<optimized out>, regs=0xc68e9fb4)
    at arch/x86/entry/common.c:77
#7  __do_fast_syscall_32 (regs=regs@entry=0xc68e9fb4) at arch/x86/entry/common.c:140
#8  0xc1b04c29 in do_fast_syscall_32 (regs=0xc68e9fb4) at arch/x86/entry/common.c:165
#9  0xc1b04c75 in do_SYSENTER_32 (regs=<optimized out>) at arch/x86/entry/common.c:208
#10 0xc1b0e32f in entry_SYSENTER_32 () at arch/x86/entry/entry_32.S:952
#11 0x01200011 in ?? ()
#12 0x00000000 in ?? ()


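My questions are:

  • Why do the ebp and esp values stored in pt_regs differ so much?
  • Why is the ebp value stored in pt_regs smaller than the esp value stored in pt_regs, given that the stack grows downward?

I used a debuggable linux-5.12.10, and the 'ls' command was built from busybox.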

x759pob2 #1

Consider how the legacy INT $0x80 system call mechanism and the modern fast system call mechanism for IA32 differ in their use of registers and the stack:

| Register | Legacy system call | Fast system call |
| -- | -- | -- |
| EAX | system call number | system call number |
| EBX | arg1 | arg1 |
| ECX | arg2 | arg2 |
| EDX | arg3 | arg3 |
| ESI | arg4 | arg4 |
| EDI | arg5 | arg5 |
| EBP | arg6 | user stack pointer |
| 0(%ebp) (user stack) | | arg6 |

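For the fast system call mechanism, when entry_SYSENTER_32 constructs the struct pt_regs entry on the kernel stack, ESP already refers to the kernel stack, so the sp member does not hold the user stack pointer, while the bp member points to the user stack. The fast system call path therefore fixes up the sp and bp members to be compatible with the legacy system call mechanism. The sp member is corrected in do_SYSENTER_32():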

/* SYSENTER loses RSP, but the vDSO saved it in RBP. */
    regs->sp = regs->bp;

The bp member is corrected in __do_fast_syscall_32(), which sets it to the arg6 value read from the user stack:

/* Fetch EBP from where the vDSO stashed it. */
    if (IS_ENABLED(CONFIG_X86_64)) {
        /*
         * Micro-optimization: the pointer we're following is
         * explicitly 32 bits, so it can't be out of range.
         */
        res = __get_user(*(u32 *)&regs->bp,
             (u32 __user __force *)(unsigned long)(u32)regs->sp);
    } else {
        res = get_user(*(u32 *)&regs->bp,
               (u32 __user __force *)(unsigned long)(u32)regs->sp);
    }


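By the time do_syscall_32_irqs_on() is called, either from do_int80_syscall_32() (legacy system call mechanism) or from __do_fast_syscall_32() (fast system call mechanism), the regs->bp and regs->sp values are as expected regardless of which mechanism was used.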
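Another fix-up for fast system calls is applied to regs->ip. The original value of the EIP register is lost by the sysenter instruction, which is normally executed from the __kernel_vsyscall() function in the vDSO. regs->ip is corrected in do_fast_syscall_32():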

/*
     * Called using the internal vDSO SYSENTER/SYSCALL32 calling
     * convention.  Adjust regs so it looks like we entered using int80.
     */
    unsigned long landing_pad = (unsigned long)current->mm->context.vdso +
                    vdso_image_32.sym_int80_landing_pad;

    /*
     * SYSENTER loses EIP, and even SYSCALL32 needs us to skip forward
     * so that 'regs->ip -= 2' lands back on an int $0x80 instruction.
     * Fix it up.
     */
    regs->ip = landing_pad;


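The vDSO contains an int $0x80 instruction immediately after the sysenter instruction. The landing_pad value is the address immediately after that int $0x80 instruction, so the instruction is not reached when returning from a fast system call.
The int $0x80 instruction is in the vDSO to support older CPUs that lack the sysenter and sysexit instructions. In that case, the mov %esp, %ebp; sysenter instruction sequence in __kernel_vsyscall() in the vDSO is replaced with nop instructions, and the CPU reaches the int $0x80 instruction that immediately follows the sequence, which effectively turns the fast system call into a legacy system call on old CPUs. That legacy system call returns to the point just after the int $0x80 instruction, exactly as the fast system call does.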

vs3odd8k #2

Eighteen days later I repeated the experiment and noticed something odd. I compiled a program on a 32-bit Ubuntu system; the simplified program is:

#include<unistd.h>

int main(int argc, char* argv[]) {
    fork();
    return 0;
}

I put the compiled a.out into rootfs.img.gz and started qemu:

qemu-system-i386 -m 256m -kernel ./bzImage -initrd ./rootfs.img.gz -append "root=/dev/ram init=/linuxrc nokaslr" -serial file:output.txt -s -S


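Then, in gdb, I set a breakpoint with break copy_thread.
In the Linux shell inside qemu I entered the command ./a.out.
Because copy_thread contains the statement *childregs = *current_pt_regs(), I can inspect the user stack information by printing childregs.
When the shell creates the a.out process, the Linux kernel stops at copy_thread. Entering p/x *childregs gives: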

(gdb) p/x *childregs
$10 = {bx = 0x1200011, cx = 0x0, dx = 0x0, si = 0x0, di = 0x9acb3e8, 
  bp = 0x8289000, ax = 0xffffffda, ds = 0x7b, __dsh = 0x0, es = 0x7b, 
  __esh = 0x0, fs = 0x0, __fsh = 0x0, gs = 0x33, __gsh = 0x0, orig_ax = 0x78, 
  ip = 0xb7f93549, cs = 0x73, __csh = 0x0, flags = 0x216, sp = 0xbfe797ec, 
  ss = 0x7b, __ssh = 0x0}


The shell's stack information is bp = 0x8289000, sp = 0xbfe797ec. The bp value is strange.
When a.out runs fork(), the Linux kernel stops at copy_thread again. Entering p/x *childregs gives:

(gdb) p/x *childregs
$11 = {bx = 0x1200011, cx = 0x0, dx = 0x0, si = 0x0, di = 0xb7eeb128, 
  bp = 0xbfa23818, ax = 0xffffffda, ds = 0x7b, __dsh = 0x0, es = 0x7b, 
  __esh = 0x0, fs = 0x0, __fsh = 0x0, gs = 0x33, __gsh = 0x0, orig_ax = 0x78, 
  ip = 0xb7ef0549, cs = 0x73, __csh = 0x0, flags = 0x246, sp = 0xbfa237d0, 
  ss = 0x7b, __ssh = 0x0}


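The a.out stack information is bp = 0xbfa23818, sp = 0xbfa237d0. The bp value is greater than the sp value and the two are close together, which is exactly what I expected.
But when the shell created the a.out process, bp was 0x8289000, and I do not know what happened at that point.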
