Rust: How do I create a long-lived reference into a collection that I know will no longer be modified?

q7solyqu  posted on 2023-01-02 in Other

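I am implementing a bytecode VM and am struggling with references to data stored in the parsed representation of the bytecode. As is the nature of (most) bytecode, both it and its parsed representation remain unchanged after initialization. A separate Vm contains the mutable parts (stack etc.) along with that module. I made an MCVE with additional explanatory comments to illustrate the problem; it is at the bottom of this question and on the playground. The parsed bytecode may look like this: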

Module { struct_types: {"Bar": StructType::Named(["a", "b"])} }

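The strings "Bar", "a" and "b" are references into the bytecode and have the lifetime 'b, so the types Module<'b> and StructType<'b> carry that lifetime as well.
Once this has been created, I will create struct instances, think let bar = Bar { a: (), b: () };. At least for now, each struct instance needs to hold a reference to its type, so that type may look like this: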

pub struct Struct<'b> {
    struct_type: &'b bytecode::StructType<'b>,
    fields: Vec<Value<'b>>,
}

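The values of a struct's fields may be constants whose values are stored in the bytecode, so the Value enum has a lifetime 'b as well, and that works. The problem is the &'b bytecode::StructType<'b> in the first field: how do I get a reference that lives long enough? I think the reference would actually be valid for long enough.
I suspect that the critical part of the code is the following: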

pub fn struct_type(&self, _name: &str) -> Option<&'b StructType<'b>> {
    // self.struct_types.get(name)
    todo!("fix lifetime problems")
}

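With the commented-out code I can't get a 'b reference, because the borrow of self.struct_types is too short-lived. To fix that I would need &'b self, which would spread virally through the code. On top of that, most of the time I need to borrow the Vm mutably, and that does not work if all those exclusive self references have to live that long.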
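To make the concern about long-lived exclusive borrows concrete, here is a small sketch. The Registry type and its methods are made up for illustration and are not part of the MCVE. A method taking &'b mut self on a type that is itself parameterized by 'b keeps the value exclusively borrowed for as long as 'b lasts:

```rust
#[allow(dead_code)]
struct Registry<'b> {
    names: Vec<&'b str>,
}

#[allow(dead_code)]
impl<'b> Registry<'b> {
    // Hypothetical "long-lived" variant: the exclusive borrow of `self`
    // has to last for all of 'b.
    fn add_long(&'b mut self, name: &'b str) {
        self.names.push(name);
    }

    // Ordinary variant: the borrow of `self` ends when the call returns.
    fn add(&mut self, name: &'b str) {
        self.names.push(name);
    }
}

fn count_after_two_adds(bytecode: &str) -> usize {
    let mut registry = Registry { names: Vec::new() };
    registry.add(&bytecode[7..10]);
    registry.add(&bytecode[13..14]);
    // Swapping either call above for `add_long` should be rejected by the
    // borrow checker: `registry` would stay mutably borrowed for all of 'b,
    // so even the read of `names` below would no longer be allowed.
    registry.names.len()
}

fn main() {
    println!("{}", count_after_two_adds("struct Bar { a, b }"));
}
```

The same reasoning applies to a Vm whose methods would all need &'b mut self: after the first call, nothing else could use the Vm.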
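Introducing a separate lifetime 'm so that I can return a &'m StructType<'b> sounds like something I could also try, but it sounds just as viral and additionally introduces a separate lifetime that I need to keep track of; being able to replace 'b with 'm (or at least having only one of them in each place) would be a bit nicer.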
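Finally, this feels like something pinning might help with, but I don't understand that topic well enough to make an educated guess about how to approach it.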
MCVE

#![allow(dead_code)]

mod bytecode {
    use std::collections::BTreeMap;
    
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum StructType<'b> {
        /// unit struct type; doesn't have fields
        Empty,
        /// tuple struct type; fields are positional
        Positional(usize),
        /// "normal" struct type; fields are named
        Named(Vec<&'b str>),
    }
        
    impl<'b> StructType<'b> {
        pub fn field_count(&self) -> usize {
            match self {
                Self::Empty => 0,
                Self::Positional(field_count) => *field_count,
                Self::Named(fields) => fields.len(),
            }
        }
    }
    
    #[derive(Debug, Clone)]
    pub struct Module<'b> {
        struct_types: BTreeMap<&'b str, StructType<'b>>,
    }

    impl<'b> Module<'b> {
        // here is the problem: I would like to return a reference with lifetime 'b.
        // from the point I start executing instructions, I know that I won't modify
        // the module (particularly, I won't add entries to the map), so I think that
        // lifetime should be possible - pinning? `&'b self` everywhere? idk
        pub fn struct_type(&self, _name: &str) -> Option<&'b StructType<'b>> {
            // self.struct_types.get(name)
            todo!("fix lifetime problems")
        }
    }
    
    pub fn parse<'b>(bytecode: &'b str) -> Module<'b> {
        // this would use nom to parse actual bytecode
        assert_eq!(bytecode, "struct Bar { a, b }");

        let bar = &bytecode[7..10];
        let a = &bytecode[13..14];
        let b = &bytecode[16..17];

        let fields = vec![a, b];
        let bar_struct = StructType::Named(fields);
        let struct_types = BTreeMap::from_iter([
            (bar, bar_struct),
        ]);
        Module { struct_types }
    }
}

mod vm {
    use crate::bytecode::{self, StructType};

    #[derive(Debug, Clone)]
    pub enum Value<'b> {
        Unit,
        Struct(Struct<'b>),
    }
    
    #[derive(Debug, Clone)]
    pub struct Struct<'b> {
        struct_type: &'b bytecode::StructType<'b>,
        fields: Vec<Value<'b>>,
    }

    impl<'b> Struct<'b> {
        pub fn new(struct_type: &'b bytecode::StructType<'b>, fields: Vec<Value<'b>>) -> Self {
            Struct { struct_type, fields }
        }
    }

    #[derive(Debug, Clone)]
    pub struct Vm<'b> {
        module: bytecode::Module<'b>,
    }

    impl<'b> Vm<'b> {
        pub fn new(module: bytecode::Module<'b>) -> Self {
            Self { module }
        }

        pub fn create_struct(&mut self, type_name: &'b str) -> Value<'b> {
            let struct_type: &'b StructType<'b> = self.module.struct_type(type_name).unwrap();
            // just initialize the fields to something, we don't care
            let fields = vec![Value::Unit; struct_type.field_count()];

            let value = Value::Struct(Struct::new(struct_type, fields));
            value
        }
    }
}

pub fn main() {
    // the bytecode contains all constants needed at runtime;
    // we're just interested in how struct types are handled
    // obviously the real bytecode is not as human-readable
    let bytecode = "struct Bar { a, b }";
    // we parse that into a module that, among other things,
    // has a map of all struct types
    let module = bytecode::parse(bytecode);
    println!("{:?}", module);

    // we create a Vm that is capable of running commands
    // that are stored in the module
    let mut vm = vm::Vm::new(module);

    // now we try to execute an instruction to create a struct value
    // the instruction for this contains a reference to the type name
    // stored in the bytecode.
    // the struct value contains a reference to its type and holds its field values.
    let value = {
        let bar = &bytecode[7..10];
        vm.create_struct(bar)
    };
    println!("{:?}", value);
}

Answer #1 (gr8qqesn)

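&'b bytecode::StructType<'b> is a classic anti-pattern in Rust, and it strongly hints at incorrectly annotated lifetimes. It usually makes no sense for an object to depend on some lifetime and, at the same time, be borrowed for exactly that lifetime. It is very rare for this to happen on purpose.
So my guess is that you need two lifetimes, which I will call 'm and 'b: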

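  • 'b: the lifetime of the bytecode string. Everything that references it will use &'b str.
  • 'm: the lifetime of the Module object. Everything that references it, or a StructType it contains, will use this lifetime.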
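As a rough illustration of why these are two different lifetimes, the sketch below uses cut-down stand-in types (a StructType with just a field list, and a helper name invented for this example). It lets a &'b str escape a scope in which the type description, and the struct borrowing it for 'm, have already been dropped:

```rust
#[derive(Debug)]
struct StructType<'b> {
    fields: Vec<&'b str>,
}

#[derive(Debug)]
struct Struct<'b, 'm> {
    // borrowed from the parsed module for 'm; the names inside live for 'b
    struct_type: &'m StructType<'b>,
}

impl<'b, 'm> Struct<'b, 'm> {
    // The result is tied to the bytecode ('b), not to the module borrow ('m).
    fn first_field(&self) -> &'b str {
        self.struct_type.fields[0]
    }
}

fn first_field_after_type_dropped(bytecode: &str) -> &str {
    let first;
    {
        let ty = StructType {
            fields: vec![&bytecode[13..14], &bytecode[16..17]],
        };
        let s = Struct { struct_type: &ty };
        first = s.first_field();
    } // `s` and `ty` are gone here, so 'm has ended
    first // still valid: it only borrows from `bytecode`
}

fn main() {
    println!("{}", first_field_after_type_dropped("struct Bar { a, b }"));
}
```

With a single shared lifetime on Struct, first_field could only promise a reference that lives as long as the borrow of the type description.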

If you split this into two lifetimes and adjust everything accordingly, it simply works:

#![allow(dead_code)]

mod bytecode {
    use std::{collections::BTreeMap, iter::FromIterator};

    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum StructType<'b> {
        /// unit struct type; doesn't have fields
        Empty,
        /// tuple struct type; fields are positional
        Positional(usize),
        /// "normal" struct type; fields are named
        Named(Vec<&'b str>),
    }

    impl<'b> StructType<'b> {
        pub fn field_count(&self) -> usize {
            match self {
                Self::Empty => 0,
                Self::Positional(field_count) => *field_count,
                Self::Named(fields) => fields.len(),
            }
        }
    }

    #[derive(Debug, Clone)]
    pub struct Module<'b> {
        struct_types: BTreeMap<&'b str, StructType<'b>>,
    }

    impl<'b> Module<'b> {
        // here is the problem: I would like to return a reference with lifetime 'b.
        // from the point I start executing instructions, I know that I won't modify
        // the module (particularly, I won't add entries to the map), so I think that
        // lifetime should be possible - pinning? `&'b self` everywhere? idk
        pub fn struct_type(&self, name: &str) -> Option<&StructType<'b>> {
            self.struct_types.get(name)
        }
    }

    pub fn parse<'b>(bytecode: &'b str) -> Module<'b> {
        // this would use nom to parse actual bytecode
        assert_eq!(bytecode, "struct Bar { a, b }");

        let bar = &bytecode[7..10];
        let a = &bytecode[13..14];
        let b = &bytecode[16..17];

        let fields = vec![a, b];
        let bar_struct = StructType::Named(fields);
        let struct_types = BTreeMap::from_iter([(bar, bar_struct)]);
        Module { struct_types }
    }
}

mod vm {
    use crate::bytecode::{self, StructType};

    #[derive(Debug, Clone)]
    pub enum Value<'b, 'm> {
        Unit,
        Struct(Struct<'b, 'm>),
    }

    #[derive(Debug, Clone)]
    pub struct Struct<'b, 'm> {
        struct_type: &'m bytecode::StructType<'b>,
        fields: Vec<Value<'b, 'm>>,
    }

    impl<'b, 'm> Struct<'b, 'm> {
        pub fn new(struct_type: &'m bytecode::StructType<'b>, fields: Vec<Value<'b, 'm>>) -> Self {
            Struct {
                struct_type,
                fields,
            }
        }
    }

    #[derive(Debug, Clone)]
    pub struct Vm<'b> {
        module: bytecode::Module<'b>,
    }

    impl<'b> Vm<'b> {
        pub fn new(module: bytecode::Module<'b>) -> Self {
            Self { module }
        }

        pub fn create_struct(&mut self, type_name: &str) -> Value<'b, '_> {
            let struct_type: &StructType<'b> = self.module.struct_type(type_name).unwrap();
            // just initialize the fields to something, we don't care
            let fields = vec![Value::Unit; struct_type.field_count()];

            let value = Value::Struct(Struct::new(struct_type, fields));
            value
        }
    }
}

pub fn main() {
    // the bytecode contains all constants needed at runtime;
    // we're just interested in how struct types are handled
    // obviously the real bytecode is not as human-readable
    let bytecode = "struct Bar { a, b }";
    // we parse that into a module that, among other things,
    // has a map of all struct types
    let module = bytecode::parse(bytecode);
    println!("{:?}", module);

    // we create a Vm that is capable of running commands
    // that are stored in the module
    let mut vm = vm::Vm::new(module);

    // now we try to execute an instruction to create a struct value
    // the instruction for this contains a reference to the type name
    // stored in the bytecode.
    // the struct value contains a reference to its type and holds its field values.
    let value = {
        let bar = &bytecode[7..10];
        vm.create_struct(bar)
    };
    println!("{:?}", value);
}
Module { struct_types: {"Bar": Named(["a", "b"])} }
Struct(Struct { struct_type: Named(["a", "b"]), fields: [Unit, Unit] })

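However, since 'm is tied to 'b, anything that depends on 'm automatically also has access to 'b objects, because 'b is guaranteed to outlive 'm. That means this can be simplified further.
So let's introduce 'a, which we will now use inside the vm mod for everything that references something from the bytecode mod. This also allows lifetime elision in a few places, which simplifies the code even more.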
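A condensed sketch of what the single-lifetime version with 'a in the vm mod might look like follows. It is a reconstruction based on the description here and on the listings in this answer, not the original listing: StructType is trimmed to the Named variant, parse is hard-coded as in the MCVE, and create_bar_debug is a helper added only for this example.

```rust
mod bytecode {
    use std::collections::BTreeMap;

    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum StructType<'b> {
        /// trimmed to the one variant this sketch needs
        Named(Vec<&'b str>),
    }

    impl<'b> StructType<'b> {
        pub fn field_count(&self) -> usize {
            match self {
                Self::Named(fields) => fields.len(),
            }
        }
    }

    #[derive(Debug, Clone)]
    pub struct Module<'b> {
        struct_types: BTreeMap<&'b str, StructType<'b>>,
    }

    impl<'b> Module<'b> {
        // the returned borrow is tied to `&self`, not to 'b
        pub fn struct_type(&self, name: &str) -> Option<&StructType<'b>> {
            self.struct_types.get(name)
        }
    }

    // hard-coded stand-in for a real parser, as in the MCVE
    pub fn parse<'b>(bytecode: &'b str) -> Module<'b> {
        let mut struct_types = BTreeMap::new();
        let fields = vec![&bytecode[13..14], &bytecode[16..17]];
        struct_types.insert(&bytecode[7..10], StructType::Named(fields));
        Module { struct_types }
    }
}

mod vm {
    use super::bytecode::{Module, StructType};

    // 'a stands for "borrowed from the module"; no separate 'b is needed here
    #[derive(Debug, Clone)]
    pub enum Value<'a> {
        Unit,
        Struct(Struct<'a>),
    }

    #[allow(dead_code)]
    #[derive(Debug, Clone)]
    pub struct Struct<'a> {
        struct_type: &'a StructType<'a>,
        fields: Vec<Value<'a>>,
    }

    impl<'a> Struct<'a> {
        pub fn new(struct_type: &'a StructType<'a>, fields: Vec<Value<'a>>) -> Self {
            Struct { struct_type, fields }
        }
    }

    #[derive(Debug, Clone)]
    pub struct Vm<'b> {
        module: Module<'b>,
    }

    impl<'b> Vm<'b> {
        pub fn new(module: Module<'b>) -> Self {
            Self { module }
        }

        // the elided lifetime of the result is the borrow of `self`
        pub fn create_struct(&mut self, type_name: &str) -> Value<'_> {
            let struct_type = self.module.struct_type(type_name).unwrap();
            let fields = vec![Value::Unit; struct_type.field_count()];
            Value::Struct(Struct::new(struct_type, fields))
        }
    }
}

fn create_bar_debug(bytecode: &str) -> String {
    let mut vm = vm::Vm::new(bytecode::parse(bytecode));
    let value = vm.create_struct("Bar");
    format!("{:?}", value)
}

fn main() {
    println!("{}", create_bar_debug("struct Bar { a, b }"));
}
```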

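    • Fun fact: this is now one of the rare cases where we legitimately have to use &'a bytecode::StructType<'a>, so please take my opening statement with a grain of salt; you were right all along :)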
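One way to see why the shared &'a StructType<'a> is harmless in this case is the small sketch below. The function names are made up for illustration and StructType is cut down to a field list. Because StructType<'b> only stores &'b str values, it is covariant in 'b, so a reference whose inner lifetime is longer is accepted where the shorter one is expected:

```rust
#[derive(Debug)]
struct StructType<'b> {
    fields: Vec<&'b str>,
}

// `'b: 'm` reads "'b outlives 'm". The body is just `r`: no conversion is
// needed, the longer inner lifetime is accepted in place of the shorter one.
fn shorten<'m, 'b: 'm>(r: &'m StructType<'b>) -> &'m StructType<'m> {
    r
}

fn field_count_via_short_ref(bytecode: &str) -> usize {
    let ty = StructType {
        fields: vec![&bytecode[13..14], &bytecode[16..17]],
    };
    let short: &StructType<'_> = shorten(&ty);
    short.fields.len()
}

fn main() {
    println!("{}", field_count_via_short_ref("struct Bar { a, b }"));
}
```

The same signature with &'m mut instead of & would not compile, since a type behind an exclusive reference is invariant; that is the situation in which the pattern really hurts.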

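The crazy thing is that if we now rename 'a to 'b, to stay consistent with your original code, we end up with almost exactly your code, with only a few minor differences: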

#![allow(dead_code)]

mod bytecode {
    use std::{collections::BTreeMap, iter::FromIterator};

    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum StructType<'b> {
        /// unit struct type; doesn't have fields
        Empty,
        /// tuple struct type; fields are positional
        Positional(usize),
        /// "normal" struct type; fields are named
        Named(Vec<&'b str>),
    }

    impl<'b> StructType<'b> {
        pub fn field_count(&self) -> usize {
            match self {
                Self::Empty => 0,
                Self::Positional(field_count) => *field_count,
                Self::Named(fields) => fields.len(),
            }
        }
    }

    #[derive(Debug, Clone)]
    pub struct Module<'b> {
        struct_types: BTreeMap<&'b str, StructType<'b>>,
    }

    impl<'b> Module<'b> {
        // here is the problem: I would like to return a reference with lifetime 'b.
        // from the point I start executing instructions, I know that I won't modify
        // the module (particularly, I won't add entries to the map), so I think that
        // lifetime should be possible - pinning? `&'b self` everywhere? idk
        pub fn struct_type(&self, name: &str) -> Option<&StructType<'b>> {
            self.struct_types.get(name)
        }
    }

    pub fn parse<'b>(bytecode: &'b str) -> Module<'b> {
        // this would use nom to parse actual bytecode
        assert_eq!(bytecode, "struct Bar { a, b }");

        let bar = &bytecode[7..10];
        let a = &bytecode[13..14];
        let b = &bytecode[16..17];

        let fields = vec![a, b];
        let bar_struct = StructType::Named(fields);
        let struct_types = BTreeMap::from_iter([(bar, bar_struct)]);
        Module { struct_types }
    }
}

mod vm {
    use crate::bytecode::{self, StructType};

    #[derive(Debug, Clone)]
    pub enum Value<'b> {
        Unit,
        Struct(Struct<'b>),
    }

    #[derive(Debug, Clone)]
    pub struct Struct<'b> {
        struct_type: &'b bytecode::StructType<'b>,
        fields: Vec<Value<'b>>,
    }

    impl<'b> Struct<'b> {
        pub fn new(struct_type: &'b bytecode::StructType, fields: Vec<Value<'b>>) -> Self {
            Struct {
                struct_type,
                fields,
            }
        }
    }

    #[derive(Debug, Clone)]
    pub struct Vm<'b> {
        module: bytecode::Module<'b>,
    }

    impl<'b> Vm<'b> {
        pub fn new(module: bytecode::Module<'b>) -> Self {
            Self { module }
        }

        pub fn create_struct(&mut self, type_name: &str) -> Value {
            let struct_type: &StructType = self.module.struct_type(type_name).unwrap();
            // just initialize the fields to something, we don't care
            let fields = vec![Value::Unit; struct_type.field_count()];

            let value = Value::Struct(Struct::new(struct_type, fields));
            value
        }
    }
}

pub fn main() {
    // the bytecode contains all constants needed at runtime;
    // we're just interested in how struct types are handled
    // obviously the real bytecode is not as human-readable
    let bytecode = "struct Bar { a, b }";
    // we parse that into a module that, among other things,
    // has a map of all struct types
    let module = bytecode::parse(bytecode);
    println!("{:?}", module);

    // we create a Vm that is capable of running commands
    // that are stored in the module
    let mut vm = vm::Vm::new(module);

    // now we try to execute an instruction to create a struct value
    // the instruction for this contains a reference to the type name
    // stored in the bytecode.
    // the struct value contains a reference to its type and holds its field values.
    let value = {
        let bar = &bytecode[7..10];
        vm.create_struct(bar)
    };
    println!("{:?}", value);
}
Module { struct_types: {"Bar": Named(["a", "b"])} }
Struct(Struct { struct_type: Named(["a", "b"]), fields: [Unit, Unit] })

So the actual fix for your original code looks like this:

4c4
<     use std::collections::BTreeMap;
---
>     use std::{collections::BTreeMap, iter::FromIterator};
36,38c36,37
<         pub fn struct_type(&self, _name: &str) -> Option<&'b StructType<'b>> {
<             // self.struct_types.get(name)
<             todo!("fix lifetime problems")
---
>         pub fn struct_type(&self, name: &str) -> Option<&StructType<'b>> {
>             self.struct_types.get(name)
73c72
<         pub fn new(struct_type: &'b bytecode::StructType<'b>, fields: Vec<Value<'b>>) -> Self {
---
>         pub fn new(struct_type: &'b bytecode::StructType, fields: Vec<Value<'b>>) -> Self {
91,92c90,91
<         pub fn create_struct(&mut self, type_name: &'b str) -> Value<'b> {
<             let struct_type: &'b StructType<'b> = self.module.struct_type(type_name).unwrap();
---
>         pub fn create_struct(&mut self, type_name: &str) -> Value {
>             let struct_type: &StructType = self.module.struct_type(type_name).unwrap();

I hope that arriving at this conclusion step by step sheds some light on why these changes are needed.
