Exporting and loading nested Pydantic models as JSON

ego6inou, posted on 2023-08-08 in Other

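I have a simple pydantic model with a nested data structure. I would like to be able to simply save and load instances of this model as .json files.
All models inherit from a Base class with a simple configuration.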

import pydantic

class Base(pydantic.BaseModel):
    class Config:
        extra = 'forbid'   # forbid use of extra kwargs

There are a few simple data models with inheritance:

class Thing(Base):
    thing_id: int

class SubThing(Thing):
    name: str


and a Container class that holds a Thing:

class Container(Base):
    thing: Thing


I can create a Container instance and save it as .json:

# make instance of container
c = Container(
    thing = SubThing(
        thing_id=1,
        name='my_thing')
)

json_string = c.json(indent=2)
print(json_string)

"""
{
  "thing": {
    "thing_id": 1,
    "name": "my_thing"
  }
}
"""


However, the JSON string does not record that the thing field was constructed from a SubThing. So when I try to load this string into a new Container instance, I get an error.

c = Container.parse_raw(json_string)
"""
Traceback (most recent call last):
  File "...", line 36, in <module>
    c = Container.parse_raw(json_string)
  File "pydantic/main.py", line 601, in pydantic.main.BaseModel.parse_raw
  File "pydantic/main.py", line 578, in pydantic.main.BaseModel.parse_obj
  File "pydantic/main.py", line 406, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for Container
thing -> name
  extra fields not permitted (type=value_error.extra)
"""


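Is there a simple way to save a Container instance while preserving information about the class of thing, so that I can reliably reconstruct the initial Container instance? I would like to avoid pickling the objects if possible.
One possible solution is to serialize manually, for example with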

def serialize(attr_name, attr_value, dictionary=None):
    if dictionary is None:
        dictionary = {}
    if not isinstance(attr_value, pydantic.BaseModel):
        dictionary[attr_name] = attr_value
    else:
        sub_dictionary = {}
        for (sub_name, sub_value) in attr_value:
            serialize(sub_name, sub_value, dictionary=sub_dictionary)
        dictionary[attr_name] = {type(attr_value).__name__: sub_dictionary}
    return dictionary

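# Note: this example assumes Container also declares a `container_name: str`
# field; with extra = 'forbid', the Container defined earlier would reject it.
c1 = Container(
    container_name='my_container',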
    thing=SubThing(
        thing_id=1,
        name='my_thing')
)

from pprint import pprint as print
print(serialize('Container', c1))

{'Container': {'Container': {'container_name': 'my_container',
                             'thing': {'SubThing': {'name': 'my_thing',
                                                    'thing_id': 1}}}}}


But this loses most of the benefits of using the package for serialization.
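The manual approach above only covers the saving half; loading would need a way to map each type name back to a class. The following is a minimal sketch of such a round trip, not the asker's code: it uses standard-library dataclasses instead of pydantic so that it runs without extra dependencies, and the names `REGISTRY`, `register`, `dump`, and `load` are hypothetical helpers introduced only for illustration.

```python
import json
from dataclasses import dataclass, fields, is_dataclass

# Hypothetical registry: maps a type tag (the class name) back to the class.
REGISTRY = {}

def register(cls):
    """Class decorator that records the class under its own name."""
    REGISTRY[cls.__name__] = cls
    return cls

@register
@dataclass
class Thing:
    thing_id: int

@register
@dataclass
class SubThing(Thing):
    name: str

@register
@dataclass
class Container:
    thing: Thing

def dump(value):
    """Recursively turn dataclass instances into {class name: field dict}."""
    if is_dataclass(value) and not isinstance(value, type):
        body = {f.name: dump(getattr(value, f.name)) for f in fields(value)}
        return {type(value).__name__: body}
    return value

def load(data):
    """Inverse of dump: rebuild instances from single-key tagged dicts."""
    if isinstance(data, dict) and len(data) == 1:
        (tag, body), = data.items()
        if tag in REGISTRY and isinstance(body, dict):
            return REGISTRY[tag](**{k: load(v) for k, v in body.items()})
    return data

c1 = Container(thing=SubThing(thing_id=1, name="my_thing"))
json_string = json.dumps(dump(c1), indent=2)
c2 = load(json.loads(json_string))
```

This sketch handles only nested dataclasses and plain JSON values. Lists of models, and ordinary dict fields whose single key happens to match a registered class name, would need extra handling.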

k3fezbri  #1

Try this solution, which I was able to get working with pydantic. It is a bit ugly and somewhat hacky, but at least it works as expected.

import pydantic

class Base(pydantic.BaseModel):
    class Config:
        extra = 'forbid'   # forbid use of extra kwargs

class Thing(Base):
    thing_id: int

class SubThing(Thing):
    name: str

class Container(Base):
    thing: Thing

    def __init__(self, **kwargs):
        # This answer helped steer me towards this solution:
        #   https://stackoverflow.com/a/66582140/10237506
        if not isinstance(kwargs['thing'], SubThing):
            kwargs['thing'] = SubThing(**kwargs['thing'])
        super().__init__(**kwargs)

def main():
    # make instance of container
    c1 = Container(
        thing=SubThing(
            thing_id=1,
            name='my_thing')
    )

    d = c1.dict()
    print(d)
    # {'thing': {'thing_id': 1, 'name': 'my_thing'}}

    # Now it works!
    c2 = Container(**d)

    print(c2)
    # thing=SubThing(thing_id=1, name='my_thing')
    
    # assert that the values for the de-serialized instance is the same
    assert c1 == c2

if __name__ == '__main__':
    main()

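If you don't need some of the features pydantic provides, such as data validation, you can just as easily use plain dataclasses. You can pair them with a (de)serialization library such as dataclass-wizard, which provides automatic key-case conversion and type conversion (e.g. a string to an annotated int) and works much the same way as pydantic. Here is a very simple usage: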

from dataclasses import dataclass

from dataclass_wizard import asdict, fromdict

@dataclass
class Thing:
    thing_id: int

@dataclass
class SubThing(Thing):
    name: str

@dataclass
class Container:
    # Note: I had to update the annotation to `SubThing`. otherwise
    # when de-serializing, it creates a `Thing` instance which is not
    # what we want.
    thing: SubThing

def main():
    # make instance of container
    c1 = Container(
        thing=SubThing(
            thing_id=1,
            name='my_thing')
    )

    d = asdict(c1)
    print(d)
    # {'thing': {'thingId': 1, 'name': 'my_thing'}}

    # De-serialize a dict object in a new `Container` instance
    c2 = fromdict(Container, d)

    print(c2)
    # Container(thing=SubThing(thing_id=1, name='my_thing'))

    # assert that the values for the de-serialized instance is the same
    assert c1 == c2

if __name__ == '__main__':
    main()

tsm1rwdh  #2

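As of pydantic 2.0, pydantic no longer digs through all of your models by default: when a field is annotated with a base class, only the fields of that annotated model are output to dict, string, JSON, and so on, even if a subclass instance was passed.
They did this in order to
"[...] ensure that you know precisely which fields could be included when serializing, even if subclasses get passed when instantiating the object. In particular, this can help prevent surprises when adding sensitive information like secrets as fields of subclasses."
See the corresponding warning in the pydantic migration guide.
The suggested solution is to use duck typing for serialization: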

from pydantic import BaseModel, SerializeAsAny

class Thing(BaseModel):
    thing_id: int

class SubThing(Thing):
    name: str

class Container(BaseModel):
    thing: SerializeAsAny[Thing]

This seems to solve my problem: .model_dump() (previously .dict()) now works as expected.
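To make the difference concrete, here is a small comparison sketch. It assumes pydantic 2.x is installed, and `PlainContainer` and `DuckContainer` are hypothetical names used only to put the two annotations side by side.

```python
from pydantic import BaseModel, SerializeAsAny

class Thing(BaseModel):
    thing_id: int

class SubThing(Thing):
    name: str

class PlainContainer(BaseModel):
    # Serialized according to the annotated type, Thing.
    thing: Thing

class DuckContainer(BaseModel):
    # Serialized according to the runtime type of the value.
    thing: SerializeAsAny[Thing]

sub = SubThing(thing_id=1, name="my_thing")

plain_dump = PlainContainer(thing=sub).model_dump()
duck_dump = DuckContainer(thing=sub).model_dump()

# Validating the dumped data again goes through the annotated type.
reloaded = DuckContainer.model_validate(duck_dump)

print(plain_dump)
print(duck_dump)
print(type(reloaded.thing).__name__)
```

Under these assumptions, `plain_dump` should omit the `name` key while `duck_dump` keeps it. Note that SerializeAsAny only affects serialization: with the default configuration, validating the dumped data again should build a plain Thing and ignore the extra `name` key, so on its own it does not restore the subclass when loading.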
