JSON: reading nested dictionaries without using .get

hxzsmxv2  posted on 2023-10-21  in Other

I have a file named sample (no file extension) that looks like this:

{
    "schema": "abc",
    "region": "asia",
    "values": {
        "before_values": {
            "id": 123,
            "created_at": "2023-07-28 19:21:39",
            "name": "alex"
        },
        "after_values": {
            "id": 123,
            "created_at": "2024-07-28 19:21:39",
            "name": null
        },
        "file_name": "my_file.1234"
    }
}{
    "schema": "abc",
    "region": "asia",
    "values": {
        "values": {
            "id": 456,
            "created_at": "2023-10-10 17:15:59",
            "name": null
        },
        "file_name": "my_file.1234"
    }
}

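Note that the file contains multiple dictionaries back to back, with no separator (not even a newline) between them. So I have to read the file as described here (this works perfectly!):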

import json

decoder = json.JSONDecoder()

with open('/path/to/sample', 'r') as content_file: 

    content = content_file.read()

    content_length = len(content)
    decode_index = 0
    raw_data_list = []

    while decode_index < content_length:
        try:
            data, decode_index = decoder.raw_decode(content, decode_index)
            # print("File index:", decode_index)

            print(type(data)) # returns dict

            print(data["schema"]) # works
            print(data["values"]) # works
            print(data["values"]["values"]) # KeyError: 'values'
            # WORKAROUND
            raw_data = data.get("values", {})
            # Append raw_data to the list
            raw_data_list.append(raw_data)
    

        except json.JSONDecodeError as e:
            print("JSONDecodeError:", e)
            # Scan forward and keep trying to decode
            decode_index += 1

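Apparently, the workaround for getting data["values"]["values"] is to add raw_data = data.get("values", {}) inside the try block, append the result to a list, and then call .get on each item again, e.g.: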

for raw_data in raw_data_list:

    raw_data = raw_data.get('values', {})

    print(raw_data)

There should be a better way to handle this, right? Retrieving values inside values, before_values, and after_values runs into the same KeyError problem, e.g. when accessing created_at.

xmakbtuz #1

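IIUC, you want to list a specific item "if available" (e.g. id, created_at, ...) from each dictionary, no matter how deeply it is nested. If so, and if you are open to using regex, you can not only fix your JSON but also build the expected list: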

import re, json
from itertools import chain

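# Parenthesized context managers need Python 3.10+
with (
    open("sample", "r") as inf,
    open("sample.json", "w") as ouf
):
    # Split the text into top-level objects: each lazy match ends at a "}"
    # that is followed by "{" or by the end of the input
    obj = json.loads("[%s]" % ",".join(
        re.findall(r"({.+?})(?={|\Z)", inf.read().strip(),
                   flags=re.M|re.S)))

    json.dump(obj, ouf) # this will make a new/valid JSON file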

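def fn(o, e):
    # Recursively yield every truthy value stored under key `e`, at any depth
    def walk(d, e):
        for k, v in d.items():
            if k == e and v:  # falsy values (None, 0, "") are skipped
                yield v
            elif isinstance(v, dict):
                yield from walk(v, e)
    # `o` is the list of decoded objects; flatten the per-object results
    return list(chain.from_iterable(walk(d, e) for d in o))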

Demo: [regex101]
Output:

fn(obj, "id")
# [123, 123, 456]

fn(obj, "created_at")
# ['2023-07-28 19:21:39', '2024-07-28 19:21:39', '2023-10-10 17:15:59']
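
If you would rather avoid the regex and still need values at a known path, here is a minimal sketch that keeps the raw_decode approach from the question and adds a small path lookup. The helper names iter_json_objects and dig are hypothetical (they are not part of the json module), and the sample string is a shortened stand-in for the real file:

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    idx, end = 0, len(text)
    while idx < end:
        # raw_decode does not skip leading whitespace, so skip it by hand
        while idx < end and text[idx].isspace():
            idx += 1
        if idx >= end:
            break
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

def dig(d, *keys, default=None):
    """Follow keys through nested dicts; return default if any step is missing."""
    for key in keys:
        if not isinstance(d, dict) or key not in d:
            return default
        d = d[key]
    return d

# Shortened stand-in for the contents of the sample file
sample = '{"values": {"before_values": {"id": 123}}}{"values": {"values": {"id": 456}}}\n'
records = list(iter_json_objects(sample))

print([dig(r, "values", "values", "id") for r in records])         # [None, 456]
print([dig(r, "values", "before_values", "id") for r in records])  # [123, None]
```

Unlike fn above, dig only looks at the exact path you pass in, so a missing key gives you the default instead of a KeyError, and a value that is present but null also comes back as None.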
