Processing a Python dictionary based on the previous, current, and next values

q3qa4bjr  posted on 2023-10-21 in Python

I have a Python dictionary like this:

ip_dict = {'GLArch': {'GLArch-0.png':  ['OTHER', 'Figure 28 TAC '],
                      'GLArch-1.png':  ['DCDFP', 'This insurance '],
                      'GLArch-2.png':  ['DCDNP', 'Item 3'],
                      'GLArch-3.png':  ['OTHER', 'SCHEDULE OF'],
                      'GLArch-4.png':  ['OTHER', 'SCHEDULEed OF3'],
                      'GLArch-5.png':  ['DCCFP', 'COMMERCIAL GENERAL'],
                      'GLArch-6.png':  ['OTHER', 'a The'],
                      'GLArch-7.png':  ['OTHER', 'c Such attorney'],
                      'GLArch-8.png':  ['DCCNP', '2 To any'],
                      'GLArch-9.png':  ['OTHER', 'e as part'],
                      'GLArch-10.png': ['OTHER', '1 A watercraft'],
                      'GLArch-11.png': ['OTHER', '5 That particular'],
                      'GLArch-12.png': ['DCCNP', 'Damages claimed'],
                      'GLArch-13.png': ['OTHER', 'resulting from the'],
                      'GLArch-14.png': ['DCCNP', 'processing or packaging'],
                      'GLArch-15.png': ['DCCNP', 's. Fungi'],
                      'GLArch-16.png': ['OTHER', '1 the actual'],
                      'GLArch-17.png': ['OTHER', '5 6 9 10 11']}}

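If one or more OTHER entries appear between DCCFP and DCCNP, or between one DCCNP and another DCCNP, they should be renamed to DCCNP. So for GLArch-6.png and GLArch-7.png, the OTHER label is renamed to DCCNP because both entries fall between DCCFP and DCCNP. Likewise, OTHER in GLArch-9.png, GLArch-10.png, and GLArch-11.png is renamed to DCCNP because those entries sit between DCCNP and another DCCNP. The same applies to GLArch-13.png. The output dictionary should therefore look like this: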

op_dict = {'GLArch': {'GLArch-0.png':  ['OTHER', 'Figure 28 TAC '],
                      'GLArch-1.png':  ['DCDFP', 'This insurance '],
                      'GLArch-2.png':  ['DCDNP', 'Item 3'],
                      'GLArch-3.png':  ['OTHER', 'SCHEDULE OF'],
                      'GLArch-4.png':  ['OTHER', 'SCHEDULEed OF3'],
                      'GLArch-5.png':  ['DCCFP', 'COMMERCIAL GENERAL'],
                      'GLArch-6.png':  ['DCCNP', 'a The'],
                      'GLArch-7.png':  ['DCCNP', 'c Such attorney'],
                      'GLArch-8.png':  ['DCCNP', '2 To any'],
                      'GLArch-9.png':  ['DCCNP', 'e as part'],
                      'GLArch-10.png': ['DCCNP', '1 A watercraft'],
                      'GLArch-11.png': ['DCCNP', '5 That particular'],
                      'GLArch-12.png': ['DCCNP', 'Damages claimed'],
                      'GLArch-13.png': ['DCCNP', 'resulting from the'],
                      'GLArch-14.png': ['DCCNP', 'processing or packaging'],
                      'GLArch-15.png': ['DCCNP', 's. Fungi'],
                      'GLArch-16.png': ['OTHER', '1 the actual'],
                      'GLArch-17.png': ['OTHER', '5 6 9 10 11']}}

I tried the script below, but it does not work:

def process_dict(ip_dict):
    op_dict = {}
    for key, value in ip_dict.items():
        op_dict[key] = {}
        prev_val = None

        for k, v in value.items():
            if prev_val is not None and ("DCCFP" in prev_val and "DCCNP" in v[0]):
                op_dict[key][k] = ["DCCFP", v[1]]
            else:
                op_dict[key][k] = v

            prev_val = v[0]

    return op_dict

mfpqipee  #1

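You can use a temporary list to hold the OTHER items while waiting to find out how to handle them. A function is used here so the logic generalizes over the nested dictionaries.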

def process(d):
    start_set = {'DCCFP', 'DCCNP'}
    end_set = {'DCCNP'}

    prev = None                    # label of the most recent non-OTHER item
    out = {}
    tmp = []
    for k, l in d.items():
        if l[0] == 'OTHER':        # if OTHER, collect and decide later
            tmp.append((k, l[1:]))
            continue
        # rename the collected run only if it is enclosed by START ... END
        name = 'DCCNP' if prev in start_set and l[0] in end_set else 'OTHER'
        for k2, v in tmp:          # append collected values
            out[k2] = [name, *v]
        tmp = []                   # reset tmp list
        out[k] = l[:]              # add current non-OTHER (shallow copy)
        prev = l[0]

    for k, v in tmp:               # trailing OTHER items have no END, keep them
        out[k] = ['OTHER', *v]
    return out

final_out = {k: process(d) for k,d in ip_dict.items()}

Output:

{'GLArch': {'GLArch-0.png': ['OTHER', 'Figure 28 TAC '],
            'GLArch-1.png': ['DCDFP', 'This insurance '],
            'GLArch-2.png': ['DCDNP', 'Item 3'],
            'GLArch-3.png': ['OTHER', 'SCHEDULE OF'],
            'GLArch-4.png': ['OTHER', 'SCHEDULEed OF3'],
            'GLArch-5.png': ['DCCFP', 'COMMERCIAL GENERAL'],
            'GLArch-6.png': ['DCCNP', 'a The'],
            'GLArch-7.png': ['DCCNP', 'c Such attorney'],
            'GLArch-8.png': ['DCCNP', '2 To any'],
            'GLArch-9.png': ['DCCNP', 'e as part'],
            'GLArch-10.png': ['DCCNP', '1 A watercraft'],
            'GLArch-11.png': ['DCCNP', '5 That particular'],
            'GLArch-12.png': ['DCCNP', 'Damages claimed'],
            'GLArch-13.png': ['DCCNP', 'resulting from the'],
            'GLArch-14.png': ['DCCNP', 'processing or packaging'],
            'GLArch-15.png': ['DCCNP', 's. Fungi'],
            'GLArch-16.png': ['OTHER', '1 the actual'],
            'GLArch-17.png': ['OTHER', '5 6 9 10 11']}}
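
The "previous and next value" idea from the question title can also be sketched directly on a flat list of labels: for every position, record the nearest non-OTHER label on each side, then rename an OTHER only when the left neighbour is a start label and the right neighbour is an end label. The function name `relabel` and its parameters are made up for this illustration and are not part of the answer above.

```python
def relabel(labels, start=frozenset({"DCCFP", "DCCNP"}),
            end=frozenset({"DCCNP"}), filler="OTHER", new="DCCNP"):
    """Return a new list where enclosed runs of `filler` are renamed to `new`."""
    n = len(labels)

    # nearest non-filler label strictly before each position
    prev_lab = [None] * n
    last = None
    for i, lab in enumerate(labels):
        prev_lab[i] = last
        if lab != filler:
            last = lab

    # nearest non-filler label strictly after each position
    next_lab = [None] * n
    nxt = None
    for i in range(n - 1, -1, -1):
        next_lab[i] = nxt
        if labels[i] != filler:
            nxt = labels[i]

    return [
        new if lab == filler and prev_lab[i] in start and next_lab[i] in end else lab
        for i, lab in enumerate(labels)
    ]


labels = ["OTHER", "DCDNP", "OTHER", "DCCFP", "OTHER", "OTHER", "DCCNP", "OTHER"]
relabelled = relabel(labels)
```

To apply it to the nested dictionary, one could pull the first element of each inner list into `labels`, call `relabel`, and write the results back in the same key order, since dictionaries preserve insertion order in Python 3.7+.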

wnrlj8wa  #2

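My answer uses the same basic technique as @mozway's, but it modifies the dict in place, which makes it simpler. If you need to keep ip_dict intact, create a deep copy first (with copy.deepcopy) and process the copy.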

START_TOKENS = {"DCCFP", "DCCNP"}
END_TOKENS = {"DCCNP"}

def process_dict_inplace(ip_dict):
    for inner_dict in ip_dict.values():
        others_to_change = None

        for name, (kind, _) in inner_dict.items():
            # others_to_change is None unless the previous non-OTHER label was a START token
            if others_to_change is not None:
                if kind == "OTHER":
                    others_to_change.append(name)
                    continue

                if kind in END_TOKENS:
                    for other in others_to_change:
                        inner_dict[other][0] = "DCCNP"

            # a START token opens a new run; any other label cancels the pending run
            others_to_change = [] if kind in START_TOKENS else None
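
The deep-copy suggestion can be sketched as follows. The function is repeated so the snippet runs on its own, with one assumption written into it: a label outside START_TOKENS cancels any pending run of OTHER items. The `sample` dictionary is a small made-up input, not the data from the question.

```python
import copy

START_TOKENS = {"DCCFP", "DCCNP"}
END_TOKENS = {"DCCNP"}


def process_dict_inplace(ip_dict):
    for inner_dict in ip_dict.values():
        others_to_change = None  # None means no run is currently open

        for name, (kind, _) in inner_dict.items():
            if others_to_change is not None:
                if kind == "OTHER":
                    others_to_change.append(name)
                    continue
                if kind in END_TOKENS:
                    for other in others_to_change:
                        inner_dict[other][0] = "DCCNP"

            # assumption: only a START token (re)opens a run
            others_to_change = [] if kind in START_TOKENS else None


sample = {"doc": {"p0": ["DCCFP", "a"],
                  "p1": ["OTHER", "b"],
                  "p2": ["DCCNP", "c"],
                  "p3": ["OTHER", "d"]}}

result = copy.deepcopy(sample)   # inner lists are copied too
process_dict_inplace(result)     # only the copy is modified
```

A shallow copy such as `dict(sample)` would not be enough here, because the inner lists would still be shared and the original would be changed as well.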
