I have the following dictionary:
ip_dict = {
"doc_1" : {
"img_1" : ("FP","some long text"),
"img_2" : ("LP", "another long text"),
"img_3" : ("Others", "long text"),
"img_4" : ("Others", "some loong text"),
"img_5" : ("FP", "one more text"),
"img_6" : ("FP", "another one"),
"img_7" : ("LP", "ANOTHER ONE"),
"img_8" : ("Others", "some text"),
"img_9" : ("Others", "some moretext"),
"img_10" : ("FP", "more text"),
"img_11" : ("Others", "whatever"),
"img_12" : ("Others", "more whatever"),
"img_13" : ("LP", "SoMe TeXt"),
"img_14" : ("Others", "some moretext"),
"img_15" : ("FP", "whatever"),
"img_16" : ("Others", "whatever"),
"img_17" : ("LP", "whateverrr")
},
"doc_2" : {
"img_1" : ("FP", "text"),
"img_2" : ("FP", "more text"),
"img_3" : ("LP", "more more text"),
"img_4" : ("Others", "some more"),
"img_5" : ("Others", "text text"),
"img_6" : ("FP", "more more text"),
"img_7" : ("Others", "lot of text"),
"img_8" : ("LP", "still more text")
}
}
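Here, FP denotes the first page and LP denotes the last page. For all docs, I only want to extract the FP and LP entries. Others entries should be extracted only when they lie between an FP and an LP, since they represent the pages in between. If they fall outside an FP ... LP range, they should be ignored. Also, an FP that is not followed by an LP should be treated as a single page and extracted on its own. So my output dictionary would look like this:
op_dict = {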
"doc_1" : [
{
"img_1" : ("FP","some long text"),
"img_2" : ("LP", "another long text")
},
{
"img_5" : ("FP", "one more text")
},
{
"img_6" : ("FP", "another one"),
"img_7" : ("LP", "ANOTHER ONE")
},
{
"img_10" : ("FP", "more text"),
"img_11" : ("Others", "whatever"),
"img_12" : ("Others", "more whatever"),
"img_13" : ("LP", "SoMe TeXt"),
},
{
"img_15" : ("FP", "whatever"),
"img_16" : ("Others", "whatever"),
"img_17" : ("LP", "whateverrr"),
}
],
"doc_2" : [
{
"img_1" : ("FP", "text")
},
{
"img_2" : ("FP", "more text"),
"img_3" : ("LP", "more more text")
},
{
"img_6" : ("FP", "more more text"),
"img_7" : ("Others", "lot of text"),
"img_8" : ("LP", "still more text")
},
]
}
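As you can see, all FP and LP entries have been extracted, together with the Others entries lying between an FP and an LP, and each group is stored in its own dictionary. In addition, every FP that is not followed by an LP has been extracted as well.
P.S.:
ip_dict = {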
"doc_1" : {
"img_0" : ("Others","some long text"),
"img_01" : ("Others","some long text"),
"img_1" : ("FP","some long text"),
"img_2" : ("LP", "another long text"),
"img_3" : ("Others", "long text"),
"img_4" : ("Others", "some loong text"),
"img_5" : ("FP", "one more text"),
"img_6" : ("FP", "another one"),
"img_7" : ("LP", "ANOTHER ONE"),
"img_61" : ("FP", "another one"),
"img_71" : ("LP", "ANOTHER ONE"),
"img_62" : ("FP", "another one"),
"img_72" : ("LP", "ANOTHER ONE"),
"img_8" : ("Others", "some text"),
"img_9" : ("Others", "some text"),
"img_10" : ("Others", "some text"),
"img_54" : ("FP", "one more text"),
"img_540" : ("FP", "one more text"),
"img_541" : ("FP", "one more text"),
"img_11" : ("Others", "some text"),
"img_12" : ("Others", "some moretext"),
"img_13" : ("FP", "more text"),
"img_14" : ("Others", "whatever"),
"img_140" : ("Others", "whatever"),
"img_141" : ("Others", "whatever"),
"img_142" : ("Others", "whatever"),
"img_15" : ("Others", "more whatever"),
"img_16" : ("LP", "SoMe TeXt"),
"img_17" : ("Others", "some moretext"),
"img_18" : ("FP", "whatever"),
"img_19" : ("Others", "whatever"),
"img_20" : ("LP", "whateverrr")
},
"doc_2" : {
"img_1" : ("FP", "text"),
"img_2" : ("FP", "more text"),
"img_3" : ("LP", "more more text"),
"img_4" : ("Others", "some more"),
"img_5" : ("Others", "text text"),
"img_6" : ("FP", "more more text"),
"img_7" : ("Others", "lot of text"),
"img_8" : ("LP", "still more text"),
"img_9" : ("Others", "still more text"),
"img_69" : ("FP", "more more text"),
}
}
op_dict = {
"doc_1" : [
{
"img_1" : ("FP","some long text"),
"img_2" : ("LP", "another long text")
},
{
"img_5" : ("FP", "one more text")
},
{
"img_6" : ("FP", "another one"),
"img_7" : ("LP", "ANOTHER ONE")
},
{
"img_61" : ("FP", "another one"),
"img_71" : ("LP", "ANOTHER ONE"),
},
{
"img_62" : ("FP", "another one"),
"img_72" : ("LP", "ANOTHER ONE"),
},
{
"img_54" : ("FP", "one more text"),
},
{
"img_540" : ("FP", "one more text"),
},
{
"img_541" : ("FP", "one more text"),
},
{
"img_13" : ("FP", "more text"),
"img_14" : ("Others", "whatever"),
"img_140" : ("Others", "whatever"),
"img_141" : ("Others", "whatever"),
"img_142" : ("Others", "whatever"),
"img_15" : ("Others", "more whatever"),
"img_16" : ("LP", "SoMe TeXt"),
},
{
"img_18" : ("FP", "whatever"),
"img_19" : ("Others", "whatever"),
"img_20" : ("LP", "whateverrr")
}
],
"doc_2" : [
{
"img_1" : ("FP", "text")
},
{
"img_2" : ("FP", "more text"),
"img_3" : ("LP", "more more text")
},
{
"img_6" : ("FP", "more more text"),
"img_7" : ("Others", "lot of text"),
"img_8" : ("LP", "still more text")
},
{
"img_69" : ("FP", "more more text"),
}
]
}
P.S. 2:
ip_dict = {
"doc_2" : {
"img_1" : ("FP", "text"),
"img_2" : ("Others", "more text"),
"img_3" : ("Others", "more more text"),
"img_4" : ("Others", "some more"),
"img_5" : ("Others", "text text"),
"img_6" : ("Others", "more more text"),
"img_7" : ("Others", "lot of text"),
"img_8" : ("Others", "still more text")
}
}
op_dict = {
"doc_2" : {
"img_1" : ("FP", "text"),
}
}
Thanks for your help!
4 Answers
cunj1qz1 #1
Using extended sequential logic:
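A minimal sketch of what such sequential logic might look like, not necessarily this answer's original code. The names `group_pages` and `extract` are hypothetical, and skipping an LP that has no preceding FP is an assumption, since the question does not cover that case:

```python
def group_pages(images):
    """Split one document's images into FP..LP groups (sketch)."""
    groups = []
    current = None  # entries collected since the last unmatched FP
    for name, value in images.items():
        label = value[0]
        if label == "FP":
            if current is not None:
                # The previous FP never met an LP: keep only that FP.
                first = next(iter(current))
                groups.append({first: current[first]})
            current = {name: value}
        elif label == "LP":
            if current is not None:
                current[name] = value
                groups.append(current)
                current = None
            # Assumption: an LP with no open FP is skipped.
        elif current is not None:
            # "Others" (or any other label) inside an open FP..LP range.
            current[name] = value
    if current is not None:
        # Trailing FP without an LP: a single page.
        first = next(iter(current))
        groups.append({first: current[first]})
    return groups


def extract(ip_dict):
    return {doc: group_pages(images) for doc, images in ip_dict.items()}


sample = {"d": {
    "a": ("FP", "x"), "b": ("FP", "y"), "c": ("Others", "z"),
    "e": ("LP", "w"), "f": ("Others", "q"), "g": ("FP", "r"),
    "h": ("Others", "s"),
}}
result = extract(sample)
```

The Others entries are buffered together with their FP and only committed once an LP arrives, which is why the ones following an unmatched FP are dropped.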
ql3eal8s #2
One possible approach:
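One way this could be sketched is to reduce each document to a string of one-letter labels and let a regular expression find the groups. This is an illustrative technique, not necessarily the code this answer originally contained; `CODES` and `extract_with_regex` are hypothetical names, and any label other than FP or LP is assumed to behave like Others:

```python
import re

# Hypothetical one-letter encoding of the labels.
CODES = {"FP": "F", "LP": "L"}


def extract_with_regex(ip_dict):
    op_dict = {}
    for doc, images in ip_dict.items():
        items = list(images.items())
        # One character per image: F, L, or O for anything else.
        labels = "".join(CODES.get(value[0], "O") for _, value in items)
        # "FO*L" is a full FP..LP range; a lone "F" is an FP with no LP.
        op_dict[doc] = [
            dict(items[m.start():m.end()])
            for m in re.finditer(r"FO*L|F", labels)
        ]
    return op_dict


sample = {"doc_2": {
    "img_1": ("FP", "text"),
    "img_2": ("FP", "more text"),
    "img_3": ("LP", "more more text"),
    "img_4": ("Others", "some more"),
    "img_5": ("Others", "text text"),
    "img_6": ("FP", "more more text"),
    "img_7": ("Others", "lot of text"),
    "img_8": ("LP", "still more text"),
}}
result = extract_with_regex(sample)
print(result)
```

Because the match positions index directly into `items`, each match maps back to the original keys without any extra bookkeeping.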
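which gives:
chy5wohz #3
Try this approach. It is a classic use of the flag method, but as the comments point out, it only works if the dictionary entries were inserted in order. As it stands, it produces the desired output.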
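A flag-based loop along these lines could look roughly as follows. It is a sketch under assumptions rather than this answer's original code: `extract_with_flag`, `inside` and `buffer` are hypothetical names, and an LP with no open FP is assumed to be skipped. It relies on dictionaries preserving insertion order, which Python guarantees from version 3.7 onward:

```python
def extract_with_flag(ip_dict):
    op_dict = {}
    for doc, images in ip_dict.items():
        groups = []
        buffer = []     # (name, value) pairs collected since the last FP
        inside = False  # flag: True while an FP is waiting for its LP
        for name, value in images.items():
            label = value[0]
            if label == "FP":
                if inside:
                    # Previous FP had no LP: emit it as a single page.
                    groups.append(dict(buffer[:1]))
                buffer = [(name, value)]
                inside = True
            elif inside:
                buffer.append((name, value))
                if label == "LP":
                    groups.append(dict(buffer))
                    inside = False
            # Assumption: anything seen while the flag is off is ignored.
        if inside:
            # Document ended on an unmatched FP.
            groups.append(dict(buffer[:1]))
        op_dict[doc] = groups
    return op_dict


sample = {"d": {
    "a": ("Others", "t"), "b": ("LP", "t"), "c": ("FP", "t"),
    "e": ("Others", "t"), "f": ("FP", "t"), "g": ("LP", "t"),
}}
result = extract_with_flag(sample)
```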
bqujaahr #4
Here is my solution; it is rather long:
Result: