I have a block of text that I need to rearrange (using Python). It looks like this:
foo
    bar
        inner 1
        inner 3
        inner 2
    another
        stuff c
        stuff b
        stuff a
    more
        items z
        items x
        items y
The output of the sorting function must look like this:
foo
    another
        stuff a
        stuff b
        stuff c
    bar
        inner 1
        inner 2
        inner 3
    more
        items x
        items y
        items z
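Important details:
As the example above shows, each new "depth" is indicated by 4 spaces. This is consistent throughout the text.
At each depth, items should be sorted alphabetically. However, the structure of the tree must stay intact after sorting. So "stuff a/b/c" must always have "another" as its parent, and "items x/y/z" must always have "more" as its parent.
Here is an attempt that comes close to working, but not quite.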
import re
import textwrap

_EXPECTED_INDENTATION = "    "
_PARSER = re.compile(r"(?P<indentation>\s*)(?P<words>.+)")


def _iter_lists(item):
    if not isinstance(item, list):
        return

    yield item

    for group in item:
        for inner in _iter_lists(group):
            yield inner


def _group_by_depth(names):
    previous_depth = -1
    all_groups = []
    inner_group = []

    for depth, name in names:
        if previous_depth != -1 and depth != previous_depth:
            all_groups.append(inner_group)
            inner_group = []

        inner_group.append((depth, name))
        previous_depth = depth

    if inner_group:
        # Add the last group, just in case it was missed
        all_groups.append(inner_group)

    return all_groups


def _parse_by_depth(text):
    output = []

    for line in text.split("\n"):
        if not line.strip():
            continue

        match = _PARSER.match(line)
        count = int(match.group("indentation").count(_EXPECTED_INDENTATION))
        word = match.group("words")
        output.append((count, word))

    return output


def _sort_all(all_groups):
    for group in all_groups:
        for inner in _iter_lists(group):
            inner.sort()


def flatten_sequence(sequence):
    if not sequence:
        return sequence

    if isinstance(sequence[0], list):
        return flatten_sequence(sequence[0]) + flatten_sequence(sequence[1:])

    return sequence[:1] + flatten_sequence(sequence[1:])


def main():
    """Run the main execution of the current script."""
    text = textwrap.dedent(
        """\
        foo
            bar
                inner 1
                inner 3
                inner 2
            another
                stuff c
                stuff b
                stuff a
            more
                items z
                items x
                items y
        """
    )

    names = _parse_by_depth(text)
    # `_parse_by_depth` should generate
    # names = [
    #     (0, 'foo'),
    #     (1, 'bar'),
    #     (2, 'inner 1'),
    #     (2, 'inner 3'),
    #     (2, 'inner 2'),
    #     (1, 'another'),
    #     (2, 'stuff c'),
    #     (2, 'stuff b'),
    #     (2, 'stuff a'),
    #     (1, 'more'),
    #     (2, 'items z'),
    #     (2, 'items x'),
    #     (2, 'items y'),
    # ]

    all_groups = _group_by_depth(names)
    _sort_all(all_groups)
    flattened = flatten_sequence(all_groups)

    for depth, name in flattened:
        print("{indentation}{name}".format(indentation=_EXPECTED_INDENTATION * depth, name=name))


if __name__ == "__main__":
    main()
But it does not work. It prints:
foo
    bar
        inner 1
        inner 2
        inner 3
    another
        stuff a
        stuff b
        stuff c
    more
        items x
        items y
        items z
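This happens because `_sort_all` can only sort contiguous blocks correctly. For example, "inner 1/2/3" and "stuff a/b/c" are sorted correctly, but the order of the parent items (bar, another, and so on) is still wrong. How can I modify `_group_by_depth` and/or `_sort_all` to get the expected order?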
1 Answer
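I would suggest this approach:
We can interpret the input as a table with several columns, where each level of indentation corresponds to moving to the next column. The skipped columns are assumed to hold the same values as in the "parent" line. In other words, we can imagine that the input is this table with those "repeated values" (shown in parentheses) removed: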
Column 1   Column 2    Column 3
foo
(foo)      bar
(foo)      (bar)       inner 1
(foo)      (bar)       inner 3
(foo)      (bar)       inner 2
(foo)      another
(foo)      (another)   stuff c
(foo)      (another)   stuff b
(foo)      (another)   stuff a
(foo)      more
(foo)      (more)      items z
(foo)      (more)      items x
(foo)      (more)      items y
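To make the table concrete, here is a small sketch that fills in the skipped columns for each line. The helper name `expand_rows` and the constant `INDENT` are made up for this illustration, and the sketch assumes indentation is always a multiple of four spaces.

```python
INDENT = "    "  # assumption: one depth level is exactly four spaces


def expand_rows(text):
    """Return one row per non-blank line, with the parent columns filled in."""
    rows = []
    path = []  # names of the current line's ancestors, followed by the line itself
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name = line.lstrip(" ")
        depth = (len(line) - len(name)) // len(INDENT)
        # Keep the ancestors down to this depth, then append the current name.
        path = path[:depth] + [name]
        rows.append(list(path))
    return rows


sample = "foo\n    bar\n        inner 1\n        inner 3\n    another\n        stuff c\n"
for row in expand_rows(sample):
    print(row)
```

Each printed row corresponds to one line of the table, with the parenthesized cells written out in full.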
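One idea is to build this 2D list (including the repeated values), sort it, and then convert it back to the original format.
Here is the code:
You can use it as follows: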
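What follows is only a minimal sketch of the build, sort, and convert-back idea described above, not necessarily the answer's original code. `sort_tree` and `INDENT` are hypothetical names, and the sketch assumes four-space indentation, that no line is more than one level deeper than the line before it, and that sibling names are unique.

```python
import textwrap

INDENT = "    "  # assumption: one depth level is exactly four spaces


def sort_tree(text):
    """Sort an indented outline alphabetically at every depth, keeping parents."""
    rows = []
    path = []  # ancestors of the current line, followed by the line itself
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name = line.lstrip(" ")
        depth = (len(line) - len(name)) // len(INDENT)
        path = path[:depth] + [name]
        rows.append(tuple(path))  # one full table row per line

    # Tuples compare element by element, and a tuple that is a prefix of
    # another sorts first, so every parent stays directly above its children.
    rows.sort()

    # Convert back: indentation is the number of ancestors, text is the last cell.
    return "\n".join(INDENT * (len(row) - 1) + row[-1] for row in rows)


text = textwrap.dedent(
    """\
    foo
        bar
            inner 1
            inner 3
            inner 2
        another
            stuff c
            stuff b
            stuff a
        more
            items z
            items x
            items y
    """
)
print(sort_tree(text))
```

Sorting whole rows rather than individual names is what keeps the tree intact: a child row always starts with its parent's row, so it can never drift under a different parent.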