python-3.x: Searching a file for a string and replacing the next line with another string does not give the expected result

Posted by yacmzcpb on 2023-06-25 in Python

I have an .sdf file that looks like this:

$$$$
compound1
#lots of text

$$$$
compound1
#lots of text

$$$$
compound2
#lots of text

$$$$
compound2
#lots of text

I am trying to rename all the compound names, i.e. the line after each $$$$, so that they include a count. The expected result is:

$$$$
compound1_1
#lots of text

$$$$
compound1_2
#lots of text

$$$$
compound2_1
#lots of text

$$$$
compound2_2
#lots of text

The code I am using is:

import sys
import re
import os
file = sys.argv[1]
file2 = file + '_fixed'
name2 = 'compound'
with open(file2,'w') as new_file:
    with open(file) as fp:
        # read all lines in a list
        for line in fp:
        # check if string present on a current line
            if "$$$$" in line:
                new_file.write(line)
                name = next(fp)
                if name2 != name:
                    name2 = name
                    j = 0
                    new_name = name + '_' + str(j)
                    new_file.write(new_name)
                else:
                    j = j + 1
                    new_name = name + '_' + str(j)
                    new_file.write(new_name)
            else:
                new_file.write(line)

But the output file looks like this:

$$$$
compound1
_1 #lots of text

$$$$
compound1
_2 #lots of text

$$$$
compound2
_1 #lots of text

$$$$
compound2
_2 #lots of text

Thanks for any input!

Answer #1 (ryhaxcpt)

One possible solution using re + defaultdict:

import re
from itertools import count
from collections import defaultdict

txt = '''\
$$$$
compound1
#lots of text

$$$$
compound1
#lots of text

$$$$
compound2
#lots of text

$$$$
compound2
#lots of text
'''

d = defaultdict(lambda: count(1))

txt = re.sub(r'(?<=^\${4}\n)([^\n]+)', lambda g: f'{g[1]}_{next(d[g[1]])}', txt, flags=re.M)
print(txt)

Prints:

$$$$
compound1_1
#lots of text

$$$$
compound1_2
#lots of text

$$$$
compound2_1
#lots of text

$$$$
compound2_2
#lots of text
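
As for why the code in the question misplaces the suffix: iterating over a file (and calling next(fp)) returns each line with its trailing newline still attached, so name + '_' + str(j) puts the suffix after the line break. The counter there also starts at 0 and only tracks consecutive duplicates. Below is a minimal line-by-line sketch that strips the newline first and keeps one counter per name. The names number_compounds and rewrite_file are made up for illustration, and it assumes, as in the sample above, that every compound name sits on the line right after a $$$$ line.

```python
from collections import defaultdict
from itertools import count


def number_compounds(lines):
    """Yield lines unchanged, except that the line following each
    $$$$ marker gets a per-name running number appended."""
    counters = defaultdict(lambda: count(1))  # one counter per compound name
    rename_next = False
    for line in lines:
        if rename_next:
            name = line.rstrip("\r\n")  # drop the newline before appending
            yield f"{name}_{next(counters[name])}\n"
        else:
            yield line
        # The line after a $$$$ marker is assumed to be the compound name.
        rename_next = line.strip() == "$$$$"


def rewrite_file(src, dst):
    """Hypothetical helper: stream src through number_compounds into dst."""
    with open(src) as fp, open(dst, "w") as out:
        out.writelines(number_compounds(fp))
```

Calling rewrite_file(sys.argv[1], sys.argv[1] + '_fixed') would mirror the file naming used in the question. Unlike the regex version, this never holds the whole file in memory, which may matter for large .sdf files.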
