I have a .sdf file that looks like this:
$$$$
compound1
#lots of text
$$$$
compound1
#lots of text
$$$$
compound2
#lots of text
$$$$
compound2
#lots of text
I'm trying to rename all the compound names, i.e. the line right after each $$$$, so that they include a count. The expected result is:
$$$$
compound1_1
#lots of text
$$$$
compound1_2
#lots of text
$$$$
compound2_1
#lots of text
$$$$
compound2_2
#lots of text
The code I'm using is:
import sys
import re
import os
file = sys.argv[1]
file2 = file + '_fixed'
name2 = 'compound'
with open(file2,'w') as new_file:
    with open(file) as fp:
        # read all lines in a list
        for line in fp:
            # check if string present on a current line
            if "$$$$" in line:
                new_file.write(line)
                name = next(fp)
                if name2 != name:
                    name2 = name
                    j = 0
                    new_name = name + '_' + str(j)
                    new_file.write(new_name)
                else:
                    j = j + 1
                    new_name = name + '_' + str(j)
                    new_file.write(new_name)
            else:
                new_file.write(line)
But the output file looks like this:
$$$$
compound1
_1 #lots of text
$$$$
compound1
_2 #lots of text
$$$$
compound2
_1 #lots of text
$$$$
compound2
_2 #lots of text
Thanks for any input!
1 Answer
ryhaxcpt1#
One possible solution, using re + defaultdict, that prints the result:
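As a minimal sketch of how a re + defaultdict approach could work (not necessarily the answerer's exact code): the function name `number_compounds` and the inline `sample` string are hypothetical, and the sketch assumes the whole file fits in memory, uses `\n` line endings (as Python's text mode provides), and that every name line directly follows a `$$$$` line, as in the question's sample.

```python
import re
from collections import defaultdict


def number_compounds(text):
    """Append a per-name running count to the line after each $$$$ marker."""
    counts = defaultdict(int)  # compound name -> how many times it was seen

    def rename(match):
        name = match.group(2).strip()  # drop stray surrounding whitespace
        counts[name] += 1
        # Keep the "$$$$\n" marker line, then emit the numbered name.
        return f"{match.group(1)}{name}_{counts[name]}"

    # Group 1: the "$$$$" line with its newline.
    # Group 2: the name line, without its trailing newline.
    return re.sub(r"^(\$\$\$\$\n)(.*)$", rename, text, flags=re.MULTILINE)


# Hypothetical sample mirroring the question's input.
sample = (
    "$$$$\ncompound1\n#lots of text\n"
    "$$$$\ncompound1\n#lots of text\n"
    "$$$$\ncompound2\n#lots of text\n"
    "$$$$\ncompound2\n#lots of text\n"
)
print(number_compounds(sample), end="")
```

Because the regex captures the name without its trailing newline, the suffix lands on the same line as the name. In the question's code, `next(fp)` returns the name with its `\n` still attached, so `'_' + str(j)` ends up at the start of the following line; calling `name.rstrip('\n')` before building `new_name` (and writing the newline back afterwards) would address that in the original loop. Keying the counter on the name also means records with the same name do not have to be adjacent.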