How do I make this Python renaming code faster?

Posted by snvhrwxg on 2023-03-21 in Python


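I'm writing code that renames files (in several different formats) whose names all follow the pattern 000_PROTEINNAME_.... The goal is to replace each protein name with its corresponding gene name.
I have saved both values in a dictionary, with protein names as keys and gene names as values (the names are read from two columns of a TSV file containing this information into lists, and I then use the lists to build the dictionary). I'm currently running this program and, although it works, the folder contains 4000 files and the run takes far too long. Do you have any suggestions for reducing the run time on this folder?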

import csv
import os

def RenameAlphaFold(files_path, reference_table_path):
    with open(reference_table_path) as names:
        TSV_table = csv.reader(names, delimiter= '\t')
        original_names = []
        new_names = []
        for line in TSV_table:
            original_names.append(line[2])
            new_names.append(line[5])
    names = dict(zip(original_names, new_names))
    for old, new in names.items():
        for file in os.listdir(files_path):
            destination = file.replace(old, new)
            source = files_path + '\\' + file
            destination = files_path + '\\' + destination
            os.rename(source, destination)


Source: https://stackoverflow.com/questions/75798963/how-do-i-make-this-python-renaming-code-faster

2 Answers


rkue9o1l #1

I *think* you are running `os.rename` too many times

It looks like you call it for *every* file in files_path, whether or not that file needs renaming, and *for every replacement pair*.

Quick fix

Change this

os.rename(source, destination)

to this

if source != destination:
    os.rename(source, destination)

A possibly more elegant fix

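I don't know whether this will work, but if it does, it would be elegant.
Loop over the files first and apply the replacements to the *string*, then rename each file only once.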

for file in os.listdir(files_path):
    source = files_path + '\\' + file
    destination = files_path + '\\' + file

    for old, new in names.items():
        destination = destination.replace(old, new)

    if source != destination:
        os.rename(source, destination)
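The idea above can be sketched as a small self-contained function. This is only an illustration under a few assumptions: the name `rename_files_once` is hypothetical, `names` is the protein-to-gene dictionary from the question, and the replacements are applied to the bare file name (joined to the folder with `os.path.join`) so that the folder path itself is never altered.

```python
import os


def rename_files_once(files_path, names):
    """Rename each file at most once; `names` maps old substrings to new ones."""
    renamed = 0
    # The directory is listed a single time, before any renaming starts.
    for file in os.listdir(files_path):
        new_file = file
        # Apply every replacement to the in-memory name only.
        for old, new in names.items():
            new_file = new_file.replace(old, new)
        # Touch the disk only when the name actually changed.
        if new_file != file:
            os.rename(os.path.join(files_path, file),
                      os.path.join(files_path, new_file))
            renamed += 1
    return renamed
```

The function returns the number of files it renamed, which makes it easy to check that unrelated files in the folder were left alone.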

Substring problem?

with open(reference_table_path) as names:
    TSV_table = csv.reader(names, delimiter='\t')
    name_pairs = []
    for line in TSV_table:
        # append() takes a single argument, so store each pair as a tuple
        name_pairs.append((line[2], line[5]))

# Longest protein names first, so a short name is not replaced inside a longer one
name_pairs = sorted(name_pairs, key=lambda pair: -len(pair[0]))

for source in os.listdir(files_path):
    destination = source
    for old, new in name_pairs:
        destination = destination.replace(old, new)

    if source != destination:
        os.rename(
            files_path + '\\' + source,
            files_path + '\\' + destination
        )
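To see why the order of replacements can matter, here is a minimal sketch with made-up names (`ABC`, `ABC1`, `GENEA`, `GENEB` are hypothetical, not taken from the question's data). It goes one step beyond the code above by stopping after the first match, which assumes each file name contains exactly one protein name.

```python
def replace_longest_first(file_name, name_pairs):
    """Apply (old, new) replacements, trying longer `old` strings first."""
    ordered = sorted(name_pairs, key=lambda pair: len(pair[0]), reverse=True)
    for old, new in ordered:
        if old in file_name:
            # Stop at the first match so a shorter key cannot rewrite the result.
            return file_name.replace(old, new)
    return file_name


# Hypothetical pairs: "ABC" is a substring of "ABC1".
pairs = [("ABC", "GENEA"), ("ABC1", "GENEB")]
print(replace_longest_first("000_ABC1_model.pdb", pairs))
```

If the pairs were tried in their original order, `ABC` would match inside `ABC1` first and the file would end up with the wrong gene name plus a stray `1`.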

Answered on 2023-03-21

ukxgm1gy #2

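You have two loops, one over the dictionary and one over the folder. Currently you iterate over the dictionary once but over the folder many times (once per row of the TSV file), and disk access is known to be much slower than memory access. You should at least invert your loops: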

for file in os.listdir(files_path):
    for old, new in names.items():
        ...

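Another (less important) possible improvement: if you can easily extract the protein name from the file name, i.e. if the names follow a simple, common pattern, you can look it up in the dictionary directly by key. If that is not practical, don't bother building a dictionary at all and build a list of pairs directly: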

with open(reference_table_path) as names:
    TSV_table = csv.reader(names, delimiter= '\t')
    names = [(line[2], line[5]) for line in TSV_table]
for file in os.listdir(files_path):
    for old, new in names:
        ...

At the very least, this is simpler than your original code, so it is easier to write and maintain.
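The direct dictionary lookup mentioned above could look roughly like the following sketch. It assumes the file names really do follow the `000_PROTEINNAME_...` layout from the question, so that the protein name is the second underscore-separated field and contains no underscore itself; the function name `gene_file_name` is hypothetical.

```python
def gene_file_name(file_name, names):
    """Return the new file name, or the original if no protein name matches."""
    parts = file_name.split("_")
    # Assumed layout: parts[0] is the numeric prefix, parts[1] the protein name.
    if len(parts) > 1 and parts[1] in names:
        # One dictionary lookup per file instead of one replace per table row.
        parts[1] = names[parts[1]]
    return "_".join(parts)
```

With this approach the work per file no longer grows with the size of the table, and a file is passed to `os.rename` only when the returned name differs from the original.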

Answered on 2023-03-21
