Python script slows down when looping through a list for CSV

ifmq2ha2  posted on 2023-02-06  in  Python

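I am splitting one CSV into two CSVs based on the value of a column in the original CSV. The code works, but it takes about an hour to run on a CSV with roughly 10,000 records. I tried enumerating the list, but I don't think that is the right way to speed it up.
I'm slow at this and still new to programming, so I'd appreciate it if someone could explain where I should focus next to speed this up. I know that fewer lines is usually better, but I don't know how to loop when creating two separate CSVs. Is the loop even the problem here?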

myList = ['2','12','20','33'...]
with open(originalCSV, 'rb') as f:
   reader = csv.DictReader(f)
   rows = [row for row in reader if row['Column 10'] in myList]
for row in rows:
   with open(inmylistCSV, 'wb') as w:
       fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10']
       csvwriter = csv.DictWriter(w, fieldnames=fieldnames)
       csvwriter.writeheader()
       csvwriter.writerows(rows)

with open(originalCSV, 'rb') as f:
   reader = csv.DictReader(f)
   rows = [row for row in reader if row['Column 10'] not in myList]
for row in rows:
   with open(notinmylistCSV, 'wb') as w:
       fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10']
       csvwriter = csv.DictWriter(w, fieldnames=fieldnames)
       csvwriter.writeheader()
       csvwriter.writerows(rows)

h7appiyu  #1

The problem is that you loop over the 10,000 records twice, which doubles the work to 20,000 record visits.

# This is what you're doing

for x in range(10000):
    if is_odd(x):
       print('I am odd')

for x in range(10000):
    if is_even(x):
       print('I am even')

A simple fix is to combine your logic into a single loop

# This is what you should be doing

for x in range(10000):
    if is_odd(x):
       print('I am odd')
    else:
       print('I am even')
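
Applied to CSV rows instead of numbers, the same idea might look like the sketch below. It is only an illustration: `partition_rows` and the sample rows are made up for this example, and converting the list of wanted values to a `set` is an extra suggestion, not something taken from the code above.

```python
def partition_rows(rows, wanted, column):
    """Split rows into (matching, non_matching) in a single pass."""
    wanted = set(wanted)  # set membership checks are O(1) on average
    matching, non_matching = [], []
    for row in rows:
        if row[column] in wanted:
            matching.append(row)
        else:
            non_matching.append(row)
    return matching, non_matching


# Hypothetical sample data standing in for the rows from csv.DictReader
sample = [
    {'Column 1': 'a', 'Column 10': '2'},
    {'Column 1': 'b', 'Column 10': '7'},
    {'Column 1': 'c', 'Column 10': '33'},
]
in_list, not_in_list = partition_rows(sample, ['2', '12', '20', '33'], 'Column 10')
```

Each row is visited once and lands in exactly one of the two lists, so the source file only has to be read a single time.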

So, in summary, there are two things you should do now
1. Logically combine the following lines

rows = [row for row in reader if row['Column 10'] in myList]
rows = [row for row in reader if row['Column 10'] not in myList]

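2. Optimize the CSV-writing part of your code, so each output file is opened and written once

# outputCSV stands for whichever file you are writing: inmylistCSV or notinmylistCSV
with open(outputCSV, 'wb') as w:
   fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10']
   csvwriter = csv.DictWriter(w, fieldnames=fieldnames)
   csvwriter.writeheader()
   csvwriter.writerows(rows)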
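
It is worth spelling out why the writing part matters. In the question's code the `with open(...)` block sits inside `for row in rows:`, and every pass reopens the file in write mode and calls `writerows(rows)`, so the whole file is rewritten once per row. A minimal sketch of writing it once instead, assuming Python 3 (hence `'w'` with `newline=''` rather than `'wb'`). The helper name `write_rows` and the sample rows are hypothetical, and `extrasaction='ignore'` is an assumption for the case where the input has more columns than the four being kept.

```python
import csv
import os
import tempfile

FIELDNAMES = ['Column 1', 'Column 2', 'Column 5', 'Column 10']


def write_rows(path, rows, fieldnames=FIELDNAMES):
    """Open the output file once and write the header plus all rows."""
    # newline='' is what the csv module documentation recommends on Python 3
    with open(path, 'w', newline='') as w:
        # extrasaction='ignore' drops keys that are not listed in fieldnames
        writer = csv.DictWriter(w, fieldnames=fieldnames, extrasaction='ignore')
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical rows; 'Column 3' is an extra column that should be dropped
rows = [
    {'Column 1': 'a', 'Column 2': 'b', 'Column 3': 'x', 'Column 5': 'c', 'Column 10': '2'},
    {'Column 1': 'd', 'Column 2': 'e', 'Column 3': 'y', 'Column 5': 'f', 'Column 10': '12'},
]
out_path = os.path.join(tempfile.mkdtemp(), 'inmylist.csv')
write_rows(out_path, rows)  # one call per output file, no surrounding loop
```

With N matching rows, the looped version performs on the order of N x N row writes, while this version performs N.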

zaq34kh6  #2

Why not read through the original CSV once and distribute the rows to the other CSVs as you go?

import csv

myList = ['2','12','20','33'...]

fieldnames = ['Column 1', 'Column 2', 'Column 5', 'Column 10']

# 'rb'/'wb' are the Python 2 modes for csv files; on Python 3 use 'r'/'w' with newline=''
in_list = open(inmylistCSV, 'wb')
in_list_csvwriter = csv.DictWriter(in_list, fieldnames=fieldnames)
in_list_csvwriter.writeheader()

not_in_list = open(notinmylistCSV, 'wb')
not_in_list_csvwriter = csv.DictWriter(not_in_list, fieldnames=fieldnames)
not_in_list_csvwriter.writeheader()

# Read the source once and send each row to the matching writer
with open(originalCSV, 'rb') as f:
   reader = csv.DictReader(f)
   for row in reader:
       if row['Column 10'] in myList:
           in_list_csvwriter.writerow(row)
       else:
           not_in_list_csvwriter.writerow(row)

# Close the output files so buffered rows are flushed to disk
in_list.close()
not_in_list.close()
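
For anyone on Python 3, here is one way the same single-pass split could be written with all three files managed by `with`. Treat it as a sketch under assumptions: the function name `split_csv`, the temporary file names and the sample data are invented for the demo, the `set` conversion is an optional speed-up, and `extrasaction='ignore'` assumes the source may contain columns beyond the four being written.

```python
import csv
import os
import tempfile


def split_csv(source, in_path, not_in_path, wanted, column, fieldnames):
    """Read source once, routing each row to one of two output files."""
    wanted = set(wanted)  # faster membership checks than a list
    with open(source, newline='') as f, \
         open(in_path, 'w', newline='') as in_f, \
         open(not_in_path, 'w', newline='') as not_in_f:
        reader = csv.DictReader(f)
        in_writer = csv.DictWriter(in_f, fieldnames=fieldnames, extrasaction='ignore')
        not_in_writer = csv.DictWriter(not_in_f, fieldnames=fieldnames, extrasaction='ignore')
        in_writer.writeheader()
        not_in_writer.writeheader()
        for row in reader:
            if row[column] in wanted:
                in_writer.writerow(row)
            else:
                not_in_writer.writerow(row)


# Demo on a small, made-up source file in a temporary directory
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'original.csv')
in_path = os.path.join(workdir, 'inmylist.csv')
not_in_path = os.path.join(workdir, 'notinmylist.csv')
with open(source, 'w', newline='') as f:
    f.write('Column 1,Column 2,Column 5,Column 10\n'
            'a,b,c,2\n'
            'd,e,f,7\n'
            'g,h,i,33\n')

split_csv(source, in_path, not_in_path, ['2', '12', '20', '33'], 'Column 10',
          ['Column 1', 'Column 2', 'Column 5', 'Column 10'])
```

Because rows are written as they are read, nothing has to be held in memory, and the `with` statement closes all three files even if an error occurs partway through.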
