Problem with Pandas and writing to a CSV file

qf9go6mv  posted on 2022-12-06  in Other

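I have a problem with Pandas and writing to a CSV file. When I run my Python script, I either run out of memory or my computer starts running slowly after the script finishes. Is there any way to split the data into chunks and write the chunks to the CSV? I'm fairly new to programming in Python.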

import itertools, hashlib, pandas as pd,time
chars = ['0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f']
numbers_list = list(range(0,25))
chunksize = 1_000_000
rows = []
for combination in itertools.combinations_with_replacement(chars, 10):
        for A in numbers_list:
            pure = str(A) + ':' + str(combination) 
            B = pure.replace(")", "").replace("(", "").replace("'", "").replace(",", "").replace(" ", "") 
            C = hashlib.sha256(B.encode('utf-8')).hexdigest()
            rows.append([A , B, C])
t0 = time.time()
df = pd.DataFrame(data=rows, columns=['A', 'B', 'C'])
df.to_csv('data.csv', index=False)
tdelta = time.time() - t0
print(tdelta)

I would really appreciate your help! Thanks!

omhiaaxx  #1

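Since you only use the DataFrame to write the file, skip it entirely. You build the complete dataset in memory as a Python list and then build it again as a DataFrame, which consumes RAM unnecessarily. The csv module in the standard library lets you write the file row by row.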

import itertools, hashlib, csv
chars = ['0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f']
numbers_list = list(range(0,25))
# newline='' lets the csv module control line endings itself
with open('test.csv', 'w', newline='') as fileobj:
    writer = csv.writer(fileobj)
    for combination in itertools.combinations_with_replacement(chars, 10):
        for A in numbers_list:
            # str(combination) looks like "('0', '0', ...)"; the replace calls strip the tuple punctuation
            pure = str(A) + ':' + str(combination)
            B = pure.replace(")", "").replace("(", "").replace("'", "").replace(",", "").replace(" ", "")
            C = hashlib.sha256(B.encode('utf-8')).hexdigest()
            # write each row as soon as it is computed instead of collecting it in a list
            writer.writerow([A, B, C])

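This will run quickly until the RAM cache in front of your storage fills up; after that it will proceed at whatever speed the operating system can get the data onto disk.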
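
If you do want explicit chunks, as the question asks, one way to get them is to generate the rows lazily and hand them to the writer in batches. The following is only a minimal sketch under some assumptions: the helper names `generate_rows` and `write_in_chunks` are made up for illustration, the header row `A,B,C` is added to match the column names in the question's DataFrame (the code above writes no header), and `"".join(combination)` is used as a shorter equivalent of the chain of `replace` calls.

```python
import csv
import hashlib
import itertools

HEX_CHARS = "0123456789abcdef"


def generate_rows(chars=HEX_CHARS, length=10, numbers=range(25)):
    """Hypothetical helper: lazily yield [A, B, C] rows, one at a time."""
    for combination in itertools.combinations_with_replacement(chars, length):
        suffix = "".join(combination)  # same text the replace() chain produces
        for a in numbers:
            b = f"{a}:{suffix}"
            c = hashlib.sha256(b.encode("utf-8")).hexdigest()
            yield [a, b, c]


def write_in_chunks(path, rows, chunksize=1_000_000):
    """Hypothetical helper: write rows to path in batches of at most chunksize.

    Only one batch is held in memory at a time. Returns the number of data
    rows written (the header is not counted).
    """
    rows = iter(rows)
    total = 0
    with open(path, "w", newline="") as fileobj:
        writer = csv.writer(fileobj)
        writer.writerow(["A", "B", "C"])  # assumption: header as in the question
        while True:
            # islice pulls the next chunksize rows from the iterator
            chunk = list(itertools.islice(rows, chunksize))
            if not chunk:
                break
            writer.writerows(chunk)
            total += len(chunk)
    return total
```

Calling `write_in_chunks('test.csv', generate_rows())` would produce the full file. Because the file object is already buffered, batching is unlikely to be much faster than calling `writerow` per row; the point is that memory use is bounded by `chunksize` rather than by the size of the whole dataset.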
