I have a SQL table with 10 million rows and many columns; when queried, the table is about 44 GB.
However, when I try to fetch just 3 columns from this table and save them to CSV / load them, Python runs forever, i.e.
pd.read_sql("select a,b,c from table") is taking more than 1 hour and not returning data
How can I achieve this?
1. Can I load this entire table into a DataFrame at once, and is that a viable option? After this I should be able to perform some data manipulations on those rows.
2. Or should I download the data to a CSV and read it into memory part by part?
If option 2, how do I code it?
The code I have tried so far for option 2 is:
def iter_row(cursor, size=10):
    # Fetch rows from the server in batches of `size` and yield them
    # one at a time, so the full result set never has to fit in memory.
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        for row in rows:
            yield row

def query_with_fetchmany():
    # `cursor` is assumed to be an open DB-API cursor created elsewhere.
    cursor.execute("SELECT * FROM books")
    for row in iter_row(cursor, 10):
        print(row)
    cursor.close()
2 Answers
9vw9lbht1#
You can read the data in chunks:
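The code for this answer did not survive extraction, so here is a minimal sketch of what chunked reading looks like using pandas' chunksize parameter. A SQLAlchemy engine is assumed, and the connection URL, table, and column names are placeholders:

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection URL: substitute your own database.
engine = create_engine("mysql+pymysql://user:password@host/dbname")

# With chunksize set, read_sql returns an iterator of DataFrames
# instead of materializing all 10 million rows at once.
chunks = []
for chunk in pd.read_sql("SELECT a, b, c FROM my_table", engine, chunksize=100_000):
    # Process each chunk here, or collect and concatenate at the end.
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)

If the per-chunk manipulation reduces the data (filtering, aggregating), doing it inside the loop keeps peak memory well below the 44 GB of the full table.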
vyswwuz22#
I searched and experimented with many different approaches and found that the fetchmany method works best: downloading a database table larger than 10 GB on a laptop with only 8 GB of RAM caused no RAM or CPU problems. It also lets you watch the progress of the query.
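This answer's code block was likewise lost in extraction; below is a minimal sketch of the fetchmany approach it describes, streaming rows straight to a CSV file with a running progress count. The pymysql driver, connection details, and batch size are assumptions, not part of the original answer:

import csv
import pymysql  # assumed driver; any DB-API connection works the same way

conn = pymysql.connect(host="host", user="user", password="password", db="dbname")
cursor = conn.cursor()
cursor.execute("SELECT a, b, c FROM my_table")

rows_written = 0
with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # cursor.description supplies the column names for the header row.
    writer.writerow(col[0] for col in cursor.description)
    while True:
        batch = cursor.fetchmany(10_000)  # batch size is a tunable assumption
        if not batch:
            break
        writer.writerows(batch)
        rows_written += len(batch)
        print(f"{rows_written:,} rows written")  # visible progress

cursor.close()
conn.close()

Because each batch is written out and discarded, memory usage stays roughly constant at one batch regardless of the table's total size.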