How to download a large amount of data from a SQL table and save it to CSV continuously, fetching about 1000 records at a time

niwlg2el · Posted 2023-09-28 in Other

I have a SQL table with 10 million rows and many columns; it is about 44 GB in size. I am trying to fetch just 3 columns from this table and save them to a CSV, but Python runs forever, i.e.

pd.read_sql("select a,b,c from table") takes more than an hour and does not return any data.

How can I achieve this?

1. Can I load the entire result into a DataFrame at once, and is that a viable option? After this I should be able to perform some data manipulation on those rows.
2. Or should I download the data to a CSV and then read it back into memory part by part?

If option 2, how do I code it? What I have tried so far for option 2 is:

def iter_row(cursor, size=10):
    # Lazily yield rows, fetching `size` rows per round trip
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        for row in rows:
            yield row

def query_with_fetchmany(cursor):
    cursor.execute("SELECT * FROM books")
    for row in iter_row(cursor, 10):
        print(row)
    cursor.close()

9vw9lbht 1#

You can read the data in chunks:

for i, chunk in enumerate(pd.read_sql("select a,b,c from table", con=connection, chunksize=10**5)):
    # write the header only with the first chunk, then append without it
    chunk.to_csv(r'/path/to/file.csv', index=False, header=(i == 0), mode='w' if i == 0 else 'a')
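
To read the saved file back into memory part by part (option 2 in the question), the same chunked pattern works on the CSV side; a minimal sketch, assuming the file written above and a hypothetical process() standing in for your own manipulation:

import pandas as pd

# iterate over the CSV in 100k-row pieces instead of loading everything at once
for chunk in pd.read_csv(r'/path/to/file.csv', chunksize=10**5):
    process(chunk)  # hypothetical placeholder for your data manipulation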

vyswwuz2 2#

I searched and experimented with many different approaches and found that the fetchmany method works best: no RAM or CPU problems when downloading database tables larger than 10 GB on a laptop with only 8 GB of RAM. It also lets you watch the progress of the query.

from sqlalchemy import create_engine, text
import csv

csv_path = 'file.csv'
csv_columns = ['a', 'b', 'c']
sql_string = text("""select a, b, c from table_name;""")
batch_size = 50_000  # 50_000 rows per fetch worked best for me, ymmv
db_url = 'db_type:db_api://user:pwd@server:port/database'

engine = create_engine(db_url)
with engine.connect() as conn:

    with open(csv_path, mode='w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
        writer.writerow(csv_columns)
        # stream_results requests a server-side cursor so rows arrive in
        # batches instead of being buffered client-side all at once
        result = conn.execution_options(stream_results=True).execute(sql_string)
        counter = 1
        while True:
            print(f'Fetch # {counter}')  # progress indicator
            rows = result.fetchmany(batch_size)
            if not rows:  # fetchmany returns an empty list when exhausted
                break
            writer.writerows(rows)
            counter += 1
        result.close()
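
The stream_results option matters for memory: support varies by DBAPI, and without a server-side cursor some drivers buffer the whole result set client-side before the first fetchmany() returns. batch_size is a separate trade-off: larger batches mean fewer round trips to the server but more rows held in memory between writes, so tune it to your RAM.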
