How to download a large amount of data from a SQL table and save it to CSV continuously, fetching about 1000 records at a time

niwlg2el · asked 12 months ago · in Other

I have a SQL table with 10 million rows and many columns; when queried, the table is about 44 GB.
However, when I try to fetch just 3 columns from this table and save them to CSV / load them, Python runs forever. That is:

pd.read_sql("select a,b,c from table") takes more than an hour and does not return any data.

How can I achieve this?

1. Can I load this entire table into a dataframe at once? Is that a viable option?
2. Or should I download it to a CSV and read that data into memory part by part?

Either way, I should then be able to perform some data manipulation on these rows.

If option 2, how do I code it?
Here is the code I have tried so far for option 2:

def iter_row(cursor, size=10):
    # Yield rows one at a time, fetching them from the server in batches of `size`
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        for row in rows:
            yield row

def query_with_fetchmany():
    # `cursor` is assumed to come from an already-open DBAPI connection
    cursor.execute("SELECT * FROM books")
    for row in iter_row(cursor, 10):
        print(row)
    cursor.close()

9vw9lbht #1

You can read the data in chunks:

for i, c in enumerate(pd.read_sql("select a,b,c from table", con=connection, chunksize=10**5)):
    # append each chunk; write the header only for the first one
    c.to_csv(r'/path/to/file.csv', index=False, header=(i == 0), mode='a')
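
Once the CSV exists, the second half of the question (reading the data back part by part) follows the same pattern; a minimal sketch, where the path and chunk size are just the placeholders from above:

import pandas as pd

# Each `chunk` is an ordinary DataFrame holding the next 100k rows
for chunk in pd.read_csv(r'/path/to/file.csv', chunksize=10**5):
    # ... perform your data manipulations on `chunk` here ...
    print(len(chunk))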

vyswwuz2 #2

I searched and experimented with many different approaches and found that the fetchmany method works best: no RAM or CPU problems when downloading database tables larger than 10 GB on a laptop with only 8 GB of RAM. It also lets you watch the query's progress.

from sqlalchemy import create_engine, text
import csv

csv_path = 'file.csv'
csv_columns = ['a', 'b', 'c']
sql_string = text("""select a, b, c from table_name;""")
batch_size = 50_000 # 50_000 rows worked best for me, ymmv
db_url = 'db_type:db_api://user:pwd@server:port/database'

engine = create_engine(db_url)
with engine.connect() as conn:

    with open(csv_path, mode='w', newline='', encoding='utf-8') as f:
        c = csv.writer(f, quoting=csv.QUOTE_MINIMAL)
        c.writerow(csv_columns)
        result = conn.execute(sql_string)
        counter = 1
        while True:
            print(f'Fetch # {counter}')
            rows = result.fetchmany(batch_size)
            if not rows:
                break
            for row in rows:
                c.writerow(row)
            counter += 1
        result.close()
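
One caveat worth checking for your driver (an assumption on my part, not something the answer above claims): some DBAPIs buffer the whole result set on the client before fetchmany() sees any of it. SQLAlchemy's stream_results execution option requests a server-side cursor where the driver supports it (psycopg2, for example), and only the connect line needs to change:

# Ask for a server-side cursor so rows stream from the database in batches
# rather than being buffered client-side (driver support varies)
with engine.connect().execution_options(stream_results=True) as conn:
    result = conn.execute(sql_string)
    # ... same fetchmany() loop as above ...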
