How do I update a SQL table column with a list of new values from a pandas DataFrame?

hmmo2u0o  posted 11 months ago in Other

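Is there a way to use pandas to update a column in a SQL table with new data, perhaps from a list of values? Essentially, what I want to do is:
1. Connect to the database
2. Pull a table from the database and convert it to a DataFrame
3. Run a script that updates the values in one column of that DataFrame
4. Update the database table with the new values / new DataFrame
The table I am working with is large (> 100,000 rows).
I can do steps 1, 2, and 3, but I cannot figure out how to do step 4 and write the updated values back to the database table.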

Example script

import pandas as pd
import pyodbc as odbc

sql_conn = odbc.connect(<connection stuff>)

query = "SELECT * FROM myTable"
df = pd.read_sql(query, sql_conn)

myNewValueList = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...]  # long list of new values to update with
df['newColumnValues'] = myNewValueList

# Step 4: this is the part I cannot get to work
sql = "UPDATE myTable SET myColumn = %s"
val = df['newColumnValues']

mycursor.execute(sql_conn , val)


m4pnthwp1#

import pandas as pd
import pyodbc as odbc

# Connect to the database
sql_conn = odbc.connect(<connection_stuff>)

# Read the data
query = "SELECT * FROM myTable"
df = pd.read_sql(query, sql_conn)

# Update the DataFrame
myNewValueList = [11, 12, 13, 14, ...]  # Your new values
df['myColumn'] = myNewValueList

# Update statement
sql = "UPDATE myTable SET myColumn = ? WHERE <PrimaryKeyColumn> = ?"

# Update the database
cursor = sql_conn.cursor()
for index, row in df.iterrows():
    cursor.execute(sql, (row['myColumn'], row['<PrimaryKeyColumn>']))

# Commit the changes and close the connection
sql_conn.commit()
cursor.close()
sql_conn.close()

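In the snippet above, replace <connection_stuff>, <PrimaryKeyColumn>, and myColumn with your actual connection details, primary key column, and the column to update. The primary key uniquely identifies each row being updated.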
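One possible way to cut down the per-row overhead of the loop is to pass all (new value, key) pairs to a single executemany call. The following is only a minimal sketch: it uses an in-memory SQLite database as a stand-in for the real server, and the key column name `id` and the five sample rows are assumptions made for illustration. With pyodbc, the same call shape applies, and setting `cursor.fast_executemany = True` beforehand may speed it up further, depending on the driver.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite stands in for the real database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, myColumn INTEGER)")
conn.executemany(
    "INSERT INTO myTable (id, myColumn) VALUES (?, ?)",
    [(i, 0) for i in range(1, 6)],  # hypothetical sample rows
)
conn.commit()

# Read the table and replace the column with the new values.
df = pd.read_sql("SELECT * FROM myTable", conn)
df["myColumn"] = [11, 12, 13, 14, 15]

# Build (new value, key) pairs as plain tuples of Python scalars,
# in the same order as the placeholders in the UPDATE statement.
params = list(df[["myColumn", "id"]].itertuples(index=False, name=None))

cursor = conn.cursor()
# With pyodbc, `cursor.fast_executemany = True` could be set here (optional).
cursor.executemany("UPDATE myTable SET myColumn = ? WHERE id = ?", params)
conn.commit()

updated_rows = conn.execute(
    "SELECT id, myColumn FROM myTable ORDER BY id"
).fetchall()
cursor.close()
conn.close()
```

Building the parameters with itertuples rather than iterrows avoids creating a Series object for every row, which tends to matter once the table has more than 100,000 rows.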
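The row-by-row loop in the first snippet executes one SQL UPDATE per row, which can be slow for large datasets. For a more efficient bulk update, you may want to consider pandas' to_sql method. It writes a DataFrame directly to a SQL table, which is faster and more direct than updating each row individually.
However, to_sql has its subtleties. It only inserts rows, and its if_exists parameter ('fail', 'replace', or 'append') controls what happens when the target table already exists; it cannot update existing rows in place. So what you can do is write the modified DataFrame to a temporary table in the database, run an UPDATE on the actual target table from that temporary table, and finally drop the temporary table. Here is an implementation of this approach: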

import pandas as pd
from sqlalchemy import create_engine, text

# Connect to the database using SQLAlchemy (needed for to_sql)
# Replace <connection_stuff> with your database details
engine = create_engine('mssql+pyodbc://<connection_stuff>')

# Read the table and add the new values to the DataFrame
query = "SELECT * FROM myTable"
df = pd.read_sql(query, engine)
myNewValueList = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]  # needs one value per row
df['newColumnValues'] = myNewValueList

# Write the modified DataFrame to a temporary table
temp_table_name = 'temp_myTable'
df.to_sql(temp_table_name, engine, if_exists='replace', index=False)

# engine.begin() opens a transaction and commits it when the block exits
with engine.begin() as conn:
    # Update the original table from the temporary table (T-SQL UPDATE ... FROM)
    # SQLAlchemy 2.x requires raw SQL strings to be wrapped in text()
    update_query = text(f"""
    UPDATE myTable
    SET myColumn = temp.newColumnValues
    FROM myTable
    INNER JOIN {temp_table_name} AS temp
    ON myTable.id = temp.id  -- Assuming 'id' is the unique identifier
    """)
    conn.execute(update_query)

    # Drop the temporary table
    conn.execute(text(f"DROP TABLE {temp_table_name}"))


Note that this approach creates a temporary table, which may require additional permissions depending on your database setup.
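To try the temporary-table pattern end to end without a SQL Server instance, here is a minimal sketch against an in-memory SQLite database. The SQLite stand-in, the `id` key column, and the five sample rows are all assumptions made for illustration. The UPDATE uses a correlated subquery instead of the T-SQL UPDATE ... FROM join, because older SQLite versions do not support that syntax.

```python
import sqlite3

import pandas as pd

# Assumption: in-memory SQLite stands in for the real database in this sketch.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, myColumn INTEGER)")
db.executemany(
    "INSERT INTO myTable (id, myColumn) VALUES (?, ?)",
    [(i, 0) for i in range(1, 6)],  # hypothetical sample rows
)
db.commit()

# Read the table and attach the new values.
frame = pd.read_sql("SELECT * FROM myTable", db)
frame["newColumnValues"] = [11, 12, 13, 14, 15]

# Stage only the key and the new values; the other columns are not needed.
frame[["id", "newColumnValues"]].to_sql(
    "temp_myTable", db, if_exists="replace", index=False
)

# One set-based UPDATE driven by the staging table.
db.execute(
    """
    UPDATE myTable
    SET myColumn = (
        SELECT t.newColumnValues FROM temp_myTable AS t WHERE t.id = myTable.id
    )
    WHERE id IN (SELECT id FROM temp_myTable)
    """
)
db.execute("DROP TABLE temp_myTable")
db.commit()

final_rows = db.execute("SELECT id, myColumn FROM myTable ORDER BY id").fetchall()
leftover = db.execute(
    "SELECT name FROM sqlite_master WHERE name = 'temp_myTable'"
).fetchall()
db.close()
```

Staging only the key and the changed column keeps the temporary table small, which may help when the source table is wide.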
