SQL Server Delete records from table before writing dataframe - pyspark

jchrr9hc posted on 2023-05-16 in Spark

I'm trying to delete the records from my table before writing data into it from a dataframe. It's not working for me... What am I doing wrong?

Goal: run "delete from xx_files_tbl" before writing the new dataframe to the table.
 
query = "(delete from xx_files_tbl)"
spark.write.format("jdbc")\
            .option("url", "jdbc:sqlserver://"+server+":1433;databaseName="+db_name)\
            .option("driver", driver_name)\
            .option("dbtable", query)\
            .option("user", user)\
            .option("password", password)\
            .option("truncate", "true")\
            .save()

Thanks.

aor9mmx1 #1

Instead of deleting the data in the SQL Server table before writing your dataframe, you can write your dataframe directly with .mode("overwrite") and .option("truncate", "true").

https://learn.microsoft.com/en-us/sql/big-data-cluster/spark-mssql-connector?view=sql-server-ver15
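For example, a minimal sketch, assuming df is the DataFrame you want to write and that server, db_name, driver_name, user, and password are defined as in the question:

# Overwrite with truncate: Spark truncates the existing table instead of
# dropping and recreating it, then inserts the new rows.
df.write.format("jdbc")\
    .option("url", "jdbc:sqlserver://" + server + ":1433;databaseName=" + db_name)\
    .option("driver", driver_name)\
    .option("dbtable", "xx_files_tbl")\
    .option("user", user)\
    .option("password", password)\
    .option("truncate", "true")\
    .mode("overwrite")\
    .save()

With mode("overwrite") plus truncate=true, Spark issues a TRUNCATE TABLE rather than a DROP TABLE, so the table's schema and indexes are preserved.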

bf1o4zei #2

The Spark documentation says that dbtable is used to pass the table that should be read from or written to. A subquery in a FROM clause can be used only when reading data with the JDBC connector. (Resource: https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html )

My suggestion would be either to use the overwrite write mode or to open a separate connection for the deletion. Spark is not required for deleting data: it is enough to use a Python database connector such as pyodbc, or to open a separate JDBC connection to the SQL Server instance. A sketch of the second option is below.
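A minimal sketch of deleting over a separate connection, assuming pyodbc and Microsoft's ODBC driver are installed on the machine running the Spark driver (variable names mirror the question):

import pyodbc

# Plain database connection on the driver; Spark is not involved in the delete.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=" + server + ";DATABASE=" + db_name + ";"
    "UID=" + user + ";PWD=" + password
)
cursor = conn.cursor()
cursor.execute("delete from xx_files_tbl")
conn.commit()
conn.close()

After the delete, write the DataFrame with dbtable set to the bare table name and mode("append").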

oknrviil #3

Executing DML operations from PySpark has always been a limitation. But I created a simple stored procedure in SQL Server that accepts any DML statement as a parameter, and I call that procedure from PySpark to run DML operations against SQL Server. It has been working fine for me so far.

CREATE PROCEDURE DBO.dml_operations (@query varchar(2500))
AS
BEGIN
    SET NOCOUNT ON;

    -- PRINT(@query) if you want to see how the query is passed to the procedure.
    EXEC(@query)
    SELECT 0

END
GO

DECLARE @query varchar(2500)
SET @query = 'update << my table >> set << my field >> = 4.33 where << char field >> = ''Something'''

EXEC DBO.dml_operations @query

There are a few different ways to run stored procedures from PySpark; one of them is sketched below.
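A hedged sketch, assuming the SQL Server JDBC driver jar is already on the Spark driver's classpath and that server, db_name, user, and password are defined as in the question. Note that spark.sparkContext._jvm is an internal PySpark attribute, so treat this as a workaround rather than a supported API:

# Call the stored procedure over a plain JDBC connection opened from the
# driver's JVM; DBO.dml_operations is the procedure created above.
url = "jdbc:sqlserver://" + server + ":1433;databaseName=" + db_name
jvm = spark.sparkContext._jvm
conn = jvm.java.sql.DriverManager.getConnection(url, user, password)
try:
    stmt = conn.createStatement()
    stmt.execute("EXEC DBO.dml_operations 'delete from xx_files_tbl'")
    stmt.close()
finally:
    conn.close()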

ckocjqey #4

You cannot delete data this way, as DataFrames are immutable. You can apply a filter to create a new DataFrame and write that to your location. Something like this should help, I think:

newdf = spark.sql("select * from xx_files_tbl where value <= 1")
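To finish the thought, a sketch reusing the JDBC options from the question, and assuming xx_files_tbl is a view over the JDBC table: materialize the filtered DataFrame first, since it is derived from the very table you are about to overwrite, then write it back with overwrite plus truncate.

# Materialize newdf before overwriting its source table; otherwise the lazy
# read could still be pulling rows while the table is truncated.
newdf.cache()
newdf.count()

newdf.write.format("jdbc")\
    .option("url", "jdbc:sqlserver://" + server + ":1433;databaseName=" + db_name)\
    .option("driver", driver_name)\
    .option("dbtable", "xx_files_tbl")\
    .option("user", user)\
    .option("password", password)\
    .option("truncate", "true")\
    .mode("overwrite")\
    .save()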
