SQL Server Delete records from table before writing dataframe - pyspark

jchrr9hc posted on 2023-05-16 in Spark

I'm trying to delete the records from my table before writing data into it from a dataframe. It's not working for me... What am I doing wrong?

Goal: run "delete from xx_files_tbl" before writing the new dataframe to the table.
 
query = "(delete from xx_files_tbl)"
spark.write.format("jdbc")\
            .option("url", "jdbc:sqlserver://"+server+":1433;databaseName="+db_name)\
            .option("driver", driver_name)\
            .option("dbtable", query)\
            .option("user", user)\
            .option("password", password)\
            .option("truncate", "true")\
            .save()

Thanks.

aor9mmx1 #1

Instead of deleting the data in the SQL Server table before writing your dataframe, you can write your dataframe directly with .mode("overwrite") and .option("truncate", "true").

https://learn.microsoft.com/en-us/sql/big-data-cluster/spark-mssql-connector?view=sql-server-ver15
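For example, a minimal sketch, assuming df is the DataFrame you want to write and that server, db_name, driver_name, user, and password are defined as in the question:

# Overwrite with truncate: Spark truncates the existing table instead of
# dropping and recreating it, then inserts the new rows.
df.write.format("jdbc")\
    .option("url", "jdbc:sqlserver://" + server + ":1433;databaseName=" + db_name)\
    .option("driver", driver_name)\
    .option("dbtable", "xx_files_tbl")\
    .option("user", user)\
    .option("password", password)\
    .option("truncate", "true")\
    .mode("overwrite")\
    .save()

With mode("overwrite") plus truncate=true, Spark issues a TRUNCATE TABLE rather than a DROP TABLE, so the table's schema and indexes are preserved.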

bf1o4zei #2

The Spark documentation says that dbtable is used to pass the table that should be read from or written to. A subquery in a FROM clause can be used only when reading data with the JDBC connector. (Resource: https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html )

My suggestion would be either to use the overwrite write mode or to open a separate connection for the deletion. Spark is not required for deleting data: it is enough to use a Python database connector such as pyodbc, or to open a separate JDBC connection to the SQL Server instance. A sketch of the second option is below.
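A minimal sketch of deleting over a separate connection, assuming pyodbc and Microsoft's ODBC driver are installed on the machine running the Spark driver (variable names mirror the question):

import pyodbc

# Plain database connection on the driver; Spark is not involved in the delete.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=" + server + ";DATABASE=" + db_name + ";"
    "UID=" + user + ";PWD=" + password
)
cursor = conn.cursor()
cursor.execute("delete from xx_files_tbl")
conn.commit()
conn.close()

After the delete, write the DataFrame with dbtable set to the bare table name and mode("append").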

oknrviil #3

Executing DML operations from PySpark has always been a limitation. But I created a simple stored procedure in SQL Server that accepts any DML statement as a parameter, and I call that procedure from PySpark to run DML operations against SQL Server. It has been working fine for me so far.

CREATE PROCEDURE DBO.dml_operations (@query varchar(2500))
AS
BEGIN
    SET NOCOUNT ON;

    -- PRINT(@query) if you want to see how the query is passed to the procedure.
    EXEC(@query)
    SELECT 0

END
GO

DECLARE @query varchar(2500)
SET @query = 'update << my table >> set << my field >> = 4.33 where << char field >> = ''Something'''

EXEC DBO.dml_operations @query

There are a few different ways to run stored procedures from PySpark; one of them is sketched below.
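A hedged sketch, assuming the SQL Server JDBC driver jar is already on the Spark driver's classpath and that server, db_name, user, and password are defined as in the question. Note that spark.sparkContext._jvm is an internal PySpark attribute, so treat this as a workaround rather than a supported API:

# Call the stored procedure over a plain JDBC connection opened from the
# driver's JVM; DBO.dml_operations is the procedure created above.
url = "jdbc:sqlserver://" + server + ":1433;databaseName=" + db_name
jvm = spark.sparkContext._jvm
conn = jvm.java.sql.DriverManager.getConnection(url, user, password)
try:
    stmt = conn.createStatement()
    stmt.execute("EXEC DBO.dml_operations 'delete from xx_files_tbl'")
    stmt.close()
finally:
    conn.close()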

ckocjqey #4

You cannot delete data this way, as DataFrames are immutable. You can apply a filter to create a new DataFrame and write that to your location. Something like this should help, I think:

newdf = spark.sql("select * from xx_files_tbl where value <= 1")
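To finish the thought, a sketch reusing the JDBC options from the question, and assuming xx_files_tbl is a view over the JDBC table: materialize the filtered DataFrame first, since it is derived from the very table you are about to overwrite, then write it back with overwrite plus truncate.

# Materialize newdf before overwriting its source table; otherwise the lazy
# read could still be pulling rows while the table is truncated.
newdf.cache()
newdf.count()

newdf.write.format("jdbc")\
    .option("url", "jdbc:sqlserver://" + server + ":1433;databaseName=" + db_name)\
    .option("driver", driver_name)\
    .option("dbtable", "xx_files_tbl")\
    .option("user", user)\
    .option("password", password)\
    .option("truncate", "true")\
    .mode("overwrite")\
    .save()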
