I'm trying to delete the records from my table before writing data into it from a dataframe. It's not working for me. What am I doing wrong?
Goal: run "delete from xx_files_tbl" before writing the new dataframe to the table.
query = "(delete from xx_files_tbl)"
spark.write.format("jdbc")\
.option("url", "jdbc:sqlserver://"+server+":1433;databaseName="+db_name)\
.option("driver", driver_name)\
.option("dbtable", query)\
.option("user", user)\
.option("password", password)\
.option("truncate", "true")\
.save()
Thanks.
4 Answers
aor9mmx11#
Instead of deleting the data in the SQL Server table before writing your dataframe, you can write the dataframe directly with .mode("overwrite") and .option("truncate", "true").
https://learn.microsoft.com/en-us/sql/big-data-cluster/spark-mssql-connector?view=sql-server-ver15
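For example, a minimal sketch assuming df is the DataFrame you want to write and the same connection variables (server, db_name, driver_name, user, password) as in the question:

# Overwrite plus truncate keeps the existing table definition and replaces its rows,
# so no separate delete statement is needed.
df.write.format("jdbc") \
    .option("url", "jdbc:sqlserver://" + server + ":1433;databaseName=" + db_name) \
    .option("driver", driver_name) \
    .option("dbtable", "xx_files_tbl") \
    .option("user", user) \
    .option("password", password) \
    .option("truncate", "true") \
    .mode("overwrite") \
    .save()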
bf1o4zei2#
The Spark documentation says that dbtable is used to pass the table that should be read from or written into; a FROM clause can only be used when reading data with the JDBC connector (resource: https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html ). My suggestion would be either to use the overwrite write mode or to open a separate connection for the deletion. Spark is not required to delete data from SQL Server; a plain Python database connector or a separate JDBC connection is enough, as sketched below.
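For illustration, a minimal sketch of the separate-connection approach using pyodbc (the pyodbc package and the ODBC driver name are my assumptions, not part of the original answer), reusing the connection variables from the question:

import pyodbc

# Delete the old rows over a plain database connection; the dataframe can then be
# written with Spark using mode("append") as usual.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=" + server + ";DATABASE=" + db_name + ";"
    "UID=" + user + ";PWD=" + password
)
cursor = conn.cursor()
cursor.execute("DELETE FROM xx_files_tbl")
conn.commit()
conn.close()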
oknrviil3#
Executing DML operations is a long-standing limitation in PySpark. But I created a simple stored procedure in SQL Server that accepts any DML statement as a parameter, and I call that procedure from PySpark to run DML operations against SQL Server. It has been working fine for me so far.
CREATE PROCEDURE DBO.dml_operations (@query varchar(2500)) AS
BEGIN
    SET NOCOUNT ON;
    EXEC (@query);  -- run the DML statement passed in by the caller
END
GO
DECLARE @query varchar(2500)
SET @query = 'update <> set << my field >> = 4.33 where << char field >> = ''Something'''
EXEC DBO.dml_operations @query
There are a few different ways to run stored procedures from PySpark, one of which is sketched below; let me know if you want more detail on running stored procs from PySpark.
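For example, one way to call such a procedure from PySpark is through the JVM's java.sql.DriverManager via py4j. This is a sketch under my own assumptions (the SQL Server JDBC driver is on the Spark driver's classpath, and the connection variables from the question are reused); it is not the original answerer's exact code:

# Open a plain JDBC connection through the gateway JVM and call the stored procedure.
jdbc_url = "jdbc:sqlserver://" + server + ":1433;databaseName=" + db_name
driver_manager = spark.sparkContext._gateway.jvm.java.sql.DriverManager
conn = driver_manager.getConnection(jdbc_url, user, password)
try:
    stmt = conn.prepareCall("{call DBO.dml_operations(?)}")
    stmt.setString(1, "delete from xx_files_tbl")  # any DML statement as a string
    stmt.execute()
    stmt.close()
finally:
    conn.close()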
ckocjqey4#
You cannot delete the data, as DataFrames are immutable. You can apply a filter, create a new DataFrame, and write it to your location. Something like this should help, I think:
newdf=spark.sql("select * from xx_files_tbl WHERE value <= 1")
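For completeness, a one-line sketch of the write step that follows (the output path is a placeholder, not from the original answer):

newdf.write.mode("overwrite").parquet("/mnt/output/xx_files_tbl")  # or reuse the JDBC options shown above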