I have a source table, and I want to update/insert its data into an output table based on the scenarios below.
Source table:
Name|age|dept|sal |school|college|deg|blood_group
aaa |10 |ece |1000|svv |sas |be |0+
bbb |20 |it |2000|svv |sas |be |A+
Scenario 1: If the name, age, dept combination does not exist in the output table, create a new record.
Scenario 2: If the name, age, dept combination exists in the output table and school and college are unchanged, do nothing.
Scenario 3: If the name, age, dept combination exists in the output table and school or college has changed, create a new record.
I want to insert the data into the output table according to the scenarios above, using either Spark SQL or the Spark Scala DataFrame API.
Any suggestions would be appreciated.
1 Answer
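I'm not entirely sure this will work.
Before writing, first read the Hive table in your code, create a table/DataFrame from it, and call it prior_df.
Now join with prior_df. Since you already have the conditions, use withColumn with a when condition to create a new column that marks the "no action/filter" rows. prior_df will give you the previous values.
Write the new df to the Hive table location.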
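The steps above could look roughly like this in Spark Scala. This is only a sketch under a few assumptions, not something tested against a real Hive table: the names `UpsertSketch`, `tagActions`, `in_prior`, `action` and `db.output_table` are made up for illustration, the prior rows and the `ccc` row are invented sample data, and a source row counts as "unchanged" when some row in prior_df has the same name, age, dept, school and college. Joining on all five columns (instead of only the three key columns) keeps the result sane once the output table holds several versions of the same key.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object UpsertSketch {
  // Columns that identify a record, and columns whose change triggers a new record.
  val keyCols: Seq[String] = Seq("Name", "age", "dept")
  val trackedCols: Seq[String] = Seq("school", "college")

  // Hypothetical helper: tags every source row with action = "insert" or "no_action".
  def tagActions(sourceDf: DataFrame, priorDf: DataFrame): DataFrame = {
    val matchCols = keyCols ++ trackedCols

    // One marker row per distinct (key, school, college) already in the output table.
    val seen = priorDf
      .select(matchCols.map(col): _*)
      .distinct()
      .withColumn("in_prior", lit(true))

    // Null-safe equality, so null school/college values still compare as equal.
    val joinCond = matchCols.map(c => sourceDf(c) <=> seen(c)).reduce(_ && _)

    val outCols = sourceDf.columns.map(c => sourceDf(c)) :+ col("action")

    sourceDf
      .join(seen, joinCond, "left")
      // No match: key is new (scenario 1) or school/college changed (scenario 3).
      // Match: key exists with the same school/college (scenario 2).
      .withColumn("action",
        when(col("in_prior").isNull, lit("insert")).otherwise(lit("no_action")))
      .select(outCols: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("upsert-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val cols = Seq("Name", "age", "dept", "sal", "school", "college", "deg", "blood_group")

    // In a real job this would be something like spark.table("db.output_table").
    // The rows below are invented sample data.
    val priorDf = Seq(
      ("aaa", 10, "ece", 1000, "svv", "sas", "be", "0+"),
      ("bbb", 20, "it", 2000, "xyz", "sas", "be", "A+")
    ).toDF(cols: _*)

    // First two rows are from the question; "ccc" is an invented row for scenario 1.
    val sourceDf = Seq(
      ("aaa", 10, "ece", 1000, "svv", "sas", "be", "0+"),
      ("bbb", 20, "it", 2000, "svv", "sas", "be", "A+"),
      ("ccc", 30, "cse", 3000, "svv", "sas", "be", "B+")
    ).toDF(cols: _*)

    val tagged = tagActions(sourceDf, priorDf)
    tagged.show(false)

    // Keep only the rows that need a new record and drop the helper column.
    val toInsert = tagged.filter(col("action") === "insert").drop("action")
    toInsert.show(false)

    // Appending to Hive is left commented out because the table name is hypothetical:
    // toInsert.write.mode("append").insertInto("db.output_table")

    spark.stop()
  }
}
```

With this sample data, `aaa` is tagged `no_action` (scenario 2), `bbb` is tagged `insert` because its school differs from the prior row (scenario 3), and `ccc` is tagged `insert` because its key is new (scenario 1). If you only need the rows to append and not the tag column, a `left_anti` join on the same five columns should give the same `toInsert` result with less code.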