我有一个分区配置单元表,其中包含一些记录。具有列(名称、城市、性别、联系人号码)的分区列(加载日期)
以及一个csv文件和一些其他列记录(姓名、城市、性别、联系人号码、加载日期)
现在我想用一些条件将csv中的新记录添加到现有配置单元表-->>i.如果存在新记录,则直接添加到配置单元表-->>ii.如果contactnumber相同,则更新该记录并保存在配置单元中。
''代码在下面,我试过了
from pyspark.sql import SparkSession
sparkHive=SparkSession.builder.enableHiveSupport().getOrCreate()
//Reading Hive Table as Dataframe
df=sparkHive.table("employee_details_p")
//Reading CSV for new records
df2=sparkHive.read.format("csv").load(filepath)
df2.write.format("orc").saveAsTable("new_details")
sparkHive.sql('''merge into employee_details_p
using new_details on employee_details_p.Contact =
new_details.Contact
when matched then update set
Name=new_details.Name,
City=new_details.City
Gender=new_details.Gender
when not matched then insert
values(new_details.Name, new_details.City,
new_details.Gender,
new_details.Contact,
new_details.Start_Date,new_details.End_Date,
new_details.Load_Date)_''')
'''1.示例分区配置单元数据
Name,City,Gender,Contact,Load_Date
Rajesh,Pune,M,8585655456,2020-01-27
Sandeep,Akola,M,9795969598,2020-01-27
Briana,Munster,F,9856321452,2020-01-27
Aniket,Lonavala,F,7502123321,2020-01-27
2.新的csv文件数据,我想添加上述条件
Name,City,Gender,Contact,Load_Date
Rajesh,Mumbai,M,8585655456,2020-01-30
Ajikya,Nagpur,M,7896512568,2020-01-30
3.预期产出
Name,City,Gender,Contact,Load_Date
Rajesh,Mumbai,M,8585655456,2020-01-30 -->Updated City from new entry(Contact is same)
Sandeep,Akola,M,9795969598,2020-01-27
Briana,Munster,F,9856321452,2020-01-27
Aniket,Lonavala,F,7502123321,2020-01-27
Ajikya,Nagpur,M,7896512568,2020-01-30 -->Complete New record Added directly
错误parseexception:“\nmismatched input'merge'应为{'(','select','from','add','desc','with','values','create','table','insert','delete','description','show','use','drop','alter','map','set','reset','start','commit','rollback','reduce,'list'、'revoke'、'grant'、'lock'、'unlock'、'msck'、'export'、'import'、'load'}(第1行,位置0)\n\n==sql==\n合并到employee\u details\n^^^^\n使用employee\u details上的employee\u details\u p.contact==employee\u details.contact\n匹配后更新set\n name=employee\u details.name,\n city=employee\u details.city\n gender=employee\u details.gender\n如果不匹配,则插入值(employee\u details.name,employee\u details.city,employee\u details.gender,\n employee\u details.contact,employee\u details.start\u date,employee\u details.end\u date,employee\u details
暂无答案!
目前还没有任何答案,快来回答吧!