在现有配置单元分区表中添加新记录:pyspark

ezykj2lf  于 2021-06-25  发布在  Hive
关注(0)|答案(0)|浏览(190)

我有一个分区配置单元表,其中包含一些记录。具有列(名称、城市、性别、联系人号码)的分区列(加载日期)
以及一个csv文件和一些其他列记录(姓名、城市、性别、联系人号码、加载日期)
现在我想用一些条件将csv中的新记录添加到现有配置单元表-->>i.如果存在新记录,则直接添加到配置单元表-->>ii.如果contactnumber相同,则更新该记录并保存在配置单元中。
''代码在下面,我试过了

from pyspark.sql import SparkSession
    sparkHive=SparkSession.builder.enableHiveSupport().getOrCreate()
    //Reading Hive Table as Dataframe
    df=sparkHive.table("employee_details_p")   
    //Reading CSV for new records 
    df2=sparkHive.read.format("csv").load(filepath)
    df2.write.format("orc").saveAsTable("new_details")

   sparkHive.sql('''merge into employee_details_p
   using new_details on employee_details_p.Contact = 
   new_details.Contact
   when matched then update set
   Name=new_details.Name,
   City=new_details.City
   Gender=new_details.Gender
   when not matched then insert
   values(new_details.Name, new_details.City, 
   new_details.Gender,
   new_details.Contact, 
   new_details.Start_Date,new_details.End_Date,
   new_details.Load_Date)_''')

'''1.示例分区配置单元数据

Name,City,Gender,Contact,Load_Date
Rajesh,Pune,M,8585655456,2020-01-27        
Sandeep,Akola,M,9795969598,2020-01-27
Briana,Munster,F,9856321452,2020-01-27
Aniket,Lonavala,F,7502123321,2020-01-27

2.新的csv文件数据,我想添加上述条件

Name,City,Gender,Contact,Load_Date
Rajesh,Mumbai,M,8585655456,2020-01-30
Ajikya,Nagpur,M,7896512568,2020-01-30

3.预期产出

Name,City,Gender,Contact,Load_Date
Rajesh,Mumbai,M,8585655456,2020-01-30      -->Updated City from new entry(Contact is same)
Sandeep,Akola,M,9795969598,2020-01-27
Briana,Munster,F,9856321452,2020-01-27
Aniket,Lonavala,F,7502123321,2020-01-27
Ajikya,Nagpur,M,7896512568,2020-01-30    -->Complete New record Added directly

错误parseexception:“\nmismatched input'merge'应为{'(','select','from','add','desc','with','values','create','table','insert','delete','description','show','use','drop','alter','map','set','reset','start','commit','rollback','reduce,'list'、'revoke'、'grant'、'lock'、'unlock'、'msck'、'export'、'import'、'load'}(第1行,位置0)\n\n==sql==\n合并到employee\u details\n^^^^\n使用employee\u details上的employee\u details\u p.contact==employee\u details.contact\n匹配后更新set\n name=employee\u details.name,\n city=employee\u details.city\n gender=employee\u details.gender\n如果不匹配,则插入值(employee\u details.name,employee\u details.city,employee\u details.gender,\n employee\u details.contact,employee\u details.start\u date,employee\u details.end\u date,employee\u details

暂无答案!

目前还没有任何答案,快来回答吧!

相关问题