Duplicate keys in JSON: how to keep only one of them

yk9xbfzb  posted on 2021-06-26  in  Hive

While loading a file from Hive, I get the following exception:
pyspark.sql.utils.AnalysisException: u'Duplicate column(s) : "department_name" found, cannot save to JSON format;
The code is:

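from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = SparkConf().setAppName('pyspark')
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
# Read the JSON file (one record per line) into a DataFrame
result = sqlContext.read.json("path/department_dup_key.json")
# Register the DataFrame as a temporary table and query it
result.registerTempTable("djson")
result_set = sqlContext.sql("select * from djson").collect()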

The contents of "department_dup_key.json" are:

{"department_id":7,"department_name":"golf"}
{"department_id":8,"department_name":"apparel"}
{"department_id":9,"department_name":"fitness"}
{"department_id":10,"department_name":"testing","department_name":"Hellloooo"}

Is it possible to ignore the second "department_name" when reading the file into a DataFrame?
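One possible approach, offered only as a sketch: deduplicate the keys of each record in plain Python before Spark infers the schema. Python's json.loads accepts an object_pairs_hook that receives the key/value pairs in file order, so a hook can keep the first value seen for each key. The helper names keep_first and parse_keep_first below are hypothetical and not part of PySpark.

```python
import json

def keep_first(pairs):
    # pairs is the ordered list of (key, value) tuples from one JSON object;
    # setdefault stores a key only the first time it appears, so later
    # duplicates are ignored.
    result = {}
    for key, value in pairs:
        result.setdefault(key, value)
    return result

def parse_keep_first(line):
    # Hypothetical helper: parse one JSON record, keeping the first
    # occurrence of any duplicated key.
    return json.loads(line, object_pairs_hook=keep_first)

line = '{"department_id":10,"department_name":"testing","department_name":"Hellloooo"}'
record = parse_keep_first(line)
print(record)  # {'department_id': 10, 'department_name': 'testing'}
```

Assuming the file has one JSON object per line, the cleaned records could then be handed back to Spark, for example by reading the file with sc.textFile("path/department_dup_key.json"), mapping each line through parse_keep_first and json.dumps, and passing the resulting RDD of strings to sqlContext.read.json. This has not been tested against the setup in the question.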
