When loading a file from Hive, I get the following exception:
pyspark.sql.utils.AnalysisException: u'Duplicate column(s) : "department_name" found, cannot save to JSON format;'
The code is:
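from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = SparkConf().setAppName('pyspark')
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
# Read the JSON Lines file; its last record contains a duplicate key
result = sqlContext.read.json("path/department_dup_key.json")
result.registerTempTable("djson")
result_set = sqlContext.sql("select * from djson").collect()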
The contents of the file "department_dup_key.json" are:
{"department_id":7,"department_name":"golf"}
{"department_id":8,"department_name":"apparel"}
{"department_id":9,"department_name":"fitness"}
{"department_id":10,"department_name":"testing","department_name":"Hellloooo"}
Is it possible to ignore the second "department_name" when reading the file into a DataFrame?
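One possible workaround, offered only as a rough sketch and not as a confirmed fix: clean each line with Python's standard json module before Spark infers the schema. The helper names keep_first and dedupe_line are hypothetical and were made up for this illustration. The sketch assumes the file holds one JSON object per line, as shown above, and that the first occurrence of a repeated key is the one to keep.

```python
import json

def keep_first(pairs):
    """object_pairs_hook that keeps the first value seen for each key."""
    record = {}
    for key, value in pairs:
        # setdefault leaves an existing key untouched, so later duplicates are dropped
        record.setdefault(key, value)
    return record

def dedupe_line(line):
    """Parse one JSON line and re-serialize it without duplicate keys."""
    return json.dumps(json.loads(line, object_pairs_hook=keep_first))

line = '{"department_id":10,"department_name":"testing","department_name":"Hellloooo"}'
cleaned = dedupe_line(line)
print(cleaned)

# Assumed Spark usage, reusing sc and sqlContext from the question (not run here):
# rdd = sc.textFile("path/department_dup_key.json").map(dedupe_line)
# result = sqlContext.read.json(rdd)
```

Without the hook, json.loads keeps the last value for a repeated key, so dropping object_pairs_hook would retain "Hellloooo" instead of "testing".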