How do I remove the double quotes from column names when saving a DataFrame as CSV in Spark?

wtlkbnrh asked on 2021-05-27 in Spark

I am saving a Spark DataFrame to a CSV file. All the records are written with double quotes, which is what I want, but the column names are also wrapped in double quotes. Can you help me remove the quotes from the column names?

Example:

"Source_System"|"Date"|"Market_Volume"|"Volume_Units"|"Market_Value"|"Value_Currency"|"Sales_Channel"|"Competitor_Name"
"IMS"|"20080628"|"183.0"|"16470.0"|"165653.256349"|"AUD"|"AUSTRALIA HOSPITAL"|"PFIZER"

Desired output:

Source_System|Date|Market_Volume|Volume_Units|Market_Value|Value_Currency|Sales_Channel|Competitor_Name
"IMS"|"20080628"|"183.0"|"16470.0"|"165653.256349"|"AUD"|"AUSTRALIA HOSPITAL"|"PFIZER"

I am using the following code:

df4.repartition(1).write.csv(Output_Path_ASPAC, quote='"', header=True, quoteAll=True, sep='|', mode='overwrite')
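For reference, here is a minimal reproduction of the behaviour (this sketch assumes a local SparkSession; the two-column sample and the output paths are illustrative, not my real data). With `quoteAll=True` the CSV writer quotes every cell, and the header row goes through the same writer, so the column names come out quoted too. Dropping `quoteAll` leaves the header bare, but then values are only quoted when they contain the separator or quote character:

```python
# Illustrative reproduction -- assumes a local SparkSession; sample
# data and output paths are placeholders, not the real job's inputs.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()
df = spark.createDataFrame([("IMS", "20080628")], ["Source_System", "Date"])

# quoteAll=True quotes every cell, header row included:
#   "Source_System"|"Date"
#   "IMS"|"20080628"
df.write.csv("with_quote_all", header=True, quoteAll=True, sep='|',
             mode='overwrite')

# Without quoteAll the header is bare, but the values are then only
# quoted when they contain the separator or quote character:
#   Source_System|Date
#   IMS|20080628
df.write.csv("without_quote_all", header=True, sep='|', mode='overwrite')
```

So dropping `quoteAll` fixes the header but loses the quoting on the values, which is why I still need it.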

a64a0gku #1

I think the only solution is to add the quotes to the column values in the DataFrame before writing to CSV. Example:

```python
df.show()
# +---+----+------+
# | id|name|salary|
# +---+----+------+
# |  1|   a|   100|
# +---+----+------+

from pyspark.sql.functions import col, concat, lit

# Wrap every value in literal double quotes, keeping the column names bare.
cols = [concat(lit('"'), col(c), lit('"')).alias(c) for c in df.columns]
df1 = df.select(*cols)

df1.show()
# +---+----+------+
# | id|name|salary|
# +---+----+------+
# |"1"| "a"| "100"|
# +---+----+------+

# quote='' turns the writer's own quoting off, so the header is written
# bare and the pre-quoted values are not wrapped a second time.
df1.write.csv("tmp4", header=True, sep='|', escape='', quote='',
              mode='overwrite')
```

Output of `cat tmp4/part*`:

id|name|salary
"1"|"a"|"100"
