Spark fails to pick up the delimiter of a CSV file

sq1bmfud · posted 2022-12-15 · in Spark

I have a CSV file that loads correctly with pandas (screenshot: "CSV read by pandas").
But when I read it with PySpark, it comes out wrong (screenshot: "CSV read by PySpark"). What is the problem with the delimiter in Spark, and how can I fix it?


lymgl2op1#

Judging from the posted image, %2C (the URL-encoded form of ,) appears to be your delimiter.
Set delimiter to %2C and enable the header option (note that multi-character delimiters require Spark 3.0 or later):

df = spark.read.option("header", True).option("delimiter", "%2C").csv(path)
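As a quick sanity check without Spark (plain Python, not part of the original answer), the standard library confirms that %2C is just a percent-encoded comma:

```python
# %2C is the percent-encoding of "," -- decode it with the standard library.
from urllib.parse import unquote

encoded = "date%2Copening%2Chigh%2Clow%2Cclose%2Cadjclose%2Cvolume"
print(unquote(encoded))  # date,opening,high,low,close,adjclose,volume
```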

Input CSV file:

date%2Copening%2Chigh%2Clow%2Cclose%2Cadjclose%2Cvolume
2022-12-09%2C100%2C101%2C99%2C99.5%2C99.5%2C10000000
2022-12-09%2C200%2C202%2C199%2C199%2C199.1%2C20000000
2022-12-09%2C300%2C303%2C299%2C299%2C299.2%2C30000000

Output DataFrame:

+----------+-------+----+---+-----+--------+--------+
|date      |opening|high|low|close|adjclose|volume  |
+----------+-------+----+---+-----+--------+--------+
|2022-12-09|100    |101 |99 |99.5 |99.5    |10000000|
|2022-12-09|200    |202 |199|199  |199.1   |20000000|
|2022-12-09|300    |303 |299|299  |299.2   |30000000|
+----------+-------+----+---+-----+--------+--------+
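If Spark is not at hand, the same multi-character split can be sketched in plain Python (a hypothetical snippet inlining the sample data above, not from the answer):

```python
# Parse the %2C-delimited sample by splitting each line on the
# multi-character delimiter, mirroring what Spark does with delimiter="%2C".
sample = """\
date%2Copening%2Chigh%2Clow%2Cclose%2Cadjclose%2Cvolume
2022-12-09%2C100%2C101%2C99%2C99.5%2C99.5%2C10000000
2022-12-09%2C200%2C202%2C199%2C199%2C199.1%2C20000000
2022-12-09%2C300%2C303%2C299%2C299%2C299.2%2C30000000
"""

lines = sample.strip().splitlines()
header = lines[0].split("%2C")                                 # column names
rows = [dict(zip(header, line.split("%2C"))) for line in lines[1:]]

print(header)
print(rows[0])
```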
