I have a CSV file that looks fine when read by pandas ([image: CSV read by pandas]), but when I read it with PySpark it comes out like this ([image: CSV read by PySpark]). What is wrong with the delimiter in Spark, and how do I fix it?
lymgl2op1#
Judging from the posted image, `%2C` (the URL-encoded form of `,`) appears to be your delimiter. Set `delimiter` to `%2C` and enable the `header` option:

df = spark.read.option("header", True).option("delimiter", "%2C").csv(path)
Input CSV file:
date%2Copening%2Chigh%2Clow%2Cclose%2Cadjclose%2Cvolume
2022-12-09%2C100%2C101%2C99%2C99.5%2C99.5%2C10000000
2022-12-09%2C200%2C202%2C199%2C199%2C199.1%2C20000000
2022-12-09%2C300%2C303%2C299%2C299%2C299.2%2C30000000
Output DataFrame:
+----------+-------+----+---+-----+--------+--------+
|date      |opening|high|low|close|adjclose|volume  |
+----------+-------+----+---+-----+--------+--------+
|2022-12-09|100    |101 |99 |99.5 |99.5    |10000000|
|2022-12-09|200    |202 |199|199  |199.1   |20000000|
|2022-12-09|300    |303 |299|299  |299.2   |30000000|
+----------+-------+----+---+-----+--------+--------+
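As a quick sanity check (not part of the original answer), Python's standard-library `urllib.parse.unquote` confirms that `%2C` is just a percent-encoded comma; an alternative fix, sketched below with a hypothetical sample line, is to URL-decode the file contents first and then treat it as an ordinary comma-separated file:

```python
from urllib.parse import unquote

# "%2C" is the percent-encoding of a comma, which is why the file
# parses correctly once the delimiter is set to the literal "%2C".
assert unquote("%2C") == ","

# Hypothetical line in the same shape as the question's file:
encoded = "2022-12-09%2C100%2C101%2C99%2C99.5%2C99.5%2C10000000"
decoded = unquote(encoded)   # decodes every %2C back to a comma
fields = decoded.split(",")
print(fields)  # → ['2022-12-09', '100', '101', '99', '99.5', '99.5', '10000000']
```

Decoding up front can be preferable if other percent-encoded characters (e.g. `%20`) also appear in the data, since setting the delimiter to `%2C` only fixes the commas.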