How to solve this PySpark code block using regexp

Posted by pdkcd3nj on 2023-01-09 in Apache


I have this CSV file.

But when I run my notebook, the regex step shows some errors:

from pyspark.sql.functions import regexp_replace

path="dbfs:/FileStore/df/test.csv"
dff = spark.read.option("header", "true").option("inferSchema", "true").option('multiline', 'true').option('encoding', 'UTF-8').option("delimiter", "‡‡,‡‡").csv(path)

dff.show(truncate=False)
#dffs_headers = dff.dtypes

for i in dffs_headers:
  columnLabel = i[0]
  print(columnLabel)
  newColumnLabel = columnLabel.replace('‡‡','').replace('‡‡','')
  
  dff=dff.withColumn(newColumnLabel,regexp_replace(columnLabel,'^\\‡‡|\\‡‡$','')).drop(newColumnLabel)
  
  if columnLabel != newColumnLabel:
    dff = dff.drop(columnLabel)
    dff.show(truncate=False)

This is the result I get.

Can anyone improve this code? It would be a great help.
The expected output is the header row without the ‡‡ markers, but I got
��Id��,��Version��,��Questionnaire��,��Date��
The second column shows truncated values.
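
One possible reading of the output above, offered as an assumption rather than a diagnosis: the replacement characters would appear if the file were stored in a single-byte encoding such as Windows-1252 while being read as UTF-8. The plain-Python sketch below (no Spark involved; the sample header line and the cp1252 guess are both hypothetical) shows how that mismatch would also stop the "‡‡,‡‡" delimiter from matching, leaving the whole row in one column.

```python
# Assumption: the file was written in Windows-1252 (cp1252), not UTF-8.
# The header line is a made-up sample modelled on the output in the question.
raw = "‡‡Id‡‡,‡‡Version‡‡,‡‡Questionnaire‡‡,‡‡Date‡‡".encode("cp1252")

# Decoding those bytes as UTF-8 cannot succeed for the marker bytes;
# each invalid byte is replaced by U+FFFD, the replacement character.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were written in keeps the markers.
as_cp1252 = raw.decode("cp1252")

delimiter = "‡‡,‡‡"
fields_utf8 = as_utf8.split(delimiter)      # delimiter never matches
fields_cp1252 = as_cp1252.split(delimiter)  # splits into four fields

print(len(fields_utf8), len(fields_cp1252))
```

If that assumption holds, passing the file's actual encoding to the reader's encoding option (for example 'windows-1252') instead of 'UTF-8' may be worth trying before any regex cleanup.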

apache-spark

Source: https://stackoverflow.com/questions/75052606/how-to-solve-this-pyspark-code-block-using-regexp

2 answers

6rqinv9w1#

You need to import the libraries you want to use before you can use them.

Answered 2023-01-09

h7appiyu2#

Here is one:

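from pyspark.sql.functions import regexp_replace

# `spark` is the SparkSession that a Databricks notebook provides by default.
path="dbfs:/FileStore/df/test.csv"
dff = spark.read.option("header", "true").option("inferSchema", "true").option('multiline', 'true').option('encoding', 'UTF-8').option("delimiter", "‡‡,‡‡").csv(path)

# List of (column name, type) pairs. The loop below needs this,
# so the line must not be commented out.
dffs_headers = dff.dtypes

for i in dffs_headers:
  columnLabel = i[0]
  # Column name with the ‡‡ markers removed.
  newColumnLabel = columnLabel.replace('‡‡','')

  # Strip a leading or trailing ‡‡ from the values and store them under the
  # cleaned name. Unlike the question's code, the new column is not dropped.
  dff=dff.withColumn(newColumnLabel,regexp_replace(columnLabel,'^\\‡‡|\\‡‡$',''))

  # Drop the old column only when the name actually changed.
  if columnLabel != newColumnLabel:
    dff = dff.drop(columnLabel)
  dff.show(truncate=False)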
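
The name-cleaning step in the loop above can be tried without a cluster. The following is a minimal plain-Python sketch, assuming the header names look the way they would after splitting on "‡‡,‡‡"; the helper strip_marker and the sample list raw_headers are hypothetical names introduced only for illustration.

```python
import re

MARKER = "‡‡"  # quote marker taken from the delimiter in the question


def strip_marker(label: str, marker: str = MARKER) -> str:
    """Remove one leading and/or one trailing marker; inner text is untouched."""
    escaped = re.escape(marker)
    return re.sub(f"^{escaped}|{escaped}$", "", label)


# Hypothetical header names after splitting a row on "‡‡,‡‡":
# only the first and last fields keep a stray marker.
raw_headers = ["‡‡Id", "Version", "Questionnaire", "Date‡‡"]

# Cleaned names, in the original order.
clean_headers = [strip_marker(h) for h in raw_headers]

# Old name -> new name, only for the names that actually change.
renames = {h: strip_marker(h) for h in raw_headers if strip_marker(h) != h}

print(len(renames), len(clean_headers))
```

On a Spark DataFrame, a mapping like renames could be applied with withColumnRenamed, which keeps the column order. That is a swapped-in alternative to the withColumn-then-drop approach in this answer, not the answerer's own method.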

Answered 2023-01-09
