Spark - replace column value - regex pattern contains a backslash - how to handle it?

xsuvu9jc  posted on 2021-05-19  in  Spark

DataFrame:

+-------------------+-------------------+
|               Desc|   replaced_columns|
+-------------------+-------------------+
|India is my Country|India is my Country|
| Delhi is my Nation| Delhi is my Nation|
| I Love India\Delhi| I Love India\Delhi|
|         I Love USA|         I Love USA|
|I am stay in USA\SA|I am stay in USA\SA|
+-------------------+-------------------+

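"Desc" is the original column in the DataFrame; "replaced_columns" is the column produced after applying a transformation. In the Desc column, I need to replace the value "India\Delhi" with "-". I tried the code below.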

dataDF.withColumn("replaced_columns", regexp_replace(dataDF("Desc"), "India\\Delhi", "-")).show()

It does not replace the value with "-". How can I do this?

h79rfbju #1

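I found three ways to solve the problem above. They assume import org.apache.spark.sql.functions._ and, for the $"..." column syntax, import spark.implicits._ (where spark is your SparkSession):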

// Approach 1: four backslashes in Scala source become \\ for the regex engine, which matches one literal backslash
dataDF.withColumn("replaced_columns", regexp_replace(col("Desc"), "\\\\", "-")).show()
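Why the pattern in the question does not match, and why approach 1 needs four backslashes, can be checked outside Spark. The sketch below is only an illustration under an assumption: it uses plain String.replaceAll on the JVM instead of Spark, on the basis that Spark documents regexp_replace patterns as Java regular expressions. The object and value names are made up for this example.

```scala
import java.util.regex.Pattern

object BackslashEscapingDemo {
  // Sample value from the Desc column; in Scala source "\\" is one backslash character.
  val desc: String = "I Love India\\Delhi"

  // Pattern from the question: the regex engine receives India\Delhi,
  // where \D means "any non-digit", so it looks for "India" + one non-digit + "elhi".
  val questionPattern: String = "India\\Delhi"

  // Escaped pattern: the regex engine receives India\\Delhi,
  // where \\ matches exactly one literal backslash.
  val escapedPattern: String = "India\\\\Delhi"

  // Pattern.quote wraps the text in \Q...\E so every character is taken literally.
  val quotedPattern: String = Pattern.quote("India\\Delhi")

  def main(args: Array[String]): Unit = {
    println(desc.replaceAll(questionPattern, "-")) // unchanged: I Love India\Delhi
    println(desc.replaceAll(escapedPattern, "-"))  // I Love -
    println(desc.replaceAll(quotedPattern, "-"))   // I Love -
  }
}
```

So there are two escaping layers: the Scala string literal halves the backslashes, then the regex engine interprets what is left.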

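// Approach 2: translate maps characters one-for-one and is not a regex, so a single escaped backslash is enough
dataDF.select($"Desc", translate($"Desc", "\\", "-").as("replaced_columns")).show()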

The following targets the specific record you asked about above (in the Desc column, replacing "India\Delhi" with "-"):

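// Approach 3: only rows whose Desc ends with "Delhi" are changed; the backslash becomes "-",
// so "I Love India\Delhi" turns into "I Love India-Delhi" and all other rows are left as they are
dataDF
  .withColumn("replaced_columns",
    when(col("Desc").like("%Delhi"), regexp_replace(col("Desc"), "\\\\", "-"))
      .otherwise(col("Desc")))
  .show()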
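If the goal is to replace the whole value "India\Delhi" with "-", rather than just the backslash, one option is to hand regexp_replace a fully literal pattern. This is a minimal sketch, not a definitive implementation: it assumes a local SparkSession and a DataFrame rebuilt from the sample rows in the question. The object name, the helper replaceIndiaDelhi, and the session settings are hypothetical.

```scala
import java.util.regex.Pattern

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace}

object ReplaceLiteralBackslashValue {
  // Pattern.quote makes the backslash (and everything else) literal for the Java regex engine.
  // Writing "India\\\\Delhi" by hand would be an equivalent pattern.
  val literalPattern: String = Pattern.quote("India\\Delhi")

  // Hypothetical helper: adds replaced_columns with every "India\Delhi" occurrence turned into "-".
  def replaceIndiaDelhi(df: DataFrame): DataFrame =
    df.withColumn("replaced_columns", regexp_replace(col("Desc"), literalPattern, "-"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("replace-backslash-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question; "\\" in Scala source is one backslash in the data.
    val dataDF = Seq(
      "India is my Country",
      "Delhi is my Nation",
      "I Love India\\Delhi",
      "I Love USA",
      "I am stay in USA\\SA"
    ).toDF("Desc")

    // truncate = false so the full column values are printed
    replaceIndiaDelhi(dataDF).show(false)

    spark.stop()
  }
}
```

With this pattern only the third row changes (to "I Love -"); rows without the exact text "India\Delhi" pass through untouched, so no when/otherwise guard is needed.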
