PySpark: get the part of one column's value that is not present in another column

Posted by gudnpqoy on 2023-08-02 in Spark

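I have a DataFrame with the columns forenames and first_name. first_name contains the first string before the first space in forenames, including the hyphen when one appears in that first string.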

+-----------------+------------+
|forenames        |first_name  |
+-----------------+------------+
|IVO KAI ROGERS   |IVO         |
|DYLAN STUART JOHN|DYLAN       |
|JOSH JACK        |JOSH        |
|MONALISA ELIEN   |MONALISA    |
|RACHEL-GREEN JOE |RACHEL-GREEN|
+-----------------+------------+

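I want to create a new column, middle_name, that holds whatever follows the first space, i.e. the part of forenames that is left after removing first_name. The expected output is: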

+-----------------+------------+-----------+
|forenames        |first_name  |middle_name|
+-----------------+------------+-----------+
|IVO KAI ROGERS   |IVO         |KAI ROGERS |
|DYLAN STUART JOHN|DYLAN       |STUART JOHN|
|JOSH JACK        |JOSH        |JACK       |
|MONALISA ELIEN   |MONALISA    |ELIEN      |
|RACHEL-GREEN JOE |RACHEL-GREEN|JOE        |
+-----------------+------------+-----------+

Answer #1, by yeotifhr

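Try the replace function through expr (or selectExpr). Called with only two arguments, replace(str, search) removes every occurrence of search from str.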

Example:

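from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample data: the full forenames and the already extracted first name
df = spark.createDataFrame([('IVO KAI ROGERS','IVO'),('RACHEL-GREEN JOE','RACHEL-GREEN')],['forenames','firstname'])
df.show(10,False)

#+----------------+------------+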
#|forenames       |firstname   |
#+----------------+------------+
#|IVO KAI ROGERS  |IVO         |
#|RACHEL-GREEN JOE|RACHEL-GREEN|
#+----------------+------------+
df.selectExpr("*","replace(forenames,firstname) as middlename").show(10,False)
#+----------------+------------+-----------+
#|forenames       |firstname   |middlename |
#+----------------+------------+-----------+
#|IVO KAI ROGERS  |IVO         | KAI ROGERS|
#|RACHEL-GREEN JOE|RACHEL-GREEN| JOE       |
#+----------------+------------+-----------+
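
One thing to watch with the two-argument replace: it deletes every occurrence of firstname inside forenames, not only the leading one, and it leaves the separating space in place (note the leading space in middlename above). A minimal sketch of a prefix-based alternative follows, assuming firstname is always an exact prefix of forenames. The helper names middle_name_expr and middle_name_py are hypothetical and exist only for illustration; trim, substring and length are standard Spark SQL functions.

```python
def middle_name_expr(forenames_col="forenames", first_name_col="firstname"):
    """Return a Spark SQL expression that drops the first-name prefix.

    Assumption: first_name_col is an exact prefix of forenames_col.
    substring(str, pos) is 1-based, so length(first) + 1 points at the first
    character after the prefix; trim then removes the separating space.
    """
    return f"trim(substring({forenames_col}, length({first_name_col}) + 1))"


def middle_name_py(forenames, first_name):
    """Pure-Python mirror of the expression, handy for checking the logic."""
    return forenames[len(first_name):].strip(" ")


# Hypothetical usage with the DataFrame from the example above:
# df.selectExpr("*", middle_name_expr() + " as middlename").show(10, False)

# The difference from replace-all shows up when the first name recurs:
print(repr("JO JOHN".replace("JO", "")))      # ' HN'  (replace-all style)
print(repr(middle_name_py("JO JOHN", "JO")))  # 'JOHN' (prefix style)
```

If firstname is not guaranteed to be a prefix, this sketch cuts at the wrong position, so the replace-based version may still be the better fit for such data.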


UPDATE:

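Here are two approaches, one using regexp_replace and one using replace. The sample data now has a space after the hyphen in forenames (RACHEL- GREEN JOE).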

# Using regexp_replace: firstname holds a regex here, and \s+ matches the whitespace after the hyphen
df = spark.createDataFrame([('IVO KAI ROGERS','IVO'),('RACHEL- GREEN JOE','RACHEL-\\s+GREEN')],['forenames','firstname'])
df.show(10,False)
df.selectExpr("*","regexp_replace(forenames,firstname,'') as middlename").show(10,False)

# Using replace: turn '-' into '- ' in firstname so that it matches forenames before removing it
df = spark.createDataFrame([('IVO KAI ROGERS','IVO'),('RACHEL- GREEN JOE','RACHEL-GREEN')],['forenames','firstname'])
df.show(10,False)
df.selectExpr("*","replace(forenames,replace(firstname,'-','- ')) as middlename").show(10,False)

# Output of the replace variant:
#+-----------------+------------+
#|forenames        |firstname   |
#+-----------------+------------+
#|IVO KAI ROGERS   |IVO         |
#|RACHEL- GREEN JOE|RACHEL-GREEN|
#+-----------------+------------+
#
#+-----------------+------------+-----------+
#|forenames        |firstname   |middlename |
#+-----------------+------------+-----------+
#|IVO KAI ROGERS   |IVO         | KAI ROGERS|
#|RACHEL- GREEN JOE|RACHEL-GREEN| JOE       |
#+-----------------+------------+-----------+
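
Another way to handle the hyphen-space case is to normalise forenames first and then cut off the prefix, which avoids storing a regex in firstname. The sketch below assumes the only discrepancy between the two columns is one or more spaces after a hyphen. normalized_middle_name_expr and normalized_middle_name_py are hypothetical helper names, not part of PySpark.

```python
import re


def normalized_middle_name_expr(forenames_col="forenames", first_name_col="firstname"):
    """Return a Spark SQL expression for the hyphen-space case.

    Assumption: forenames differs from firstname only by spaces after a
    hyphen. The pattern '- +' contains no backslashes, so it needs no extra
    escaping inside a SQL string literal.
    """
    normalized = f"regexp_replace({forenames_col}, '- +', '-')"
    return f"trim(substring({normalized}, length({first_name_col}) + 1))"


def normalized_middle_name_py(forenames, first_name):
    """Pure-Python mirror of the expression, handy for checking the logic."""
    normalized = re.sub(r"- +", "-", forenames)
    return normalized[len(first_name):].strip(" ")


# Hypothetical usage with the DataFrame from the replace example above:
# df.selectExpr("*", normalized_middle_name_expr() + " as middlename").show(10, False)

print(normalized_middle_name_py("RACHEL- GREEN JOE", "RACHEL-GREEN"))  # JOE
print(normalized_middle_name_py("IVO KAI ROGERS", "IVO"))              # KAI ROGERS
```

Note that the normalisation also collapses a hyphen followed by spaces inside later names, which may or may not be wanted for a given data set.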
