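What is the most efficient way to merge two columns in a Spark DataFrame?
I have two columns that carry the same meaning. Null values in timestamp should be filled with the values from toAppendData_timestamp.
When both columns have a value, the two values are equal.
I have this: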
+--------------------+----------------------+--------+
|           timestamp|toAppendData_timestamp|   value|
+--------------------+----------------------+--------+
|2016-03-24 22:11:...|                  null|    null|
|                null|  2016-03-24 22:12:...|0.015625|
|                null|  2016-03-19 15:54:...|   5.375|
|2016-03-19 15:55:...|  2016-03-19 15:55:...| 5.78125|
|2016-03-19 15:56:...|                  null|    null|
|2016-03-24 22:11:...|  2016-03-24 22:11:...| 0.15625|
+--------------------+----------------------+--------+
I need this:
+--------------------+----------------------+--------+
|    timestamp_merged|toAppendData_timestamp|   value|
+--------------------+----------------------+--------+
|2016-03-24 22:11:...|                  null|    null|
|2016-03-24 22:12:...|  2016-03-24 22:12:...|0.015625|
|2016-03-19 15:54:...|  2016-03-19 15:54:...|   5.375|
|2016-03-19 15:55:...|  2016-03-19 15:55:...| 5.78125|
|2016-03-19 15:56:...|                  null|    null|
|2016-03-24 22:11:...|  2016-03-24 22:11:...| 0.15625|
+--------------------+----------------------+--------+
I tried this, but it did not work:
appendedData = appendedData['timestamp'].fillna(appendedData['toAppendData_timestamp'])
1 answer
2ledvvac1#
The function you are looking for is coalesce. You can import it from pyspark.sql.functions. Usage: