Get the average date value from a PySpark DataFrame

Posted by 0yg35tkg on 2021-05-22 in Spark

I have a DataFrame df with product data and the following schema:

root
 |-- Creator: string (nullable = true)
 |-- Created_datetime: timestamp (nullable = true)
 |-- Last_modified_datetime: timestamp (nullable = true)
 |-- Product_name: string (nullable = true)

The Created_datetime column looks like this:

+-------------------+
|   Created_datetime|
+-------------------+
|2019-10-12 17:09:18|
|2019-12-03 07:02:07|
|2020-01-16 23:10:08|

Now I want to get the average value of the Created_datetime column. How can I do that?

Answer #1 (mqkwyuun)

When you average a timestamp column, Spark returns the average as a numeric Unix timestamp value. Cast it back to a timestamp:

from pyspark.sql import functions as F

# avg() on the timestamp column yields a numeric value; cast it back to a timestamp
df.agg(F.avg("Created_datetime").cast("timestamp").alias("avg_created_datetime")).show()
+--------------------+
|avg_created_datetime|
+--------------------+
| 2019-11-30 23:27:11|
+--------------------+
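
To see what the cast is doing, the same arithmetic can be sketched in plain Python: convert each timestamp to seconds since the epoch, take the mean, and convert the mean back. This is only an illustration, not Spark code. It uses just the three sample rows shown in the question and assumes they are in UTC; the helper names `to_epoch_seconds` and `average_timestamp` are made up for this example. The real DataFrame may hold more rows or use a different session time zone, so the result need not match the Spark output above.

```python
from datetime import datetime, timezone

# The three sample rows shown in the question (assumed to be UTC for this sketch).
samples = [
    "2019-10-12 17:09:18",
    "2019-12-03 07:02:07",
    "2020-01-16 23:10:08",
]


def to_epoch_seconds(text):
    """Parse 'YYYY-MM-DD HH:MM:SS' as UTC and return seconds since the epoch."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def average_timestamp(values):
    """Average the timestamps numerically, then convert the mean back to a datetime."""
    seconds = [to_epoch_seconds(v) for v in values]
    mean_seconds = sum(seconds) / len(seconds)
    return datetime.fromtimestamp(mean_seconds, tz=timezone.utc)


avg_ts = average_timestamp(samples)
print(avg_ts.strftime("%Y-%m-%d %H:%M:%S"))  # 2019-11-30 23:47:11
```

The numeric mean on its own is just a count of seconds; converting it back is what makes the result readable as a date, which is the role `.cast("timestamp")` plays in the Spark answer.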
