How do I extract the year and month from a string column in PySpark?

fhity93d  posted on 2022-10-07  in  Spark

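I have a column that looks like a timestamp but is actually a string column. Its values look like this: 2022-04-01T00:00:00.000+0000. I have tried several approaches and none of them worked. This is what I tried: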

.withColumn("year", year(to_date(col("full_time"), "yyyy-MM-dd")))
.withColumn("year", to_date(col("cycle.start_time"), "yyyy"))

Neither of these worked, so now I am not sure what else to try. Could you help me out?

pkln4tw6  #1

DF

+----------------------------+----+
|date                        |val |
+----------------------------+----+
|2022-04-01T00:00:00.000+0000|24.0|
+----------------------------+----+

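from pyspark.sql.functions import month, to_timestamp, year

# The pattern 'yyyy-MM-dd' does not cover the whole string. On Spark 3.x, if
# parsing fails, enabling the legacy parser may be needed:
# spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")

df = (df4.withColumn("date", to_timestamp("date", "yyyy-MM-dd"))  # coerce the string to a timestamp
      .withColumn("month", month("date"))  # extract the month
      .withColumn("year", year("date"))  # extract the year
      )
df.show(truncate=False)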

Result

+-------------------+----+-----+----+
|date               |val |month|year|
+-------------------+----+-----+----+
|2022-04-01 00:00:00|24.0|4    |2022|
+-------------------+----+-----+----+
