How do I replace or transform a column with Spark functions?

quhf5bfb · posted 2021-05-17 · in Spark
```python
from pyspark.sql.functions import col

data = status.select("data")
df = data.withColumn("addr", col("data.addr")) \
         .withColumn("time", col("data.time"))
```

I want to replace or transform the "time" column into seconds as a bigint. For example: "13days, 23:41" → 13×24×3600 + 23×3600 + 41×60 = 1208460.
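For reference, the expected arithmetic can be checked in plain Python (a minimal sketch using only the sample value above):

```python
# Verify the example conversion: 13 days, 23 hours, 41 minutes.
days, hours, minutes = 13, 23, 41
print(days * 24 * 3600 + hours * 3600 + minutes * 60)  # 1208460
```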


kwvwclae · answer 1#

Use split, then multiply the days by 24*3600, and so on. Example:
```
df.show()

+-------------+
|         time|
+-------------+
|13days, 23:41|
|12days, 22:52|
+-------------+
```

```python
from pyspark.sql.functions import col, split, trim

df.withColumn("tmp", split(col("time"), "days,")[0]) \
  .withColumn("tmp1", split(trim(split(col("time"), "days,")[1]), ":")) \
  .withColumn("time", (col("tmp") * 24 * 3600     # days -> seconds
                       + col("tmp1")[0] * 3600    # hours -> seconds
                       + col("tmp1")[1] * 60      # minutes -> seconds
                      ).cast("long")) \
  .drop("tmp", "tmp1") \
  .show()
```

```
+-------+
|   time|
+-------+
|1208460|
|1119120|
+-------+
```
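As a variation (not part of the original answer), the same parsing could be done with a single regular expression via regexp_extract. This is a minimal sketch that assumes every value matches the "Ndays, HH:MM" pattern shown above:

```python
from pyspark.sql.functions import col, regexp_extract

# Assumed format: "<days>days, <hours>:<minutes>", as in the sample data.
pattern = r"(\d+)days,\s*(\d+):(\d+)"

df.withColumn(
    "time",
    regexp_extract(col("time"), pattern, 1).cast("long") * 24 * 3600
    + regexp_extract(col("time"), pattern, 2).cast("long") * 3600
    + regexp_extract(col("time"), pattern, 3).cast("long") * 60,
).show()
```

One regex avoids the temporary columns, at the cost of evaluating the pattern three times per row.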
