data = status.select("data")
df = data.withColumn("addr", col("data.addr")) \
    .withColumn("time", col("data.time"))
I want to replace or convert the "time" column to seconds (bigint). For example: 13days, 23:41 → 13 × 3600 × 24 + 23 × 3600 + 41 × 60 → 1208460
1 answer
Use `split`, then multiply the number of days by 24*3600, and so on. Example:
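To make the arithmetic concrete first, here is a minimal plain-Python sketch of the same split-and-multiply steps on a single string. It assumes the value always looks like `<days>days, <hours>:<minutes>`, as in the question; the helper name `to_seconds` is hypothetical and used only for illustration.

```python
def to_seconds(text: str) -> int:
    """Convert a string such as '13days, 23:41' (days, hours:minutes) to seconds."""
    # Split off the day count on the literal "days," separator.
    days_part, clock_part = text.split("days,")
    # Trim the leading space, then split "HH:MM" on the colon.
    hours_part, minutes_part = clock_part.strip().split(":")
    return int(days_part) * 24 * 3600 + int(hours_part) * 3600 + int(minutes_part) * 60


print(to_seconds("13days, 23:41"))  # 1208460
print(to_seconds("12days, 22:52"))  # 1119120
```

The PySpark code below applies the same steps column-wise, using `split` and `trim` on the `time` column instead of Python string methods.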
```
df.show()
+-------------+
|         time|
+-------------+
|13days, 23:41|
|12days, 22:52|
+-------------+

from pyspark.sql.functions import col, split, trim

# tmp  = day count (text before "days,")
# tmp1 = [hours, minutes] array (text after "days,", trimmed, split on ":")
(df.withColumn("tmp", split(col("time"), "days,")[0])
   .withColumn("tmp1", split(trim(split(col("time"), "days,")[1]), ":"))
   .withColumn("time", (col("tmp") * 3600 * 24 + col("tmp1")[0] * 3600 + col("tmp1")[1] * 60).cast("long"))
   .drop("tmp", "tmp1")
   .show())
+-------+
|   time|
+-------+
|1208460|
|1119120|
+-------+
```