I am trying to convert string columns to timestamp columns. The data looks like this:
c1                       c2
2019-12-10 10:07:54.000  2019-12-13 10:07:54.000
2020-06-08 15:14:49.000  2020-06-18 10:07:54.000
from pyspark.sql.functions import col, udf, to_timestamp
joined_df.select(to_timestamp(joined_df.c1, '%Y-%m-%d %H:%M:%S.%SSSS').alias('dt')).collect()
joined_df.select(to_timestamp(joined_df.c2, '%Y-%m-%d %H:%M:%S.%SSSS').alias('dt')).collect()
When the date changes, I want to get a new column with the date difference, computed as c2 - c1.
In Python, I am doing it like this:
df['c1'] = df['c1'].fillna('0000-01-01').apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f'))
df['c2'] = df['c2'].fillna('0000-01-01').apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f'))
df['days'] = (df['c2'] - df['c1']).apply(lambda x: x.days)
Can someone help me convert this to PySpark?
1 Answer
If you want the date difference, you can use datediff from pyspark.sql.functions.