pandas 无法将字符串和日期转换为所有日期

tquggr8v  于 2023-10-14  发布在  其他
关注(0)|答案(1)|浏览(119)

我现在正在从impala导入数据到python。因此,我列出了表的名称及其在Impala中的最后更新,Pandas Dataframe如下:

df_impala = pd.DataFrame(columns = ['field', 'table_name', 'lastupdate', 'c_row', 'update'])
for key, value in keys_impala.items():
    for val in value:
        table = spark.sql("""SELECT '{}' AS field, '{}' AS table_name,
                             MAX(CAST(importdate  AS DATE)) AS lastupdate, 
                             COUNT(*) AS c_row, '{}' AS update  FROM datalake.{}
                             WHERE CAST(importdate  AS DATE) = 
                             (SELECT MAX(CAST(importdate  AS DATE)) FROM datalake.{})""".format(key, val, keys_impala[key][val],
                                                                                                val, val)).toPandas()
        
        df_impala = pd.concat([df_impala, table]).reset_index(drop = True)
        print(val)

但不是每个last_update列都是datetime格式的,也有这样格式的字符串:2023年10月9日02:10:16带有此字符串last_update column的表return None当用DATE()转换时,它不会被转换。
然后我已经尝试了这个转换:

SELECT MAX(TO_DATE(last_update, 'dd/M/yyyy') FROM X

它处理字符串类型,但对于日期时间列格式,它返回None
是否有其他转换语法的解决方案?

gev0vcfq

gev0vcfq1#

当您在单个列中有多个日期格式时。最好使用nvlnvl2case when函数来捕获日期格式问题并应用正确的格式。像下面的东西。

scala> df.show(false)
+-------------------+
|dt                 |
+-------------------+
|09/10/2023 02:10:16|
|2023-10-09         |
+-------------------+
scala> df.withColumn("last_updated", expr("nvl2(to_date(dt, 'dd/MM/yyyy hh:mm:ss'),to_date(dt, 'dd/MM/yyyy hh:mm:ss'), CAST(dt AS date))")).show(false)
+-------------------+------------+
|dt                 |last_updated|
+-------------------+------------+
|09/10/2023 02:10:16|2023-10-09  |
|2023-10-09         |2023-10-09  |
+-------------------+------------+
scala> df.withColumn("last_updated", when(to_date($"dt","dd/MM/yyyy hh:mm:ss").isNotNull, to_date($"dt","dd/MM/yyyy hh:mm:ss")).otherwise($"dt".cast("date"))).show(false)
+-------------------+------------+
|dt                 |last_updated|
+-------------------+------------+
|09/10/2023 02:10:16|2023-10-09  |
|2023-10-09         |2023-10-09  |
+-------------------+------------+

相关问题