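When I try to filter a dataframe on datetime objects, I keep getting the following error: TypeError: Invalid comparison between dtype=datetime64[ns, pytz.FixedOffset(120)] and DatetimeArray. This is the first time I have run into this error, even though I have had to use this kind of filtering before.
This is my code (including the steps that convert the datetime objects):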
# Create a date column for merging and set date formats
df_fpu["datetime"] = pd.to_datetime(df_fpu["datetime"])
df_fpu["date"] = df_fpu["datetime"].dt.date
df_fpu["date"] = pd.to_datetime(df_fpu["date"])
# Set to the correct time zone
df_fpu.datetime.astype('datetime64[ns]')
df_fpu.date.astype('datetime64[ns]')
# Also make sure to have the correct time format in df_events (already in correct tz)
df_events["start_datetime"] = pd.to_datetime(df_events["start_datetime"])
df_events["end_datetime"] = pd.to_datetime(df_events["end_datetime"])
df_events["date"] = pd.to_datetime(df_events["date"])
# Merge df_fpu on df_events
df_merged = pd.merge(df_events, df_fpu, on=["stand", "date"], how="inner")
# We want to filter this dataframe on the start and end times of each flights to remove incorrect records
df_fpu = df_merged[(df_merged["datetime"] >= df_merged["start_datetime"]) & (df_merged["datetime"] <= df_merged["end_datetime"])]
Does anyone know why this error occurs? Here are 10 rows of sample data:
| start_datetime | end_datetime | stand | date | datetime | digital signal |
| --------------|--------------|--------------|--------------|--------------|--------------|
| 2022-04-28T20:53:14.000+0000 | 2022-04-28T22:27:02.000+0000 | 公司简介 | 2022-04-28T00:00:00.000+0000 | 2022-04-28T05:33:00.000+0000 | 5 |
| 2022-04-28T20:53:14.000+0000 | 2022-04-28T22:27:02.000+0000 | 公司简介 | 2022-04-28T00:00:00.000+0000 | 2022-04-28T05:34:00.000+0000 | 8 |
| 2022-04-28T20:53:14.000+0000 | 2022-04-28T22:27:02.000+0000 | 公司简介 | 2022-04-28T00:00:00.000+0000 | 2022-04-28T05:35:00.000+0000 | 10 |
| 2022-04-28T20:53:14.000+0000 | 2022-04-28T22:27:02.000+0000 | 公司简介 | 2022-04-28T00:00:00.000+0000 | 2022-04-28T05:36:00.000+0000 | 17 |
| 2022-04-28T20:53:14.000+0000 | 2022-04-28T22:27:02.000+0000 | 公司简介 | 2022-04-28T00:00:00.000+0000 | 2022-04-28T05:37:00.000+0000 | 19 |
| 2022-04-28T20:53:14.000+0000 | 2022-04-28T22:27:02.000+0000 | 公司简介 | 2022-04-28T00:00:00.000+0000 | 2022-04-28T05:38:00.000+0000 | 30 |
| 2022-04-28T20:53:14.000+0000 | 2022-04-28T22:27:02.000+0000 | 公司简介 | 2022-04-28T00:00:00.000+0000 | 2022-04-28T05:39:00.000+0000 | 23 |
| 2022-04-28T20:53:14.000+0000 | 2022-04-28T22:27:02.000+0000 | 公司简介 | 2022-04-28T00:00:00.000+0000 | 2022-04-28T05:40:00.000+0000 | 23 |
| 2022-04-28T20:53:14.000+0000 | 2022-04-28T22:27:02.000+0000 | 公司简介 | 2022-04-28T00:00:00.000+0000 | 2022-04-28T05:41:00.000+0000 | 24 |
| 2022-04-28T20:53:14.000+0000 | 2022-04-28T22:27:02.000+0000 | 公司简介 | 2022-04-28T00:00:00.000+0000 | 2022-04-28T05:42:00.000+0000 | 24 |
1 Answer
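Check the shape of your datetime objects:
If the dtypes and time zones are consistent, the shape of the datetime objects may be what causes the error. Check the shape attribute of the datetime, start_datetime and end_datetime columns to make sure they all have the same shape. If they do not (and this is not visible in the first 10 rows), you may need to reshape one or more of the columns before doing the comparison. For the sample you provided, your code appears to work (assuming this table corresponds to df_merged).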
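
Since the error message itself names a time-zone-aware dtype, it may be worth printing the dtype next to the shape before ruling out a time zone mismatch. Below is a minimal sketch of that check, followed by one possible way to align the columns. The small df_merged frame is made up for illustration, to_utc is a hypothetical helper, and treating naive timestamps as UTC is an assumption that you would need to confirm against your own data.

```python
import pandas as pd

# Hypothetical stand-in for df_merged: "datetime" carries a +02:00 offset,
# while the start/end columns are time-zone naive.
df_merged = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2022-04-28T07:33:00+02:00", "2022-04-28T23:00:00+02:00"]
    ),
    "start_datetime": pd.to_datetime(
        ["2022-04-28T20:53:14", "2022-04-28T20:53:14"]
    ),
    "end_datetime": pd.to_datetime(
        ["2022-04-28T22:27:02", "2022-04-28T22:27:02"]
    ),
})

cols = ["datetime", "start_datetime", "end_datetime"]

# Inspect dtype (which includes the time zone, if any) and shape per column.
for col in cols:
    print(col, df_merged[col].dtype, df_merged[col].shape)


def to_utc(series):
    """Return the series as tz-aware UTC.

    Assumption: naive values already represent UTC wall-clock time.
    """
    if series.dt.tz is None:
        return series.dt.tz_localize("UTC")
    return series.dt.tz_convert("UTC")


# Bring all three columns to the same time zone before comparing them.
for col in cols:
    df_merged[col] = to_utc(df_merged[col])

# Same filter as in the question, now on columns with matching time zones.
in_window = (
    (df_merged["datetime"] >= df_merged["start_datetime"])
    & (df_merged["datetime"] <= df_merged["end_datetime"])
)
df_filtered = df_merged[in_window]
```

If the printed dtypes already match in your real data, the time zone step is unnecessary and the shape check above is the next thing to look at.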