pandas TypeError: boolean value of NA is ambiguous when using np.where() to compare string columns

Posted by jslywgbw on 2023-04-18 in Other


I imported a feather file into a pandas DataFrame and then used np.where() to compare two string columns. However, I got the following error: TypeError: boolean value of NA is ambiguous. An MRE follows:

import pandas as pd
import numpy as np

d = {"col1": [np.nan, "x"], "col2": [np.nan, "x"]}  # np.nan: the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame(data=d)
df.to_feather("test.feather")
df_f = pd.read_feather("test.feather")
df_f["col1"] = df_f["col1"].astype("string")
df_f["col2"] = df_f["col2"].astype("string")
df_f["is_equal"] = np.where(df_f["col1"] == df["col2"], 1, 0)

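I have to cast the two columns to string manually so that the example reproduces the dtypes my real DataFrame has after import.
From what I have read, the error is related to the pd.NA values that are created when the columns are converted to string.
I tried converting these values to np.nan, as suggested elsewhere: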

df_f["col1"].replace({pd.NA: np.nan}, inplace=True)
df_f["col2"].replace({pd.NA: np.nan}, inplace=True)

But I get the same error.
I also tried converting the columns to float, as suggested elsewhere:

df_f["col1"] = df_f["col1"].astype("float")
df_f["col2"] = df_f["col2"].astype("float")

But then I get ValueError: could not convert string to float.
Does anyone have a suggestion for how I can solve this?

pandas

Source: https://stackoverflow.com/questions/76040055/typeerror-boolean-value-of-na-is-ambiguous-when-using-np-where-to-compare-str

2 Answers


#1 hmae6n7t

With the original object dtype, a comparison that involves an NA or None value is forced to False.

$ df_f["col1"] == df["col2"]

0    False
1     True
dtype: bool

Once you convert the column to the string dtype, a comparison that involves an NA or None value is no longer forced to False; the result is <NA>.

$ df_f["col1"] = df_f["col1"].astype("string")
$ df_f["col1"] == df["col2"]

0    <NA>
1    True
dtype: boolean

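np.where expects an array of plain booleans, so it cannot interpret NA and raises the TypeError.
For string columns, object dtype is the standard type, so you do not need to convert them to the string dtype.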
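The cause described above can be reproduced without the feather round trip. The following is a minimal sketch, assuming two small hand-built Series (`s1`, `s2`, `mask`, `clean`, and `flags` are illustrative names, not part of the original question):

```python
import numpy as np
import pandas as pd

# Two string-dtype columns with a missing value in the same position.
s1 = pd.Series([pd.NA, "x"], dtype="string")
s2 = pd.Series([pd.NA, "x"], dtype="string")

# The comparison yields the nullable "boolean" dtype; the first entry is <NA>.
mask = s1 == s2

# pd.NA refuses to be coerced to a plain Python bool.
try:
    bool(pd.NA)
    na_is_ambiguous = False
except TypeError:
    na_is_ambiguous = True

# Making the mask NA-free first gives NumPy an ordinary bool array to work with.
clean = mask.fillna(False).to_numpy(dtype=bool)
flags = np.where(clean, 1, 0)
```

Here the missing pair is counted as "not equal", which matches what the object-dtype comparison does.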

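# Fill <NA> with False so that np.where receives a mask without missing values
df_f["is_equal"] = np.where(df_f["col1"].eq(df["col2"]).fillna(False), 1, 0)
# or skip np.where and cast the filled mask to int directly
df_f["is_equal"] = df_f["col1"].eq(df["col2"]).fillna(False).astype(int)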
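Filling with False means that two missing values are reported as not equal. If that choice matters, it can be made explicit. The helper below is a hypothetical sketch (the name `equal_flag` and its `both_missing_equal` parameter are assumptions introduced here, not a pandas API):

```python
import pandas as pd


def equal_flag(left: pd.Series, right: pd.Series,
               both_missing_equal: bool = False) -> pd.Series:
    """Return 1 where the values match and 0 otherwise, never <NA>."""
    # eq() may return nullable booleans; fill and cast to plain bool.
    same = left.eq(right).fillna(False).astype(bool)
    if both_missing_equal:
        # Optionally treat "missing on both sides" as a match.
        same = same | (left.isna() & right.isna())
    return same.astype(int)


df = pd.DataFrame(
    {"col1": [pd.NA, "x", "y"], "col2": [pd.NA, "x", "z"]}
).astype("string")

df["is_equal"] = equal_flag(df["col1"], df["col2"])
df["is_equal_na"] = equal_flag(df["col1"], df["col2"], both_missing_equal=True)
```

With the default setting the first row is flagged 0; with `both_missing_equal=True` it is flagged 1.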

2023-04-18

#2 flvtvl50

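If I simply create a plain df (without feather) and use pd.NA in place of the np.NaN values of the original df, np.where works fine.
However, if I first update each column with astype("string"), which changes the column dtypes from object to string, the error appears.
So I think astype("string") is the culprit, and I would try to eliminate the assignments that call this method.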
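This observation can be checked with a short sketch, assuming the same two-row data as the question. `dtype=object` is passed explicitly here so that the result does not depend on how a given pandas version infers string columns; all variable names are illustrative:

```python
import numpy as np
import pandas as pd

# Object-dtype columns that hold pd.NA: the comparison gives plain booleans.
obj_df = pd.DataFrame({"col1": [pd.NA, "x"], "col2": [pd.NA, "x"]}, dtype=object)
obj_mask = obj_df["col1"] == obj_df["col2"]
obj_flags = np.where(obj_mask, 1, 0)

# After astype("string") the same comparison contains <NA>.
str_df = obj_df.astype("string")
str_mask = str_df["col1"] == str_df["col2"]

# If the string columns already exist, casting back to object before the
# comparison restores the object-dtype behaviour.
back_mask = str_df["col1"].astype(object) == str_df["col2"].astype(object)
back_flags = np.where(back_mask, 1, 0)
```

Casting back to object is only one option; filling the missing entries of the mask, as in the first answer, leaves the column dtypes untouched.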

2023-04-18
