Pandas - filling NaNs in categorical data

1rhkuytd posted on 2023-09-29 in Other


I am trying to fill missing values (NaN) using the code below:

NAN_SUBSTITUTION_VALUE = 1
g = g.fillna(NAN_SUBSTITUTION_VALUE)

But I get the following error:

ValueError: fill value must be in categories.

Can anyone explain this error?

pandas

Source: https://stackoverflow.com/questions/32718639/pandas-filling-nans-in-categorical-data

7 answers


h22fl7wq1#

Your question leaves out what g is, in particular that it has dtype categorical. I assume it looks something like this:

g = pd.Series(["A", "B", "C", np.nan], dtype="category")

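The problem you are running into is that fillna requires a value that already exists as a category. For example, g.fillna("A") works, but g.fillna("D") fails. To fill the series with a new value, you can do the following: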

g_without_nan = g.cat.add_categories("D").fillna("D")
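The behaviour described above can be checked with a minimal sketch. It reuses the assumed example series `g` from this answer. Since the exception type may differ between pandas versions, the sketch catches both `ValueError` and `TypeError` rather than asserting one of them.

```python
import numpy as np
import pandas as pd

# Assumed example data, as in the answer above.
g = pd.Series(["A", "B", "C", np.nan], dtype="category")

# Filling with a value that is already a category works.
filled_existing = g.fillna("A")
print(filled_existing.tolist())

# Filling with a value outside the categories raises.
# The exception class depends on the pandas version, so catch both.
try:
    g.fillna("D")
    raised = False
except (ValueError, TypeError) as exc:
    raised = True
    print(type(exc).__name__, exc)

# Registering the new category first makes the fill succeed.
g_without_nan = g.cat.add_categories("D").fillna("D")
print(g_without_nan.tolist())
print(list(g_without_nan.cat.categories))
```

Note that `add_categories` appends the new category after the existing ones and leaves the original series `g` unchanged.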


sqougxex2#

Add the category before filling:

g = g.cat.add_categories([1])
g = g.fillna(1)  # fillna returns a new Series, so assign the result


3ks5zfa03#

Once you create categorical data, you can only insert values that are among its categories.

>>> df
    ID  value
0    0     20
1    1     43
2    2     45

>>> df["cat"] = df["value"].astype("category")
>>> df
    ID  value    cat
0    0     20     20
1    1     43     43
2    2     45     45

>>> df.loc[1, "cat"] = np.nan
>>> df
    ID  value    cat
0    0     20     20
1    1     43    NaN
2    2     45     45

>>> df.fillna(1)
ValueError: fill value must be in categories
>>> df.fillna(43)
    ID  value    cat
0    0     20     20
1    1     43     43
2    2     45     45
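For a DataFrame like the one above, one possible workaround is to register the fill value on each categorical column before filling. The following is only a sketch: `fill_value` is a hypothetical name, and the check against `cat.categories` is there because `add_categories` raises if the category already exists.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the transcript above.
df = pd.DataFrame({"ID": [0, 1, 2], "value": [20, 43, 45]})
df["cat"] = df["value"].astype("category")
df.loc[1, "cat"] = np.nan

fill_value = 1  # hypothetical fill value, not one of the categories yet

# Visit only the categorical columns; other columns are left untouched here.
for col in df.select_dtypes(include="category").columns:
    # add_categories raises if the category already exists, so check first.
    if fill_value not in df[col].cat.categories:
        df[col] = df[col].cat.add_categories([fill_value])
    df[col] = df[col].fillna(fill_value)

print(df)
```

Non-categorical columns would still need their own `fillna` call if they contain missing values.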


8ftvxx2r4#

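As many have said before, this error occurs because the dtype of this column is "category".
I suggest converting it to string first, then using fillna, and finally converting it back to category if needed.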

g = g.astype('string')
# convert the fill value as well so that it matches the string dtype
g = g.fillna(str(NAN_SUBSTITUTION_VALUE))
g = g.astype('category')
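One thing to keep in mind with the round trip is that the categories are rebuilt from the values that are actually present. The sketch below uses a made-up ordered series (`sizes` is a hypothetical name) to show that unused categories and the ordering are dropped, while adding the category first keeps them.

```python
import pandas as pd

# Hypothetical ordered categorical with one unused category ("medium").
sizes = pd.Series(
    pd.Categorical(
        ["small", None, "large"],
        categories=["small", "medium", "large"],
        ordered=True,
    )
)

# Round trip through the string dtype, as suggested above.
round_trip = sizes.astype("string").fillna("unknown").astype("category")
print(round_trip.cat.ordered)           # False
print(list(round_trip.cat.categories))  # "medium" is no longer a category

# Adding the category first keeps the existing categories and their order.
kept = sizes.cat.add_categories("unknown").fillna("unknown")
print(kept.cat.ordered)                 # True
print(list(kept.cat.categories))
```

If the ordering matters, it would have to be restored after the round trip, for example with `cat.set_categories(..., ordered=True)`.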


alen0pnh5#

Sometimes you may want to replace NaN with values that already occur in your dataset. You can use this:

# create a random permutation of the non-missing categorical values
permutation = np.random.permutation(df[field].dropna())

# also drop empty strings, in case they stand in for missing values
empty_is = np.where(permutation == "")
permutation = np.delete(permutation, empty_is)

# replace every missing value of df[field] with a randomly drawn existing value
end = len(permutation)
df[field] = df[field].apply(lambda x: permutation[np.random.randint(end)] if pd.isnull(x) else x)

It works quite efficiently.
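The same idea can also be written without a per-row `apply`. The sketch below is one possible vectorized variant, not the method from this answer: it draws replacements from the observed values with NumPy's `Generator.choice` and rebuilds the categorical with its original categories. The series `s` and the seed are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for repeatability

# Hypothetical categorical column with missing values.
s = pd.Series(["A", "B", None, "A", None, "B"], dtype="category")

mask = s.isna().to_numpy()
observed = s.dropna().astype(object).to_numpy()

# Draw one observed value per missing entry (sampling with replacement).
draws = rng.choice(observed, size=int(mask.sum()))

# Fill on a plain object array, then rebuild with the original categories.
values = s.astype(object).to_numpy().copy()
values[mask] = draws
filled = pd.Series(
    pd.Categorical(values, categories=s.cat.categories), index=s.index
)
print(filled.tolist())
```

Because the replacements are drawn from the observed values, frequent categories are more likely to be picked, so the fill roughly preserves the original distribution.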


yjghlzjz6#

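The underlying reason is this:
Categoricals can only take on a limited, and usually fixed, number of possible values (categories). In contrast to statistical categorical variables, a Categorical might have an order, but numerical operations (additions, divisions, ...) are not possible.
All values of the Categorical are either in categories or np.nan. Assigning values outside of categories will raise a ValueError. Order is defined by the order of the categories, not lexical order of the values.
https://pandas.pydata.org/docs/reference/api/pandas.Categorical.html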
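The quoted rules can be illustrated with a small sketch. The data here is made up, and because the exception class raised on assignment may vary between pandas versions, the sketch catches both `ValueError` and `TypeError`.

```python
import pandas as pd

# Hypothetical ordered categorical; the category order is low < mid < high.
cat = pd.Categorical(
    ["low", "high", "mid", None],
    categories=["low", "mid", "high"],
    ordered=True,
)
levels = pd.Series(cat)

# Every value is either one of the categories or missing.
print(list(levels.cat.categories))
print(levels.isna().tolist())

# Sorting follows the category order, not the lexical order of the strings.
sorted_levels = levels.sort_values().tolist()
print(sorted_levels[:3])  # ['low', 'mid', 'high']

# Assigning a value outside the categories is rejected.
try:
    cat[0] = "extreme"
    rejected = False
except (ValueError, TypeError) as exc:
    rejected = True
    print(type(exc).__name__, exc)
```

A lexical sort of the same strings would give high, low, mid, which is why the explicit category order matters.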


xlpyo6sf7#

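I ran into this problem when trying to fill missing values from another categorical Series. For that, it is not enough to add the missing values as categories of the target Series. Both Series must share the same categories: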

one = pd.Series(["A", "B", "C", None], dtype="category")
two = pd.Series(["A", "C", "B", "D"], dtype="category")
combined_categories = pd.concat([one.dropna(), two.dropna()]).unique()
one = one.cat.set_categories(combined_categories)
two = two.cat.set_categories(combined_categories)

result = one.fillna(two)
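As an alternative way to build the shared categories, pandas provides `union_categoricals` in `pandas.api.types`. The following is a sketch of the same fill using that helper, with the example series from this answer. It is not part of the original answer, and `shared` is a hypothetical name.

```python
import pandas as pd
from pandas.api.types import union_categoricals

one = pd.Series(["A", "B", "C", None], dtype="category")
two = pd.Series(["A", "C", "B", "D"], dtype="category")

# Combine the categories of both Series into one shared set.
shared = union_categoricals([one, two]).categories

# Give both Series the same categories before filling.
one = one.cat.set_categories(shared)
two = two.cat.set_categories(shared)

# Missing entries in `one` are taken from the same positions in `two`.
result = one.fillna(two)
print(result.tolist())
```

The fill is aligned on the index, so the two Series should have matching index labels for the positions to be filled.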

