Pandas upgrade from 1.5.3 to 2.1 returns KeyError on df.sum() even though the key exists

Posted by gt0wga4j on 2023-09-29 in Other


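I recently upgraded pandas from 1.5.3 to 2.1. I did not change any code, but the df["Total"].sum() call shown below, which used to run fine, now raises a KeyError on "Total", even though the key exists and the code works on 1.5.3.
I would appreciate pointers to any known issues or changes between these versions.
I have confirmed that the code runs correctly on 1.5.3 and fails only after upgrading to 2.0 or later. I have also checked that the key "Total" exists and is spelled correctly.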

acount = stat_not_on_conject_latest_rev["Total"].sum() + stat_not_on_conject_not_latest_rev["Total"].sum()

On 1.5.3, stat_not_on_conject_not_latest_rev returns the following DataFrame:

   Type                   Total
0  Drawings                 118
1  Reports                    8
2  Specifications             2
3  Contractor Submittals      2

stat_not_on_conject_latest_rev returns the following DataFrame:

   Type      Total
0  Drawings     65

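acount correctly returns 195.
Update: I have worked out why I get the KeyError. Running exactly the same code under pandas 2.1, the DataFrames come back with completely different column names. I do not know why.
stat_not_on_conject_latest_rev returns the following DataFrame: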

   fapcount  count
0  Drawings     65

stat_not_on_conject_not_latest_rev returns the following DataFrame:

   concount               count
0  Drawings                 118
1  Reports                    8
2  Specifications             2
3  Contractor Submittals      2

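The problem seems to be that pandas 2.1 does not rename the columns the way 1.5.3 did. The DataFrames are created by calling the table function shown below: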

stat_not_on_conject_latest_rev = table(df,'fapcount')
stat_not_on_conject_not_latest_rev = table(df,'concount')

def table(df,col):
    table = df[col].value_counts().to_frame()
    table = table.rename(columns={col: "Total"})
    table = table.reset_index()
    table = table.rename(columns={"index": "Type"})
    return table

df comes from a CSV import:

df = pd.read_csv(filereads, skiprows=1)

The difference in behavior seems to come from this line in table():

table = df[col].value_counts().to_frame()

In 1.5.3 it returns:

                       concount
Drawings                    118
Reports                       8
Specifications                2
Contractor Submittals         2

In 2.1 it returns the following, and a "count" label appears to be added automatically:

concount               count
Drawings                 118
Reports                    8
Specifications             2
Contractor Submittals      2
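A quick way to confirm which behavior an installation has is to inspect the labels of the intermediate frame directly. The snippet below is a minimal sketch; the three-row DataFrame is hypothetical sample data standing in for the CSV import.

```python
import pandas as pd

# Hypothetical sample data standing in for the real CSV import.
df = pd.DataFrame({"concount": ["Drawings", "Drawings", "Reports"]})

counts = df["concount"].value_counts().to_frame()

print(pd.__version__)
print(list(counts.columns))  # ['concount'] on 1.5.3, ['count'] on 2.0+
print(counts.index.name)     # None on 1.5.3, 'concount' on 2.0+
```

If the column list shows 'count', any later rename keyed on the original column name will not match anything.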

pandas

Source: https://stackoverflow.com/questions/77125804/pandas-upgrade-from-1-5-3-to-2-1-returns-key-error-on-df-sum-even-though-key-d

1 answer


6ie5vjzr1#

This happens because, from 2.0.0 onward, the internal function value_counts_internal returns a Series with a default name (i.e. "count"). See GH49912 for more context and details.

def value_counts_internal(
    values,
    sort: bool = True,
    ascending: bool = False,
    normalize: bool = False,
    bins=None,
    dropna: bool = True,
) -> Series:
    from pandas import (
        Index,
        Series,
    )

    index_name = getattr(values, "name", None)
    name = "proportion" if normalize else "count"
    ...

Consider the following example:

import pandas as pd

df = pd.DataFrame({
    "fapcount": [1, 2, 2, 3, 3], "concount": [3, 4, 4, 5, 5]
})

On 1.5.3 we get:

>>> df["fapcount"].value_counts()

2    2
3    2
1    1
Name: fapcount, dtype: int64

On 2.0.0+, by contrast, the Series uses "count" as its name, and the original column name becomes the index name (see below):

>>> df["fapcount"].value_counts()

fapcount
2    2
3    2
1    1
Name: count, dtype: int64 # <-- look at the name here !

>>> df["fapcount"].value_counts().index
Index([2, 3, 1], dtype='int64', name='fapcount')

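This is why table.rename(columns={col: "Total"}) no longer has any effect: it asks pandas to rename the column named after col (e.g. "fapcount") to "Total", but that column no longer exists because it is now called "count". rename ignores missing labels by default, so nothing is renamed and no error is raised. For the same reason, the later rename of "index" to "Type" also does nothing, since reset_index now produces a column named after col rather than "index". The KeyError on "Total" is only triggered later, when the code tries to select that column.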
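The silent no-op can be reproduced on any pandas version by building a frame that already has the 2.0+ layout. This is a minimal sketch, not part of the original answer; the frame is constructed by hand for illustration, and it also shows the errors="raise" option of DataFrame.rename, which turns the unmatched label into an immediate error.

```python
import pandas as pd

# Hand-built frame mimicking value_counts().to_frame() on pandas 2.0+:
# the data column is "count" and the index is named after the source column.
counts = pd.DataFrame(
    {"count": [2, 2, 1]},
    index=pd.Index([2, 3, 1], name="fapcount"),
)

# Default errors="ignore": the label "fapcount" is not a column, so nothing changes.
renamed = counts.rename(columns={"fapcount": "Total"})
print(list(renamed.columns))

# errors="raise" surfaces the mismatch at the rename itself.
try:
    counts.rename(columns={"fapcount": "Total"}, errors="raise")
    raised = False
except KeyError as exc:
    raised = True
    print("rename failed:", exc)
```

Using errors="raise" in the original function would have pointed at the rename line rather than at the later ["Total"] lookup.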
To fix this, you can adjust your table function slightly:

def table(df, col):
    return (
        df[col].value_counts()
            .rename_axis("Type")
            .reset_index(name="Total")
    )
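As a usage check, the adjusted function can be run against data shaped like the question's. The sketch below rebuilds the counts from the question (65 drawings in fapcount; 118, 8, 2 and 2 entries in concount) as a hypothetical in-memory DataFrame instead of the CSV, padding the shorter column with missing values.

```python
import pandas as pd

def table(df, col):
    # Name the index "Type" and the counts "Total" explicitly, so the result
    # does not depend on the default Series name chosen by value_counts().
    return (
        df[col].value_counts()
            .rename_axis("Type")
            .reset_index(name="Total")
    )

# Hypothetical stand-in for the CSV data, using the counts from the question.
concount = (
    ["Drawings"] * 118
    + ["Reports"] * 8
    + ["Specifications"] * 2
    + ["Contractor Submittals"] * 2
)
# Pad with None so both columns have equal length; value_counts() drops missing values.
fapcount = ["Drawings"] * 65 + [None] * (len(concount) - 65)
df = pd.DataFrame({"fapcount": fapcount, "concount": concount})

stat_not_on_conject_latest_rev = table(df, "fapcount")
stat_not_on_conject_not_latest_rev = table(df, "concount")

acount = (
    stat_not_on_conject_latest_rev["Total"].sum()
    + stat_not_on_conject_not_latest_rev["Total"].sum()
)
print(list(stat_not_on_conject_latest_rev.columns))
print(acount)
```

Because both labels are set explicitly, the function should give the same Type and Total columns on 1.5.3 and on 2.x.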
