Axial inconsistency of pandas.diff

Posted by ddrv8njm on 2023-04-19 in Other


Consider the dataframe:

import pandas as pd

df = pd.DataFrame({'col': [True, False]})

The following code works:

df['col'].diff()

The result is:

0     NaN
1    True
Name: col, dtype: object

However, the code:

df.T.diff(axis=1)

raises the error:

numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.

Is that a bug?

pandas

Source: https://stackoverflow.com/questions/75979562/axial-inconsistency-of-pandas-diff

2 Answers


ctrmrzij1#

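The behavior you are seeing appears to be inconsistent with the docs, which state explicitly:
For boolean dtypes, this uses operator.xor() rather than operator.sub(). The result is calculated according to current dtype in DataFrame, however dtype of the result is always float64.
The following test is also interesting: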

import pandas as pd

df = pd.DataFrame({'col': [True, False], 'col2': [True, False]})

print("","df:",sep='\n')
print(df,df.dtypes,sep='\n')

print("","diff of df:",sep='\n')
res = df.diff()
print(res,res.dtypes,sep='\n')

print("","diff of df['col']:",sep='\n')
res = df['col'].diff()
print(res,res.dtypes,sep='\n')

print("","df.T:",sep='\n')
res = df.T
print(res,res.dtypes,sep='\n')

print("","diff(axis=0) of df.T:",sep='\n')
res = df.T.diff(axis=0)
print(res,res.dtypes,sep='\n')

print("","df.T.astype(object):",sep='\n')
res = df.T.astype(object)
print(res,res.dtypes,sep='\n')

print("","diff(axis=1) of df.T.astype(object):",sep='\n')
res = df.T.astype(object).diff(axis=1)
print(res,res.dtypes,sep='\n')

try:
    print("","diff(axis=1) of df.T:",sep='\n')
    res = df.T.diff(axis=1)
    print(res,res.dtypes,sep='\n')
except TypeError:
    print('got TypeError')

Output:

df:
     col   col2
0   True   True
1  False  False
col     bool
col2    bool
dtype: object

diff of df:
    col  col2
0   NaN   NaN
1  True  True
col     object
col2    object
dtype: object

diff of df['col']:
0     NaN
1    True
Name: col, dtype: object
object

df.T:
         0      1
col   True  False
col2  True  False
0    bool
1    bool
dtype: object

diff(axis=0) of df.T:
          0      1
col     NaN    NaN
col2  False  False
0    object
1    object
dtype: object

df.T.astype(object):
         0      1
col   True  False
col2  True  False
0    object
1    object
dtype: object

diff(axis=1) of df.T.astype(object):
        0   1
col   NaN  -1
col2  NaN  -1
0    object
1    object
dtype: object

diff(axis=1) of df.T:
got TypeError

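If we use astype() to change the column dtype to object before calling diff(axis=1), no error is raised, and the result shows that the boolean values are treated as integers and the diff is computed with integer subtraction.
However, as the OP pointed out, the same operation without astype(object) raises TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead. This happens even though the diff() documentation states "For boolean dtypes, this uses operator.xor() rather than operator.sub()".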
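For readers who need the documented XOR semantics along axis=1 today, one possible workaround is to compute the XOR of neighbouring columns directly in NumPy. This is only a minimal sketch under stated assumptions: the helper name bool_diff_axis1 is hypothetical (it is not part of pandas), it assumes every column is boolean and that the frame has at least one column, and it handles only a period of 1. It returns an object-dtype frame with NaN in the first column, mirroring what the axis=0 case printed above.

```python
import numpy as np
import pandas as pd


def bool_diff_axis1(frame):
    """Hypothetical helper: XOR-based diff along columns for an all-boolean frame."""
    values = frame.to_numpy(dtype=bool)
    # Object array so that NaN and booleans can sit side by side.
    result = np.empty(values.shape, dtype=object)
    result[:, 0] = np.nan  # the first column has no predecessor
    # XOR of each column with the column to its left.
    result[:, 1:] = np.logical_xor(values[:, 1:], values[:, :-1])
    return pd.DataFrame(result, index=frame.index, columns=frame.columns)


df = pd.DataFrame({'col': [True, False], 'col2': [True, False]})
out = bool_diff_axis1(df.T)
print(out)
```

Each row of df.T goes from True to False, so the second column of the result is True in both rows, which matches what df.diff() reports down the rows of the untransposed frame.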

2023-04-19

vuv7lop32#

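This behavior appears to be intentional, as per GH15856. Subtraction (-) between boolean arrays in NumPy is not supported (or perhaps no longer supported?); other arithmetic operators such as + and * still work, but they act as logical OR and AND.
With diff on axis=1, pandas tries to compute the difference between consecutive elements along the columns axis (which happens to hold booleans here because of the transposition), and since NumPy does the computation behind the scenes, a TypeError is raised.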

print(df.T)

        0      1
col  True  False

import numpy as np

np.array(False) - np.array(True)

TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.

This may be counterintuitive, because the same operation succeeds with Python booleans:

False - True
# returns -1

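But @seberg explains why:
This is a very old deprecation, although I seem to remember some discussion about only deprecating the unary operator -False and not True - True. Note that Python booleans are different from NumPy ones, they are really integers. NumPy booleans do not behave like integers, if you add two booleans you get a boolean again, etc.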
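The difference described in the quote can be checked directly. The sketch below is illustrative only, and the variable names are chosen for this example. It shows that Python booleans are integers, that adding NumPy booleans yields a boolean again, that subtracting them raises TypeError, and that the alternatives named in the error message (the ^ operator and np.logical_xor) work.

```python
import numpy as np

# Python booleans are a subclass of int, so arithmetic uses integer rules.
python_is_int = isinstance(True, int)
python_difference = False - True  # integer subtraction

a = np.array([True, False])
b = np.array([True, True])

# Adding NumPy booleans acts as a logical OR and stays boolean.
numpy_sum = a + b

# Subtracting NumPy booleans is rejected.
try:
    a - b
    subtraction_raised = False
except TypeError:
    subtraction_raised = True

# The alternatives suggested by the error message.
xor_operator = a ^ b
xor_function = np.logical_xor(a, b)

print(python_difference, numpy_sum, subtraction_raised, xor_operator, xor_function)
```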

2023-04-19
