Paddle nansum API Inconsistent Behavior with Integer dtypes Containing NaN-like Values

Posted by u7up0aaq 4 months ago in Other

Describe the Bug

The paddle.nansum function does not correctly handle integer tensors created from data that contains NaN. When float("nan") is cast to int32 or int64, it appears to become the minimum representable integer (-2147483648 or -9223372036854775808, judging by the outputs below), and nansum then adds that value like any ordinary number instead of skipping it.

Note also that other frameworks with a native nansum API, such as Torch, do not support integer dtypes.

import paddle

# Paddle Version: 2.5.1 (cpu)
dtypes = [paddle.float16,paddle.float32,paddle.float64,paddle.int32,paddle.int64,]
for dtype in dtypes:
    try:
        x = paddle.to_tensor([[1, 1], [1, float("nan")]], dtype=dtype)
        y = paddle.nansum(x, axis=0)

        print(f"result:{y}; yDtype:{y.dtype}supportedD:{dtype}")
    except Exception as e:
        print(f"unsupportedD:{dtype}")
        
"""
# PRINTS
unsupportedD:paddle.float16
result:Tensor(shape=[2], dtype=float32, place=Place(cpu), stop_gradient=True,
[2., 1.]); yDtype:paddle.float32supportedD:paddle.float32
result:Tensor(shape=[2], dtype=float64, place=Place(cpu), stop_gradient=True,
[2., 1.]); yDtype:paddle.float64supportedD:paddle.float64
result:Tensor(shape=[2], dtype=int64, place=Place(cpu), stop_gradient=True,
[ 2         , -2147483647]); yDtype:paddle.int64supportedD:paddle.int32
result:Tensor(shape=[2], dtype=int64, place=Place(cpu), stop_gradient=True,
[ 2                  , -9223372036854775807]); yDtype:paddle.int64supportedD:paddle.int64
"""

Additional Information

If we want to keep supporting integer dtypes in the paddle.nansum API, we can make the following modification:

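# Value that float("nan") appears to become after casting, per the outputs above.
INT_NAN_SENTINEL = {
    paddle.int32: -2**31,
    paddle.int64: -2**63,
}

def nansum(x, axis=None, dtype=None, keepdim=False, name=None):
    zero_tensor = paddle.zeros_like(x)

    if x.dtype in INT_NAN_SENTINEL:
        # Treat the dtype's minimum value as NaN. Comparing against paddle.min(x)
        # would also zero out the legitimate minimum of a NaN-free tensor.
        sentinel = paddle.full_like(x, INT_NAN_SENTINEL[x.dtype])
        tmp_tensor = paddle.where(x == sentinel, zero_tensor, x)
    else:
        tmp_tensor = paddle.where(paddle.isnan(x), zero_tensor, x)

    return paddle.sum(tmp_tensor, axis, dtype, keepdim, name)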
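To see why the choice of sentinel matters, here is a small pure-Python model of the masking step. Lists stand in for tensors, and `masked_column_sums` is a hypothetical helper, not a Paddle API. Masking the dtype's minimum only removes values that came from NaN, whereas masking the tensor's own minimum also discards legitimate data when no NaN is present.

```python
INT32_MIN = -(2 ** 31)  # value NaN appears to take after casting to int32

def masked_column_sums(rows, sentinel):
    """Sum each column of a 2-D list, treating `sentinel` as NaN (i.e. as 0)."""
    n_cols = len(rows[0])
    return [
        sum(0 if row[c] == sentinel else row[c] for row in rows)
        for c in range(n_cols)
    ]

with_nan = [[1, 1], [1, INT32_MIN]]  # [[1, 1], [1, nan]] after the cast
no_nan = [[1, 1], [1, 2]]

# Sentinel = dtype minimum: correct in both cases.
print(masked_column_sums(with_nan, INT32_MIN))  # [2, 1]
print(masked_column_sums(no_nan, INT32_MIN))    # [2, 3]

# Sentinel = tensor minimum: every legitimate 1 is dropped when there is no NaN.
tensor_min = min(v for row in no_nan for v in row)
print(masked_column_sums(no_nan, tensor_min))   # [0, 2]
```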

Now, running the same test cases as above gives:

dtypes = [paddle.float32,paddle.float64,paddle.int32,paddle.int64]
for dtype in dtypes:
    try:
        x = paddle.to_tensor([[1, 1], [1, float("nan")]], dtype=dtype)
        y = nansum(x, axis=0)
        print(f"result: {y}; yDtype: {y.dtype}; supportedD: {dtype}")
    except Exception as e:
        print(f"unsupportedD: {dtype}, Error: {e}")
 
   """
# PRINTS:
result: Tensor(shape=[2], dtype=float32, place=Place(cpu), stop_gradient=True,
[2., 1.]); yDtype: paddle.float32; supportedD: paddle.float32
result: Tensor(shape=[2], dtype=float64, place=Place(cpu), stop_gradient=True,
[2., 1.]); yDtype: paddle.float64; supportedD: paddle.float64
result: Tensor(shape=[2], dtype=int64, place=Place(cpu), stop_gradient=True,
[2, 1]); yDtype: paddle.int64; supportedD: paddle.int32
result: Tensor(shape=[2], dtype=int64, place=Place(cpu), stop_gradient=True,
[2, 1]); yDtype: paddle.int64; supportedD: paddle.int64
"""
# Testing the example given in the docs:
dtypes = [paddle.float32, paddle.float64, paddle.int32, paddle.int64]
for dtype in dtypes:
    try:
        x = paddle.to_tensor([[[1, float('nan')], [3, 4]],
                            [[5, 6], [float('-nan'), 8]]], dtype=dtype)
        y = nansum(x, axis=[1, 2])
        print(f"result: {y}; yDtype: {y.dtype}; supportedD: {dtype}")
    except Exception as e:
        print(f"unsupportedD: {dtype}, Error: {e}")

# For comparison, the current paddle.nansum on the same data cast to int64:
a = paddle.to_tensor([[[1, float('nan')], [3, 4]],
                    [[5, 6], [float('-nan'), 8]]], dtype=paddle.int64)
b = paddle.nansum(a, axis=[1, 2])
print(f"paddle current nansum:{b}")
"""
result: Tensor(shape=[2], dtype=float32, place=Place(cpu), stop_gradient=True,
       [8. , 19.]); yDtype: paddle.float32; supportedD: paddle.float32
result: Tensor(shape=[2], dtype=float64, place=Place(cpu), stop_gradient=True,
       [8. , 19.]); yDtype: paddle.float64; supportedD: paddle.float64
result: Tensor(shape=[2], dtype=int64, place=Place(cpu), stop_gradient=True,
       [8 , 19]); yDtype: paddle.int64; supportedD: paddle.int32
result: Tensor(shape=[2], dtype=int64, place=Place(cpu), stop_gradient=True,
       [8 , 19]); yDtype: paddle.int64; supportedD: paddle.int64

# Paddle Current nansum 
paddle current nansum:Tensor(shape=[2], dtype=int64, place=Place(cpu), stop_gradient=True,
       [-9223372036854775800, -9223372036854775789])
"""
Answer #1 (7uhlpewt):

Thank you. NaN is generally a floating-point value, so we need to discuss whether this API should support integer types.

Answer #2 (nnsrf1az):

Yeah, I found that torch.nansum didn't support it when we pass dtype=torch.int32, while np.nansum does.
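For reference, NumPy's `np.nansum` accepts integer arrays; since integers cannot hold NaN, it behaves like `np.sum` for them. A minimal check, assuming NumPy is installed (the Torch side is not shown):

```python
import numpy as np

x = np.array([[1, 1], [1, 2]], dtype=np.int32)

# Integer arrays have no NaN, so nansum and sum agree.
nan_total = np.nansum(x, axis=0)
plain_total = np.sum(x, axis=0)

print(nan_total.tolist())  # [2, 3]
assert nan_total.tolist() == plain_total.tolist()
```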

Answer #3 (9lowa7mx):

Hey @zhangting2020, can you help me set up my Paddle dev environment? Specifically, what command should I run after finishing the last step of "Compiling from Source Using Docker (Linux)":

12. Install the compiled .whl package on the current machine or target machine:

For Python3:

pip3.7 install -U [whl package name]

Note: We used Python3.7 command as an example above, if the version of your Python is 3.6/3.8/3.9, please change pip3.7 in the commands to pip3.6/pip3.8/pip3.9.
Congratulations, now that you have successfully installed PaddlePaddle using Docker, you only need to run PaddlePaddle after entering the Docker container. For more Docker usage, please refer to the [official Docker documentation](https://docs.docker.com/).

What command should I run after this step?

Answer #4 (yfwxisqw):

After installing the .whl package, you can run the following to verify that the installation succeeded:

import paddle
paddle.utils.run_check()
