numpy - Pandas: safely divide one Series by another

hs1ihplo posted on 2023-03-30 in Other

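I have a function that generates a Series of random values. I want to divide one such Series by another and replace the results of division by 0 (inf) with 0.
Here is the function that creates the Series: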

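import numpy as np
import pandas as pd

def _draw_random_values(means: pd.Series,
                        standard_deviations: pd.Series, n: int = 10) -> pd.Series:
    # One array of n normal draws per (mean, standard deviation) pair
    return pd.Series([np.random.normal(mean, error, n)
                      for mean, error in zip(means, standard_deviations)])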
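To see why `replace` struggles later on, it helps to check what `_draw_random_values` actually returns. The sketch below is an illustration, not part of the original question, and the `means` and `stds` inputs are hypothetical. It shows that the result is an `object`-dtype Series whose elements are whole NumPy arrays rather than individual floats.

```python
import numpy as np
import pandas as pd

def _draw_random_values(means: pd.Series,
                        standard_deviations: pd.Series, n: int = 10) -> pd.Series:
    # One NumPy array of n normal draws per (mean, std) pair
    return pd.Series([np.random.normal(mean, error, n)
                      for mean, error in zip(means, standard_deviations)])

# Hypothetical example inputs (a std of 0 yields an array of the mean)
means = pd.Series([10.0, 18.0, 0.0])
stds = pd.Series([0.5, 2.0, 0.0])

series1 = _draw_random_values(means, stds)
print(series1.dtype)        # object
print(type(series1[0]))     # <class 'numpy.ndarray'>
print(series1[0].shape)     # (10,)
```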

Here is what the Series look like:

series1
0    [10.326329680446323, 10.341377563809141, 10.69...
1    [18.455738795462082, 20.24284540291898, 16.980...
2    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
dtype: object
series2[0][0] = 0
series2
0    [0.0, -1.4639471828693384, 18.085228130080917,...
1    [3.503465289188653, 7.2015882291641535, 13.146...
2    [7.520563427232638, 8.47603656244819, 14.34839...
dtype: object

Dividing the two Series works fine:

series1.divide(series2)
0    [inf, -7.064037340158698, 0.5916429145326823, ...
1    [5.267852617925077, 2.810886259914426, 1.29171...
2    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...

But when I try to replace inf, I get an error:

series1.divide(series2).replace(np.inf, 0)

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/kmaguire/source/platf0rm-api/env/lib/python3.11/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kmaguire/source/platf0rm-api/env/lib/python3.11/site-packages/pandas/core/series.py", line 5380, in replace
    return super().replace(
           ^^^^^^^^^^^^^^^^
  File "/home/kmaguire/source/platf0rm-api/env/lib/python3.11/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kmaguire/source/platf0rm-api/env/lib/python3.11/site-packages/pandas/core/generic.py", line 7280, in replace
    new_data = self._mgr.replace(
               ^^^^^^^^^^^^^^^^^^
  File "/home/kmaguire/source/platf0rm-api/env/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 467, in replace
    return self.apply(
           ^^^^^^^^^^^
  File "/home/kmaguire/source/platf0rm-api/env/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 347, in apply
    applied = getattr(b, f)(**kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kmaguire/source/platf0rm-api/env/lib/python3.11/site-packages/pandas/core/internals/blocks.py", line 593, in replace
    mask = missing.mask_missing(values, to_replace)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kmaguire/source/platf0rm-api/env/lib/python3.11/site-packages/pandas/core/missing.py", line 98, in mask_missing
    new_mask = new_mask.to_numpy(dtype=bool, na_value=False)
               ^^^^^^^^^^^^^^^^^
AttributeError: 'bool' object has no attribute 'to_numpy'

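There seems to be some problem with the types in the pandas Series, but I can't work it out. I tried replacing pd.NA with np.nan, but it made no difference, since there are no pd.NA values in the data.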
pandas==1.5.1
numpy==1.23.4

Answer #1 (sdnqo3pr)

This works for me:

import pandas as pd
import numpy as np

series1 = pd.Series([0.1, 0.2, 0.3])
series2 = pd.Series([np.nan, 0.2, 0.3])

series1.divide(series2).replace(np.nan, 0)

Output:

0    0.0
1    1.0
2    1.0
dtype: float64

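(numpy 1.21.6, pandas 1.3.5)

Note, however, that you are using a Series to store lists of values (i.e. its dtype is "object"). You are trying to replace nan values in a Series that contains no nan values itself, only lists that contain them. Why not use a DataFrame to store tabular data instead of a Series of lists?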
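A minimal sketch of that suggestion, assuming every array has the same length (the example data below is hypothetical): `pd.DataFrame(series.tolist(), index=series.index)` turns each array into a row of float columns, after which `divide` and `replace` operate on scalars.

```python
import numpy as np
import pandas as pd

# Hypothetical Series of equal-length arrays, shaped like the question's data
series1 = pd.Series([np.array([1.0, -2.0, 3.0]),
                     np.array([4.0, 5.0, 0.0])])
series2 = pd.Series([np.array([0.0, 0.0, 1.5]),
                     np.array([2.0, 0.5, 0.0])])

# One row per original element, one float64 column per array position
df1 = pd.DataFrame(series1.tolist(), index=series1.index)
df2 = pd.DataFrame(series2.tolist(), index=series2.index)

# On a numeric DataFrame, replace() matches the scalar +/-inf values
result = df1.divide(df2).replace([np.inf, -np.inf], 0)
print(result)
```

Here `0/0` yields NaN, which `replace([np.inf, -np.inf], 0)` leaves in place; chain `.fillna(0)` if that case should also become 0.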

Answer #2 (col17t5w)

You can convert the Series to a 2D numpy array, replace inf, and then convert back to a Series if needed:

import numpy as np
import pandas as pd

series1 = pd.Series([np.array([0.1, 0.2, 0.3]),
                     np.array([0.1, 0.2, 0]),
                     np.array([0.1, 0.2, 0.3])])
series2 = pd.Series([np.array([0, 0.6, 0]),
                     np.array([0, 0.5, 0.3]),
                     np.array([5, 0.4, 0])])

arr = np.array(series1.tolist()) / np.array(series2.tolist())
print (arr)
[[       inf 0.33333333        inf]
 [       inf 0.4        0.        ]
 [0.02       0.5               inf]]

# ~np.isfinite is True for inf, -inf and NaN, so all three become 0
arr[~np.isfinite(arr)] = 0
print (arr)
[[0.         0.33333333 0.        ]
 [0.         0.4        0.        ]
 [0.02       0.5        0.        ]]

s = pd.Series(arr.tolist(), index=series1.index)
print (s)
0    [0.0, 0.33333333333333337, 0.0]
1                    [0.0, 0.4, 0.0]
2                   [0.02, 0.5, 0.0]
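A variant of the same idea (a sketch, not the answerer's code) avoids producing `inf` at all: `np.divide` with the `out` and `where` arguments only divides where the denominator is non-zero and keeps the pre-filled 0 everywhere else, so no divide-by-zero warning is raised either.

```python
import numpy as np
import pandas as pd

series1 = pd.Series([np.array([0.1, 0.2, 0.3]),
                     np.array([0.1, 0.2, 0]),
                     np.array([0.1, 0.2, 0.3])])
series2 = pd.Series([np.array([0, 0.6, 0]),
                     np.array([0, 0.5, 0.3]),
                     np.array([5, 0.4, 0])])

num = np.array(series1.tolist(), dtype=float)
den = np.array(series2.tolist(), dtype=float)

# Divide only where den != 0; all other cells keep the 0 from `out`
arr = np.divide(num, den, out=np.zeros_like(num), where=den != 0)

# Back to one NumPy array per row
s = pd.Series(list(arr), index=series1.index)
print(s)
```

Unlike `arr[~np.isfinite(arr)] = 0`, this leaves any NaN already present in the inputs untouched.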

Solution using DataFrames:

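import numpy as np
import pandas as pd

def _draw_random_values(means: pd.Series,
                        standard_deviations: pd.Series, n: int = 10) -> pd.DataFrame:
    # One row per (mean, standard deviation) pair, n columns of draws
    return pd.DataFrame([np.random.normal(mean, error, n)
                         for mean, error in zip(means, standard_deviations)])

# ser1..ser4: the means / standard-deviation Series (not defined in this answer)
df1 = _draw_random_values(ser1, ser2)
df2 = _draw_random_values(ser3, ser4)
# A float DataFrame holds scalars, so replace() works; map both +inf and -inf to 0
df = df1.divide(df2).replace([np.inf, -np.inf], 0)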
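If the rest of the code still expects one array per row (the layout the question's function returned), the DataFrame can be converted back. A small sketch with a hypothetical `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric result, e.g. the output of df1.divide(df2).replace(...)
df = pd.DataFrame([[0.0, 0.5, 2.0],
                   [1.0, 0.0, 4.0]])

# Each row becomes one NumPy array, matching the Series-of-arrays layout
as_series = pd.Series(list(df.to_numpy()), index=df.index)
print(as_series)
```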
Answer #3 (cdmah0mi)

This works for me:

result = series1.divide(series2)
# Each element is a NumPy array; overwrite its +inf and -inf entries in place
for arr in result:
    arr[np.isinf(arr)] = 0
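A non-mutating alternative to that loop (a sketch using `Series.apply` and `np.where`, not the author's code; `result` here is a hypothetical division result) builds new arrays instead of overwriting the ones stored in the Series:

```python
import numpy as np
import pandas as pd

# Hypothetical division result: a Series of arrays containing +inf and -inf
result = pd.Series([np.array([np.inf, -7.06, 0.59]),
                    np.array([5.27, -np.inf, 1.29])])

# np.where returns a fresh array per element; np.isinf catches +inf and -inf
cleaned = result.apply(lambda a: np.where(np.isinf(a), 0.0, a))
print(cleaned)
```

The original `result` keeps its `inf` values, which can be handy when the raw ratios are still needed elsewhere.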

As klops suggested (https://stackoverflow.com/a/75875709/4005067), I should probably be using a DataFrame.
