Pandas data manipulation: computing column values from other rows of the same column

Posted by am46iovg on 2023-06-20 in Other

I would like to perform the following data manipulation on a pandas DataFrame:

import pandas as pd

a = {'idx': range(8),
     'col': [47,33,23,33,32,31,22,5],
     }

df = pd.DataFrame(a)
print(df)

idx col
0   47
1   33
2   23
3   33
4   32
5   31
6   22
7   5

The output I want is:

idx col desired
0   47  14
1   33  10
2   23  -10
3   33  1
4   32  1
5   31  9
6   22  17
7   5   5

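The calculation is as follows: each desired value is the current row's col minus the next row's col (47 - 33 = 14, 33 - 23 = 10, and so on). The last row has no next row, so it keeps its own value (5).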
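
The rule can be spelled out with a plain Python loop. This is only a sketch: it assumes each value is the current row minus the next row, with the last row left unchanged, which matches the sample data above. The names nxt and desired are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'idx': range(8),
                   'col': [47, 33, 23, 33, 32, 31, 22, 5]})

col = df['col'].tolist()
desired = []
for i, value in enumerate(col):
    # Subtract the next row's value; the last row has no successor, so subtract 0.
    nxt = col[i + 1] if i + 1 < len(col) else 0
    desired.append(value - nxt)

df['desired'] = desired
print(df['desired'].tolist())  # [14, 10, -10, 1, 1, 9, 17, 5]
```

A loop like this is slow on large frames, so it is mainly useful for checking a vectorized solution against the intended rule.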


cidc1ykv #1

IIUC, you need a reversed diff followed by fillna:

df['desired'] = df['col'].diff(-1).fillna(df['col'])

Output:

idx  col  desired
0    0   47     14.0
1    1   33     10.0
2    2   23    -10.0
3    3   33      1.0
4    4   32      1.0
5    5   31      9.0
6    6   22     17.0
7    7    5      5.0
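
The desired column comes out as float here because diff(-1) leaves NaN in the last row before fillna replaces it. If integers are preferred, one option is to cast afterwards. A minimal sketch, assuming the column contains no other missing values:

```python
import pandas as pd

df = pd.DataFrame({'idx': range(8),
                   'col': [47, 33, 23, 33, 32, 31, 22, 5]})

# diff(-1) computes each row minus the next row; the last row becomes NaN,
# which forces a float dtype, so cast back to int once it has been filled.
df['desired'] = df['col'].diff(-1).fillna(df['col']).astype(int)
print(df['desired'].tolist())  # [14, 10, -10, 1, 1, 9, 17, 5]
```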

ssm49v7z #2

import pandas as pd

a = {'idx': range(8),
     'col': [47,33,23,33,32,31,22,5],
     }

df = pd.DataFrame(a)

df['col'] - df['col'].shift(-1, fill_value=0)

0    14
1    10
2   -10
3     1
4     1
5     9
6    17
7     5
Name: col, dtype: int64

df['desired'] = df['col'] - df['col'].shift(-1, fill_value=0)

   idx  col  desired
0    0   47       14
1    1   33       10
2    2   23      -10
3    3   33        1
4    4   32        1
5    5   31        9
6    6   22       17
7    7    5        5

kpbwa7wx #3

Same solution as @mozway's, but using numpy (faster):

import numpy as np

df['desired'] = -np.diff(df['col'], append=0)

Output:

>>> df
   idx  col  desired
0    0   47       14
1    1   33       10
2    2   23      -10
3    3   33        1
4    4   32        1
5    5   31        9
6    6   22       17
7    7    5        5

For 10k records:

# @mozway
>>> %timeit df['col'].diff(-1).fillna(df['col'])
281 µs ± 14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# @GodIsOne
>>> %timeit df['col'] - df['col'].shift(-1, fill_value=0)
144 µs ± 4.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# @Corralien
>>> %timeit (-np.diff(df['col'], append=0))
32.7 µs ± 951 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
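
To check the three approaches against each other and time them outside IPython, something like the following could be used. It is only a sketch: the random data, the seed, and the repeat count are assumptions rather than the original benchmark setup, and the timings will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 10,000 random integers (not the original benchmark data).
rng = np.random.default_rng(0)
df = pd.DataFrame({'col': rng.integers(0, 100, size=10_000)})

candidates = {
    'diff + fillna': lambda: df['col'].diff(-1).fillna(df['col']),
    'shift': lambda: df['col'] - df['col'].shift(-1, fill_value=0),
    'np.diff': lambda: -np.diff(df['col'], append=0),
}

# All three should give the same values (the first one returns floats),
# so compare them as float arrays.
results = {name: np.asarray(func(), dtype=float) for name, func in candidates.items()}
reference = results['shift']
all_equal = all(np.array_equal(reference, values) for values in results.values())
print('all approaches agree:', all_equal)

# Time each approach; number=200 is an arbitrary repeat count.
for name, func in candidates.items():
    seconds = timeit.timeit(func, number=200)
    print(f'{name}: {seconds / 200 * 1e6:.1f} µs per call')
```

Note that the numpy version returns a plain array rather than a Series, so assigning it to df['desired'] relies on row position, not on the index.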
