pandas: create a column based on the values of another column

Posted by iyzzxitl on 2023-10-14 in Other

Consider this DataFrame:

import pandas as pd
import numpy as np

values = [0, 22, 30, 0, 20, 22, 11, 0, 13]
index = pd.date_range(start='2023-10-01', periods=len(values))

df = pd.DataFrame({'values': values}, index=index)

df
           values
2023-10-01  0
2023-10-02  22
2023-10-03  30
2023-10-04  0
2023-10-05  20
2023-10-06  22
2023-10-07  11
2023-10-08  0
2023-10-09  13

**Goal:** create a new column that counts how many days have passed since the most recent 0 in `values`.

I can do it with a for loop:

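zero_indices = df[df['values'] == 0].index
df['days'] = np.nan

# Assign through .loc: chained indexing (df['days'][...] = ...) may not write
# back to df, and never does under pandas Copy-on-Write.
# Label slices are inclusive, so each slice also covers the next zero's row;
# that row is reset to 0 on the following iteration.
for start, end in zip(zero_indices[:-1], zero_indices[1:]):
    df.loc[start:end, 'days'] = np.arange(len(df.loc[start:end]))
df.loc[zero_indices[-1]:, 'days'] = np.arange(len(df.loc[zero_indices[-1]:]))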

           values   days
2023-10-01  0   0.00
2023-10-02  22  1.00
2023-10-03  30  2.00
2023-10-04  0   0.00
2023-10-05  20  1.00
2023-10-06  22  2.00
2023-10-07  11  3.00
2023-10-08  0   0.00
2023-10-09  13  1.00

Question: how can I do this with a vectorized (faster) approach?

Answer #1, by h6my8fg2:

There are many ways to do this; one solution is to combine `cumsum` with `groupby` and `cumcount`:

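# Each 0 increments the running count, so a 0 and the rows after it (up to the next 0) share a group id.
# Use the 'values' column here, not df.values (the whole DataFrame as a NumPy array).
df['temp'] = (df['values'] == 0).cumsum()
df.groupby('temp').cumcount()  # numbering within each group = days since the last 0 value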

Output:

2023-10-01    0
2023-10-02    1
2023-10-03    2
2023-10-04    0
2023-10-05    1
2023-10-06    2
2023-10-07    3
2023-10-08    0
2023-10-09    1
Freq: D, dtype: int64
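A minimal sketch that turns the answer's idea into one self-contained script, assuming only pandas is installed. It writes the result into a `days` column instead of leaving a helper `temp` column, and masks rows before the first 0 to NaN so the result matches the loop in the question (which never assigns those rows). The second variant, which reads calendar days from the DatetimeIndex via `where` + `ffill`, is an alternative not taken from the answer; the names `is_zero`, `group_id`, `dates`, `last_zero_date` and `days_calendar` are illustrative.

```python
import pandas as pd

values = [0, 22, 30, 0, 20, 22, 11, 0, 13]
index = pd.date_range(start='2023-10-01', periods=len(values))
df = pd.DataFrame({'values': values}, index=index)

is_zero = df['values'] == 0

# Variant 1 (the groupby/cumcount idea from the answer): rows since the last 0.
# Each 0 bumps the running count, so a 0 and the rows after it share a group id.
group_id = is_zero.cumsum()
# Rows before the first 0 fall into group 0; mask them to NaN, as the loop
# in the question leaves them unassigned.
df['days'] = df.groupby(group_id).cumcount().where(group_id > 0)

# Variant 2 (alternative sketch): calendar days since the last 0,
# computed from the DatetimeIndex instead of row positions.
dates = df.index.to_series()
last_zero_date = dates.where(is_zero).ffill()  # date of the most recent 0
df['days_calendar'] = (dates - last_zero_date).dt.days

print(df)
```

Both columns agree here because the index is a contiguous daily range. If some dates were missing, `days` would still count rows, while `days_calendar` would count elapsed calendar days.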
