Consider this dataframe:
```python
import pandas as pd
import numpy as np

values = [0, 22, 30, 0, 20, 22, 11, 0, 13]
index = pd.date_range(start='2023-10-01', periods=len(values))
df = pd.DataFrame({'values': values}, index=index)
df
```
```
            values
2023-10-01       0
2023-10-02      22
2023-10-03      30
2023-10-04       0
2023-10-05      20
2023-10-06      22
2023-10-07      11
2023-10-08       0
2023-10-09      13
```
**Goal:** create a new column that counts how many days have passed since the last 0 in `values`.
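With a daily index that has no gaps, "days since the last 0" and "rows since the last 0" are the same number. If the dates could skip, a sketch that measures elapsed calendar days from the timestamps themselves may fit the goal better. This reading of "days" is an assumption, not something the question specifies:

```python
import pandas as pd

values = [0, 22, 30, 0, 20, 22, 11, 0, 13]
index = pd.date_range(start='2023-10-01', periods=len(values))
df = pd.DataFrame({'values': values}, index=index)

# Turn the DatetimeIndex into a Series so it can be masked and filled.
dates = df.index.to_series()

# Keep the date only on rows where values == 0, then carry the most recent
# zero date forward; rows before the first zero stay NaT.
last_zero_date = dates.where(df['values'].eq(0)).ffill()

# Elapsed calendar days since that zero (NaN where no zero has occurred yet).
df['days'] = (dates - last_zero_date).dt.days
print(df)
```

On the example data this gives the same column as the loop, because the index is consecutive days.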
I can do it with a for loop:
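```python
# Timestamps of the rows where values == 0
zero_indices = df[df['values'] == 0].index
df['days'] = np.nan
# .loc instead of chained indexing (df['days'][...] = ...), which can raise
# SettingWithCopyWarning and, under Copy-on-Write, leaves df unchanged.
# Label slices include both ends; each segment's trailing zero is reset to 0
# by the next iteration.
for i in range(len(zero_indices) - 1):
    start, end = zero_indices[i], zero_indices[i + 1]
    df.loc[start:end, 'days'] = np.arange(len(df.loc[start:end]))
df.loc[zero_indices[-1]:, 'days'] = np.arange(len(df.loc[zero_indices[-1]:]))
```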
```
            values  days
2023-10-01       0  0.00
2023-10-02      22  1.00
2023-10-03      30  2.00
2023-10-04       0  0.00
2023-10-05      20  1.00
2023-10-06      22  2.00
2023-10-07      11  3.00
2023-10-08       0  0.00
2023-10-09      13  1.00
```
Question: how can I do this with vectorized operations (faster)?
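One fully vectorized option, sketched here as an assumption rather than taken from this thread, works on row positions with NumPy: mark each zero with its own position, take a running maximum to get the position of the most recent zero, and subtract. `np.maximum.accumulate` is the only non-obvious call:

```python
import numpy as np
import pandas as pd

values = [0, 22, 30, 0, 20, 22, 11, 0, 13]
index = pd.date_range(start='2023-10-01', periods=len(values))
df = pd.DataFrame({'values': values}, index=index)

v = df['values'].to_numpy()
pos = np.arange(len(v))

# Position of each zero, -1 elsewhere; the running max is then the position
# of the most recent zero at or before each row (-1 if none yet).
last_zero = np.maximum.accumulate(np.where(v == 0, pos, -1))

days = (pos - last_zero).astype(float)
days[last_zero < 0] = np.nan  # rows before the first zero, like the loop
df['days'] = days
print(df)
```

This counts rows, so it equals days only while the index has no missing dates.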
1 answer
There are many ways to do this; one solution is to use `groupby` and `cumcount`:

Output:
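A minimal sketch of how `groupby` and `cumcount` can be combined for this (not necessarily the answerer's exact code): a running count of zeros gives every zero-to-zero stretch its own group label, and `cumcount` numbers the rows inside each group starting at 0. The name `block` is illustrative:

```python
import pandas as pd

values = [0, 22, 30, 0, 20, 22, 11, 0, 13]
index = pd.date_range(start='2023-10-01', periods=len(values))
df = pd.DataFrame({'values': values}, index=index)

# Each zero starts a new block: 1, 1, 1, 2, 2, 2, 2, 3, 3 for this data.
block = df['values'].eq(0).cumsum()

# Number the rows within each block 0, 1, 2, ...; the zero itself is row 0.
days = df.groupby(block).cumcount()

# Rows before the first zero fall into block 0; leave them NaN like the loop.
df['days'] = days.where(block > 0)
print(df)
```

If the data always starts with a 0, the `.where(block > 0)` step can be dropped and `days` keeps an integer dtype.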