pandas rolling and mode functions to get the majority vote of rows in a pandas DataFrame

rn0zuynd  posted on 2023-02-28  in  Other

I have a pandas DataFrame:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'Close': np.random.uniform(0, 100, size=10)})

lbound, ubound = 0, 1
change = df["Close"].diff()

df["Change"] = change
df["Result"] = np.select([ np.isclose(change, 1) | np.isclose(change, 0) | np.isclose(change, -1),
                        # The other conditions
                        (change > 0) & (change > ubound),
                        (change < 0) & (change < lbound),
                         change.between(lbound, ubound)],[0, 1, -1, 0])
       Close     Change  Result
0  54.881350        NaN       0
1  71.518937  16.637586       1
2  60.276338 -11.242599      -1
3  54.488318  -5.788019      -1
4  42.365480 -12.122838      -1
5  64.589411  22.223931       1
6  43.758721 -20.830690      -1
7  89.177300  45.418579       1
8  96.366276   7.188976       1
9  38.344152 -58.022124      -1

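Problem statement: I now want the majority vote of the Result values at indices 1, 2, 3, 4 to be assigned to index 0, the majority vote of indices 2, 3, 4, 5 to be assigned to index 1, and so on for all subsequent indices.
I tried: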

df['Voting'] = df['Result'].rolling(window = 4,min_periods=1).apply(lambda x: x.mode()[0]).shift()

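However, this does not give the result I want: it takes a rolling window over the previous 4 rows and applies the mode function to that.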

       Close     Change  Result  Voting
0  54.881350        NaN       0     NaN
1  71.518937  16.637586       1     0.0
2  60.276338 -11.242599      -1     0.0
3  54.488318  -5.788019      -1    -1.0
4  42.365480 -12.122838      -1    -1.0
5  64.589411  22.223931       1    -1.0
6  43.758721 -20.830690      -1    -1.0
7  89.177300  45.418579       1    -1.0
8  96.366276   7.188976       1    -1.0
9  38.344152 -58.022124      -1     1.0
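
To see which rows feed each value in the attempt above, one possible check is to roll over a series whose values are their own row numbers. This is only an illustrative sketch: `idx` and `covered` are hypothetical names that are not part of the original code.

```python
import pandas as pd

# Hypothetical helper series: each value equals its own row number,
# so the min/max over a window reveal which rows the window covers.
idx = pd.Series(range(10), dtype=float)

# Same window settings as the attempt: trailing window of 4, then shift by 1.
trailing = idx.rolling(window=4, min_periods=1)
covered = pd.DataFrame({
    "first_row": trailing.min().shift(),
    "last_row": trailing.max().shift(),
})
print(covered)
```

In this sketch row 5 is fed by rows 1 to 4, so the vote for indices 1, 2, 3, 4 lands on index 5 rather than on index 0.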

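Expected result: a rolling window of 4 (indices 1, 2, 3, 4) should be taken and the mode function applied, with the result assigned to index 0; then the next rolling window (indices 2, 3, 4, 5) with its result assigned to index 1, and so on.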
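
The expected alignment can be spelled out with an explicit loop, which is slow but easy to check by hand. This is a rough sketch under assumptions: `majority_of_next` is a hypothetical helper, ties are resolved by taking the smallest mode (as `x.mode()[0]` does in the attempt), and shorter windows near the end of the frame are allowed (mirroring `min_periods=1`).

```python
import numpy as np
import pandas as pd

# Result column copied from the table in the question.
result = pd.Series([0, 1, -1, -1, -1, 1, -1, 1, 1, -1])

def majority_of_next(values, window=4):
    """Hypothetical helper: mode of the `window` rows that follow each row."""
    out = []
    for i in range(len(values)):
        following = values.iloc[i + 1 : i + 1 + window]
        if following.empty:
            out.append(np.nan)  # no rows ahead of the last row
        else:
            # Assumption: on a tie, take the smallest mode.
            out.append(following.mode().iloc[0])
    return pd.Series(out, index=values.index)

expected = majority_of_next(result)
print(expected)
```

Here index 0 receives the vote of indices 1, 2, 3, 4 and index 1 the vote of indices 2, 3, 4, 5, as described above.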

Answer 1 (vjrehmav)

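You have to reverse your DataFrame before rolling and shifting by 1 (the shift is there because you don't want the current index included in the result):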

majority = lambda x: 0 if len((m := x.mode())) > 1 else m[0]
df['Voting'] = (df[::-1].rolling(4, min_periods=1)['Result']
                        .apply(majority).shift())
print(df)

# Output
       Close     Change  Result  Voting
0  54.881350        NaN       0    -1.0
1  71.518937  16.637586       1    -1.0
2  60.276338 -11.242599      -1    -1.0
3  54.488318  -5.788019      -1     0.0
4  42.365480 -12.122838      -1     1.0
5  64.589411  22.223931       1     0.0
6  43.758721 -20.830690      -1     1.0
7  89.177300  45.418579       1     0.0
8  96.366276   7.188976       1    -1.0
9  38.344152 -58.022124      -1     NaN
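
As an alternative to reversing the frame, a forward-looking window may also work. The sketch below assumes pandas 1.1 or later, where `pandas.api.indexers.FixedForwardWindowIndexer` is available, and it reuses the tie rule from the answer above (return 0 when there is more than one mode). The `result` series is copied from the question so that the block stands alone.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Result column copied from the question.
result = pd.Series([0, 1, -1, -1, -1, 1, -1, 1, 1, -1], dtype=float)

def majority(x):
    """Mode of the window; 0 on a tie (same rule as the answer above)."""
    m = x.mode()
    return 0 if len(m) > 1 else m.iloc[0]

# The window at row i covers rows i .. i+3; shifting by -1 afterwards
# makes row i hold the vote of rows i+1 .. i+4.
indexer = FixedForwardWindowIndexer(window_size=4)
voting = result.rolling(window=indexer, min_periods=1).apply(majority).shift(-1)
print(voting)
```

On the sample data this should reproduce the Voting column shown above, including the NaN on the last row.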
