pandas: randomly select items or values within each groupby date

inkz8wg9  posted on 2023-04-10  in Other

How can I randomly pick 3 values from var1 within each 'DATE' group, sum them, and then repeat this for multiple simulations?

df =
             
    DATE       var1 
    2023-01-31  1
    2023-01-31  2
    2023-01-31  3
    2023-01-31  4
    2023-01-31  5
    2023-02-28  6
    2023-02-28  7
    2023-02-28  8
    2023-02-28  9
    2023-02-28  10

    simulation 1
    2023-01-31 = (1+3+5) = 9
    2023-02-28 = (6+7+10) = 23

    simulation 2
    2023-01-31 = (1+2+5) = 8
    2023-02-28 = (9+7+10) = 26

    simulation n ...

Suppose we run 10 simulations.

k97glaaz


You can use groupby.agg together with sample:

out = df.groupby('DATE').agg(lambda g: g.sample(n=3).sum())

Example output:

            var1
DATE            
2023-01-31     8
2023-02-28    27
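
As an alternative to the lambda, pandas 1.1 and later also provide GroupBy.sample, which draws the rows for each group directly. The following is only a minimal sketch: the question's frame is rebuilt by hand, DATE is kept as plain strings (an assumption, since the question does not show the dtype), and the name picked is illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question (DATE kept as strings: an assumption).
df = pd.DataFrame({
    "DATE": ["2023-01-31"] * 5 + ["2023-02-28"] * 5,
    "var1": range(1, 11),
})

# Draw 3 rows per DATE without replacement, then sum var1 within each DATE.
picked = df.groupby("DATE").sample(n=3)
out = picked.groupby("DATE")["var1"].sum()
print(out)
```

The result is a Series indexed by DATE, and the values change from run to run because no seed is set.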

To repeat the process, use a loop:

N = 10

for i in range(N):
    print(f'simulation {i+1}')
    print(df.groupby('DATE').agg(lambda g: g.sample(n=3).sum()))
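
The draws above differ on every run. If the simulations need to be repeatable, one option is to pass a single shared NumPy Generator as random_state. This is a sketch under some assumptions: pandas 1.4 or later (where sample accepts a Generator), a hand-built copy of the question's frame, and an illustrative helper named simulate with an arbitrary seed of 0. Note that an integer seed placed inside the lambda would reseed on every call, so every group and every simulation would pick the same row positions.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the question's frame (DATE kept as strings: an assumption).
df = pd.DataFrame({
    "DATE": ["2023-01-31"] * 5 + ["2023-02-28"] * 5,
    "var1": range(1, 11),
})

def simulate(frame, n_sims, seed):
    """Return a list with one Series of per-DATE sums per simulation (hypothetical helper)."""
    rng = np.random.default_rng(seed)  # one generator shared by all draws
    return [
        frame.groupby("DATE")["var1"].agg(
            lambda g: g.sample(n=3, random_state=rng).sum()
        )
        for _ in range(n_sims)
    ]

runs_a = simulate(df, n_sims=10, seed=0)
runs_b = simulate(df, n_sims=10, seed=0)  # same seed, so the same draws

for i, sums in enumerate(runs_a, start=1):
    print(f"simulation {i}")
    print(sums)
```

Because the generator advances between calls, the individual simulations still differ from each other, while rerunning with the same seed reproduces the whole sequence.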
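
To build a DataFrame from the repeated samples (this example uses query to restrict the data to a single date):

import pandas as pd

N = 10
query = 'DATE == "2023-01-31"'  # keep only the rows for this date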

out = pd.concat({i+1: df.query(query).groupby('DATE').agg(lambda g: g.sample(n=3).sum())
                 for i in range(N)
                 }, names=['simulation'])

Example output:

                       var1
simulation DATE            
1          2023-01-31     8
2          2023-01-31    10
3          2023-01-31    12
4          2023-01-31     8
5          2023-01-31     9
6          2023-01-31    10
7          2023-01-31    11
8          2023-01-31    12
9          2023-01-31    10
10         2023-01-31     6
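
The query above keeps only one date. Dropping it and unstacking the DATE level gives one row per simulation and one column per date, which is convenient for summary statistics. This is a minimal sketch on a hand-built copy of the question's frame; the names sims and wide are illustrative.

```python
import pandas as pd

# Hand-built copy of the question's frame (DATE kept as strings: an assumption).
df = pd.DataFrame({
    "DATE": ["2023-01-31"] * 5 + ["2023-02-28"] * 5,
    "var1": range(1, 11),
})

N = 10

# One Series of per-DATE sums per simulation, keyed by simulation number.
sims = pd.concat(
    {
        i + 1: df.groupby("DATE")["var1"].agg(lambda g: g.sample(n=3).sum())
        for i in range(N)
    },
    names=["simulation"],
)

# Rows: simulation number; columns: DATE.
wide = sims.unstack("DATE")
print(wide)
print(wide.mean())  # average simulated sum per date
```

With three values drawn without replacement, each sum falls between 6 and 12 for 2023-01-31 and between 21 and 27 for 2023-02-28.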
