Using pandas or numpy: within a group, how do I append the data from every row to each row of the group?

ig9co6j1 posted 11 months ago in Other

I have a dict like the one below that represents a single horse race. The dataset contains many races, grouped by raceId:

data_orig = {
    'meetingId': [178515] * 6,
    'raceId': [879507] * 6,
    'horseId': [90001, 90002, 90003, 90004, 90005, 90006],
    'position': [1, 2, 3, 4, 5, 6],
    'weight': [51, 52, 53, 54, 55, 56],
}

I want to append the horse-specific data from every row of the race to each row. The result should look like this:

data_new = {
    'meetingId': [178515] * 6,
    'raceId': [879507] * 6,
    'horseId_a':[90001, 90002, 90003, 90004, 90005, 90006],
    'position_a':[1, 2, 3, 4, 5, 6],
    'weight_a':[51, 52, 53, 54, 55, 56],
    'horseId_b':[90002, 90003, 90004, 90005, 90006, 90001],
    'position_b':[2, 3, 4, 5, 6, 1],
    'weight_b':[52, 53, 54, 55, 56, 51],
    'horseId_c':[90003, 90004, 90005, 90006, 90001, 90002],
    'position_c':[3, 4, 5, 6, 1, 2],
    'weight_c':[53, 54, 55, 56, 51, 52],
    'horseId_d':[90004, 90005, 90006, 90001, 90002, 90003],
    'position_d':[4, 5, 6, 1, 2, 3],
    'weight_d':[54, 55, 56, 51, 52, 53],
    'horseId_e':[90005, 90006, 90001, 90002, 90003, 90004],
    'position_e':[5, 6, 1, 2, 3, 4],
    'weight_e':[55, 56, 51, 52, 53, 54],
    'horseId_f':[90006, 90001, 90002, 90003, 90004, 90005],
    'position_f':[6, 1, 2, 3, 4, 5],
    'weight_f':[56, 51, 52, 53, 54, 55],
}


I tried the following, which amounts to a kind of matrix transpose:

import pandas as pd

data_orig_df = pd.DataFrame(data_orig)
new_df = pd.DataFrame()
for index, row_i in data_orig_df.iterrows():
    horseId = row_i['horseId']
    row_new = row_i.copy()
    # race_df and getHorseSpecificCols() are not shown in the question
    for index, row_j in race_df.iterrows():
        if row_j['horseId'] == horseId:  # skip the horse's own row
            continue
        row_new = pd.merge(row_new, row_j[getHorseSpecificCols()], suffixes=('', row_j['position']))
    new_df = pd.concat([new_df, row_new], axis=1)


Thanks for your help.

8qgya5xd1#

You can easily roll/index the values with numpy:

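import numpy as np
import pandas as pd

def roll(g):
    # Values of one race as a 2-D array, one row per horse.
    a = g.to_numpy()
    x = np.arange(len(a))
    # Cyclic index matrix: row i lists i, i+1, ... and wraps around at len(a).
    idx = (x[:, None] + x) % len(a)
    return pd.DataFrame(a[idx.ravel()].reshape(len(a), -1),
                        index=g.index,
                        columns=[f'{c}_{i+1}' for i in x for c in g.columns])

cols = ['meetingId', 'raceId']

# data_orig_df = pd.DataFrame(data_orig), as in the question.
# errors='ignore' keeps the drop working whether or not the group passed to
# apply still contains the key columns.
out = (data_orig_df.groupby(cols)
       .apply(lambda g: roll(g.drop(columns=cols, errors='ignore')))
       .reset_index(cols)
       )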

Output:

   meetingId  raceId  horseId_1  position_1  weight_1  horseId_2  position_2  weight_2  horseId_3  position_3  weight_3  horseId_4  position_4  weight_4  horseId_5  position_5  weight_5  horseId_6  position_6  weight_6
0     178515  879507      90001           1        51      90002           2        52      90003           3        53      90004           4        54      90005           5        55      90006           6        56
1     178515  879507      90002           2        52      90003           3        53      90004           4        54      90005           5        55      90006           6        56      90001           1        51
2     178515  879507      90003           3        53      90004           4        54      90005           5        55      90006           6        56      90001           1        51      90002           2        52
3     178515  879507      90004           4        54      90005           5        55      90006           6        56      90001           1        51      90002           2        52      90003           3        53
4     178515  879507      90005           5        55      90006           6        56      90001           1        51      90002           2        52      90003           3        53      90004           4        54
5     178515  879507      90006           6        56      90001           1        51      90002           2        52      90003           3        53      90004           4        54      90005           5        55
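
The whole approach rests on the cyclic index matrix `(x[:, None] + x) % n`: entry `[i, k]` equals `(i + k) % n`, so for n = 3 it is `[[0, 1, 2], [1, 2, 0], [2, 0, 1]]`, and column k picks every row shifted by k positions. Below is a minimal sketch of the same idea that produces the lettered suffixes (`_a`, `_b`, ...) asked for in the question instead of numeric ones. The names `roll_letters`, `keys`, `parts` and `out_letters` are made up for this illustration and are not part of the answer above. It assumes at most 26 horses per race, and it iterates over the groups explicitly rather than using `groupby.apply`.

```python
import string

import numpy as np
import pandas as pd

data_orig = {
    "meetingId": [178515] * 6,
    "raceId": [879507] * 6,
    "horseId": [90001, 90002, 90003, 90004, 90005, 90006],
    "position": [1, 2, 3, 4, 5, 6],
    "weight": [51, 52, 53, 54, 55, 56],
}
df = pd.DataFrame(data_orig)
keys = ["meetingId", "raceId"]


def roll_letters(g):
    """Place cyclically shifted copies of one race side by side (_a, _b, ...)."""
    n = len(g)
    x = np.arange(n)
    idx = (x[:, None] + x) % n  # idx[i, k] == (i + k) % n
    blocks = []
    for k in range(n):
        # Column k of idx selects the rows shifted by k positions.
        # Assumption: n <= 26, so one lowercase letter per shift is enough.
        shifted = g.iloc[idx[:, k]].add_suffix(f"_{string.ascii_lowercase[k]}")
        shifted.index = g.index  # realign with the original rows
        blocks.append(shifted)
    return pd.concat(blocks, axis=1)


parts = []
for _, g in df.groupby(keys):
    rolled = roll_letters(g.drop(columns=keys))
    parts.append(pd.concat([g[keys], rolled], axis=1))
out_letters = pd.concat(parts)

print(out_letters[["horseId_a", "horseId_b", "horseId_f"]])
```

Because each race is rolled on its own, races with different numbers of horses end up with different numbers of column blocks, and the final `pd.concat` fills the missing ones with NaN.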
