pandas 如何将小 Dataframe 的每一行插入到大 Dataframe 的匹配行之前?

wbrvyc0a  于 2023-03-21  发布在  其他
关注(0)|答案(1)|浏览(148)

我有一个小的和一个大的 Dataframe
最小那个

WS      period shortCode identifier
6        197.78  2023-03-10   TC2-FFA       spot
7        196.79  2023-03-10   TC5-FFA       spot
8        253.13  2023-03-10   TC6-FFA       spot
9        198.13  2023-03-13  TC12-FFA       spot
10       166.67  2023-03-10  TC14-FFA       spot
11       217.86  2023-03-10  TC17-FFA       spot
18        97.00  2023-03-10   TD3-FFA       spot
19       172.19  2023-03-10   TD7-FFA       spot
20       205.71  2023-03-13   TD8-FFA       spot
21       175.63  2023-03-10  TD19-FFA       spot
22       115.45  2023-03-10  TD20-FFA       spot
23  11350000.00  2023-03-10  TD22-FFA       spot
24       232.14  2023-03-10  TD25-FFA       spot

有多索引的大文件

datumUnit                      $/mt       WS
identifier period shortCode                 
TC2BALMO   Mar 23 TC2-FFA    39.376  228.930
TC2CURMON  Mar 23 TC2-FFA    35.946  208.988
TC2+1_M    Apr 23 TC2-FFA    38.444  223.512
TC2+2_M    May 23 TC2-FFA    37.786  219.686
TC2+3_M    Jun 23 TC2-FFA    36.613  212.866
                            ...      ...
TD25+3Q    Q4 23  TD25-FFA   42.909  185.432
TD25+4Q    Q1 24  TD25-FFA   39.000      NaN
TD25+5Q    Q2 24  TD25-FFA   32.421      NaN
TD25+1CAL  Cal 24 TD25-FFA   34.250      NaN
TD25+2CAL  Cal 25 TD25-FFA   33.955      NaN

这是它的多索引

MultiIndex([( 'TC2BALMO', 'Mar 23',  'TC2-FFA'),
            ('TC2CURMON', 'Mar 23',  'TC2-FFA'),
            (  'TC2+1_M', 'Apr 23',  'TC2-FFA'),
                  ...
            (  'TD25+4Q',  'Q1 24', 'TD25-FFA'),
            (  'TD25+5Q',  'Q2 24', 'TD25-FFA'),
            ('TD25+1CAL', 'Cal 24', 'TD25-FFA'),
            ('TD25+2CAL', 'Cal 25', 'TD25-FFA')],
           names=['identifier', 'period', 'shortCode'], length=198)

我希望将小 Dataframe 的“spot”行插入到每个shortCode的第二个 Dataframe 的顶部,而不更改大 Dataframe 的顺序
预期结果

datumUnit                      $/mt       WS
identifier period shortCode                 
spot     23-03-10 TC2-FFA      NaN   197.78        
TC2BALMO   Mar 23 TC2-FFA    39.376  228.930
TC2CURMON  Mar 23 TC2-FFA    35.946  208.988
TC2+1_M    Apr 23 TC2-FFA    38.444  223.512
TC2+2_M    May 23 TC2-FFA    37.786  219.686
TC2+3_M    Jun 23 TC2-FFA    36.613  212.866
                            ...      ...
spot     23-03-10 TD25-FFA      NaN   232.14  
TD25BALMO  Mar 23 TD25-FFA   48.902  211.331
TD25CURMON Mar 23 TD25-FFA   53.254  230.138
TD25+1_M   Apr 23 TD25-FFA   46.815  202.312
TD25+2_M   May 23 TD25-FFA   43.717  188.924
TD25+3_M   Jun 23 TD25-FFA   41.571  179.650
TD25+4_M   Jul 23 TD25-FFA   40.776  176.214
TD25+5_M   Aug 23 TD25-FFA   40.281  174.075
TD25CURQ   Q1 23  TD25-FFA   46.668  201.677
TD25+1Q    Q2 23  TD25-FFA   44.035  190.298
TD25+2Q    Q3 23  TD25-FFA   40.367  174.447
TD25+3Q    Q4 23  TD25-FFA   42.909  185.432
TD25+4Q    Q1 24  TD25-FFA   39.000      NaN
TD25+5Q    Q2 24  TD25-FFA   32.421      NaN
TD25+1CAL  Cal 24 TD25-FFA   34.250      NaN
TD25+2CAL  Cal 25 TD25-FFA   33.955      NaN
0mkxixxg

0mkxixxg1#

如果df中的shortCode-块连接,您可以尝试(df_small是小 Dataframe ,df是主 Dataframe ):

df_small = df_small.set_index("shortCode", drop=False)
df = df.reset_index()
cols = ["identifier", "period", "shortCode", "WS"]
dfs = []
for short_code, sdf in df.groupby("shortCode", sort=False):
    dfs.append(pd.concat([df_small.loc[[short_code], cols], sdf]))
result = (
    pd.concat(dfs, ignore_index=True)
    .set_index(["identifier", "period", "shortCode"])
)

如果不是这种情况,并且您希望每个连接的块都有新行,那么您可以尝试:

df_small = df_small.set_index("shortCode", drop=False)
df = df.reset_index()
cols = ["identifier", "period", "shortCode", "WS"]
dfs = []
groups = df["shortCode"].ne(df["shortCode"].shift()).cumsum()
for _, sdf in df.groupby(groups):
    dfs.append(pd.concat([df_small.loc[[sdf.iat[0, 2]], cols], sdf]))
result = (
    pd.concat(dfs, ignore_index=True)
   .set_index(["identifier", "period", "shortCode"])
)

两个版本的假设:在df_small中正好有一个shortCode对应于df中的每个shortCode
与样品相似的结果:

WS    $/mt
identifier period     shortCode                 
spot       2023-03-10 TC2-FFA    197.780     NaN
TC2BALMO   Mar 23     TC2-FFA    228.930  39.376
TC2CURMON  Mar 23     TC2-FFA    208.988  35.946
TC2+1_M    Apr 23     TC2-FFA    223.512  38.444
TC2+2_M    May 23     TC2-FFA    219.686  37.786
TC2+3_M    Jun 23     TC2-FFA    212.866  36.613
spot       2023-03-10 TD25-FFA   232.140     NaN
TD25BALMO  Mar 23     TD25-FFA   211.331  48.902
TD25CURMON Mar 23     TD25-FFA   230.138  53.254
TD25+3Q    Q4 23      TD25-FFA   185.432  42.909
TD25+4Q    Q1 24      TD25-FFA       NaN  39.000
TD25+5Q    Q2 24      TD25-FFA       NaN  32.421
TD25+1CAL  Cal 24     TD25-FFA       NaN  34.250
TD25+2CAL  Cal 25     TD25-FFA       NaN  33.955

相关问题