Stacking a DataFrame in a specific format using Pandas

sq1bmfud posted 12 months ago in Other

I have a pandas DataFrame as follows:

df 

   Prod  ProdDesc   tot    avg   qtr        val_qtr
   A      Cyl       110   8.7    202301     12
   A      Cyl       110   8.7    202302     56.9
   A      Cyl       110   8.7    202303      9
   A      Cyl       110   8.7    202304      0

What I want is a stacked/transposed layout of this data. I used pandas melt:

df_tra = df.melt(id_vars=['Prod', 'ProdDesc'], var_name='Attrib', value_name='Value')
df_tra = df_tra.drop_duplicates()


So my output is:

df_tra

    Prod  ProdDesc  Attrib   Value
     A    Cyl       tot      110           
     A    Cyl       avg      8.7           
     A    Cyl       qtr      202301    
     A    Cyl       qtr      202302
     A    Cyl       qtr      202303
     A    Cyl       qtr      202304        
     A    Cyl       val_qtr  12    
     A    Cyl       val_qtr  56.9
     A    Cyl       val_qtr  9
     A    Cyl       val_qtr  0

**But this is not the output I want.** What I want is the following:

df_actual_wanted 

    Prod  ProdDesc  Attrib   Value
    A     Cyl       tot      110           
    A     Cyl       avg      8.7           
    A     Cyl       202301   12 
    A     Cyl       202302   56.9
    A     Cyl       202303    9
    A     Cyl       202304    0


How can I achieve this?

xqkwcwgp

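Select the constant columns, apply DataFrame.drop_duplicates and DataFrame.melt, then use concat to join the result with another subset whose columns have been renamed with rename. Finally, if necessary, sort by both key columns: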

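import pandas as pd

# tot/avg repeat on every row: keep one row per product, then melt them into rows
df1 = (df[['Prod','ProdDesc','tot','avg']]
       .drop_duplicates()
       .melt(id_vars=['Prod', 'ProdDesc'], var_name='Attrib', value_name='Value'))
# the quarterly values are already in long form, so only the column names change
df2 = (df[['Prod','ProdDesc','qtr','val_qtr']]
       .rename(columns={'qtr':'Attrib','val_qtr':'Value'}))

out = pd.concat([df1, df2]).sort_values(['Prod','ProdDesc'], ignore_index=True)
print(out)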
  Prod ProdDesc  Attrib  Value
0    A      Cyl     tot  110.0
1    A      Cyl     avg    8.7
2    A      Cyl  202301   12.0
3    A      Cyl  202302   56.9
4    A      Cyl  202303    9.0
5    A      Cyl  202304    0.0
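
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The construction of df is an assumption rebuilt from the sample table in the question (the real dtypes may differ). The final sort is left out because the sample contains only one product, so concat alone already gives the desired order.

```python
import pandas as pd

# Sample data rebuilt from the question's table (dtypes are an assumption).
df = pd.DataFrame({
    "Prod": ["A"] * 4,
    "ProdDesc": ["Cyl"] * 4,
    "tot": [110] * 4,
    "avg": [8.7] * 4,
    "qtr": [202301, 202302, 202303, 202304],
    "val_qtr": [12, 56.9, 9, 0],
})

# Per-product constants: keep one row per product, then melt tot/avg into rows.
df1 = (df[["Prod", "ProdDesc", "tot", "avg"]]
       .drop_duplicates()
       .melt(id_vars=["Prod", "ProdDesc"], var_name="Attrib", value_name="Value"))

# Quarterly values are already long; only the column names need to change.
df2 = (df[["Prod", "ProdDesc", "qtr", "val_qtr"]]
       .rename(columns={"qtr": "Attrib", "val_qtr": "Value"}))

# Stack the two pieces and renumber the rows.
out = pd.concat([df1, df2], ignore_index=True)
print(out)
```

Note that Attrib ends up holding both strings ('tot', 'avg') and integers (the quarters), so it has object dtype. If a uniform type is needed, out["Attrib"].astype(str) is one option.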

If you need a default index and the same ordering as the original data, change the solution as follows:

print (df)
   Prod ProdDesc  tot   avg     qtr  val_qtr
0     A      Cyl  110  8.70  202301     12.0
1     A      Cyl  110  8.70  202302     56.9
2     A      Cyl  110  8.70  202303      9.0
3     A      Cyl  110  8.70  202304      0.0
4     B      Cyl  133  8.76  202301     12.0
5     B      Cyl  133  8.76  202302     56.9
6     B      Cyl  133  8.76  202303      9.0
7     B      Cyl  133  8.76  202304      0.0
8     A     Cyl1  117  8.37  202301     12.0
9     A     Cyl1  117  8.37  202302     56.9
10    A     Cyl1  117  8.37  202303      9.0
11    A     Cyl1  117  8.37  202304      0.0
# ignore_index=False keeps the original row labels on the melted rows
df1 = (df[['Prod','ProdDesc','tot','avg']]
       .drop_duplicates()
       .melt(id_vars=['Prod', 'ProdDesc'],
             var_name='Attrib',
             value_name='Value',
             ignore_index=False))
df2 = (df[['Prod','ProdDesc','qtr','val_qtr']]
       .rename(columns={'qtr':'Attrib','val_qtr':'Value'}))

# a stable sort on the index keeps the tot/avg rows ahead of the quarter rows
out = pd.concat([df1, df2]).sort_index(kind='stable', ignore_index=True)
print(out)
   Prod ProdDesc  Attrib   Value
0     A      Cyl     tot  110.00
1     A      Cyl     avg    8.70
2     A      Cyl  202301   12.00
3     A      Cyl  202302   56.90
4     A      Cyl  202303    9.00
5     A      Cyl  202304    0.00
6     B      Cyl     tot  133.00
7     B      Cyl     avg    8.76
8     B      Cyl  202301   12.00
9     B      Cyl  202302   56.90
10    B      Cyl  202303    9.00
11    B      Cyl  202304    0.00
12    A     Cyl1     tot  117.00
13    A     Cyl1     avg    8.37
14    A     Cyl1  202301   12.00
15    A     Cyl1  202302   56.90
16    A     Cyl1  202303    9.00
17    A     Cyl1  202304    0.00
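
Why this keeps the original order may be easier to see on a smaller input. The sketch below uses a hypothetical two-product sample (not the data above): with ignore_index=False, each melted row keeps the label of the row it came from, so a stable sort on the index places the tot/avg rows directly in front of that product's quarter rows.

```python
import pandas as pd

# Hypothetical two-product sample in the same shape as the data above.
df = pd.DataFrame({
    "Prod": ["A", "A", "B", "B"],
    "ProdDesc": ["Cyl"] * 4,
    "tot": [110, 110, 133, 133],
    "avg": [8.7, 8.7, 8.76, 8.76],
    "qtr": [202301, 202302, 202301, 202302],
    "val_qtr": [12.0, 56.9, 12.0, 56.9],
})

df1 = (df[["Prod", "ProdDesc", "tot", "avg"]]
       .drop_duplicates()
       .melt(id_vars=["Prod", "ProdDesc"], var_name="Attrib",
             value_name="Value", ignore_index=False))

# Each melted row keeps the label of its source row (0 for A, 2 for B).
print(df1.index.tolist())

df2 = (df[["Prod", "ProdDesc", "qtr", "val_qtr"]]
       .rename(columns={"qtr": "Attrib", "val_qtr": "Value"}))

# df1 comes first in the concat, so among equal labels a stable sort
# keeps tot/avg ahead of the quarter row that shares the label.
out = pd.concat([df1, df2]).sort_index(kind="stable", ignore_index=True)
print(out)
```

The order of the two frames passed to concat matters here: swapping them would put the first quarter row ahead of tot and avg.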

If the data is small or performance is not important:

def f(x):
    # the first row of the group holds its tot/avg; turn them into Attrib/Value rows
    y = x[['tot','avg']].iloc[0].reset_index().set_axis(['Attrib', 'Value'], axis=1)
    # put them ahead of the group's quarterly rows
    return pd.concat([y, x[['Attrib','Value']]])

out = (df.rename(columns={'qtr':'Attrib','val_qtr':'Value'})
         .groupby(['Prod', 'ProdDesc'], sort=False)
         .apply(f)
         .droplevel(-1)
         .reset_index())
print(out)
   Prod ProdDesc  Attrib   Value
0     A      Cyl     tot  110.00
1     A      Cyl     avg    8.70
2     A      Cyl  202301   12.00
3     A      Cyl  202302   56.90
4     A      Cyl  202303    9.00
5     A      Cyl  202304    0.00
6     B      Cyl     tot  133.00
7     B      Cyl     avg    8.76
8     B      Cyl  202301   12.00
9     B      Cyl  202302   56.90
10    B      Cyl  202303    9.00
11    B      Cyl  202304    0.00
12    A     Cyl1     tot  117.00
13    A     Cyl1     avg    8.37
14    A     Cyl1  202301   12.00
15    A     Cyl1  202302   56.90
16    A     Cyl1  202303    9.00
17    A     Cyl1  202304    0.00
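
As a sanity check, the groupby version and the concat version can be compared on the same input. This is only a sketch on a hypothetical two-product sample; the names out_apply and out_concat are introduced here for illustration. Recent pandas releases may emit a DeprecationWarning about apply operating on the grouping columns; f only reads tot, avg, Attrib and Value, so the result should not depend on that behaviour.

```python
import pandas as pd

# Hypothetical two-product sample in the same shape as the data above.
df = pd.DataFrame({
    "Prod": ["A", "A", "B", "B"],
    "ProdDesc": ["Cyl"] * 4,
    "tot": [110, 110, 133, 133],
    "avg": [8.7, 8.7, 8.76, 8.76],
    "qtr": [202301, 202302, 202301, 202302],
    "val_qtr": [12.0, 56.9, 12.0, 56.9],
})

def f(x):
    # First row of the group holds its tot/avg; reshape them to Attrib/Value rows.
    y = x[["tot", "avg"]].iloc[0].reset_index().set_axis(["Attrib", "Value"], axis=1)
    return pd.concat([y, x[["Attrib", "Value"]]])

# groupby/apply version: one reshaped block per (Prod, ProdDesc) group.
out_apply = (df.rename(columns={"qtr": "Attrib", "val_qtr": "Value"})
               .groupby(["Prod", "ProdDesc"], sort=False)
               .apply(f)
               .droplevel(-1)
               .reset_index())

# concat version: melt the constants, rename the quarters, stable-sort by label.
df1 = (df[["Prod", "ProdDesc", "tot", "avg"]]
       .drop_duplicates()
       .melt(id_vars=["Prod", "ProdDesc"], var_name="Attrib",
             value_name="Value", ignore_index=False))
df2 = (df[["Prod", "ProdDesc", "qtr", "val_qtr"]]
       .rename(columns={"qtr": "Attrib", "val_qtr": "Value"}))
out_concat = pd.concat([df1, df2]).sort_index(kind="stable", ignore_index=True)

# Compare row by row as plain Python lists.
print(out_apply.values.tolist() == out_concat.values.tolist())
```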
