pandas: creating a ratio between variables in a stacked (long-format) structure, grouped by some columns

jslywgbw  posted 12 months ago in Other

I have a DataFrame as follows:

df_in 
      G1    G2       TPE           QC
      
      A     S1       td            2
      A     S1       ts            4
      A     S2       td            6
      A     S2       ts            3
      B     S1       td            20
      B     S1       ts            40
      B     S2       td            60
      B     S2       ts            30
      C     S1       td            90
      D     S2       ts            7

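The output should be grouped by columns G1 and G2. For each such group, compute the row-wise ratio ts/td on column QC, where ts and td are the values in column TPE, and label the new row ratio in the TPE column. The output should also contain the original rows. Note that some groups may not have both a ts and a td value in TPE. In that case no ratio can be computed, and the ratio should be left blank.
So the output should look like this: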

df_out

         G1    G2       TPE           QC
      
         A     S1       td            2
         A     S1       ts            4
         A     S2       td            6
         A     S2       ts            3
         B     S1       td            20
         B     S1       ts            40
         B     S2       td            60
         B     S2       ts            30
         C     S1       td            90
         D     S2       ts            7
         A     S1       ratio         2
         A     S2       ratio         0.5
         B     S1       ratio         2
         B     S2       ratio        0.5
         C     S1       ratio         
         D     S2       ratio


I tried the following, but it drops groups C and D entirely instead of keeping them with a blank ratio:

import pandas as pd

def calculate_ratio(group):
    # Select the td and ts rows of the current (G1, G2) group
    td_row = group[group['TPE'] == 'td']
    ts_row = group[group['TPE'] == 'ts']
    if not td_row.empty and not ts_row.empty:
        ratio = ts_row['QC'].values[0] / td_row['QC'].values[0]
        return pd.DataFrame({'G1': [group['G1'].iloc[0]], 
                             'G2': [group['G2'].iloc[0]], 
                             'TPE': ['ratio'], 
                             'QC': [ratio]})
    # Groups lacking td or ts return an empty frame, so they vanish from the result
    return pd.DataFrame()

grouped = df_in.groupby(['G1', 'G2']).apply(calculate_ratio).reset_index(drop=True)

df_out = pd.concat([df_in, grouped], ignore_index=True)


Any help would be appreciated.

uklbhaso  #1

Code:

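# Pivot to wide form: one row per (G1, G2), one column per TPE value
tmp = df_in.set_index(['G1', 'G2', 'TPE']).unstack()['QC']
# ts / td per group (NaN where either is missing), appended to the original rows
out = pd.concat([df_in, tmp['ts'].div(tmp['td']).reset_index(name='QC').assign(TPE='ratio')])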

Output:

    G1  G2  TPE     QC
0   A   S1  td      2.0
1   A   S1  ts      4.0
2   A   S2  td      6.0
3   A   S2  ts      3.0
4   B   S1  td      20.0
5   B   S1  ts      40.0
6   B   S2  td      60.0
7   B   S2  ts      30.0
8   C   S1  td      90.0
9   D   S2  ts      7.0
0   A   S1  ratio   2.0
1   A   S2  ratio   0.5
2   B   S1  ratio   2.0
3   B   S2  ratio   0.5
4   C   S1  ratio   NaN
5   D   S2  ratio   NaN

Intermediate data

tmp:

TPE     td      ts
G1  G2      
A   S1  2.0     4.0
    S2  6.0     3.0
B   S1  20.0    40.0
    S2  60.0    30.0
C   S1  90.0    NaN
D   S2  NaN     7.0
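
For anyone who wants to reproduce the steps above, here is a minimal self-contained sketch. The sample data is copied from the question; selecting QC before unstack, passing ignore_index=True and reordering the columns are small additions of mine, not part of the answer's code.

```python
import pandas as pd

# Sample data copied from the question.
df_in = pd.DataFrame({
    "G1":  ["A", "A", "A", "A", "B", "B", "B", "B", "C", "D"],
    "G2":  ["S1", "S1", "S2", "S2", "S1", "S1", "S2", "S2", "S1", "S2"],
    "TPE": ["td", "ts", "td", "ts", "td", "ts", "td", "ts", "td", "ts"],
    "QC":  [2, 4, 6, 3, 20, 40, 60, 30, 90, 7],
})

# Wide table: one row per (G1, G2), one column per TPE value.
# Combinations that do not exist in the data become NaN.
tmp = df_in.set_index(["G1", "G2", "TPE"])["QC"].unstack()

# ts / td per group; a group missing either value yields NaN.
ratio = tmp["ts"].div(tmp["td"]).reset_index(name="QC").assign(TPE="ratio")

# ignore_index=True (my addition) avoids duplicate index labels in the result.
out = pd.concat([df_in, ratio], ignore_index=True)[["G1", "G2", "TPE", "QC"]]
print(out)
```

Because the division is done on the wide table, groups C and D still get a ratio row, just with a missing value, which is what the groupby/apply attempt in the question lost.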

z2acfund  #2

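Another possible solution uses a MultiIndex: DataFrame.xs separates the ts values from the td values, and pandas.concat then joins the original rows with the ratio rows: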

s = df_in.set_index(['G1', 'G2', 'TPE'])

pd.concat([
    df_in, s.xs('ts', level=2).div(s.xs('td', level=2))
    .reset_index().assign(TPE='ratio')])

Output:

  G1  G2    TPE    QC
0  A  S1     td   2.0
1  A  S1     ts   4.0
2  A  S2     td   6.0
3  A  S2     ts   3.0
4  B  S1     td  20.0
5  B  S1     ts  40.0
6  B  S2     td  60.0
7  B  S2     ts  30.0
8  C  S1     td  90.0
9  D  S2     ts   7.0
0  A  S1  ratio   2.0
1  A  S2  ratio   0.5
2  B  S1  ratio   2.0
3  B  S2  ratio   0.5
4  C  S1  ratio   NaN
5  D  S2  ratio   NaN
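
The NaN rows for C and D come from index alignment in div. A small sketch of just that step follows; df_small is a hypothetical four-row subset of the question's data, and I select the level by name rather than by position.

```python
import pandas as pd

# Hypothetical subset of the question's data: one complete group (A, S1),
# one group with only td (C, S1) and one with only ts (D, S2).
df_small = pd.DataFrame({
    "G1":  ["A", "A", "C", "D"],
    "G2":  ["S1", "S1", "S1", "S2"],
    "TPE": ["td", "ts", "td", "ts"],
    "QC":  [2, 4, 90, 7],
})

s = df_small.set_index(["G1", "G2", "TPE"])

# xs drops the TPE level, leaving frames indexed by (G1, G2).
ts = s.xs("ts", level="TPE")
td = s.xs("td", level="TPE")

# div aligns the two frames on (G1, G2); a key present on only
# one side produces NaN instead of being dropped.
ratio = ts.div(td)
print(ratio)
```

The result has one row per (G1, G2) key found in either frame, so no group is lost.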

yacmzcpb  #3

You can also use pivot and pipe for this:

out = pd.concat([
    df_in, 
    df_in.pivot(index=['G1','G2'], columns='TPE', values='QC')
         .pipe(lambda df:df['ts'].div(df['td']))
         .reset_index(name='QC')
         .assign(TPE='ratio')
    ]
)

Output:

  G1  G2    TPE    QC
0  A  S1     td   2.0
1  A  S1     ts   4.0
2  A  S2     td   6.0
3  A  S2     ts   3.0
4  B  S1     td  20.0
5  B  S1     ts  40.0
6  B  S2     td  60.0
7  B  S2     ts  30.0
8  C  S1     td  90.0
9  D  S2     ts   7.0
0  A  S1  ratio   2.0
1  A  S2  ratio   0.5
2  B  S1  ratio   2.0
3  B  S2  ratio   0.5
4  C  S1  ratio   NaN
5  D  S2  ratio   NaN


If you want the missing ratios to appear as empty strings rather than NaN, you can append fillna('') to the result.
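
A short sketch of that last step, using a hypothetical two-row stand-in for the ratio rows. One thing to be aware of: once empty strings are mixed in, QC is no longer a purely numeric column, so if the blanks are only needed for display, something like to_string(na_rep='') may be a gentler option. That alternative is my suggestion, not part of the answer.

```python
import pandas as pd

# Hypothetical stand-in for two ratio rows of the result.
out = pd.DataFrame({
    "G1":  ["A", "C"],
    "G2":  ["S1", "S1"],
    "TPE": ["ratio", "ratio"],
    "QC":  [2.0, float("nan")],
})

# Option from the answer: replace NaN with an empty string.
# QC now holds a mix of floats and strings.
blank = out.fillna("")
print(blank)

# Display-only alternative: keep NaN in the data and blank it when printing.
print(out.to_string(na_rep=""))
```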
