Pandas groupby: how to use variables in different columns to create a new column

Posted by i2loujxw on 2023-03-06 in Other


Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({'c0': ['1980']*3 + ['1990']*2 + ['2000']*3,
                   'c1': ['x', 'y', 'z'] + ['x', 'y'] + ['x', 'y', 'z'],
                   'c2': range(8)})

     c0 c1  c2
0  1980  x   0
1  1980  y   1
2  1980  z   2
3  1990  x   3
4  1990  y   4
5  2000  x   5
6  2000  y   6
7  2000  z   7

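I want to use pandas groupby on c0 to do the following:
1. Group the rows by c0 (which represents the year).
2. Within each group, subtract the c2 value of the row whose c1 is 'y' from every c2 value.
3. Add a new column, c3, to hold these differences.
The final result should be: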

     c0 c1  c2  c3
0  1980  x   0  -1
1  1980  y   1   0
2  1980  z   2   1
3  1990  x   3  -1
4  1990  y   4   0
5  2000  x   5  -1
6  2000  y   6   0
7  2000  z   7   1
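The three steps amount to a lookup followed by a subtraction: find the c2 value of each year's 'y' row, then subtract it from every row of that year. A minimal sketch of that idea using Series.map (a swapped-in technique, not groupby), assuming every year has exactly one 'y' row; y_by_year is a hypothetical helper name:

```python
import pandas as pd

df = pd.DataFrame({'c0': ['1980']*3 + ['1990']*2 + ['2000']*3,
                   'c1': ['x', 'y', 'z'] + ['x', 'y'] + ['x', 'y', 'z'],
                   'c2': range(8)})

# Each year's 'y' value of c2, indexed by year
# (assumes exactly one 'y' row per year)
y_by_year = df.loc[df['c1'] == 'y'].set_index('c0')['c2']

# Look up each row's year, subtract that year's 'y' value, store it as c3
df['c3'] = df['c2'] - df['c0'].map(y_by_year)
print(df)
```

Because no NaN is introduced, c3 stays an integer column.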

I can get this result without groupby as follows:

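dic = {}

for yr in df['c0'].unique():
    # rows belonging to this year
    cond1 = (df['c0'] == yr)
    tmp = df.loc[cond1, :].copy()

    # c2 value of this year's 'y' row (a length-1 array)
    cond2 = (tmp['c1'] == 'y')
    val = tmp.loc[cond2, 'c2'].to_numpy()

    # the length-1 array broadcasts across every row of the year
    tmp['c3'] = tmp['c2'] - val

    dic[yr] = tmp

pd.concat([dic['1980'], dic['1990'], dic['2000']])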

It works fine, but it doesn't look very elegant. I tried using transform and apply with groupby but couldn't figure it out. Any help would be greatly appreciated.
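The loop above can be tidied without changing its logic: iterating over df.groupby('c0') yields each year together with its sub-frame, so neither unique() nor the hard-coded keys in pd.concat are needed. A minimal sketch, again assuming exactly one 'y' row per year; parts, y_val and result are hypothetical names:

```python
import pandas as pd

df = pd.DataFrame({'c0': ['1980']*3 + ['1990']*2 + ['2000']*3,
                   'c1': ['x', 'y', 'z'] + ['x', 'y'] + ['x', 'y', 'z'],
                   'c2': range(8)})

parts = []
for yr, tmp in df.groupby('c0'):
    # work on a copy so adding c3 does not touch a view of df
    tmp = tmp.copy()
    # scalar c2 value of this year's single 'y' row
    y_val = tmp.loc[tmp['c1'] == 'y', 'c2'].iloc[0]
    tmp['c3'] = tmp['c2'] - y_val
    parts.append(tmp)

# groups come out in sorted c0 order, which is also the row order here
result = pd.concat(parts)
print(result)
```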

pandas

Source: https://stackoverflow.com/questions/75570533/python-pandas-groupby-how-to-use-variables-in-different-columns-to-create-a-new

4 answers

7lrncoxx #1

Here is an alternative using MultiIndex selection:

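s = df.set_index(['c0', 'c1'])
# xs('y', level=1) gives each year's 'y' value of c2, indexed by c0;
# the subtraction aligns it on the shared c0 level
s['c3'] = s['c2'] - s['c2'].xs('y', level=1)
s = s.reset_index()

Result:

     c0 c1  c2  c3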
0  1980  x   0  -1
1  1980  y   1   0
2  1980  z   2   1
3  1990  x   3  -1
4  1990  y   4   0
5  2000  x   5  -1
6  2000  y   6   0
7  2000  z   7   1
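The key step in this answer is s['c2'].xs('y', level=1), which returns a Series indexed by c0 alone (one 'y' value per year); subtracting it from the (c0, c1)-indexed column broadcasts each year's value over that year's rows. The sketch below prints that intermediate and writes the subtraction with an explicit level='c0' argument to Series.sub, the documented way to request a broadcast across a MultiIndex level; the answer's plain - operator gets the same alignment from the shared c0 level name. y_by_year is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({'c0': ['1980']*3 + ['1990']*2 + ['2000']*3,
                   'c1': ['x', 'y', 'z'] + ['x', 'y'] + ['x', 'y', 'z'],
                   'c2': range(8)})

s = df.set_index(['c0', 'c1'])

# One 'y' value per year, indexed by the c0 level only
y_by_year = s['c2'].xs('y', level='c1')
print(y_by_year)

# Broadcast each year's 'y' value over every (c0, c1) row of that year
s['c3'] = s['c2'].sub(y_by_year, level='c0')
print(s.reset_index())
```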

Answered 2023-03-06

tyky79it #2

After using where to hide all non-y rows, you can use transform to broadcast each group's y value:

df['c3'] = df['c2'] - df.where(df['c1'] == 'y').groupby(df['c0'])['c2'].transform('max')
print(df)

# Output
     c0 c1  c2   c3
0  1980  x   0 -1.0
1  1980  y   1  0.0
2  1980  z   2  1.0
3  1990  x   3 -1.0
4  1990  y   4  0.0
5  2000  x   5 -1.0
6  2000  y   6  0.0
7  2000  z   7  1.0
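The c3 values above print as floats because where() replaces every non-'y' row with NaN, which turns c2 into a float column before the groupby. A sketch that shows the broadcast intermediate and casts the difference back to integers, assuming every year has a 'y' row (otherwise that year's c3 would be NaN and astype(int) would raise); masked and y_val are hypothetical names:

```python
import pandas as pd

df = pd.DataFrame({'c0': ['1980']*3 + ['1990']*2 + ['2000']*3,
                   'c1': ['x', 'y', 'z'] + ['x', 'y'] + ['x', 'y', 'z'],
                   'c2': range(8)})

# Keep only the 'y' rows; every other row becomes all-NaN
masked = df.where(df['c1'] == 'y')

# Group by the ORIGINAL c0 (masked['c0'] is NaN on the hidden rows).
# max() skips NaN, so every row receives its year's 'y' value.
y_val = masked.groupby(df['c0'])['c2'].transform('max')
print(y_val.tolist())  # [1.0, 1.0, 1.0, 4.0, 4.0, 6.0, 6.0, 6.0]

# NaN forced a float dtype; cast back since every year has a 'y' value
df['c3'] = (df['c2'] - y_val).astype(int)
print(df)
```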

Answered 2023-03-06

vshtjzan #3

Another possible solution:

df['c3'] = (df.groupby('c0')
            .apply(lambda g: g['c2'].values-g.loc[g['c1'].eq('y'), 'c2'].values)
            .explode().values)

Output:

     c0 c1  c2  c3
0  1980  x   0  -1
1  1980  y   1   0
2  1980  z   2   1
3  1990  x   3  -1
4  1990  y   4   0
5  2000  x   5  -1
6  2000  y   6   0
7  2000  z   7   1

Answered 2023-03-06

rta7y2nd #4

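df['c3'] = (df.groupby('c0')
              .apply(lambda g: g['c2'] - g['c2'][g['c1'].eq('y')].values)
              # drop the group keys; this lines up with df only because df is
              # sorted by c0 and has a default RangeIndex
              .reset_index(drop=True)
           )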

df
     c0 c1  c2  c3
0  1980  x   0  -1
1  1980  y   1   0
2  1980  z   2   1
3  1990  x   3  -1
4  1990  y   4   0
5  2000  x   5  -1
6  2000  y   6   0
7  2000  z   7   1

Answered 2023-03-06