pandas: How to group by multiple columns in a DataFrame, except one (in Python)

Posted by jgwigjjp on 2023-01-28 in Python

I have the following DataFrame:

      ID    Code     Color   Value
----------------------------------
0    111     AAA      Blue      23
1    111     AAA       Red      43
2    111     AAA     Green       4
3    121     ABA     Green      45
4    121     ABA     Green      23
5    121     ABA       Red      75
6    122     AAA       Red      52
7    122     ACA      Blue      24
8    122     ACA      Blue      53
9    122     ACA     Green      14
...

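I want to group this DataFrame by the columns "ID" and "Code" and sum the values in the "Value" column, while leaving the "Color" column out of the grouping. In other words, I want to group by every non-Value column except "Color" and then sum the "Value" column. I am using Python for this.
My idea was to build a list, column_list, containing all column names other than "Color" and "Value", and then simply run: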

df.groupby['column_list'].sum()

This does not work, though. How should I extend this code so that the groupby behaves as intended?
Edit:
This code works:

bins = df.groupby([df.columns[0],
                   df.columns[1],
                   df.columns[2]]).count()

bins["Weight"] = bins / bins.groupby(df.columns[0]).sum()
bins.reset_index(inplace=True)
bins['Weight'] = bins['Weight'].round(4)
display(HTML(bins.to_html()))

The full code that does not work:

column_list = [c for c in df.columns if c not in ['Value']]
bins = df.groupby(column_list, as_index=False)['Value'].count()

bins["Weight"] = bins / bins.groupby(df.columns[0]).sum()  
bins.reset_index(inplace=True)
bins['Weight'] = bins['Weight'].round(4)
display(HTML(bins.to_html()))

pandas

Source: https://stackoverflow.com/questions/75255070/how-to-groupby-multiple-columns-in-dataframe-except-one-in-python

1 Answer

esbemjvw1#

You can pass a list to groupby and select the column to aggregate with sum:

column_list = [c for c in df.columns if c not in ['Color','Value']]
df1 = df.groupby(column_list, as_index=False)['Value'].sum()

Or:

column_list = list(df.columns.difference(['Color','Value'], sort=False))
df1 = df.groupby(column_list, as_index=False)['Value'].sum()

For the sample data, both are equivalent to:

df1 = df.groupby(['ID','Code'], as_index=False)['Value'].sum()
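As a minimal, self-contained sketch of the approach above: the DataFrame below is rebuilt by hand from the ten rows shown in the question (the rows hidden behind "..." are not included). Note that groupby is a method, so it is called with parentheses and receives the list object itself rather than the string 'column_list'.

```python
import pandas as pd

# Sample data copied from the ten visible rows of the question
df = pd.DataFrame({
    "ID": [111, 111, 111, 121, 121, 121, 122, 122, 122, 122],
    "Code": ["AAA", "AAA", "AAA", "ABA", "ABA", "ABA", "AAA", "ACA", "ACA", "ACA"],
    "Color": ["Blue", "Red", "Green", "Green", "Green", "Red", "Red", "Blue", "Blue", "Green"],
    "Value": [23, 43, 4, 45, 23, 75, 52, 24, 53, 14],
})

# Every column except the one to ignore ('Color') and the one to sum ('Value')
column_list = [c for c in df.columns if c not in ["Color", "Value"]]

# Pass the list itself to groupby(), then select 'Value' before summing
df1 = df.groupby(column_list, as_index=False)["Value"].sum()
print(df1)
```

With this sample, column_list is ['ID', 'Code'] and df1 has one row per (ID, Code) pair.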

Edit: Yes, this also works:

column_list = [c for c in df.columns if c not in ['Color']]
df1 = df.groupby(column_list, as_index=False).sum()

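The reason is that sum drops non-numeric columns by default, and if Value is not selected it sums all remaining columns. (That default applies to the pandas versions current when this was written; since pandas 2.0, numeric_only defaults to False, so pass numeric_only=True to get the same behavior.)
So if Color is numeric, it is summed as well: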

print (df)
    ID Code  Color  Value
0  111  AAA      1     23
1  111  AAA      2     43
2  111  AAA      3      4
3  121  ABA      1     45
4  121  ABA      1     23
5  121  ABA      2     75
6  122  AAA      1     52
7  122  ACA      2     24
8  122  ACA      1     53
9  122  ACA      2     14

column_list = [c for c in df.columns if c not in ['Color']]
df1 = df.groupby(column_list, as_index=False).sum()
print (df1)
    ID Code  Value  Color
0  111  AAA      4      3
1  111  AAA     23      1
2  111  AAA     43      2
3  121  ABA     23      1
4  121  ABA     45      1
5  121  ABA     75      2
6  122  AAA     52      1
7  122  ACA     14      2
8  122  ACA     24      2
9  122  ACA     53      1

column_list = [c for c in df.columns if c not in ['Color']]
df1 = df.groupby(column_list, as_index=False)['Value'].sum()
print (df1)
    ID Code  Value
0  111  AAA      4
1  111  AAA     23
2  111  AAA     43
3  121  ABA     23
4  121  ABA     45
5  121  ABA     75
6  122  AAA     52
7  122  ACA     14
8  122  ACA     24
9  122  ACA     53
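
When Color holds strings, as in the question, what an unrestricted sum() does with it depends on the pandas version: older releases dropped non-numeric columns by default, while from 2.0 on numeric_only defaults to False. The sketch below avoids relying on that default, either by selecting Value explicitly or by passing numeric_only=True. The names group_cols, by_selection and by_numeric_only are illustrative only, and the data is the visible sample from the question.

```python
import pandas as pd

# Sample data from the question, with Color as strings
df = pd.DataFrame({
    "ID": [111, 111, 111, 121, 121, 121, 122, 122, 122, 122],
    "Code": ["AAA", "AAA", "AAA", "ABA", "ABA", "ABA", "AAA", "ACA", "ACA", "ACA"],
    "Color": ["Blue", "Red", "Green", "Green", "Green", "Red", "Red", "Blue", "Blue", "Green"],
    "Value": [23, 43, 4, 45, 23, 75, 52, 24, 53, 14],
})

group_cols = ["ID", "Code"]

# Option 1: name the column to aggregate, so Color is never touched
by_selection = df.groupby(group_cols, as_index=False)["Value"].sum()

# Option 2: sum every remaining numeric column and skip string columns
by_numeric_only = df.groupby(group_cols, as_index=False).sum(numeric_only=True)

print(by_selection)
print(by_numeric_only)
```

Selecting the column is the more explicit choice when only one column should be aggregated; numeric_only=True is convenient when several numeric columns should all be summed.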

Edit: If you need a MultiIndex in bins, remove as_index=False and the column selection after groupby:

bins = df.groupby([df.columns[0],
                   df.columns[1],
                   df.columns[2]]).count()

should be changed to:

column_list = [c for c in df.columns if c not in ['Value']]
bins = df.groupby(column_list).count()
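
For the Weight column from the question, one possible sketch (not part of the original answer) is to divide each group's count by the total count of its ID. It uses transform('sum') on the first index level so that the totals line up with the MultiIndex rows. It assumes the visible sample data from the question and that the first column is named ID.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [111, 111, 111, 121, 121, 121, 122, 122, 122, 122],
    "Code": ["AAA", "AAA", "AAA", "ABA", "ABA", "ABA", "AAA", "ACA", "ACA", "ACA"],
    "Color": ["Blue", "Red", "Green", "Green", "Green", "Red", "Red", "Blue", "Blue", "Green"],
    "Value": [23, 43, 4, 45, 23, 75, 52, 24, 53, 14],
})

# Group by everything except 'Value'; the result keeps a MultiIndex
column_list = [c for c in df.columns if c not in ["Value"]]
bins = df.groupby(column_list).count()

# Total count per ID, broadcast back to every row of that ID
totals = bins.groupby(level="ID")["Value"].transform("sum")

# Share of each (ID, Code, Color) group within its ID
bins["Weight"] = (bins["Value"] / totals).round(4)

# Turn the MultiIndex back into ordinary columns for display
bins = bins.reset_index()
print(bins)
```

Because transform returns a Series with the same index as bins, the division is row-aligned and does not depend on how pandas aligns a MultiIndex against a single-level index.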

Answered 2023-01-28
