Python - Pandas: Normalizing a column within a groupby

Posted by anauzrmj on 2023-04-10 in Python


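I hope this question isn't trivial. I have searched other threads for an answer without luck.
I have been analyzing a dataset that is split with groupby, and I added a column containing the cumulative sum (cumsum) of the variable "Profit".
Now, to compare results across categories, I would like to "normalize" the cumulative profit by dividing it by the maximum value within its group.
I have tried using math and lambda functions, but I could not find a way to get the desired result.
Here is the relevant part of my code: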

us_discount = us.groupby(['Sub-Category', 'Discount'], as_index = False)['Profit'].sum()
us_discount['Cumulative Profit'] = us_discount.groupby('Sub-Category', as_index = False)['Profit'].cumsum()

print(us_discount.groupby('Sub-Category')['Cumulative Profit'].max())

us_discount['test'] = us_discount['Cumulative Profit'] / us_discount.groupby('Sub-Category')['Cumulative Profit'].max()

us_discount.head()

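The result is as follows:

As you can see, the maximum values are computed correctly, but I can't get them into the variable "test". I would like the printed values to be the denominator of the "test" column.
Since I am currently learning pandas, I would like (if possible) a solution that uses the library and, if possible, avoids lambda functions. I know I could work around the problem using matrices or fitting.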

pandas

Source: https://stackoverflow.com/questions/75954626/python-pandas-normalizing-a-column-within-a-groupby

1 Answer


q5lcpyga1#

Here is one way to do what you're asking:

us_discount['Cumulative Profit'] = us_discount.groupby('Sub-Category', as_index = False)['Profit'].cumsum()
us_max = us_discount.groupby('Sub-Category')['Cumulative Profit'].max()
us_discount['test'] = us_discount['Cumulative Profit'].div(us_discount['Sub-Category'].map(us_max))
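For context, the division in the question most likely yields NaN because pandas aligns two Series on their index labels: `us_discount['Cumulative Profit']` is indexed by row number, while the result of `groupby(...).max()` is indexed by Sub-Category. Mapping the maxima through the `Sub-Category` column produces a Series on the row index, so the division lines up. A minimal sketch, assuming a small hand-built frame (the values and the names `df`, `group_max`, `misaligned`, `per_row_max`, and `normalized` are made up for illustration, not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy data, only to illustrate index alignment.
df = pd.DataFrame({
    "Sub-Category": ["A", "A", "B"],
    "Cumulative Profit": [10.0, 40.0, 5.0],
})

# One maximum per group; this Series is indexed by the group labels "A" and "B".
group_max = df.groupby("Sub-Category")["Cumulative Profit"].max()

# Row labels (0, 1, 2) never match group labels ("A", "B"),
# so every aligned pair has a missing side and the result is all NaN.
misaligned = df["Cumulative Profit"] / group_max

# map() looks up each row's group maximum and keeps the row index,
# so the element-wise division below pairs each row with its own group's maximum.
per_row_max = df["Sub-Category"].map(group_max)
normalized = df["Cumulative Profit"].div(per_row_max)
```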

Sample input for us_discount:

Sub-Category  Discount      Profit
0  Accessories       0.0  35289.2539
1  Accessories       0.2   6647.3818
2  Applicances       0.0  23183.7361
3  Applicances       0.1   1086.0808
4  Applicances       0.2   2497.8297

Output:

Sub-Category  Discount      Profit  Cumulative Profit      test
0  Accessories       0.0  35289.2539         35289.2539  0.841490
1  Accessories       0.2   6647.3818         41936.6357  1.000000
2  Applicances       0.0  23183.7361         23183.7361  0.866110
3  Applicances       0.1   1086.0808         24269.8169  0.906685
4  Applicances       0.2   2497.8297         26767.6466  1.000000
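
As an alternative sketch (not part of the answer above), `groupby(...).transform('max')` returns each group's maximum broadcast to every row, so the division needs neither `map` nor a lambda. The frame below is rebuilt by hand from the sample input shown above; the helper name `group_max` is an assumption introduced for illustration.

```python
import pandas as pd

# Sample rows copied from the input example above.
us_discount = pd.DataFrame({
    "Sub-Category": ["Accessories", "Accessories",
                     "Applicances", "Applicances", "Applicances"],
    "Discount": [0.0, 0.2, 0.0, 0.1, 0.2],
    "Profit": [35289.2539, 6647.3818, 23183.7361, 1086.0808, 2497.8297],
})

# Running total of Profit within each Sub-Category.
us_discount["Cumulative Profit"] = (
    us_discount.groupby("Sub-Category")["Profit"].cumsum()
)

# transform("max") yields one value per row, aligned with the row index.
group_max = (
    us_discount.groupby("Sub-Category")["Cumulative Profit"].transform("max")
)
us_discount["test"] = us_discount["Cumulative Profit"] / group_max

print(us_discount)
```

Because `transform` keeps the original row index, the result can be assigned straight back as a column without any lookup step.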

Answered 2023-04-10
