Let's create a PySpark DataFrame from the following CSV:
desc,B1,B2,B3,B4,B5,B6,B7,B8,B9,B10,B11
Other,111957.0,35293.0,225852.0,35110.0,1023680.0,448736.0,256473.0,269856.0,306668.0,8807.0,89551.0
Down,575614.0,203186.0,0.0,125056.0,0.0,766086.0,1157311.0,11127.0,88741.0,31603.0,300733.0
Up,0.0,0.0,1953645.0,0.0,346423.0,0.0,0.0,0.0,0.0,0.0,0.0
Same,2948065.0,730113.0,33121.0,668868.0,5451224.0,4485121.0,30780025.0,1977361.0,5295598.0,217697.0,1790024.0
Old,186596.0,88257.0,0.0,36842.0,2173626.0,240619.0,0.0,2770.0,2212560.0,9865.0,121045.0
New,0.0,0.0,0.0,0.0,3148.0,0.0,97252.0,0.0,0.0,0.0,0.0
It was created by pivoting a DataFrame:
y = x.groupby('desc').pivot('prev_segment').sum('cust_count')
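Now I need to convert the values in each B* column to percentages. The percentage is obtained by summing all the values in a column and then dividing each cell by that sum, so that each column adds up to 100%.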
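I would appreciate it if someone could show me a simple way to do this, ideally as part of the agg function in the pivot itself. So perhaps, instead of sum('cust_count'), there is another, simpler way to produce the resulting DataFrame.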
1 Answer
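Use a window function. Divide each value by the sum of its column, multiply by 100, and round to 0 decimal places. colRegex will help you select only the columns whose names start with B.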