Python: how do I calculate yearly percentages for a DataFrame column containing qualitative data?

Posted by jhkqcmku on 2023-01-04 in Python
Follows (0) | Answers (3) | Views (137)

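Suppose my DataFrame looks like this:
| Build year | Brand |
| --- | --- |
| 2010 | Mercedes |
| 2010 | Mercedes |
| 2010 | BMW |
| 2010 | Kia |
| 2011 | Toyota |
| 2011 | Mercedes |
| 2011 | Mercedes |
| 2012 | Tesla |
I want to find all unique combinations of build year and brand, count them, and then calculate each brand's share of the yearly total.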

df.groupby(["Build year", "Brand"]).count()

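Is there a simple way to convert this into a percentage per year? The desired output is:
| Build year | Brand | Count | Percentage of annual count |
| --- | --- | --- | --- |
| 2010 | Mercedes | 2 | 0.5 |
| 2010 | BMW | 1 | 0.25 |
| 2010 | Kia | 1 | 0.25 |
| 2011 | Toyota | 1 | 0.33 |
| 2011 | Mercedes | 2 | 0.66 |
| 2012 | Tesla | 1 | 1 |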

lf3rwulv #1

You can simply group by Build year and use .value_counts:

import pandas as pd

df = pd.read_clipboard() # Your df here
groups = df.groupby("Build year")

count = groups.value_counts()
percentage = groups.value_counts(normalize=True)

out = pd.concat([count, percentage], axis=1, keys=["Count", "Percentage of annual count"])
print(out)

Output:

                     Count  Percentage of annual count
Build year Brand
2010       Mercedes      2                    0.500000
           BMW           1                    0.250000
           Kia           1                    0.250000
2011       Mercedes      2                    0.666667
           Toyota        1                    0.333333
2012       Tesla         1                    1.000000
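
If a flat table like the one in the question is preferred, the same idea might be sketched as below. This is only an illustration: the sample data is typed in by hand instead of read from the clipboard, `brands_by_year` and `flat` are names made up here, and the `rename` calls are there because the default name of the `value_counts` result can differ between pandas versions.

```python
import pandas as pd

# Sample data from the question, entered by hand (assumption: these column names)
df = pd.DataFrame({
    "Build year": [2010, 2010, 2010, 2010, 2011, 2011, 2011, 2012],
    "Brand": ["Mercedes", "Mercedes", "BMW", "Kia",
              "Toyota", "Mercedes", "Mercedes", "Tesla"],
})

# Group the Brand column by year, then count brands within each year
brands_by_year = df.groupby("Build year")["Brand"]
count = brands_by_year.value_counts().rename("Count")
share = brands_by_year.value_counts(normalize=True).rename("Percentage of annual count")

# Put both results side by side and turn the (year, brand) index into columns
flat = pd.concat([count, share], axis=1).reset_index()
print(flat)
```

`reset_index()` moves the two index levels back into ordinary columns, which gives the four-column layout asked for in the question.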
zujrkrfu #2

You can compute the percentage of the annual count with a lambda function, as follows:

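# sort=False keeps the (year, brand) pairs in order of first appearance
grouped_df = df.groupby(["Build year", "Brand"], sort=False)

counts = grouped_df.size().reset_index(name='Count')

# Divide each count by the total for its build year, as a percentage rounded to two decimals
yearly_total = counts.groupby('Build year')['Count'].transform(lambda s: s.sum())
counts['Percentage of annual count'] = (100 * counts['Count'] / yearly_total).round(2)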

Full code example:

import pandas as pd

data = {'Build year': [2010, 2010, 2010, 2010, 2011, 2011, 2011, 2012],
        'Brand': ['Mercedes', 'Mercedes', 'BMW', 'Kia', 'Toyota', 'Mercedes', 'Mercedes', 'Tesla']}

df = pd.DataFrame(data)

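# sort=False keeps the (year, brand) pairs in order of first appearance
grouped_df = df.groupby(["Build year", "Brand"], sort=False)
counts = grouped_df.size().reset_index(name='Count')
# Divide each count by the total for its build year, as a percentage rounded to two decimals
yearly_total = counts.groupby('Build year')['Count'].transform(lambda s: s.sum())
counts['Percentage of annual count'] = (100 * counts['Count'] / yearly_total).round(2)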

print(counts)

Output:

   Build year     Brand  Count  Percentage of annual count
0        2010  Mercedes      2                       50.00
1        2010       BMW      1                       25.00
2        2010       Kia      1                       25.00
3        2011    Toyota      1                       33.33
4        2011  Mercedes      2                       66.67
5        2012     Tesla      1                      100.00
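
A quick sanity check for a result like this is that the percentages within each year add up to 100. The sketch below rebuilds the counts table from the sample data and runs that check; `yearly_total` and `check` are illustrative names, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "Build year": [2010, 2010, 2010, 2010, 2011, 2011, 2011, 2012],
    "Brand": ["Mercedes", "Mercedes", "BMW", "Kia",
              "Toyota", "Mercedes", "Mercedes", "Tesla"],
})

# One row per (year, brand) pair with its number of occurrences
counts = df.groupby(["Build year", "Brand"], sort=False).size().reset_index(name="Count")

# Total number of cars per year, repeated on every row of that year
yearly_total = counts.groupby("Build year")["Count"].transform("sum")
counts["Percentage of annual count"] = 100 * counts["Count"] / yearly_total

# Sum the percentages per year; each value should be (numerically) 100
check = counts.groupby("Build year")["Percentage of annual count"].sum()
print(check)
```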

Hope this helps.

jckbn6z7 #3

A method-chaining approach (assuming the column names contain no spaces) might look like this:

(
    df.groupby(["build_year", "brand"])
    .agg(count=("build_year", "count"))
    .assign(
        # Divide each count by the total of its build_year group
        Percentage_of_annual_count=lambda x: x["count"]
        / x.groupby("build_year")["count"].transform("sum")
    )
)

Full solution:

import pandas as pd

df = pd.DataFrame({
    "build_year": [2010, 2010, 2010, 2010, 2011, 2011, 2011, 2012],
    "brand": ["Mercedes", "Mercedes", "BMW", "Kia", "Toyota", "Mercedes", "Mercedes", "Tesla"],
})
(
    df.groupby(["build_year", "brand"])
    .agg(count=("build_year", "count"))
    .assign(
        # Divide each count by the total of its build_year group
        Percentage_of_annual_count=lambda x: x["count"]
        / x.groupby("build_year")["count"].transform("sum")
    )
)

Output:

                     count  Percentage_of_annual_count
build_year brand                                      
2010       BMW           1                    0.250000
           Kia           1                    0.250000
           Mercedes      2                    0.500000
2011       Mercedes      2                    0.666667
           Toyota        1                    0.333333
2012       Tesla         1                    1.000000
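
As an alternative that is not taken from the answer above, `pd.crosstab` with `normalize="index"` gives the same shares in a wide layout, with one row per year and one column per brand. A minimal sketch, assuming the same `build_year` and `brand` column names; `wide` is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    "build_year": [2010, 2010, 2010, 2010, 2011, 2011, 2011, 2012],
    "brand": ["Mercedes", "Mercedes", "BMW", "Kia",
              "Toyota", "Mercedes", "Mercedes", "Tesla"],
})

# Rows are build years, columns are brands;
# normalize="index" divides every row by its own total
wide = pd.crosstab(df["build_year"], df["brand"], normalize="index")
print(wide.round(2))
```

Brands that do not occur in a given year show up as 0 in this layout, so it suits a quick overview more than the long table requested in the question.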
