pandas: Groupby with a custom function based on values in other columns

t5zmwmid  posted on 2023-01-11 in Other

I have a DataFrame containing survey responses from several countries.

import pandas as pd

country = ['Country A', 'Country A', 'Country A', 'Country B', 'Country B', 'Country B']
responses = ['Agree', 'Neutral', 'Disagree', 'Agree', 'Neutral', 'Disagree']
num_respondents = [10, 50, 30, 58, 24, 23]
example_df = pd.DataFrame({"Country": country, "Response": responses, "Count": num_respondents})

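For each country, I want to compute the ratio (number who agree - number who disagree) / (total number of respondents). Is there a simple way to do this with groupby or another pandas function?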


yh2wf1be  #1

This might help:

example_df.groupby('Country').apply(lambda x: (sum(x['Count'][x['Response'] == 'Agree']) 
                                            - sum(x['Count'][x['Response'] == 'Disagree'])) 
                                              /sum(x['Count']))
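
As a possible alternative to the lambda above, here is a minimal sketch that avoids `apply` by giving each response a signed weight and summing per country. The `weights`, `signed`, `total`, and `net_agreement` names and the +1 / -1 / 0 mapping are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

example_df = pd.DataFrame({
    "Country": ["Country A"] * 3 + ["Country B"] * 3,
    "Response": ["Agree", "Neutral", "Disagree"] * 2,
    "Count": [10, 50, 30, 58, 24, 23],
})

# Assumed weighting: +1 for Agree, -1 for Disagree, 0 for anything else.
weights = example_df["Response"].map({"Agree": 1, "Disagree": -1}).fillna(0)

# Sum the signed counts and the raw counts per country, then divide.
signed = (example_df["Count"] * weights).groupby(example_df["Country"]).sum()
total = example_df.groupby("Country")["Count"].sum()
net_agreement = signed / total

print(net_agreement)
```

Both `signed` and `total` are indexed by country, so the division lines up the two results by label.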

4c8rllxm  #2

You can create a custom function and put your logic inside it:

import pandas as pd

def custom_agg(grp: pd.DataFrame) -> float:
    """Calculate the difference of agreement and disagreements in a group of responses.

    Parameters
    ----------
    grp : pd.DataFrame
        A pandas DataFrame containing at least two columns: 'Response' and 'Count'.

    Returns
    -------
    float
        The difference between 'Agree' and 'Disagree' responses,
        relative to the total number of responses,
        calculated as: (total_agree - total_disagree) / total_count

    Examples
    --------
    >>> country = ["Country A", "Country A", "Country A", "Country B",
    ...            "Country B", "Country B"]
    >>> responses = ["Agree", "Neutral", "Disagree", "Agree", "Neutral",
    ...             "Disagree"]
    >>> num_respondents = [10, 50, 30, 58, 24, 23]
    >>> example_df = pd.DataFrame({"Country": country, "Response": responses,
    ...                            "Count": num_respondents})
    >>> example_df.groupby("Country").apply(custom_agg)
    """
    total_agree = grp[grp["Response"] == "Agree"]["Count"].sum()
    total_disagree = grp[grp["Response"] == "Disagree"]["Count"].sum()
    total_count = grp["Count"].sum()
    return (total_agree - total_disagree) / total_count

example_df.groupby("Country").apply(custom_agg)
# Returns:
#
# Country
# Country A   -0.222222
# Country B    0.333333
# dtype: float64

Defining a custom function is especially useful when you need complex logic in a group-by/aggregate scenario.
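
To illustrate that point, here is a hedged sketch of a slightly more general version. The `net_score` function, its `positive` and `negative` parameters, and the NaN-for-empty-total rule are all hypothetical additions for illustration. It relies on `groupby(...).apply` forwarding extra keyword arguments to the function, and selects only the columns the function reads.

```python
import pandas as pd


def net_score(grp: pd.DataFrame, positive: str = "Agree",
              negative: str = "Disagree") -> float:
    """Return (positive - negative) / total for one group (hypothetical helper)."""
    total = grp["Count"].sum()
    if total == 0:
        # Assumed convention: a group with no respondents has no defined score.
        return float("nan")
    pos = grp.loc[grp["Response"] == positive, "Count"].sum()
    neg = grp.loc[grp["Response"] == negative, "Count"].sum()
    return (pos - neg) / total


example_df = pd.DataFrame({
    "Country": ["Country A"] * 3 + ["Country B"] * 3,
    "Response": ["Agree", "Neutral", "Disagree"] * 2,
    "Count": [10, 50, 30, 58, 24, 23],
})

# Keyword arguments after the function are passed through to net_score.
result = (
    example_df.groupby("Country")[["Response", "Count"]]
    .apply(net_score, positive="Agree", negative="Disagree")
)
print(result)
```

Keeping the labels as parameters means the same function can be reused for surveys whose answer options are named differently.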
