pandas: Groupby custom function based on other column values

Posted by t5zmwmid on 2023-01-11 in Other


I have a DataFrame containing survey responses by country.

import pandas as pd

country = ['Country A', 'Country A', 'Country A', 'Country B', 'Country B', 'Country B']
responses = ['Agree', 'Neutral', 'Disagree', 'Agree', 'Neutral', 'Disagree']
num_respondents = [10, 50, 30, 58, 24, 23]
example_df = pd.DataFrame({"Country": country, "Response": responses, "Count": num_respondents})

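For each country, I want to calculate the ratio (number who agree - number who disagree) / (total number of respondents). Is there a simple way to do this using groupby or another pandas function?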

pandas

Source: https://stackoverflow.com/questions/75074071/groupby-custom-function-based-on-other-column-values

2 answers


yh2wf1be1#

This might help:

example_df.groupby('Country').apply(lambda x: (sum(x['Count'][x['Response'] == 'Agree']) 
                                            - sum(x['Count'][x['Response'] == 'Disagree'])) 
                                              /sum(x['Count']))
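
As a possible alternative to the lambda above, here is a minimal sketch that reshapes the data with `pivot_table` so each response becomes its own column. The names `wide` and `net_agreement` are made up for illustration, and the sketch assumes every country has at least one 'Agree' and one 'Disagree' row somewhere in the data, so both columns exist after pivoting.

```python
import pandas as pd

example_df = pd.DataFrame({
    "Country": ["Country A"] * 3 + ["Country B"] * 3,
    "Response": ["Agree", "Neutral", "Disagree"] * 2,
    "Count": [10, 50, 30, 58, 24, 23],
})

# One row per country, one column per response value.
# fill_value=0 covers a country that lacks one of the response types.
wide = example_df.pivot_table(index="Country", columns="Response",
                              values="Count", aggfunc="sum", fill_value=0)

# (agree - disagree) / total respondents, computed column-wise.
net_agreement = (wide["Agree"] - wide["Disagree"]) / wide.sum(axis=1)
print(net_agreement)
```

This avoids calling a Python function once per group, which may matter on larger frames.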

Answered 2023-01-11

4c8rllxm2#

You can create a custom function and put your logic inside it:

import pandas as pd

def custom_agg(grp: pd.DataFrame) -> float:
    """Calculate the difference of agreement and disagreements in a group of responses.

    Parameters
    ----------
    grp : pd.DataFrame
        A pandas DataFrame containing at least two columns: 'Response' and 'Count'.

    Returns
    -------
    float
        The difference between 'Agree' and 'Disagree' responses,
        relative to the total number of responses,
        calculated as: (total_agree - total_disagree) / total_count

    Examples
    --------
    >>> country = ["Country A", "Country A", "Country A", "Country B",
    ...            "Country B", "Country B"]
    >>> responses = ["Agree", "Neutral", "Disagree", "Agree", "Neutral",
    ...             "Disagree"]
    >>> num_respondents = [10, 50, 30, 58, 24, 23]
    >>> example_df = pd.DataFrame({"Country": country, "Response": responses,
    ...                            "Count": num_respondents})
    >>> example_df.groupby("Country").apply(custom_agg)
    """
    total_agree = grp[grp["Response"] == "Agree"]["Count"].sum()
    total_disagree = grp[grp["Response"] == "Disagree"]["Count"].sum()
    total_count = grp["Count"].sum()
    return (total_agree - total_disagree) / total_count

example_df.groupby("Country").apply(custom_agg)
# Returns:
#
# Country
# Country A   -0.222222
# Country B    0.333333
# dtype: float64

Defining a custom function is especially useful when you need complex logic in a group-by/aggregate scenario.
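
When the logic is as simple as this ratio, a plain vectorized version may also be enough. The sketch below is one possible approach, not part of the original answer: it maps each response to a sign (the `weights`, `signed`, `total`, and `ratio` names are hypothetical) and then uses ordinary `groupby(...).sum()` calls.

```python
import pandas as pd

example_df = pd.DataFrame({
    "Country": ["Country A"] * 3 + ["Country B"] * 3,
    "Response": ["Agree", "Neutral", "Disagree"] * 2,
    "Count": [10, 50, 30, 58, 24, 23],
})

# Assumed weighting: +1 for Agree, -1 for Disagree, 0 for any other response.
weights = example_df["Response"].map({"Agree": 1, "Disagree": -1}).fillna(0)

# Sum of signed counts per country, i.e. agree minus disagree.
signed = (example_df["Count"] * weights).groupby(example_df["Country"]).sum()

# Total respondents per country.
total = example_df.groupby("Country")["Count"].sum()

ratio = signed / total
print(ratio)
```

Both Series are indexed by country, so the division aligns on the country labels.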

Answered 2023-01-11
