pandas: creating a function to standardize labels for each ID

Posted by 2g32fytz 9 months ago in Other


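I am trying to create a function that transforms the label column for each ID according to a given condition.
I want to standardize the labels to the *most common label* for that ID; if there is no common/majority label, the first observation should be used as the default.
This is my function so far: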

def standardize_labels(df, id_col, label_col):
    # Function to find the most common label or the first one if there's a tie
    def most_common_label(group):
        labels = group[label_col].value_counts()
        # Check if the top two labels have the same count
        if len(labels) > 1 and labels.iloc[0] == labels.iloc[1]:
            return group[label_col].iloc[0]
        return labels.idxmax()

    # Group by the ID column and apply the most_common_label function
    common_labels = df.groupby(id_col).apply(most_common_label)

    # Map the IDs in the original DataFrame to their common labels
    df['standardized_label'] = df[id_col].map(common_labels)

    return df
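To make the rule concrete, here is a small sketch that runs the same logic on invented sample data. The column names `ID` and `raw_label` and the data itself are assumptions for illustration. This variant also selects the label column before `apply` and returns a copy instead of modifying the input, which is a design choice and not part of the original function.

```python
import pandas as pd


def standardize_labels(df, id_col, label_col):
    def most_common_label(labels):
        counts = labels.value_counts()
        # A tie for first place means there is no single most common label,
        # so fall back to the first observation in the group
        if len(counts) > 1 and counts.iloc[0] == counts.iloc[1]:
            return labels.iloc[0]
        return counts.idxmax()

    # One standardized label per ID
    common_labels = df.groupby(id_col)[label_col].apply(most_common_label)

    # Work on a copy so the caller's DataFrame is left untouched
    out = df.copy()
    out["standardized_label"] = out[id_col].map(common_labels)
    return out


# Invented sample data: ID 1 has a clear majority, ID 2 is tied, ID 3 has one row
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "raw_label": ["b", "a", "a", "z", "y", "k"],
})
result = standardize_labels(df, "ID", "raw_label")
print(result)
```

ID 1 should get `a` (the majority), ID 2 should get `z` (tied, so the first observation wins), and ID 3 should keep `k`.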

pandas

Source: https://stackoverflow.com/questions/77668372/creating-a-function-to-standardize-labels-for-each-id

1 answer


vwkv1x7d1#

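This code works as expected for me. However, you can use mode to make it easier to read. You can also replace the function applied to the groupby with a transform that assigns directly to the column, so the whole operation becomes a single line.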

df['standardized_label'] = df.groupby('ID')['raw_label'].transform(lambda x: x.mode()[0])
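One caveat that may be worth checking against your data: `Series.mode()` returns tied values in sorted order, so on a tie `mode()[0]` gives the smallest label, not the first observation. The sketch below compares the plain one-liner with a tie-aware helper. The name `mode_or_first` and the sample data are made up for illustration.

```python
import pandas as pd


def mode_or_first(labels):
    """Return the single most common label, or the first observation on a tie."""
    modes = labels.mode()
    # Exactly one mode means there is a clear winner
    if len(modes) == 1:
        return modes.iloc[0]
    # Otherwise fall back to the first observation in the group
    return labels.iloc[0]


# Invented sample data: ID 2 has a tie between "z" and "y"
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "raw_label": ["b", "a", "a", "z", "y"],
})

plain = df.groupby("ID")["raw_label"].transform(lambda x: x.mode()[0])
tie_aware = df.groupby("ID")["raw_label"].transform(mode_or_first)

print(plain.tolist())      # ['a', 'a', 'a', 'y', 'y']
print(tie_aware.tolist())  # ['a', 'a', 'a', 'z', 'z']
```

If ties never occur in your data, the two versions should agree and the one-liner is the simpler choice.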

Alternatively, you can use groupby.apply and map the result. Either way, the function looks like this:

def standardize_labels(df, id_col, label_col):
    # Function to find the most common label; mode() sorts ties, so a tie yields the smallest label
    def most_common_label(group):
        return group.mode()[0]

    # Group by the ID column and apply the most_common_label function
    common_labels = df.groupby(id_col)[label_col].apply(most_common_label)

    # Map the IDs in the original DataFrame to their common labels
    df['standardized_label'] = df[id_col].map(common_labels)

    return df

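Since value_counts() works on a DataFrame, we can use it directly without groupby, so the function can be changed to the one below. It is a refactoring of a function I wrote for another question.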

def standardize_labels(df, id_col, label_col):
    # Count each (ID, label) pair; value_counts sorts by count in descending order
    labels_counts = df.value_counts([id_col, label_col])
    dup_idx_msk = ~labels_counts.droplevel(label_col).index.duplicated()
    common_labels = labels_counts[dup_idx_msk]
    common_labels = common_labels.reset_index(level=1)[label_col]
    # Map the IDs in the original DataFrame to their common labels
    df['standardized_label'] = df[id_col].map(common_labels)
    return df

df = standardize_labels(df, 'ID', 'raw_label')
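To see what the `value_counts` version does, here is a step-by-step sketch of the same idea on invented data, using `get_level_values` in place of `droplevel` to build the mask. The sample frame is an assumption for illustration. I would not rely on the order in which `value_counts` returns tied pairs, so this example avoids ties within an ID.

```python
import pandas as pd

# Invented sample data with a clear majority label for each ID
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "raw_label": ["b", "a", "a", "y", "z", "z"],
})

# One entry per (ID, label) pair, sorted by count in descending order
counts = df.value_counts(["ID", "raw_label"])

# Keep only the first (highest-count) entry for each ID
first_per_id = ~counts.index.get_level_values("ID").duplicated()
top = counts[first_per_id]

# Move the label out of the index to get an ID -> label lookup
lookup = top.reset_index(level="raw_label")["raw_label"]
print(lookup.sort_index().to_dict())  # {1: 'a', 2: 'z'}

# Map the lookup back onto the original rows
df["standardized_label"] = df["ID"].map(lookup)
```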
