pandas: aggregate function in Python to get the largest common intersection per group

ddrv8njm  posted on 2023-04-28  in  Python

I have a dataframe like the one below (available either with the foods as arrays or in unnested format):

team  | player     | favorite_food
  A   | A_player1  | [pizza, sushi]
  A   | A_player2  | [salad, sushi]
  B   | B_player1  | [pizza, pasta, salad, taco]
  B   | B_player2  | [taco, salad, sushi]
  B   | B_player3  | [taco]

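For each team, I want the number of foods that all of its players have in common, and that number as a fraction of the team's distinct foods. Like this: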

team  | #_food_common | percent_food_common
  A   | 1             |  0.33
  B   | 1             |  0.2

What is a good way to do this in Python, preferably with pandas?

wtzytmuj #1

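You can use set operations and groupby.agg: convert each list to a set, group by team, then take the intersection and the union of the sets in each group.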

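# Convert each list to a set, then aggregate the sets per team
(df['favorite_food'].apply(set)
 .groupby(df['team'])
 .agg(**{# foods liked by every player on the team
         '#_food_common': lambda x: len(set.intersection(*x)),
         # shared foods divided by all distinct foods on the team
         'percent_food_common': lambda x: len(set.intersection(*x))/len(set.union(*x)),
        })
 .reset_index()
)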

Output:

  team  #_food_common  percent_food_common
0    A              1             0.333333
1    B              1             0.200000
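
To see where 1/3 and 1/5 come from, the same set arithmetic can be checked by hand in plain Python. This is only a sketch: the helper name `common_stats` and the variables `team_a` and `team_b` are made up for illustration, and the sets are copied from the question's table.

```python
# Plain-Python check of the set arithmetic (no pandas needed).
team_a = [{"pizza", "sushi"}, {"salad", "sushi"}]
team_b = [{"pizza", "pasta", "salad", "taco"},
          {"taco", "salad", "sushi"},
          {"taco"}]

def common_stats(food_sets):
    """Hypothetical helper: (shared food count, shared / distinct foods)."""
    common = set.intersection(*food_sets)  # foods every player likes
    distinct = set.union(*food_sets)       # foods at least one player likes
    return len(common), len(common) / len(distinct)

stats_a = common_stats(team_a)  # only sushi is shared, out of 3 distinct foods
stats_b = common_stats(team_b)  # only taco is shared, out of 5 distinct foods
print(stats_a)
print(stats_b)
```

The ratio is intersection size over union size, so it is 1 only when every player on a team lists exactly the same foods.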

Input used:

import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B'],
                   'player': ['A_player1', 'A_player2', 'B_player1', 'B_player2', 'B_player3'],
                   'favorite_food': [['pizza', 'sushi'],
                                     ['salad', 'sushi'],
                                     ['pizza', 'pasta', 'salad', 'taco'],
                                     ['taco', 'salad', 'sushi'],
                                     ['taco']]})
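
The question mentions that the data may also come in unnested form (one row per player and food). A possible variant for that layout, not part of the original answer, is to count how many players like each food and compare that with the team size. The names `long_df`, `n_players`, `per_food`, `n_fans`, and `is_common` are assumptions introduced here, and the long format is simulated with `explode`.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B'],
                   'player': ['A_player1', 'A_player2', 'B_player1',
                              'B_player2', 'B_player3'],
                   'favorite_food': [['pizza', 'sushi'],
                                     ['salad', 'sushi'],
                                     ['pizza', 'pasta', 'salad', 'taco'],
                                     ['taco', 'salad', 'sushi'],
                                     ['taco']]})

# Simulate the unnested layout: one row per (team, player, food)
long_df = df.explode('favorite_food')

# Number of distinct players on each team
n_players = long_df.groupby('team')['player'].nunique()

# Number of distinct players who like each food, per team
per_food = (long_df.groupby(['team', 'favorite_food'])['player']
            .nunique()
            .reset_index(name='n_fans'))

# A food is common when every player on the team likes it
per_food['is_common'] = per_food['n_fans'] == per_food['team'].map(n_players)

# Sum of the flags is the count; their mean is the share of distinct foods
result = (per_food.groupby('team')['is_common']
          .agg(**{'#_food_common': 'sum', 'percent_food_common': 'mean'})
          .reset_index())
print(result)
```

This avoids Python-level set operations inside lambdas, which may matter on larger data, at the cost of a couple of intermediate frames.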
