A pandas DataFrame has a column whose entries are lists: how do I group by it with groupby()?

h7wcgrx3  posted on 2023-01-01  in Other
Find the top 3 producers whose movies have the highest average ROI (return on investment).

Description: I am given the following table.

import pandas as pd
import numpy as np
        
table = pd.DataFrame({
    'Movie_title': ['Hot Tub Time Machine 2', 'The Princess Diaries 2: Royal Engagement', 'Whiplash', 'Kahaani', '마린보이'],
    'Producers': [['Andrew Panay', 'Jason Blum'], ['Whitney Houston', 'Mario Iscovich', 'Michel Litvak'], ['David Lancaster', 'Michel Litvak', 'Jason Blum', 'Helen Estabrook'], ['Sujoy Ghosh'], []],
    'Directors': [['Steve Pink'], ['Garry Marshall'], ['Damien Chazelle'], ['Sujoy Ghosh'], ['Jong-seok Yoon']],
    'ROI': [-12.038207142857143, 137.8735875, 296.72727272727275, 1233.3333333333333, -76.14607902735563],
})

[Image: what the DataFrame table looks like]
I want to apply the .groupby() method on the "Producers" column and then use the .mean() method on the ROI column:

table.groupby('Producers')[['Movie Title','ROI','Directors']].mean('ROI')
But it throws an error (see the image below).

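[Image: last line of the error]
Please refer to the images to see the error. I don't know how to add Jupyter notebook code output and a pandas DataFrame to a post, so I have provided images of the Jupyter notebook code cells. Please help me solve this problem.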

Answer #1 (0mkxixxg)

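You can use the pandas explode function on the Producers column: for each element of the list in each row, you get a new row (with the index and the other columns' data duplicated).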

table.explode("Producers")
                                Movie_title        Producers          Directors          ROI
0                    Hot Tub Time Machine 2     Andrew Panay       [Steve Pink]   -12.038207
0                    Hot Tub Time Machine 2       Jason Blum       [Steve Pink]   -12.038207
1  The Princess Diaries 2: Royal Engagement  Whitney Houston   [Garry Marshall]   137.873588
1  The Princess Diaries 2: Royal Engagement   Mario Iscovich   [Garry Marshall]   137.873588
1  The Princess Diaries 2: Royal Engagement    Michel Litvak   [Garry Marshall]   137.873588
2                                  Whiplash  David Lancaster  [Damien Chazelle]   296.727273
2                                  Whiplash    Michel Litvak  [Damien Chazelle]   296.727273
2                                  Whiplash       Jason Blum  [Damien Chazelle]   296.727273
2                                  Whiplash  Helen Estabrook  [Damien Chazelle]   296.727273
3                                   Kahaani      Sujoy Ghosh      [Sujoy Ghosh]  1233.333333
4                                      마린보이              NaN   [Jong-seok Yoon]   -76.146079
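
As a side note on how explode treats the index and empty lists, here is a minimal sketch on a hypothetical toy frame (the names toy, title, people and score are made up for illustration and are not part of the question's data). It assumes a pandas version that supports the ignore_index parameter of explode.

```python
import pandas as pd

# Hypothetical toy frame, not the table from the question
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "people": [["x", "y"], [], ["z"]],
    "score": [1.0, 2.0, 3.0],
})

exploded = toy.explode("people")

# Rows that came from the same list keep the same original index label
print(exploded.index.tolist())

# An empty list becomes a single row whose value is NaN
print(exploded["people"].isna().tolist())

# ignore_index=True renumbers the rows 0..n-1 instead of repeating labels
renumbered = toy.explode("people", ignore_index=True)
print(renumbered.index.tolist())
```

This mirrors the output above: index labels repeat for movies with several producers, and the movie with an empty producer list ends up with NaN.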

Then you can apply your groupby mean (and chain it with the explode to get a one-line solution):

table.explode("Producers").groupby("Producers").mean("ROI")

Final result:

                         ROI
Producers                   
Andrew Panay      -12.038207
David Lancaster   296.727273
Helen Estabrook   296.727273
Jason Blum        142.344533
Mario Iscovich    137.873588
Michel Litvak     217.300430
Sujoy Ghosh      1233.333333
Whitney Houston   137.873588
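
To get from these per-producer means to the top 3 producers the question asks for, one possible sketch is shown below. It selects the ROI column explicitly before calling mean() and then uses Series.nlargest; both are my own choices rather than something the answer prescribes, and the variable names mean_roi and top3 are illustrative. The Directors column is left out because it is not needed here.

```python
import pandas as pd

# Same sample data as in the question, minus the Directors column
table = pd.DataFrame({
    "Movie_title": ["Hot Tub Time Machine 2",
                    "The Princess Diaries 2: Royal Engagement",
                    "Whiplash", "Kahaani", "마린보이"],
    "Producers": [["Andrew Panay", "Jason Blum"],
                  ["Whitney Houston", "Mario Iscovich", "Michel Litvak"],
                  ["David Lancaster", "Michel Litvak", "Jason Blum",
                   "Helen Estabrook"],
                  ["Sujoy Ghosh"],
                  []],
    "ROI": [-12.038207142857143, 137.8735875, 296.72727272727275,
            1233.3333333333333, -76.14607902735563],
})

mean_roi = (
    table.explode("Producers")         # one row per (movie, producer) pair
         .groupby("Producers")["ROI"]  # NaN producers are dropped by default
         .mean()
)

# Keep the three producers with the highest average ROI
top3 = mean_roi.nlargest(3)
print(top3)
```

With the sample data this should pick Sujoy Ghosh first, followed by David Lancaster and Helen Estabrook, who tie because they share a single movie. The movie with an empty producer list does not contribute, since its NaN producer is excluded from the groups.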
