Pandas: trying to split a string in a column on "|", then list all the strings with duplicates removed

Posted by bpzcxfmw on 2022-11-27 in Other


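I am building a DataFrame for a made-up TV show. It has the columns "Season", "Episode Title", "About", "Ratings", "Votes", "Viewership", "Duration", "Date", "Guest Stars", "Director", and "Writers", with the rows indexed by ascending integers.
My problem involves two of these columns: 'Writers' and 'Viewership'. In the Writers column, some rows list several writers separated by " | ". In the Viewership column, every row holds a float between 1 and 23 with at most 2 decimal places.
Below is a condensed example of the DataFrame I am working with. I am trying to split up the 'Writers' column and then work out the overall average viewership for each writer: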

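import pandas as pd

# Note: as posted, 'Writers' has 8 entries but 'Viewership' has only 7.
# pandas needs both lists to be the same length, so one viewership value is missing here.
df = pd.DataFrame({'Writers': ['John Doe', 'Jennifer Hopkins | John Doe', 'Ginny Alvera', 'Binny Glasglow | Jennifer Hopkins', 'Jennifer Hopkins', 'Sam Write', 'Lawrence Fieldings | Ginny Alvera | John Doe', 'John Doe'],
                   'Viewership': [3.4, 5.26, 22.82, 13.5, 4.45, 7.44, 9]})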

The solution I came up with for splitting the strings in the column is:

df["Writers"]= df["Writers"].str.split('|', expand=False)

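This does split the strings, but in some cases it leaves whitespace before and after the commas in the resulting lists. I need to remove that whitespace, and then I need a list of all the writers in which each writer appears only once.
Second, for each writer I would like either a column stating their overall average viewership, or a list with one entry per writer giving their average viewership across all the episodes they worked on:
["John Doe : 15" , "Jennifer Hopkins : 7.54" , "Lawrence Fieldings : 3.7"]
This is my first post here, and I would really appreciate any help!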

pandas

Source: https://stackoverflow.com/questions/74554643/pandas-trying-to-split-a-string-in-a-column-with-and-then-list-all-strings

1 Answer


hgc7kmma1#

# I believe in newer versions of pandas (0.25 and later) you can split cells into multiple rows like this.
# Reference: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html#series-explode-to-split-list-like-values-to-rows

df2 = df.assign(Writers=df.Writers.str.split('|')).explode('Writers').reset_index(drop=True)

# To remove the whitespace, strip the beginning and end of every cell in that column.
df2['Writers'] = df2['Writers'].str.strip()

# To collapse the duplicates, do a groupby on the column (the name is case-sensitive: 'Writers').
# This combines (sums) the duplicate rows; any other aggregation function works as well,
# e.g. replace sum() with mean() to get each writer's average.
df2.groupby(['Writers']).sum()
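
Putting the steps together, here is a minimal end-to-end sketch of the approach above, using mean() because the question asks for averages. The viewership numbers and the variable names unique_writers, avg_viewership and summary are made up for illustration; they are not from the original post.

```python
import pandas as pd

# Illustrative data only: writer names follow the question,
# but the viewership values are invented so the lists have equal length.
df = pd.DataFrame({
    "Writers": [
        "John Doe",
        "Jennifer Hopkins | John Doe",
        "Ginny Alvera",
        "Binny Glasglow | Jennifer Hopkins",
    ],
    "Viewership": [4.0, 6.0, 10.0, 8.0],
})

# One row per (episode, writer): split on "|" and explode the resulting lists.
df2 = (
    df.assign(Writers=df["Writers"].str.split("|"))
      .explode("Writers")
      .reset_index(drop=True)
)

# Remove the whitespace left around each name by the split.
df2["Writers"] = df2["Writers"].str.strip()

# Every writer exactly once, in order of first appearance.
unique_writers = df2["Writers"].unique().tolist()

# Average viewership over all episodes each writer is credited on.
avg_viewership = df2.groupby("Writers")["Viewership"].mean()

# Optional: the "Name : value" list format shown in the question.
summary = [f"{name} : {value:.2f}" for name, value in avg_viewership.items()]
```

Note that groupby sorts the writers alphabetically, so summary follows that order, while unique_writers keeps the order in which the names first appear. An episode with several writers counts once toward each writer's average.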
