I have a Pandas DataFrame containing reviews. For each review, I have its individual words (tokens) and a score for each one, as shown below:
import pandas as pd

df = pd.DataFrame({
    "review_num": [2, 2, 2, 1, 1, 1, 1, 1, 3, 3],
    "review": ["The second review", "The second review", "The second review",
               "This is the first review", "This is the first review",
               "This is the first review", "This is the first review",
               "This is the first review", "Not Noo", "Not Noo"],
    "token_num": [1, 2, 3, 1, 2, 3, 4, 5, 1, 2],
    "token": ["The", "second", "review", "This", "is", "the", "first", "review", "Not", "Noo"],
    "score": [0.3, -0.6, 0.4, 0.5, 0.6, 0.7, -0.6, 0.4, 0.5, 0.6]
})
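With the code below, I can modify each review by applying a transformation function to its highest-scoring word, and create a new DataFrame containing both the old and the new review.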
# Identify the row with the max score for each review
token_max_score = df.groupby("review_num", sort=False)["score"].idxmax()

# Keep only the rows with the max score per review
Modified_df = df.loc[token_max_score, ["review_num", "review"]]

def modify_word(w):
    return w + "E"  # just to simplify the example

# Add the new column
Modified_df = Modified_df.join(
    pd.DataFrame(
        {
            "Modified_review": [
                txt.replace(w, modify_word(w))
                for w, txt in zip(
                    df.loc[token_max_score, "token"], df.loc[token_max_score, "review"]
                )
            ]
        },
        index=token_max_score,
    )
)
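I need to apply the transformation function to the n highest-scoring words of each review, rather than only to the single highest-scoring one (as my code does now).
The current modified DataFrame is: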
   review_num                    review            Modified_review
2           2         The second review         The second reviewE
5           1  This is the first review  This is theE first review
9           3                   Not Noo                   Not NooE
The expected modified DataFrame for n = 2 is:
   review_num                    review             Modified_review
2           2         The second review         TheE second reviewE
5           1  This is the first review  This isE theE first review
9           3                   Not Noo                   NotE NooE
Thanks for your help.
1 Answer
Here is one way to do it with Pandas apply:
Then:
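The answer's code does not appear above, so what follows is only a minimal sketch of one way to combine groupby and apply for this task, not necessarily the answerer's method. The helper name modify_top_n and the variables reviews, modified and result are hypothetical. The sketch assumes each review is exactly its tokens joined by single spaces (true for the sample data), so it rebuilds the review from the tokens instead of calling str.replace as the question's code does.

```python
import pandas as pd

df = pd.DataFrame({
    "review_num": [2, 2, 2, 1, 1, 1, 1, 1, 3, 3],
    "review": ["The second review"] * 3
              + ["This is the first review"] * 5
              + ["Not Noo"] * 2,
    "token_num": [1, 2, 3, 1, 2, 3, 4, 5, 1, 2],
    "token": ["The", "second", "review", "This", "is", "the",
              "first", "review", "Not", "Noo"],
    "score": [0.3, -0.6, 0.4, 0.5, 0.6, 0.7, -0.6, 0.4, 0.5, 0.6],
})

def modify_word(w):
    return w + "E"  # same toy transformation as in the question

def modify_top_n(group, n):
    # Hypothetical helper: transform the n highest-scoring tokens of one
    # review and rebuild the text (assumes tokens are joined by one space).
    top = set(group["score"].nlargest(n).index)
    ordered = group.sort_values("token_num")
    words = [
        modify_word(tok) if idx in top else tok
        for idx, tok in zip(ordered.index, ordered["token"])
    ]
    return " ".join(words)

n = 2

# One original review text per review_num
reviews = df.drop_duplicates("review_num").set_index("review_num")["review"]

# Apply the helper to each review's rows; extra keyword args go to the helper
modified = (
    df.groupby("review_num", sort=False)[["token_num", "token", "score"]]
      .apply(modify_top_n, n=n)
)

result = pd.DataFrame(
    {"review": reviews, "Modified_review": modified}
).reset_index()
print(result)
```

Rebuilding from tokens also avoids a pitfall of str.replace, which matches substrings: replacing "is" inside "This is the first review" would alter "This" as well. If the real reviews contain punctuation or irregular spacing, the join step would need to be adapted.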