I have a data frame containing the lyrics of a Christmas song, roughly like this:
df1 <- data.frame(line = c("I don't want a lot for Christmas",
                           "There is just one thing I need",
                           "I don't care about the presents",
                           "Underneath the Christmas tree",
                           "I just want you for my own"))
I have also installed the R package aws.comprehend.
Then I turn the lyrics into one long string:
library(dplyr)    # %>%
library(stringr)  # str_c()
lyrics_df1 <- df1 %>%
  iconv(., from = "UTF-8", to = "ASCII//TRANSLIT") %>%
  str_c(., collapse = " ")
When I now run detect_sentiment(lyrics_df1), the output is:
  Index Sentiment        Mixed Negative   Neutral   Positive
1     0   NEUTRAL 0.0003775794 0.291473 0.6762416 0.03190778
However, if I run the same function on the lyrics typed in directly as a string, I get this output:
detect_sentiment("I don't want a lot for Christmas
There is just one thing I need
I don't care about the presents underneath the Christmas tree
I just want you for my own")
  Index Sentiment     Mixed  Negative   Neutral  Positive
1     0   NEUTRAL 0.2951728 0.2238117 0.3551461 0.1258695
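Now the output is completely different!
How can I make sure I get the same result as when I paste the whole lyrics directly into the detect_sentiment() function?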
1 answer
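When you use the first command, you pass the whole data frame to the function. iconv() coerces it with as.character(), which deparses the column into a single string:
[1] "c(\"I don't want a lot for Christmas\", \"There is just one thing I need\", \"I don't care about the presents\", \"Underneath the Christmas tree\", \"I just want you for my own\")"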
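The coercion can be reproduced locally, without calling Amazon Comprehend, with a minimal base-R sketch. The stringsAsFactors = FALSE argument is an addition not in the question; it only makes the example behave the same on R versions before 4.0, where string columns would otherwise become factors.

```r
# Same example data as in the question (stringsAsFactors = FALSE added for old R)
df1 <- data.frame(line = c("I don't want a lot for Christmas",
                           "There is just one thing I need",
                           "I don't care about the presents",
                           "Underneath the Christmas tree",
                           "I just want you for my own"),
                  stringsAsFactors = FALSE)

# iconv() receives a data frame, not a character vector, so it calls
# as.character() on it; that deparses the whole column into ONE string
# that starts with c( and contains the quotes and commas of R syntax
broken <- iconv(df1, from = "UTF-8", to = "ASCII//TRANSLIT")
print(broken)
```

This string, with the extra c(, quote and comma characters, is what ends up being sent to detect_sentiment() in the first command.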
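The extra characters this adds (the c(, the quotes and the commas) are probably what cause the difference in scores. To apply the function to the contents of the column directly, extract it first with pull(), which gives: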
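[1] "I don't want a lot for Christmas There is just one thing I need I don't care about the presents Underneath the Christmas tree I just want you for my own"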
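A minimal sketch of the fix in base R, so it runs without any packages: df1[["line"]] is used here as the base-R equivalent of dplyr's pull(df1, line), and paste(collapse = " ") stands in for str_c(collapse = " "). Both return the column as a plain character vector and then join it, so iconv() no longer sees a data frame.

```r
df1 <- data.frame(line = c("I don't want a lot for Christmas",
                           "There is just one thing I need",
                           "I don't care about the presents",
                           "Underneath the Christmas tree",
                           "I just want you for my own"),
                  stringsAsFactors = FALSE)

# Extract the column as a character vector (like pull(df1, line))
lines <- df1[["line"]]

# Transliterate each line, then join them with single spaces
lyrics <- paste(iconv(lines, from = "UTF-8", to = "ASCII//TRANSLIT"),
                collapse = " ")
print(lyrics)
```

With dplyr and stringr loaded, the same idea in the question's pipeline style would be df1 %>% pull(line) %>% iconv(from = "UTF-8", to = "ASCII//TRANSLIT") %>% str_c(collapse = " "). The resulting string can then be passed to detect_sentiment(). Note that the hand-typed string in the question also contains line breaks and a lowercase "underneath", so its scores are not guaranteed to match exactly.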