AWS sentiment analysis gives different results on the same string (R)

zour9fqk, posted on 2022-12-20 in Other

I have a data frame containing the lyrics of a Christmas song, which looks roughly like this:

df1 <- data.frame(line = c("I don't want a lot for Christmas", 
                           "There is just one thing I need", 
                           "I don't care about the presents", 
                           "Underneath the Christmas tree", 
                           "I just want you for my own"))

I have also installed the R package aws.comprehend.
I then collapse the lyrics into one long string:

lyrics_df1 <- df1 %>% 
  iconv(., from = "UTF-8", to = 'ASCII//TRANSLIT') %>% 
  str_c(.,collapse = " ")

When I now run detect_sentiment(lyrics_df1), the output is:

  Index Sentiment        Mixed Negative   Neutral   Positive
1     0   NEUTRAL 0.0003775794 0.291473 0.6762416 0.03190778

However, if I run the same function on the lyrics typed in directly as a string, I get the following output:

detect_sentiment("I don't want a lot for Christmas
There is just one thing I need
I don't care about the presents underneath the Christmas tree
I just want you for my own")

  Index Sentiment     Mixed  Negative   Neutral  Positive
1     0   NEUTRAL 0.2951728 0.2238117 0.3551461 0.1258695

Now the output is completely different!
How can I make sure I get the same result as when I paste the full lyrics directly into detect_sentiment()?

unftdfkk (answer #1)

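With the first command you are passing the entire data frame, not the line column, into the pipeline. The data frame is coerced to a single deparsed string, which results in: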

df1 %>% 
  iconv(., from = "UTF-8", to = 'ASCII//TRANSLIT') %>% 
  str_c(.,collapse = " ")

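[1] "c(\"I don't want a lot for Christmas\", \"There is just one thing I need\", \"I don't care about the presents\", \"Underneath the Christmas tree\", \"I just want you for my own\")"

The added symbols (the c( ) wrapper, the quotation marks and the commas) are probably what causes the difference in scores. To apply the functions to the column itself, extract it first with pull: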

df1 %>% 
  pull(line) %>% 
  iconv(., from = "UTF-8", to = 'ASCII//TRANSLIT') %>% 
  str_c(.,collapse = " ")

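[1] "I don't want a lot for Christmas There is just one thing I need I don't care about the presents Underneath the Christmas tree I just want you for my own"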
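
The difference between the two inputs can be checked locally, without calling AWS. The following is a minimal base R sketch, assuming that iconv() falls back to as.character() for non-character input, as its documentation describes. The names whole_df and lyrics are illustrative only, and stringsAsFactors = FALSE is added here so that the result is the same on R versions before 4.0.

```r
# Base R only; no request is sent to AWS Comprehend here.
df1 <- data.frame(line = c("I don't want a lot for Christmas",
                           "There is just one thing I need",
                           "I don't care about the presents",
                           "Underneath the Christmas tree",
                           "I just want you for my own"),
                  stringsAsFactors = FALSE)

# Coercing the whole data frame deparses each column into one string,
# so the text sent for scoring carries the c("...", "...") wrapper.
whole_df <- as.character(df1)
length(whole_df)                  # one element per column
startsWith(whole_df, "c(\"")      # TRUE if the wrapper is present

# Extracting the column first keeps the five lines as plain text.
lyrics <- paste(df1$line, collapse = " ")
grepl("c(", lyrics, fixed = TRUE) # FALSE if no wrapper is present
```

Note that the string typed directly in the question separates the lines with newlines instead of spaces and joins "presents underneath" with a lowercase "u", so even the cleaned string is not character-for-character identical to it. The scores may therefore still differ slightly.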
