R: see the list of words in a dictionary

6qftjkof posted on 2023-07-31 in Other


I want to see the words in a dictionary. This is my dictionary:

Name               Type                             Value
dict_lg            list [2] (quanteda::dictionary2) List of length 2
   NEGATIVE        character [2867]                 'à côrnes' 'à court de personnel'
   POSITIVE        list [1] (quanteda::dictionary2) List of length 1 
      VÉRITÉ* (1)) character [0]

I would like to see the words contained in each list (NEGATIVE, POSITIVE). If I do this:

library("quanteda")
dict_lg <- dictionary(file = "frlsd/frlsd.cat", encoding = "UTF-8")
dict_lg$NEGATIVE

it prints the list of negative words, but if I do this:

dict_lg$POSITIVE

I get:

Dictionary object with 1 key entry.
- [VÉRITÉ* (1))]:

And if I do this:

dict_lg[["POSITIVE"]][["VÉRITÉ* (1))"]]

I get:

character(0)

How can I see the list of positive words? The original dictionary file is available here: https://www.poltext.org/fr/donnees-et-analyses/lexicoder

Source: https://stackoverflow.com/questions/76738262/r-see-list-of-words-in-dictionary

2 Answers


u5rb5r591#

You can inspect the list structure of the dictionary like this:

rapply(dict_lg, f = \(i) i, how = 'list') |> str()

...which shows that the structure got garbled (either when the cat file was generated or when it was imported):

List of 2
 $ NEGATIVE:List of 1
  ..$ : chr [1:2867] "à côrnes" "à court de personnel " "à l'étroit" "à peine*" ...
 $ POSITIVE:List of 2
  ..$ VÉRITÉ* (1)):List of 1
  .. ..$ : chr(0) 
  ..$             : chr [1:1283] "à l'épreuve*" "à la mode" "abondamment" "abondance" ...

...however, you can extract all terms from the list item 'POSITIVE' like this:

rapply(dict_lg, f = \(i) i, how = 'list')$POSITIVE
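
As a rough sketch of how the nested result could be flattened into a plain character vector, the snippet below uses a small hand-built list in place of `dict_lg`. The name `mock` and its contents are assumptions that only mimic the nesting printed by `str()`, not the real dictionary.

```r
# Hypothetical stand-in for the malformed dictionary: same nesting as the
# str() output, but with only two terms per key.
mock <- list(
  NEGATIVE = list(c("à côrnes", "abandon*")),
  POSITIVE = list(
    "VÉRITÉ* (1))" = list(character(0)),  # the empty, broken sub-key
    c("à la mode", "abondance")           # the terms that ended up unnamed
  )
)

# Number of terms in each leaf; a 0 points at the broken entry.
leaf_lengths <- rapply(mock, length, how = "unlist")
print(leaf_lengths)

# Flatten everything under POSITIVE into one character vector.
# The empty leaf contributes nothing, so only the real terms remain.
positive_terms <- unlist(mock$POSITIVE, use.names = FALSE)
print(positive_terms)
```

With the real object, the same `unlist()` call would presumably be applied to the `$POSITIVE` element of the `rapply()` result.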

Edit: convert the dictionary into a data frame of terms and sentiments, e.g. to keep only the terms with negative sentiment:

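library(dplyr)

# Flatten the dictionary into a named character vector, then build a
# data frame whose sentiment column is derived from the element names.
rapply(dict_lg, f = \(i) i, how = 'unlist') %>%
  data.frame(term = .,
             sentiment = gsub('(POSITIVE|NEGATIVE).*', '\\1', names(.))
             ) %>%
  filter(sentiment == 'NEGATIVE')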

                           term sentiment
NEGATIVE1              à côrnes  NEGATIVE
NEGATIVE2 à court de personnel   NEGATIVE
NEGATIVE3            à l'étroit  NEGATIVE
NEGATIVE4              à peine*  NEGATIVE
NEGATIVE5                abais*  NEGATIVE
NEGATIVE6              abandon*  NEGATIVE
## truncated
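
For readers who prefer to avoid the dplyr dependency, a base R version of the same idea might look like the sketch below. It runs on a hypothetical `mock` list that imitates the nesting of the malformed dictionary; `mock`, `terms_by_key`, and `df` are illustrative names, not part of the original answer.

```r
# Hypothetical stand-in with the same nesting as the malformed dictionary.
mock <- list(
  NEGATIVE = list(c("à côrnes", "abandon*")),
  POSITIVE = list(
    "VÉRITÉ* (1))" = list(character(0)),
    c("à la mode", "abondance")
  )
)

# One flat character vector of terms per top-level key.
terms_by_key <- lapply(mock, unlist, use.names = FALSE)

# Long-format data frame: one row per term, labelled with its key.
df <- data.frame(
  term      = unlist(terms_by_key, use.names = FALSE),
  sentiment = rep(names(terms_by_key), lengths(terms_by_key))
)

# Keep only the negative terms.
negative <- subset(df, sentiment == "NEGATIVE")
print(negative)
```

Because the sentiment label comes from the top-level key rather than from the element names, no regular expression is needed here.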


uxh89sit2#

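The problem here is with the file you reference at https://www.poltext.org/fr/donnees-et-analyses/lexicoder. It has an extra ")" in the value "VÉRITÉ" under the key "POSITIVE". Remove it and the dictionary will work normally.
I removed the extra ")", then loaded the edited file, and it worked fine.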

library("quanteda")
#> Package version: 3.3.1
#> Unicode version: 14.0
#> ICU version: 71.1
#> Parallel computing: 10 of 10 threads used.
#> See https://quanteda.io for tutorials and examples.
dict <- dictionary(file = "~/Downloads/frlsd_edited.cat")

print(dict, max_nval = 6)
#> Dictionary object with 2 key entries.
#> - [NEGATIVE]:
#>   - à côrnes, à court de personnel , à l'étroit, à peine*, abais*, abandon* [ ... and 2,861 more ]
#> - [POSITIVE]:
#>   - à l'épreuve*, à la mode, abondamment, abondance, abondant*, abonde* [ ... and 1,278 more ]

head(dict$POSITIVE)
#> [1] "à l'épreuve*" "à la mode"    "abondamment"  "abondance"    "abondant*"   
#> [6] "abonde*"

head(dict$NEGATIVE)
#> [1] "à côrnes"              "à court de personnel " "à l'étroit"           
#> [4] "à peine*"              "abais*"                "abandon*"

Created on 2023-07-24 with reprex v2.0.2
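
If editing the file by hand is inconvenient, the correction could also be scripted. The following is only a sketch: `fix_double_paren()` is a hypothetical helper, and it assumes the defect is a doubled ")" at the very end of a line. The demo file is made up for illustration and is not the real frlsd.cat layout.

```r
# Hypothetical helper: collapse a doubled ")" at the end of any line into one.
fix_double_paren <- function(in_path, out_path) {
  lines <- readLines(in_path, encoding = "UTF-8", warn = FALSE)
  fixed <- sub("\\)\\)\\s*$", ")", lines)
  writeLines(fixed, out_path, useBytes = TRUE)
  invisible(fixed)
}

# Demo on a tiny made-up file (assumed layout, for illustration only).
src <- tempfile(fileext = ".cat")
dst <- tempfile(fileext = ".cat")
writeLines(c("POSITIVE", "\tVÉRITÉ* (1))", "\tabondance (1)"),
           src, useBytes = TRUE)

fixed <- fix_double_paren(src, dst)
print(fixed)
```

The corrected copy at `dst` could then be passed to `dictionary(file = ...)` in the same way as the manually edited file.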

