How to extract text within brackets in an Excel .CSV file in R?

Posted by xnifntxz on 2023-02-10 in Other


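I have an Excel .CSV file in which one column holds the transcription of a conversation. Whenever the speaker uses Spanish, the Spanish is written within brackets.
so [usualmente] maybe [me levanto como a las nueve y media] like I exercise and the I like either go to class online or in person like it depends on the day
Ideally, I would like to extract the English and the Spanish separately, so that one file contains all the Spanish words and another contains all the English words.
Any ideas? Or which function/package should I use?
Edited to add: about 100 cells in this Excel sheet contain text. I think where I'm confused is how to treat the entire CSV as a "string"?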

r

Source: https://stackoverflow.com/questions/75367432/how-to-extract-text-within-brackets-in-excel-csv-file-in-r

1 Answer

0x6upsns1#

You can do this by calling seq through Vectorize to build the word indices, then using stringr::word to extract the whole words at those indices:
Sample string:

strng <- "so [usualmente] maybe [me levanto como a las nueve y media] like I exercise and the I like either go to class online or in person like it depends on the day"

Code

strng <- "so [usualmente] maybe [me levanto como a las nueve y media] like I exercise and the I like either go to class online or in person like it depends on the day"

# seq() vectorised over its start and end arguments
vecSeq <- Vectorize(seq.default, vectorize.args = c("to", "from"))

# Split into space-separated words once
words <- unlist(strsplit(strng, " "))
# Positions of the words that open and close a bracket
ixstart <- grep("\\[", words)
ixend <- grep("\\]", words)
# All word positions inside brackets; everything else is English
spanish_ix <- unlist(vecSeq(ixstart, ixend, 1))
english_ix <- setdiff(seq_along(words), spanish_ix)

spanish <- paste(stringr::word(strng, spanish_ix), collapse = " ")
english <- paste(stringr::word(strng, english_ix), collapse = " ")

#> spanish
#[1] "[usualmente] [me levanto como a las nueve y media]"
#> english
#[1] "so maybe like I exercise and the I like either go to class online or in person like it depends on the day"
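
To make the indexing step concrete, here is a small sketch of the intermediate values for a shortened version of the sample string. It uses base R's Map in place of the Vectorize wrapper (a swap made here for brevity, not part of the answer), and the names tokens and spans are illustrative only:

```r
strng <- "so [usualmente] maybe [me levanto como a las nueve y media] like I exercise"

# Space-separated tokens; a bracket stays attached to its word
tokens <- unlist(strsplit(strng, " "))

ixstart <- grep("\\[", tokens)  # tokens that open a bracket
ixend <- grep("\\]", tokens)    # tokens that close a bracket

# One run of word positions per bracket pair (assumes balanced, non-nested brackets)
spans <- Map(seq, ixstart, ixend)
spanish_ix <- unlist(spans)

print(ixstart)     # 2 4
print(ixend)       # 2 11
print(spanish_ix)  # 2 4 5 6 7 8 9 10 11
```

A single bracketed word such as [usualmente] is both a start and an end token, which is why position 2 appears in both index vectors.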

Note: to remove the pesky brackets, just do: spanish <- gsub("\\]|\\[", "", spanish)
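
On the follow-up about the roughly 100 cells: a CSV column read into R is already a character vector with one element per cell, so the file does not need to be treated as a single string. Below is a minimal sketch that uses base R regular expressions rather than word indices, which is a different technique from the answer above. The file name transcripts.csv, the column name text, and the helper split_languages are all hypothetical, and the demo writes its own small file to a temporary directory so that it runs as-is:

```r
# Hypothetical helper: split each cell into bracketed (Spanish) and
# unbracketed (English) text. Assumes brackets are balanced, not nested,
# and that there are no missing (NA) cells.
split_languages <- function(x) {
  pattern <- "\\[[^]]*\\]"
  # All bracketed chunks per cell, brackets stripped, joined by spaces
  spanish <- vapply(
    regmatches(x, gregexpr(pattern, x)),
    function(m) paste(gsub("\\[|\\]", "", m), collapse = " "),
    character(1)
  )
  # Everything outside the brackets, with leftover whitespace squeezed
  english <- trimws(gsub("\\s+", " ", gsub(pattern, " ", x)))
  data.frame(spanish = spanish, english = english, stringsAsFactors = FALSE)
}

# Demo data standing in for the real file (hypothetical name and column)
csv_path <- file.path(tempdir(), "transcripts.csv")
write.csv(
  data.frame(text = c("so [usualmente] maybe [me levanto] like I exercise",
                      "it depends on the day")),
  csv_path, row.names = FALSE
)

cells <- read.csv(csv_path, stringsAsFactors = FALSE)
result <- split_languages(cells$text)

# One output file per language, one line per original cell
writeLines(result$spanish, file.path(tempdir(), "spanish.txt"))
writeLines(result$english, file.path(tempdir(), "english.txt"))
print(result)
```

Cells without any bracketed text produce an empty string in the Spanish output, so both files keep the same line count as the original column.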

