regex: Separate title and subtitle in a string by CamelCase in R

Posted by mjqavswn on 2023-01-31 in Other


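I have scraped a list of titles, some of which have a subtitle. Unfortunately, whenever there is a subtitle, it is glued directly onto the title (as if by paste0()). How can I separate the two in R? I am thinking of a regex, since the point where the subtitle starts looks like CamelCase (a lowercase letter immediately followed by an uppercase letter), as shown below: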

data <- data.frame(title = "Bilder aus dem LebenWie man Universalerbe wird")

result <- data.frame(title = "Bilder aus dem Leben",
                     subtitle = "Wie man Universalerbe wird")

regex

Source: https://stackoverflow.com/questions/75287416/separate-title-and-subtitle-in-a-string-by-camelcase-in-r

3 Answers


nbysray5 #1

A plain regex can look for a lowercase letter immediately followed by an uppercase letter and capture the text on either side:

strcapture("^(.+[a-z])([A-Z].+)", data$title, proto = list(title = "", subtitle = ""))
#                  title                   subtitle
# 1 Bilder aus dem Leben Wie man Universalerbe wird
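
Since only some of the scraped titles have a subtitle, it may help to know that strcapture() returns NA in every column for a string that does not match the pattern. A minimal sketch of one way to keep such titles; the second input string and the fallback step are illustrative assumptions, not part of the original answer:

```r
# Hypothetical input: the second title has no subtitle.
titles <- c("Bilder aus dem LebenWie man Universalerbe wird",
            "Bilder aus dem Leben")

parts <- strcapture("^(.+[a-z])([A-Z].+)", titles,
                    proto = list(title = "", subtitle = ""))

# Strings without a lowercase-uppercase boundary come back as NA in both
# columns, so restore the original string as the title for those rows.
no_match <- is.na(parts$title)
parts$title[no_match] <- titles[no_match]

parts
```

The subtitle stays NA for titles without one, which makes those rows easy to filter later.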

Answered 2023-01-31

ar7v8xwq #2

With tidyr's (new) separate_wider_regex:

library(tidyr)
separate_wider_regex(data, title, c(title = "^.+[a-z]", subtitle = "[A-Z].+"))

#  title                subtitle                                
#1 Bilder aus dem Leben Wie man Universalerbe wird

This is equivalent to the superseded extract:

extract(data, title, c("title", "subtitle"), "^(.+[a-z])([A-Z].+)")
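
Because `.+` is greedy, the pattern `^(.+[a-z])([A-Z].+)` splits at the last lowercase-uppercase boundary in the string. If a subtitle could itself contain such a boundary, a lazy quantifier (`.+?`) splits at the first one instead. A rough sketch in base R with strcapture(); the input string with its second boundary (in "McDonald") is made up for illustration:

```r
# Hypothetical title whose subtitle contains a second boundary ("cD").
x <- "Bilder aus dem LebenWie man McDonald wird"
proto <- list(title = "", subtitle = "")

# Greedy: the first group grabs as much as possible (last boundary).
greedy <- strcapture("^(.+[a-z])([A-Z].+)", x, proto = proto, perl = TRUE)

# Lazy: the first group grabs as little as possible (first boundary).
lazy <- strcapture("^(.+?[a-z])([A-Z].+)", x, proto = proto, perl = TRUE)

greedy$title  # "Bilder aus dem LebenWie man Mc"
lazy$title    # "Bilder aus dem Leben"
```

Which variant is appropriate depends on whether the titles or the subtitles are more likely to contain CamelCase words.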

Answered 2023-01-31

vcudknz3 #3

You can use separate from tidyr:

library(tidyverse)
data %>%
  separate(title, into = c("title", "subtitle"), sep = "(?<=[a-z])(?=[A-Z])")
#                  title                   subtitle
# 1 Bilder aus dem Leben Wie man Universalerbe wird

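Here sep uses two lookarounds to define the split point:

(?<=[a-z]): a positive lookbehind asserting that there must be a lowercase letter immediately to the left of the split point, and
(?=[A-Z]): a positive lookahead asserting that there must be an uppercase letter immediately to the right of the split point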
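
Both lookarounds are zero-width, so the pattern matches a position between two characters and no letter is consumed by the split. One way to see this is to insert a visible marker at every such position with base R's gsub(); the " | " marker is an arbitrary choice for illustration:

```r
x <- "Bilder aus dem LebenWie man Universalerbe wird"

# perl = TRUE is required because lookarounds are a PCRE feature.
marked <- gsub("(?<=[a-z])(?=[A-Z])", " | ", x, perl = TRUE)

marked
# [1] "Bilder aus dem Leben | Wie man Universalerbe wird"
```

Every original character is still present in the result; only the marker has been added at the boundary.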

Answered 2023-01-31
