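I need to find the rows that contain the term info anywhere in the text, in any of these forms:
1. with no characters directly before or after it
2. followed by a dot or any other special character
3. followed by one or more digits
Here is a snapshot of the data that may help: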
df_new <- data.frame(
text=c('info is given','he is given info. in the class',
'she needs info2','why not having information',
'his info# missing', 'info12 and packages are given',
'parainfo is ready','info. was awarded',
'meeting is with .info'))
> df_new
text
1 info is given
2 he is given info. in the class
3 she needs info2
4 why not having information
5 his info# missing
6 info12 and packages are given
7 parainfo is ready
8 info. was awarded
9 meeting is with .info
I am using this code, but it does not capture everything I need:
library(dplyr)
library(stringr)
df_new %>%
  mutate(text = tolower(text)) %>%
  mutate(string_detected = as.integer(str_detect(text, "(^|\\s)info(\\s|$)")))
So the desired result is:
text                            string_detected
info is given                   1
he is given info. in the class  1
she needs info2                 1
why not having information      0
his info# missing               1
info12 and packages are given   1
parainfo is ready               0
info. was awarded               1
meeting is with .info           0
Thank you very much!
1 Answer
daolsyd01#
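The following regular expression should work:
(^| )info([\W\d]|$)
Note that \W does not match _, so if you want to accept info_ as well, use this instead:
(^| )info([\W\d_]|$)
When you write the pattern as an R string, double each backslash, for example "(^| )info([\\W\\d]|$)". You can test your regular expressions at http://regex101.com.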
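As a rough check, here is a minimal sketch that applies the pattern to the sample data from the question. It uses base R's grepl with perl = TRUE rather than str_detect, purely so that it runs without extra packages; that substitution is an assumption of this sketch, not part of the answer. The names pattern and pattern_underscore are made up for illustration.

```r
# Sample data from the question
df_new <- data.frame(
  text = c("info is given", "he is given info. in the class",
           "she needs info2", "why not having information",
           "his info# missing", "info12 and packages are given",
           "parainfo is ready", "info. was awarded",
           "meeting is with .info"),
  stringsAsFactors = FALSE
)

# "info" must start the string or follow a space, and must be followed by
# a non-word character, a digit, or the end of the string.
# Backslashes are doubled because this is an R string literal.
pattern <- "(^| )info([\\W\\d]|$)"

# perl = TRUE selects PCRE, which accepts \W and \d inside a character class
df_new$string_detected <- as.integer(
  grepl(pattern, tolower(df_new$text), perl = TRUE)
)

print(df_new)

# Variant that also accepts an underscore right after "info"
pattern_underscore <- "(^| )info([\\W\\d_]|$)"
grepl(pattern, "info_ is given", perl = TRUE)             # FALSE
grepl(pattern_underscore, "info_ is given", perl = TRUE)  # TRUE
```

On the sample data this yields 1, 1, 1, 0, 1, 1, 0, 1, 0, which matches the desired result in the question. The same pattern string can be passed to str_detect inside the original mutate call if you prefer to stay with stringr.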