Regex to match a word only when it stands alone or is followed by special characters/digits/a dot, in dplyr

bxfogqkk  posted on 2023-04-07  in Other

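I need to find the rows that contain the term info anywhere in the text, in any of these forms:
1. with no characters directly before or after it
2. followed by a dot or any other special character
3. followed by one or more digits

Here is a snapshot of the data that may help: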

df_new <- data.frame(
  text=c('info is given','he is given info. in the class',
               'she needs info2','why not having information',
               'his info# missing', 'info12 and packages are given',
               'parainfo is ready','info. was awarded',
               'meeting is with .info'))
> df_new
                            text
1                  info is given
2 he is given info. in the class
3                she needs info2
4     why not having information
5              his info# missing
6  info12 and packages are given
7              parainfo is ready
8              info. was awarded
9           meeting is with .info

I am using this code, but it does not capture everything I need:

library(dplyr)    # %>% and mutate()
library(stringr)  # str_detect()

df_new %>%
  mutate(text = tolower(text)) %>%
  mutate(string_detected = as.integer(str_detect(text, "(^|\\s)info(\\s|$)")))

The result I am after is:

                            text string_detected
                   info is given               1
  he is given info. in the class               1
                 she needs info2               1
      why not having information               0
               his info# missing               1
   info12 and packages are given               1
               parainfo is ready               0
               info. was awarded               1
           meeting is with .info               0

Many thanks!

daolsyd0  #1

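The following regex should work: (^| )info([\W\d]|$). Note that \W does not match _ (the underscore counts as a word character), so if you want info_ to be accepted you should use (^| )info([\W\d_]|$) instead.
You can test your regex at http://regex101.com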
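As a rough sketch of how the suggested pattern could be plugged into the pipeline from the question (assuming dplyr and stringr are installed; the variable names `pattern` and `result` are only illustrative), note that each backslash has to be doubled inside an R string literal:

```r
library(dplyr)
library(stringr)

df_new <- data.frame(
  text = c('info is given', 'he is given info. in the class',
           'she needs info2', 'why not having information',
           'his info# missing', 'info12 and packages are given',
           'parainfo is ready', 'info. was awarded',
           'meeting is with .info'))

# "info" at the start of the string or after a space, then either a
# non-word character, a digit, or the end of the string.
pattern <- "(^| )info([\\W\\d]|$)"

result <- df_new %>%
  mutate(text = tolower(text)) %>%
  mutate(string_detected = as.integer(str_detect(text, pattern)))

result$string_detected
# 1 1 1 0 1 1 0 1 0

# The underscore variant from the answer also accepts "info_":
str_detect("info_ here", pattern)                    # FALSE
str_detect("info_ here", "(^| )info([\\W\\d_]|$)")   # TRUE
```

With the sample data this gives the 0/1 column requested in the question, including 0 for ".info", because the pattern only allows the start of the string or a space before info.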
