regex - Regular expression: how to find and replace a substring and any characters before it, up to a whitespace?

jjhzyzn0 posted on 2022-11-18 in Other

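I am trying to find time patterns that contain any form of am/pm, and I want to replace the whole pattern with --.
My idea is to find the strings containing am/pm, which may or may not have dots (.) before, between or after the letters, and then extract them together with any numeric pattern in front of them, stopping when I reach a whitespace.
Here is the original data, t0: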

t0 <- c("29th October 2022 5-6pm", "12-1pm 02/11/22", "10:25 bike rack at bexley college erith", "November 2nd 2022, apm shop ", " between 7pm Thursday 27th October to Saturday 29th October 9am", "04/09/2022 at 4 a.m.", "4/09/2022 at 4.a.m.", "04/09/2022 at 4.a.m" , "28.10.22 between 1.30pm and midnight", " Sunday 30th October 2022 between 11am and 3pm", "30th October, approx 6pm", "03/11/2022", "02/11/22 at campus", "Between 15:15 and 21:10", "03/11/2022 7pm", " Between 5:30pm and 6:30pm on 31/10/2022", "10am-2pm 31 oct 2022", "31/10/22 5.15am", " Tuesday 25th October 2022. 10:30pm", "30/10/2022 6pm")

Then I create two variables, t1 and t2, to store the search results and the gsub results, and get the following:

library("stringr")

t1 <- t0[str_detect(t0, "\\s[\\s|0-9|\\.|:]+a\\.?m\\.?|p\\.?m\\.?")]
t2 <- t1 %>% gsub("\\s[\\s|0-9|\\.|:]+a\\.?m\\.?|p\\.?m\\.?","--", .)

> t1
 [1] "29th October 2022 5-6pm"                                         "12-1pm 02/11/22"                                                
 [3] "November 2nd 2022, apm shop "                                    " between 7pm Thursday 27th October to Saturday 29th October 9am"
 [5] "04/09/2022 at 4 a.m."                                            "4/09/2022 at 4.a.m."                                            
 [7] "04/09/2022 at 4.a.m"                                             "28.10.22 between 1.30pm and midnight"                           
 [9] " Sunday 30th October 2022 between 11am and 3pm"                  "30th October, approx 6pm"                                       
[11] "03/11/2022 7pm"                                                  " Between 5:30pm and 6:30pm on 31/10/2022"                       
[13] "10am-2pm 31 oct 2022"                                            "31/10/22 5.15am"                                                
[15] " Tuesday 25th October 2022. 10:30pm"                             "30/10/2022 6pm"   

> t2
 [1] "29th October 2022 5-6--"                                       "12-1-- 02/11/22"                                              
 [3] "November 2nd 2022, a-- shop "                                  " between 7-- Thursday 27th October to Saturday 29th October--"
 [5] "04/09/2022 at 4 a.m."                                          "4/09/2022 at--"                                               
 [7] "04/09/2022 at--"                                               "28.10.22 between 1.30-- and midnight"                         
 [9] " Sunday 30th October 2022 between-- and 3--"                   "30th October, approx 6--"                                     
[11] "03/11/2022 7--"                                                " Between 5:30-- and 6:30-- on 31/10/2022"                     
[13] "10am-2-- 31 oct 2022"                                          "31/10/22--"                                                   
[15] " Tuesday 25th October 2022. 10:30--"                           "30/10/2022 6--"
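A minimal diagnostic sketch in base R (the names `old`, `x` and `grouped` are illustrative, not from the question). A top-level `|` has the lowest precedence in a regex, so the pattern above is read as `\s[...]+a\.?m\.?` OR `p\.?m\.?`, and the second branch matches a bare "pm" with no digits in front of it. That would explain results such as "5-6--" and "a-- shop" in t2. Inside `[...]`, a `|` is a literal pipe character, not alternation. The unchanged "04/09/2022 at 4 a.m." in t2, even though str_detect() matched it, also appears consistent with base R's default engine not reading `\\s` as whitespace inside a bracket expression, while stringr's ICU engine does.

```r
# The question's pattern, unchanged
old <- "\\s[\\s|0-9|\\.|:]+a\\.?m\\.?|p\\.?m\\.?"
x <- c("November 2nd 2022, apm shop ", "12-1pm 02/11/22")

# The top-level | splits `old` into two alternatives:
#   \s[...]+a\.?m\.?   OR   p\.?m\.?
# so the second branch matches a bare "pm" with nothing before it.
regmatches(x, regexpr(old, x))
# returns c("pm", "pm")

# Grouping only the suffix keeps the digits in front of both am and pm
# (the literal | characters are also dropped from the class).
grouped <- "\\s[0-9.:]+(a|p)\\.?m\\.?"
gsub(grouped, "--", c("03/11/2022 7pm", "12-1pm 02/11/22"))
# returns c("03/11/2022--", "12-1pm 02/11/22")
# "12-1pm" is still missed: it has no leading space and contains a hyphen.
```

Grouping fixes the precedence problem, but the character class and the required leading space still need adjusting to cover ranges like "12-1pm".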

The desired result is:

> t2
[1] "29th October 2022--"                                              "-- 02/11/22"                                              
[3] " between-- Thursday 27th October to Saturday 29th October--"      "04/09/2022 at--"
[5] "4/09/2022 at--"                                                   "04/09/2022 at--"                                               
[7] "28.10.22 between-- and midnight"                                  " Sunday 30th October 2022 between-- and--"                   
[9] "30th October, approx--"                                           "03/11/2022--"                                                
[11] " Between-- and-- on 31/10/2022"                                  "----- 31 oct 2022"                                          
[13] "31/10/22--"                                                      " Tuesday 25th October 2022.--"                           
[15] "30/10/2022--"

How should I correct the regex pattern?

j91ykkif


t1 <- gsub("\\s?[-:0-9.]+\\s*[ap]\\.?m\\.?", "--", t0)
t1[t1 != t0]
#  [1] "29th October 2022--"                                        
#  [2] "-- 02/11/22"                                                
#  [3] " between-- Thursday 27th October to Saturday 29th October--"
#  [4] "04/09/2022 at--"                                            
#  [5] "4/09/2022 at--"                                             
#  [6] "04/09/2022 at--"                                            
#  [7] "28.10.22 between-- and midnight"                            
#  [8] " Sunday 30th October 2022 between-- and--"                  
#  [9] "30th October, approx--"                                     
# [10] "03/11/2022--"                                               
# [11] " Between-- and-- on 31/10/2022"                             
# [12] "---- 31 oct 2022"                                           
# [13] "31/10/22--"                                                 
# [14] " Tuesday 25th October 2022.--"                              
# [15] "30/10/2022--"
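For reference, here is the same pattern with each part commented, applied to three strings from t0 (a small base R sketch; `pat` and `x` are illustrative names). The first string is replaced; "apm shop" is left alone because no digit precedes the suffix, and "Between 15:15 and 21:10" is left alone because no am/pm follows the digits.

```r
# Pattern from the answer, split into its parts:
#   \\s?            at most one leading whitespace character, so "at 4pm" becomes "at--"
#   [-:0-9.]+       the time itself: digits plus the separators - : .
#   \\s*            optional spaces before the suffix, as in "4 a.m."
#   [ap]\\.?m\\.?   "a" or "p", an optional dot, "m", another optional dot
pat <- "\\s?[-:0-9.]+\\s*[ap]\\.?m\\.?"

x <- c("04/09/2022 at 4 a.m.",          # time with spaces and dots
       "November 2nd 2022, apm shop ",  # "apm" has no digits before it
       "Between 15:15 and 21:10")       # no am/pm suffix at all
gsub(pat, "--", x)
# returns c("04/09/2022 at--", "November 2nd 2022, apm shop ", "Between 15:15 and 21:10")
```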

The only difference between `t1[t1 != t0]` and your stated "desired result" is element [12]:

t1[t1 != t0][12]
# [1] "---- 31 oct 2022"
t2[12]
# [1] "----- 31 oct 2022"
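The difference comes from the hyphen in the class `[-:0-9.]+`: after "10am" is replaced, the next match is "-2pm", so the hyphen between the two times is consumed and only four dashes remain. If the five-dash result is really intended, one possible tweak (a hypothetical variant, not part of the answer; `pat2` is an illustrative name) is to require each match to start with a digit:

```r
# Hypothetical variant (not from the answer): the first character of the time
# must be a digit, so a hyphen can still appear inside a range such as "5-6pm"
# but is never the first character of a match.
pat2 <- "\\s?[0-9][-:0-9.]*\\s*[ap]\\.?m\\.?"
gsub(pat2, "--", c("10am-2pm 31 oct 2022", "29th October 2022 5-6pm", "12-1pm 02/11/22"))
# returns c("----- 31 oct 2022", "29th October 2022--", "-- 02/11/22")
```

On the full t0 this should change only element [12] relative to the answer, because every other time in the data already starts with a digit.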
