regex: How to write a regular expression to sort paths so they are listed in numeric order

oymdgrw7  posted on 2022-12-01 in Other

I have hundreds of .wav files and import them with list.files().

[1] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"           
  [2] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"                   
  [3] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"  
.......
  [73] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"                       
  [74] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"                  
  [75] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"

I use the code below to reorder the file paths. I want the numbers in each sub-path to follow numeric order.

filename<- file_list[order(as.numeric(stringr::str_extract(file_list,"[0-9]+(.*?)")) )]

The result is:

[1] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"                       
  [2] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"                  
  [3] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"
.......
  [73] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"           
  [74] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"                   
  [75] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"

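I also want the last sub-path to follow numeric order, e.g. English-0067. I tried repeating the match for the last sub-path, but that scrambles the 3 ... 10 ordering obtained earlier. How can I make the numbers in all sub-paths follow numeric order?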
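A minimal sketch of why the single-key attempt leaves the last number unsorted: `str_extract()` returns only the first match in each string, so the sort key is just the leading directory number, and `order()` leaves ties in their input order. The sketch uses base R's `regexpr()`/`regmatches()` as a stand-in for `stringr::str_extract()`, and the three-element `paths` vector is an illustrative sample rather than the full file list.

```r
# Illustrative sample; the real file_list comes from list.files().
paths <- c(
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav",
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav",
  "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"
)

# regexpr() finds only the first run of digits, as str_extract() does.
first_num <- as.numeric(regmatches(paths, regexpr("[0-9]+", paths)))
first_num

# Both "3/..." paths share the key 3, so order() keeps them in input order
# and 0069 stays ahead of 0067.
order(first_num)
```

A second key built from the trailing file number is what breaks those ties.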

ws51t4hk  #1

Another option:

ord <- order(as.numeric(sub("(^\\d+)/.*$","\\1",files)), as.numeric(sub("^.*-(\\d+)\\.wav","\\1",files)))

files[ord]
#> [1] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"         
#> [2] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"            
#> [3] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"       
#> [4] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"        
#> [5] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"
#> [6] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"
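To see what each `sub()` call contributes, the two keys can be computed separately before being passed to `order()`. This is only a sketch of the same idea: the names `dir_num` and `file_num` are illustrative, the `files` vector is a hypothetical four-path sample standing in for the `list.files()` result, and the second pattern is anchored with `$`, which the one-liner above does not do.

```r
# Hypothetical sample standing in for the list.files() result.
files <- c(
  "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav",
  "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav",
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav",
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"
)

# Key 1: digits before the first "/" (the top-level directory).
dir_num <- as.numeric(sub("^(\\d+)/.*$", "\\1", files))

# Key 2: digits between the last "-" and ".wav" (the file number).
file_num <- as.numeric(sub("^.*-(\\d+)\\.wav$", "\\1", files))

# order() sorts by dir_num first and uses file_num to break ties.
sorted <- files[order(dir_num, file_num)]
sorted
```

Keeping the keys in named variables makes it easier to inspect them if some file ends up in an unexpected position.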

s3fp2yjn  #2

Here is one approach:

vec <- c( "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav",
"10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav",
"10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav",
"3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav",
"3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav",
"3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav")
nums <- strcapture("^([0-9]+).*\\b([0-9]+)\\.[a-z]+$", vec, proto=list(a=0L,b=0L))
nums
#    a   b
# 1 10 701
# 2 10 700
# 3 10 703
# 4  3  69
# 5  3  82
# 6  3  67
do.call(order, nums)
# [1] 6 4 5 2 1 3
vec[do.call(order, nums)]
# [1] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"         
# [2] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"            
# [3] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"       
# [4] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"        
# [5] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"
# [6] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"

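If you need to include BL-0001 in the ordering, add one more capture group to the regex and one more entry to proto=. do.call(order, nums) works unchanged whether nums has one column or many.
Note that if you over-tune the regex, any row that does not match every capture group comes back as NA, and NA rows are sorted last. If you find one or more filenames out of order, check the regex and look at the intermediate nums entries for those filenames.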
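As a sketch of both points, the example below adds a third capture group for the BL number and then lists the paths that failed to match. The extended regex is one possible reading of "one more capture group", the column names `dir`, `bl` and `file` are illustrative, and the sample paths (including the malformed `notes.wav`) are hypothetical.

```r
# Hypothetical sample: two BL folders under "10" and one name without a number.
vec <- c(
  "10/Project_English-10/BL-0002_Sample-b/Birch-English-0700.wav",
  "10/Project_English-10/BL-0001_Sample-a/Blueberry-English-0703.wav",
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav",
  "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/notes.wav"
)

# Three capture groups: leading directory, BL number, trailing file number.
pattern <- "^([0-9]+)/.*BL-([0-9]+)_.*\\b([0-9]+)\\.[a-z]+$"
nums <- strcapture(pattern, vec, proto = list(dir = 0L, bl = 0L, file = 0L))
nums

# Paths that did not match the pattern have NA in every column.
unmatched <- vec[!complete.cases(nums)]
unmatched

# Sort by dir, then bl, then file; NA rows go last by default.
vec[do.call(order, nums)]
```

Checking `unmatched` before sorting is a quick way to spot filenames the regex does not cover.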

1aaf6o9v  #3

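Answer: structure the data as a table, extract the numbers with stringr::str_extract(), and arrange the rows by those numbers before pulling out the filenames.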

vec <- c( "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav",
          "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav",
          "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav",
          "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav",
          "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav",
          "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav")

library(dplyr)
library(stringr)

vec_tib <- tibble(filename = vec)

vec_tib <- mutate(vec_tib,
                  num_1 = str_extract(filename, "\\d+"),
                  num_2 = str_extract(filename, "\\d+(?=(\\.wav))"))

head(vec_tib, 3)
#> # A tibble: 3 × 3
#>   filename                                                           num_1 num_2
#>   <chr>                                                              <chr> <chr>
#> 1 10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsa… 10    0701 
#> 2 10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch… 10    0700 
#> 3 10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueb… 10    0703

vec_tib <- mutate(vec_tib, across(starts_with("num"), as.numeric))

vec_tib |> 
  arrange(num_1, num_2) |> 
  pull(filename)
#> [1] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fire-salamander-English-0067.wav"         
#> [2] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Capercaillie-English-0069.wav"            
#> [3] "3/Project_English-3/BL-0002_Lesser-horseshoe-bat/Fat-tail-scorpion-English-0082.wav"       
#> [4] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Birch-English-0700.wav"        
#> [5] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Balsam-poplar-English-0701.wav"
#> [6] "10/Project_English-10/BL-0001_A-conifer-cone-contains-seeds/Blueberry-English-0703.wav"

Created on 2022-11-28 with reprex v2.0.2
