R: print only file names that contain a given number

hzbexzde  posted on 2023-05-26  in  Other

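I have several files, and I want to use the numbers in their file names as sample IDs and match every file that contains a specific number. Below is the code I am trying.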

underscore <- "_0.005"
sampleid_underscored <- gsub(" ", "", paste(sample_id, underscore))
pattern <- paste0("(S", sample_id, "\\.)", "|", sampleid_underscored)
matching_files <- summaryvcffiles[grepl(pattern, summaryvcffiles)]

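When I run this from the command line with sample ID 1, it prints every file containing 11_, 21_, or 31_ as well. I only want the files with 1_ and S1. The same applies to sample ID 2.
For the following files: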

c("./protocol12/1_0.005_consensus.vardict.summary.tab", 
  "./protocol12/11_0.005_consensus.vardict.summary.tab", 
"./protocol12/21_0.005_consensus.vardict.summary.tab", 
"./protocol12/31_0.005_consensus.vardict.summary.tab", 
"./protocol12/twiceSort/1_0.005_consensus.vardict.summary.tab", 
"./protocol12/twiceSort/11_0.005_consensus.vardict.summary.tab", 
"./protocol12/twiceSort/21_0.005_consensus.vardict.summary.tab", 
"./protocol12/twiceSort/31_0.005_consensus.vardict.summary.tab", 
"./test_S1.vardict.summary.tab", 
"./fwd-protocol12/1_0.005_consensus.vardict.summary.tab", 
"./fwd-protocol12/11_0.005_consensus.vardict.summary.tab",
"./fwd-protocol12/21_0.005_consensus.vardict.summary.tab", 
"./fwd-protocol12/31_0.005_consensus.vardict.summary.tab", 
"./fwd-protocol12/twiceSort/1_0.005_consensus.vardict.summary.tab", 
"./fwd-protocol12/twiceSort/11_0.005_consensus.vardict.summary.tab", 
"./fwd-protocol12/twiceSort/21_0.005_consensus.vardict.summary.tab", 
"./fwd-protocol12/twiceSort/31_0.005_consensus.vardict.summary.tab", 
"./rev-protocol12/1_0.005_consensus.vardict.summary.tab",
"./rev-protocol12/11_0.005_consensus.vardict.summary.tab", 
"./rev-protocol12/21_0.005_consensus.vardict.summary.tab", 
"./rev-protocol12/31_0.005_consensus.vardict.summary.tab", 
"./rev-protocol12/twiceSort/1_0.005_consensus.vardict.summary.tab", 
"./rev-protocol12/twiceSort/11_0.005_consensus.vardict.summary.tab", 
"./rev-protocol12/twiceSort/21_0.005_consensus.vardict.summary.tab", 
"./rev-protocol12/twiceSort/31_0.005_consensus.vardict.summary.tab")

I only want to get the files below:

**"./protocol12/1_0.005_consensus.vardict.summary.tab"**
 **"./protocol12/twiceSort/1_0.005_consensus.vardict.summary.tab"**
 **"./test_S1.vardict.summary.tab"**
 **"./fwd-protocol12/1_0.005_consensus.vardict.summary.tab"**
 **"./fwd-protocol12/twiceSort/1_0.005_consensus.vardict.summary.tab"**
 **"./rev-protocol12/1_0.005_consensus.vardict.summary.tab"**
 **"./rev-protocol12/twiceSort/1_0.005_consensus.vardict.summary.tab"**

h43kikqp #1

How about:

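underscore <- "_0\\.005"  # escape the dot so it matches a literal "."

# list.files() tests `pattern` against the base file name only, so anchor with "^" rather than "/"
list.files(path = '.',
           pattern = sprintf("(^1%s|S1\\.).*tab$", underscore),
           recursive = TRUE, full.names = TRUE
           )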
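
One detail worth checking when using list.files() this way: as far as I know, `pattern` is compared with the base file name only, not with the directory part, even when `recursive = TRUE`. The sketch below builds a throwaway directory (the `listfiles_demo` folder and the variable names are made up for illustration) and compares a pattern that starts with "/" against one anchored with "^".

```r
# Build a small temporary tree modelled on the file names in the question
root <- file.path(tempdir(), "listfiles_demo")   # hypothetical scratch folder
unlink(root, recursive = TRUE)                   # start clean on re-runs
dir.create(file.path(root, "protocol12"), recursive = TRUE)
file.create(file.path(root, "protocol12",
                      c("1_0.005_consensus.vardict.summary.tab",
                        "11_0.005_consensus.vardict.summary.tab")))
file.create(file.path(root, "test_S1.vardict.summary.tab"))

# A leading "/" never matches, because only the base name is tested
with_slash <- list.files(root, pattern = "/1_0\\.005", recursive = TRUE)

# Anchoring at the start of the base name keeps 1_ and drops 11_
anchored <- list.files(root, pattern = "^1_0\\.005", recursive = TRUE)

print(with_slash)
print(anchored)
```

If the directory part has to take part in the match, an alternative is to list everything first and filter the full paths with grepl().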

hivapdat #2

The following code should work:

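library(stringr)  # provides str_detect() and regex(); replace each [your digit id here] with the sample ID, e.g. 1
path[str_detect(path, regex("(.+\\/[your digit id here]\\_\\d+\\.\\d+)|(.+_S[your digit id here]\\.)", ignore_case = TRUE))]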

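Note: pay close attention to the regex pattern. If your pattern changes in the future, try regex101 to check it. Don't forget that inside an R string every regex backslash needs an extra "\" for escaping (for example, "\\." for a literal dot).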
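
To illustrate the escaping point, here is a minimal sketch using base R's grepl(). The second file name is invented purely to show the difference between an unescaped and an escaped dot; it does not come from the question.

```r
files <- c("./test_S1.vardict.summary.tab",
           "./test_S1xvardict.summary.tab")   # hypothetical name, for contrast only

# Unescaped "." matches any character, so both names match
unescaped <- grepl("S1.", files)

# "\\." in the R string becomes the regex \. and matches a literal dot only
escaped <- grepl("S1\\.", files)

# fixed = TRUE treats the whole pattern as plain text, no escaping needed
literal <- grepl("S1.", files, fixed = TRUE)

print(unescaped)
print(escaped)
print(literal)
```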

quhf5bfb #3

These commands worked:

pattern_to_extract <- paste0("(S", sample_id, "\\.", "|" ,"\\/", sample_id, "_0\\.005)")
matching_files <- summaryvcffiles[grepl(pattern_to_extract, summaryvcffiles)]
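
The original pattern over-matched because "1_0.005" was not tied to the start of the file name, so it also matched as a substring of "11_0.005". Requiring a "/" directly before the ID fixes that. Here is a minimal sketch of the same pattern wrapped in a helper and run on a subset of the files from the question; the function name `match_sample_files` and its `suffix` argument are made up for illustration.

```r
# Hypothetical helper: same regex as above, parameterised by sample ID
match_sample_files <- function(files, sample_id, suffix = "_0\\.005") {
  # Either "S<id>." anywhere, or "/<id>_0.005" right after a path separator
  pattern <- paste0("(S", sample_id, "\\.|/", sample_id, suffix, ")")
  files[grepl(pattern, files)]
}

# Subset of the file list from the question
summaryvcffiles <- c(
  "./protocol12/1_0.005_consensus.vardict.summary.tab",
  "./protocol12/11_0.005_consensus.vardict.summary.tab",
  "./protocol12/twiceSort/21_0.005_consensus.vardict.summary.tab",
  "./test_S1.vardict.summary.tab"
)

result <- match_sample_files(summaryvcffiles, 1)
print(result)
# Keeps the 1_0.005 file and the S1 file; the 11_ and 21_ files are dropped
```

This relies on every path containing a "/" before the file name, which holds for the "./..." paths shown in the question; a bare file name such as "1_0.005_x.tab" would need "(^|/)" instead of "/".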
