regex - How to overlap a set of unaligned substrings onto a main sequence?

Posted by nbewdwxp on 2023-05-08 in Other

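Is it possible in R to overlap a set of substrings onto a main sequence and then group the parts that do not align?
Here is a dummy dataset.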

library(stringr)
seq <- "cuggg"  #This is main string
pattern <- c("cuggguu",  
             "cuggguuu",
             "cugGguu",
             "cuggga",
             "cugggaa",
             "cugggaaa",
             "cugCg")  # these are the substrings

# However, I couldn't group the unaligned parts of the substrings after overlapping them onto the main sequence
aa <- str_extract_all(seq, pattern)
aa

# [[1]]
# character(0)
# [[2]]
# character(0)
# [[3]]
# character(0)
# [[4]]
# character(0)
# [[5]]
# character(0)
# [[6]]
# character(0)
# [[7]]
# character(0)

I would like to get a dataset like this as output:

X     Y
uu    2
uuu   1
Gguu  1
a     1
aa    1
aaa   1
Cg    1

A tidyverse approach would be preferable, if possible.

Answer 1

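I assume you have misunderstood str_extract_all: it returns the occurrences of `pattern` found inside `string`, and it does not remove the pattern from the string. In your call none of the pattern strings occurs inside "cuggg", so every element comes back as character(0). You can try sub or gsub instead: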

> sub("cug{0,3}", "", s)
[1] "uu"   "uuu"  "Gguu" "a"    "aa"   "aaa"  "Cg"
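Since the question prefers a tidyverse approach, here is a minimal sketch of the same substitution with stringr's `str_remove()`. Like the `sub()` call, it assumes the main sequence is exactly "cuggg", so the regex is hard-coded: "cu" followed by zero to three lowercase "g"s, anchored at the start of each string. The quantifier is written as `{0,3}` with an explicit lower bound, which common regex engines accept.

```r
library(stringr)

# The seven substrings from the question
s <- c("cuggguu", "cuggguuu", "cugGguu", "cuggga",
       "cugggaa", "cugggaaa", "cugCg")

# Remove the leading "cu" plus up to three lowercase "g"s;
# an uppercase "G" or "C" stops the match, so it stays in the remainder
remainders <- str_remove(s, "^cug{0,3}")
remainders
```

This yields the same seven remainders as the `sub()` call above.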

Data

s <- c(
    "cuggguu",
    "cuggguuu",
    "cugGguu",
    "cuggga",
    "cugggaa",
    "cugggaaa",
    "cugCg"
)
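To get the X/Y table from the question without hard-coding a regex, one option is to align each substring to the main sequence position by position and keep whatever follows the first mismatch (or the end of the main sequence), then count the remainders with dplyr. This is a sketch under the assumption that "overlapping" means aligning both strings at their first character with case-sensitive comparison; `shared_prefix_len()` is a hypothetical helper, not a stringr function.

```r
library(stringr)
library(purrr)
library(dplyr)
library(tibble)

main_seq <- "cuggg"
subs <- c("cuggguu", "cuggguuu", "cugGguu", "cuggga",
          "cugggaa", "cugggaaa", "cugCg")

# Hypothetical helper: how many leading characters x shares with ref,
# compared case-sensitively and stopping at the first mismatch
shared_prefix_len <- function(x, ref) {
  a <- strsplit(x, "")[[1]]
  b <- strsplit(ref, "")[[1]]
  n <- min(length(a), length(b))
  same <- a[seq_len(n)] == b[seq_len(n)]
  if (all(same)) n else which(!same)[1] - 1L
}

result <- tibble(sub = subs) %>%
  mutate(
    # number of characters that line up with the main sequence
    overlap = map_int(sub, shared_prefix_len, ref = main_seq),
    # the unaligned remainder after the overlap
    X = str_sub(sub, overlap + 1L)
  ) %>%
  count(X, name = "Y")

result
```

Y counts how many input substrings leave each remainder. With the seven strings shown, every remainder is distinct, so each Y comes out as 1.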
