Is it possible in R to group the unaligned parts of substrings after overlapping them with a main sequence?
Here is a dummy dataset.
library(stringr)
seq <- "cuggg" #This is main string
pattern <- c("cuggguu",
"cuggguuu",
"cugGguu",
"cuggga",
"cugggaa",
"cugggaaa",
"cugCg") # these are the substrings
# However, I couldn't group the unaligned parts of the substrings after overlapping them with the main sequence
aa <- str_extract_all(seq, pattern)
aa
# [[1]]
# character(0)
# [[2]]
# character(0)
# [[3]]
# character(0)
# [[4]]
# character(0)
# [[5]]
# character(0)
# [[6]]
# character(0)
# [[7]]
# character(0)
# [[8]]
# character(0)
I would like output like the following dataset:
#'[ X Y]
#'[uu 2]
#'[uuu 1]
#'[Gguu 1]
#'[a 1]
#'[aa 1]
#'[aaa 1]
#'[Cg 1]
If possible, a tidyverse approach would be preferable.
1 Answer
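I assume you have misunderstood str_extract_all: it only extracts the text that matches a pattern, it does not take the pattern out of the string and return what is left. In your call each (longer) substring is searched for inside the shorter main sequence, so nothing matches and every element comes back as character(0). You can try sub or gsub instead.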
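As a rough sketch of that idea (not necessarily the exact code intended here), the snippet below first shows what a plain sub() call returns, then strips the longest prefix each substring shares with the main sequence and counts the leftovers with dplyr. The names main_seq, patterns and strip_common_prefix are my own, chosen for illustration, and the prefix-stripping helper is an assumption about what the desired output implies (for example "cugGguu" becoming "Gguu"). With the seven strings as listed in the question, every remainder occurs once.

```r
library(dplyr)

main_seq <- "cuggg"
patterns <- c("cuggguu", "cuggguuu", "cugGguu",
              "cuggga", "cugggaa", "cugggaaa", "cugCg")

# Plain sub(): removes the main sequence only where it occurs exactly,
# so "cugGguu" and "cugCg" are returned unchanged.
sub(main_seq, "", patterns, fixed = TRUE)

# Hypothetical helper: drop the longest case-sensitive prefix shared with `ref`.
strip_common_prefix <- function(x, ref) {
  ref_chars <- strsplit(ref, "")[[1]]
  vapply(x, function(s) {
    s_chars <- strsplit(s, "")[[1]]
    n <- min(length(s_chars), length(ref_chars))
    # Position of the first character that differs from the main sequence
    mismatch <- which(s_chars[seq_len(n)] != ref_chars[seq_len(n)])
    keep_from <- if (length(mismatch) > 0) mismatch[1] else n + 1
    substring(s, keep_from)
  }, character(1), USE.NAMES = FALSE)
}

stripped <- strip_common_prefix(patterns, main_seq)

# Count how often each remainder occurs (X = remainder, Y = count)
result <- tibble(X = stripped) %>%
  count(X, name = "Y")
result
```

If mismatches inside the main sequence do not need special handling, the sub() line on its own followed by the same count() step may be enough.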
Data