Identifying text patterns in an R dataframe

a6b3iqyw posted on 2023-05-20 in Other

I have identifiers in two columns of a dataframe, but they are structured differently. It looks like this:

Description1                Description2
1  A0A2H1CVW1_FASHEprotein1   tr|A0A2H1CVW1|A0A2H1CVW1_FASHEprotein1 
2  A0A4E0RAA2_FASHEprotein2   tr|A0A2H1BSG1|A0A2H1BSG1_FASHEprotein3
3  A0A2H1CFJ4_FASHEprotein4   tr|A0A2H1CFJ4|A0A2H1CFJ4_FASHEprotein4

How can I identify the rows where the identifiers in the two columns differ, such as row 2?

qojgxg4l


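You can use str_detect() from the stringr package to check whether each Description1 occurs in the Description2 of the same row. str_detect() is vectorised over both the string and the pattern, so the two columns are compared element by element; wrapping the pattern in fixed() makes it match literally rather than as a regular expression.

library(stringr)

str_detect(df$Description2, fixed(df$Description1))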
#> [1]  TRUE FALSE  TRUE
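If you would rather not depend on stringr, a base-R sketch of the same row-wise check can pair the columns with mapply(). This is needed because grepl() only uses the first element of its pattern argument, so it cannot compare two columns element by element on its own. The variable name found is illustrative.

```r
# Sample data from the question.
df <- data.frame(
  Description1 = c("A0A2H1CVW1_FASHEprotein1",
                   "A0A4E0RAA2_FASHEprotein2",
                   "A0A2H1CFJ4_FASHEprotein4"),
  Description2 = c("tr|A0A2H1CVW1|A0A2H1CVW1_FASHEprotein1",
                   "tr|A0A2H1BSG1|A0A2H1BSG1_FASHEprotein3",
                   "tr|A0A2H1CFJ4|A0A2H1CFJ4_FASHEprotein4")
)

# mapply() calls grepl(pattern = Description1[i], x = Description2[i])
# for each row i; fixed = TRUE matches the identifier literally.
found <- mapply(grepl, df$Description1, df$Description2,
                MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
found
#> [1]  TRUE FALSE  TRUE
```

USE.NAMES = FALSE keeps the result an unnamed logical vector, matching the str_detect() output.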

Data in a reproducible format:

df <- structure(list(Description1 = c("A0A2H1CVW1_FASHEprotein1",  
                                      "A0A4E0RAA2_FASHEprotein2", 
                                      "A0A2H1CFJ4_FASHEprotein4"), 
                     Description2 = c("tr|A0A2H1CVW1|A0A2H1CVW1_FASHEprotein1", 
                                      "tr|A0A2H1BSG1|A0A2H1BSG1_FASHEprotein3",
                                      "tr|A0A2H1CFJ4|A0A2H1CFJ4_FASHEprotein4"
                )), class = "data.frame", row.names = c("1", "2", "3"))
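To list the rows whose identifiers differ, rather than only flag them, one option is to extract the accession from each column and compare those directly. This is a different check from the substring test above, and it assumes (based only on the three sample rows) that Description1 has the form <accession>_<entry name> and Description2 has the form tr|<accession>|<entry name>. The names acc1, acc2 and differs are illustrative.

```r
# Sample data from the question.
df <- data.frame(
  Description1 = c("A0A2H1CVW1_FASHEprotein1",
                   "A0A4E0RAA2_FASHEprotein2",
                   "A0A2H1CFJ4_FASHEprotein4"),
  Description2 = c("tr|A0A2H1CVW1|A0A2H1CVW1_FASHEprotein1",
                   "tr|A0A2H1BSG1|A0A2H1BSG1_FASHEprotein3",
                   "tr|A0A2H1CFJ4|A0A2H1CFJ4_FASHEprotein4")
)

# Assumed layout: accession is everything before the first "_".
acc1 <- sub("_.*$", "", df$Description1)
# Assumed layout: accession is the field between the first and second "|".
acc2 <- sub("^[^|]*\\|([^|]*)\\|.*$", "\\1", df$Description2)

# TRUE where the two accessions disagree.
differs <- acc1 != acc2
which(differs)
#> [1] 2

# Keep only the mismatching rows.
df[differs, ]
```

Comparing accessions also tells you which identifier each column holds for the mismatching row (here A0A4E0RAA2 versus A0A2H1BSG1).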
