R: look up values that fall within a range in another data frame and return a different column

qpgpyjmq  posted 12 months ago in Other

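I have two data frames and want to use the values in one (DF1$pos) to search two columns in the other (DF2$start, DF2$end). If a value falls within that range, I want to return DF2$annot and store it in DF1$name.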

DF1
ID   pos  name
chr   12
chr  542
chr  674


DF2
ID   start   end   annot
chr      1   200      a1
chr    201   432      a2
chr    540  1002      a3
chr   2000  2004      a4


So in this example, I would like DF1 to become

ID   pos  name
chr   12    a1
chr  542    a3
chr  674    a3


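I have tried using merge and intersect, but I don't know how to express the range condition as a logical expression in an if statement.
The data frames can be created as follows: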

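# Values match the tables above (pos 674 in DF1, start 2000 for a4 in DF2)
DF1 <- data.frame(ID = c("chr", "chr", "chr"),
                  pos = c(12, 542, 674),
                  name = c(NA, NA, NA))

DF2 <- data.frame(ID = c("chr", "chr", "chr", "chr"),
                  start = c(1, 201, 540, 2000),
                  end = c(200, 432, 1002, 2004),
                  annot = c("a1", "a2", "a3", "a4"))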
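
The range test described above can be written as a plain logical expression. The following is a minimal base R sketch, not one of the posted answers; the helper name `lookup_annot` is hypothetical, the data repeats the tables above, and it assumes at most one matching range per position (only the first hit is used).

```r
# Self-contained copy of the example data (name column omitted, it is filled below)
DF1 <- data.frame(ID = c("chr", "chr", "chr"),
                  pos = c(12, 542, 674),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(ID = c("chr", "chr", "chr", "chr"),
                  start = c(1, 201, 540, 2000),
                  end = c(200, 432, 1002, 2004),
                  annot = c("a1", "a2", "a3", "a4"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: return the annot of the first range containing pos,
# or NA if no range on the same ID contains it
lookup_annot <- function(id, pos) {
  hit <- which(DF2$ID == id & DF2$start <= pos & pos <= DF2$end)
  if (length(hit) > 0) DF2$annot[hit[1]] else NA_character_
}

# Apply the lookup row by row; USE.NAMES = FALSE keeps the result unnamed
DF1$name <- mapply(lookup_annot, DF1$ID, DF1$pos, USE.NAMES = FALSE)
DF1
```

This checks every position against every range, so it is only reasonable for small tables; the package-based approaches scale better.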

#1 n9vozmp4

You could perhaps use foverlaps from the "data.table" package.

library(data.table)
DT1 <- data.table(DF1)
DT2 <- data.table(DF2)
setkey(DT2, ID, start, end)
DT1[, c("start", "end") := pos]  ## I don't know if there's a way around this step...
foverlaps(DT1, DT2)
#     ID start  end annot pos i.start i.end
# 1: chr     1  200    a1  12      12    12
# 2: chr   540 1002    a3 542     542   542
# 3: chr   540 1002    a3 674     674   674
foverlaps(DT1, DT2)[, c("ID", "pos", "annot"), with = FALSE]
#     ID pos annot
# 1: chr  12    a1
# 2: chr 542    a3
# 3: chr 674    a3

As @Arun mentioned in the comments, you can also use which = TRUE in foverlaps to get the matching row indices and extract the relevant values:

foverlaps(DT1, DT2, which = TRUE)
#    xid yid
# 1:   1   1
# 2:   2   3
# 3:   3   3
DT2$annot[foverlaps(DT1, DT2, which = TRUE)$yid]
# [1] "a1" "a3" "a3"
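
The dummy start/end columns can probably be avoided with a non-equi join, which data.table supports in `on=` (version 1.9.8 or later, as far as I know). This is a sketch rather than part of the original answer; the result name `res` is hypothetical and the data repeats the question's tables.

```r
library(data.table)

# Self-contained copy of the example data
DT1 <- data.table(ID = c("chr", "chr", "chr"),
                  pos = c(12, 542, 674))
DT2 <- data.table(ID = c("chr", "chr", "chr", "chr"),
                  start = c(1, 201, 540, 2000),
                  end = c(200, 432, 1002, 2004),
                  annot = c("a1", "a2", "a3", "a4"))

# For each row of DT1, find the rows of DT2 on the same ID
# where start <= pos and end >= pos.
# i.pos refers to DT1's column, x.annot to DT2's column.
res <- DT2[DT1,
           on = .(ID, start <= pos, end >= pos),
           .(ID, pos = i.pos, annot = x.annot)]
res
```

Positions that fall in no range are kept with annot set to NA, because the default for unmatched rows in this kind of join is `nomatch = NA`.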

#2 yzxexxkh

You can also use IRanges.

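# biocLite() has been retired; Bioconductor packages are now installed via BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("IRanges")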
library(IRanges)
DF1N <- with(DF1, IRanges(pos, pos))
DF2N <- with(DF2, IRanges(start, end))
DF1$name <- DF2$annot[subjectHits(findOverlaps(DF1N, DF2N))]
DF1
#   ID pos name
#1 chr  12   a1
#2 chr 542   a3
#3 chr 674   a3
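
Indexing with subjectHits() lines up with DF1 only when every position hits exactly one range. A hedged variant, assuming the first overlapping range is the one wanted, uses the `select = "first"` argument of findOverlaps, which returns one index per query and NA where nothing overlaps. The extra position 1500 is added here purely for illustration, and the names `query`, `subject`, and `hits` are hypothetical.

```r
library(IRanges)

# Example data plus one position (1500) that falls in no range
DF1 <- data.frame(ID = rep("chr", 4),
                  pos = c(12, 542, 674, 1500),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(ID = rep("chr", 4),
                  start = c(1, 201, 540, 2000),
                  end = c(200, 432, 1002, 2004),
                  annot = c("a1", "a2", "a3", "a4"),
                  stringsAsFactors = FALSE)

query   <- IRanges(DF1$pos, DF1$pos)     # zero-width-free point ranges [pos, pos]
subject <- IRanges(DF2$start, DF2$end)   # annotated ranges [start, end]

# One integer per query row: index of the first overlapping subject, or NA
hits <- findOverlaps(query, subject, select = "first")
DF1$name <- DF2$annot[hits]
DF1
```

Like the answer above, this ignores the ID column, so it assumes all rows share one ID (for example a single chromosome).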


#3 cqoc49vn

This "dplyr" solution works much like a range lookup in Excel.

library(dplyr)

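# Match each pos to the nearest start at or below it, within the same ID.
# Note that closest() compares against start only and does not check end.
left_join(DF1, DF2, join_by(ID, closest(pos >= start)))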
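
Because a closest() join looks at start only, a position past the end of the nearest range would still be matched to it. If both bounds should be enforced, one option is the between() helper of join_by (available in dplyr 1.1.0 or later). This is a sketch under that assumption; the result name `res` is hypothetical and the data repeats the question's tables.

```r
library(dplyr)

# Self-contained copy of the example data
DF1 <- data.frame(ID = c("chr", "chr", "chr"),
                  pos = c(12, 542, 674))
DF2 <- data.frame(ID = c("chr", "chr", "chr", "chr"),
                  start = c(1, 201, 540, 2000),
                  end = c(200, 432, 1002, 2004),
                  annot = c("a1", "a2", "a3", "a4"))

# Keep every row of DF1 and attach the DF2 row on the same ID
# whose [start, end] range contains pos (both bounds inclusive by default)
res <- left_join(DF1, DF2, join_by(ID, between(pos, start, end)))
res[, c("ID", "pos", "annot")]
```

Rows of DF1 with no containing range are kept with NA in the DF2 columns, and a position covered by several ranges would appear once per matching range.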
