How to add a column to a data.table in R based on interval matching? [duplicate]

Posted by mpbci0fu on 2023-05-26 in Other


This question already has answers here:

Overlap join with start and end positions (5 answers)
Closed 3 days ago.
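I have two data.tables, A and B. Table A has two columns, "chrom" and "pos", while B holds a set of intervals read from a BED file. I want to add a new column named "select_status" to A: if a row's "pos" falls within any interval in B on the same chromosome, the corresponding "select_status" value should be TRUE; otherwise it should be FALSE.
Here is an example illustrating the data structure: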

library(data.table)

A <- data.table(chrom = c("chr1", "chr2", "chr3", "chr3", "chr3"),
                pos = c(100, 200, 300, 391, 399))
B <- data.table(chrom = c("chr1", "chr2", "chr2", "chr3", "chr3", "chr3"),
                start = c(150, 180, 250, 280, 390, 600),
                end = c(200, 220, 300, 320, 393, 900))

# I need to add a column select_status to A and set it to TRUE if pos falls within an interval in B
# I want something like this, but it is wrong

A[, select_status := any(pos >= B$start & pos <= B$end & chrom == B$chrom)]

or

A[, select_status := sapply(.SD, function(x) any(x >= B$start & x <= B$end)), .SDcols = c("pos"), by = .(chrom)]

A[is.na(select_status), select_status := FALSE]

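My attempts don't work because they don't compare each position only with the rows of B on the matching chromosome, so position chr3 399 would also be set to TRUE.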
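A minimal sketch that stays close to the second attempt, assuming closed intervals [start, end]: group A by chrom and, inside each group, compare pos only with the intervals of B on that chromosome. The current group's chromosome is read from data.table's `.BY`; the helper names `keep`, `s` and `e` are illustrative only.

```r
library(data.table)

A <- data.table(chrom = c("chr1", "chr2", "chr3", "chr3", "chr3"),
                pos = c(100, 200, 300, 391, 399))
B <- data.table(chrom = c("chr1", "chr2", "chr2", "chr3", "chr3", "chr3"),
                start = c(150, 180, 250, 280, 390, 600),
                end = c(200, 220, 300, 320, 393, 900))

# Work one chromosome at a time; .BY$chrom is the chromosome of the current group
A[, select_status := {
  keep <- B$chrom == .BY$chrom  # rows of B on this chromosome only
  s <- B$start[keep]
  e <- B$end[keep]
  # TRUE if pos lies in at least one [start, end]; any(logical(0)) is FALSE,
  # so a chromosome with no intervals in B gets FALSE
  vapply(pos, function(p) any(p >= s & p <= e), logical(1))
}, by = chrom]

A
```

This still checks every position against every interval of its chromosome, so it mainly shows what the original attempt was missing rather than being the fastest option for large tables.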
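I know I could use apply to iterate over A row by row and use each row as a filter on B to get a similar result, but that would be slow when the data has many rows. Is there a more concise approach?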
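A more concise option is a non-equi join, a form of the overlap join named in the linked duplicate. A minimal sketch, assuming a data.table version with non-equi join support (1.9.8 or later) and closed intervals: set every row to FALSE first, then join B onto A by chromosome with pos >= start and pos <= end, and set the matched rows to TRUE by reference.

```r
library(data.table)

A <- data.table(chrom = c("chr1", "chr2", "chr3", "chr3", "chr3"),
                pos = c(100, 200, 300, 391, 399))
B <- data.table(chrom = c("chr1", "chr2", "chr2", "chr3", "chr3", "chr3"),
                start = c(150, 180, 250, 280, 390, 600),
                end = c(200, 220, 300, 320, 393, 900))

# Default: no interval contains the position
A[, select_status := FALSE]

# Non-equi update join: for each interval in B, find the rows of A with the same
# chrom and start <= pos <= end, and flag them in place
A[B, on = .(chrom, pos >= start, pos <= end), select_status := TRUE]

A
```

Intervals in B that contain no position are simply ignored, and a position covered by several intervals still ends up TRUE. No row-by-row apply or expansion of intervals into single positions is needed.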
Expected result:

A
   chrom pos select_status
1:  chr1 100         FALSE
2:  chr2 200          TRUE
3:  chr3 300          TRUE
4:  chr3 391          TRUE
5:  chr3 399         FALSE
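The expected result above can also be produced with data.table's `foverlaps()`, which is designed for interval overlap joins. A minimal sketch under these assumptions: each pos is treated as a single-point interval [pos, pos] via temporary helper columns `start` and `end` added to A, and `setkey()` is allowed to reorder B in place.

```r
library(data.table)

A <- data.table(chrom = c("chr1", "chr2", "chr3", "chr3", "chr3"),
                pos = c(100, 200, 300, 391, 399))
B <- data.table(chrom = c("chr1", "chr2", "chr2", "chr3", "chr3", "chr3"),
                start = c(150, 180, 250, 280, 390, 600),
                end = c(200, 220, 300, 320, 393, 900))

# foverlaps() needs an interval on both sides: give each pos a single-point interval
A[, `:=`(start = pos, end = pos)]

# y must be keyed by chrom and then its interval columns (this reorders B in place)
setkey(B, chrom, start, end)

# With mult = "first" and which = TRUE, foverlaps() returns one B row number per
# row of A, or NA when no interval on the same chromosome contains the position
first_hit <- foverlaps(A, B, by.x = c("chrom", "start", "end"),
                       type = "within", mult = "first",
                       nomatch = NA, which = TRUE)

A[, select_status := !is.na(first_hit)]

# Remove the temporary helper columns
A[, c("start", "end") := NULL]

A
```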


Source: https://stackoverflow.com/questions/76302850/how-to-add-a-column-to-data-table-based-on-interval-matching-in-r

1 answer


avwztpqn (answer #1)

Here is one approach you could consider:

library(data.table)

A <- data.table(chrom = c("chr1", "chr2", "chr3", "chr3", "chr3"),
                pos = c(100, 200, 300, 391, 399))

B <- data.table(chrom = c("chr1", "chr2", "chr2", "chr3", "chr3", "chr3"),
                start = c(150, 180, 250, 280, 390, 600),
                end = c(200, 220, 300, 320, 393, 900))

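# Expand every interval in B into the integer positions it covers, per chromosome
X_Val <- B[, .(pos = unlist(Map(seq, start, end))), by = chrom]
# Within each chromosome of A, TRUE if pos is one of that chromosome's covered positions
A[, select_status := pos %in% X_Val$pos[X_Val$chrom == .BY$chrom], by = chrom]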

A
   chrom pos select_status
1:  chr1 100         FALSE
2:  chr2 200          TRUE
3:  chr3 300          TRUE
4:  chr3 391          TRUE
5:  chr3 399         FALSE

Answered on 2023-05-26
