Check whether the values in one column of an R data frame appear in two other columns

Posted by uqjltbpv on 2022-12-06 in Other

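I'm looking for a way to compare columns within the SAME data frame and produce a new column called STATUS as the output. I have 3 columns: 1) SNPs, 2) GAINED and 3) LOST. I want to know whether the value in each cell of column 1 appears anywhere in column 2 or column 3. If it appears in column 2, the output should be GAINED; if it appears in column 3, the output should be LOST. If it appears in neither column, the output should be NEUTRAL.
This is what I want: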

SNPs         GAINED          LOST           STATUS
1_752566     1_949654        6_30022061     NEUTRAL
1_776546     1_1045331       6_30314321     NEUTRAL
1_832918     1_832918        13_95612033    GAINED
1_914852     1_1247494       1_914852       LOST

I tried this:

data_frame$status <- data.frame(lapply(data_frame[1], `%in%`, data_frame[2:3]))

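But my data isn't organized that way, so this doesn't find all the matches within each row. Instead, I want R to search the entire column for each cell's value, rather than only comparing values within the same row.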
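To make the row-wise vs. column-wide distinction concrete, here is a minimal base R sketch on made-up toy data (the data frame `toy` and its values are hypothetical, not from the question): `==` compares each value only with the value in the same row, while `%in%` checks each value against every entry of the other column.

```r
# Hypothetical toy data: "1_300" appears in GAINED, but in a different row
toy <- data.frame(
  SNPs   = c("1_100", "1_200", "1_300"),
  GAINED = c("1_300", "1_999", "1_888"),
  stringsAsFactors = FALSE
)

# Row-wise: SNPs[i] is compared only with GAINED[i]
row_wise <- toy$SNPs == toy$GAINED

# Column-wide: is SNPs[i] found anywhere in the GAINED column?
column_wide <- toy$SNPs %in% toy$GAINED

print(row_wise)     # no match in any row
print(column_wide)  # third value is found in the first row of GAINED
```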


sauutmhj (answer #1)

You don't need lapply or anything like that:

data_frame$STATUS = with(data_frame,
  ifelse(SNPs %in% GAINED, "GAINED",
   ifelse(SNPs %in% LOST, "LOST", "NEUTRAL")
  )
)

Note that this checks the GAINED condition first, so if a value appears in both GAINED and LOST, the result will be "GAINED".
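The precedence rule is easy to verify on a small example. This is a minimal sketch on hypothetical toy data (the values and the extra column `STATUS_LOST_FIRST` are made up for illustration) where "1_500" appears in both GAINED and LOST; swapping the order of the two tests changes which label wins.

```r
# Hypothetical toy data: "1_500" appears in both GAINED and LOST
data_frame <- data.frame(
  SNPs   = c("1_500", "1_600", "1_700"),
  GAINED = c("1_500", "2_111", "2_222"),
  LOST   = c("1_600", "1_500", "3_333"),
  stringsAsFactors = FALSE
)

# GAINED is tested first, so a value found in both columns becomes "GAINED"
data_frame$STATUS <- with(data_frame,
  ifelse(SNPs %in% GAINED, "GAINED",
         ifelse(SNPs %in% LOST, "LOST", "NEUTRAL"))
)

# Testing LOST first gives LOST precedence instead
data_frame$STATUS_LOST_FIRST <- with(data_frame,
  ifelse(SNPs %in% LOST, "LOST",
         ifelse(SNPs %in% GAINED, "GAINED", "NEUTRAL"))
)

print(data_frame)
```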


q3qa4bjr (answer #2)

A nested ifelse should work, and it is fairly easy to follow if indented properly:

tbl$status <- ifelse(tbl$SNPs %in% tbl$GAINED, "GAINED",
                     ifelse(tbl$SNPs %in% tbl$LOST, "LOST", "NEUTRAL"))

> tbl
      SNPs    GAINED        LOST  STATUS  status
1 1_752566  1_949654  6_30022061 NEUTRAL NEUTRAL
2 1_776546 1_1045331  6_30314321 NEUTRAL NEUTRAL
3 1_832918  1_832918 13_95612033  GAINED  GAINED
4 1_914852 1_1247494    1_914852    LOST    LOST
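If the nested ifelse calls grow hard to read as more categories are added, one alternative (not from the original answers) is to start every row at a default label and then overwrite it in increasing order of priority. Assigning GAINED last reproduces the same "GAINED wins" precedence. This is a sketch using the question's sample rows; the intermediate vector `status` is an illustrative name.

```r
tbl <- data.frame(
  SNPs   = c("1_752566", "1_776546", "1_832918", "1_914852"),
  GAINED = c("1_949654", "1_1045331", "1_832918", "1_1247494"),
  LOST   = c("6_30022061", "6_30314321", "13_95612033", "1_914852"),
  stringsAsFactors = FALSE
)

# Every row starts with the default label
status <- rep("NEUTRAL", nrow(tbl))
# Lower-priority match first ...
status[tbl$SNPs %in% tbl$LOST] <- "LOST"
# ... then the higher-priority match, which overwrites LOST where both apply
status[tbl$SNPs %in% tbl$GAINED] <- "GAINED"
tbl$status <- status

print(tbl)
```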

t5zmwmid (answer #3)

A tidyverse approach using case_when:

library(tidyverse)

# Sample data from the question
df <- tibble(
  SNPs   = c("1_752566", "1_776546", "1_832918", "1_914852"),
  GAINED = c("1_949654", "1_1045331", "1_832918", "1_1247494"),
  LOST   = c("6_30022061", "6_30314321", "13_95612033", "1_914852")
)

df %>%
  mutate(STATUS = case_when(
    SNPs %in% GAINED ~ 'GAINED',
    SNPs %in% LOST ~ 'LOST',
    TRUE ~ 'NEUTRAL'
  ))
#> # A tibble: 4 × 4
#>   SNPs     GAINED    LOST        STATUS 
#>   <chr>    <chr>     <chr>       <chr>  
#> 1 1_752566 1_949654  6_30022061  NEUTRAL
#> 2 1_776546 1_1045331 6_30314321  NEUTRAL
#> 3 1_832918 1_832918  13_95612033 GAINED 
#> 4 1_914852 1_1247494 1_914852    LOST

Created on 2022-12-01 with reprex v2.0.2
