Remove rows in R if both conditions are not present in a column

kfgdxczn  posted on 2023-01-28 in Other

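I have a data frame with 52K rows. I want to remove every gene that does not have both Light and Healthy in the group column, so that only genes with both conditions remain. I don't know how to do this quickly. I think tidyverse or dplyr might be useful.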

data
         gene      id   group           snp ref total ref_condition
11080    ZZZ3 Healthy Healthy chr1:77664558   1     5       Healthy
22772    ZZZ3 Healthy Healthy chr1:77557488   2     5       Healthy
1632    ZZEF1 Healthy Healthy chr17:4086375   4     7       Healthy
13357   ZZEF1 Healthy Healthy chr17:4033235   7     9       Healthy
15312  ZYG11B Healthy Healthy chr1:52769202   1     2       Healthy
145341 ZYG11B   Light   Light chr1:52779185   1     4       Healthy

Wanted output
             gene      id   group           snp ref total ref_condition
    15312  ZYG11B Healthy Healthy chr1:52769202   1     2       Healthy
    145341 ZYG11B   Light   Light chr1:52779185   1     4       Healthy
ql3eal8s  #1

You can group by gene and use two any() calls inside filter(), one per condition, like this:

library(dplyr)
data %>%
  group_by(gene) %>%
  filter(any(group == "Healthy") & any(group == "Light"))
#> # A tibble: 2 × 7
#> # Groups:   gene [1]
#>   gene   id      group   snp             ref total ref_condition
#>   <chr>  <chr>   <chr>   <chr>         <int> <int> <chr>        
#> 1 ZYG11B Healthy Healthy chr1:52769202     1     2 Healthy      
#> 2 ZYG11B Light   Light   chr1:52779185     1     4 Healthy

Created on 2023-01-23 with reprex v2.0.2
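
The snippet above assumes that `data` already exists. As a minimal self-contained sketch, the six sample rows from the question can be rebuilt by hand and passed through the same filter. The row names are dropped, and the column types (character for the text columns, integer for `ref` and `total`) are an assumption, since the question does not show them. The name `result` is only illustrative.

```r
library(dplyr)

# Rebuild the six sample rows shown in the question (row names omitted).
# Column types are assumed: character for text, integer for the counts.
data <- data.frame(
  gene  = c("ZZZ3", "ZZZ3", "ZZEF1", "ZZEF1", "ZYG11B", "ZYG11B"),
  id    = c("Healthy", "Healthy", "Healthy", "Healthy", "Healthy", "Light"),
  group = c("Healthy", "Healthy", "Healthy", "Healthy", "Healthy", "Light"),
  snp   = c("chr1:77664558", "chr1:77557488", "chr17:4086375",
            "chr17:4033235", "chr1:52769202", "chr1:52779185"),
  ref   = c(1L, 2L, 4L, 7L, 1L, 1L),
  total = c(5L, 5L, 7L, 9L, 2L, 4L),
  ref_condition = rep("Healthy", 6),
  stringsAsFactors = FALSE
)

# Keep only the genes that have at least one Healthy row AND one Light row.
result <- data %>%
  group_by(gene) %>%
  filter(any(group == "Healthy") & any(group == "Light")) %>%
  ungroup()

print(result)
```

Each `any()` is evaluated once per gene and returns a single TRUE or FALSE, so `filter()` keeps or drops all rows of a gene together.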

tjjdgumg  #2

In short:

library(dplyr)
data %>%
  group_by(gene) %>%
  filter(sum(group == "Light") >= 1 & sum(group == "Healthy") >= 1) %>%
  ungroup()

  gene   id      group   snp             ref total ref_condition
  <fct>  <fct>   <fct>   <fct>         <int> <int> <fct>        
1 ZYG11B Healthy Healthy chr1:52769202     1     2 Healthy      
2 ZYG11B Light   Light   chr1:52779185     1     4 Healthy

Original answer: we can count the number of Light and Healthy rows per gene, then keep only the rows where n_light >= 1 & n_healthy >= 1.

library(dplyr)
data%>%
  group_by(gene)%>%
  mutate(n_light=sum(group=="Light"),
         n_healthy=sum(group=="Healthy"))%>%
  filter(n_light>=1 & n_healthy>=1)%>%
  ungroup

  gene   id      group   snp             ref total ref_condition n_light n_healthy
  <fct>  <fct>   <fct>   <fct>         <int> <int> <fct>           <int>     <int>
1 ZYG11B Healthy Healthy chr1:52769202     1     2 Healthy             1         1
2 ZYG11B Light   Light   chr1:52779185     1     4 Healthy             1         1

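If needed, remove the helper columns n_light and n_healthy with %>% select(-n_light, -n_healthy). Both names need the minus sign; select(-n_light, n_healthy) would keep n_healthy.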
