R: Compare two datasets, show the total row counts, and show the subjects if they differ

cedebl8k, posted on 2022-12-30 in Other

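I have two datasets in which the column edta_complete is 0 for incomplete and 1 for complete. I am trying to compare this column between df and df1. 1) I need to compare the count of subject_ids with a complete edta in the two datasets. 2) If one dataset has more complete entries than the other, show the subject_ids that differ. See the example below: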
DF:

df <- structure (list(subject_id = c("191-5467", "191-6784", "191-3457", "191-0987", "191-1245", "191-2365"), edta_complete = c("1","0","1","1","1","0")), class = "data.frame", row.names = c (NA, -6L))

DF1:

df1 <- structure (list(subject_id = c("191-5467", "191-6784", "191-3457", "191-0987", "191-1245", "191-2365"), edta_complete = c("1","1","1","1","1","1")), class = "data.frame", row.names = c (NA, -6L))

Count of edta_complete = 1:
df: 4, df1: 6
I need code that shows me that 191-6784 and 191-2365 in df1 differ from df. I hope this makes sense.

Answer #1 (oyt4ldly)

We can use setdiff to find the subject_ids that are complete in df1 but not complete in df:

setdiff(with(df1, subject_id[edta_complete == 1]), 
      with(df, subject_id[edta_complete == 1]))
[1] "191-6784" "191-2365"

Or use anti_join:

library(dplyr)
df1 %>% 
  filter(edta_complete == 1) %>% 
  anti_join(df %>%
      filter(edta_complete == 1), by = 'subject_id') %>% 
  pull(subject_id)
[1] "191-6784" "191-2365"
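
The first part of the question (comparing the counts of complete entries) can be handled in base R as well. The following is only a minimal sketch that rebuilds the example data from the question; the names n_complete and extra are illustrative and do not come from the original post.

```r
# Example data copied from the question
df <- data.frame(
  subject_id = c("191-5467", "191-6784", "191-3457",
                 "191-0987", "191-1245", "191-2365"),
  edta_complete = c("1", "0", "1", "1", "1", "0")
)
df1 <- data.frame(
  subject_id = df$subject_id,
  edta_complete = rep("1", 6)
)

# Count the complete entries in each data set
n_complete <- c(df  = sum(df$edta_complete == "1"),
                df1 = sum(df1$edta_complete == "1"))
print(n_complete)

# List the differing subjects only when the counts disagree
extra <- character(0)  # hypothetical helper variable
if (n_complete[["df"]] != n_complete[["df1"]]) {
  extra <- setdiff(df1$subject_id[df1$edta_complete == "1"],
                   df$subject_id[df$edta_complete == "1"])
  print(extra)
}
```

Note that setdiff is one-directional: this sketch assumes df1 is the data set with more complete entries. If df could be the larger one, the two arguments would need to be swapped.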

Answer #2 (jv4diomz)

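Alternatively, use bind_cols(). Note that it pairs rows by position, so it assumes both data frames list the subjects in the same order: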

library(dplyr)

bind_cols(df, df1) %>% 
  filter(edta_complete...2 != edta_complete...4) %>% 
  pull(subject_id...1)
[1] "191-6784" "191-2365"
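
If the two data frames might not be sorted identically, matching on subject_id avoids relying on row position. Below is a rough sketch using base R's merge; the shuffled copy df1_shuffled and the suffixes "_df" and "_df1" are assumptions made for illustration, not part of the original answers.

```r
# Example data copied from the question
df <- data.frame(
  subject_id = c("191-5467", "191-6784", "191-3457",
                 "191-0987", "191-1245", "191-2365"),
  edta_complete = c("1", "0", "1", "1", "1", "0")
)
df1 <- data.frame(
  subject_id = df$subject_id,
  edta_complete = rep("1", 6)
)

# Reverse df1 to demonstrate that row order no longer matters
df1_shuffled <- df1[rev(seq_len(nrow(df1))), ]

# Join on the key; clashing column names get the given suffixes
merged <- merge(df, df1_shuffled, by = "subject_id",
                suffixes = c("_df", "_df1"))

# Keep the subjects whose completion status differs between the two
differing <- merged$subject_id[
  merged$edta_complete_df != merged$edta_complete_df1
]
print(differing)
```

Like the bind_cols() version, this reports any subject whose status differs in either direction, not only those that are complete in df1.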
