How can I find duplicate values in a column shared by two data frames and remove the non-duplicated rows in R?

bihw5rsg  posted on 2023-02-06  in  Other

Suppose I have two data frames like these:

df1 <- data.frame(ID = c("A","B","F","G","B","B","A","G","G","F","A","A","A","B","F"),
                 code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                 class =  c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1))

df2 <- data.frame(ID = c("G","F","C","F","B","A","F","C","A","B","A","B","C","A","G"),
                 code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                 class =  c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1))

I want to check which IDs appear in both df1$ID and df2$ID, and drop every row of df2 whose ID does not exist in df1, so that the new data frame looks like this:

df3 <- data.frame(ID = c("G","F","F","B","A","F","A","B","A","B","A","G"),
                 code = c(1,2,3,3,1,2,1,1,3,2,1,1),
                 class =  c(2,4,5,2,3,2,1,2,4,5,2,1))
von4xj4u  #1

Use %in%:

df2[df2$ID %in% df1$ID, ]

   ID code class
1   G    1     2
2   F    2     4
4   F    3     5
5   B    3     2
6   A    1     3
7   F    2     2
9   A    1     1
10  B    1     2
11  A    3     4
12  B    2     5
14  A    1     2
15  G    1     1
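
To make the one-liner above easier to follow, here is a minimal sketch that splits it into steps. The names `keep`, `dropped_ids`, and `filtered` are illustrative and not part of the original answer, and `stringsAsFactors = FALSE` is added only so the behaviour is the same on R versions before 4.0.

```r
# Data copied from the question.
df1 <- data.frame(ID = c("A","B","F","G","B","B","A","G","G","F","A","A","A","B","F"),
                  code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                  class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c("G","F","C","F","B","A","F","C","A","B","A","B","C","A","G"),
                  code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                  class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1),
                  stringsAsFactors = FALSE)

# %in% returns one TRUE/FALSE per row of df2: TRUE when that ID occurs anywhere in df1$ID.
keep <- df2$ID %in% df1$ID

# IDs that occur only in df2; these are the rows that get dropped.
dropped_ids <- setdiff(df2$ID, df1$ID)

# Logical row indexing keeps the TRUE rows and all columns.
filtered <- df2[keep, ]
```

With this data `dropped_ids` is `"C"`, so the three rows with ID C are removed and 12 rows remain. The surviving rows keep their original row names (1, 2, 4, ...), which is why the printed output above skips 3, 8, and 13.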
vwhgwdsa  #2

You can use the intersect function to solve this:

common_ids <- intersect(df1$ID, df2$ID)
df3 <- df2[df2$ID %in% common_ids, ]

   ID code class
1   G    1     2
2   F    2     4
4   F    3     5
5   B    3     2
6   A    1     3
7   F    2     2
9   A    1     1
10  B    1     2
11  A    3     4
12  B    2     5
14  A    1     2
15  G    1     1
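
As a rough check, the sketch below compares the intersect approach with filtering directly against df1$ID. Only the ID columns matter for the comparison, so df1 is reduced to that column here; the names `via_intersect` and `via_in` are illustrative, and `stringsAsFactors = FALSE` is an added assumption for R versions before 4.0.

```r
# ID column of df1 and the full df2, copied from the question.
df1 <- data.frame(ID = c("A","B","F","G","B","B","A","G","G","F","A","A","A","B","F"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c("G","F","C","F","B","A","F","C","A","B","A","B","C","A","G"),
                  code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                  class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1),
                  stringsAsFactors = FALSE)

# intersect() returns the distinct IDs present in both vectors.
common_ids <- intersect(df1$ID, df2$ID)

# Filter df2 against the intersection, and against df1$ID directly.
via_intersect <- df2[df2$ID %in% common_ids, ]
via_in <- df2[df2$ID %in% df1$ID, ]

# Both routes select the same rows.
same_result <- identical(via_intersect, via_in)
```

Every ID in df2 that also occurs in df1 is by definition in the intersection, so the two filters should always agree. The intersect step mainly helps when you want the list of shared IDs as a separate object.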
ldxq2e6h  #3

I'd like to add semi_join to the mix:

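library(tidyverse)  # semi_join() comes from dplyr, which tidyverse attaches
# Keep the rows of df2 whose ID has a match in df1 (the native pipe |> needs R >= 4.1)
df_test <- df2 |> semi_join(df1, by = "ID")
# df3 is the expected data frame from the question
all.equal(df3, df_test)
#> [1] TRUE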
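
One caveat if you run the same all.equal check against a base R subset instead: the subset keeps df2's original row names, and all.equal compares row names along with the data. The sketch below uses base R only and illustrative names (`expected`, `subset_base`, `subset_reset`); `stringsAsFactors = FALSE` is an added assumption for R versions before 4.0.

```r
# ID column of df1, the full df2, and the expected result, all copied from the question.
df1 <- data.frame(ID = c("A","B","F","G","B","B","A","G","G","F","A","A","A","B","F"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c("G","F","C","F","B","A","F","C","A","B","A","B","C","A","G"),
                  code = c(1,2,2,3,3,1,2,2,1,1,3,2,2,1,1),
                  class = c(2,4,5,5,2,3,2,5,1,2,4,5,3,2,1),
                  stringsAsFactors = FALSE)
expected <- data.frame(ID = c("G","F","F","B","A","F","A","B","A","B","A","G"),
                       code = c(1,2,3,3,1,2,1,1,3,2,1,1),
                       class = c(2,4,5,2,3,2,1,2,4,5,2,1),
                       stringsAsFactors = FALSE)

# Base subsetting keeps the original row names (1, 2, 4, ...).
subset_base <- df2[df2$ID %in% df1$ID, ]
matches_before <- isTRUE(all.equal(expected, subset_base))

# Resetting the row names to 1..n makes the two frames comparable.
subset_reset <- subset_base
rownames(subset_reset) <- NULL
matches_after <- isTRUE(all.equal(expected, subset_reset))
```

Here `matches_before` is `FALSE` only because of the row names, and `matches_after` is `TRUE` once they are reset. Passing `check.attributes = FALSE` to all.equal is another way to ignore the difference.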
