How to join on all columns except specified ones in dplyr?

Posted by ttcibm8c on 2023-04-03 in Other

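I have two datasets that share all of their columns, and I want to anti-join them on every column except two.
For example, I would like to do something like this: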

library(dplyr)
df1 <- tibble(x = c("A", "B", "C"), y = c("X", "Y", "Z"), z = c(1, 2, 3),
              a = c(4, 5, 6))

df2 <- tibble(x = c("A", "D", "E"), y = c("X", "W", "R"), z = c(1, 5, 6),
              a = c(4, 7, 8))

df2 %>% anti_join(df1, join_by(-c(z, a)))
#> Error in `join_by()`:
#> ! Expressions must use one of: `==`, `>=`, `>`, `<=`, `<`, `closest()`,
#>   `between()`, `overlaps()`, or `within()`.
#> ℹ Expression 1 is `-c(z, a)`.

#> Backtrace:
#>      ▆
#>   1. ├─df2 %>% anti_join(df1, join_by(-c(z, a)))
#>   2. ├─dplyr::anti_join(., df1, join_by(-c(z, a)))
#>   3. ├─dplyr:::anti_join.data.frame(., df1, join_by(-c(z, a)))
#>   4. │ └─dplyr:::join_filter(...)
#>   5. │   └─dplyr:::is_cross_by(by)
#>   6. │     └─rlang::is_character(x, n = 0L)
#>   7. └─dplyr::join_by(-c(z, a))
#>   8.   └─dplyr:::parse_join_by_expr(exprs[[i]], i, error_call = error_call)
#>   9.     └─dplyr:::stop_invalid_top_expression(expr, i, error_call)
#>  10.       └─rlang::abort(message, call = call)

Created on 2023-03-27 with reprex v2.0.2
So, is there a way to use tidy-select in joins? Or, more specifically, a way to refer to all variables except a few?

Answer by pqwbnv8z

Drop the unneeded columns from df1 with select() instead of trying to specify them in join_by():

library(dplyr)

df2 %>%
  anti_join(select(df1, -c(z, a)))

# Joining with `by = join_by(x, y)`
# # A tibble: 2 × 4
#   x     y         z     a
#   <chr> <chr> <dbl> <dbl>
# 1 D     W         5     7
# 2 E     R         6     8
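A variant that is not part of the answer above: anti_join() also accepts a plain character vector for `by`, so tidy-select can be used to compute the key names first and then pass them in. This is a minimal sketch assuming the same df1 and df2 as in the question; `keys` and `result` are just illustrative variable names.

```r
library(dplyr)

df1 <- tibble(x = c("A", "B", "C"), y = c("X", "Y", "Z"), z = c(1, 2, 3),
              a = c(4, 5, 6))
df2 <- tibble(x = c("A", "D", "E"), y = c("X", "W", "R"), z = c(1, 5, 6),
              a = c(4, 7, 8))

# Use tidy-select to name the key columns: everything except z and a.
keys <- names(select(df1, -c(z, a)))  # c("x", "y")

# `by` takes a character vector, so no join_by() expression is required.
result <- anti_join(df2, df1, by = keys)
result
```

Because `by` is given explicitly, dplyr does not print the "Joining with ..." message, and the rows returned should be the same D/W and E/R rows as in the answer's output.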

对于标准连接,如果要丢弃df2$z$a,请执行相同的操作。否则,使用rename_with()附加后缀:

df2 %>%
  full_join(
    rename_with(df1, \(x) paste0(x, ".df1"), c(z, a))
  )

# Joining with `by = join_by(x, y)`
# # A tibble: 5 × 6
#   x     y         z     a z.df1 a.df1
#   <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 A     X         1     4     1     4
# 2 D     W         5     7    NA    NA
# 3 E     R         6     8    NA    NA
# 4 B     Y        NA    NA     2     5
# 5 C     Z        NA    NA     3     6
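Alternatively, here is a sketch that relies on the join's own `suffix` argument instead of rename_with(). With `by` restricted to x and y, the columns present in both tables (z and a) are disambiguated by the suffixes, and an empty first suffix leaves df2's column names unchanged. This assumes the installed dplyr accepts an empty string in `suffix`, as recent versions do.

```r
library(dplyr)

df1 <- tibble(x = c("A", "B", "C"), y = c("X", "Y", "Z"), z = c(1, 2, 3),
              a = c(4, 5, 6))
df2 <- tibble(x = c("A", "D", "E"), y = c("X", "W", "R"), z = c(1, 5, 6),
              a = c(4, 7, 8))

# Join only on x and y; the duplicated non-key columns from df1 get ".df1",
# while the "" suffix keeps df2's z and a as they are.
result <- full_join(df2, df1, by = c("x", "y"), suffix = c("", ".df1"))
result
```

The resulting columns should be x, y, z, a, z.df1, a.df1, matching the rename_with() version above.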
