How do I pivot and join 3 different tables by one ID column using dplyr?

Asked by q3qa4bjr on 2022-12-25 in Other

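Suppose I want to join these 3 data frames with dplyr. How would I do that? I know I should use some combination of pivoting and joining, but I can't figure out how to do it correctly.
My goal is to get a data frame like this: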

mpg_deciles mean_mpg mean_price production coefficient
1           13.5     12990      Foreign    12990
2           16       10874      Domestic   10874.8571428572

Here is the data:

library(dplyr)

a <- tibble::tribble(
  ~mpg_deciles,        ~mean_mpg,
  1L,             13.5,
  2L,               16,
  3L,            17.75,
  4L,           18.625,
  5L, 19.7142857142857)

b <- tibble::tribble(
  ~coeff_foreign,    ~mpg_deciles, ~mean_p_foreign,  ~foreign,
  12990,             2,            12990,            "Foreign",
  -2147.49999999997, 3,            10842.5,          "Foreign",
  -7180.99999999996, 4,            5809.00000000003, "Foreign",
  -6777.49999999999, 6,            6212.5,           "Foreign",
  -6435.3333333333,  7,            6554.66666666669, "Foreign")

c <- tibble::tribble(
  ~coeff_domestic,   ~mpg_deciles, ~mean_p_domestic, ~foreign,
  10874.8571428572,  1L,           10874.8571428572, "Domestic",
  -3697.73214285716, 2L,           7177.125,         "Domestic",
  -6031.19047619049, 3L,           4843.66666666666, "Domestic",
  -6365.35714285716, 4L,           4509.5,           "Domestic",
  -4650.42857142859, 5L,           6224.42857142857, "Domestic")

Answer #1 (41ik7eoe)

I think you need to preprocess b and c first and then use left_join:

library(dplyr)

a %>% 
  left_join(
    bind_rows(
      b %>% 
        rename(coefficient = coeff_foreign, mean_price = mean_p_foreign, production = foreign),
      c %>%     
        rename(coefficient = coeff_domestic, mean_price = mean_p_domestic, production = foreign)
      ),
    by = "mpg_deciles"
  )

This returns:

# A tibble: 8 x 5
  mpg_deciles mean_mpg coefficient mean_price production
        <dbl>    <dbl>       <dbl>      <dbl> <chr>     
1           1     13.5      10875.     10875. Domestic  
2           2     16        12990      12990  Foreign   
3           2     16        -3698.      7177. Domestic  
4           3     17.8      -2147.     10842. Foreign   
5           3     17.8      -6031.      4844. Domestic  
6           4     18.6      -7181.      5809. Foreign   
7           4     18.6      -6365.      4510. Domestic  
8           5     19.7      -4650.      6224. Domestic

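The preprocessing renames coeff_foreign and coeff_domestic (and likewise mean_p_foreign and mean_p_domestic) to a common column name. When the two data frames are then stacked with bind_rows, all values that share a column name end up in the same column. Without this preprocessing, differently named columns such as coeff_foreign and coeff_domestic would not be merged into one column; instead, two separate columns (coeff_foreign and coeff_domestic) would be created to hold the values, each filled with NA for the rows coming from the other table. In that case, left_join would not give the desired result.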
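To see why the renaming matters, here is a minimal sketch on two small hypothetical tibbles, x and y, standing in for b and c (the values are made up for illustration). bind_rows() matches columns by name, so differently named columns stay separate and are padded with NA.

```r
library(dplyr)

# Hypothetical toy versions of b and c: same meaning, different column names
x <- tibble::tibble(mpg_deciles = c(2, 3), coeff_foreign = c(12990, -2147.5))
y <- tibble::tibble(mpg_deciles = c(1, 2), coeff_domestic = c(10874.9, -3697.7))

# Without renaming: bind_rows() keeps coeff_foreign and coeff_domestic apart,
# filling the cells that come from the other table with NA
unrenamed <- bind_rows(x, y)
print(names(unrenamed))  # mpg_deciles, coeff_foreign, coeff_domestic

# With renaming: both tables contribute to a single "coefficient" column
renamed <- bind_rows(
  rename(x, coefficient = coeff_foreign),
  rename(y, coefficient = coeff_domestic)
)
print(names(renamed))    # mpg_deciles, coefficient
```

Only the renamed version gives one coefficient column that can be joined onto a and read as a single variable.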

Answer #2 (ifsvaxew)

Updated version, thanks to input from @Martin Gal:
We can chain two left_join calls and then reshape the result with pivot_longer:

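library(dplyr)
library(tidyr)  # pivot_longer() comes from tidyr, not dplyr

left_join(a, b, by = 'mpg_deciles') %>%
  left_join(., c, by = 'mpg_deciles') %>%
  # drop the label columns (foreign.x / foreign.y created by the two joins)
  select(-starts_with("foreign")) %>%
  # split e.g. coeff_foreign into value column "coeff" and production "foreign"
  pivot_longer(-c("mpg_deciles", "mean_mpg"),
               names_pattern = "(coeff|mean_p)_(.*)",
               names_to = c(".value", "production"),
               values_drop_na = TRUE)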

This returns:

  mpg_deciles mean_mpg production  coeff mean_p
        <dbl>    <dbl> <chr>       <dbl>  <dbl>
1           1     13.5 domestic   10875. 10875.
2           2     16   foreign    12990  12990 
3           2     16   domestic   -3698.  7177.
4           3     17.8 foreign    -2147. 10842.
5           3     17.8 domestic   -6031.  4844.
6           4     18.6 foreign    -7181.  5809.
7           4     18.6 domestic   -6365.  4510.
8           5     19.7 domestic   -4650.  6224.
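The core of this answer is names_pattern combined with the special ".value" entry in names_to: the first capture group of the regex becomes the name of an output column, and the second group is stored in production. A minimal sketch on a hypothetical one-row tibble wide, which mimics the column names after the joins (values are illustrative):

```r
library(tidyr)

# Hypothetical one-row table using the same column-name scheme as the joined data
wide <- tibble::tibble(
  mpg_deciles     = 2,
  coeff_foreign   = 12990,
  mean_p_foreign  = 12990,
  coeff_domestic  = -3697.7,
  mean_p_domestic = 7177.1
)

# "(coeff|mean_p)_(.*)": group 1 ("coeff" or "mean_p") names the output column
# via ".value"; group 2 ("foreign" or "domestic") goes into `production`
long <- pivot_longer(
  wide,
  -mpg_deciles,
  names_pattern = "(coeff|mean_p)_(.*)",
  names_to = c(".value", "production")
)
print(long)  # two rows: one for foreign, one for domestic
```

In the full pipeline, values_drop_na = TRUE then removes the rows where a decile has neither a coeff nor a mean_p value for that production type, e.g. the foreign row for decile 1.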
