How to merge multiple data frames with different numbers of rows by matching IDs in R

Posted by 1zmg4dgp on 2023-04-18 in Other

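I'm sure the answer to this exists somewhere, but I've been trying to get this code to work and can't seem to make it work for my purposes. I have 7 different data frames, each containing an ID, an age, and a race column. Each data frame comes from a different time interval, so if a respondent only gave responses at times 2 and 5, they will only have a row in datasets 2 and 5. For example, this is what the variables in data frame 1 look like: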

id <- c(1,2,3,4,5)
race <- c("Black", "White", "Asian", "White", "Black")
age <- c(26,24,33,45,65)
one_T1 <- c(1,0,1,1,1)
two_T1 <- c(1,0,1,1,0)
three_T1 <- c(0,0,0,1,1)
df1 <- data.frame(id,race,age,one_T1,two_T1,three_T1)

  id  race age one_T1 two_T1 three_T1
1  1 Black  26      1      1        0
2  2 White  24      0      0        0
3  3 Asian  33      1      1        0
4  4 White  45      1      1        1
5  5 Black  65      1      0        1

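So all the responses of interest are binary-coded, and in each data frame every variable has a suffix indicating the time frame it comes from (though this can obviously be changed). My goal is to get a single data frame in which every ID appears, even IDs that don't have data for every time period. If an ID has no data for a given time period, it should simply have NA for the variables it didn't respond to. Also, age and race should stay the same, so I don't want them duplicated in the merged dataset. So if I combine df1 above with this data frame: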

id <- c(1,2,4,5,6)
race <- c("Black", "White", "White", "Black", "Indigenous")
age <- c(26,24,45,65,21)
one_T2 <- c(1,0,1,1,1)
two_T2 <- c(1,0,1,1,0)
three_T2 <- c(0,0,0,1,1)
df2 <- data.frame(id,race,age,one_T2,two_T2,three_T2)

  id       race age one_T2 two_T2 three_T2
1  1      Black  26      1      1        0
2  2      White  24      0      0        0
3  4      White  45      1      1        0
4  5      Black  65      1      1        1
5  6 Indigenous  21      1      0        1

I'd like the output to look like this:

  id       race age one_T1 two_T1 three_T1 one_T2 two_T2 three_T2
1  1      Black  26      1      1        0      1      1        0
2  2      White  24      0      0        0      0      0        0
3  3      Asian  33      1      1        0     NA     NA       NA
4  4      White  45      1      1        1      1      1        0
5  5      Black  65      1      0        1      1      1        1
6  6 Indigenous  21     NA     NA       NA      1      0        1

I hope this makes sense. Thanks a lot in advance!

r

Source: https://stackoverflow.com/questions/75998618/how-to-merge-multiple-dataframes-with-a-different-number-of-rows-by-matching-the

1 answer

5t7ly7z51 answered:

dplyr::full_join(df1, df2, by = c('id', 'race', 'age'))

  id       race age one_T1 two_T1 three_T1 one_T2 two_T2 three_T2
1  1      Black  26      1      1        0      1      1        0
2  2      White  24      0      0        0      0      0        0
3  3      Asian  33      1      1        0     NA     NA       NA
4  4      White  45      1      1        1      1      1        0
5  5      Black  65      1      0        1      1      1        1
6  6 Indigenous  21     NA     NA       NA      1      0        1
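Why race and age are part of the key: columns that appear in both frames but are not listed in by are kept twice, once from each side, with .x/.y suffixes (the default suffixes for both full_join() and base R's merge()). The sketch below uses base R's merge() so it runs without dplyr. The frames are trimmed to one response column each, and the changed age for id 1 is a made-up value used only to illustrate a caveat: the three-column key only lines up if race and age are recorded identically in every wave.

```r
# Identifying columns from the question, trimmed to one response column each.
df1 <- data.frame(id = c(1, 2, 3, 4, 5),
                  race = c("Black", "White", "Asian", "White", "Black"),
                  age = c(26, 24, 33, 45, 65),
                  one_T1 = c(1, 0, 1, 1, 1))
df2 <- data.frame(id = c(1, 2, 4, 5, 6),
                  race = c("Black", "White", "White", "Black", "Indigenous"),
                  age = c(26, 24, 45, 65, 21),
                  one_T2 = c(1, 0, 1, 1, 1))

# Keyed on id only: race and age are kept from both sides with suffixes.
by_id <- merge(df1, df2, by = "id", all = TRUE)
print(names(by_id))
# id race.x age.x one_T1 race.y age.y one_T2

# Keyed on all three shared columns: a single race and age column.
by_all <- merge(df1, df2, by = c("id", "race", "age"), all = TRUE)
print(names(by_all))
# id race age one_T1 one_T2

# Caveat (hypothetical change): if id 1 were recorded as 27 in wave 2,
# the key would no longer match and id 1 would end up on two rows.
df2_changed <- df2
df2_changed$age[df2_changed$id == 1] <- 27
by_all_changed <- merge(df1, df2_changed, by = c("id", "race", "age"), all = TRUE)
print(nrow(by_all_changed))  # 7 rows instead of 6
```

If ages can drift between waves in the real data, it may be safer to keep race and age in a separate one-row-per-id table and join the response columns by id alone.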

In base R:

merge(df1, df2, by = c('id', 'race', 'age'), all = TRUE)

  id       race age one_T1 two_T1 three_T1 one_T2 two_T2 three_T2
1  1      Black  26      1      1        0      1      1        0
2  2      White  24      0      0        0      0      0        0
3  3      Asian  33      1      1        0     NA     NA       NA
4  4      White  45      1      1        1      1      1        0
5  5      Black  65      1      0        1      1      1        1
6  6 Indigenous  21     NA     NA       NA      1      0        1
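Since the question has seven data frames rather than two, the two-frame merge can be repeated with base R's Reduce(), which folds a function over a list from left to right. A minimal sketch, assuming every frame has the same id, race and age columns and that race and age never change for a given id. df3 is a hypothetical third wave invented only to show more than two frames; with real data the list would hold df1 through df7.

```r
# The two waves from the question, rebuilt so the sketch runs on its own.
df1 <- data.frame(id = c(1, 2, 3, 4, 5),
                  race = c("Black", "White", "Asian", "White", "Black"),
                  age = c(26, 24, 33, 45, 65),
                  one_T1 = c(1, 0, 1, 1, 1),
                  two_T1 = c(1, 0, 1, 1, 0),
                  three_T1 = c(0, 0, 0, 1, 1))
df2 <- data.frame(id = c(1, 2, 4, 5, 6),
                  race = c("Black", "White", "White", "Black", "Indigenous"),
                  age = c(26, 24, 45, 65, 21),
                  one_T2 = c(1, 0, 1, 1, 1),
                  two_T2 = c(1, 0, 1, 1, 0),
                  three_T2 = c(0, 0, 0, 1, 1))
# Hypothetical third wave (made-up values, for illustration only).
df3 <- data.frame(id = c(3, 6),
                  race = c("Asian", "Indigenous"),
                  age = c(33, 21),
                  one_T3 = c(0, 1),
                  two_T3 = c(1, 1),
                  three_T3 = c(0, 0))

waves <- list(df1, df2, df3)  # with real data: list(df1, df2, ..., df7)
key <- c("id", "race", "age")

# Reduce() evaluates merge(merge(df1, df2), df3) and so on; all = TRUE makes
# each step a full outer join, so ids missing from a wave get NA there.
merged <- Reduce(function(x, y) merge(x, y, by = key, all = TRUE), waves)
print(merged)
```

dplyr::full_join(x, y, by = key) can be used in the same Reduce() call instead of merge(). One difference: merge() sorts the result by the key columns by default (sort = TRUE), while full_join() keeps the rows of the first frame first and appends unmatched rows after them.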

Answered 2023-04-18
