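I have two datasets, df and df2, which are heavily simplified versions of two very large and messy data frames.
In the original df, I created a unique id for each person by grouping on belt and weight. I want each person in df2 to receive the same id they have in df. The names must match, and the match must also be grouped by belt and weight. Note that df2 contains some people who are not in df.
The simplified df looks like this: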
belt weight rank id name
1 purple open class 1 55 Tom Cruise
2 black rooster 2 79 Emma Watson
3 blue feather 3 63 John Doe
4 blue feather 4 63 John Doe
5 purple open class 5 55 Tom Cruise
6 brown heavy 6 3 James Bond
7 purple open class 7 55 Tom Cruise
8 purple heavy 8 61 Tom Cruise
9 black open class 9 70 Jane Doe
10 purple heavy 10 61 Tom Cruise
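The second data frame looks like this. A person who is in df2 but not in df should receive NA as their id. Note that the ids must be matched by belt and weight, because some people have different points depending on the weight division they competed in.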
belt2 weight2 rank2 name points
1 purple open class 1 Tom Cruise 100
2 black rooster 2 Emma Watson 30
3 blue feather 3 John Doe 50
4 blue feather 4 John Doe 50
5 purple open class 5 Tom Cruise 100
6 brown heavy 6 James Bond 200
7 black rooster 7 Jon Snow 92
8 purple heavy 8 Tom Cruise 77
9 black open class 9 Jane Doe 88
10 purple heavy 10 Tom Cruise 77
This is what I want df2 to look like:
belt2 weight2 rank2 id name points
1 purple open class 1 55 Tom Cruise 100
2 black rooster 2 79 Emma Watson 30
3 blue feather 3 63 John Doe 50
4 blue feather 4 63 John Doe 50
5 purple open class 5 55 Tom Cruise 100
6 brown heavy 6 3 James Bond 200
7 black rooster 7 NA Jon Snow 92
8 purple heavy 8 61 Tom Cruise 77
9 black open class 9 70 Jane Doe 88
10 purple heavy 10 61 Tom Cruise 77
Basically, I want the id numbers in df2 to match the id numbers in df. Where there is no match, fill in NA.
# df
belt <- c("purple", "black", "blue", "blue", "purple", "brown", "purple", "purple", "black", "purple")
weight <- c("open class", "rooster", "feather", "feather", "open class", "heavy", "open class", "heavy", "open class", "heavy")
rank <- 1:10
id <- c(55, 79, 63, 63, 55, 3, 55, 61, 70, 61)
names <- c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise", "James Bond", "Tom Cruise", "Tom Cruise", "Jane Doe", "Tom Cruise")
(df <- data.frame(belt, weight, rank, id, name = names))
#df2
belt2 <- c("purple", "black", "blue", "blue", "purple", "brown", "black", "purple", "black", "purple")
weight2 <- c("open class", "rooster", "feather", "feather", "open class", "heavy", "rooster", "heavy", "open class", "heavy")
rank2 <- 1:10
names2 <- c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise", "James Bond", "Jon Snow", "Tom Cruise", "Jane Doe", "Tom Cruise")
points <- c(100, 30, 50, 50, 100, 200, 92, 77, 88, 77)
(df2 <- data.frame(belt2, weight2, rank2, name = names2, points))
2 answers
roejwanj #1
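This can be solved with a right join, followed by removing the duplicate rows it produces. I will use the base function merge.
Created on 2023-02-08 with reprex v2.0.2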
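A minimal sketch of the base R approach described above, assuming the df and df2 from the question (redefined here so the block runs on its own). The column selection, the renaming, and the final reordering are illustrative choices, not necessarily what the answer originally used.

```r
# Data from the question, redefined so the sketch is self-contained.
df <- data.frame(
  belt   = c("purple", "black", "blue", "blue", "purple", "brown", "purple", "purple", "black", "purple"),
  weight = c("open class", "rooster", "feather", "feather", "open class", "heavy", "open class", "heavy", "open class", "heavy"),
  rank   = 1:10,
  id     = c(55, 79, 63, 63, 55, 3, 55, 61, 70, 61),
  name   = c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise", "James Bond", "Tom Cruise", "Tom Cruise", "Jane Doe", "Tom Cruise")
)
df2 <- data.frame(
  belt2   = c("purple", "black", "blue", "blue", "purple", "brown", "black", "purple", "black", "purple"),
  weight2 = c("open class", "rooster", "feather", "feather", "open class", "heavy", "rooster", "heavy", "open class", "heavy"),
  rank2   = 1:10,
  name    = c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise", "James Bond", "Jon Snow", "Tom Cruise", "Jane Doe", "Tom Cruise"),
  points  = c(100, 30, 50, 50, 100, 200, 92, 77, 88, 77)
)

# Right join: all.y = TRUE keeps every row of df2 and leaves id as NA
# when no (belt, weight, name) combination in df matches.
merged <- merge(
  df[, c("belt", "weight", "name", "id")], df2,
  by.x = c("belt", "weight", "name"),
  by.y = c("belt2", "weight2", "name"),
  all.y = TRUE
)

# People who appear several times in df produce repeated rows; drop them.
res <- unique(merged)

# merge() names the key columns after df and sorts by key, so restore
# df2's column names and its original row order.
names(res)[names(res) == "belt"]   <- "belt2"
names(res)[names(res) == "weight"] <- "weight2"
res <- res[order(res$rank2), c("belt2", "weight2", "rank2", "id", "name", "points")]
rownames(res) <- NULL
res
```

Dropping df's rank column before the merge matters here: if it were kept, the repeated rows would differ in rank and unique() would not collapse them.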
The same right join can be written with dplyr.
Created on 2023-02-08 with reprex v2.0.2
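A hedged sketch of what a dplyr version might look like, again assuming the question's df and df2. It differs slightly from the description in this answer: the lookup is deduplicated with distinct() before the join rather than after, which also avoids the many-to-many warning that recent dplyr versions emit.

```r
library(dplyr)

# Data from the question, redefined so the sketch is self-contained.
df <- data.frame(
  belt   = c("purple", "black", "blue", "blue", "purple", "brown", "purple", "purple", "black", "purple"),
  weight = c("open class", "rooster", "feather", "feather", "open class", "heavy", "open class", "heavy", "open class", "heavy"),
  rank   = 1:10,
  id     = c(55, 79, 63, 63, 55, 3, 55, 61, 70, 61),
  name   = c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise", "James Bond", "Tom Cruise", "Tom Cruise", "Jane Doe", "Tom Cruise")
)
df2 <- data.frame(
  belt2   = c("purple", "black", "blue", "blue", "purple", "brown", "black", "purple", "black", "purple"),
  weight2 = c("open class", "rooster", "feather", "feather", "open class", "heavy", "rooster", "heavy", "open class", "heavy"),
  rank2   = 1:10,
  name    = c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise", "James Bond", "Jon Snow", "Tom Cruise", "Jane Doe", "Tom Cruise"),
  points  = c(100, 30, 50, 50, 100, 200, 92, 77, 88, 77)
)

res <- df %>%
  # One row per person within a belt/weight division.
  distinct(belt, weight, name, id) %>%
  # Right join keeps every row of df2; unmatched rows get id = NA.
  right_join(df2, by = c("belt" = "belt2", "weight" = "weight2", "name" = "name")) %>%
  # Key columns take df's names after the join; switch back to df2's.
  rename(belt2 = belt, weight2 = weight) %>%
  arrange(rank2) %>%
  select(belt2, weight2, rank2, id, name, points)

res
```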
0sgqnhkj #2
You can accomplish this with a left join between the two data frames.
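As an illustration of the left join idea, here is a minimal sketch assuming dplyr and the question's df and df2. The helper name lookup is hypothetical; it holds one row per belt, weight, and name combination so that the join does not multiply rows in df2.

```r
library(dplyr)

# Data from the question, redefined so the sketch is self-contained.
df <- data.frame(
  belt   = c("purple", "black", "blue", "blue", "purple", "brown", "purple", "purple", "black", "purple"),
  weight = c("open class", "rooster", "feather", "feather", "open class", "heavy", "open class", "heavy", "open class", "heavy"),
  rank   = 1:10,
  id     = c(55, 79, 63, 63, 55, 3, 55, 61, 70, 61),
  name   = c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise", "James Bond", "Tom Cruise", "Tom Cruise", "Jane Doe", "Tom Cruise")
)
df2 <- data.frame(
  belt2   = c("purple", "black", "blue", "blue", "purple", "brown", "black", "purple", "black", "purple"),
  weight2 = c("open class", "rooster", "feather", "feather", "open class", "heavy", "rooster", "heavy", "open class", "heavy"),
  rank2   = 1:10,
  name    = c("Tom Cruise", "Emma Watson", "John Doe", "John Doe", "Tom Cruise", "James Bond", "Jon Snow", "Tom Cruise", "Jane Doe", "Tom Cruise"),
  points  = c(100, 30, 50, 50, 100, 200, 92, 77, 88, 77)
)

# Hypothetical helper: one id per (belt, weight, name) combination in df.
lookup <- distinct(df, belt, weight, name, id)

# Left join from df2 keeps all of its rows in their original order;
# people missing from df end up with id = NA.
res <- df2 %>%
  left_join(lookup, by = c("belt2" = "belt", "weight2" = "weight", "name" = "name")) %>%
  select(belt2, weight2, rank2, id, name, points)

res
```

Starting from df2 means no renaming or re-sorting is needed afterwards, which is the main practical difference from the right join variants.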