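I want to build a grouping variable. The two columns ID and ADDRESS should always be taken into account, and then either NAME_1 or NAME_2 should be used to form the groups (depending on which one has identical entries). Here is a simple example: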
library(dplyr)
# Sample input data frame
df <- data.frame(
ID = c("1234", "1234", "1234", "1234", "1234", "1234", "2345"),
ADDRESS = c("Weg 1", "Weg 1", "Weg 1", "Weg 1", "Weg 1", "Weg 1", "Weg 1234"),
NAME_1 = c("Müller", "Peter", "Hirn", "Hirn", "Test", "Test", "Müller"),
NAME_2 = c("Meier", "Meier", "Mensch", "Maler", "Hallo", "Velo", "Meier")
)
# Create a grouping variable based on conditions
# (pseudocode: EITHER() is not a real function, it only shows the intent)
df_grouped <- df %>%
group_by(ID, ADDRESS, EITHER(NAME_1, NAME_2)) %>%
mutate(GRP = ...)
# Desired output
df <- data.frame(
ID = c("1234", "1234", "1234", "1234", "1234", "1234", "2345"),
ADDRESS = c("Weg 1", "Weg 1", "Weg 1", "Weg 1", "Weg 1", "Weg 1", "Weg 1234"),
NAME_1 = c("Müller", "Peter", "Hirn", "Hirn", "Test", "Test", "Müller"),
NAME_2 = c("Meier", "Meier", "Mensch", "Maler", "Hallo", "Velo", "Meier"),
GRP = c(1, 1, 2, 2, 3, 3, 4)
)
Is there a way to do this?
I tried some nested grouping but have not found a solution yet.
2 answers
9rbhqvlz1#
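If you allow the groups in NAME_1 and NAME_2 to overlap, you can construct an undirected graph and extract its connected components:
Created on 2023-10-07 with reprex v2.0.2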
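A minimal sketch of the graph idea, assuming the igraph package is installed. This is not necessarily the code originally posted with this answer. Each row is linked to one node for its NAME_1 value and one node for its NAME_2 value (both scoped to ID and ADDRESS), so rows that share either name end up in the same connected component. The node labels, the "|" separator, and the choice to treat NAME_1 and NAME_2 as separate namespaces are assumptions made for illustration.

```r
library(igraph)

# Sample data from the question
df <- data.frame(
  ID = c("1234", "1234", "1234", "1234", "1234", "1234", "2345"),
  ADDRESS = c("Weg 1", "Weg 1", "Weg 1", "Weg 1", "Weg 1", "Weg 1", "Weg 1234"),
  NAME_1 = c("Müller", "Peter", "Hirn", "Hirn", "Test", "Test", "Müller"),
  NAME_2 = c("Meier", "Meier", "Mensch", "Maler", "Hallo", "Velo", "Meier")
)

# One node per row, plus one node per (ID, ADDRESS, name) combination.
# Assumption: "|" does not occur in the data, so the pasted keys are unambiguous.
row_node <- paste0("row_", seq_len(nrow(df)))
key_1 <- paste("N1", df$ID, df$ADDRESS, df$NAME_1, sep = "|")
key_2 <- paste("N2", df$ID, df$ADDRESS, df$NAME_2, sep = "|")

# Connect every row to both of its name nodes
edges <- data.frame(from = c(row_node, row_node), to = c(key_1, key_2))
g <- graph_from_data_frame(edges, directed = FALSE)

# Component membership, looked up by vertex name
memb <- setNames(components(g)$membership, V(g)$name)
raw <- unname(memb[row_node])

# Renumber the components in order of first appearance
df$GRP <- match(raw, unique(raw))
df
```

Because the groups come from connected components, chains are merged too: if row A shares NAME_1 with row B, and row B shares NAME_2 with row C, all three get the same GRP.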
In the simple case where NAME_1 and NAME_2 do not overlap, you can pick the larger group for each observation:
Created on 2023-10-07 with reprex v2.0.2
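One way this could look with dplyr, as a sketch rather than the answer's original code: count how often each NAME_1 and each NAME_2 occurs within ID and ADDRESS, then group each row by whichever name has the larger count. The helper columns n1, n2, and key are hypothetical names introduced here, and ties are resolved in favour of NAME_1 by assumption.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID = c("1234", "1234", "1234", "1234", "1234", "1234", "2345"),
  ADDRESS = c("Weg 1", "Weg 1", "Weg 1", "Weg 1", "Weg 1", "Weg 1", "Weg 1234"),
  NAME_1 = c("Müller", "Peter", "Hirn", "Hirn", "Test", "Test", "Müller"),
  NAME_2 = c("Meier", "Meier", "Mensch", "Maler", "Hallo", "Velo", "Meier")
)

res <- df %>%
  # Size of the group each row would fall into under NAME_1 and under NAME_2
  add_count(ID, ADDRESS, NAME_1, name = "n1") %>%
  add_count(ID, ADDRESS, NAME_2, name = "n2") %>%
  mutate(
    # Keep the name that forms the larger group (ties go to NAME_1)
    key = if_else(n1 >= n2,
                  paste("N1", NAME_1, sep = "|"),
                  paste("N2", NAME_2, sep = "|")),
    key = paste(ID, ADDRESS, key, sep = "|"),
    # Number the groups in order of first appearance
    GRP = match(key, unique(key))
  ) %>%
  select(-n1, -n2, -key)

res
```

This only gives sensible groups when no row belongs to a multi-row group under both names at once; otherwise the graph approach is the safer choice.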
brgchamk2#
Another tidyverse option might be:
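The code for this answer is not shown here, so the following is only a guess at what a tidyverse-only approach could look like, not the answerer's method. It starts with one label per row and repeatedly replaces each label with the smallest label in its NAME_1 group and then in its NAME_2 group (within ID and ADDRESS) until nothing changes. The variable names res and new are hypothetical.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID = c("1234", "1234", "1234", "1234", "1234", "1234", "2345"),
  ADDRESS = c("Weg 1", "Weg 1", "Weg 1", "Weg 1", "Weg 1", "Weg 1", "Weg 1234"),
  NAME_1 = c("Müller", "Peter", "Hirn", "Hirn", "Test", "Test", "Müller"),
  NAME_2 = c("Meier", "Meier", "Mensch", "Maler", "Hallo", "Velo", "Meier")
)

# Start with a unique label per row
res <- df %>% mutate(GRP = row_number())

repeat {
  new <- res %>%
    # Rows sharing NAME_1 take the smallest label among them
    group_by(ID, ADDRESS, NAME_1) %>%
    mutate(GRP = min(GRP)) %>%
    # Then rows sharing NAME_2 do the same
    group_by(ID, ADDRESS, NAME_2) %>%
    mutate(GRP = min(GRP)) %>%
    ungroup()
  # Labels only ever decrease, so the loop stops once a pass changes nothing
  if (identical(new$GRP, res$GRP)) break
  res <- new
}

# Renumber the labels consecutively in order of first appearance
res <- res %>% mutate(GRP = match(GRP, unique(GRP)))
res
```

Like the graph version, this also merges overlapping groups, at the cost of several passes over the data when the chains are long.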