Combining two character columns into one with group_by in dplyr

umuewwlo, posted 2023-04-03 in Other

I have a data frame like the one below, and I want to use dplyr's group_by function to combine Gender and Income into a single column.

library(dplyr)  # provides %>% and re-exports tribble()

df1 <- tribble(
  ~Country, ~Gender, ~var1, ~var2, ~var3, ~Income,
  "Bangladesh", "F", 2.5, 3, 1.5, "LM",
  "Bangladesh", "M", 4.5, 4.3, 2.7, "LM",
  "Laos", "F", 2.7, 3.2, 6.5, "LM", 
  "Laos", "M", 3.5, 5.1, 8.2, "LM", 
  "Ghana", "F", 8.5, 5, 7.5, "LM",
  "Ghana", "M", 4, 6.7, 1.3, "LM",
  "China", "F", 4.3, 6.1, 2.5, "UM",
  "China", "M", 6.2, 2.8, 6.8, "UM",
)

I can use group_by to combine two numeric columns, like this:

df1 %>% 
  group_by(Country, subgroup = var1 + var2) %>%
  summarise()

But I can't do the same with character columns:

df1 %>% 
  group_by(Country, subgroup = Gender + Income) %>%
  summarise()

#Error: ! non-numeric argument to binary operator

What I want after grouping looks like this:

df2 <- tribble(
  ~Country, ~subgroup, 
  "Bangladesh", "F", 
  "Bangladesh", "M", 
  "Laos", "F",  
  "Laos", "M", 
  "Ghana", "F", 
  "Ghana", "M", 
  "China", "F", 
  "China", "M",
  "Bangladesh", "LM", 
  "Bangladesh", "LM", 
  "Laos", "LM",  
  "Laos", "LM", 
  "Ghana", "LM", 
  "Ghana", "LM", 
  "China", "UM", 
  "China", "UM",
)

Answer #1 (a7qyws3x)

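The output you want is not the sum of two columns but a reshape from "wide" to "long". You can stack the columns with mapply() and c(), or use tidyr::pivot_longer() (the more popular choice), to get the output you want:
Base R: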

mapply(c, 
       df1[c("Country", "Gender")],
       df1[c("Country", "Income")])

      Country      Gender
 [1,] "Bangladesh" "F"   
 [2,] "Bangladesh" "M"   
 [3,] "Laos"       "F"   
 [4,] "Laos"       "M"   
 [5,] "Ghana"      "F"   
 [6,] "Ghana"      "M"   
 [7,] "China"      "F"   
 [8,] "China"      "M"   
 [9,] "Bangladesh" "LM"  
[10,] "Bangladesh" "LM"  
[11,] "Laos"       "LM"  
[12,] "Laos"       "LM"  
[13,] "Ghana"      "LM"  
[14,] "Ghana"      "LM"  
[15,] "China"      "UM"  
[16,] "China"      "UM"
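Note that mapply() returns a character matrix here, and its second column is still called Gender. If a data frame with a subgroup column is needed, one possible follow-up is sketched below. It uses a trimmed, hypothetical copy of df1 (two countries, only the relevant columns) so the snippet runs on its own; the names stacked and stacked_df are made up for illustration.

```r
# Trimmed copy of df1 from the question (illustrative subset only).
df1 <- data.frame(
  Country = c("Laos", "Laos", "China", "China"),
  Gender  = c("F", "M", "F", "M"),
  Income  = c("LM", "LM", "UM", "UM"),
  stringsAsFactors = FALSE
)

# mapply(c, ...) walks both column lists in parallel and concatenates each
# pair; the result is a character matrix named after the first list.
stacked <- mapply(c,
                  df1[c("Country", "Gender")],
                  df1[c("Country", "Income")])

# Convert the matrix to a data frame and rename the second column.
stacked_df <- as.data.frame(stacked, stringsAsFactors = FALSE)
names(stacked_df)[2] <- "subgroup"
stacked_df
```

The Gender rows come first and the Income rows follow, which matches the row order of df2 in the question.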

dplyr / tidyr:

library(dplyr)
library(tidyr)
df1 %>%
  pivot_longer(c(Gender, Income), values_to = "subgroup") %>%
  select(Country, subgroup)

Output (rows grouped by country):

   Country    subgroup
   <chr>      <chr>   
 1 Bangladesh F       
 2 Bangladesh LM      
 3 Bangladesh M       
 4 Bangladesh LM      
 5 Laos       F       
 6 Laos       LM      
 7 Laos       M       
 8 Laos       LM      
 9 Ghana      F       
10 Ghana      LM      
11 Ghana      M       
12 Ghana      LM      
13 China      F       
14 China      UM      
15 China      M       
16 China      UM
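If the exact row order of df2 matters (all Gender values first, then all Income values), one option is to keep the source column name via names_to and sort on it before dropping it. This is only a sketch on a trimmed, hypothetical copy of df1; the column name source and the object res are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)

# Trimmed copy of df1 from the question (illustrative subset only).
df1 <- tribble(
  ~Country, ~Gender, ~Income,
  "Laos",   "F",     "LM",
  "Laos",   "M",     "LM",
  "China",  "F",     "UM",
  "China",  "M",     "UM"
)

res <- df1 %>%
  # Record which original column each value came from.
  pivot_longer(c(Gender, Income),
               names_to = "source", values_to = "subgroup") %>%
  # Put every Gender row before every Income row; rows that tie keep
  # their existing relative order.
  arrange(match(source, c("Gender", "Income"))) %>%
  select(Country, subgroup)

res
```

Using match() with an explicit vector spells out the intended order instead of relying on how the column names happen to sort.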

Answer #2 (bmp9r5qi)

If the row order doesn't matter, you can use reframe():

dplyr::reframe(df1,  subgroup = c(Gender, Income), .by = Country)
#> # A tibble: 16 × 2
#>    Country    subgroup
#>    <chr>      <chr>   
#>  1 Bangladesh F       
#>  2 Bangladesh M       
#>  3 Bangladesh LM      
#>  4 Bangladesh LM      
#>  5 Laos       F       
#>  6 Laos       M       
#>  7 Laos       LM      
#>  8 Laos       LM      
#>  9 Ghana      F       
#> 10 Ghana      M       
#> 11 Ghana      LM      
#> 12 Ghana      LM      
#> 13 China      F       
#> 14 China      M       
#> 15 China      UM      
#> 16 China      UM
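Since the question asks specifically about group_by, here is a sketch of what I believe is the equivalent spelling with an explicit group_by() step. reframe() was added in dplyr 1.1.0, so this assumes at least that version. The data is a trimmed, hypothetical copy of df1 and the name res is made up for illustration.

```r
library(dplyr)

# Trimmed copy of df1 from the question (illustrative subset only).
df1 <- tribble(
  ~Country, ~Gender, ~Income,
  "Laos",   "F",     "LM",
  "Laos",   "M",     "LM",
  "China",  "F",     "UM",
  "China",  "M",     "UM"
)

res <- df1 %>%
  group_by(Country) %>%
  # Within each country, stack the Gender values on top of the Income values.
  reframe(subgroup = c(Gender, Income))

res
```

One difference to be aware of: group_by() sorts the groups by key, so China comes before Laos in this result, whereas .by = Country keeps the groups in order of first appearance, as in the output above. In both cases reframe() returns an ungrouped tibble.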
