How to summarise values in group_by in R using quasiquotation?

Posted by wnavrhmk on 2023-05-04 in Other


Suppose I have a data table:

library(data.table)
library(dplyr)

df_demo <- data.table(
    year = c(2021, 2022, 2022, 2023),
    x = c(1, 2, 3, 4),
    y = c(5, 6, 7, 8))

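I would like to group by "year" and take the mean of each of the other columns, for any number of columns with any names.
Here is how to do it with a "fixed" set of column names: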

df_demo %>% 
    group_by(year) %>% 
    summarise(
      x = mean(x),
      y = mean(y)) %>% 
    ungroup()

Here is how I would like to do it using R quasiquotation:

# List of columns
  list_columns <- c("x", "y")
  
  # Replacement
  for(column_i in list_columns) {        

    column_i <- sym(column_i)

    df_demo %>% 
      group_by(year) %>% 
      summarise(
        !!column_i = mean(!!column_i)) %>% 
      ungroup()
  }

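But this does not work because of the left-hand side "!!column_i", which I use because I want to keep the column names unchanged.
Any help is welcome!
Thanks!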

Source: https://stackoverflow.com/questions/76147000/how-summarise-values-in-group-by-in-r-by-using-quasiquotation

3 answers


gorkyyrv1#

If you use quasiquotation on the left-hand side of an assignment, you need to use the walrus operator := instead of =:

for(column_i in list_columns) {        
  
  column_i <- sym(column_i)
  
  df_demo %>% 
    group_by(year) %>% 
    summarise(
      !!column_i := mean(!!column_i)) %>% 
    ungroup() %>%
    print()
}
#> # A tibble: 3 x 2
#>    year     x
#>   <dbl> <dbl>
#> 1  2021   1  
#> 2  2022   2.5
#> 3  2023   4  
#> # A tibble: 3 x 2
#>    year     y
#>   <dbl> <dbl>
#> 1  2021   5  
#> 2  2022   6.5
#> 3  2023   8
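
The loop above yields one tibble per column. As a minimal sketch, assuming a single result holding all the means is wanted, the same unquoting idea can build a named list of expressions and splice it into one summarise() call with !!!. The name mean_exprs is illustrative and not part of the original answer, and tibble() stands in for data.table() only to keep the example self-contained.

```r
library(dplyr)
library(rlang)

# Same demo data as in the question (tibble used here as an assumption)
df_demo <- tibble(
  year = c(2021, 2022, 2022, 2023),
  x = c(1, 2, 3, 4),
  y = c(5, 6, 7, 8))

list_columns <- c("x", "y")

# Build one expression per column: mean(x), mean(y)
mean_exprs <- lapply(syms(list_columns), function(col) expr(mean(!!col)))
# The list names become the output column names
names(mean_exprs) <- list_columns

# Splice all expressions into a single summarise() call
result <- df_demo %>%
  group_by(year) %>%
  summarise(!!!mean_exprs) %>%
  ungroup()

result
```

Because the list is named, no := is needed here: the names of the spliced list play the role of the left-hand side.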

Answered 2023-05-04

dfty9e192#

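In my view, there are several ways to solve this problem. The idea is to use variable column names inside summarise; the challenge is how to handle the naming, which is normally written as summarise(name_of_col=mean(column_name)). Suppose we have the same dataset as the one you defined: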

library(data.table)
library(dplyr)
df_demo <- data.table(
  year = c(2021, 2022, 2022, 2023),
  x = c(1, 2, 3, 4),
  y = c(5, 6, 7, 8))

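The idea is simple. Give the summarised column a fixed name inside summarise (m in this example), then use rename() to rename it after the original column. There are two options for unquoting the variable, as follows: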

1. Using `!!(column_i)`

# List of columns
list_columns <- c("x", "y")
# define an empty list
df_mean <- list()
# Loop through the columns using !!(column_i)
for (column_i in list_columns) {
  column_i <- sym(column_i)
  df_mean[[column_i]] <- df_demo %>%
    group_by(year) %>% 
    summarise(m=mean(!!enquo(column_i))) %>% 
    rename(!!(column_i) := m)  
}
df_mean
$x
# A tibble: 3 × 2
   year     x
  <dbl> <dbl>
1  2021   1  
2  2022   2.5
3  2023   4  

$y
# A tibble: 3 × 2
   year     y
  <dbl> <dbl>
1  2021   5  
2  2022   6.5
3  2023   8

2. Using `!!enquo(column_i)`

# List of columns
list_columns <- c("x", "y")
# define an empty list
df_mean <- list()
# Loop through the columns using !!enquo(column_i)
for (column_i in list_columns) {
  column_i <- sym(column_i)
  df_mean[[column_i]] <- df_demo %>%
    group_by(year) %>% 
    summarise(m=mean(!!enquo(column_i))) %>% 
    rename(!!enquo(column_i) := m)  
}
df_mean
$x
# A tibble: 3 × 2
   year     x
  <dbl> <dbl>
1  2021   1  
2  2022   2.5
3  2023   4  

$y
# A tibble: 3 × 2
   year     y
  <dbl> <dbl>
1  2021   5  
2  2022   6.5
3  2023   8

Alternatively, you can simply combine the results with left_join, as follows (here the year column is the column shared by the different outputs):

# start from a tibble holding the unique years
df_bind <- tibble(year=unique(df_demo$year))
# Loop through the columns
for (column_i in list_columns) {
  column_i <- sym(column_i)
  df_mean <- df_demo %>%
    group_by(year) %>% 
    summarise(m=mean(!!enquo(column_i))) %>% 
    rename(!!(column_i) := m)  
  # bind the results
  df_bind <- left_join(df_bind,df_mean,by='year')
}
df_bind
# A tibble: 3 × 3
   year     x     y
  <dbl> <dbl> <dbl>
1  2021   1     5  
2  2022   2.5   6.5
3  2023   4     8
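
The loop-and-join pattern above can also be written without a loop. The following is only a sketch under a few assumptions: mean_by_year is a hypothetical helper name, tibble() replaces data.table() to keep the example self-contained, and base R's lapply() and Reduce() do the iteration and the repeated left_join.

```r
library(dplyr)

# Same demo data as above (tibble used here as an assumption)
df_demo <- tibble(
  year = c(2021, 2022, 2022, 2023),
  x = c(1, 2, 3, 4),
  y = c(5, 6, 7, 8))

list_columns <- c("x", "y")

# Hypothetical helper: mean of one column per year, keeping the column name
mean_by_year <- function(df, col) {
  col_sym <- sym(col)
  df %>%
    group_by(year) %>%
    summarise(!!col_sym := mean(!!col_sym), .groups = "drop")
}

# One summary tibble per column
per_column <- lapply(list_columns, mean_by_year, df = df_demo)

# Join the per-column summaries on the shared year column
df_bind <- Reduce(function(a, b) left_join(a, b, by = "year"), per_column)

df_bind
```

Wrapping the summarise step in a function also removes the need for the intermediate name m and the rename() call, since := sets the final name directly.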

I hope this helps.

Answered 2023-05-04

k75qkfdt3#

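If you have the column names as strings, the currently recommended approach is to use the .data pronoun. If you want to name columns dynamically, you need the := operator together with glue-style syntax. So you would want: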

for(column_i in list_columns) {        
  
  df_demo %>% 
    group_by(year) %>% 
    summarise(
      "{column_i}" := mean(.data[[column_i]])) %>% 
    ungroup() %>%
    print()  # results are not auto-printed inside a for loop
}

Be sure to check out the "Programming with dplyr" guide, which discusses these topics.
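
For the original goal of averaging an arbitrary set of columns by year, a loop may not be needed at all. The sketch below uses a technique that none of the answers above use: dplyr's across() combined with all_of(), which selects columns from a character vector and keeps their names unchanged. It assumes a dplyr version that provides across() (1.0.0 or later), and it uses tibble() in place of data.table() to stay self-contained.

```r
library(dplyr)

# Same demo data as in the question (tibble used here as an assumption)
df_demo <- tibble(
  year = c(2021, 2022, 2022, 2023),
  x = c(1, 2, 3, 4),
  y = c(5, 6, 7, 8))

list_columns <- c("x", "y")

# Apply mean() to every column named in list_columns, per year
df_means <- df_demo %>%
  group_by(year) %>%
  summarise(across(all_of(list_columns), mean), .groups = "drop")

df_means
```

If the output columns should be renamed, across() also accepts a .names argument with glue-style placeholders such as "mean_{.col}".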

Answered 2023-05-04
