Nesting custom functions with dplyr

qvsjd97n  posted on 2023-03-27 in Other

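I am trying to build nested custom functions to produce crosstab output. Many crosstab functions only allow calculations within the table, whereas I need to divide each value in the table by a value that comes from different logic. Because I need many crosstabs, I want to build one function that combines the crosstab with the division of each value by the external data.
Here is the initial data: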

Key <- c("Yellow", "Yellow", "Red", "Blue", "Yellow", "Red", "Blue", "Yellow", "Yellow", "Red", "Blue", "Yellow", "Red", "Blue", "Blue")
Speed <- c("Slow", "Fast", "Mid", "Fast", "Mid", "Slow", "Slow", "Mid", "Slow", "Fast", "Mid", "Fast", "Mid", "Fast", "Mid")
ID <- factor(c(1, 1, 1, 2, 2, 3, 3, 4, 5, 6, 6, 8, 9, 9, 10))
init_dt <- data.frame(ID, Key, Speed)

And the first function, a crosstab with simple frequencies:

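library(dplyr)  # %>%, group_by(), tally(), sym()
library(tidyr)  # spread()

crosstab <- function(dt, col_y, col_x){
  # turn the column names passed as strings into symbols
  col_y = sym(col_y)
  col_x = sym(col_x)
  dt %>% group_by(!!col_y, !!col_x) %>% 
    tally() %>%
    spread(!!col_x, n)
}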

When I apply this function to the initial data, I get the following result:

crosstab_key <- crosstab(init_dt, "Key", 'Speed')
crosstab_key

# A tibble: 3 × 4
# Groups:   Key [3]
  Key     Fast   Mid  Slow
  <chr>  <int> <int> <int>
1 Blue       2     2     1
2 Red        1     2     1
3 Yellow     2     2     2

After that, I compute the divisors from the same initial data, but with a separate function:

div <- function(dt, id, col_y, col_x) {
  col_y = sym(col_y)
  id = sym(id)
  col_x = sym(col_x)
  # unquote the symbols with !! so select() receives column names
  dt %>% select(!!id, !!col_y, !!col_x) %>%
    group_by(!!col_x) %>% 
    distinct(!!id) %>%
    summarise(count = n()) 
}

When I apply this function, I get the following result:

div_speed <- div(init_dt, "ID", "Key", "Speed")
div_speed

# A tibble: 3 × 2
  Speed count
  <chr> <int>
1 Fast      5
2 Mid       6
3 Slow      4

I want to divide each value in the crosstab crosstab_key by the matching value in div_speed. I can combine the functions like this:

crosstab(init_dt, "Key", 'Speed') %>% rowwise() %>% 
   mutate(across(everything())/div(init_dt, "ID", "Key", "Speed")$count)

# A tibble: 3 × 4
# Rowwise:  Key
  Key     Fast   Mid  Slow
  <chr>  <dbl> <dbl> <dbl>
1 Blue     0.4 0.333  0.25
2 Red      0.2 0.333  0.25
3 Yellow   0.4 0.333  0.5

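But I would like to avoid calling the div function separately, so my question is how to combine the two functions (as you can see, the arguments are the same: dt, id, col_y, col_x, because they come from the same initial data frame), so that I do not have to create a div_ object every time I need to divide a crosstab by some values. On top of that, I also need to round each value to four decimal places, e.g. mutate_at(2:4, funs(round(., 4))).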


vh0rcniy1#

With Tidyverse

You can reduce the two functions to a single one.

library(dplyr) # you need version >= 1.1 !!
library(tidyr)

Key <- c("Yellow", "Yellow", "Red", "Blue", "Yellow", "Red", "Blue", "Yellow", "Yellow", "Red", "Blue", "Yellow", "Red", "Blue", "Blue")
Speed <- c("Slow", "Fast", "Mid", "Fast", "Mid", "Slow", "Slow", "Mid", "Slow", "Fast", "Mid", "Fast", "Mid", "Fast", "Mid")
ID <- factor(c(seq(1:15)))
init_dt <- data.frame(ID, Key, Speed)

# set the order you want in advance!
init_dt$Key   <- factor(init_dt$Key, c("Blue", "Red", "Yellow"), ordered = TRUE)
init_dt$Speed <- factor(init_dt$Speed, c("Fast", "Mid", "Slow"), ordered = TRUE)

crosstab <- function(dt, id, col_y, col_x){
  
  dt |> 
    mutate(count = n_distinct(!!sym(id)), .by = all_of(col_x)) |> 
    pivot_wider(id_cols = all_of(col_y),
                names_from = all_of(col_x),
                names_sort = TRUE,
                values_from = count,
                values_fn = ~ sum(1/.)) |> 
    arrange(!!sym(col_y))
  
}

crosstab(init_dt, "ID", "Key", "Speed")
#> # A tibble: 3 × 4
#>   Key     Fast   Mid  Slow
#>   <ord>  <dbl> <dbl> <dbl>
#> 1 Blue     0.4 0.333  0.25
#> 2 Red      0.2 0.333  0.25
#> 3 Yellow   0.4 0.333  0.5

Created on 2023-03-23 with reprex v2.0.2
Following your steps, I first computed the number of distinct ids for each col_x.

mutate(count = n_distinct(!!sym(id)), .by = all_of(col_x)) |>
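To see what this first step produces, here is a minimal sketch on a made-up four-row data frame (`toy` and its values are assumptions for illustration, not the question's data). Every row receives the number of distinct IDs found in its Speed level.

```r
library(dplyr)  # assumes dplyr >= 1.1 for the .by argument

# Hypothetical toy data: ID 1 appears twice within "Fast"
toy <- data.frame(
  ID    = c(1, 1, 2, 3),
  Speed = c("Fast", "Fast", "Fast", "Slow")
)

# Attach to each row the number of distinct IDs in its Speed level
toy_counts <- toy |>
  mutate(count = n_distinct(ID), .by = Speed)

toy_counts$count
```

Here the three Fast rows share two distinct IDs, so each of them gets a count of 2, and the single Slow row gets 1.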

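Then you want to count the ids for each combination of col_x and col_y and divide that by the number we computed before.
We could do this in two steps: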

summarise(subcount = sum(1 / count), .by = all_of(c(col_y, col_x))) |> 
    pivot_wider(id_cols = all_of(col_y), 
                names_from = all_of(col_x), 
                values_from = subcount)
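Put together with the first step, the two-step variant could look like the sketch below. The function name `crosstab_two_step` is made up here to keep it apart from the one-step `crosstab()`, and the data uses the unique IDs from this answer's setup.

```r
library(dplyr)  # assumes dplyr >= 1.1 for the .by argument
library(tidyr)

Key <- c("Yellow", "Yellow", "Red", "Blue", "Yellow", "Red", "Blue", "Yellow",
         "Yellow", "Red", "Blue", "Yellow", "Red", "Blue", "Blue")
Speed <- c("Slow", "Fast", "Mid", "Fast", "Mid", "Slow", "Slow", "Mid",
           "Slow", "Fast", "Mid", "Fast", "Mid", "Fast", "Mid")
init_dt <- data.frame(ID = factor(1:15), Key, Speed)

# Hypothetical name; same arguments as crosstab()
crosstab_two_step <- function(dt, id, col_y, col_x) {
  dt |>
    # distinct ids per col_x level, repeated on every row of that level
    mutate(count = n_distinct(.data[[id]]), .by = all_of(col_x)) |>
    # each row contributes 1 / count to its (col_y, col_x) cell
    summarise(subcount = sum(1 / count), .by = all_of(c(col_y, col_x))) |>
    pivot_wider(id_cols = all_of(col_y),
                names_from = all_of(col_x),
                values_from = subcount)
}

res_two_step <- crosstab_two_step(init_dt, "ID", "Key", "Speed")
res_two_step
```

For this data the values should match the proportions shown earlier. Rows and columns come out in order of first appearance, since this sketch skips the ordered factors, `names_sort` and `arrange()`.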

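But the values_fn argument of pivot_wider lets us apply a function to the values as they are spread into the new columns, so we can do it all in one go.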
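As a small illustration of what `values_fn` does, consider the sketch below. The data frame `long` and its columns `g`, `k`, `v` are made up for the example. When several rows land in the same output cell, `pivot_wider()` passes their values to `values_fn` and stores the single result.

```r
library(tidyr)

# Hypothetical long data: two rows fall into the (g = "a", k = "x") cell
long <- data.frame(
  g = c("a", "a", "b"),
  k = c("x", "x", "x"),
  v = c(4, 4, 5)
)

# Collapse each cell with sum(1 / v), the same idea as ~ sum(1/.) above
wide <- pivot_wider(long,
                    id_cols = g,
                    names_from = k,
                    values_from = v,
                    values_fn = function(v) sum(1 / v))

wide
```

The cell for "a" becomes 1/4 + 1/4 and the cell for "b" becomes 1/5, which is exactly how the per-row `1 / count` contributions are summed in `crosstab()`.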

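With table

It looks like you can get the same result with table and prop.table. Note that this version never uses id: it divides by the number of rows per column, which equals the distinct-id count only when ids are not repeated within a col_x level, as in the data below.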

Key <- c("Yellow", "Yellow", "Red", "Blue", "Yellow", "Red", "Blue", "Yellow", "Yellow", "Red", "Blue", "Yellow", "Red", "Blue", "Blue")
Speed <- c("Slow", "Fast", "Mid", "Fast", "Mid", "Slow", "Slow", "Mid", "Slow", "Fast", "Mid", "Fast", "Mid", "Fast", "Mid")
ID <- factor(c(seq(1:15)))
init_dt <- data.frame(ID, Key, Speed)

# to keep the right order
init_dt$Key   <- factor(init_dt$Key, c("Yellow", "Red", "Blue"), ordered = TRUE)
init_dt$Speed <- factor(init_dt$Speed, c("Slow", "Mid", "Fast"), ordered = TRUE)

crosstab <- function(dt, id, col_y, col_x){
  
  table(dt[[col_y]], dt[[col_x]]) |> 
    prop.table(margin = 2) |> 
    as.data.frame.matrix() |> 
    tibble::rownames_to_column(col_y)
  
}

crosstab(init_dt, "ID", "Key", "Speed")
#>      Key Slow       Mid Fast
#> 1 Yellow 0.50 0.3333333  0.4
#> 2    Red 0.25 0.3333333  0.2
#> 3   Blue 0.25 0.3333333  0.4

Created on 2023-03-23 with reprex v2.0.2

Forcing 4 digits with kable

crosstab(init_dt, "ID", "Key", "Speed") |> 
  knitr::kable(format = "pipe", digits = 4, format.args = list(nsmall = 4))

| Key    |   Slow |    Mid |   Fast |
|:-------|-------:|-------:|-------:|
| Yellow | 0.5000 | 0.3333 | 0.4000 |
| Red    | 0.2500 | 0.3333 | 0.2000 |
| Blue   | 0.2500 | 0.3333 | 0.4000 |
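If the rounded numbers are needed in the data itself rather than only in the printed table, a sketch along the following lines may help. It uses `across()`, which superseded `mutate_at()`, in place of the deprecated `funs()`. The `props` data frame is a made-up stand-in for a crosstab result.

```r
library(dplyr)

# Hypothetical stand-in for an unrounded crosstab result
props <- data.frame(
  Key  = c("Blue", "Red"),
  Fast = c(2 / 5, 1 / 5),
  Mid  = c(1 / 3, 1 / 3)
)

# Round every numeric column to 4 decimals, leaving Key untouched
rounded <- props |>
  mutate(across(where(is.numeric), ~ round(.x, 4)))

rounded
```

Selecting with `where(is.numeric)` avoids hard-coding column positions such as `2:4`, so the same line works however many levels `col_x` has.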

ncecgwcz2#

Here is one way to combine them into a single crosstab function. I noticed you are using spread(); pivot_wider() is its replacement and should be used if you have access to it.

library(dplyr)
library(tidyr)

crosstab <- function(dt, col_y, col_x){
  col_y = sym(col_y)
  col_x = sym(col_x)
  
  dt %>% 
    count(!!col_y, !!col_x) %>%           # frequency of each (col_y, col_x) pair
    ungroup() %>% 
    group_by(!!col_x) %>% 
    mutate(pct = round(n/sum(n),4)) %>%   # share of the col_x total, 4 decimals
    ungroup() %>% 
    select(-n) %>% 
    pivot_wider( names_from = !!col_x,  values_from =  pct)
}

crosstab_key <- crosstab(init_dt, 'Key', 'Speed')
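The function above divides by the number of rows in each col_x level. If ids can repeat within a level and the divisor should be the number of distinct ids, as in the question's div(), a variant along these lines might work. This is only a sketch: the name `crosstab_by_id`, the helper column `denom` and the toy data are all assumptions for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical variant: the denominator is the distinct-id count per col_x level
crosstab_by_id <- function(dt, id, col_y, col_x) {
  dt %>%
    group_by(across(all_of(col_x))) %>%
    mutate(denom = n_distinct(.data[[id]])) %>%
    group_by(across(all_of(c(col_y, col_x)))) %>%
    summarise(pct = round(n() / first(denom), 4), .groups = "drop") %>%
    pivot_wider(names_from = all_of(col_x), values_from = pct)
}

# Hypothetical toy data: ID 1 appears twice within "Fast"
toy <- data.frame(
  ID    = c(1, 1, 2, 3),
  Key   = c("A", "B", "A", "A"),
  Speed = c("Fast", "Fast", "Fast", "Slow")
)

res_by_id <- crosstab_by_id(toy, "ID", "Key", "Speed")
res_by_id
```

With this toy data the Fast level has three rows but only two distinct ids, so the A/Fast cell is 2/2 here, whereas dividing by the row count would give 2/3.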
