pivot_longger用于创建多个变量

6ojccjat  于 2023-03-15  发布在  其他
关注(0)|答案(3)|浏览(115)

我正在尝试使用pivot_longger()来创建两个变量而不是一个。我的两个变量名应该是“stage”和“stock”以及它们对应的值。“stage”列应该包括rec,tadult和teggs。“stock”列应该有recStk,teggStk和tadultStk。我在下面的一个虚拟例子中尝试了names_pattern,但是我显然不知道如何使用模式。有人能帮助我吗?先谢了。

pivot_test <- structure(list(year = 1991:1996, rec = c(1635970, 1269210, 927557, 
1312280, 1387800, 1772490), teggStk = c(419648000, 411314000, 
445784000, 434684000, 364483000, 686490000), tadultStk = c(84009.2, 
58228.6, 65226.2, 44371.9, 65555.1, 79610.6), tadult = c(84009.2, 
58228.6, 65226.2, 44371.9, 65555.1, 79610.6), teggs = c(419648000, 
411314000, 445784000, 434684000, 364483000, 686490000), recStk = c(1635970, 
1269210, 927557, 1312280, 1387800, 1772490)), row.names = c(NA, 
6L), class = "data.frame")

  year     rec   teggStk tadultStk  tadult     teggs  recStk
1 1991 1635970 419648000   84009.2 84009.2 419648000 1635970
2 1992 1269210 411314000   58228.6 58228.6 411314000 1269210
3 1993  927557 445784000   65226.2 65226.2 445784000  927557
4 1994 1312280 434684000   44371.9 44371.9 434684000 1312280
5 1995 1387800 364483000   65555.1 65555.1 364483000 1387800
6 1996 1772490 686490000   79610.6 79610.6 686490000 1772490

annual21atest <- pivot_test %>% pivot_longer(-c("year"),
                                names_to=c("stage","stock"),
                                names_pattern = "([A-Za-z]+)([A-Za-z]+)",
                                values_to="quantity") %>% data.frame()
                                
  The final dataframe should have 4 columns:
  year stage   stock      quantity
  1991 rec     recStk     00000
  1992 tadult  tadultStk  00000
  1993 teggs   teggsStk   0000
  ...............................
chhqkbe1

chhqkbe11#

这里有一个可能的解决方案,这个问题有点奇怪,因为你得到了stage和stock的重复值,而pivot_longer不知道如何自动折叠,相反,我们从一开始就删除重复值,然后它是一个单独的调用:

pivot_test %>%
  select(year, ends_with("Stk")) %>%
  pivot_longer(-year, names_to = "stock", values_to = "quantity") %>%
  mutate(stage=str_remove(stock, "Stk")) %>%
  select(year, stage, stock, quantity)

其返回

# A tibble: 18 × 4
    year stage  stock       quantity
   <int> <chr>  <chr>          <dbl>
 1  1991 tegg   teggStk   419648000 
 2  1991 tadult tadultStk     84009.
 3  1991 rec    recStk      1635970 
 4  1992 tegg   teggStk   411314000 
 5  1992 tadult tadultStk     58229.
 6  1992 rec    recStk      1269210 
 7  1993 tegg   teggStk   445784000 
 8  1993 tadult tadultStk     65226.
 9  1993 rec    recStk       927557 
10  1994 tegg   teggStk   434684000 
11  1994 tadult tadultStk     44372.
12  1994 rec    recStk      1312280 
13  1995 tegg   teggStk   364483000 
14  1995 tadult tadultStk     65555.
15  1995 rec    recStk      1387800 
16  1996 tegg   teggStk   686490000 
17  1996 tadult tadultStk     79611.
18  1996 rec    recStk      1772490
uplii1fm

uplii1fm2#

主要的挑战是获得正确的名称模式:

library(tidyr)
library(dplyr)
  
pivot_test %>% 
    pivot_longer(cols = -year, 
                 names_to = c("stage", "stock"), 
                 names_pattern = "(.*)(Stk)", 
                 values_to = "quantity") %>% 
  na.omit() %>% 
  mutate(stock = paste0(stage, stock))
year stage  stock       quantity
   <int> <chr>  <chr>          <dbl>
 1  1991 tegg   teggStk   419648000 
 2  1991 tadult tadultStk     84009.
 3  1991 rec    recStk      1635970 
 4  1992 tegg   teggStk   411314000 
 5  1992 tadult tadultStk     58229.
 6  1992 rec    recStk      1269210 
 7  1993 tegg   teggStk   445784000 
 8  1993 tadult tadultStk     65226.
 9  1993 rec    recStk       927557 
10  1994 tegg   teggStk   434684000 
11  1994 tadult tadultStk     44372.
12  1994 rec    recStk      1312280 
13  1995 tegg   teggStk   364483000 
14  1995 tadult tadultStk     65555.
15  1995 rec    recStk      1387800 
16  1996 tegg   teggStk   686490000 
17  1996 tadult tadultStk     79611.
18  1996 rec    recStk      1772490
am46iovg

am46iovg3#

为什么不直接将对tidyr::pivot_longer()的“stock”调用转换为对tidyr::pivot_longer()的“stage”调用,反之亦然呢?

pivot_test <- structure(list(
      year = 1991:1996,
      rec = c(1635970, 1269210, 927557, 1312280, 1387800, 1772490),
      teggStk = c(419648000, 411314000, 445784000, 434684000, 364483000, 686490000),
      tadultStk = c(84009.2, 58228.6, 65226.2, 44371.9, 65555.1, 79610.6),
      tadult = c(84009.2, 58228.6, 65226.2, 44371.9, 65555.1, 79610.6),
      teggs = c(419648000, 411314000, 445784000, 434684000, 364483000, 686490000),
      recStk = c(1635970, 1269210, 927557, 1312280, 1387800, 1772490)),
      row.names = c(NA, 6L), class = 'data.frame')
    
    annual21atest <- 
      tidyr::pivot_longer(pivot_test,
                          -c('year', 'recStk', 'teggStk', 'tadultStk'),
                          names_to= 'stage',
                          values_to = 'quantity')|>
      tidyr::pivot_longer(-c('year', 'stage'),
                          names_to= 'stock',
                          values_to = 'quantity')|>
      as.data.frame()

结果就是您所要求的:

year  stage     stock    quantity
    1  1991    rec   teggStk 419648000.0
    2  1991    rec tadultStk     84009.2
    3  1991    rec    recStk   1635970.0
    4  1991    rec  quantity   1635970.0
    5  1991 tadult   teggStk 419648000.0
    6  1991 tadult tadultStk     84009.2
    7  1991 tadult    recStk   1635970.0
    8  1991 tadult  quantity     84009.2
    9  1991  teggs   teggStk 419648000.0
    10 1991  teggs tadultStk     84009.2

为了简洁起见,您也可以像这样 Package 函数:

twostage.pivot_longer <- function(x, aggregated_names, aggregated_values,
                                      skipped_columns){
      # First aggregation
      y <- tidyr::pivot_longer(x,
                               -c(skipped_columns[[1]]),
                          names_to= aggregated_names[1],
                          values_to = aggregated_values)
      # Second aggregation
      y <- tidyr::pivot_longer(y,
                               -c(skipped_columns[[2]], aggregated_names[1]),
                               names_to= aggregated_names[2],
                               values_to = aggregated_values)
      return(y)
    }

然后像这样使用它:

annual21atest_fun <- twostage.pivot_longer(x = pivot_test,
                                           aggregated_names = c('stage', 'stock'),
                                           aggregated_values = 'quantity',
                                           skipped_columns = list(
                                             c('year', 'recStk', 'teggStk', 'tadultStk'),
                                             'year'
                                             ))

相关问题