R: Counting multiple columns of categorical data after grouping by another column

kfgdxczn, posted on 2023-03-15 in Other

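I have a large data set (49000 x 118). I want to group by one column and then get a summary of several other columns. The problem is that the summary of each column has a different length.
Here is a simple example of my data: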

dat<- data.frame(test_number= as.factor(c("test1", "test1", "test1","test1","test1","test1", "test2","test2","test2", "test3","test3","test3","test3","test3","test3")), 
                 question1_response= as.factor(c("yes", NA, "no","not answered", "yes", "yes", NA, "no","yes","yes","yes","yes","yes","yes","yes")),
                 question2_response= as.factor(c("yes","yes","yes","yes","yes","yes","yes","yes","yes","yes","yes","yes","yes","yes","no")),
                 question3_response= as.factor(c("yes", NA, "no","yes", NA, "no","yes", NA, "no","yes", NA, "no","yes", NA, "no")))

I want to group by test_number and get a summary of each response in columns 2:4.
Some code I tried:
[code not preserved in the source]
This is the result I expect (I made it in Excel):

I padded the columns of unequal length with NAs, but I am not particular about the structure as long as I get the information.
Thanks.


368yc8dk #1

You can stack all the question responses into a single column, then pivot back to wide format with values_fn = length to count them.

library(tidyr)

dat %>%
  pivot_longer(-test_number) %>%
  pivot_wider(id_cols = c(test_number, value), names_from = name,
              values_from = name, values_fn = length, values_fill = 0)

# # A tibble: 10 × 5
#    test_number value        question1_response question2_response question3_response
#    <fct>       <fct>                     <int>              <int>              <int>
#  1 test1       yes                           3                  6                  2
#  2 test1       NA                            1                  0                  2
#  3 test1       no                            1                  0                  2
#  4 test1       not answered                  1                  0                  0
#  5 test2       NA                            1                  0                  1
#  6 test2       yes                           1                  3                  1
#  7 test2       no                            1                  0                  1
#  8 test3       yes                           6                  5                  2
#  9 test3       NA                            0                  0                  2
# 10 test3       no                            0                  1                  2
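
As a cross-check that needs no packages, the same counts can be sketched in base R with one contingency table per question column. This is only an illustration, not part of the answer above: the `tabs` name is made up here, and the example data is rebuilt compactly with `rep()` (assumed to be equivalent to the `dat` in the question).

```r
# Example data from the question, rebuilt compactly (assumed equivalent to the original dat)
dat <- data.frame(
  test_number = factor(rep(c("test1", "test2", "test3"), times = c(6, 3, 6))),
  question1_response = factor(c("yes", NA, "no", "not answered", "yes", "yes",
                                NA, "no", "yes",
                                "yes", "yes", "yes", "yes", "yes", "yes")),
  question2_response = factor(c(rep("yes", 14), "no")),
  question3_response = factor(rep(c("yes", NA, "no"), times = 5))
)

# One table per question column (everything except test_number).
# useNA = "ifany" adds an NA column only where the column actually contains NA.
tabs <- lapply(dat[-1], function(col) {
  table(test = dat$test_number, response = col, useNA = "ifany")
})

# Each element is a test-by-response table; the tables may have different widths
tabs$question1_response
```

Because the result is a list, each question keeps only its own response categories, which sidesteps the "different lengths" problem at the cost of not having a single data frame.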

mqkwyuun #2

Is this what you are after?

library(tidyverse)

dat %>%
  pivot_longer(matches("question")) %>%
  group_by(test_number, name, value) %>%
  summarise(n = n(), .groups = "drop") %>%
  pivot_wider(values_from = n)

# A tibble: 10 × 5
   test_number value        question1_response question2_response question3_response
   <fct>       <fct>                     <int>              <int>              <int>
 1 test1       no                            1                 NA                  2
 2 test1       not answered                  1                 NA                 NA
 3 test1       yes                           3                  6                  2
 4 test1       NA                            1                 NA                  2
 5 test2       no                            1                 NA                  1
 6 test2       yes                           1                  3                  1
 7 test2       NA                            1                 NA                  1
 8 test3       yes                           6                  5                  2
 9 test3       no                           NA                  1                  2
10 test3       NA                           NA                 NA                  2
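
The NA counts in this output mark combinations that never occur in the data. If zeros are preferred, a possible variant (a sketch only, not the answerer's code) replaces `group_by()` plus `summarise()` with `dplyr::count()` and passes `values_fill = 0` to `pivot_wider()`. The column names `question` and `response` and the `counts` object are illustrative choices, and the data is rebuilt compactly on the assumption that it matches the question's `dat`.

```r
library(dplyr)
library(tidyr)

# Example data from the question, rebuilt compactly (assumed equivalent to the original dat)
dat <- data.frame(
  test_number = factor(rep(c("test1", "test2", "test3"), times = c(6, 3, 6))),
  question1_response = factor(c("yes", NA, "no", "not answered", "yes", "yes",
                                NA, "no", "yes",
                                "yes", "yes", "yes", "yes", "yes", "yes")),
  question2_response = factor(c(rep("yes", 14), "no")),
  question3_response = factor(rep(c("yes", NA, "no"), times = 5))
)

counts <- dat %>%
  # Stack the three question columns into question/response pairs
  pivot_longer(-test_number, names_to = "question", values_to = "response") %>%
  # Count rows per test, question and response (NA is kept as its own group)
  count(test_number, question, response) %>%
  # One column per question; combinations that never occur become 0 instead of NA
  pivot_wider(names_from = question, values_from = n, values_fill = 0)

counts
```

Rows where `response` itself is NA are still present; `values_fill` only affects the count cells, not the response labels.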
