在R中,如何得到两个变量的所有组合的和?

byqmnocz  于 2023-01-28  发布在  其他
关注(0)|答案(2)|浏览(162)

我有一个包含学生成绩和科目的长数据集。我想保留一个长数据集,但我想添加一个列,告诉我学生在人文课程(英语和历史)和STEM课程(生物和数学)中有多少个F。我还想为D、C、B和A添加相同的列。
我知道我可以明确地说明这一点,但在未来,他们可能会有其他科目(如增加化学到STEM)或完全不同的类别,如外语,所以我希望它是可扩展的。
我知道如何得到所有的列组合,我知道如何手动处理每个部分--但我不知道如何组合这两个。任何帮助都将不胜感激!

#Sample data
library(tidyverse)

student_grades <- tibble(student_id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5),
                      subject = c(rep(c("english", "biology", "math", "history"), 4), NA, "biology"),
                      grade = c(1, 2, 3, 4, 5, 4, 3, 2, 2, 4, 1, 1, 1, 1, 2, 3, 3, 4))
#All combinations of grades and subjects
all_subject_combos <- c("eng|his", "bio|math")
all_grades <- c("F", "D", "C", 
             "B", "A")

subjects_and_letter_grades <- expand.grid(all_subject_combos, all_grades)

all_combos <- subjects_and_letter_grades %>%
  unite("names", c(Var1, Var2)) %>%
  mutate(names = str_replace_all(names, "\\|", "_")) %>%
  pull(names)
#Manual generation of numbers of Fs by subject
#This is what I want the results to look like, but with all other letter grades

student_grades %>%
  group_by(student_id) %>%
  mutate(eng_his_F = sum((case_when(
    str_detect(subject, "eng|his") & grade == 1 ~ 1, 
    TRUE ~ 0)), na.rm = TRUE),
bio_math_F = sum((case_when(
  str_detect(subject, "bio|math") & grade == 1 ~ 1, 
  TRUE ~ 0)), na.rm = TRUE)) %>%
ungroup()

理想情况下,这将是可伸缩的,为任何数量的主题组合,并不会要求我写了同样的代码,为D,C,B和A。

lmvvr0a8

lmvvr0a81#

我们可以用map循环all_combos向量,然后在每个list中,按"student_id"进行分组(也可以在循环外执行此操作,并创建一个对象在此处使用此操作),通过计算创建与循环同名的新列(!!),并对case_when的输出的sum使用:=运算符,然后将数据与原始数据绑定

library(dplyr)
library(purrr)
library(stringr)
map_dfc(all_combos, ~ student_grades %>% 
  group_by(student_id) %>%
  transmute(!! .x := sum(case_when(str_detect(subject,
   str_replace(.x, "(\\w+)_(\\w+)_.", "\\1|\\2")) &
    grade == match(str_extract(.x, ".$"), all_grades)~ 1, TRUE ~ 0))) %>%
  ungroup %>% 
  dplyr::select(-student_id)) %>%
  bind_cols(student_grades, .)
  • 输出
# A tibble: 18 × 13
   student_id subject grade eng_his_F bio_math_F eng_his_D bio_math_D eng_his_C bio_math_C eng_his_B bio_math_B eng_hi…¹ bio_m…²
        <dbl> <chr>   <dbl>     <dbl>      <dbl>     <dbl>      <dbl>     <dbl>      <dbl>     <dbl>      <dbl>    <dbl>   <dbl>
 1          1 english     1         1          0         0          1         0          1         1          0        0       0
 2          1 biology     2         1          0         0          1         0          1         1          0        0       0
 3          1 math        3         1          0         0          1         0          1         1          0        0       0
 4          1 history     4         1          0         0          1         0          1         1          0        0       0
 5          2 english     5         0          0         1          0         0          1         0          1        1       0
 6          2 biology     4         0          0         1          0         0          1         0          1        1       0
 7          2 math        3         0          0         1          0         0          1         0          1        1       0
 8          2 history     2         0          0         1          0         0          1         0          1        1       0
 9          3 english     2         1          1         1          0         0          0         0          1        0       0
10          3 biology     4         1          1         1          0         0          0         0          1        0       0
11          3 math        1         1          1         1          0         0          0         0          1        0       0
12          3 history     1         1          1         1          0         0          0         0          1        0       0
13          4 english     1         1          1         0          1         1          0         0          0        0       0
14          4 biology     1         1          1         0          1         1          0         0          0        0       0
15          4 math        2         1          1         0          1         1          0         0          0        0       0
16          4 history     3         1          1         0          1         1          0         0          0        0       0
17          5 <NA>        3         0          0         0          0         0          0         0          1        0       0
18          5 biology     4         0          0         0          0         0          0         0          1        0       0
# … with abbreviated variable names ¹​eng_his_A, ²​bio_math_A
w80xi6nr

w80xi6nr2#

这里有另一种方法来看待它。我使用一个小的Map表(subject_to_field)来Map主题到它的领域(英语-〉人文学科,数学-〉STEM等)。我认为这可能有助于可伸缩性。当主题被添加或删除时,你需要维护这个表。
left_join然后将字段与student_grades tibble组合在一起。
添加“grade 2”列并不是必需的,但可以提高可读性。最后,我们需要做的是执行适当的分组和计数。在这种方法中,对于学生没有出现的成绩,您不会得到零计数。

library(tidyverse)

student_grades <- tibble(student_id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5),
                         subject = c(rep(c("english", "biology", "math", "history"), 4), NA, "biology"),
                         grade = c(1, 2, 3, 4, 5, 4, 3, 2, 2, 4, 1, 1, 1, 1, 2, 3, 3, 4))

student_grades <- student_grades %>%
  mutate(grade2 = case_when(
    grade == 1 ~ "A",
    grade == 2 ~ "B", 
    grade == 3 ~ "C", 
    grade == 4 ~ "D", 
    grade == 5 ~ "F"))

subject_to_field <- tibble(
  subject = c("biology", "english", "history", "math"),
  field = c("STEM", "Humanities", "Humanities", "STEM")
)

student_grades <- student_grades %>%
  left_join(subject_to_field, by = c("subject" = "subject"))

student_summary <- student_grades %>%
  group_by(student_id, field, subject, grade2) %>%
  summarise(count = n())

它将给予以下输出:

> student_summary
# A tibble: 18 × 5
# Groups:   student_id, field, subject [18]
   student_id field      subject grade2 count
        <dbl> <chr>      <chr>   <chr>  <int>
 1          1 Humanities english A          1
 2          1 Humanities history D          1
 3          1 STEM       biology B          1
 4          1 STEM       math    C          1
 5          2 Humanities english F          1
 6          2 Humanities history B          1
 7          2 STEM       biology D          1
 8          2 STEM       math    C          1
 9          3 Humanities english B          1
10          3 Humanities history A          1
11          3 STEM       biology D          1
12          3 STEM       math    A          1
13          4 Humanities english A          1
14          4 Humanities history C          1
15          4 STEM       biology A          1
16          4 STEM       math    B          1
17          5 STEM       biology D          1
18          5 NA         NA      C          1

相关问题