Grouping values based on criteria - R

r1wp621o  posted on 2023-04-03  in  Other

I have a dataset like this:

library(data.table)

dt <- data.table(ID = c(1, 2, 3, 4),
                 q1 = c(1, 2, 3, 5),
                 q2 = c(3, 5, 2, 4),
                 q3 = c(2, 3, 4, 3),
                 education = c("A", "B", "C", "D"))

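I want a table that sorts the values into groups: every ID with a value of 1, 3, or 4 should be counted in a group named "YES"; every ID with a value of 1 or 3 should be counted in a group named "maybe" (so some IDs are counted twice); every ID with a value of 5 or 2 should be counted in "NO".
The final output should be one table per education level, broken down by question: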

YES    maybe    NO
q1
q2
q3

I hope you can help me.


anauzrmj #1

Here is a data.table-native approach:

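library(data.table)
# Values belonging to each group; 1 and 3 count toward both YES and maybe
categs <- list(YES=c(1,3,4), maybe=c(1,3), NO=c(2,5))
# Reshape to long format: one row per (education, question) answer
out <- melt(dt, "education", measure.vars = c("q1", "q2", "q3")
  # Flag which groups each answer belongs to
  )[, names(categs) := lapply(categs, `%in%`, x = value)
  # Count flagged answers per education level and question
  ][, lapply(.SD, function(z) as.numeric(sum(z))), by = .(education, variable), .SDcols = names(categs)
  # Convert the counts to shares within each education level
  ][, names(categs) := lapply(.SD, function(z) z/sum(z)),
    by = .(education), .SDcols = names(categs)]
out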
#     education variable   YES maybe    NO
#        <char>   <fctr> <num> <num> <num>
#  1:         A       q1   0.5   0.5   0.0
#  2:         B       q1   0.0   0.0   0.5
#  3:         C       q1   0.5   1.0   0.0
#  4:         D       q1   0.0   0.0   1.0
#  5:         A       q2   0.5   0.5   0.0
#  6:         B       q2   0.0   0.0   0.5
#  7:         C       q2   0.0   0.0   1.0
#  8:         D       q2   0.5   0.0   0.0
#  9:         A       q3   0.0   0.0   1.0
# 10:         B       q3   1.0   1.0   0.0
# 11:         C       q3   0.5   0.0   0.0
# 12:         D       q3   0.5   1.0   0.0
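
The values above are shares within each education level. If raw counts are wanted instead, in the layout sketched in the question (questions as rows, one table per education level), a minimal sketch could look like the following. It assumes the `dt` from the question; `long`, `counts`, and `tables` are names made up for this illustration.

```r
library(data.table)

dt <- data.table(ID = c(1, 2, 3, 4),
                 q1 = c(1, 2, 3, 5),
                 q2 = c(3, 5, 2, 4),
                 q3 = c(2, 3, 4, 3),
                 education = c("A", "B", "C", "D"))
categs <- list(YES = c(1, 3, 4), maybe = c(1, 3), NO = c(2, 5))

# Long format: one row per (education, question) answer
long <- melt(dt, id.vars = "education", measure.vars = c("q1", "q2", "q3"),
             variable.name = "question")

# Per education level and question, count the answers falling in each group
counts <- long[, lapply(categs, function(vals) sum(value %in% vals)),
               by = .(education, question)]

# Named list with one table per education level, questions as rows
tables <- split(counts, by = "education", keep.by = FALSE)
tables$A
```

`split()` with `by =` returns a named list, so each level's table is reachable as `tables$A`, `tables$B`, and so on.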

vof42yt1 #2

I'm not sure I understood exactly what you want, but as far as I can tell, this gives you what you are after:

library(data.table)
library(dplyr)
library(tidyr)
dt <- data.table(ID = c(1, 2, 3, 4),
                 q1= c(1, 2, 3, 5), 
                 q2= c(3, 5, 2, 4), 
                 q3= c(2, 3, 4, 3),
                 education = c("A", "B", "C", "D"))

dt <- dt %>% pivot_longer(q1:q3) %>% 
  group_by(education, name) %>% 
  mutate(YES = sum(value %in% c(1,3, 4)),
         maybe = sum(value %in% c(1, 3)),
         NO = sum(value %in% c(2, 5))) %>%
  dplyr::select(!c(value, ID)) %>%
  distinct()

dt

Output:

   education name    YES maybe    NO
   <chr>     <chr> <int> <int> <int>
 1 A         q1        1     1     0
 2 A         q2        1     1     0
 3 A         q3        0     0     1
 4 B         q1        0     0     1
 5 B         q2        0     0     1
 6 B         q3        1     1     0
 7 C         q1        1     1     0
 8 C         q2        0     0     1
 9 C         q3        1     0     0
10 D         q1        0     0     1
11 D         q2        1     0     0
12 D         q3        1     1     0

So for each education level and each question, it gives you the count of "YES", "maybe", and "NO".

Edit:----

If you want a separate table for each education level, use the following code:

for (level in unique(dt$education)) {
  assign(level, dt %>% filter(education == level), envir = .GlobalEnv)
}

Now there is one table per education level, like this:

A
  education name    YES maybe    NO
  <chr>     <chr> <int> <int> <int>
1 A         q1        1     1     0
2 A         q2        1     1     0
3 A         q3        0     0     1
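
As a possible alternative to creating one global variable per level with `assign()`, the same tables can be kept together in a named list. The sketch below is only one way to do it: it swaps `mutate()` plus `distinct()` for `summarise()`, and `counts` and `by_level` are names made up for this illustration.

```r
library(dplyr)
library(tidyr)

dt <- data.frame(ID = c(1, 2, 3, 4),
                 q1 = c(1, 2, 3, 5),
                 q2 = c(3, 5, 2, 4),
                 q3 = c(2, 3, 4, 3),
                 education = c("A", "B", "C", "D"))

# One row per education level and question, with the three group counts
counts <- dt %>%
  pivot_longer(q1:q3) %>%
  group_by(education, name) %>%
  summarise(YES = sum(value %in% c(1, 3, 4)),
            maybe = sum(value %in% c(1, 3)),
            NO = sum(value %in% c(2, 5)),
            .groups = "drop")

# Named list holding one table per education level
by_level <- split(counts, counts$education)
by_level$A
```

Keeping the tables in a list avoids cluttering the global environment and makes it easy to loop over all levels, for example with `lapply(by_level, nrow)`.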
