I need to compute new columns from data like the following:
structure(list(english_score = c(3L, 4L, 3L, 3L, 4L, 3L, 4L,
2L, 4L, 2L, 3L, 3L, 2L, 2L, 3L, 4L, 3L, 3L, 4L, 3L, 4L, 3L, 2L
), math_score = c(4L, 4L, 3L, 4L, 4L, 4L, 3L, 2L, 3L, 3L, 4L,
2L, 4L, 2L, 4L, 2L, 3L, 3L, 2L, 2L, 2L, 4L, 2L), science_score = c(3L,
4L, 4L, 4L, 3L, 4L, 4L, 3L, 3L, 2L, 3L, 4L, 4L, 4L, 4L, 4L, 4L,
2L, 3L, 2L, 3L, 3L, 4L)), row.names = c(NA, -23L), class = c("data.table",
"data.frame"))
I would like to produce something like this:
structure(list(english_score = c(3L, 4L, 3L, 3L, 4L, 3L, 4L,
2L, 4L, 2L, 3L, 3L, 2L, 2L, 3L, 4L, 3L, 3L, 4L, 3L, 4L, 3L, 2L
), math_score = c(4L, 4L, 3L, 4L, 4L, 4L, 3L, 2L, 3L, 3L, 4L,
2L, 4L, 2L, 4L, 2L, 3L, 3L, 2L, 2L, 2L, 4L, 2L), science_score = c(3L,
4L, 4L, 4L, 3L, 4L, 4L, 3L, 3L, 2L, 3L, 4L, 4L, 4L, 4L, 4L, 4L,
2L, 3L, 2L, 3L, 3L, 4L), english_level = c("Level C", "Level D",
"Level C", "Level C", "Level D", "Level C", "Level D", "Level B",
"Level D", "Level B", "Level C", "Level C", "Level B", "Level B",
"Level C", "Level D", "Level C", "Level C", "Level D", "Level C",
"Level D", "Level C", "Level B"), math_level = c("Level D", "Level D",
"Level C", "Level D", "Level D", "Level D", "Level C", "Level B",
"Level C", "Level C", "Level D", "Level B", "Level D", "Level B",
"Level D", "Level B", "Level C", "Level C", "Level B", "Level B",
"Level B", "Level D", "Level B"), science_level = c("Level C",
"Level D", "Level D", "Level D", "Level C", "Level D", "Level D",
"Level C", "Level C", "Level B", "Level C", "Level D", "Level D",
"Level D", "Level D", "Level D", "Level D", "Level B", "Level C",
"Level B", "Level C", "Level C", "Level D")), row.names = c(NA,
-23L), class = c("data.table", "data.frame"))
So far, my approach has been to use a function that computes the level for a new variable...
library(dplyr)  # case_when() comes from dplyr
myfunction <- function(x) {
  case_when(x < 2 ~ "Level A",
            x > 1 & x < 3 ~ "Level B",
            x > 2 & x < 4 ~ "Level C",
            x > 3 ~ "Level D")
}
...and then to create the new variables, assigning each one separately.
DT[, english_level:=lapply(.SD, myfunction), .SDcols='english_score']
DT[, math_level:=lapply(.SD, myfunction), .SDcols='math_score']
DT[, science_level:=lapply(.SD, myfunction), .SDcols='science_score']
How can I simplify this process, preferably using data.table?
2 Answers
qhhrdooz1#
Here is an option that avoids writing your own function: build a lookup table, then map each score to its level.
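The lookup-table idea could be sketched roughly as follows. This is only an illustration, not necessarily the code this answer originally contained: the names level_map, score_cols and level_cols are made up for the example, DT is a small stand-in for the question's data, and it assumes every score is an integer from 1 to 4.

```r
library(data.table)

# Small stand-in for the question's data (first five rows only).
DT <- data.table(
  english_score = c(3L, 4L, 3L, 3L, 4L),
  math_score    = c(4L, 4L, 3L, 4L, 4L),
  science_score = c(3L, 4L, 4L, 4L, 3L)
)

# Hypothetical lookup table: one row per possible score (assumed to be 1..4).
level_map <- data.table(
  score = 1:4,
  level = paste("Level", LETTERS[1:4])
)

score_cols <- c("english_score", "math_score", "science_score")
level_cols <- sub("_score$", "_level", score_cols)

# match() finds the row of level_map for each score;
# a score that is not in the table ends up as NA.
DT[, (level_cols) := lapply(.SD, function(x) level_map$level[match(x, level_map$score)]),
   .SDcols = score_cols]
```

Because all three columns go through a single := call, adding another subject only means adding its name to score_cols. Changing the grading scheme only means editing level_map.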
c7rzv4ha2#
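I would do it like this (I've called your data DT, because utils::data() is a function that ships with base R):
Also, your myfunction() uses dplyr::case_when(). That works, but some dplyr functions clash with data.table ones (between(), first() and last() in my current versions). You can replace it with data.table::fcase(), which should also be faster than the dplyr version. In addition, for this particular function you can replace the case-when style logic entirely by assigning the nth letter of the alphabet as the level: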
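A minimal sketch of that idea, alongside an fcase() version for comparison. The helper names to_level and to_level_fcase are made up for this example, DT is a small stand-in for the question's data, and the letter trick assumes the scores are positive integers no greater than 26.

```r
library(data.table)

# Small stand-in for the question's data.
DT <- data.table(
  english_score = c(3L, 4L, 3L, 2L),
  math_score    = c(4L, 4L, 3L, 2L),
  science_score = c(3L, 4L, 4L, 3L)
)

score_cols <- c("english_score", "math_score", "science_score")
level_cols <- sub("_score$", "_level", score_cols)

# Letter trick: score n becomes the nth letter of the alphabet.
# Assumes integer scores in 1..26.
to_level <- function(x) paste("Level", LETTERS[x])

# Create all level columns in a single := call.
DT[, (level_cols) := lapply(.SD, to_level), .SDcols = score_cols]

# The question's case_when() logic written with data.table::fcase(),
# which takes alternating condition / value arguments.
to_level_fcase <- function(x) {
  fcase(
    x < 2,         "Level A",
    x > 1 & x < 3, "Level B",
    x > 2 & x < 4, "Level C",
    x > 3,         "Level D"
  )
}
```

For integer scores from 1 to 4 the two helpers give the same result, so either can be passed to lapply(.SD, ...). Note that paste() turns a missing score into the string "Level NA", so the letter version needs an extra check if the data can contain NA.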