How to use data.table fifelse with vectors in the arguments?

Posted by flmtquvp on 2023-05-26 in Other


Suppose I have this data.frame:

DF <- data.frame(one=c(1, NA, NA, 1, NA, NA), two=c(NA,1,NA, NA, NA,1), 
         three=c(NA,NA, 1, NA, 1,NA))

one    two  three         output
  1     NA    NA             one
 NA      1    NA             two
 NA     NA     1           three
  1     NA    NA             one  
 NA     NA     1           three
 NA      1    NA             two

The columns are mutually exclusive: each row has exactly one non-NA value.
I need to generate the output

output=c("one","two","three","one","three", "two")

I tried data.table's fifelse, but

with(DF,fifelse(one==1, "one", fifelse(two==1,"two", "three", na="three"), 
   na=fifelse(two==1,"two", "three", na="three")))

Error in fifelse(one == 1, "one", fifelse(two == 1, "two", "three", na = "three"),  : 
  Length of 'na' is 6 but must be 1

It seems that fifelse does not accept a vector for its na argument.
dplyr's if_else works fine here.

with(DF,if_else(one==1, "one", if_else(two==1,"two", "three", missing="three"), 
   missing=if_else(two==1,"two", "three", missing="three")))

How can I get the same output with data.table?
Any other simple alternative is welcome too. In base R I could use

apply(DF,1, function(x) which(!is.na(x)))

and then replace those numbers with the corresponding strings.

Source: https://stackoverflow.com/questions/76322685/how-to-use-data-table-fifelse-with-vectors-in-the-arguments

3 Answers


jfewjypa #1

Another data.table alternative:

for (col in names(DF)) set(DF, which(DF[[col]] == 1), j = "output", value = col)
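
A self-contained sketch of the loop above. It assumes the data is first copied into a data.table (DT and value_cols are names chosen for this sketch, not part of the original answer), so that set() has a data.table to add the new column to by reference:

```r
library(data.table)

DF <- data.frame(one   = c(1, NA, NA, 1, NA, NA),
                 two   = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

# Assumption: work on a data.table copy so the original DF stays untouched
DT <- as.data.table(DF)
value_cols <- names(DF)  # fixed before the "output" column exists

for (col in value_cols) {
  # which() drops the NA comparisons, leaving only the rows where col equals 1
  set(DT, i = which(DT[[col]] == 1), j = "output", value = col)
}

output <- DT$output
```

Rows in which none of the columns equals 1 would keep NA in output.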

Answered 2023-05-26

deyfvvtc #2

If each row has exactly one non-NA value, you can try max.col

> names(DF)[max.col(!is.na(DF))]
[1] "one"   "two"   "three" "one"   "three" "two"

or col + na.omit (but this may be slow if you are after speed)

> names(DF)[na.omit(c(t(col(DF) * DF)))]
[1] "one"   "two"   "three" "one"   "three" "two"

Benchmark

microbenchmark(
    f1 = names(DF)[max.col(!is.na(DF))],
    f2 = names(DF)[na.omit(c(t(col(DF) * DF)))]
)

gives

Unit: microseconds
 expr   min     lq    mean median    uq    max neval
   f1  28.5  51.45  92.343  64.40  91.8 1532.5   100
   f2 300.7 527.65 634.755 595.35 691.5 2405.4   100
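
One caveat on the max.col approach: by default max.col breaks ties at random, which only matters if a row has no non-NA value or more than one. A minimal sketch, assuming you want a deterministic result and NA for empty rows (the names present and idx are chosen for this sketch):

```r
DF <- data.frame(one   = c(1, NA, NA, 1, NA, NA),
                 two   = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

present <- !is.na(DF)  # logical matrix: TRUE where a value exists

# ties.method = "first" makes the choice deterministic when a row has ties
idx <- max.col(present, ties.method = "first")
output <- names(DF)[idx]

# Rows without any non-NA value would otherwise be labelled with the first column
output[rowSums(present) == 0] <- NA_character_
```

With the example data every row has exactly one TRUE, so the result matches the output shown above.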

Answered 2023-05-26

ct2axkht #3

fifelse is not the best tool here; I suggest fcase, which is easier:

data.table

library(data.table)
as.data.table(DF)[, fcase(one == 1, "one", two == 1, "two", three == 1, "three")]
# [1] "one"   "two"   "three" "one"   "three" "two"
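
If you still want to stay with fifelse, one possible workaround (a sketch, not part of the original answer) is to make the test itself NA-free, so the na argument is never needed. %in% returns FALSE instead of NA for missing values:

```r
library(data.table)

DF <- data.frame(one   = c(1, NA, NA, 1, NA, NA),
                 two   = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

# `one %in% 1` is TRUE/FALSE only, so no branch of fifelse() ever sees NA
output <- with(DF, fifelse(one %in% 1, "one",
                   fifelse(two %in% 1, "two", "three")))
```

As in the nested call from the question, every row that is neither "one" nor "two" falls through to "three", so this relies on the columns being mutually exclusive.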

dplyr

The dplyr analogue is case_when:

library(dplyr)
with(DF, case_when(one == 1 ~ "one", two == 1 ~ "two", three == 1 ~ "three"))
# [1] "one"   "two"   "three" "one"   "three" "two"

base R

Both the data.table and dplyr implementations assume the column names are known in advance. A base R approach that does not need them:

colnames(DF)[apply(DF, 1, which.max)]
# [1] "one"   "two"   "three" "one"   "three" "two"

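(Incidentally, which.max could just as well be which.min; we are really only looking for the one non-NA value in each row.)
If you have other columns that should not be considered, you will need to subset DF inside apply(DF, ...) so that it only looks at the desired columns.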
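
A minimal sketch of that subsetting, assuming a hypothetical extra column id that must be ignored (id, cols and idx are names invented for this example). Note that the row-wise index then refers to positions within cols, not within colnames(DF):

```r
DF <- data.frame(id    = letters[1:6],  # hypothetical column to leave out
                 one   = c(1, NA, NA, 1, NA, NA),
                 two   = c(NA, 1, NA, NA, NA, 1),
                 three = c(NA, NA, 1, NA, 1, NA))

cols <- c("one", "two", "three")

# which.max ignores NA, so it returns the position of the single non-NA value
idx <- apply(DF[cols], 1, which.max)

# Index into cols (not colnames(DF)), because idx is relative to the subset
output <- cols[idx]
```

This still assumes every row has at least one non-NA value among the selected columns; for an all-NA row which.max returns a zero-length result and apply would no longer return a plain vector.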

Answered 2023-05-26
