What is the best way to turn variables with either an answer or NA into dummy variables in R?

Posted by inn6fuwd on 2023-03-27 in Other


I have example data as follows:

df <- data.frame(Q1_A = c("This is a reason", NA, "This is a reason", NA),
                 Q1_B = c("This is another reason", "This is another reason", NA, NA))

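Each question has multiple possible answers, so the answers had to be split into separate columns. The NAs are therefore not really NAs; they just mean that the answer was not chosen.
I would like to run a regression of the form: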

lm(y ~ Q1_A + Q1_B + ...)

and have the output show up as:

Coefficients:
(Intercept)         Q1_A         Q1_B
   34.66099     -0.02058     -1.58728

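I think this means that I need to convert all NA values to a base level.
What is the best way to turn these variables into dummy variables?
Expected output: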

df <- data.frame(Q1_A = c("This is a reason", "Baselevel", "This is a reason", "Baselevel"),
                 Q1_B = c("This is another reason", "This is another reason", "Baselevel", "Baselevel"))

Source: https://stackoverflow.com/questions/75799680/what-is-the-best-way-to-turn-variables-with-either-an-answer-or-na-into-dummy-va

2 Answers


#1 nfzehxib

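When dealing with data like this, we usually convert the reason columns into 0/1 dummies, with the column name indicating the reason. When the reasons are rather long, we use a lookup data.frame to find the reason text for a column name when needed.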

library(dplyr)
library(tidyr)

# recode each answer column: 1 if an answer was given, 0 if it is NA
df %>%
  mutate(across(Q1_A:Q1_B, ~ ifelse(!is.na(.x), 1, 0)))

#>   Q1_A Q1_B
#> 1    1    1
#> 2    0    1
#> 3    1    0
#> 4    0    0

# create lookup df and use when necessary
lookup_df <- df %>%
  summarise(across(everything(), ~ na.omit(unique(.x)))) %>% 
  pivot_longer(everything())

lookup_df
#> # A tibble: 2 × 2
#>   name  value                 
#>   <chr> <chr>                 
#> 1 Q1_A  This is a reason      
#> 2 Q1_B  This is another reason
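
To connect the 0/1 dummies to the regression asked about in the question, here is a minimal sketch. The outcome vector `y` and the object names `dummies` and `fit` are assumptions made up for illustration; the question does not show an outcome variable.

```r
library(dplyr)

df <- data.frame(
  Q1_A = c("This is a reason", NA, "This is a reason", NA),
  Q1_B = c("This is another reason", "This is another reason", NA, NA)
)

# Hypothetical outcome, invented for this sketch only.
y <- c(10, 12, 9, 15)

# 1 if the reason was selected, 0 if the cell is NA.
dummies <- df %>%
  mutate(across(Q1_A:Q1_B, ~ as.integer(!is.na(.x))))

# Numeric 0/1 columns enter the model under their own column names.
fit <- lm(y ~ Q1_A + Q1_B, data = dummies)
names(coef(fit))
#> [1] "(Intercept)" "Q1_A"        "Q1_B"
```

Because the columns are numeric, the coefficient labels stay short (`Q1_A`, `Q1_B`), which should match the output layout requested in the question.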

Data from OP

df <- data.frame(Q1_A = c("This is a reason", NA, "This is a reason", NA),
                 Q1_B = c("This is another reason", "This is another reason", NA, NA))

Created on 2023-03-21 with reprex v2.0.2


#2 dsekswqp

Using tidyr::replace_na:

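library(dplyr)

# replace NA with "Baselevel", convert to factor, and make "Baselevel" the reference level
df |>
  mutate(across(starts_with("Q"),
                ~ relevel(as.factor(tidyr::replace_na(.x, "Baselevel")), ref = "Baselevel")))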

For Q1_A, you will get

[1] This is a reason Baselevel        This is a reason Baselevel       
Levels: Baselevel This is a reason
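
As a rough sketch of how the releveled factors behave in the regression from the question: the outcome `y` and the names `releveled` and `fit` are hypothetical, introduced only for this example. With R's default treatment contrasts, `lm()` drops the reference level and labels each coefficient with the column name followed by the remaining level.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Q1_A = c("This is a reason", NA, "This is a reason", NA),
  Q1_B = c("This is another reason", "This is another reason", NA, NA)
)

# Hypothetical outcome, invented for this sketch only.
y <- c(10, 12, 9, 15)

# NA becomes the explicit reference level "Baselevel".
releveled <- df |>
  mutate(across(starts_with("Q"),
                ~ relevel(factor(replace_na(.x, "Baselevel")), ref = "Baselevel")))

fit <- lm(y ~ Q1_A + Q1_B, data = releveled)
names(coef(fit))
#> [1] "(Intercept)"                "Q1_AThis is a reason"
#> [3] "Q1_BThis is another reason"
```

The fit is the same kind of model as one built on 0/1 dummies; only the coefficient labels differ, since they carry the full answer text.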
