Replacing coded data with label text from a library in R

Posted by bjg7j2ky on 2023-04-18 in Other


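I have a data frame with about 5000 columns and 1000 rows that contains coded data (factors). Defining each label by hand (there are about 20000) is too tedious. Is there a way in R to replace the codes with label text, and the variable names with names that were previously defined in a library?
1- How do I decode each variable (replace it with the variable name text from the library, which is an Excel file)? 2- How do I decode the data (replace the coded values with their labels)?
Here is an example of the data and the library I might have: library
Data
Thanks, everyone!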

r

Source: https://stackoverflow.com/questions/76006962/replace-coded-data-by-label-text-from-a-library

1 Answer


dojqjjoe1#

First, here is a toy example of what I assume you want to achieve.
Given

## Toy Library
library <- data.frame(variable = c("V1", "V1", "V2", "V2", "V2"),
                      name = c("var name 1", "var name 1", "var name 2", "var name 2", "var name 2"),
                      code = c(1,2,1,2,3),
                      label = c("x1", "x2", "A", "B", "C"))

## Toy Data
data <- data.frame(Subj = seq(1,10),
                   V1 = sample(c(1,2), 10, replace = TRUE),
                   V2 = sample(c(1,2,3), 10, replace = TRUE))

you want a new data frame data_recoded:

data_recoded
#    Subj var name 1 var name 2
# 1     1         x1          C
# 2     2         x2          C
# 3     3         x2          B
# 4     4         x2          C
# 5     5         x2          C
# 6     6         x1          B
# 7     7         x1          B
# 8     8         x1          C
# 9     9         x1          C
# 10   10         x1          A

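? (Note that this new data frame contains non-syntactic variable names. Because the toy data is drawn with sample(), your values will differ from the ones shown unless you set a seed.)
R provides the factor data type for handling coded categorical data. By converting each column of the data frame to a factor, you can "recode" the codes with their labels.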
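As a minimal sketch of the idea, the snippet below recodes a single vector with factor(). The vector codes and its values are made up for illustration; the point is that levels takes the codes and labels takes the matching text in the same order.

```r
## Codes as they might appear in one column (hypothetical values)
codes <- c(1, 2, 2, 1, 3)

## levels = the codes, labels = the text for each code, in the same order
f <- factor(codes, levels = c(1, 2), labels = c("x1", "x2"))

## Labels as plain text
as.character(f)

## Code 3 has no entry in levels, so it is turned into NA
is.na(f)
```

Any code that is missing from the library silently becomes NA, so it can be worth counting the NA values after recoding.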
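First, you need to extract the factor levels and labels for each variable from the library, for example with the dplyr package. (The toy library is stored in an object named library; the call library(dplyr) still works, because R looks for a function when a name is used in a call.)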

library(dplyr)

## Create dictionary of factor levels and labels from toy library
dictionary <- library %>%
  group_by(variable) %>%
  summarise(codes = list(code),
            levels = list(label),
            names = unique(name))
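The dictionary step assumes that every variable has exactly one name and that no code is listed twice for the same variable. The base R sketch below checks both assumptions before you build the dictionary. The toy library is rebuilt here under the hypothetical name lib so that the snippet runs on its own.

```r
## Toy library, rebuilt under the hypothetical name 'lib'
lib <- data.frame(variable = c("V1", "V1", "V2", "V2", "V2"),
                  name = c("var name 1", "var name 1",
                           "var name 2", "var name 2", "var name 2"),
                  code = c(1, 2, 1, 2, 3),
                  label = c("x1", "x2", "A", "B", "C"))

## Number of distinct names per variable; every entry should be 1
names_per_var <- tapply(lib$name, lib$variable,
                        function(x) length(unique(x)))
names_per_var

## Index of the first repeated (variable, code) pair, or 0 if there is none
first_duplicate <- anyDuplicated(lib[c("variable", "code")])
first_duplicate
```

If either check fails on the real library, it is probably easier to fix the Excel file than to work around the problem in the recoding function.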

Next, you need a function that recodes a single data frame column, which you can later apply to all columns. For example:

## Function to replace codes with labels
recode_variable <- function(var) {
  ## Get column 'var' from data frame
  column <- data[, var]
  ## Get factor levels and labels for variable 'var'
  varlevels <- dictionary %>% filter(variable == var)
  ## Create new data frame from column as factor
  df <- data.frame(name = factor(column,
                                 levels = varlevels$codes[[1]],
                                 labels = varlevels$levels[[1]])
                   )
  ## Replace column name with name from dictionary
  names(df) <- varlevels$names
  ## Return data frame
  return(df)
}
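The function above reads data and dictionary from the global environment. A possible variant, sketched here in base R, takes the data frame and the library as arguments instead and stops with a message when a variable has no library entries. The names recode_variable2, toy_data and lib are hypothetical and not part of the original answer.

```r
## Toy library under the hypothetical name 'lib'
lib <- data.frame(variable = c("V1", "V1", "V2", "V2", "V2"),
                  name = c("var name 1", "var name 1",
                           "var name 2", "var name 2", "var name 2"),
                  code = c(1, 2, 1, 2, 3),
                  label = c("x1", "x2", "A", "B", "C"))

## Small fixed data set so the result is reproducible
toy_data <- data.frame(Subj = 1:3, V1 = c(1, 2, 1))

## Variant of recode_variable() without global variables
recode_variable2 <- function(var, data, lib) {
  ## Library rows that belong to this variable
  entries <- lib[lib$variable == var, ]
  if (nrow(entries) == 0) {
    stop("no library entries for variable: ", var)
  }
  ## One-column data frame holding the recoded factor
  out <- data.frame(factor(data[[var]],
                           levels = entries$code,
                           labels = entries$label))
  ## Use the descriptive name from the library as column name
  names(out) <- entries$name[1]
  out
}

res <- recode_variable2("V1", toy_data, lib)
res
```

Passing the inputs explicitly makes the function easier to test on a small subset before running it over all 5000 columns.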

Finally, extract the names of the columns to be recoded

## Get original names of columns to be recoded
columnnames <- names(data) %>% setdiff("Subj")

and use map_dfc() from the purrr package to apply the function to all selected columns and column-bind the resulting data frames:

library(purrr)

## Recode columns
data_recoded <- columnnames %>%
  ## Apply function to all variables in columnnames, cbind columns to new data frame
  map_dfc(~ recode_variable(.)) %>%
  ## Add Subject ID back as first column
  mutate(Subj = data$Subj, .before = 1)

data_recoded
#    Subj var name 1 var name 2
# 1     1         x1          C
# 2     2         x2          C
# 3     3         x2          B
# 4     4         x2          C
# 5     5         x2          C
# 6     6         x1          B
# 7     7         x1          B
# 8     8         x1          C
# 9     9         x1          C
# 10   10         x1          A
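Because the recoded column names contain spaces, they have to be quoted with backticks or accessed by string. The sketch below shows this on a small hand-built stand-in for data_recoded (the values, including the unlisted code 9, are made up), and also counts the NA values per column as a quick check for codes that had no label in the library.

```r
## Hand-built stand-in for data_recoded; code 9 is not in the levels
data_recoded <- data.frame(
  Subj = 1:4,
  `var name 1` = factor(c(1, 2, 9, 1),
                        levels = c(1, 2),
                        labels = c("x1", "x2")),
  check.names = FALSE  # keep the non-syntactic column name as is
)

## Two equivalent ways to access a non-syntactic column name
data_recoded$`var name 1`
data_recoded[["var name 1"]]

## NA count per column; a non-zero count points at codes without a label
na_counts <- colSums(is.na(data_recoded))
na_counts
```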

