R: Creating a data frame with column characteristics from separate table rows

9fkzdhlc  posted on 2023-02-17  in Other

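I have a descriptive auxiliary table whose rows specify the characteristics of variables: varCat gives the variable category, rept is the number of consecutive repetitions of that category, and form is the data format: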

require(dplyr)
require(tidyr)
require(purrr)

descr <- tibble(
  varCat = c("a", "b"),
  rept = c(1, 3),
  form = c("text", "num")
)
descr
#> # A tibble: 2 × 3
#>   varCat  rept form 
#>   <chr>  <dbl> <chr>
#> 1 a          1 text 
#> 2 b          3 num

What I would like to obtain is the following (empty) data frame:

d
#> # A tibble: 0 × 4
#> # … with 4 variables: a <chr>, b_1 <dbl>, b_2 <dbl>, b_3 <dbl>

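Created on 2022-09-27 with reprex v2.0.2
Two steps are involved:
1. varCat and rept from the auxiliary table together determine the column names of the target data frame: if rept equals 1, no suffix is applied; if rept is greater than 1, a sequence of suffixed columns is created;
2. the type of each column is read from descr$form.
I have managed to implement these steps, although in a way that feels rather clumsy: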

# Step 1:
tmp <- descr %>%
  uncount(rept, .id = "rept") %>%
  group_by(varCat) %>%
  mutate(
    n = n(),
    var = case_when(
      n > 1 ~ paste0(varCat, "_", rept),
      TRUE ~ varCat
    )
  ) %>%
  ungroup %>%
  select(var, form)
c <- tmp$var
d <- matrix(ncol = length(c), nrow = 0) %>%
  as_tibble(.name_repair = "unique") %>%
  set_names(c)

# Step 2:
for (i in colnames(d)) {
  for (j in seq_along(tmp$var)) {
    if (tmp$var[j] == i & tmp$form[j] == "text") d[i] <- as.character(d[i]) else
    if (tmp$var[j] == i & tmp$form[j] == "num") d[i] <- as.numeric(d[i])
  }
}
d
#> # A tibble: 0 × 4
#> # … with 4 variables: a <chr>, b_1 <dbl>, b_2 <dbl>, b_3 <dbl>

Created on 2022-09-27 with reprex v2.0.2
I am sure there must be a more concise way to achieve this. Any help would be much appreciated.

Answer 1 (xesrikrc)

Use mapply with a custom function that returns a list, then call data.frame to convert the list to a data.frame:

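foo <- function(varCat, rept, form){
  # map the format label ("text"/"num") to an R vector mode
  f <- setNames(c("character", "numeric"), c("text", "num"))[ form ]
  # one zero-length vector of that mode per repetition
  x <- rep(list(vector(mode = f)), rept)
  x <- setNames(x, rep(varCat, rept))
  # add _1, _2, ... suffixes only when the category repeats
  if(rept > 1) names(x) <- paste(names(x), seq_along(x), sep = "_")
  x
}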

out <- data.frame(mapply(foo, descr$varCat, descr$rept, descr$form,
                         USE.NAMES = FALSE))

#check the output
out
# [1] a   b_1 b_2 b_3
# <0 rows> (or 0-length row.names)
str(out)
# 'data.frame': 0 obs. of  4 variables:
# $ a  : chr 
# $ b_1: num 
# $ b_2: num 
# $ b_3: num
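
To see why the data.frame() call works here, it can help to look at what mapply returns before the conversion. The following is only a sketch under stated assumptions, not the answerer's code: make_cols is a hypothetical stand-in for foo, and the explicit SIMPLIFY = FALSE plus do.call(c, ...) flattening is just one way to make the intermediate nested list visible.

```r
descr <- data.frame(
  varCat = c("a", "b"),
  rept   = c(1, 3),
  form   = c("text", "num"),
  stringsAsFactors = FALSE
)

# make_cols is a hypothetical helper (not from the original answer):
# it returns a named list of zero-length vectors for one row of descr
make_cols <- function(varCat, rept, form) {
  proto <- switch(form, text = character(0), num = numeric(0))
  nms <- if (rept > 1) paste(varCat, seq_len(rept), sep = "_") else varCat
  setNames(rep(list(proto), rept), nms)
}

# SIMPLIFY = FALSE keeps the result as a plain list, one element per row
pieces <- mapply(make_cols, descr$varCat, descr$rept, descr$form,
                 SIMPLIFY = FALSE, USE.NAMES = FALSE)
lengths(pieces)  # number of columns contributed by each row of descr

# flatten one level so every column is a top-level list element
flat <- do.call(c, pieces)
out2 <- as.data.frame(flat, stringsAsFactors = FALSE)
str(out2)
```

Flattening explicitly makes it clear that each descr row contributes a small named list, and that the final data frame is just those lists concatenated.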

Answer 2 (wz3gfoph)

Similar to @zx8754's answer, but also producing the a/b_1/b_2/b_3 naming:

as.data.frame(
    list("text"=character(0), "num"=numeric(0))[rep(descr$form, descr$rept)],
    col.names=paste0(
        rep(descr$varCat, descr$rept),
        unlist(lapply(descr$rept, \(x) if(x > 1) paste0("_", sequence(x)) else "" ))
    )
)
##[1] a   b_1 b_2 b_3
##<0 rows> (or 0-length row.names)

The key element is as.data.frame.list, which lets the subsetted list() of column types be named directly through the col.names= argument.
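
The one-liner above combines three small steps. As a rough sketch (the intermediate names forms, protos, cols, suffix and col_names are introduced here for illustration only and do not appear in the answer), they can be separated like this:

```r
descr <- data.frame(
  varCat = c("a", "b"),
  rept   = c(1, 3),
  form   = c("text", "num"),
  stringsAsFactors = FALSE
)

# 1. repeat each format label as many times as its category repeats
forms <- rep(descr$form, descr$rept)
forms
#> [1] "text" "num"  "num"  "num"

# 2. index a list of zero-length prototypes by those labels;
#    indexing by name may pick the same element several times
protos <- list(text = character(0), num = numeric(0))
cols <- protos[forms]

# 3. build the suffixes: "" for a single column, "_1", "_2", ... otherwise
suffix <- unlist(lapply(descr$rept, function(n) {
  if (n > 1) paste0("_", seq_len(n)) else ""
}))
col_names <- paste0(rep(descr$varCat, descr$rept), suffix)
col_names
#> [1] "a"   "b_1" "b_2" "b_3"

# col.names= replaces the duplicated "text"/"num" list names
res <- as.data.frame(cols, col.names = col_names)
str(res)
```

The prototype list is where further formats would be added, assuming they can be represented as zero-length vectors in the same way.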

Answer 3 (p1iqtdky)

Using purrr::pmap and dplyr::bind_cols, a tidyverse approach could look like this:

library(dplyr)
library(purrr)

descr <- tibble(
  varCat = c("a", "b"),
  rept = c(1, 3),
  form = c("text", "num")
)

purrr::pmap(descr, function(varCat, rept, form) {
  col_type <- switch(form,
                     "text" = character(0),
                     "num" = numeric(0)
  )
  d <- bind_cols(map(seq(rept), ~ col_type))
  names(d) <- if (rept > 1) {
    paste(varCat, seq(rept), sep = "_")    
  } else {
    varCat
  }
  d
}) %>%
  bind_cols()
#> New names:
#> • `` -> `...1`
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> • `` -> `...3`
#> # A tibble: 0 × 4
#> # … with 4 variables: a <chr>, b_1 <dbl>, b_2 <dbl>, b_3 <dbl>
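
The "New names" messages in the output above come from binding unnamed columns before they are renamed. One possible variant, sketched here as an assumption rather than as part of the original answer, names each piece first and converts the flattened list with tibble::as_tibble, which should avoid the name-repair step altogether:

```r
library(purrr)
library(tibble)

descr <- tibble(
  varCat = c("a", "b"),
  rept = c(1, 3),
  form = c("text", "num")
)

# each call returns a named list of zero-length columns for one descr row
cols <- pmap(descr, function(varCat, rept, form) {
  proto <- switch(form,
                  "text" = character(0),
                  "num" = numeric(0)
  )
  nms <- if (rept > 1) paste(varCat, seq_len(rept), sep = "_") else varCat
  set_names(rep(list(proto), rept), nms)
})

# flatten one level (base R, so no dependency on a recent purrr release)
# and convert the named list of columns to a tibble
d <- as_tibble(unlist(cols, recursive = FALSE))
d
```

Because every column already carries its final name when the tibble is created, no placeholder names such as `...1` are generated along the way.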
