How to build a subset of a dataset from files showing a specific pattern in their file names, using for loop indices

Posted by hts6caw3 on 2023-04-18 in Other


I have many files to analyze, all named according to the following scheme (for convenience, I report a simplified reprex here):

A_(ele)_c.xls
A_(ele)_d.xls
A_(ele)_e.xls

B_(ele)_c.xls
B_(ele)_d.xls
B_(ele)_e.xls

What I am trying to do is store them in one nested list, so that:

list$A would contain

c
d
e

and:

list$B would contain

c
d
e

So I used the following code:

nm1 = c(LETTERS[1:2])
nm2 = c(letters[3:5])

list = NULL
for (i in Files){
  for (j in nm1){
    for (k in nm2){
      list[[j]][[k]][[i]] = if(str_detect(i, pattern = j) == TRUE){read_excel(i)} 
    }
  }
}

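This seems to sort the elements correctly, but only for the j index taken from nm1. How can I modify the pattern argument of str_detect() so that the k index is taken into account as well?
Feel free to suggest alternatives.
Thanks.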

Source: https://stackoverflow.com/questions/76021489/how-to-build-subset-of-datset-with-files-showing-specifc-pattern-in-file-names-u

1 Answer


qyswt5oh1#

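Unless you have a particular reason to create many objects, I suggest importing the files as a single list. You can then combine the list elements into one dataset (assuming they all share the same structure) and create indicator columns for the uppercase prefix (A, B, ...) and the lowercase suffix (c, d, e, ...). I also suggest using fs::dir_ls() or dir() to build the file paths by inspecting what is actually in the folder.
The workflow could look like this: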

library(tidyverse)
library(fs)

# create vector of file paths (only the .xls files in the working directory)
paths <- fs::dir_ls(glob = "*.xls")

# this vector could look like this
c(
  "A_(ele)_c.xls",
  "A_(ele)_d.xls",
  "A_(ele)_e.xls",
  "B_(ele)_c.xls",
  "B_(ele)_d.xls",
  "B_(ele)_e.xls"
)

# read in all excel files into a single list
dat <- map(paths, readxl::read_xls)

# row-bind all datasets (if columns match) and create indices for A, B, C as well as c, d, e
dat |> 
  list_rbind(names_to = "filename") |> 
  mutate(
    id1 = str_extract(filename, "^\\w"),
    id2 = str_extract(filename, "_(\\w)\\.xls$", group = 1)
  ) |> 
  select(!filename)
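
If a nested list such as list$A$c is still the goal, the pattern passed to str_detect() can be built from both indices with paste0(). The following is only a minimal sketch, assuming every file name follows the scheme <prefix>_(...)_<suffix>.xls shown in the question. The names files and res are illustrative and not from the original post, and the file names are stored instead of being read so that the example runs without any files on disk:

```r
library(stringr)

# Illustrative file names following the scheme from the question
files <- c(
  "A_(ele)_c.xls", "A_(ele)_d.xls", "A_(ele)_e.xls",
  "B_(ele)_c.xls", "B_(ele)_d.xls", "B_(ele)_e.xls"
)

nm1 <- LETTERS[1:2]   # prefixes: "A", "B"
nm2 <- letters[3:5]   # suffixes: "c", "d", "e"

# Outer level is named by prefix, inner level by suffix
res <- lapply(setNames(nm1, nm1), function(j) {
  lapply(setNames(nm2, nm2), function(k) {
    # Anchor j at the start and k just before the extension,
    # so that both indices have to match
    pattern <- paste0("^", j, "_.*_", k, "\\.xls$")
    hit <- files[str_detect(files, pattern)]
    # With real files, this could be readxl::read_xls(hit) instead
    hit
  })
})

res$A$c
res$B$e
```

Anchoring with ^ and $ keeps a prefix letter from matching somewhere else in the name, which is what happens when the bare letter j is used as the pattern.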
