How do I merge files from separate directories in R?

Posted by qmelpv7a on 2023-04-03 in Other


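I have a folder named simulations that contains 100 subfolders, each holding the results of one simulation. Each simulation's results are split across four separate files named seq[1].nex, seq[2].nex, seq[3].nex, and seq[4].nex. Every one of these files has the same format, shown below: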

#NEXUS

Begin data;
Dimensions ntax=5 nchar=55;
Format datatype=Standard symbols="01" missing=? gap=-;
Matrix
L1   1100110010010100010110000110000010000100001011010010110
L2   1101110110011010010000010111000010010000001001010110110
L3   0111111100010100010011000001100011010100010010110011110
L4   1101110110011010010000010111000010010000001001010110110
L5   1101110100110100010110010110001010010100001011010110100
;
End;

The seq files all have the same number of rows (i.e., L1-L5), but their rows differ in length. For example, seq[2].nex looks like this:

#NEXUS

Begin data;
Dimensions ntax=5 nchar=20;
Format datatype=Standard symbols="012" missing=? gap=-;
Matrix
L1   10000012202011210001
L2   10002112212010210012
L3   10002112212210220022
L4   10002112212010220012
L5   10001112212010222012 
;
End;

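For each of the 100 subfolders, I want to merge seq[1].nex, seq[2].nex, seq[3].nex, and seq[4].nex into a single file, seq.nex. That is, I want to append the lines from files 2-4 to the corresponding lines of the first file. Using the two examples above, my desired output looks like this: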

#NEXUS

Begin data;
Dimensions ntax=5 nchar=55;
Format datatype=Standard symbols="01" missing=? gap=-;
Matrix
L1   110011001001010001011000011000001000010000101101001011010000012202011210001
L2   110111011001101001000001011100001001000000100101011011010002112212010210012
L3   011111110001010001001100000110001101010001001011001111010002112212210220022
L4   110111011001101001000001011100001001000000100101011011010002112212010220012
L5   110111010011010001011001011000101001010000101101011010010001112212010222012
;
End;

I then want to repeat this process for each of the 100 subfolders.

r

Source: https://stackoverflow.com/questions/75892483/how-do-i-merge-files-from-separate-directories-in-r

1 Answer


oalqel3c1#

Here is one approach:

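library(data.table)

# Path to the simulations folder
pth_to_simulations <- "simulations"

# List all subfolders (directories only), with full paths
fldrs <- list.dirs(pth_to_simulations, full.names = TRUE, recursive = FALSE)

# Read the matrix rows from every seq[i].nex file in one subfolder and
# concatenate the sequences taxon by taxon
read_sims <- function(fldr) {
  # Match only seq[1].nex, seq[2].nex, ... and not a previously written seq.nex
  files <- dir(fldr, pattern = "^seq\\[[0-9]+\\]\\.nex$", full.names = TRUE)
  # Skip the 6 header lines and read the 5 taxon rows; read everything as
  # character so leading zeros in the sequences are preserved
  sims <- lapply(files, fread, skip = 6, nrows = 5, header = FALSE,
                 colClasses = "character")
  # Stack all files, then paste the sequences together per taxon in file order.
  # (merge() takes only two tables at a time, so do.call(merge, ...) does not
  # extend to four files.)
  rbindlist(sims)[, .(V2 = paste0(V2, collapse = "")), by = V1]
}

# Apply the function to each subfolder in `simulations`
lapply(fldrs, read_sims)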

If the example files are in simulations/sim1, the result is:

[[1]]
   V1                                                                          V2
1: L1 110011001001010001011000011000001000010000101101001011010000012202011210001
2: L2 110111011001101001000001011100001001000000100101011011010002112212010210012
3: L3 011111110001010001001100000110001101010001001011001111010002112212210220022
4: L4 110111011001101001000001011100001001000000100101011011010002112212010220012
5: L5 110111010011010001011001011000101001010000101101011010010001112212010222012

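This output is a list of length 1 because there is only one folder (`sim1`). Your output will be a list of length 100, where each element contains the concatenated sequences for one subfolder.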
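The approach above returns the merged rows in memory, while the question asks for a seq.nex file in each subfolder. A minimal base-R sketch of that last step might look like the following. The names `merge_nexus`, `read_matrix`, and `out_name` are hypothetical and do not come from the original answer. The sketch assumes every file lists the same taxa and uses the `Standard` datatype with `?` and `-` as the missing and gap characters, as in the examples. It also recomputes `nchar` and `symbols` from the merged data, which is an assumption: the desired output in the question keeps the header of the first file unchanged.

```r
# Hypothetical helper: merge all seq[i].nex files in `fldr` into one NEXUS file
merge_nexus <- function(fldr, out_name = "seq.nex") {
  files <- dir(fldr, pattern = "^seq\\[[0-9]+\\]\\.nex$", full.names = TRUE)
  if (length(files) == 0) return(invisible(NULL))
  # Order by the number inside the brackets rather than alphabetically
  files <- files[order(as.integer(gsub("[^0-9]", "", basename(files))))]

  # Return the taxon rows found between "Matrix" and the closing ";"
  read_matrix <- function(f) {
    lines <- trimws(readLines(f))
    start <- grep("^matrix$", lines, ignore.case = TRUE)[1]
    end <- which(lines == ";")
    end <- end[end > start][1]
    rows <- lines[(start + 1):(end - 1)]
    parts <- strsplit(rows[nzchar(rows)], "\\s+")
    data.frame(taxon = vapply(parts, `[`, "", 1),
               seq = vapply(parts, `[`, "", 2),
               stringsAsFactors = FALSE)
  }

  mats <- lapply(files, read_matrix)
  taxa <- mats[[1]]$taxon
  # Assumption: same taxa in every file; align by name, not by row position
  seqs <- lapply(mats, function(m) m$seq[match(taxa, m$taxon)])
  merged <- do.call(paste0, seqs)

  # Rebuild the header from the merged data (assumed Standard datatype)
  chars <- unique(unlist(strsplit(merged, "")))
  symbols <- paste(sort(setdiff(chars, c("?", "-"))), collapse = "")
  out <- c("#NEXUS", "", "Begin data;",
           sprintf("Dimensions ntax=%d nchar=%d;", length(taxa), nchar(merged[1])),
           sprintf("Format datatype=Standard symbols=\"%s\" missing=? gap=-;", symbols),
           "Matrix",
           sprintf("%s   %s", taxa, merged),
           ";", "End;")
  writeLines(out, file.path(fldr, out_name))
  invisible(out)
}

# Write seq.nex into every subfolder of `simulations`
fldrs <- list.dirs("simulations", full.names = TRUE, recursive = FALSE)
invisible(lapply(fldrs, merge_nexus))
```

Locating the `Matrix` line instead of skipping a fixed number of header lines keeps the sketch working if a file has extra blank or comment lines before the matrix. If the header of the first file should be kept verbatim, as in the question's desired output, the two `sprintf` header lines could be replaced by the corresponding lines read from seq[1].nex.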

Answered 2023-04-03
