R: How do I compute the mean of the same (identically named) column across 100 different csv files whose names share a common part?

u7up0aaq posted on 2023-01-15 in Other

I have a bunch of csv files, each structured like this:

df <- data.frame (first_column  = c(3, 2, 6, 7),
                  second_column = c(7, 5, 1, 8))

All of the csv files have names like

"type1_1.csv"
"type1_2.csv"
...
"type2_1.csv"
"type2_2.csv"
...

Every csv has the columns first_column and second_column. I want to create a new data frame that looks like this:

# name        meanofsecond_column
# type1_1     5.25
# ...

What I have started doing is writing each one out separately:

type1_1 <- read_csv("type1_1.csv")
type1_1mean <- mean(type1_1$second_column)
...
df <- data.frame (name  = c("type1_1", "type1_2", ...),
                  meanofsecondcolumn = c(type1_1mean, type1_2mean, ...))

However, since there are more than 100 csv files, this approach is neither efficient nor clean. How can I make it more concise?


flseospp #1

# read_csv() comes from the readr package
library(readr)

# path where your csv files are (here the current working directory)
CSV_FOLDER <- "."

# list all csv files in the given directory
# the second parameter is a regex meaning "ends with .csv"
# the third parameter makes the function return file names with their path
csv_files <- list.files(CSV_FOLDER, "\\.csv$", full.names=TRUE)

# apply the given function to each file and collect the results in a list
res <- lapply(csv_files, function(csv_file) {
  # read current file
  tmp <- read_csv(csv_file)

  # build a data.frame from the file name (without path or extension,
  # to match the requested "type1_1" form) and the mean of second_column
  return(data.frame(
    name = tools::file_path_sans_ext(basename(csv_file)),
    meanofsecondcolumn = mean(tmp$second_column)
  ))
})

# rbind all single-row data.frames into a single data.frame
res <- do.call("rbind", res)
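
For anyone who wants to try the idea without touching their real files, here is a minimal self-contained sketch of the same approach in base R. It is only an illustration under a few assumptions: the demo folder name `csv_mean_demo` and the second example file are made up, `read.csv` is used instead of `read_csv` so that no extra package is needed, and `vapply` replaces the `lapply` + `rbind` combination.

```r
# Self-contained sketch: write two small example files to a temporary
# directory, then compute the mean of second_column for each of them.
csv_folder <- file.path(tempdir(), "csv_mean_demo")  # hypothetical demo folder
unlink(csv_folder, recursive = TRUE)                 # start from an empty folder
dir.create(csv_folder, showWarnings = FALSE)

# example data: the first file reuses the values from the question,
# the second file is invented for the demo
write.csv(data.frame(first_column  = c(3, 2, 6, 7),
                     second_column = c(7, 5, 1, 8)),
          file.path(csv_folder, "type1_1.csv"), row.names = FALSE)
write.csv(data.frame(first_column  = c(1, 2),
                     second_column = c(10, 20)),
          file.path(csv_folder, "type2_1.csv"), row.names = FALSE)

# list all csv files in the demo folder, with their full path
csv_files <- list.files(csv_folder, pattern = "\\.csv$", full.names = TRUE)

# vapply returns a plain numeric vector with one mean per file
means <- vapply(csv_files,
                function(f) mean(read.csv(f)$second_column),
                numeric(1))

# one row per file: file name without path or extension, plus its mean
res <- data.frame(
  name = tools::file_path_sans_ext(basename(csv_files)),
  meanofsecondcolumn = unname(means)
)
print(res)
```

Because `vapply` declares the expected return type (`numeric(1)`), a file whose `second_column` is missing or not numeric fails early instead of silently producing an odd result. If any file may contain missing values, `mean(..., na.rm = TRUE)` is probably what you want.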
