How can I use R to download multiple CSV files from a URL and download only a certain range of rows (e.g., rows 3-13)?

Posted by fjaof16o on 2023-07-31 in Other

```r
library(rvest)

url <- "https://www.misoenergy.org/markets-and-operations/real-time--market-data/market-reports/#nt=%2FMarketReportType%3AHistorical%20MCP%2FMarketReportName%3AASM%20Real-Time%20Final%20Market%20MCPs%20(csv)&t=10&p=0&s=MarketReportPublished&sd=desc"

# Scrape the webpage to extract the URLs
page <- read_html(url)
file_links <- page %>% html_nodes("a[href$='.csv']") %>% html_attr("href")

# Create a directory to store the downloaded files
dir.create("downloaded_files")

# Define the range of rows to extract from each file
start_row <- 3
end_row <- 13

# Loop over each file URL and download the file
for (file_link in file_links) {
  filename <- basename(file_link)
  file_path <- paste0("downloaded_files/", filename)
  
  # Download the file
  download.file(file_link, destfile = file_path)
  
  # Read the downloaded file
  data <- read.csv(file_path, skip = start_row - 1, nrows = end_row - start_row + 1)
  
  # Do something with the data, e.g., print the extracted rows
  print(data)
}
```

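It creates an empty folder on my desktop, but nothing seems to end up inside it. I'm not sure whether this is a problem with the code or whether something about the URL needs to be fixed.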
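A side note on the row range: with `read.csv()`'s default `header = TRUE`, `skip = start_row - 1` turns line 3 into the column names, so the 11 rows returned are actually lines 4-14. A minimal sketch of reading exactly lines 3-13, using a throwaway temporary file with made-up data (real MISO files may have a different layout, e.g. extra header lines):

```r
# Hypothetical sample file: 20 lines, no header, so line n holds id n
tmp <- tempfile(fileext = ".csv")
writeLines(paste(1:20, (1:20) * 10, sep = ","), tmp)

start_row <- 3
end_row <- 13

# Skip the lines before start_row, then read exactly the requested number of lines.
# header = FALSE keeps line 3 as data instead of consuming it as column names.
rows <- read.csv(tmp, header = FALSE,
                 skip = start_row - 1,
                 nrows = end_row - start_row + 1)

# rows$V1 holds the ids 3, 4, ..., 13
print(rows$V1)
```

If the file has a real header line you want to keep, one option is to read it separately (for example with `nrows = 1`) and assign it with `names()`.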

Answer by 72qzrwbm

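The problem seems to come from the code you use to extract the links. I think this is because {rvest} does not run JavaScript, while the website evidently needs it to display the documents. As a result, `html_nodes()` finds no `.csv` links in the static HTML, `file_links` is empty, and the `for` loop never runs, which is why the folder stays empty.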
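A small offline illustration of that failure mode. It assumes the HTML that `read_html()` receives contains no `<a>` elements pointing at `.csv` files; the markup below is hypothetical, not MISO's actual page. The selector matches nothing, `file_links` becomes `character(0)`, and the loop body never executes:

```r
library(rvest)

# Hypothetical static markup: the report list is empty until JavaScript fills it in
static_page <- read_html("<html><body><div id='reports'></div></body></html>")

file_links <- static_page %>%
  html_nodes("a[href$='.csv']") %>%
  html_attr("href")

# Zero matches, so a loop over file_links has nothing to iterate over
print(length(file_links))

iterations <- 0
for (file_link in file_links) {
  iterations <- iterations + 1
}

# A guard like this turns a silently empty folder into an explicit message
if (length(file_links) == 0) {
  message("No .csv links found in the static HTML; nothing will be downloaded")
}
```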
There is another option, though. The CSV links share a common structure, so I would do the following:

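```r
# Date range to fetch; seq() expects the earlier date first
date_from <- as.Date("2022-06-06")
date_to   <- as.Date("2023-06-06")

list_dates <- format(seq(date_from, date_to, by = "day"), "%Y%m%d")

# download.file() does not create the target folder, so make sure it exists
dir.create("downloaded_files", showWarnings = FALSE)

for (date in list_dates) {

  # The report URLs follow a fixed pattern: <YYYYMMDD>_asm_rtmcp_final.csv
  file_link <- paste0("https://docs.misoenergy.org/marketreports/",
                      date,
                      "_asm_rtmcp_final.csv")
  filename <- basename(file_link)
  file_path <- file.path("downloaded_files", filename)

  # Download the file (binary mode avoids line-ending changes on Windows)
  download.file(file_link, destfile = file_path, mode = "wb")

  # Read the local copy rather than fetching the URL a second time
  data <- read.csv(file_path, skip = 3, header = TRUE)

  # Do something with the data, e.g., print the extracted rows
  print(data)
}
```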
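Over a full year of dates, if any day has no published file, a single failed `download.file()` call stops the whole loop with an error. A minimal sketch of a hypothetical `safe_download()` helper that reports failure instead, assuming you would rather skip missing dates than abort:

```r
# Hypothetical helper: try one download and return TRUE/FALSE instead of
# stopping the caller's loop when the URL cannot be fetched.
safe_download <- function(url, destfile) {
  ok <- tryCatch({
    # download.file() returns 0 on success
    status <- download.file(url, destfile = destfile, mode = "wb", quiet = TRUE)
    status == 0
  },
  warning = function(w) FALSE,
  error = function(e) FALSE)

  # Remove any partial file left behind by a failed attempt
  if (!ok && file.exists(destfile)) {
    file.remove(destfile)
  }
  ok
}
```

Inside the loop above, replacing the `download.file()` line with `if (!safe_download(file_link, file_path)) next` skips dates without a file before `read.csv()` runs.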

