library(rvest)
url <- "https://www.misoenergy.org/markets-and-operations/real-time--market-data/market-reports/#nt=%2FMarketReportType%3AHistorical%20MCP%2FMarketReportName%3AASM%20Real-Time%20Final%20Market%20MCPs%20(csv)&t=10&p=0&s=MarketReportPublished&sd=desc"
# Scrape the webpage to extract the URLs
page <- read_html(url)
file_links <- page %>% html_nodes("a[href$='.csv']") %>% html_attr("href")
# Create a directory to store the downloaded files
dir.create("downloaded_files")
# Define the range of rows to extract from each file
start_row <- 3
end_row <- 13
# Loop over each file URL and download the file
for (file_link in file_links) {
  filename <- basename(file_link)
  file_path <- paste0("downloaded_files/", filename)
  # Download the file
  download.file(file_link, destfile = file_path)
  # Read the downloaded file
  data <- read.csv(file_path, skip = start_row - 1, nrows = end_row - start_row + 1)
  # Do something with the data, e.g., print the extracted rows
  print(data)
}
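It created an empty folder on my desktop, but there doesn't seem to be anything in it. I'm not sure whether this is a problem with the code or whether something about the URL needs to be fixed.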
1 Answer
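The problem seems to come from the code you use to extract the links. I think this is because {rvest} does not execute JavaScript, and the website evidently needs it to display the documents.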
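One way to see this without touching the network is to look at the URL itself. Everything after the `#` is a fragment, which the browser keeps to itself and hands to the page's JavaScript; it is never sent to the server. The sketch below uses only base R, and the variable names `requested` and `fragment` are mine, not from the question.

```r
# URL copied from the question
url <- "https://www.misoenergy.org/markets-and-operations/real-time--market-data/market-reports/#nt=%2FMarketReportType%3AHistorical%20MCP%2FMarketReportName%3AASM%20Real-Time%20Final%20Market%20MCPs%20(csv)&t=10&p=0&s=MarketReportPublished&sd=desc"

# Part that is actually requested from the server (everything before "#")
requested <- sub("#.*$", "", url)

# Part that stays in the browser and is interpreted by the page's JavaScript
fragment <- sub("^[^#]*#", "", url)

print(requested)
print(utils::URLdecode(fragment))
```

So `read_html(url)` most likely receives only the generic market reports page, and if the report list is filled in by JavaScript afterwards, `file_links` ends up empty and the loop never runs. Checking `length(file_links)` right after the scraping step should confirm this.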
However, I think there is another option. The links to the CSV files share a similar structure, so I would do the following.
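A minimal sketch of that idea, under loudly stated assumptions: I am assuming the file names differ only by a date stamp, and `base_url` and `suffix` below are placeholders, not the real MISO values. Open one CSV link in the browser, copy its address, and adjust both before running the download. The helper names `build_report_urls` and `download_reports` are hypothetical.

```r
# PLACEHOLDERS (assumptions): replace with the host and file-name ending
# seen in a real CSV link copied from the browser
base_url <- "https://example.org/marketreports/"
suffix   <- "_report.csv"

# Build one URL per date, assuming names look like <base_url>YYYYMMDD<suffix>
build_report_urls <- function(dates, base_url, suffix) {
  paste0(base_url, format(dates, "%Y%m%d"), suffix)
}

# Download each URL into dest_dir and return the local paths invisibly
download_reports <- function(urls, dest_dir = "downloaded_files") {
  dir.create(dest_dir, showWarnings = FALSE)
  for (u in urls) {
    dest <- file.path(dest_dir, basename(u))
    # tryCatch so that one missing report does not stop the whole loop
    tryCatch(
      download.file(u, destfile = dest, mode = "wb"),
      error = function(e) message("Skipped ", u, ": ", conditionMessage(e))
    )
  }
  invisible(file.path(dest_dir, basename(urls)))
}

dates <- seq(as.Date("2023-01-01"), as.Date("2023-01-03"), by = "day")
urls  <- build_report_urls(dates, base_url, suffix)
print(urls)

# download_reports(urls)  # uncomment once base_url and suffix are real
```

After the download, the `read.csv(file_path, skip = ..., nrows = ...)` step from the question can be applied to each saved file unchanged.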