I have a 343-row dataset containing nothing but URLs to 343 websites that host Google Earth KMZ shapefiles I need to download. I wrote a for loop in R to fetch them, but the sites are somewhat unreliable and frequently return a "502 Bad Gateway" error; when the loop hits that error it crashes and stops. Because the errors are so frequent, I have not been able to download more than 29 of the 343 shapefiles I need. Is there a way to force R to keep re-requesting a site until it responds (i.e., the error goes away) and the shapefile downloads successfully, without skipping any links?
My code is attached below, including the packages I use:
loadandinstall <- function(mypkg) {
  if (!is.element(mypkg, installed.packages()[, 1])) {
    install.packages(mypkg, repos = "http://cran.r-project.org")
  }
  library(mypkg, character.only = TRUE)
}
loadandinstall("stringr")
loadandinstall("rvest")
loadandinstall("XML")
loadandinstall("maptools")
loadandinstall("rgeos")
loadandinstall("rgdal")
loadandinstall("foreign")
loadandinstall("raster")
loadandinstall("sp")
loadandinstall("parallel")
loadandinstall("snow")
# Read in the CSV with my 343 websites (embedded Google Maps spatial location data).
# basedir is defined elsewhere as the project directory.
locs <- read.csv(str_c(basedir, "AllPlantingLocationData_WebsiteSource_2019.csv"),
                 header = TRUE, colClasses = c("character", "character", "character"))
urlstring <- locs[, 1]  # source web pages to scrape
gurls     <- locs[, 2]  # will hold the extracted Google Maps download URLs
names     <- locs[, 3]  # will hold the file names of the downloaded KMLs
for (i in seq_along(urlstring)) {
  # Get the website source html code and convert to a searchable list
  t <- as.list(readLines(urlstring[i]))
  # Locate the Google Maps URL within the source code
  m <- unlist(t[which(!is.na(str_locate(t, "<iframe src=")[, 1]))])
  # Remove the extra characters surrounding the Google Maps URL
  gurl <- unlist(strsplit(substring(str_c(m), 14, nchar(m)), " "))[1]
  gurl <- substring(gurl, 1, nchar(gurl) - 1)
  # Replace the "embed" command in the Google Maps URL with a "kml" command,
  # to create a download-trigger URL
  gurls[i] <- str_c(gsub("embed", "kml", gurl))
  # Download the file
  browseURL(gurls[i])
  # Extract the name of the newly downloaded file into the new database
  Sys.sleep(10)  # leave the newest file enough time to download
  tmpshot <- fileSnapshot("/Users/badiskhiari/Downloads/")
  names[i] <- rownames(tmpshot$info[which.max(tmpshot$info$mtime), ])
  print(str_c("URL ", i, " complete!!"))
}
1 Answer
When downloading with httr2, you can use httr2::req_retry() to control the retry policy for a request, so a transient "502 Bad Gateway" is retried instead of aborting the loop. What remains is to turn that into a working example:
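A minimal sketch of what that could look like for one of the download-trigger URLs (your gurls[i]). Two assumptions to flag: 502 is not in httr2's default list of transient statuses (429 and 503), so it is supplied explicitly via is_transient; and max_tries, the backoff cap, and the fetch_with_retry / destfile names are illustrative placeholders, not values from your project:

library(httr2)

# Fetch a URL, retrying on 429/502/503 with exponential backoff instead of
# crashing on the first failure. max_tries = 10 and the 60-second backoff cap
# are assumptions to tune for your sites.
fetch_with_retry <- function(url, destfile) {
  request(url) |>
    req_retry(
      max_tries = 10,
      is_transient = function(resp) resp_status(resp) %in% c(429, 502, 503),
      backoff = function(attempt) min(2^attempt, 60)  # seconds to wait
    ) |>
    req_perform(path = destfile)  # stream the response body straight to disk
}

# In your loop this would replace browseURL(gurls[i]) and the fileSnapshot()
# bookkeeping; downloaddir is a placeholder for your download folder:
# fetch_with_retry(gurls[i], file.path(downloaddir, str_c("location_", i, ".kml")))

req_perform() raises an R error only after all retries are exhausted, so a site that eventually recovers never causes the loop to skip a link.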
Created on 2023-07-21 with reprex v2.0.2
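If you would rather not add a dependency, the same retry-until-success idea can be written with base R's tryCatch(). Again a hedged sketch; the function name and the max_tries / wait defaults are illustrative:

# Base-R alternative: keep retrying download.file() until it succeeds or we
# give up, pausing between attempts so an overloaded server can recover.
download_with_retry <- function(url, destfile, max_tries = 10, wait = 5) {
  for (attempt in seq_len(max_tries)) {
    ok <- tryCatch({
      download.file(url, destfile, mode = "wb", quiet = TRUE)
      TRUE
    }, error = function(e) FALSE, warning = function(w) FALSE)
    if (ok) return(invisible(destfile))
    Sys.sleep(wait)  # back off before the next attempt
  }
  stop("Failed to download ", url, " after ", max_tries, " attempts")
}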