R returns a 403 Forbidden error

7cwmlq89  posted on 2023-01-03  in Other

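The code below used to work, but the website I'm trying to download files from has added a user verification step. I've tried a few things, including making the code sleep between loop iterations, but nothing has worked so far. Any suggestions?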

library(tidyverse)
library(rvest)

page <-
 "https://burnsville.civicweb.net/filepro/documents/25657/" %>%
  read_html

df <- tibble(
  names1 = page %>%
    html_nodes(".document-link") %>%
    html_text2() %>%
    str_remove_all("\r") %>%
    str_squish(),
  links = page %>%
    html_nodes(".document-link") %>%
    html_attr("href") %>%
    paste0("https://burnsville.civicweb.net", .)
)

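# Download each link to a PDF named after the document
# (walk2 pairs one link with one name; the old call passed every link at once)
walk2(df$links, df$names1,
      ~ download.file(.x, destfile = paste0(.y, ".pdf"), mode = "wb"))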

# Loop through and download PDFs, reporting any failures
for (url in df$links) {
  tryCatch({
    download.file(url,
                  basename(url),
                  mode = "wb",
                  quiet = TRUE)
  }, error = function(e) message("Failed: ", url))
}

Thanks in advance!

nnt7mjpx  #1

library(tidyverse)
library(rvest)

page <-
  "https://burnsville.civicweb.net/filepro/documents/25657/" %>%
  read_html

docs <- tibble(
  names = page %>%
    html_nodes(".document-link") %>%
    html_text2() %>%
    str_remove_all("\r") %>%
    str_squish(),
  links = page %>%
    html_nodes(".document-link") %>%
    html_attr("href") %>%
    paste0("https://burnsville.civicweb.net", .), 
  file = str_extract(links, "[^/]*$")
)

map2(docs$links, docs$file, ~ download.file(url = .x, 
                                            destfile = str_c(.y, ".pdf"), 
                                            mode = "wb"))
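
If the server still answers 403, one thing that may be worth trying is sending a browser-like User-Agent header and spacing out the requests. The sketch below is only an assumption about what the site checks: if the verification step needs a login, cookies, or a CAPTCHA, a header alone will not get past it. The helper name `download_politely`, the `ua` string, and the `pause` value are hypothetical and not part of the original answer. The `headers` argument of `download.file()` requires R 3.6.0 or later.

```r
# Hypothetical helper: download each link with a User-Agent header,
# pause between requests, and record which downloads succeeded.
# `ua` and `pause` are illustrative values, not settings the site documents.
download_politely <- function(links, files,
                              ua = "Mozilla/5.0 (compatible; R download script)",
                              pause = 2,
                              fetch = utils::download.file) {
  stopifnot(length(links) == length(files))
  ok <- logical(length(links))
  for (i in seq_along(links)) {
    ok[i] <- tryCatch({
      # download.file() returns 0 (invisibly) on success
      status <- fetch(links[i], destfile = files[i], mode = "wb",
                      quiet = TRUE, headers = c("User-Agent" = ua))
      identical(as.integer(status), 0L)
    },
    error = function(e) {
      # Report the failure instead of silently swallowing it
      message("Failed: ", links[i], " (", conditionMessage(e), ")")
      FALSE
    })
    if (i < length(links)) Sys.sleep(pause)  # space out requests
  }
  data.frame(link = links, file = files, ok = ok, stringsAsFactors = FALSE)
}
```

With the `docs` tibble from above, it could be called as `download_politely(docs$links, str_c(docs$file, ".pdf"))`. The returned data frame shows which links failed, so they can be retried or inspected by hand.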
