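I'm writing an R function that takes a station number, navigates Canada Hydrometric (the Water Office site), and downloads all of the data for that station. I've run into some problems, possibly because the radio buttons and/or the search button have no name. Here's what I have: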
library(rvest)
station_number <- "08NM083"
url <- "https://wateroffice.ec.gc.ca/search/historical_e.html"
user_a <- httr::user_agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 12_0_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36")
# Start a browsing session and take the search form (the second form on the page)
my_session <- session(url, user_a)
form <- html_form(my_session)[[2]]
which gives:
<form> 'search-form' (GET https://wateroffice.ec.gc.ca/search/historical_results_e.html)
<field> (submit) : Search
<field> (radio) search_type: station_name
<field> (text) station_name:
<field> (radio) search_type: station_number
<field> (text) station_number:
<field> (radio) search_type: province
<field> (select) province: AB
<field> (radio) search_type: basin
<field> (select) basin:
<field> (radio) search_type: region
<field> (select) region: ATL
<field> (radio) search_type: coordinate
<field> (number) north_degrees:
<field> (number) north_minutes:
<field> (number) north_seconds:
<field> (number) south_degrees:
<field> (number) south_minutes:
<field> (number) south_seconds:
<field> (number) east_degrees:
<field> (number) east_minutes:
<field> (number) east_seconds:
<field> (number) west_degrees:
<field> (number) west_minutes:
<field> (number) west_seconds:
<field> (select) parameter_type: all
<field> (number) start_year: 1850
<field> (number) end_year: 2023
<field> (number) minimum_years:
<field> (checkbox) latest_year: Y
<field> (select) regulation: all
<field> (select) station_status: all
<field> (select) operation_schedule:
<field> (select) contributing_agency: all
<field> (select) gross_drainage_operator: >
<field> (number) gross_drainage_area:
<field> (select) effective_drainage_operator: >
<field> (number) effective_drainage_area:
<field> (select) sediment: ---
<field> (select) real_time: ---
<field> (select) rhbn: ---
<field> (select) contributed: ---
<field> (submit) : Search
However, when I fill in the form and submit it, nothing seems to change:
filled <- form %>%
  html_form_set(station_number = station_number,
                search_type = "station_number")
resp <- session_submit(x = my_session, form = filled)
Printing my_session and resp gives:
> my_session
<session> https://wateroffice.ec.gc.ca/search/historical_e.html
Status: 200
Type: text/html; charset=UTF-8
Size: 45034
> resp
<session> https://wateroffice.ec.gc.ca/search/historical_e.html
Status: 200
Type: text/html; charset=UTF-8
Size: 45284
Any suggestions are welcome!
Edit
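kaliiiiiiiiii's suggestion to paste the station number into the URL worked wonderfully for this part of my problem! I still can't work out how to download the CSV file.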
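Pasting the station number into the query string can be wrapped in a small helper so the function only needs the station number as input. The sketch below is hypothetical (build_search_url() is not part of any package): it reproduces the query parameters from the URL in the current code and percent-encodes each value with utils::URLencode(). Whether the server actually needs every one of these parameters is not known, so they are all kept.

```r
# Hypothetical helper (not from any package): build the historical-search results
# URL for one station, using the same query parameters as the question's URL.
build_search_url <- function(station_number, start_year = 1850, end_year = 2023) {
  base_url <- "https://wateroffice.ec.gc.ca/search/historical_results_e.html"
  params <- c(
    search_type = "station_number",
    station_number = station_number,
    start_year = start_year,
    end_year = end_year,
    minimum_years = "",
    gross_drainage_operator = ">",
    gross_drainage_area = "",
    effective_drainage_operator = ">",
    effective_drainage_area = ""
  )
  # Percent-encode each value; reserved = TRUE also encodes characters such as '&'
  # ('>' becomes %3E either way)
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0(base_url, "?", paste0(names(params), "=", encoded, collapse = "&"))
}

build_search_url("08NM083")
```

For "08NM083" this returns the same string as the paste0() call in the current code, so it can stand in for that call inside a function whose only argument is the station number.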
Current code:
library(rvest)
station_number <- "08NM083"
# Go straight to the results page by putting the station number in the query string
url <- paste0("https://wateroffice.ec.gc.ca/search/historical_results_e.html?search_type=station_number&station_number=",
              station_number,
              "&start_year=1850&end_year=2023&minimum_years=&gross_drainage_operator=%3E&gross_drainage_area=&effective_drainage_operator=%3E&effective_drainage_area=")
user_a <- httr::user_agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 12_0_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36")
my_session <- session(url, user_a)
# The second form on the results page has the check_all field and the download button
form <- html_form(my_session)[[2]]
filled <- form %>%
  html_form_set(check_all = "all")
resp <- session_submit(x = my_session, form = filled, submit = "download")
resp
# Pull the href of the first download link on the page and make it absolute
link <- resp %>%
  read_html() %>%
  html_element("p+ section .col-lg-4:nth-child(1) a") %>%
  html_attr("href")
full_link <- url_absolute(link, url)
I tried downloading the file with:
download.file(full_link, destfile = "Downloads/test_hydat.csv")
test <- readr::read_csv(full_link)
Both of the above only contain HTML code.
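Since both calls returned an HTML page rather than an error, it can help to check the saved file before trying to parse it. A minimal sketch, assuming an HTML response begins with a tag such as <!DOCTYPE html> or <html>; looks_like_html() is a hypothetical helper, not a function from any package:

```r
# Hypothetical helper: TRUE if the first non-blank line of a downloaded file
# looks like HTML markup rather than CSV data.
looks_like_html <- function(path) {
  first_lines <- readLines(path, n = 20, warn = FALSE)
  # Drop blank lines so leading whitespace in the response doesn't hide the tag
  first_lines <- trimws(first_lines[nzchar(trimws(first_lines))])
  length(first_lines) > 0 &&
    grepl("^<(!DOCTYPE|html|head|body)", first_lines[1], ignore.case = TRUE)
}
```

After the download.file() call above, something like if (looks_like_html("Downloads/test_hydat.csv")) stop("Got an HTML page instead of CSV") turns the silent failure into an explicit error.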
2 Answers
h7appiyu1#
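Solved! I needed to jump to the "Download CSV" link and then extract the response content from that new session specifically. The full code below works for anyone who needs to do something similar: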
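A minimal sketch of the approach this answer describes (not the answerer's own code): follow the CSV link inside the existing rvest session with rvest::session_jump_to(), so the request reuses the same session handle (and presumably whatever cookies the site set), then read the body of the new session's response with httr::content(). The function names are hypothetical; it assumes resp and full_link come from the question's current code and that the link returns plain comma-separated text. The exact layout of the site's CSV (for example, any lines before the column names) is not known here, so read.csv() may need extra arguments such as skip.

```r
# Sketch only: function names are hypothetical. rvest and httr must be
# installed when download_station_csv() is called, but not to define it.

# Parse CSV text that has already been downloaded into a data frame.
# Extra arguments (e.g. skip = ) are passed on to read.csv().
parse_csv_text <- function(txt, ...) {
  utils::read.csv(text = txt, stringsAsFactors = FALSE, ...)
}

# Follow the CSV link within an existing rvest session and parse the body
# of the new session's response.
download_station_csv <- function(sess, csv_link, ...) {
  csv_sess <- rvest::session_jump_to(sess, csv_link)
  txt <- httr::content(csv_sess$response, as = "text", encoding = "UTF-8")
  parse_csv_text(txt, ...)
}

# Usage with objects from the question's code (needs network access):
# hydat <- download_station_csv(resp, full_link)
```

Keeping the download inside the session is the key difference from download.file() and read_csv(), which each make a fresh request outside it.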
6ju8rftf2#
Why not just use the API directly:
to get all of the stations?
For other programming languages, use curlconverter to convert the request.
You can also search directly with the URL: