How to get Google geographic coordinates from a page element with a "div" tag in the "tablist" class using the RSelenium R package

fdx2calv  posted on 2023-03-27  in  Go

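I am trying to get the geographic coordinates from an HTML page using functions from the RSelenium package in R. The goal is to obtain the value 20º27'36.1"S 54º38'03.1"W. My attempts are in the code below. Thank you very much for your help.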

library(rvest)
library(RSelenium)
library(httpuv)

port <- httpuv::randomPort()

rD <- rsDriver(browser = c("firefox"),
               verbose=TRUE,
               check = FALSE,
               port = port)

driver <- rD[["client"]]

urll <- "https://www.zapimoveis.com.br/lancamento/venda-apartamento-2-quartos-bairro-seminario-campo-grande-ms-46m2-id-2600496487/"
driver$navigate(urll)

politicas <- driver$findElement(using = "css",
                                value = "button.cookie-notifier__cta")
politicas$clickElement()

botaomapa <- driver$findElement(using = "xpath", "/html/body/main/div[1]/section/section/section[1]/button[2]")
botaomapa$clickElement()

# Attempt 1: absolute XPath to the coordinates element
coord <- driver$findElement(using = "xpath", "/html/body/div/div/div/div[4]/div/div/div/div/div[1]/div") # error

# Attempt 2: search from the botaomapa object
coord <- botaomapa$findElement(using = "xpath", "/html/body/div/div/div/div[4]/div/div/div/div/div[1]/div") # error

# Attempt 3: using the rvest package
readmap <- read_html(urll)
auxiliar <- readmap %>% html_elements("section")
auxiliar2 <- auxiliar %>% html_elements("#listing-map")
c1 <- readmap %>% html_nodes(xpath = "/html/body/main/div[1]/section/section/section[2]/div/div[3]/article/iframe") # returns nothing
c2 <- auxiliar2 %>% html_nodes(xpath = "/html/body/main/div[1]/section/section/section[2]/div/div[3]/article/iframe") # returns nothing
c3 <- auxiliar2 %>% html_nodes(xpath = "/html/body/div/div/div/div[4]/div/div/div/div/div[1]/div") # returns nothing

Answer #1 (uqzxnwby)

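The difficulty is that the map is contained in an iframe, and it is hard to access anything inside an iframe. You can, however, locate the iframe element itself and read its attributes. The link in the iframe's src attribute contains the coordinates, so you can extract the iframe link and then pull the coordinates out of it.
After this step in your original code: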

politicas$clickElement()

this is what I did:

library(stringr)
library(rvest)

# pull the webpage html
html <- driver$getPageSource()[[1]]


# look for the iframe's node
# then pull the source attribute
map_link <- html %>% 
  read_html() %>% 
  html_node(".map-embed__iframe") %>%
  html_attr("src")

Here is what the link looks like:

map_link
[1] "https://www.google.com/maps/embed/v1/place?key=AIzaSyB1BH90qSMLRWrSEKe8D7fml7-kWHN2qjY&q=-20.460039,-54.634191"

You can then use a regular expression or some other method to extract the coordinates:

#remove everything before q=

map_link %>% str_remove(".*q=")
[1] "-20.460039,-54.634191"
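
Removing everything before q= works for this link, but it would leave any later parameters attached if the link ever carried them after q=. A slightly more defensive sketch, using only base R and assuming the q value is always a "lat,lon" pair of decimal numbers, captures just the two numbers and converts them to numeric. The helper name parse_map_coords and the YOUR_KEY placeholder are made up for illustration.

```r
# Hypothetical helper: pull "lat,lon" out of the q= parameter of an embed link.
parse_map_coords <- function(link) {
  # Capture two signed decimal numbers that directly follow "q="
  pattern <- "[?&]q=(-?[0-9.]+),(-?[0-9.]+)"
  m <- regmatches(link, regexec(pattern, link))[[1]]
  # m holds the full match plus the two capture groups when the pattern matches
  if (length(m) != 3) {
    stop("no 'q=lat,lon' parameter found in link")
  }
  c(lat = as.numeric(m[2]), lon = as.numeric(m[3]))
}

# Same link shape as above, with the key replaced by a placeholder
example_link <- "https://www.google.com/maps/embed/v1/place?key=YOUR_KEY&q=-20.460039,-54.634191"
coords <- parse_map_coords(example_link)
print(coords)
```

In the live session you would pass map_link instead of example_link.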

This is what I see when I put those coordinates into Google, and it looks the same as the original map.
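
The question asks for the value in degrees, minutes and seconds, while the link stores decimal degrees. As a rough check that the two agree, here is a minimal base R sketch that formats decimal degrees in that style. The helper name to_dms is hypothetical, and the sketch does not handle the edge case where the seconds round up to 60.0.

```r
# Hypothetical helper: format one decimal-degree value as degrees, minutes, seconds.
# `positive` / `negative` are the hemisphere letters to use for the sign of x.
to_dms <- function(x, positive, negative) {
  hemi <- if (x >= 0) positive else negative
  a <- abs(x)
  deg <- floor(a)
  minutes_full <- (a - deg) * 60
  mins <- floor(minutes_full)
  secs <- (minutes_full - mins) * 60
  # \u00ba is the "º" character used in the question; seconds get one decimal place
  sprintf("%d\u00ba%02d'%04.1f\"%s", as.integer(deg), as.integer(mins), secs, hemi)
}

# Decimal coordinates taken from the iframe link
lat <- -20.460039
lon <- -54.634191

dms <- paste(to_dms(lat, "N", "S"), to_dms(lon, "E", "W"))
cat(dms, "\n")
```

For these two values the result should match the 20º27'36.1"S 54º38'03.1"W string from the question.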
