rvest pulls empty tables

ffdz8vbo posted on 2023-02-10 in Other


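The website I scrape data from has changed, and I am having trouble pulling the data into a table format. I tried the two different pieces of code below to get the tables, but each returns blanks instead of tables.
I am a novice at scraping and would appreciate help from the experts here. Should I look for another solution within rvest, or try to learn a tool like RSelenium?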
https://www.pgatour.com/stats/detail/02675

Scraping multiple links

library("dplyr")
library("purrr")  # map2(), possibly()
library("tidyr")  # unnest()
library("rvest")

df23 <- expand.grid(
  stat_id = c("02568","02674", "02567", "02564", "101")  
) %>% 
  mutate(
    links = paste0(
      'https://www.pgatour.com/stats/detail/',
      stat_id
    )
  ) %>% 
  as_tibble()

#replaced tournament_id with stat_id
get_info <- function(link, stat_id){
  data <- link %>%
    read_html() %>%
    html_table() %>%
    .[[2]] 
}

test_main_stats <- df23 %>%
  mutate(tables = map2(links, stat_id, possibly(get_info, otherwise = tibble())))

test_main_stats <- test_main_stats %>% 
  unnest(everything())

Alternative code

url <- read_html("https://www.pgatour.com/stats/detail/02568")
test1 <- url %>%
  html_nodes(".css-8atqhb") %>%
  html_table


Source: https://stackoverflow.com/questions/75404199/rvest-pulls-empty-tables

1 answer


hc8w905p1#

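This page uses JavaScript to build the table, so rvest cannot read it directly. However, if you look at the page source, all of the data is stored as JSON in a <script id="__NEXT_DATA__"> node.
The code below finds that node and converts its contents from JSON to a list. The variable answer is the main table, but the JSON data structure contains a lot of other information as well.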

# Read the page
library(rvest)
page <- read_html("https://www.pgatour.com/stats/detail/02675")

# Find the script node with the correct id attribute and extract its text
datascript <- page %>%
  html_elements(xpath = ".//script[@id='__NEXT_DATA__']") %>%
  html_text()

# Convert from JSON to a list
output <- jsonlite::fromJSON(datascript)
# Explore the output
str(output)

# Get the main table
answer <- output$props$pageProps$statDetails$rows
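
To try the same idea without hitting the live site, here is a minimal offline sketch. The helper name extract_next_data and the sample HTML (including the playerName and rank fields) are made up for illustration; only the __NEXT_DATA__ id and the props$pageProps$statDetails$rows path come from the code above, and the real page's JSON may be shaped differently.

```r
library(rvest)
library(jsonlite)

# Hypothetical helper (not part of rvest): return the parsed JSON stored in
# the <script id="__NEXT_DATA__"> node, or NULL if the page has no such node.
extract_next_data <- function(page) {
  nodes <- html_elements(page, xpath = ".//script[@id='__NEXT_DATA__']")
  if (length(nodes) == 0) {
    return(NULL)
  }
  fromJSON(html_text(nodes[[1]]))
}

# Made-up stand-in for the real page source; the field names are assumptions.
sample_html <- '<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props":{"pageProps":{"statDetails":{"rows":[
  {"playerName":"Player A","rank":1},
  {"playerName":"Player B","rank":2}
]}}}}
</script>
</body></html>'

page <- read_html(sample_html)
output <- extract_next_data(page)

# Same path as in the answer; fromJSON turns an array of objects into a data frame
rows <- output$props$pageProps$statDetails$rows
print(rows)
```

For several stat pages, a helper like this could be mapped over the links column from the question (for example with purrr::map and possibly()), although the path to the table may need checking with str() for each page.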

Answered on 2023-02-10
