rvest table returns empty

5ssjco0h  posted on 2023-06-19 in Other

I am trying to scrape the table from the following link: https://www.mlbdraftleague.com/mahoning-valley/roster

library(rvest)
library(magrittr)

url <- "https://www.mlbdraftleague.com/mahoning-valley/roster"
page <- read_html(url) %>% 
  html_table(fill = TRUE)

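I tried this, and it returns empty data frames: the right number of tables (5) with the right number of columns, but the data frames themselves are empty. Any help is appreciated.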


lnlaulya1#

library(tidyverse)
library(httr2)

"https://statsapi.mlb.com/api/v1/teams/545/roster?hydrate=person(rosterEntries,education,stats(type=season,season=2023,sportId=22,teamId=545))&rosterType=active&season=2023&sportId=22" %>% 
  request() %>% 
  req_perform() %>% 
  resp_body_json(simplifyVector = TRUE) %>% 
  pluck("roster") %>% 
  unnest(everything(), names_sep = "_") 

# A tibble: 41 × 47
   person_id person_full_name      person_link   person_first_name person_last_name person_birth_date person_current_age person_birth_city person_birth_state_p…¹
       <int> <chr>                 <chr>         <chr>             <chr>            <chr>                          <int> <chr>             <chr>                 
 1    813834 AJ Rausch             /api/v1/peop… AJ                Rausch           2002-03-19                        21 Powell            OH                    
 2    800677 Ahmad Harajli         /api/v1/peop… Ahmad             Harajli          2001-08-31                        21 Dearborn          MI                    
 3    701144 Alex Shea             /api/v1/peop… Brian             Shea             2001-05-04                        22 Union             KY                    
 4    701475 Andreaus Lewis        /api/v1/peop… Andreaus          Lewis            2002-12-10                        20 Atlanta           GA                    
 5    701499 Andrew Lucas          /api/v1/peop… Andrew            Lucas            2000-02-04                        23 Camarillo         CA                    
 6    813836 Braeden O'Shaughnessy /api/v1/peop… Braeden           O'Shaughnessy    2000-11-19                        22 Poland            OH                    
 7    681376 Brandon Hylton        /api/v1/peop… Brandon           Hylton           2000-02-01                        23 Livingston        NJ                    
 8    695480 Brennyn Abendroth     /api/v1/peop… Brennyn           Abendroth        2003-06-07                        20 Effingham         IL                    
 9    695746 Cale Lansville        /api/v1/peop… Cale              Lansville        2003-01-06                        20 Englewood         CO                    
10    809953 Cam Liss              /api/v1/peop… Cameron           Liss             2000-04-15                        23 Spokane           WA                    
# ℹ 31 more rows
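
The code above skips the HTML page and reads JSON from statsapi.mlb.com, presumably because the roster tables on the page are filled in by JavaScript after loading, which would explain why rvest only sees empty table shells. The flattening step can be tried offline with a minimal sketch like the one below. The JSON string is hand-written and hypothetical; it only imitates the nested shape of the response and is not real API output.

```r
library(jsonlite)
library(tidyr)

# Hypothetical JSON that only mimics the nesting of the "roster" element
json <- '{
  "roster": [
    {"person": {"id": 1, "fullName": "Player One"},
     "jerseyNumber": "17",
     "position": {"code": "7", "name": "Outfielder"}},
    {"person": {"id": 2, "fullName": "Player Two"},
     "jerseyNumber": "41",
     "position": {"code": "1", "name": "Pitcher"}}
  ]
}'

# simplifyVector = TRUE turns the array of records into a data frame
# whose "person" and "position" columns are themselves data frames
roster <- fromJSON(json, simplifyVector = TRUE)[["roster"]]

# unnest() unpacks the data-frame columns; names_sep prefixes each new
# column with the name of the column it came from
flat <- unnest(roster, everything(), names_sep = "_")
names(flat)
```

Each nested field becomes its own column, prefixed with its parent's name (for example `person_fullName`), and there is still one row per player.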

xj3cbfub2#

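One option is to pluck() the relevant columns one by one from the nested list, although this makes more sense when only a few columns are needed, as with the coaches: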

library(dplyr)
library(jsonlite)
library(purrr)
url_roster  <- "https://statsapi.mlb.com/api/v1/teams/545/roster?hydrate=person(rosterEntries,education,stats(type=season,season=2023,sportId=22,teamId=545))&rosterType=active&season=2023&sportId=22"
url_coaches <- "https://statsapi.mlb.com/api/v1/teams/545/coaches?hydrate=person&rosterType=active&sportId=22&season=2023"

fromJSON(url_roster, simplifyVector = FALSE)[["roster"]] %>% 
  map(~ list(pos    = pluck(.x, "person", "primaryPosition", "type"),
             name   = pluck(.x, "person", "fullName"),
             j_nr   = pluck(.x, "jerseyNumber"),
             wins   = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "wins"),
             losses = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "losses"),
             era    = pluck(.x, "person", "stats", 1, "splits", 1, "stat", "era"),
             b_side = pluck(.x, "person", "batSide", "code"),
             p_hand = pluck(.x, "person", "pitchHand", "code"),
             height = pluck(.x, "person", "height"),
             weight = pluck(.x, "person", "weight"),
             dob    = pluck(.x, "person", "birthDate"),
             school = pluck(.x, "person", "education", "colleges", 1, "name")
             )) %>% 
  bind_rows()
#> # A tibble: 41 × 12
#>    pos   name  j_nr  b_side p_hand height weight dob   school  wins losses era  
#>    <chr> <chr> <chr> <chr>  <chr>  <chr>   <int> <chr> <chr>  <int>  <int> <chr>
#>  1 Outf… AJ R… "17"  R      R      "5' 1…    195 2002… Ohio      NA     NA <NA> 
#>  2 Pitc… Ahma… "41"  R      R      "6' 4…    240 2001… Michi…     1      1 6.14 
#>  3 Pitc… Alex… "48"  L      L      "6' 4…    211 2001… Cinci…     0      0 12.60
#>  4 Catc… Andr… "9"   L      R      "5' 1…    195 2002… Pensa…    NA     NA <NA> 
#>  5 Pitc… Andr… "33"  R      R      "5' 1…    190 2000… Texas…     1      1 1.80 
#>  6 Infi… Brae… "4"   R      R      "6' 3…    200 2000… Young…    NA     NA <NA> 
#>  7 Outf… Bran… "32"  L      R      "6' 8…    255 2000… Stets…    NA     NA <NA> 
#>  8 Pitc… Bren… "52"  R      R      "6' 4…    195 2003… South…     0      0 6.00 
#>  9 Pitc… Cale… ""    R      R      "6' 1…    205 2003… San J…    NA     NA <NA> 
#> 10 Pitc… Cam … "46"  L      L      "6' 0…    202 2000… Washi…     0      0 0.00 
#> # ℹ 31 more rows
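
In the roster output above, players without pitching splits show NA for wins, losses and era. pluck() returns NULL rather than raising an error when any step of the path is missing, and those gaps end up as NA in the combined tibble. A minimal offline sketch of that behaviour follows, using two hypothetical hand-written entries (not real API data) and an explicit `.default` so the missing value is visible straight away:

```r
library(purrr)

# Hypothetical entries, shaped loosely like elements of the "roster" list
players <- list(
  list(person = list(
    fullName = "Pitcher One",
    stats = list(list(splits = list(list(stat = list(era = "6.14")))))
  )),
  list(person = list(fullName = "Outfielder Two"))  # no stats element at all
)

# A missing path gives NULL, not an error
missing_era <- pluck(players[[2]], "person", "stats", 1, "splits", 1, "stat", "era")

# With .default, the missing value becomes an explicit NA of the right type
era <- map_chr(
  players,
  ~ pluck(.x, "person", "stats", 1, "splits", 1, "stat", "era",
          .default = NA_character_)
)
era
```

Setting `.default` is optional here, but it keeps every field present in every row and makes the column type explicit.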

fromJSON(url_coaches, simplifyVector = FALSE)[["roster"]] %>% 
  map(~ list(title  = pluck(.x, "title"),
             name   = pluck(.x, "person", "fullName"))) %>% 
  bind_rows()
#> # A tibble: 4 × 2
#>   title           name         
#>   <chr>           <chr>        
#> 1 Manager         Dmitri Young 
#> 2 Hitting Coach   Bryant Nelson
#> 3 Pitching Coach  Ray King     
#> 4 Assistant Coach Craig Antush

Created on 2023-06-13 with reprex v2.0.2
You may also want to check out the baseballr package.
