R: How do I web scrape USDA FoodData Central?

628mspwn  posted on 2023-04-18 in Other

I am trying to scrape a table from this URL using R. I tried the code below, but it returns xml_missing. How can I retrieve the nutrients table from this URL?

library(rvest)
library(tidyverse)

url <- "https://fdc.nal.usda.gov/fdc-app.html#/food-details/2237774/nutrients"

read_html(url) %>% html_element(xpath = '//*[@id="nutrients-table"]')

Answer #1 (bjp0bcyl)
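A likely reason the question's read_html() call returns xml_missing is that the URL points to a single-page app: the nutrients table is built by JavaScript after the page loads, so the static HTML that read_html() downloads does not contain it. The sketch below reproduces the symptom offline. The two HTML snippets are made-up stand-ins, not the real FoodData Central markup; only the id "nutrients-table" is taken from the question.

```r
library(rvest)

# Stand-in for the static page: an empty container, no table yet (assumed markup)
static_page <- minimal_html('<div id="app"></div>')

# Stand-in for the page after JavaScript has run (assumed markup)
rendered_page <- minimal_html(
  '<div id="app"><table id="nutrients-table"><tr><td>Energy</td></tr></table></div>'
)

xp <- '//*[@id="nutrients-table"]'

missing_node <- html_element(static_page, xpath = xp)
found_node   <- html_element(rendered_page, xpath = xp)

# html_element() does not fail when nothing matches; it returns an xml_missing object
print(inherits(missing_node, "xml_missing"))
print(inherits(found_node, "xml_missing"))
```

This is why a real browser session, as used below, is one way to get at the table: the page source is read only after the scripts have run.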

I was able to get the table with the following code:

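library(RSelenium)
library(rvest)

# Start a Selenium Firefox container; shell() is Windows-only, use system() elsewhere
shell('docker run -d -p 4446:4444 selenium/standalone-firefox')
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4446L, browserName = "firefox")
remDr$open()
remDr$navigate("https://fdc.nal.usda.gov/fdc-app.html#/food-details/2237774/nutrients")

# Give the page's JavaScript time to render the table
Sys.sleep(10)
html_Text <- remDr$getPageSource()[[1]]
# Parse the rendered HTML and take the first table on the page
(read_html(html_Text) %>% html_table())[[1]]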

# A tibble: 42 x 29
   Name       Amount Unit  Data ~1 Deriv~2 n     Samples Min   Max   Median Lab M~3 Footn~4 Last ~5 ``    ``    ``   
   <chr>      <chr>  <chr> <chr>   <chr>   <chr> <chr>   <chr> <chr> <chr>  <chr>   <lgl>   <lgl>   <lgl> <lgl> <chr>
 1 Energy     4      kcal  UNKNOWN "Calcu~ ""    "Sampl~ "Ana~ Anal~ Analy~ Analys~ NA      NA      NA    NA    Amou~
 2 Analysis ~ Analy~ Anal~ Analys~ ""      ""    ""      ""    NA    NA     NA      NA      NA      NA    NA    NA   
 3 Amount/10~ Unit   Tech~ Method  "City"  "Sta~ "Acqui~ "FDC~ NA    NA     NA      NA      NA      NA    NA    NA   
 4 Protein    0.83   g     UNKNOWN "Calcu~ ""    "Sampl~ "Ana~ Anal~ Analy~ Analys~ NA      NA      NA    NA    Amou~
 5 Analysis ~ Analy~ Anal~ Analys~ ""      ""    ""      ""    NA    NA     NA      NA      NA      NA    NA    NA   
 6 Amount/10~ Unit   Tech~ Method  "City"  "Sta~ "Acqui~ "FDC~ NA    NA     NA      NA      NA      NA    NA    NA   
 7 Total lip~ 0      g     UNKNOWN "Calcu~ ""    "Sampl~ "Ana~ Anal~ Analy~ Analys~ NA      NA      NA    NA    Amou~
 8 Analysis ~ Analy~ Anal~ Analys~ ""      ""    ""      ""    NA    NA     NA      NA      NA      NA    NA    NA   
 9 Amount/10~ Unit   Tech~ Method  "City"  "Sta~ "Acqui~ "FDC~ NA    NA     NA      NA      NA      NA    NA    NA   
10 Carbohydr~ 0.42   g     UNKNOWN "Calcu~ ""    "Sampl~ "Ana~ Anal~ Analy~ Analys~ NA      NA      NA    NA    Amou~
# ... with 32 more rows, 13 more variables: `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>,
#   `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, `` <lgl>, and abbreviated variable names
#   1: `Data Prov. Deriv. Method`, 2: `Deriv. By`, 3: `Lab Method`, 4: Footnote, 5: `Last Updated`
# i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
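
The printed table interleaves each nutrient row with two detail rows (the ones whose Name starts with "Analysis" and "Amount/10"), and most of the 29 columns are empty. A minimal sketch of one way to tidy this follows. It runs on a small toy data frame whose values are copied from the output above, truncated labels included, and it assumes the detail rows can be recognised by their Name prefix, which has only been checked against the ten rows shown.

```r
# Toy stand-in for the scraped table: first three columns, values as printed above
raw <- data.frame(
  Name   = c("Energy", "Analysis ~", "Amount/10~", "Protein", "Analysis ~", "Amount/10~"),
  Amount = c("4", "Analy~", "Unit", "0.83", "Analy~", "Unit"),
  Unit   = c("kcal", "Anal~", "Tech~", "g", "Anal~", "Tech~"),
  stringsAsFactors = FALSE
)

# Assumption: detail rows are the ones whose Name starts with "Analysis" or "Amount/"
is_detail <- grepl("^(Analysis|Amount/)", raw$Name)

# Keep the nutrient rows and the three informative columns
nutrients <- raw[!is_detail, c("Name", "Amount", "Unit")]
nutrients$Amount <- as.numeric(nutrients$Amount)
rownames(nutrients) <- NULL

print(nutrients)
```

On the real result, the same filter would be applied to the data frame returned by html_table() instead of `raw`.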

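The fixed Sys.sleep(10) in the answer's code either waits longer than needed or not long enough on a slow connection. One possible alternative is to poll until a condition holds. The helper below, `wait_until`, is a hypothetical name and not part of RSelenium. It is shown with a stand-in condition so that it runs without a browser.

```r
# Hypothetical helper: call check() repeatedly until it returns TRUE
# or until `timeout` seconds have elapsed. Returns TRUE on success, FALSE on timeout.
wait_until <- function(check, timeout = 30, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(check())) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Stand-in condition: becomes TRUE on the third call
calls <- 0
ready <- wait_until(function() {
  calls <<- calls + 1
  calls >= 3
}, timeout = 5, interval = 0.1)

print(ready)
```

With the browser session from the answer, the condition could be something like `function() length(remDr$findElements(using = "css selector", value = "#nutrients-table")) > 0`. The id comes from the question's XPath and has not been verified against the live page.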