R: rvest not scraping from `<span>`

Posted by mfpqipee 11 months ago in Other

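I'm trying to scrape prices from Amazon. It used to work, but now it doesn't, and I can't tell whether they've added some kind of protection or whether I'm not using rvest correctly.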


I tried this code:

```r
library(rvest)
library(httr)  # user_agent() comes from httr, not rvest

my_url <- "https://www.amazon.com/s?k=reusable+straws"
# Named `ua` so the variable does not shadow the user_agent() function
ua <- user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120")
my_session <- session(my_url, ua)

my_session %>%
  html_elements(".a-offscreen")
```

I can scrape the `<a class>` above it just fine, and I can scrape the `<span class="a-size-base a-color-secondary">` below it fine, but none of the price spans come back.
Any ideas?
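One way to tell a wrong selector apart from a page that simply isn't what the browser shows is to test the selector against HTML you control. The sketch below assumes a hypothetical, simplified imitation of an Amazon price block (the real markup is not guaranteed to look like this and changes over time), and `count_matches()` is a hypothetical helper written for illustration. If the selector matches here but returns an empty node set on the live page, the HTML rvest received probably differs from what the browser renders (for example a bot-check page, or content filled in by JavaScript).

```r
library(rvest)

# Hypothetical, simplified markup modelled on a product price block;
# the real Amazon HTML is not guaranteed to look like this.
sample_page <- minimal_html('
  <div class="s-result-item">
    <a class="a-link-normal" href="#">Reusable straws</a>
    <span class="a-price">
      <span class="a-offscreen">$18.99</span>
      <span aria-hidden="true">$18<span>99</span></span>
    </span>
    <span class="a-size-base a-color-secondary">Free delivery</span>
  </div>
')

# Hypothetical helper: how many nodes does a CSS selector match?
count_matches <- function(doc, selector) {
  length(html_elements(doc, selector))
}

count_matches(sample_page, ".a-offscreen")
#> [1] 1

# Scoping the selector to the price container returns only the offscreen text
sample_page %>%
  html_elements(".a-price .a-offscreen") %>%
  html_text2()
#> [1] "$18.99"
```

Against the live page, `count_matches(my_session, ".a-offscreen")` returning 0 while the element is visible in the browser would point to the second explanation; printing `html_text2(html_element(my_session, "title"))` is a quick way to see which page actually came back.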

Answer 1, by lymnna71

Consider using a tool such as SelectorGadget to identify the right HTML elements to scrape.

```r
library(tidyverse)
library(rvest)

"https://www.amazon.com/s?k=reusable+straws" %>% 
  read_html() %>% 
  html_elements(".puis-card-border") %>% # Select each product box
  map_dfr(~ tibble( # Map over every box to extract info
    title = html_element(.x, ".a-color-base.a-text-normal") %>% 
      html_text2(), 
    price = html_element(.x, ".a-price") %>% 
      html_text2(), 
    rating = html_element(.x, ".aok-align-bottom") %>% 
      html_text2()
  ))
```

```
# A tibble: 60 x 3
   title                                               price rating
   <chr>                                               <chr> <chr> 
 1 "HSHIJYA 18 Pack Reusable Stainless Steel Straws w~ $18.~ 4.7 o~
 2 "Piteno\u00ae 16-Pack Reusable Glass Straws, Clear~ $6.9~ 4.7 o~
 3 "Softy Straws Premium Reusable Stainless Steel Dri~ $12.~ 4.7 o~
 4 "15 FITS ALL TUMBLERS STRAWS - Reusable Silicone S~ $14.~ 4.6 o~
 5 "Tronco Set of 6 Stainless Steel Reusable Metal St~ $9.9~ 4.6 o~
 6 "Hiware 12-Pack Reusable Stainless Steel Metal Str~ $6.2~ 4.8 o~
 7 "24 PCS, Reusable Straws with 4 Brushes, 10.5\" Lo~ $5.9~ 4.6 o~
 8 "Kynup Reusable Straws, 4Pack Collapsible Portable~ $9.9~ 4.6 o~
 9 "Ello Impact Reusable Hard Plastic Straws with Cle~ $3.4~ 4.7 o~
10 "ALINK 10.5 in Long Rainbow Colored Reusable Trita~ $4.9~ 4.7 o~
# i 50 more rows
# i Use `print(n = ...)` to see more rows
```
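The answer works card by card: it selects every product container with `html_elements()` (plural), then calls `html_element()` (singular) inside each one. When a card has no match, `html_element()` returns a missing node and `html_text2()` turns it into `NA`, so each price stays on the same row as its title. Selecting all prices across the page at once would instead return a shorter vector whenever a product has no price. The sketch below shows this on hypothetical inline HTML (the `.card`, `.title` and `.price` class names are made up for illustration) and, as an addition not in the original answer, converts the price strings to numbers with `readr::parse_number()`, assuming US-style prices such as `$18.99`.

```r
library(tidyverse)  # purrr::map_dfr() needs dplyr; readr provides parse_number()
library(rvest)

# Hypothetical markup: three product cards, the second has no price.
cards_page <- minimal_html('
  <div class="card"><span class="title">Steel straws</span>
    <span class="price"><span class="a-offscreen">$18.99</span></span></div>
  <div class="card"><span class="title">Glass straws</span></div>
  <div class="card"><span class="title">Silicone straws</span>
    <span class="price"><span class="a-offscreen">$6.95</span></span></div>
')

# Page-wide selection: two prices for three products, so the vector
# can no longer be matched to titles by position.
html_elements(cards_page, ".price .a-offscreen") %>% html_text2()
#> [1] "$18.99" "$6.95"

# Card-by-card selection: html_element() yields NA for the missing price.
products <- cards_page %>%
  html_elements(".card") %>%
  map_dfr(~ tibble(
    title = html_element(.x, ".title") %>% html_text2(),
    price = html_element(.x, ".price .a-offscreen") %>% html_text2()
  )) %>%
  # parse_number() drops the "$" sign; a missing price stays NA
  mutate(price_num = parse_number(price))

# products$price is c("$18.99", NA, "$6.95")
```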
