R: I cannot scrape all the products on an AliExpress page

Posted by jchrr9hc on 2023-03-20 in Other


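I am trying to scrape all the products on an AliExpress search page with the code below, but it returns only the first 10 products.
I expected it to return every product, because the CSS selector matches all the product names when I test it in the browser. (The original post included a screenshot here.)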

library(rvest)

# Search-results page to scrape
AlPage <- "https://www.aliexpress.com/w/wholesale-running-shoes.html?SearchText=running+shoes&catId=0&g=n&initiative_id=SB_20230318171033&sortType=total_tranpro_desc&spm=a2g0o.home.1000002.0&trafficChannel=main"

# Download and parse the static HTML
url <- read_html(AlPage)

print(url)

# Select the product titles by CSS class and extract their text
alproduct_name <- html_nodes(url, ".manhattan--title--24F0J-G, .cards--title--2rMisuY") %>% html_text2()
alproduct_name

I also checked the class names of all the products, since I thought they might differ, but they are all the same.
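The class-name assumption can be checked offline. The following is a minimal sketch, not a confirmed fix: the HTML fragment is a made-up stand-in for the product cards, and the third class suffix (`ZZ9zzzz`) is hypothetical. It compares the exact class selector from the question with a looser attribute selector, `[class*='--title--']`, which is an assumption about how the generated class names are structured.

```r
library(rvest)

# Hypothetical stand-in for three product cards. The first two class names
# come from the question; the third suffix is invented for illustration.
page <- minimal_html('
  <div><h1 class="manhattan--title--24F0J-G">Shoe A</h1></div>
  <div><h1 class="cards--title--2rMisuY">Shoe B</h1></div>
  <div><h1 class="cards--title--ZZ9zzzz">Shoe C</h1></div>
')

# Exact class selector: matches only elements with these exact class names
exact <- page %>%
  html_elements(".manhattan--title--24F0J-G, .cards--title--2rMisuY") %>%
  html_text2()

# Substring attribute selector: matches any class containing "--title--"
loose <- page %>%
  html_elements("[class*='--title--']") %>%
  html_text2()

length(exact)
length(loose)
```

If the two counts differ on the real page, some cards carry a class suffix that the exact selector does not cover. If both counts are 10, the missing products are probably not in the downloaded HTML at all.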

r

Source: https://stackoverflow.com/questions/75779724/i-cannot-scrape-all-the-products-on-an-an-aliexpress-page

1 Answer


pkbketx91#

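I suspect the initial page shows only the first 10 results and the rest are loaded dynamically as the user scrolls down, which makes this hard to do with rvest alone. Here is an approach using RSelenium.
I also changed the HTML node to h1. The nodes you found did not work for me, but h1 still extracts the shoe names from that page.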

# load packages ---------------------------------------------------------
library(RSelenium)
library(rvest)

# define url ---------------------------------------------------------
url <- "https://www.aliexpress.com/w/wholesale-running-shoes.html?SearchText=running+shoes&catId=0&g=n&initiative_id=SB_20230318171033&sortType=total_tranpro_desc&spm=a2g0o.home.1000002.0&trafficChannel=main"

# start RSelenium ------------------------------------------------------------

rD <- rsDriver(browser="firefox", port=4548L, chromever = NULL)
remDr <- rD[["client"]]

# Navigate to webpage -----------------------------------------------------
remDr$navigate(url)

# scroll to bottom of the page to load all the results
webElem <- remDr$findElement("css", "body")
webElem$sendKeysToElement(list(key = "end"))
Sys.sleep(2)  # give the lazily loaded results time to render (the 2 s delay is a guess)

# pull page html
html <- remDr$getPageSource()[[1]]

# Use rvest to parse the page source
AlPage <- html %>% read_html()

# scan the webpage for the h1 node and pull the text associated with that node

alproduct_name <- AlPage %>% 
                  html_nodes("h1") %>% 
                  html_text2()

alproduct_name

Is 46 results the right number?
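If a single press of the End key does not load everything, one option is to keep scrolling until the page height stops growing. This is only a sketch under assumptions: `scroll_until_stable`, `pause`, and `max_rounds` are hypothetical names introduced here, and the function assumes the page appends results as it is scrolled. It relies on the `executeScript` method of the RSelenium remote driver to run two small JavaScript snippets.

```r
# Scroll to the bottom repeatedly until document.body.scrollHeight stops
# changing, or until max_rounds scrolls have been made.
# `remDr` is expected to be an RSelenium remote driver (rD[["client"]]).
scroll_until_stable <- function(remDr, pause = 1, max_rounds = 20) {
  get_height <- function() {
    remDr$executeScript("return document.body.scrollHeight;", args = list())[[1]]
  }

  last_height <- get_height()
  for (i in seq_len(max_rounds)) {
    # Jump to the current bottom of the page
    remDr$executeScript("window.scrollTo(0, document.body.scrollHeight);",
                        args = list())
    Sys.sleep(pause)  # wait for newly requested results to render

    new_height <- get_height()
    if (new_height == last_height) break  # nothing new was appended
    last_height <- new_height
  }
  invisible(last_height)
}
```

In the answer's script this would be called as `scroll_until_stable(remDr)` after `remDr$navigate(url)` and before `remDr$getPageSource()`. The `max_rounds` cap keeps the loop from running forever on a page that never stops growing.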

Answered 2023-03-20
