Scraping TripAdvisor text with R and rvest

roejwanj posted on 2023-06-19 in Other

Link: https://www.tripadvisor.com/AttractionProductReview-g60750-d12086300-San_Diego_Whale_Watching_Cruise-San_Diego_California.html
I want to get the text under "What to expect". I have tried many approaches, but I can't get it.

library(rvest)

link <- "https://www.tripadvisor.com/AttractionProductReview-g60750-d12086300-San_Diego_Whale_Watching_Cruise-San_Diego_California.html"
webpage <- read_html(link)

# None of these selectors returned the "What to expect" text:
webpage %>% html_node('#\\:lithium-RmpiitkqlsnklaH1\\: .KxBGd') %>% html_text(trim = TRUE)
webpage %>% html_nodes('[data-has-vuc|="true"]') %>% html_text(trim = TRUE)
webpage %>% html_nodes("span.biGQs._P.pZUbB.KxBGd") %>% html_text(trim = TRUE)

Any suggestions?
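A quick way to check the likely cause, assuming the section is filled in by JavaScript after the page loads, is to search the raw HTML that `read_html()` downloaded for the phrase. If it is not there at all, no CSS selector applied to that document can return it. `phrase_in_static_html()` is a hypothetical helper written for this sketch, and the example runs on small inline HTML strings rather than the live TripAdvisor page.

```r
library(rvest)

# Hypothetical helper: TRUE if `phrase` occurs anywhere in the serialized
# markup of a parsed document (visible text, attributes or inline scripts).
phrase_in_static_html <- function(doc, phrase) {
  grepl(phrase, as.character(doc), fixed = TRUE)
}

# Stand-in for a JavaScript-driven page: the static HTML only holds an empty
# container and a script reference; the real content is injected later.
static_doc <- read_html(
  '<html><body><div id="app"></div><script src="bundle.js"></script></body></html>'
)

# Stand-in for the same page after a browser has rendered it.
rendered_doc <- read_html(
  '<html><body><dl><dt><span>What to expect</span></dt><dd>...</dd></dl></body></html>'
)

phrase_in_static_html(static_doc, "What to expect")
phrase_in_static_html(rendered_doc, "What to expect")
```

On the real page you would call it as `phrase_in_static_html(webpage, "What to expect")`. Searching the serialized markup rather than only the visible text also catches the case where the content ships inside a JSON blob in a `<script>` tag, which could then be parsed without a browser.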

Answer by uz75evzq:


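Use Chromote to render the page and evaluate JavaScript to extract specific elements. This is probably not the most robust solution and may need some tweaking, but it should illustrate how to approach problems like this one. The same JavaScript-driven strategy should also work with (R)Selenium.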

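library(chromote)
library(rvest)
b <- ChromoteSession$new()
# The braces keep navigate() and loadEventFired() together when the snippet is
# run interactively, so the load event is less likely to fire before we wait for it
{
  b$Page$navigate("https://www.tripadvisor.com/AttractionProductReview-g60750-d12086300-San_Diego_Whale_Watching_Cruise-San_Diego_California.html")
  b$Page$loadEventFired()
  # give JavaScript-rendered content a moment to appear
  Sys.sleep(2)
}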

# Find the element with JavaScript:
# use XPath to find the first matching <span> element whose text includes "What to expect",
# find the closest <dt> among its ancestors, then take its next sibling element,
# the <dd> that holds the text content

b$Runtime$evaluate(
  'document.evaluate(
    "//span[contains(., \'What to expect\')]", 
    document, null, XPathResult.UNORDERED_NODE_ITERATOR_TYPE, null )
  .iterateNext()
  .closest("dt")
  .nextElementSibling
  .innerHTML')$result$value %>% 
  read_html() %>% 
  html_text()
#> Travel back in time when sailing on the America, a replica of the sailing ship
#> that won the first America's Cup sailing competition in 1851. Your classic
#> sailing vessel provides a smooth ride and spacious decks, perfect for sailing
#> on the Pacific Ocean in search of gray whales and other marine life. Since your
#> boat principally moves under wind power without using the engine, your captain
#> can get closer to the marine animals without scaring them. The boat's deep keel
#> provides excellent stability and large decks offer unobstructed views, making
#> the America a prime vessel for whale-watching. Snacks and drinks (non-alcoholic)
#> are offered during the cruise. You are welcome to bring along a picnic lunch or
#> your favorite bottle of wine to enjoy onboard. Whale sightings are guaranteed on
#> your cruise. If no whales are sighted, you can return for a complimentary whale
#> watching cruise on another day in the same season. The America also provides a
#> 'No Seasickness' guarantee.

Created on 2023-06-16 with reprex v2.0.2
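If you would rather do the extraction in R than in JavaScript, the same navigation (matching `<span>`, its nearest `<dt>` ancestor, then that `<dt>`'s next sibling element) can be written as one XPath expression for rvest. This is a sketch that assumes you already have the rendered HTML, for example pulled from the Chromote session; the hand-written fragment below only mimics the `<dt>`/`<dd>` layout the JavaScript above relies on, and the real TripAdvisor markup may differ.

```r
library(rvest)

# Hand-written stand-in for the rendered page: each <dt> holds a label inside
# a <span>, and the following <dd> holds the description.
rendered <- read_html('
  <dl>
    <dt><span>Highlights</span></dt>
    <dd>Spacious decks.</dd>
    <dt><button><span>What to expect</span></button></dt>
    <dd><p>Travel back in time when sailing on the America.</p>
        <p>Snacks and drinks are offered.</p></dd>
  </dl>')

# span containing the label -> nearest <dt> ancestor -> first following
# sibling element (the <dd>); mirrors .closest("dt").nextElementSibling
xpath <- "//span[contains(., 'What to expect')]/ancestor::dt[1]/following-sibling::*[1]"

res <- rendered %>%
  html_element(xpath = xpath) %>%
  html_text2()

res
```

`html_text2()` is used instead of `html_text()` so that paragraph boundaries inside the `<dd>` come out as line breaks rather than running together.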
