Selenium can't scrape Vivino information?

Posted by h79rfbju on 2022-11-10 in Other


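I'm trying to get tasting notes and food pairing information for wines from Vivino. This information isn't available through their API, and I get a NoSuchElementException when I try to scrape it with Selenium in Python. I can scrape the price and vintage, but nothing further down the page.
The page I'm trying to scrape: https://www.vivino.com/US-TX/en/villa-maria-auckland-private-bin-sauvignon-blanc/w/39034?year=2021&price_id=26743464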

I tried using WebDriverWait to let the page load:

driver.get('https://www.vivino.com/US-TX/en/villa-maria-auckland-private-bin-sauvignon-blanc/w/39034?year=2021&price_id=26743464')
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[@data-testid='mentions']")))

I tried using XPath to get the keywords (citrus, tropical, tree fruit, ...):

tasting_notes = driver.find_elements(By.XPATH, "//div[@data-testid='mentions']")

I tried using the class name to get the text itself:


# test = driver.find_elements(By.CLASS_NAME,"tasteNote__flavorGroup--1Uaen")

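I keep getting NoSuchElementException. Is there another way to get this information, or is Vivino somehow blocking me from scraping this section?
Edit: I tried adding code that scrolls to the bottom before looking for the data: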

# Record the starting height so the loop can tell when the page stops growing.
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to the bottom.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load the page.
    time.sleep(2)

    # Calculate new scroll height and compare with last scroll height.
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

I still have the problem.
Edit: Solved! Thanks to furas for the explanation and Eugeny for the code.

selenium

Source: https://stackoverflow.com/questions/74215108/selenium-cant-scrape-vivino-information

1 Answer


laawzig21#

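As furas mentioned in the comments, this page uses lazy loading, so you need to scroll the page. Jumping straight to the bottom doesn't help here, though, because the page only loads the content you are currently viewing. You need to scroll down to the bottom gradually instead.
Here's code that does this. I'm not sure it's the most elegant solution, but it works :)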

from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.vivino.com/US-TX/en/villa-maria-auckland-private-bin-sauvignon-blanc/w/39034?year=2021&price_id=26743464')
driver.implicitly_wait(10)
page_height = driver.execute_script("return document.body.scrollHeight")
browser_window_height = driver.get_window_size(windowHandle='current')['height']
current_position = driver.execute_script('return window.pageYOffset')
# Scroll down one window height at a time so each lazy-loaded section is triggered.
while page_height - current_position > browser_window_height:
    # scrollTo takes (x, y): keep x at 0 and move only vertically.
    driver.execute_script(f"window.scrollTo(0, {browser_window_height + current_position});")
    current_position = driver.execute_script('return window.pageYOffset')
    sleep(1)  # It is necessary here to give it some time to load the content
print(driver.find_element(By.XPATH, '//div[@data-testid="mentions"]').text)
driver.quit()
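
The same idea can be wrapped in a small reusable helper. This is only a sketch under a few assumptions: the name scroll_in_steps and its pause and max_steps parameters are made up for illustration, it uses window.innerHeight (the visible viewport) rather than the outer window size, and it stops when the scroll position no longer advances instead of comparing against a page height measured once up front. It only relies on the driver having an execute_script method, which a Selenium WebDriver does.

```python
import time


def scroll_in_steps(driver, pause=1.0, max_steps=200):
    """Scroll down one viewport at a time so lazy-loaded sections get rendered.

    `driver` only needs an execute_script(js) method (e.g. a Selenium WebDriver).
    Returns the number of scroll attempts made, including the final one that
    no longer moves the page.
    """
    steps = 0
    position = driver.execute_script("return window.pageYOffset")
    while steps < max_steps:  # hypothetical safety cap against endless pages
        viewport = driver.execute_script("return window.innerHeight")
        # Move exactly one viewport further down; x stays at 0.
        driver.execute_script(f"window.scrollTo(0, {position + viewport});")
        time.sleep(pause)  # give the page time to load the newly visible part
        steps += 1
        new_position = driver.execute_script("return window.pageYOffset")
        if new_position <= position:
            break  # the position did not advance, so the bottom was reached
        position = new_position
    return steps
```

With Selenium, this would be called as scroll_in_steps(driver) after driver.get(...) and before looking up the "mentions" element. Because the position is re-read after every pause, the loop keeps going if the page grows while content loads.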

Posted 2022-11-10
