BeautifulSoup sometimes can't get the full page source

Posted by ny6fqffe on 2022-11-09 in Other


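I'm scraping with Selenium and BeautifulSoup4. The problem is that in my script "result" is sometimes empty and sometimes not. I don't understand why it only works some of the time. Is it a security measure on the website, or a memory problem? I don't know.
page_source = BeautifulSoup(driver.page_source, "html.parser")
result = page_source.find_all('div', {'class': 'pv-profile-section-pager ember-view'})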

scrapy

Source: https://stackoverflow.com/questions/67905864/beautifulsoup-cant-get-all-page-source-sometimes

2 Answers


41zrol4v1#

Your class name may not match exactly somewhere. You can try a partial match on the class instead:

result= page_source.find_all('div',{'class': lambda x: x and 'pv-profile-section-pager' in x})
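
To see why the partial match can help, here is a small sketch on made-up HTML (the markup and the extra class name `artdeco` are assumptions for illustration, not taken from the real page). Passing a multi-class string to find_all compares against the exact class attribute string, so a different class order or an extra class makes the lookup come back empty, while the lambda and a CSS selector match on the single class name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same logical element rendered with different class lists.
HTML = """
<div class="pv-profile-section-pager ember-view">first</div>
<div class="ember-view pv-profile-section-pager artdeco">second</div>
<div>no class attribute</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Exact string comparison against the whole class attribute.
exact = soup.find_all("div", {"class": "pv-profile-section-pager ember-view"})

# Partial match from the answer; `x and ...` guards against a missing class (None).
partial = soup.find_all(
    "div", {"class": lambda x: x and "pv-profile-section-pager" in x}
)

# CSS selector: matches any div that carries this one class, in any order.
by_css = soup.select("div.pv-profile-section-pager")

print(len(exact), len(partial), len(by_css))
```

If the three counts differ on your real page, the class string was the problem. If all of them are zero, the element was probably not in driver.page_source yet when it was parsed.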

Alternatively, an iframe HTML tag could be the issue here. See: Select iframe using Python + Selenium
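
A rough sketch of the iframe idea, in case it helps: driver.page_source only returns the document the driver is currently focused on, so content inside an iframe has to be read after switching into it. The helper name collect_page_sources is made up for this example, and it only looks at direct child iframes of the top-level page. It uses the plain locator string "tag name", which is the value of Selenium's By.TAG_NAME.

```python
def collect_page_sources(driver):
    """Return the top-level page source plus the source of each direct iframe.

    `driver` is assumed to be a Selenium WebDriver (anything exposing
    page_source, find_elements and switch_to works).
    """
    sources = [driver.page_source]
    # "tag name" is the string value of selenium's By.TAG_NAME locator.
    for frame in driver.find_elements("tag name", "iframe"):
        driver.switch_to.frame(frame)
        try:
            sources.append(driver.page_source)
        finally:
            # Always go back to the top-level document before the next frame.
            driver.switch_to.default_content()
    return sources
```

Each returned string can then be passed to BeautifulSoup(source, "html.parser") and searched the same way as in the question.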

2022-11-09

mv1qrgav2#

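I would suggest adding some delay before reading the page source, since according to the OP there is no error.
If you want to handle this with Selenium itself, I suggest looking at explicit waits (WebDriverWait) in the Selenium Python bindings.
See: Python - Selenium - Explicit Waits

Sample code: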

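from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `driver` is an existing WebDriver; wait up to 10 seconds for the element to appear.
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
finally:
    driver.quit()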
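
If you would rather keep the BeautifulSoup lookup from the question, the same "wait until it is there" idea can be sketched as a plain polling loop that re-parses driver.page_source until the selector matches or a timeout expires. This is a hand-rolled alternative to WebDriverWait, not part of Selenium. The function name wait_for_elements and the default timeout and poll values are assumptions chosen for illustration.

```python
import time

from bs4 import BeautifulSoup


def wait_for_elements(driver, css_selector, timeout=10.0, poll=0.5):
    """Re-parse driver.page_source until css_selector matches or timeout passes.

    Returns the list of matching tags, which is empty if nothing showed up in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        soup = BeautifulSoup(driver.page_source, "html.parser")
        found = soup.select(css_selector)
        # Stop as soon as something matched, or once the time budget is used up.
        if found or time.monotonic() >= deadline:
            return found
        time.sleep(poll)
```

For the page in the question this might be called as wait_for_elements(driver, "div.pv-profile-section-pager"). An empty result after the timeout suggests the element is missing for some other reason, such as a wrong class name or an iframe.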

2022-11-09
