Extract info using pagination - selenium bs4 python [closed]

Posted by scyqe7ek on 2023-08-08 in Python


Closed. This question needs details or clarity. It is not currently accepting answers.
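I am web scraping Sales Navigator. I am able to navigate to page 1, scroll 8 times, and extract all the names and titles using Selenium and BeautifulSoup. Below is the code.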

import time  # standard library, no installation needed

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By

# `driver` (a Selenium WebDriver) and `dm` (the search URL) are defined earlier
driver.get(dm)
time.sleep(5)  # wait for the results to render

section = driver.find_element(By.XPATH, "//*[@id='search-results-container']")

# Scroll the results container 8 times so the lazily loaded rows render
for _ in range(8):
    driver.execute_script(
        'arguments[0].scrollTop = arguments[0].scrollTop + arguments[0].offsetHeight;',
        section)
    time.sleep(7)  # give the newly revealed rows time to load

src2 = driver.page_source

# Parse the rendered HTML with BeautifulSoup
soup = BeautifulSoup(src2, 'lxml')

name_soup = soup.find_all('span', {'data-anonymize': 'person-name'})

# Collect the stripped text of every matching span
names = [name.text.strip() for name in name_soup]

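However, the results span multiple pages (8 of them), and I need to extract the names from every page and append them to the names list.
Please help.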

python-3.x

Source: https://stackoverflow.com/questions/76778598/extract-info-using-pagination-selenium-bs4-python

1 Answer


dkqlctbz1#

Generally, the logic I use for pagination is

while True:
    ## PAGE SCRAPING CODE [ie, your current code]
    
    ## SEARCH FOR NEXT PAGE [button/link]
    ### IF NEXT PAGE --> click button or go to link
    ### NO NEXT PAGE --> BREAK
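
The skeleton above could be filled in roughly as follows. This is only a sketch under assumptions: `scrape_all_pages`, `scrape_names`, and `find_next_button` are hypothetical helper names, and the `button[aria-label='Next']` selector is a guess that must be checked against the real page markup. The `max_pages` cap is an added safety net so that a next button which never disappears cannot loop forever.

```python
import time


def scrape_all_pages(driver, scrape_page, find_next_button, max_pages=50, delay=5.0):
    """Scrape the current page, move to the next one, and stop when none is left."""
    results = []
    for _ in range(max_pages):
        # PAGE SCRAPING CODE: scrape_page returns a list for the current page
        results.extend(scrape_page(driver))

        # SEARCH FOR NEXT PAGE: expect a clickable element, or None
        button = find_next_button(driver)
        if button is None or not button.is_enabled():
            break  # NO NEXT PAGE

        button.click()     # IF NEXT PAGE: go to it
        time.sleep(delay)  # let the next page render before scraping it
    return results


def scrape_names(driver):
    """Extract the names from the current page (same parsing as in the question)."""
    from bs4 import BeautifulSoup  # imported here so the loop above needs no extras

    soup = BeautifulSoup(driver.page_source, "lxml")
    spans = soup.find_all("span", {"data-anonymize": "person-name"})
    return [span.text.strip() for span in spans]


def find_next_button(driver):
    """Return the next-page button, or None if the page has none."""
    from selenium.webdriver.common.by import By

    # ASSUMPTION: this selector is hypothetical; inspect the page for the real one
    buttons = driver.find_elements(By.CSS_SELECTOR, "button[aria-label='Next']")
    return buttons[0] if buttons else None
```

With a live `driver`, the call would look like `names = scrape_all_pages(driver, scrape_names, find_next_button)`. The scrolling from the question would go at the start of `scrape_names`, and the results container should be looked up again on every page, since an element reference obtained before a page change may go stale.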

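If you include the link you are trying to scrape, I might be able to give you a more specific answer. For example, this is a function I often use for scraping paginated data, although it is not suited to scrollable pages.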
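
For a scrollable results container like the one in the question, one option is to keep scrolling until the container stops moving instead of scrolling a fixed 8 times. The sketch below is an assumption-laden illustration, not part of the original answer: `scroll_to_bottom` is a hypothetical helper, and it assumes that `scrollTop` stops changing once all rows have loaded, which may not hold if rows load more slowly than `pause`.

```python
import time

# Same scroll step as in the question: advance by one container height
SCROLL_JS = (
    "arguments[0].scrollTop = "
    "arguments[0].scrollTop + arguments[0].offsetHeight;"
)
READ_TOP_JS = "return arguments[0].scrollTop;"


def scroll_to_bottom(driver, container, pause=2.0, max_scrolls=30):
    """Scroll `container` until its scroll position stops changing.

    Returns the number of scroll steps that actually moved the container.
    """
    moved = 0
    last_top = driver.execute_script(READ_TOP_JS, container)
    for _ in range(max_scrolls):
        driver.execute_script(SCROLL_JS, container)
        time.sleep(pause)  # give lazily loaded rows time to appear
        top = driver.execute_script(READ_TOP_JS, container)
        if top == last_top:
            break  # position unchanged, so the bottom has been reached
        last_top = top
        moved += 1
    return moved
```

With a live driver this might be called as `scroll_to_bottom(driver, section)` before reading `driver.page_source`, where `section` is the results container located in the question's code.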

2023-08-08
