Python: how to get all the results using BeautifulSoup and Selenium, and disable the automated test page?

u3r8eeie  asked on 2023-02-15  in Python

I am trying to scrape a website, but somehow it only shows 24 results. How can I load all of the results, and also hide the automated test page?
The code is below:

    # import library
    from selenium import webdriver
    from selenium.webdriver import Chrome
    import pandas as pd
    import bs4

    #create list
    items = []
    prices = []
    volumes = []

    driver = webdriver.Chrome()
    driver.get("https://www.fairprice.com.sg/category/milk-powder")
    soup = bs4.BeautifulSoup(driver.page_source, 'lxml')
    allelem = soup.find_all('div',class_='sc-1plwklf-0 iknXK product-container')

    #read all element
    for item in allelem:
      items.append(item.find('span', class_='sc-1bsd7ul-1 eJoyLL').text.strip())
  
    #read price
    for price in allelem:
      prices.append(price.find('span', class_='sc-1bsd7ul-1 sc-1svix5t-1 gJhHzP biBzHY').text.strip())

    #read volume
    for volume in allelem:
      volumes.append(volume.find('span', class_='sc-1bsd7ul-1 eeyOqy').text.strip())

    print(items)
    print(volumes)
    print(prices)

    #create dataframe
    final_array = []
    for item,price,volume in zip(items,prices,volumes):
        final_array.append({'Item': item, 'Volume': volume, 'Price': price})
    
    # convert to excel
    df = pd.DataFrame(final_array)
    print(df)
    df.to_excel('ntucv4milk.xlsx',index=False)

End of code.
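
A side note on the "automated test page" in the title: this presumably refers to Chrome's "Chrome is being controlled by automated test software" infobar. A minimal sketch of one common way to hide it, assuming that is what is meant, is to exclude the enable-automation switch via ChromeOptions:

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # exclude the switch that triggers the "controlled by automated test software" infobar
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    driver = webdriver.Chrome(options=options)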

8cdiaqws1#

My suggestion is to define three lists (items, prices, volumes) that grow as you scroll down the page. If you have a list of web elements called elements, you can scroll down to the last one by running

driver.execute_script('arguments[0].scrollIntoView({block: "center", behavior: "smooth"});', elements[-1])

Then all you have to do is wait for the new items to load and append them to the three lists. If no new items load within a given amount of time (max_wait, here 10 seconds), there are probably no more items to load and we can break out of the loop.

import time
import pandas as pd
from selenium.webdriver.common.by import By

# `driver` is the webdriver.Chrome instance from the question,
# already on https://www.fairprice.com.sg/category/milk-powder

items, prices, volumes = [], [], []
c = 0          # number of items scraped so far
max_wait = 10  # seconds to wait for new items before giving up
no_new_items = False

while True:
    # collect only the elements that appeared since the last iteration ([c:])
    items_new = driver.find_elements(By.CSS_SELECTOR, 'span[class="sc-1bsd7ul-1 eJoyLL"]')
    items   += [item.text.strip()  for item  in items_new[c:]]
    prices  += [price.text.strip() for price in driver.find_elements(By.CSS_SELECTOR, 'span[class="sc-1bsd7ul-1 sc-1svix5t-1 gJhHzP biBzHY"]')[c:]]
    volumes += [vol.text.strip()   for vol   in driver.find_elements(By.XPATH, '//span[@class="sc-1bsd7ul-1 eeyOqy"][1]')[c:]]
    c = len(items)
    print(c, 'items scraped', end='\r')

    # scroll the last product into view to trigger lazy loading of the next batch
    driver.execute_script('arguments[0].scrollIntoView({block: "center", behavior: "smooth"});', items_new[-1])

    items_loaded = items_new.copy()
    start = time.time()
    # wait up to `max_wait` seconds for new elements to be loaded
    while len(items_new) == len(items_loaded):
        items_loaded = driver.find_elements(By.CSS_SELECTOR, 'span[class="sc-1bsd7ul-1 eJoyLL"]')
        if time.time() - start > max_wait:
            no_new_items = True
            break
    if no_new_items:
        break

df = pd.DataFrame({'item': items, 'price': prices, 'volume': volumes})
print(df)
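
To mirror the original script, the resulting DataFrame can then be written to Excel in the same way:

df.to_excel('ntucv4milk.xlsx', index=False)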

Output
