Scrapy: click the load button element while it is active

Asked by to94eoyn on 2022-12-18

I am writing a Scrapy spider for Apple Podcasts and have run into a problem. Say I am scraping the details of a podcast from this page (https://podcasts.apple.com/us/podcast/the-art-angle/id1484445852): I need Selenium to keep clicking the "Show 10 More Episodes" button for as long as it is available, and only after that scrape all of the loaded data.

def parse_details(self, response):
    # [@class="l-row"]
    self.driver = webdriver.Chrome()
    self.driver.get(response)  # or url

    while True:
        try:
            load_btn = self.driver.find_element_by_xpath('//*[@class="link"]')
            load_btn.click()
        except:
            break

    loader = ItemLoader(item=AppleItem(), selector=response)
    name_xpath = '//span[@class="product-header__title"]/text()'
    description_css = 'section.product-hero-desc__section div p::text'
    genre_xpath = '//ul[@class="inline-list"]/li/text()'
    rating_css = 'span.we-customer-ratings__averages__display::text'
    num_of_reviews_css = 'div.we-customer-ratings__count.small-hide.medium-show::text'
    episodes_css = 'ul.inline-list.inline-list--truncate-single-line.tracks__track__eyebrow > li > time'

    loader.add_xpath('name', name_xpath)
    loader.add_css('description', description_css)
    loader.add_xpath('genre', genre_xpath)
    loader.add_css('rating', rating_css)
    loader.add_css('num_of_reviews', num_of_reviews_css)
    loader.add_css('last_episode', episodes_css)
    loader.add_css('first_episode', episodes_css)

    item = loader.load_item()

    yield item

I seem to be doing something wrong, and it clearly does not work.


Answer 1, by ikfrs5lh:

For this kind of task I strongly recommend using Selenium explicit waits: https://www.selenium.dev/documentation/webdriver/waits/ and https://selenium-python.readthedocs.io/waits.html
For example, you can wait until the button becomes clickable: the driver waits up to 15 seconds, re-checking the button's availability every second. Similarly, you can wait until the loading spinner is no longer visible before continuing your scraping.

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as expect

def parse_details(self, response):
    # [@class="l-row"]
    self.driver = webdriver.Chrome()
    self.driver.get(response.url)

    while True:
        try:
            # wait up to 15 seconds, polling every second, until the button is clickable
            load_btn = WebDriverWait(self.driver, 15, 1).until(
                expect.element_to_be_clickable(
                    (By.XPATH, '//*[@class="link"]')))
            load_btn.click()
            # wait until the loading spinner becomes invisible before continuing
            WebDriverWait(self.driver, 15, 1).until(
                expect.invisibility_of_element_located(
                    (By.XPATH, '//*[@class="we-loading-spinner we-loading-spinner--small"]')))
            # scroll towards the bottom of the page if needed
            self.driver.execute_script('window.scrollBy(0, 3999)')
            # then scroll back up to the approximate button location
            self.driver.execute_script('window.scrollBy(0, -900)')
        except TimeoutException:
            # the button never became clickable again: everything is loaded
            break
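One more thing to keep in mind: once the loop exits, the Scrapy response still only contains the initial HTML, so an ItemLoader built from response (as in the question) would miss the episodes that Selenium loaded. A minimal sketch of how the rest of parse_details could continue, assuming the AppleItem class and the selectors from the question, is to build a fresh Selector from the rendered page source:

# additional imports at the top of the spider module
from scrapy.loader import ItemLoader
from scrapy.selector import Selector

# ...continuing inside parse_details, after the while-loop has finished clicking:
rendered = Selector(text=self.driver.page_source)  # HTML after all episodes are loaded
loader = ItemLoader(item=AppleItem(), selector=rendered)
loader.add_xpath('name', '//span[@class="product-header__title"]/text()')
loader.add_css('description', 'section.product-hero-desc__section div p::text')
loader.add_css('last_episode',
               'ul.inline-list.inline-list--truncate-single-line.tracks__track__eyebrow > li > time')
self.driver.quit()  # release the browser once the data has been extracted
yield loader.load_item()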
