Scrapy: "element is not attached to the page document" error when looping through a dropdown menu

Posted by vmpqdwk3 on 2023-03-18 in Other

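I am trying to loop through all the options in a dropdown list. On the second iteration I get this error:
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
How can I fix it?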
HTML:

<select class="form-select form-select--small" name="attribute[273]" id="attribute_select_273" required="">
  <option value="">Choose Options</option>
  <option data-product-attribute-value="497" value="497">Small</option>
  <option data-product-attribute-value="498" value="498">Medium</option>
  <option data-product-attribute-value="499" value="499">Large</option>
</select>
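Note that the first <option> is the "Choose Options" placeholder with an empty value, and that find_elements(By.TAG_NAME, "Option") matches every <option> on the whole page, so the question's loop also clicks the placeholder. The sketch below, which uses only the standard-library html.parser rather than Scrapy to stay self-contained, reads the real choices of this one dropdown as plain (value, label) strings. The class name OptionCollector is hypothetical; inside the spider, roughly the same list could come from response.css("#attribute_select_273 option::attr(value)").getall() after dropping the empty value.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect (value, label) pairs for <option> tags inside one <select> (hypothetical helper)."""

    def __init__(self, select_id):
        super().__init__()
        self.select_id = select_id
        self.in_select = False
        self.current_value = None  # value of the <option> currently open, if any
        self.options = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == self.select_id:
            self.in_select = True
        elif tag == "option" and self.in_select:
            self.current_value = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "select":
            self.in_select = False
        elif tag == "option":
            self.current_value = None

    def handle_data(self, data):
        # Only text inside an <option> of the target <select> is recorded.
        if self.current_value is not None and data.strip():
            self.options.append((self.current_value, data.strip()))


HTML = """
<select class="form-select form-select--small" name="attribute[273]" id="attribute_select_273" required="">
  <option value="">Choose Options</option>
  <option data-product-attribute-value="497" value="497">Small</option>
  <option data-product-attribute-value="498" value="498">Medium</option>
  <option data-product-attribute-value="499" value="499">Large</option>
</select>
"""

collector = OptionCollector("attribute_select_273")
collector.feed(HTML)
# Skip the "Choose Options" placeholder, whose value is empty.
real_options = [(value, label) for value, label in collector.options if value]
print(real_options)  # [('497', 'Small'), ('498', 'Medium'), ('499', 'Large')]
```

Because these are plain strings, they cannot go stale when the page changes after a click.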

Scrapy code:

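import time

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


# Spider method; self.driver is a Selenium WebDriver created elsewhere in the spider.
def parse_product_page(self, response):
    products = []
    self.driver.get(response.url)
    # WebElement references are collected once here, before any option is clicked.
    elements = self.driver.find_elements(by=By.TAG_NAME, value="Option")
    for option in elements: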
        option.click()
        time.sleep(5)
        page_res = response.replace(body=self.driver.page_source)
        category = page_res.meta['category']
        product = page_res.css('h1.productView-title::text').get()
        price = page_res.css('.productView-price .price--withTax::text').get()
        bc = page_res.css('.breadcrumb span::text').extract()
        breadcrumb = ' > '.join(bc)
        product_url = page_res.url
        money_back = page_res.css('article[class*="30-day"]').extract()
        meta_description = page_res.css("article[class*=productView-description] p::text,article[class*=productView-description] strong::text").extract()

        # Switches the driver into the Trustpilot iframe; nothing below switches it back.
        WebDriverWait(self.driver, 10).until(EC.frame_to_be_available_and_switch_to_it(
            (By.CSS_SELECTOR, "div#trustpilotReviewsWidget iframe[title='Customer reviews powered by Trustpilot']")))
        try:
            rating = WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,
                 "div.tp-widget-summary__information div.tp-widget-summary__rating > span.rating"))).text
        except:
            rating = 0
        try:
            reviews = WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,
            "div.tp-widget-summary__information div.tp-widget-summary__rating > span.tp-widget-summary__count > strong"))).text
        except:
            reviews = 0
        try:
            sku1 = page_res.css('.sku-label::text').get()
            sku2 = page_res.css('.productView-info-value::text').get()
            sku = sku1 + sku2
        except:
            sku = ''
        description = page_res.css('.custom-message-area p::text').get().replace('\n', '').replace('\r', '').strip()
        product_dic = {
            'category': category,
            "sku": sku,
            "product": product,
            "description": f'{description}\n{"".join(meta_description) if len(meta_description) > 0 else ""}',
            "price": price.replace("$", ""),
            "breadcrumb": breadcrumb,
            "product_url": product_url,
            "money_back": True if len(money_back) > 0 else False,
            "rating": rating,
            "total_reviews": reviews
        }
        products.append(product_dic)
    for pro_dict in products:
        yield pro_dict

Answer #1

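The problem is the line elements = self.driver.find_elements(...): it fills the collection elements from the page once, before the loop. When you click an option, the page reloads (or at least part of it is re-rendered), so the WebElement references stored in elements no longer point to nodes in the current document, and using one of them on the next iteration raises StaleElementReferenceException.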

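Instead of:

self.driver.get(response.url)
elements = self.driver.find_elements(by=By.TAG_NAME, value="Option")
for option in elements:

look the options up again on every pass of the loop:

self.driver.get(response.url)
option_count = len(self.driver.find_elements(by=By.TAG_NAME, value="option"))
for i in range(option_count):
    # Leave any iframe entered on the previous pass, then re-find the options;
    # references found before the last click may already be stale.
    # (Assumes the number of options does not change between clicks.)
    self.driver.switch_to.default_content()
    option = self.driver.find_elements(by=By.TAG_NAME, value="option")[i]
    option.click()
    # ... rest of the loop body unchanged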

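In the new code, the option list is looked up again on every pass of the loop, instead of relying on a collection fetched once before the loop started.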
I recommend reading up on what a stale element is and how to fix it; that will help you understand what went wrong here and how to avoid it in the future.
Also, you do not need to create the same WebDriverWait over and over. Create it once, store it in a variable, and reuse it.

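Instead of:

WebDriverWait(self.driver, 10).until(condition_a)
WebDriverWait(self.driver, 10).until(condition_b)
WebDriverWait(self.driver, 10).until(condition_c)

write:

wait = WebDriverWait(self.driver, 10)
wait.until(condition_a)
wait.until(condition_b)
wait.until(condition_c)

where condition_a, condition_b and condition_c stand for the expected conditions you are waiting on.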
