Python: XPath returns empty output

c3frrgcw  posted on 2023-01-24  in Python

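I want to extract the address, but my XPath returns an empty result. What am I doing wrong in the XPath? This is the page link: https://www.findtruckservice.com/page/cummins-sales-and-service-farmington-nm-430653
Snapshot of the address: [image]

Code trials: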

import scrapy
from scrapy import Selector
from scrapy_selenium import SeleniumRequest
from scrapy.http import Request


class TestSpider(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        yield SeleniumRequest(
            url="https://www.findtruckservice.com/search/?city=Florida%2C+CO&mainCat=1&subCat=Truck+Repair&lat=37.0731&lon=-106.247&cat_field=Mobile+Repair+-+Truck+Repair",
            wait_time=3,
            screenshot=True,
            callback=self.parse,
            dont_filter=True,
        )

    def parse(self, response):
        books = response.xpath("//h3//a//@href").extract()
        for book in books:
            url = response.urljoin(book)
            yield Request(url, callback=self.parse_book)

    def parse_book(self, response):
        address = response.xpath("//div[1][@class='threecol align_left card']//div//text()").get()
        yield {
            'address': address
        }

vaj7vani  #1

To print the desired text from the website, you need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:

  • Using XPATH and the text attribute:
driver.get("https://www.findtruckservice.com/page/cummins-sales-and-service-farmington-nm-430653")
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//h4[@class='sec-title' and text()='CONTACT']//following::div[@class='container']"))).text)
  • Using XPATH and get_attribute("textContent"):
driver.get("https://www.findtruckservice.com/page/cummins-sales-and-service-farmington-nm-430653")
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//h4[@class='sec-title' and text()='CONTACT']//following::div[@class='container']"))).get_attribute("textContent"))
  • Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
  • Console output:
Cummins Sales and Service
1101 N Troy King Rd
Farmington, NM
505-327-7331 (primary)
505-326-2948 (fax)

az31mfrm  #2

Try the following:

[...]

address = ' '.join([x.strip() for x in response.xpath("//div[@class='threecol align_left card'][1]/div[@class='container']/text()").extract()])
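
To see why joining the stripped text nodes helps, here is a minimal sketch. It rests on assumptions: the HTML fragment is hypothetical (the real page markup is not shown in the question), and it uses the standard library's xml.etree.ElementTree instead of Scrapy's selector so it runs without a crawl. The helper direct_text_nodes is made up for this illustration and mimics what /text() selects. One plausible reason for an "empty" result is that the first text node is only whitespace, which is what .get() would return.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the structure is an assumption, only the address
# lines are taken from the console output quoted in answer #1.
FRAGMENT = """
<div class="threecol align_left card">
  <h4 class="sec-title">CONTACT</h4>
  <div class="container">
    <b>Cummins Sales and Service</b><br/>
    1101 N Troy King Rd<br/>
    Farmington, NM<br/>
  </div>
</div>
"""


def direct_text_nodes(elem):
    """Collect text nodes that are direct children of elem (like XPath text())."""
    nodes = []
    if elem.text is not None:
        nodes.append(elem.text)
    for child in elem:
        # In ElementTree, text following a child element is stored as its tail.
        if child.tail is not None:
            nodes.append(child.tail)
    return nodes


root = ET.fromstring(FRAGMENT)
container = root.find("./div[@class='container']")
nodes = direct_text_nodes(container)

# Analogous to .get(): only the first text node, here pure whitespace.
first_only = nodes[0]

# Analogous to the join above, additionally skipping whitespace-only nodes.
address = " ".join(s.strip() for s in nodes if s.strip())

print(repr(first_only.strip()))
print(address)
```

Unlike the one-liner above, this variant drops whitespace-only nodes before joining, so the result has no leading or trailing spaces. In the spider itself the same filtering could be applied to the list returned by .extract().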
