Scrapy: scraping a table returns empty output

mmvthczy  posted on 2022-11-09  in  Other

I am trying to scrape a table, but the spider gives me empty output. This is the page link: https://www.sidmartinbio.org/why-is-the-jugular-vein-so-important/

from scrapy.http import Request
import scrapy
class PushpaSpider(scrapy.Spider):
    name = 'pushpa'
    page_number = 1
    start_urls = ['https://www.sidmartinbio.org/why-is-the-jugular-vein-so-important/']
    custom_settings = {
        'CONCURRENT_REQUESTS_PER_DOMAIN': 1,
        'DOWNLOAD_DELAY': 1,
        'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'
    }

    def parse(self, response):
        details={}
        key=response.xpath("//table//tbody/tr/td[1]/text()").get()
        value=response.xpath("//table//tbody/tr/td[2]/text()").get()
        details[key]=value

        yield details

uujelgoq #1

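Choosing the right XPath is a bit tricky. It works now. One likely reason the original expressions matched nothing (an assumption, not verified against this page): browsers insert tbody when they render a table, so it appears in the developer tools but may be absent from the raw HTML that Scrapy downloads. The version below anchors on the cell text instead of on tbody.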

from scrapy.http import Request
import scrapy

class PushpaSpider(scrapy.Spider):
    name = 'pushpa'
    page_number = 1
    start_urls = [
        'https://www.sidmartinbio.org/why-is-the-jugular-vein-so-important']

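    def parse(self, response):
        details = {}
        # Locate the label cell by its text rather than by its position under tbody.
        key = response.xpath("//td[contains(.,'Source')]/text()").get()
        # The value sits in the td that follows the label cell in the same row.
        value = response.xpath("//td[contains(.,'Source')]/following-sibling::td/text()").get()
        details[key] = value

        yield details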

Output:

{'Source': 'Sigmoid sinus and Inferior petrosal sinus'}
