Scrapy: trying to scrape a table gives empty output

Posted by mmvthczy on 2022-11-09 in Other


I am scraping a table, but I get empty output. This is the page link: https://www.sidmartinbio.org/why-is-the-jugular-vein-so-important/

from scrapy.http import Request
import scrapy
class PushpaSpider(scrapy.Spider):
    name = 'pushpa'
    page_number = 1
    start_urls = ['https://www.sidmartinbio.org/why-is-the-jugular-vein-so-important/']
    custom_settings = {
        'CONCURRENT_REQUESTS_PER_DOMAIN': 1,
        'DOWNLOAD_DELAY': 1,
        'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'
    }

    def parse(self, response):
        details={}
        key=response.xpath("//table//tbody/tr/td[1]/text()").get()
        value=response.xpath("//table//tbody/tr/td[2]/text()").get()
        details[key]=value

        yield details

scrapy

Source: https://stackoverflow.com/questions/71938354/trying-t-scrape-table-provide-empty-output

1 Answer


uujelgoq1#

Picking the right XPath was a bit tricky. It works now.

import scrapy


class PushpaSpider(scrapy.Spider):
    name = 'pushpa'
    start_urls = [
        'https://www.sidmartinbio.org/why-is-the-jugular-vein-so-important']

    def parse(self, response):
        details = {}
        # Find the label cell by its text instead of relying on a //tbody path.
        key = response.xpath("//td[contains(.,'Source')]/text()").get()
        # The value is the first <td> that follows the label cell in the same row.
        value = response.xpath("//td[contains(.,'Source')]/following-sibling::td/text()").get()
        details[key] = value

        yield details

Output:

{'Source': 'Sigmoid sinus and Inferior petrosal sinus'}

Answered 2022-11-09
