Scrapy - get 2 values split by a span tag

hgtggwj0 posted on 2022-11-23 in Other

I'm using Scrapy to scrape a table on a page:

import scrapy
from ..items import TestItem
from scrapy.loader import ItemLoader

class TestSpider(scrapy.Spider):
    name = 'test'

    def parse(self, response):
        items = response.xpath('//*[@id="12"]/div/div/div[2]/table/tbody/tr')
        for l in items:
            il = ItemLoader(item=TestItem(), selector=l)
            # From should be text before <span></span> and To should be after
            il.add_xpath('from', 'td[2]')
            il.add_xpath('to', 'td[2]')
            yield il.load_item()

Sample data:

<tr>
<td class="test">date</td>
<td class="test2">London<span></span>Prague</td>
</tr>

I need the "London" text to go into "from" and the "Prague" text into "to". In other words, how do I split the cell's value at the <span>?
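Structurally, "London" and "Prague" are two separate text nodes in that cell: one before the <span> and one after it, which is what makes the split possible. A minimal illustration, using Python's standard-library xml.etree.ElementTree instead of Scrapy (an assumption made only so the snippet runs on its own; it works here because the sample cell happens to be well-formed XML):

```python
import xml.etree.ElementTree as ET

# The sample cell from the question (well-formed XML, so ElementTree can parse it)
cell = ET.fromstring('<td class="test2">London<span></span>Prague</td>')

span = cell.find('span')
from_text = cell.text  # text before the first child element: 'London'
to_text = span.tail    # text after </span>, up to the next sibling: 'Prague'
print(from_text, to_text)
```

In ElementTree terms the first value is the cell's .text and the second is the span's .tail; in XPath terms they are the text() siblings before and after the span.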

Answer #1, by i1icjdpr

Try:

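def parse(self, response):
    items = response.xpath('//*[@id="12"]/div/div/div[2]/table/tbody/tr')
    for l in items:
        # All text nodes in the cell: ['London', 'Prague'] for the sample row
        # (text inside the <span>, if it had any, would also land in this list)
        tf = l.xpath('./td[@class="test2"]//text()').getall()
        il = ItemLoader(item=TestItem(), selector=l)
        # From is the text before <span></span>, To is the text after it
        il.add_value('from', tf[0].strip())
        il.add_value('to', tf[1].strip())
        yield il.load_item()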
