I am using Scrapy to scrape a table on a page:
import scrapy
from scrapy.loader import ItemLoader

from ..items import TestItem


class TestSpider(scrapy.Spider):
    name = 'test'

    def parse(self, response):
        items = response.xpath('//*[@id="12"]/div/div/div[2]/table/tbody/tr')
        for l in items:
            il = ItemLoader(item=TestItem(), selector=l)
            # From should be text before <span></span> and To should be after
            il.add_xpath('from', 'td[2]')
            il.add_xpath('to', 'td[2]')
            yield il.load_item()
Sample data:
<tr>
    <td class="test">date</td>
    <td class="test2">London<span></span>Prague</td>
</tr>
I need to put the text "London" into "from" and the text "Prague" into "to". In other words, how do I split the value?
1 Answer
Try: