I am trying to scrape a website with a CrawlSpider. When I run the crawl from the command line I get TypeError: start_requests() takes 1 positional argument but 3 were given. I checked my middleware settings, where def process_start_requests(self, start_requests, spider) does take 3 arguments. I have already read this question — scrapy project middleware - TypeError: process_start_requests() takes 2 positional arguments but 3 were given — but it did not solve my problem.
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from scrapy import Request


class FpSpider(CrawlSpider):
    name = 'fp'
    allowed_domains = 'foodpanda.com.bd'
    rules = (Rule(LinkExtractor(allow=('product', 'pandamart')),
                  callback='parse_items', follow=True,
                  process_request='start_requests'),)

    def start_requests(self):
        yield Request(
            url='https://www.foodpanda.com.bd/darkstore/vbpl/pandamart-gulshan-2',
            meta=dict(playwright=True),
            headers={
                'sec-ch-ua': '"Google Chrome";v="105", "Not)A;Brand";v="8", "Chromium";v="105"',
                'Accept': 'application/json, text/plain, */*',
                'Referer': 'https://www.foodpanda.com.bd/',
                'sec-ch-ua-mobile': '?0',
                'X-FP-API-KEY': 'volo',
                'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36',
                'sec-ch-ua-platform': '"macOS"'
            }
        )

    def parse_items(self, response):
        item = {}
        item['name'] = response.css('h1.name::text').get()
        item['price'] = response.css('div.price::text').get()
        item['original_price'] = response.css('div.original-price::text').get()
        yield item
The error looks like this: [screenshot: Scrapy type error]
1 Answer
The problem is this statement:

process_request='start_requests'

start_requests is reserved for the spider's first requests. If you want to enable Playwright for the subsequent requests (which I assume is what you are trying to do with process_request), you need to use a different name for that function. See the code below:
Also note that allowed_domains is a list, not a string.
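For completeness, the mechanics of the TypeError can be reproduced without Scrapy at all: the Rule machinery calls the process_request hook as method(request, response), and with the implicit self that is 3 positional arguments handed to a method written to accept only 1 (FakeSpider below is a stand-in, not Scrapy code):

```python
class FakeSpider:
    def start_requests(self):  # written to accept only self
        yield "first request"

spider = FakeSpider()
try:
    # Mimics Rule(process_request='start_requests'): the hook is
    # invoked with a request and a response on top of self.
    spider.start_requests("request", "response")
except TypeError as err:
    # message ends with: takes 1 positional argument but 3 were given
    print(err)
```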