Passing variables to Scrapy

v64noz0r  posted on 2022-11-09  in Other

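I have a basic Scrapy project in which I have hard-coded two variables, pProd and pReviews. I now want to either read these variables from a CSV file or pass them in when calling the spider. I have been trying for hours, but I am getting nowhere with the -a option when calling the spider. For example: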

scrapy crawl myspider -a Prod="P123" -a Revs="200" -o test.csv

Here is my code with the hard-coded variables:

import json

import scrapy
from scrapy import Spider

class myspider(Spider):
    name = 'myspider'
    allowed_domains = ['mydom.com']
    start_urls = ['https://api.mydom.com']

    def start_requests(self):
        urls = ["https://api.mydom.com"]
        pProd = "P123"
        pReviews = 200
        for url in urls:
            #Generate URL as API only brings back 100 at a time
            for i in range(0, pReviews, 100):
                links = 'https://api.mydom.com/data/reviews.json?Filter=ProductId%3A' + pProd + '&Offset=' + str(i) + '&passkey=123qwe'
                yield scrapy.Request(
                    url=str(links),
                    cb_kwargs={'ProductID' : pProd},
                    callback=self.parse_reviews,
                )

    def parse_reviews(self, response, ProductID):
        data = json.loads(response.text)
        proddata = data['Includes']
        reviews = data['Results']
        p_prodid = ProductID
        # Fall back to None when the product or its category is missing
        try:
            p_prodcat = proddata['Products'][ProductID]['CategoryId']
        except (KeyError, TypeError):
            p_prodcat = None

        for review in reviews:
            # .get() returns None when the review has no submission time
            r_reviewdate = review.get('SubmissionTime')

            yield {
                'prodid': p_prodid,
                'prodcat': p_prodcat,
                'reviewdate': r_reviewdate,
            }

I have tried a few different approaches, including adding the variable names to def start_requests, like this:

def start_requests(self, pProd='', pReviews='',**kwargs):

But I don't seem to be getting anywhere. I would appreciate some guidance on where I'm going wrong.

mspsb9vt (answer #1)

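You don't need to declare a constructor (__init__) every time you write a Scrapy spider; you can just specify the arguments on the command line as you did before: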

scrapy crawl myspider -a parameter1=value1 -a parameter2=value2

In your spider code, they are then available as attributes of the spider:

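from scrapy import Spider

class MySpider(Spider):
    name = 'myspider'
    ...
    def parse(self, response):
        ...
        # values passed with -a always arrive as strings
        if self.parameter1 == 'value1':
            pass  # this is True

        # or also, with a default in case the argument was not passed
        if getattr(self, 'parameter2', None) == 'value2':
            pass  # this is also True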
