Converting requests.post to scrapy.Request, but the callback is not firing

Posted by kcwpcxri on 2022-11-23 in Other


I have working code in a Scrapy project, but it uses requests.post:

import json
import requests

response = requests.post(url,
                         data=json.dumps({
                             "var_a": "var_a",
                             "var_b": [var_b],
                         }),
                         headers={
                             'content-type': 'application/json',
                             'cookie': cookie,
                         })
return response.json()

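But when I try to convert it to a scrapy.Request, the callback function is never triggered. I tried an errback as well, but it isn't called either. If anyone has run into the same problem, please let me know.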

Scrapy code:

def start_requests(self):
    for listing_item in self.get_listing_items():
        restaurant_url = listing_item.get('restaurant_url')
        yield self.generate_request(restaurant_url)

def generate_request(self):

    headers = {
        'content-type': 'application/json',
        'x-csrf-token': self.x_csrf_token,
    }
    payload = {'var_c': var_c}
    return Request(
        url=self.url,
        headers=headers,
        method='POST',
        body=json.dumps(payload),
        callback=self.parse_restaurant,
        priority=2
    )

def parse_restaurant(self, response):
    try:
        data = json.loads(response.body)
        restaurant = data['data']
    except:
        self.logger.debug('Invalid response, %s' % response.body)
        return
    
    loader = ItemLoader()
    loader.add_restaurant(self._get_menu_item(restaurant))

def _get_menu_item(self, restaurant):    
    subset = []
    for x_item in self.x_items:
        x = self.get_super_item(x_item, restaurant, cookie)
        subset.append(x)
    return subset

def _get_super_item(self, selector, restaurant, cookie):
    yield scrapy.Request(url=self.url,
                         method='POST',
                         body={
                             "var_a": "var_a",
                             "var_b": "var_b",
                         },
                         headers={
                             'content-type': 'application/json',
                             'cookie': cookie,
                             'x-csrf-token': 'x',
                         },
                         callback=self._get_super_item_v2,
                         )

def _get_super_item_v2(self, response):  # not being called
    print('resp:', response.json())

scrapy

Source: https://stackoverflow.com/questions/74532539/convert-request-post-to-scrapy-request-but-callback-not-firing

1 answer


euoag5mw (answer #1)

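The problem is that none of the requests you create in the _get_super_item() method are ever dispatched to the crawler, so they are never actually sent.
A Scrapy Request object is just a container: it holds the information needed to build a message and send it to a URL. Once a Request has been created, it must be returned or yielded back to the Scrapy engine; only then is the request actually sent.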
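The rule above can be modeled in a few lines of plain Python. FakeRequest and run_callback are hypothetical stand-ins, not Scrapy classes, and the real engine also handles items and other output; the sketch only shows that a request built inside a callback but never returned or yielded never reaches the code that sends it.

```python
from dataclasses import dataclass


@dataclass
class FakeRequest:
    # Hypothetical stand-in for scrapy.Request: just holds the target URL.
    url: str


def run_callback(callback):
    """Toy 'engine': call a callback and collect the requests it hands back."""
    result = callback()
    if result is None:
        return []  # nothing was returned, so nothing gets sent
    if isinstance(result, FakeRequest):
        return [result]
    # Generators and lists are iterated; only request objects are kept.
    return [r for r in result if isinstance(r, FakeRequest)]


def builds_but_discards():
    FakeRequest("https://example.com/a")  # created, then thrown away


def yields_it():
    yield FakeRequest("https://example.com/b")


sent_a = run_callback(builds_but_discards)
sent_b = run_callback(yields_it)
print(len(sent_a), len(sent_b))  # 0 1
```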
In the parse_restaurant method, the last line is

loader.add_restaurant(self._get_menu_item(restaurant))

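This means the return value of self._get_menu_item(restaurant) is passed as the argument to loader.add_restaurant(), and it also means that this method returns nothing to the Scrapy crawler.
Then, in the _get_menu_item method, you have: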

subset = []
for x_item in self.x_items:
    x = self.get_super_item(x_item, restaurant, cookie)
    subset.append(x)
return subset

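So the subset list is returned to the parse_restaurant method and used as the argument to loader.add_restaurant().
The contents of subset are determined by the output of the _get_super_item method, which is where the Request objects are created: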

def _get_super_item(self, selector, restaurant, cookie):
    yield scrapy.Request(url=self.url, method='POST',
                         body={"var_a": "var_a", "var_b": "var_b"},
                         headers={'content-type': 'application/json',
                                  'cookie': cookie, 'x-csrf-token': 'x'},
                         callback=self._get_super_item_v2)

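So whatever _get_super_item() returns is added to subset, and subset is then returned and used as the argument to the loader.add_restaurant() call. Because _get_super_item() contains yield, each call returns a generator object rather than a Request; the scrapy.Request(...) line inside it only runs when something iterates over that generator, and nothing ever does.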
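A quick stdlib check of that generator behavior, with a hypothetical build_request standing in for _get_super_item() and a dict standing in for the Request:

```python
import types

calls = []


def build_request(item):
    # Hypothetical stand-in for _get_super_item(): this body only runs
    # when the returned generator is iterated.
    calls.append(item)
    yield {"url": f"https://example.com/{item}", "method": "POST"}


subset = [build_request(x) for x in ["a", "b"]]
print(all(isinstance(g, types.GeneratorType) for g in subset))  # True
print(calls)  # [] because no request has been built yet

first = next(subset[0])  # iterating finally executes the body
print(calls)  # ['a']
```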
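Assuming there are no other problems, the fix is easy: in the parse_restaurant method, assign the returned subset to a variable, then iterate over it and yield each request one by one.
TL;DR: here is the solution/fix: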
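The iteration step can be sketched in plain Python. make_requests, naive_callback and fixed_callback are hypothetical names, and strings stand in for scrapy.Request objects. Since _get_super_item() is a generator function, iterating over subset yields generator objects; `yield from` is one way to re-yield the Requests they produce. Changing _get_super_item() to `return scrapy.Request(...)` instead of `yield` would be an alternative.

```python
import types


def make_requests(item):
    # Stands in for _get_super_item(): one "request" (a string here) per call.
    yield f"request-for-{item}"


def naive_callback(items):
    for request in [make_requests(x) for x in items]:
        yield request  # hands back generator objects, not requests


def fixed_callback(items):
    for request_gen in [make_requests(x) for x in items]:
        yield from request_gen  # re-yields each request the generator makes


print([type(r).__name__ for r in naive_callback(["a", "b"])])
# ['generator', 'generator']
print(list(fixed_callback(["a", "b"])))
# ['request-for-a', 'request-for-b']
```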

def parse_restaurant(self, response):
    try:
        data = json.loads(response.body)
        restaurant = data['data']
    except:
        self.logger.debug('Invalid response, %s' % response.body)
        return

    subset = self._get_menu_item(restaurant)
    # Each entry in subset is a generator returned by _get_super_item();
    # yield the Requests it produces so Scrapy actually schedules them.
    for request_gen in subset:
        yield from request_gen
    loader = ItemLoader()  # not really sure what this does since it will get
    loader.add_restaurant(subset)  # destroyed once the method finishes anyway.
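Once the Requests are actually yielded, the body={...} dict in _get_super_item() becomes a problem: scrapy.Request accepts only str or bytes for body and raises a TypeError for a dict. A minimal sketch of JSON-encoding the payload, the same way the original requests.post code used json.dumps; build_post_kwargs is a hypothetical helper, not a Scrapy API, and the URL, cookie and token values are placeholders:

```python
import json


def build_post_kwargs(url, payload, cookie, csrf_token):
    # Hypothetical helper: keyword arguments for scrapy.Request with the
    # payload serialized to a JSON string, which Request accepts as body.
    return {
        "url": url,
        "method": "POST",
        "body": json.dumps(payload),
        "headers": {
            "content-type": "application/json",
            "cookie": cookie,
            "x-csrf-token": csrf_token,
        },
    }


kwargs = build_post_kwargs(
    "https://example.com/api", {"var_a": "var_a", "var_b": "var_b"}, "c=1", "x"
)
print(kwargs["body"])  # {"var_a": "var_a", "var_b": "var_b"}
```

Inside the spider this could be used as `yield scrapy.Request(callback=self._get_super_item_v2, **build_post_kwargs(self.url, payload, cookie, 'x'))`. Scrapy also ships `scrapy.http.JsonRequest`, which takes the payload as `data=`, serializes it to JSON, sets the JSON content-type header and defaults the method to POST when `data` is given.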

Answered 2022-11-23
