Scrapy: cannot loop over a nested dictionary: TypeError: argument of type 'bool' is not iterable

uujelgoq, posted 2022-11-09 in Other

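I am trying to search for a key in a huge nested dictionary, but I get this error and I don't know why. I checked the type of the parsed object, and it is a dict. Could some part of the dictionary be causing it? It is too large for me to inspect by hand.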

import scrapy

class MlSpider(scrapy.Spider):
    name = 'detalhador'

    start_urls=['https://produto.mercadolivre.com.br/MLB-1304118411-sandalia-feminina-anabela-confortavel-pingente-mac-cod-133-_JM?attributes=COLOR_SECONDARY_COLOR%3AUHJldGE%3D%2CSIZE%3AMzU%3D&quantity=1']

    def parse(self, response,**kwargs):
        import json

        d = response.xpath('//script[contains(., "window.__PRELOADED_STATE__")]/text()').re_first(r'(?s)window.__PRELOADED_STATE__ = (.+?\});')
        data = json.loads(d)

        temp='itemPrice'
        res = [val[temp] for key, val in data.items() if temp in val]

Output:

Traceback (most recent call last):
  File "/home/deborador/anaconda3/envs/mapsmaps/lib/python3.10/site-packages/scrapy/utils/defer.py", line 132, in iter_errback
    yield next(it)
  File "/home/deborador/anaconda3/envs/mapsmaps/lib/python3.10/site-packages/scrapy/utils/python.py", line 354, in __next__
    return next(self.data)
  File "/home/deborador/anaconda3/envs/mapsmaps/lib/python3.10/site-packages/scrapy/utils/python.py", line 354, in __next__
    return next(self.data)
  File "/home/deborador/anaconda3/envs/mapsmaps/lib/python3.10/site-packages/scrapy/core/spidermw.py", line 66, in _evaluate_iterable
    for r in iterable:
  File "/home/deborador/anaconda3/envs/mapsmaps/lib/python3.10/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/home/deborador/anaconda3/envs/mapsmaps/lib/python3.10/site-packages/scrapy/core/spidermw.py", line 66, in _evaluate_iterable
    for r in iterable:
  File "/home/deborador/anaconda3/envs/mapsmaps/lib/python3.10/site-packages/scrapy/spidermiddlewares/referer.py", line 342, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/home/deborador/anaconda3/envs/mapsmaps/lib/python3.10/site-packages/scrapy/core/spidermw.py", line 66, in _evaluate_iterable
    for r in iterable:
  File "/home/deborador/anaconda3/envs/mapsmaps/lib/python3.10/site-packages/scrapy/spidermiddlewares/urllength.py", line 40, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/deborador/anaconda3/envs/mapsmaps/lib/python3.10/site-packages/scrapy/core/spidermw.py", line 66, in _evaluate_iterable
    for r in iterable:
  File "/home/deborador/anaconda3/envs/mapsmaps/lib/python3.10/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/deborador/anaconda3/envs/mapsmaps/lib/python3.10/site-packages/scrapy/core/spidermw.py", line 66, in _evaluate_iterable
    for r in iterable:
  File "/home/deborador/Documentos/coding/mercadolivre/mercadolivre/mercadolivre/spiders/detalhador.py", line 22, in parse
    res = [val[temp] for key, val in data.items() if temp in val]
  File "/home/deborador/Documentos/coding/mercadolivre/mercadolivre/mercadolivre/spiders/detalhador.py", line 22, in <listcomp>
    res = [val[temp] for key, val in data.items() if temp in val]
TypeError: argument of type 'bool' is not iterable

Part of the dictionary:

{"translations": {}, "initialState": {"id": "MLB1304118411", "variation_id": "46176185176", "layout": "vip-core", "vertical": "core", "components_locations": {"variations": "short_description"}, "components": {"head": [{"id": "compats_feedback", "type": "ui_message", "state": "HIDDEN", "closeable": false}, {"id": "related_searches", "type": "related_searches", "state": "VISIBLE", "title": {"text": "Voc\u00ea tamb\u00e9m pode gostar"}, "related_searches": [{"target": "https://lista.mercadolivre.com.br/chinelo-comfortflex#topkeyword", "timeout": 0, "duration": 0, "label": {"text": "chinelo comfortflex"}}, {"target": "https://lista.mercadolivre.com.br/cal\u00e7ados-andacco#topkeyword", "timeout": 0, "duration": 0, "label": {"text": "calcados andacco"}}, {"target": "https://lista.mercadolivre.com.br/chinelo-usaflex#topkeyword", "timeout": 0, "duration": 0, "label": {"text": "chinelo usaflex"}}, {"target": "https://lista.mercadolivre.com.br/chinelo-ortopedico#topkeyword", "timeout": 0, "duration": 0, "label": {"text": "chinelo ortopedico"}}, {"target": "https://lista.mercadolivre.com.br/chinelo-crocs-feminino#topkeyword", "timeout": 0, "duration": 0, "label": {"text": "chinelo crocs feminino"}}, {"target": "https://lista.mercadolivre.com.br/ramarim-sandalia#topkeyword", "timeout": 0, "duration": 0, "label": {"text": "ramarim sandalia"}}, {"target": "https://lista.mercadolivre.com.br/melissa-oficial#topkeyword", "timeout": 0, "duration": 0, "label": {"text": "melissa oficial"}}]}, {"id": "carousel_cheaper", "type": "carousel", "state": "VISIBLE", "carousel": {}, "carousel_config": {"site_id": "MLB", "item_id": "MLB1304118411", "category_id": "MLB273770", "client": "similar

Answer #1 (t98cgbkg)

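The error occurs because one of the top-level values parsed from the JSON data is a boolean. For that value, the test temp in val in your list comprehension checks for membership in a bool. bool is a subclass of int, so it supports neither membership tests nor key lookup, and Python raises the TypeError. The comprehension also looks only one level deep, so it would miss an itemPrice key nested further down even if every top-level value were a dictionary.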
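The failure can be reproduced without Scrapy. The sketch below uses a small stand-in for the parsed JSON; the isMobile key, the nesting under components, and the helper names top_level_types and try_original are hypothetical and only serve the illustration. It lists the type of each top-level value, which is a quick way to find the offending entry in a dictionary too large to read by hand, and then runs the comprehension from the question.

```python
# Hypothetical stand-in for the parsed JSON: one top-level value is a bool.
data = {
    "translations": {},
    "initialState": {"components": {"price": {"itemPrice": 7.77}}},
    "isMobile": False,  # hypothetical key; any bool value triggers the error
}
temp = "itemPrice"


def top_level_types(d):
    """Map each top-level key to the type name of its value."""
    return {key: type(val).__name__ for key, val in d.items()}


def try_original(d, key):
    """Run the question's comprehension; return the error message if it fails."""
    try:
        return [val[key] for _, val in d.items() if key in val]
    except TypeError as exc:
        return str(exc)


print(top_level_types(data))     # shows which entries are not dicts
print(try_original(data, temp))  # the TypeError message mentioning 'bool'
```

Any entry whose type is not dict or list (here the bool) needs to be skipped rather than tested with in.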
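The fix is to traverse the dictionary recursively, descending only into values that are themselves dictionaries and ignoring everything else.
For example: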

import scrapy
import json

def findkeys(data, temp):
    # The key turned up on this page without descending into lists, so the
    # list branch below is left commented out. Re-enable it if the key may
    # sit inside a list.
    # if isinstance(data, list):
    #     for i in data:
    #         for x in findkeys(i, temp):
    #             yield x
    if isinstance(data, dict):
        if temp in data:
            yield data[temp]
        # Recurse into every value; non-dict values (bool, str, ...) yield nothing.
        for j in data.values():
            for x in findkeys(j, temp):
                yield x

class MlSpider(scrapy.Spider):
    name = 'detalhador'

    start_urls=['https://produto.mercadolivre.com.br/MLB-1304118411-sandalia-feminina-anabela-confortavel-pingente-mac-cod-133-_JM?attributes=COLOR_SECONDARY_COLOR%3AUHJldGE%3D%2CSIZE%3AMzU%3D&quantity=1']

    def parse(self, response,**kwargs):
        d = response.xpath('//script[contains(., "window.__PRELOADED_STATE__")]/text()').re_first(r'(?s)window.__PRELOADED_STATE__ = (.+?\});')
        data = json.loads(d)
        temp='itemPrice'
        lst = list(findkeys(data, temp))
        print(lst)
        # res = [val[temp] for key, val in data.items() if temp in val]

Output:

[7.77, 7.77]
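If the key might also sit inside a list, the same idea extends naturally. The following is a minimal sketch rather than part of the answer above: find_values and the sample data are hypothetical names and values chosen for illustration. It descends into both dicts and lists and silently skips scalars such as bool, str, numbers, and None.

```python
def find_values(data, target):
    """Yield every value stored under the key `target`, at any depth."""
    if isinstance(data, dict):
        for key, value in data.items():
            if key == target:
                yield value
            # Keep descending: the value itself may contain the key again.
            yield from find_values(value, target)
    elif isinstance(data, list):
        for item in data:
            yield from find_values(item, target)
    # Anything else (bool, int, float, str, None) is a leaf and is skipped.


# Hypothetical sample mixing bools, lists, and nested dicts.
sample = {
    "a": True,
    "b": [{"itemPrice": 7.77}, {"c": {"itemPrice": 8.5}}],
    "d": None,
}

print(list(find_values(sample, "itemPrice")))  # [7.77, 8.5]
```

Because it is a generator, the caller can stop at the first match with next(find_values(data, 'itemPrice'), None) instead of building the full list.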
