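I am trying to scrape data from https://www.ouedkniss.com/boutiques/immobilier. I found that ouedkniss.com uses a GraphQL API. I tried to use this API, but I could not extract the data or paginate. It fails with this error: AttributeError: 'list' object has no attribute 'get'
I don't know whether I am missing something else. Here is what I have tried so far: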
import scrapy
import json
from ..items import OuedknissItem
from scrapy.loader import ItemLoader


class StoresSpider(scrapy.Spider):
    name = 'stores'
    allowed_domains = ['www.ouedkniss.com']

    def start_requests(self):
        payload = json.dumps([
            {
                "operationName": "SearchStore",
                "query": "query Campaign($slug: String!) {\n project(slug: $slug) {\n id\n isSharingProjectBudget\n risks\n story(assetWidth: 680)\n currency\n spreadsheet {\n displayMode\n public\n url\n data {\n name\n value\n phase\n rowNum\n __typename\n }\n dataLastUpdatedAt\n __typename\n }\n environmentalCommitments {\n id\n commitmentCategory\n description\n __typename\n }\n __typename\n }\n}\n",
                "variables": {
                    "q": "", "filter": {
                        "categorySlug": "immobilier",
                        "count": 12, "page": 1},
                    "categorySlug": "immobilier",
                    "count": 12,
                    "page": 1
                },
            }
        ])
        headers = {
            "Content-Type": "application/json",
            # "X-Requested-With": "XMLHttpRequest",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
        }
        yield scrapy.Request(
            url='https://api.ouedkniss.com/graphql',
            method="POST",
            headers=headers,
            body=payload,
            callback=self.parse
        )
        return super().start_requests()

    def parse(self, response):
        json_resp = json.loads(response.body)
        # print(json_resp)
        stores = json_resp.get('data')[0].get('stores').get('data')
        for store in stores:
            loader = ItemLoader(item=OuedknissItem())
            loader.add_value('name', store.get('name'))
            yield loader.load_item()
3 Answers
h5qlskok1#
Your JSON payload is malformed, which is why the response was a validation error. With the payload corrected, it works fine.
Output:
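On the AttributeError itself: json.loads returns a Python list whenever the response body is a JSON array, and a list has no .get method. That is a likely outcome here, because the question wraps the payload in a JSON array and GraphQL servers that accept such batched requests typically answer with an array too. A minimal sketch of a parser that tolerates both shapes follows. The helper name extract_stores is hypothetical, and the data.stores.data path is taken from the question's parse method, so it is an assumption about what the API returns.

```python
import json


def extract_stores(body: str) -> list:
    """Return the list found at data.stores.data in a GraphQL response body.

    The path is assumed from the question's parse() method; adjust it to
    whatever the real response contains.
    """
    parsed = json.loads(body)

    # A JSON array decodes to a list, which has no .get(); unwrap the
    # first result before treating it as a dict.
    if isinstance(parsed, list):
        if not parsed:
            return []
        parsed = parsed[0]

    # GraphQL reports validation and execution problems under "errors".
    if parsed.get("errors"):
        raise ValueError(f"GraphQL errors: {parsed['errors']}")

    data = parsed.get("data") or {}
    stores = data.get("stores") or {}
    return stores.get("data") or []
```

Inside the spider this would be called as extract_stores(response.text). Sending the payload as a single JSON object instead of a one-element array avoids the list-shaped response in the first place.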
gmxoilav2#
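It looks like you are not passing the required variables into the query.
Your query is declared as query Campaign($slug: String!), so it requires a single variable, slug. Meanwhile, the variables you send are q, filter, categorySlug, count, and page. (By the way, count and categorySlug appear twice there: once inside filter and once at the top level.) Note also that operationName is "SearchStore" while the operation in the query text is named Campaign; the two need to match.
Try: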
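A minimal sketch of one way to catch this kind of mismatch before the request is sent. The helper names declared_variables and build_payload are hypothetical, and the slug value in the usage note is a placeholder. The check is deliberately simple: it reads variable names from the operation header with a regular expression and treats every declared variable as required, which is stricter than GraphQL itself (nullable variables may be omitted).

```python
import json
import re


def declared_variables(query: str) -> set:
    """Return the variable names declared in the operation header."""
    # Everything before the first "{" is the header, for example
    # "query Campaign($slug: String!) ". Default values that contain
    # braces are not handled by this simplification.
    header = query.split("{", 1)[0]
    return set(re.findall(r"\$(\w+)\s*:", header))


def build_payload(operation_name: str, query: str, variables: dict) -> str:
    """Serialize one GraphQL request as a JSON object, not an array."""
    declared = declared_variables(query)
    provided = set(variables)
    if declared != provided:
        raise ValueError(
            f"missing variables: {sorted(declared - provided)}, "
            f"undeclared variables: {sorted(provided - declared)}"
        )
    return json.dumps(
        {
            "operationName": operation_name,
            "query": query,
            "variables": variables,
        }
    )
```

With the query from the question, build_payload("Campaign", query, {"slug": "some-slug"}) passes the check, while the original variables (q, filter, categorySlug, count, page) raise a ValueError that names slug as missing. Whether the server accepts the result still depends on using a query that exists in the ouedkniss schema.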
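You should probably also check that the request succeeded before trying to parse the response (response.ok in requests; a Scrapy Response exposes response.status instead).
mitkmikd3#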
I don't think you can use the ouedkniss API, because its request policy only allows same-origin requests.