Python script runs fine on Windows but throws an error on Mac

Posted by ctehm74n on 2022-10-30 in Python

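I want to fetch different product information from this webpage using the requests module. I've written a Python script that gets a JSON response by sending a POST request with the appropriate parameters.
The script works fine on Windows, but on Mac it throws this error: JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Here is my attempt: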

import requests

start_url = 'https://www.jumbo.com/producten/'
link = 'https://www.jumbo.com/api/graphql'

payload = {"operation":"searchResult","variables":{"searchTerms":"","sortOption":"","showMoreIds":"","offSet":0,"pageSize":24,"categoryUrl":"/producten/"},"query":"\n  fragment productFields on Product {\n    id: sku\n    brand\n    badgeDescription\n    category\n    subtitle: packSizeDisplay\n    title\n    image\n    inAssortment\n    availability {\n      availability\n      isAvailable\n      label\n    }\n    isAvailable\n    isSponsored\n    link\n    status\n    retailSet\n    prices: price {\n      price\n      promoPrice\n      pricePerUnit {\n        price\n        unit\n      }\n    }\n    crossSellSkus\n    quantityDetails {\n      maxAmount\n      minAmount\n      stepAmount\n      defaultAmount\n      unit\n    }\n    quantityOptions {\n      maxAmount\n      minAmount\n      stepAmount\n      unit\n    }\n    primaryBadge: primaryBadges {\n      alt\n      image\n    }\n    secondaryBadges {\n      alt\n      image\n    }\n    promotions {\n      id\n      group\n      isKiesAndMix\n      image\n      tags {\n        text\n        inverse\n      }\n      start {\n        dayShort\n        date\n        monthShort\n      }\n      end {\n        dayShort\n        date\n        monthShort\n      }\n      attachments{\n        type\n        path\n      }\n    }\n  }\n\n  query searchResult(\n    $searchTerms: String\n    $filters: String\n    $offSet: Int\n    $showMoreIds: String\n    $sortOption: String\n    $pageSize: Int\n    $categoryUrl: String\n  ) {\n    searchResult(\n      searchTerms: $searchTerms\n      filters: $filters\n      offSet: $offSet\n      showMoreIds: $showMoreIds\n      sortOption: $sortOption\n      pageSize: $pageSize\n      categoryUrl: $categoryUrl\n    ) {\n      canonicalRelativePath\n      categoryIdPath\n      categoryTiles {\n        id\n        label\n        imageLink\n        navigationState\n        siteRootPath\n      }\n      urlState\n      newUrl\n      redirectUrl\n      shelfDescription\n      
removeAllAction\n      powerFilters {\n        displayName\n        navigationState\n        siteRootPath\n      }\n      metaData {\n        title\n        description\n      }\n      headerContent {\n        headerText\n        count\n      }\n      helperText {\n        show\n        shortBody\n        longBody\n        header\n        linkText\n        targetUrl\n        messageType\n      }\n      recipeLink {\n        linkText\n        targetUrl\n        textIsRich\n      }\n      guidedNavigation {\n        ancestors {\n          label\n        }\n        displayName\n        dimensionName\n        groupName\n        name\n        multiSelect\n        moreLink {\n          label\n          navigationState\n        }\n        lessLink {\n          label\n          navigationState\n        }\n        refinements {\n          label\n          count\n          multiSelect\n          navigationState\n          siteRootPath\n        }\n      }\n      selectedRefinements {\n        refinementCrumbs {\n          label\n          count\n          multiSelect\n          dimensionName\n          ancestors {\n            label\n            navigationState\n          }\n          removeAction {\n            navigationState\n          }\n        }\n        searchCrumbs {\n         terms\n         removeAction {\n          navigationState\n         }\n        }\n        removeAllAction {\n         navigationState\n        }\n      }\n      socialLists {\n        title\n        totalNumRecs\n        lists {\n          id\n          title\n          followers\n          productImages\n          thumbnail\n          author\n          labels\n          isAuthorVerified\n        }\n      }\n      mainContent {\n        searchWarning\n        searchAdjustments {\n          originalTerms\n          adjustedSearches {\n            key\n            terms {\n              autoPhrased\n              adjustedTerms\n              spellCorrected\n            }\n          }\n        }\n  
    }\n      productsResultList {\n        pagingActionTemplate {\n          navigationState\n        }\n        lastRecNum\n        totalNumRecs\n        sortOptions {\n          navigationState\n          label\n          selected\n        }\n        products {\n          ...productFields\n          retailSetProducts {\n            ...productFields\n          }\n        }\n      }\n    }\n  }\n"}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
    'accept': 'application/json, text/plain, */*',
    'referer': 'https://www.jumbo.com/producten/',
    'origin': 'https://www.jumbo.com',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,bn;q=0.8',
}

with requests.Session() as s:
    s.headers.update(headers)
    s.get(start_url)
    res = s.post(link,json=payload)
    print(res.json())

How can I make it work on Mac?
This video demo shows how the script runs when I execute it on Windows.

suzh9iv8 #1

Your request headers include:

'accept-encoding': 'gzip, deflate, br'

and the response to the POST request includes:

'Content-Encoding': 'br'

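So the response is compressed with Brotli, and the requests module does not decode that out of the box. (urllib3, which requests uses, decodes br only when a Brotli package such as brotli is installed, which may be why the same script behaves differently on two machines.)
Here is how to decode it (note that this is hard-coded to use Brotli; to be robust, it should check the Content-Encoding header):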

import requests
import brotli
import json
import pprint

start_url = 'https://www.jumbo.com/producten/'
link = 'https://www.jumbo.com/api/graphql'

payload = {"operation":"searchResult","variables":{"searchTerms":"","sortOption":"","showMoreIds":"","offSet":0,"pageSize":24,"categoryUrl":"/producten/"},"query":"\n  fragment productFields on Product {\n    id: sku\n    brand\n    badgeDescription\n    category\n    subtitle: packSizeDisplay\n    title\n    image\n    inAssortment\n    availability {\n      availability\n      isAvailable\n      label\n    }\n    isAvailable\n    isSponsored\n    link\n    status\n    retailSet\n    prices: price {\n      price\n      promoPrice\n      pricePerUnit {\n        price\n        unit\n      }\n    }\n    crossSellSkus\n    quantityDetails {\n      maxAmount\n      minAmount\n      stepAmount\n      defaultAmount\n      unit\n    }\n    quantityOptions {\n      maxAmount\n      minAmount\n      stepAmount\n      unit\n    }\n    primaryBadge: primaryBadges {\n      alt\n      image\n    }\n    secondaryBadges {\n      alt\n      image\n    }\n    promotions {\n      id\n      group\n      isKiesAndMix\n      image\n      tags {\n        text\n        inverse\n      }\n      start {\n        dayShort\n        date\n        monthShort\n      }\n      end {\n        dayShort\n        date\n        monthShort\n      }\n      attachments{\n        type\n        path\n      }\n    }\n  }\n\n  query searchResult(\n    $searchTerms: String\n    $filters: String\n    $offSet: Int\n    $showMoreIds: String\n    $sortOption: String\n    $pageSize: Int\n    $categoryUrl: String\n  ) {\n    searchResult(\n      searchTerms: $searchTerms\n      filters: $filters\n      offSet: $offSet\n      showMoreIds: $showMoreIds\n      sortOption: $sortOption\n      pageSize: $pageSize\n      categoryUrl: $categoryUrl\n    ) {\n      canonicalRelativePath\n      categoryIdPath\n      categoryTiles {\n        id\n        label\n        imageLink\n        navigationState\n        siteRootPath\n      }\n      urlState\n      newUrl\n      redirectUrl\n      shelfDescription\n      
removeAllAction\n      powerFilters {\n        displayName\n        navigationState\n        siteRootPath\n      }\n      metaData {\n        title\n        description\n      }\n      headerContent {\n        headerText\n        count\n      }\n      helperText {\n        show\n        shortBody\n        longBody\n        header\n        linkText\n        targetUrl\n        messageType\n      }\n      recipeLink {\n        linkText\n        targetUrl\n        textIsRich\n      }\n      guidedNavigation {\n        ancestors {\n          label\n        }\n        displayName\n        dimensionName\n        groupName\n        name\n        multiSelect\n        moreLink {\n          label\n          navigationState\n        }\n        lessLink {\n          label\n          navigationState\n        }\n        refinements {\n          label\n          count\n          multiSelect\n          navigationState\n          siteRootPath\n        }\n      }\n      selectedRefinements {\n        refinementCrumbs {\n          label\n          count\n          multiSelect\n          dimensionName\n          ancestors {\n            label\n            navigationState\n          }\n          removeAction {\n            navigationState\n          }\n        }\n        searchCrumbs {\n         terms\n         removeAction {\n          navigationState\n         }\n        }\n        removeAllAction {\n         navigationState\n        }\n      }\n      socialLists {\n        title\n        totalNumRecs\n        lists {\n          id\n          title\n          followers\n          productImages\n          thumbnail\n          author\n          labels\n          isAuthorVerified\n        }\n      }\n      mainContent {\n        searchWarning\n        searchAdjustments {\n          originalTerms\n          adjustedSearches {\n            key\n            terms {\n              autoPhrased\n              adjustedTerms\n              spellCorrected\n            }\n          }\n        }\n  
    }\n      productsResultList {\n        pagingActionTemplate {\n          navigationState\n        }\n        lastRecNum\n        totalNumRecs\n        sortOptions {\n          navigationState\n          label\n          selected\n        }\n        products {\n          ...productFields\n          retailSetProducts {\n            ...productFields\n          }\n        }\n      }\n    }\n  }\n"}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36',
    'accept': 'application/json, text/plain, */*',
    'referer': 'https://www.jumbo.com/producten/',
    'origin': 'https://www.jumbo.com',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,bn;q=0.8',
}

with requests.Session() as s:
    s.headers.update(headers)
    s.get(start_url)
    res = s.post(link, json=payload)
    # Hard-coded for Brotli: assumes res.content is still Brotli-compressed
    body = brotli.decompress(res.content)
    pprint.pprint(json.loads(body))
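
The robustness caveat mentioned with the code above can be sketched roughly as follows. This is only a minimal sketch, not the answerer's code: parse_json_body and its br_decompress parameter are hypothetical names introduced here. It assumes the body may already have been decoded by the HTTP stack (urllib3 decodes br itself when a Brotli package is installed), so it tries plain JSON first and falls back to Brotli only when Content-Encoding is br.

```python
import json
from typing import Callable, Optional


def parse_json_body(raw: bytes,
                    content_encoding: Optional[str],
                    br_decompress: Callable[[bytes], bytes]) -> object:
    """Parse a JSON body, undoing Brotli only if it is still compressed.

    parse_json_body and br_decompress are hypothetical names for this sketch.
    """
    try:
        # Body was already decoded by the HTTP stack, or never compressed.
        return json.loads(raw)
    except ValueError:
        # JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        if (content_encoding or "").strip().lower() != "br":
            raise
    # Still compressed and declared as Brotli: decompress, then parse.
    return json.loads(br_decompress(raw))


# Already-decoded body: the decompressor is never called.
print(parse_json_body(b'{"data": 1}', "br", br_decompress=lambda b: b))
```

In the script above this would presumably be called as parse_json_body(res.content, res.headers.get('Content-Encoding'), brotli.decompress), which replaces the unconditional brotli.decompress call.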

A better solution is to remove br from accept-encoding. The requests module handles gzip and deflate for you automatically, but not br.
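
As a rough illustration of removing br from accept-encoding, the header value can simply be written as 'gzip, deflate'. The sketch below does the same thing programmatically; drop_encoding is a hypothetical helper, not part of requests, and it assumes the header is a plain comma-separated list of codings, optionally with q-values.

```python
def drop_encoding(accept_encoding: str, unwanted: str = "br") -> str:
    """Return an Accept-Encoding value with one content coding removed.

    drop_encoding is a hypothetical helper written for this sketch.
    """
    kept = [
        token.strip()
        for token in accept_encoding.split(",")
        # Compare only the coding name, ignoring any ";q=..." weight.
        if token.strip() and token.split(";")[0].strip().lower() != unwanted
    ]
    return ", ".join(kept)


headers = {
    'accept': 'application/json, text/plain, */*',
    'accept-encoding': drop_encoding('gzip, deflate, br'),
}
print(headers['accept-encoding'])  # gzip, deflate
```

With this header the server should no longer choose Brotli, so res.json() from the original script can be used unchanged.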
