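I am trying to scrape the following web page:
URL: https://www.peterpanbmw.com/used-vehicles/
The data on the page is loaded via JavaScript, so rather than writing a UI scraper with Scrapy, I am trying to call the underlying API that the page itself uses.
Inspecting the Network tab in Chrome, it looks like the data comes from Algolia, and the underlying search query is sent with the following parameters:
The URL is https://sewjn80htn-dsn.algolia.net/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20JavaScript%20(4.9.1)%3B%20Browser%20(lite)%3B%20JS%20Helper%20(3.4.4)&x-algolia-api-key=179608f32563367799314290254e3e44&x-algolia-application-id=SEWJN80HTN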
headers = {
'Accept-Encoding': "gzip, deflate, br",
'Accept-Language': "en-US,en;q=0.9",
'Connection': "keep-alive",
'Content-Length': 1702,
'Host': "sewjn80htn-dsn.algolia.net",
'Origin': "https://www.peterpanbmw.com",
'Referer': "https://www.peterpanbmw.com/",
'Sec-Fetch-Dest': "empty",
'Sec-Fetch-Mode': "cors",
'Sec-Fetch-Site': "cross-site",
'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
'content-type': "application/x-www-form-urlencoded",
'sec-ch-ua': ' "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114" ',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': "macOS",
}
So I tried the following in Python:
import requests
url = "https://sewjn80htn-dsn.algolia.net/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20JavaScript%20(4.9.1)%3B%20Browser%20(lite)%3B%20JS%20Helper%20(3.4.4)&x-algolia-api-key=179608f32563367799314290254e3e44&x-algolia-application-id=SEWJN80HTN"
heads = {
'Accept-Encoding': "gzip, deflate, br",
'Accept-Language': "en-US,en;q=0.9",
'Connection': "keep-alive",
'Content-Length': "1702",
'Host': "sewjn80htn-dsn.algolia.net",
'Origin': "https://www.peterpanbmw.com",
'Referer': "https://www.peterpanbmw.com/",
'Sec-Fetch-Dest': "empty",
'Sec-Fetch-Mode': "cors",
'Sec-Fetch-Site': "cross-site",
'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
'content-type': "application/x-www-form-urlencoded",
'sec-ch-ua': '"Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': "macOS",
}
response = requests.get(url, headers=heads)
But this keeps erroring out on me.
Is it possible to call the Algolia API like this? Any help would be much appreciated.
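For reference, here is a minimal sketch of how this kind of endpoint is usually called, under a few assumptions. Algolia's `/1/indexes/*/queries` endpoint expects a POST with a JSON body, so a `requests.get` that sends a hard-coded `Content-Length` of 1702 and no body is a likely cause of the failure. The index name below is a placeholder: it does not appear in the request shown above and would have to be copied from the request payload in the Network tab. `build_search_request` and `search` are hypothetical helper names, and the `query`, `page`, and `hitsPerPage` values are illustrative only. Passing the credentials as `X-Algolia-*` headers instead of in the query string is a choice made for this sketch.

```python
import json
from urllib.parse import urlencode

# Values taken from the request URL quoted in the question.
ALGOLIA_URL = "https://sewjn80htn-dsn.algolia.net/1/indexes/*/queries"
APP_ID = "SEWJN80HTN"
API_KEY = "179608f32563367799314290254e3e44"

# ASSUMPTION: placeholder only. Copy the real index name from the
# request payload shown in the browser's Network tab.
INDEX_NAME = "REPLACE_WITH_INDEX_NAME_FROM_DEVTOOLS"


def build_search_request(index_name, query="", page=0, hits_per_page=20):
    """Return (url, headers, body) for an Algolia multi-query POST."""
    headers = {
        "X-Algolia-Application-Id": APP_ID,
        "X-Algolia-API-Key": API_KEY,
        "Content-Type": "application/json",
        # ASSUMPTION: kept in case the search key is restricted by referer.
        "Referer": "https://www.peterpanbmw.com/",
    }
    # Each entry in "requests" carries its search parameters as a
    # URL-encoded string in the "params" field.
    params = urlencode(
        {"query": query, "page": page, "hitsPerPage": hits_per_page}
    )
    body = json.dumps(
        {"requests": [{"indexName": index_name, "params": params}]}
    )
    return ALGOLIA_URL, headers, body


def search(index_name, **kwargs):
    """Send the query and return the hits of the first result set."""
    import requests  # third-party; imported here so the builder has no dependency

    url, headers, body = build_search_request(index_name, **kwargs)
    # No Content-Length or Host header is set by hand: requests computes
    # both from the actual body and URL.
    response = requests.post(url, headers=headers, data=body, timeout=30)
    response.raise_for_status()
    return response.json()["results"][0]["hits"]
```

Once `INDEX_NAME` holds the real index name, `search(INDEX_NAME)` should return a list of hit dictionaries, and `page` can be incremented to walk through further results. Whether this particular API key allows requests from outside the site is not something the sketch can guarantee.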
1 answer
dfddblmv1#
I have the same problem. Were you able to find an answer? If so, please post it. Thanks.