I'm scraping an e-commerce website with Scrapy and Playwright. When I run with headless=True I get a 403 error, but with headless=False I get a 200. I even tried random user agents and was still blocked.
Scrapy runs fine with the Firefox and WebKit Playwright drivers, but it takes so long; I want to run it with Chrome.
def make_request_from_data(self, data):
    payload = json.loads(data)
    isbn = payload["isbn"]
    url = f"https://www.barnesandnoble.com/s/{isbn}"
    meta = {
        "region": self.region,
        "isbn": isbn,
        "playwright": True,
        "playwright_include_page": True,
        "playwright_context": f"context-{isbn}",
        "playwright_context_kwargs": {
            "java_script_enabled": True,
        },
    }
    headers = {
        "accept-encoding": "gzip, deflate, br",
        "accept-language": "en",
        "cache-control": "no-cache",
        "pragma": "no-cache",
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "none",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36",
    }
    yield Request(
        url=url,
        headers=headers,
        callback=self.parse,
        errback=self.close_context_on_error,
        meta=meta,
        dont_filter=True,
    )
isbn is the book's code. My guess is it's related to the Chrome version, but I don't know how to downgrade the Chrome version in Playwright.
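For reference, scrapy-playwright selects the browser through the `PLAYWRIGHT_BROWSER_TYPE` setting, so switching from Firefox/WebKit to Chromium is a settings change rather than a spider change. A minimal settings sketch (setting names as documented in the scrapy-playwright README; the launch options are an assumption about what you want to toggle):

```python
# settings.py -- config sketch for running scrapy-playwright with Chromium.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Pick the browser engine: "chromium", "firefox", or "webkit".
PLAYWRIGHT_BROWSER_TYPE = "chromium"

# Launch options passed to Playwright's BrowserType.launch();
# flipping headless to False here is what changes the 403 to a 200 in the question.
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}
```

Downgrading the browser itself is a separate concern: Playwright pins its own browser builds per Playwright version, so an older Chromium generally means installing an older `playwright` package and re-running `playwright install`.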
1 Answer

xienkqul1:
I just fixed the same problem by adding a custom user agent to the browser context.
https://jsoverson.medium.com/how-to-bypass-access-denied-pages-with-headless-chrome-87ddd5f3413c
How to add custom headers in Playwright
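To apply the answer to the question's spider: in scrapy-playwright, `playwright_context_kwargs` is forwarded to Playwright's `browser.new_context()`, and `user_agent` is a standard context option there, so the custom UA can go on the context rather than only in the request headers. A sketch building the request meta (`build_meta` and the sample ISBN are hypothetical illustrations, not part of the original spider):

```python
# Sketch: put a custom user agent on the Playwright browser context.
# The "user_agent" key inside playwright_context_kwargs is passed through
# to Playwright's browser.new_context(user_agent=...).
CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"
)

def build_meta(isbn: str, region: str) -> dict:
    """Build the request meta with a per-ISBN context carrying a custom UA."""
    return {
        "region": region,
        "isbn": isbn,
        "playwright": True,
        "playwright_include_page": True,
        "playwright_context": f"context-{isbn}",
        "playwright_context_kwargs": {
            "java_script_enabled": True,
            # UA set at the context level, not just as a request header:
            "user_agent": CHROME_UA,
        },
    }

# Hypothetical sample ISBN, for illustration only.
meta = build_meta("9780553573404", "us")
print(meta["playwright_context_kwargs"]["user_agent"])
```

The difference from sending the UA only in `headers` is that a context-level user agent also shows up in `navigator.userAgent` inside the page, which is one of the things anti-bot checks compare against the request headers.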