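- I am trying to scrape the first 20 pages of https://en.mzadqatar.com/qatar/cars/sale
- The site makes an XHR call that checks which page the user came from before it serves the next page.
- If the user goes straight to page 2, the site sends the user back to the home page.
- You can reproduce this by opening the link above and clicking the "Next" button at the bottom of the page.
- If you go directly to https://en.mzadqatar.com/qatar/cars/sale?page=1, the site returns the home page.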
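I have managed to walk through each page with the requests library using the code below, but I cannot reproduce the same response with scrapy.Request. Where am I going wrong?
Here is the working code using the requests library: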
import requests

# Step 1: send the XHR pagination request so the session records the requested page.
url = "https://en.mzadqatar.com/search"
payload = "type_id=0&id=1&subCategoryId=&pagination=1&search_type=pagination&km_from=&km_to=&price_from=&price_to=&cityId=&CartypeID=&Fueltype=&subsubCategoryId=&gear=&CylinderNumber=&cars_guarantee=&car_condition=&carcolor=&manfactureYear_from=&manfactureYear_to="
headers = {
    "cookie": "laravel_session=QYDOviHE487FjGC2FvIaAPNnNdypE9dQcupLrylL",
    "authority": "en.mzadqatar.com",
    "accept": "*/*",
    "accept-language": "en-US,en;q=0.9,lo;q=0.8",
    "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
    "dnt": "1",
    "origin": "https://en.mzadqatar.com",
    "referer": "https://en.mzadqatar.com/qatar/cars/sale",
    "sec-ch-ua": "'Chromium';v='104', ' Not A;Brand';v='99', 'Google Chrome';v='104'",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "'macOS'",
    "sec-fetch-dest": "empty",
    "sec-fetch-mode": "cors",
    "sec-fetch-site": "same-origin",
    "sec-gpc": "1",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
    "x-requested-with": "XMLHttpRequest"
}
response = requests.request("POST", url, data=payload, headers=headers)
print(response.text)

# Step 2: fetch the listing page with the same session cookie.
url = "https://en.mzadqatar.com/qatar/cars/sale"
querystring = {"page":"1"}
payload = ""
headers = {
    "cookie": "laravel_session=QYDOviHE487FjGC2FvIaAPNnNdypE9dQcupLrylL",
    "authority": "en.mzadqatar.com",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "accept-language": "en-US,en;q=0.9,lo;q=0.8",
    "dnt": "1",
    "referer": "https://en.mzadqatar.com/qatar/cars/sale",
    "sec-ch-ua": "'Chromium';v='104', ' Not A;Brand';v='99', 'Google Chrome';v='104'",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "'macOS'",
    "sec-fetch-dest": "empty",
    "sec-fetch-mode": "cors",
    "sec-fetch-site": "same-origin",
    "sec-gpc": "1",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
}
response = requests.request("GET", url, data=payload, headers=headers, params=querystring)
print(response.text)
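
To cover the first 20 pages with the requests approach above, only the pagination field of the form body needs to change from one page to the next. The following is a minimal sketch of that idea using only the standard library. The names BASE_PAYLOAD and build_search_payload are hypothetical helpers introduced here, and it is an assumption that the site accepts pagination values 1 through 20 in the same way it accepts 1.

```python
from urllib.parse import parse_qsl, urlencode

# Form body copied from the working requests example (pagination=1).
BASE_PAYLOAD = (
    "type_id=0&id=1&subCategoryId=&pagination=1&search_type=pagination"
    "&km_from=&km_to=&price_from=&price_to=&cityId=&CartypeID=&Fueltype="
    "&subsubCategoryId=&gear=&CylinderNumber=&cars_guarantee=&car_condition="
    "&carcolor=&manfactureYear_from=&manfactureYear_to="
)


def build_search_payload(page: int) -> str:
    """Return the form body for the given page (hypothetical helper)."""
    # keep_blank_values=True preserves the empty filter fields.
    fields = dict(parse_qsl(BASE_PAYLOAD, keep_blank_values=True))
    fields["pagination"] = str(page)
    return urlencode(fields)


# One form body per page for the first 20 pages.
payloads = [build_search_payload(page) for page in range(1, 21)]
```

Each entry in payloads can then be sent as the data argument of the POST request, followed by the GET request with the matching page parameter.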
Here is the code using Scrapy. These requests return the home page instead:
import scrapy


class MzSpider(scrapy.Spider):
    name = 'mz'
    allowed_domains = ['mzadqatar.com']
    start_urls = ['https://en.mzadqatar.com/qatar/cars/sale']
    search_url = "https://en.mzadqatar.com/search"
    search_body = "type_id=0&id=1&subCategoryId=&pagination=2&search_type=pagination&km_from=&km_to=&price_from=&price_to=&cityId=&CartypeID=&Fueltype=&subsubCategoryId=&gear=&CylinderNumber=&cars_guarantee=&car_condition=&carcolor=&manfactureYear_from=&manfactureYear_to="
    dict_search_body = {
        "type_id": 0,
        "id": 1,
        "subCategoryId": "",
        "pagination": 1,
        "search_type": "pagination",
        "km_from": "",
        "km_to": "",
        "undefined": "",
        "cityId": "",
        "CartypeID": "",
        "Fueltype": "",
        "subsubCategoryId": "",
        "gear": "",
        "CylinderNumber": "",
        "cars_guarantee": "",
        "car_condition": "",
        "carcolor": "",
        "manfactureYear_from": "",
        "manfactureYear_to": ""
    }
    search_headers = {
        "authority": "en.mzadqatar.com",
        "accept": "*/*",
        "accept-language": "en-US,en;q=0.9,lo;q=0.8",
        "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
        "dnt": "1",
        "origin": "https://en.mzadqatar.com",
        "referer": "https://en.mzadqatar.com/qatar/cars/sale",
        "sec-ch-ua": "'Chromium';v='104', ' Not A;Brand';v='99', 'Google Chrome';v='104'",
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": "'macOS'",
        "sec-fetch-dest": "empty",
        "sec-fetch-mode": "cors",
        "sec-fetch-site": "same-origin",
        "sec-gpc": "1",
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
        "x-requested-with": "XMLHttpRequest"
    }
    url = "https://en.mzadqatar.com/qatar/cars/sale?page=1"
    headers = {
        "authority": "en.mzadqatar.com",
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "accept-language": "en-US,en;q=0.9,lo;q=0.8",
        "dnt": "1",
        "referer": "https://en.mzadqatar.com/qatar/cars/sale",
        "sec-ch-ua": "'Chromium';v='104', ' Not A;Brand';v='99', 'Google Chrome';v='104'",
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": "'macOS'",
        "sec-fetch-dest": "empty",
        "sec-fetch-mode": "cors",
        "sec-fetch-site": "same-origin",
        "sec-gpc": "1",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
    }
    cookie = {'laravel_session': 'QYDOviHE487FjGC2FvIaAPNnNdypE9dQcupLrylL'}

    def search_requests(self):
        yield scrapy.Request(
            url=self.search_url,
            method='POST',
            headers=self.search_headers,
            body=self.search_body,
            cookies=self.cookie,
            callback=self.start_requests
        )

    def start_requests(self):
        yield scrapy.Request(
            url=self.url,
            method='GET',
            headers=self.headers,
            body="",
            cookies=self.cookie,
            callback=self.parse
        )

    def parse(self, response):
        print(response.text)
        pass
Any help on how to convert the requests calls into scrapy.Request would be much appreciated.
1 Answer
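As you can see, you cannot get results for different pages at the same time while using the same cookies. That is why you need a different set of cookies for each page (I use meta={'cookiejar': ...}):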