Python Scrapy: 400 response from a form request

Posted by cnjp1d6j on 2022-11-09 in Python


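I have been trying to scrape the website https://fbschedules.com/new-england-patriots-schedule/
The site uses a hidden form to submit an AJAX request to a PHP file: https://fbschedules.com/wp-admin/admin-ajax.php
When I try to simulate that AJAX request, Scrapy returns a 400 response for the following code: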

def parse(self, response):
    headers = {
        'User_Agent': user_agent,
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'en-US,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate, br',
        'Referer': 'https://fbschedules.com/new-england-patriots-schedule/',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'X-Requested-With': 'XMLHttpRequest',
        'Cookie': cookie,
        'DNT': '1',
        'Connection': 'keep-alive',
        'Cache-Control': 'max-age=0'
    }

    data = {
        'action': 'load_fbschedules_ajax',
        'type': 'NFL',
        'display': 'Season',
        'team': 'New+England+Patriots',
        'current_season': '2018',
        'view': '',
        'conference': '',
        'conference-division': '',
        'ncaa-subdivision': '',
        'ispreseason': '',
        'schedule-week': '',
    }

    yield scrapy.FormRequest.from_response('https://fbschedules.com/wp-admin/admin-ajax.php',
                                           headers=headers,
                                           formdata=data,
                                           method='POST',
                                           callback=self.schedule_parse)

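Any help in the right direction is greatly appreciated!
Edit: I should also mention that I run this spider as a standalone script, using the following code: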

def start():
    configure_logging()
    runner = CrawlerRunner()
    runner.crawl(NflSpider)
    d = runner.join()
    d.addBoth(lambda _: reactor.stop())

    reactor.run()

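The spider starts, and the console output is as follows:
2018-09-02 18:20:33 [scrapy.core.engine] INFO: Spider opened
2018-09-02 18:20:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-09-02 18:20:33 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6024
2018-09-02 18:20:33 [scrapy.core.engine] DEBUG: Crawled (400) <https://fbschedules.com/wp-admin/admin-ajax.php> (referer: None)
2018-09-02 18:20:33 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <400 https://fbschedules.com/wp-admin/admin-ajax.php>: HTTP status code is not handled or not allowed
2018-09-02 18:20:33 [scrapy.core.engine] INFO: Closing spider (finished)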

scrapy

Source: https://stackoverflow.com/questions/52140773/python-scrapy-400-response-from-form-request

2 Answers


smtd7mpg #1

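I ran into the same problem and handled it by passing a meta argument to FormRequest.
Try using scrapy.FormRequest instead of scrapy.FormRequest.from_response (from_response expects a Response object as its first argument, not a URL string):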

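# Requires: from scrapy import FormRequest
# `headers` and `data` are the dicts defined in the question.
# handle_httpstatus_all lets the callback receive non-2xx responses too.
meta = {'handle_httpstatus_all': True}
yield FormRequest('https://fbschedules.com/wp-admin/admin-ajax.php',
                  headers=headers,
                  formdata=data,
                  method='POST',
                  meta=meta,
                  callback=self.schedule_parse)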


holgip5t #2

I know this question is old, but I fixed my error simply by adding a user agent as a header.

# Requires: from scrapy import FormRequest
# `data` is the dict from the question; `meta` is the dict from the first answer.
headers = {
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'
}
yield FormRequest('https://fbschedules.com/wp-admin/admin-ajax.php',
                  headers=headers,
                  formdata=data,
                  method='POST',
                  meta=meta,
                  callback=self.schedule_parse)

