scrapy - How to submit a form with a session in Scrapy

Asked by 7kqas0il on 2022-11-09

I am trying to scrape a website with Scrapy. To get the content I want, I need to log in first. The URL is login_url, and that page contains the login form.

My code is as follows:

import scrapy
from scrapy import FormRequest
from scrapy.crawler import CrawlerProcess
from scrapy.shell import inspect_response

LOGIN_URL1 = "https://www.partslink24.com/partslink24/user/login.do"

class PartsSpider(scrapy.Spider):
    name = "parts"
    login_url = LOGIN_URL1
    start_urls = [
        login_url,
    ]

    def parse(self, response):
        # COMPANY_ID, USERNAME and PASSWORD are my credentials (defined elsewhere)
        form_data = {
            'accountLogin': COMPANY_ID,
            'userLogin': USERNAME,
            'loginBean.password': PASSWORD
        }
        yield FormRequest(url=self.login_url, formdata=form_data, callback=self.parse1)

    def parse1(self, response):
        inspect_response(response, self)
        print("RESPONSE: {}".format(response))

def start_scraper(vin_number):
    process = CrawlerProcess()
    process.crawl(PartsSpider)
    process.start()

But the problem is that they check whether a session is active, and I get an error saying the form cannot be submitted.
When I inspect the response I get after submitting the login form, I see the following error:

The code on their site that performs this check looks like this:

var JSSessionChecker = {
    check: function()
    {
        if (!Ajax.getTransport())
        {
            alert('NO_AJAX_IN_BROWSER');
        }
        else
        {

            new Ajax.Request('/partslink24/checkSessionCookies.do', {
                method:'post',
                onSuccess: function(transport)
                {
                    if (transport.responseText != 'true')
                    {
                        if (Object.isFunction(JSSessionChecker.showError)) JSSessionChecker.showError(); 
                    }
                },
                onFailure: function(e) 
                { 
                    if (Object.isFunction(JSSessionChecker.showError)) JSSessionChecker.showError(); 
                },
                onException: function (request, e) 
                { 
                    if (Object.isFunction(JSSessionChecker.showError)) JSSessionChecker.showError(); 
                }
            });
        }
    },

    showError: function()
    {
        var errorElement = $('sessionCheckError');
        if (errorElement)
        {
            errorElement.show();
        }
    }
};
JSSessionChecker.check();

On success, it simply returns true.
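
In principle this handshake could be replicated from Scrapy before the form is submitted: fetch the login page (which sets the session cookie), POST to the same /partslink24/checkSessionCookies.do endpoint the script uses, and only then send the FormRequest. A minimal sketch, assuming Scrapy's default cookie middleware is enabled; the endpoint and the literal 'true' response come from the script above, the loginBean.* field names follow the answer below, and everything else is an assumption, not a verified solution for this site:

import scrapy

class SessionCheckSpider(scrapy.Spider):
    # Hypothetical sketch: prime the session before posting the login form.
    name = "session_check"
    start_urls = ["https://www.partslink24.com/partslink24/user/login.do"]

    def parse(self, response):
        # The initial GET sets the session cookie; mimic JSSessionChecker's
        # POST so the server sees the same handshake a browser performs.
        yield scrapy.Request(
            "https://www.partslink24.com/partslink24/checkSessionCookies.do",
            method="POST",
            callback=self.after_check,
            cb_kwargs={"login_response": response},
        )

    def after_check(self, response, login_response):
        # The endpoint returns the literal string 'true' when the session
        # cookies were accepted (see the JavaScript above).
        if response.text.strip() != "true":
            self.logger.warning("Session check did not return 'true'")
        yield scrapy.FormRequest.from_response(
            login_response,
            formdata={
                "loginBean.accountLogin": "COMPANY_ID",  # placeholder credentials
                "loginBean.userLogin": "USERNAME",
                "loginBean.password": "PASSWORD",
            },
            callback=self.after_login,
        )

    def after_login(self, response):
        print("RESPONSE: {}".format(response))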
Is there any way to activate the session before submitting the form?
Thanks in advance.

**EDIT**

This is the error page I get when using the answer from @fam.

Answer from x6492ojm:

Please check the following code.

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.shell import inspect_response

LOGIN_URL1 = "https://www.partslink24.com/partslink24/user/login.do"

class PartsSpider(scrapy.Spider):
    name = "parts"
    login_url = LOGIN_URL1
    start_urls = [
        login_url,
    ]

    def parse(self, response):
        # Use the full field names from the login form, including the
        # hidden loginBean.* inputs.
        form_data = {
            'loginBean.accountLogin': "COMPANY_ID",
            'loginBean.userLogin': "USERNAME",
            'loginBean.sessionSqueezeOut': "false",
            'loginBean.password': "PASSWORD",
            'loginBean.userOffsetSec': "18000",
            'loginBean.code2f': ""
        }
        # from_response() fills in the remaining form fields from the page
        # and keeps the session cookie set by the initial request.
        yield scrapy.FormRequest.from_response(response=response, url=self.login_url,
                                               formdata=form_data, callback=self.parse1)

    def parse1(self, response):
        # inspect_response(response, self)
        print("RESPONSE: {}".format(response))

def start_scraper(vin_number):
    process = CrawlerProcess()
    process.crawl(PartsSpider)
    process.start()

With this I don't get an error, and the response is the following:

RESPONSE: <200 https://www.partslink24.com/partslink24/user/login.do>
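
The key difference here is scrapy.FormRequest.from_response: it pre-fills the hidden loginBean.* inputs from the fetched login page and reuses the session cookie from the first request, which is what the server-side session check expects. Note that a 200 response from login.do does not by itself prove the login succeeded. As a sanity check, parse1 could test whether the login form is still present; a hedged sketch, with the field name taken from the form data in the question:

def parse1(self, response):
    # If the password field is still on the page, the login most
    # likely failed and we are looking at the login form again.
    if response.css("input[name='loginBean.password']"):
        self.logger.error("Login appears to have failed (login form still shown)")
    else:
        self.logger.info("Logged in, landed on %s", response.url)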

**EDIT:** The following code uses Selenium. It lets you log in to the page easily; you just need to download ChromeDriver and install Selenium.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

chrome_options = Options()

# chrome_options.add_argument("--headless")

# Selenium 4 style: pass the driver path via Service instead of executable_path
driver = webdriver.Chrome(service=Service("./chromedriver"), options=chrome_options)
driver.get("https://www.partslink24.com/partslink24/user/login.do")

# fill in the form fields

company_ID = "company id"
user_name = "user name"
password = "password"

company_ID_input = driver.find_element(By.XPATH, "//input[@name='accountLogin']")
company_ID_input.send_keys(company_ID)
time.sleep(1)

user_name_input = driver.find_element(By.XPATH, "//input[@name='userLogin']")
user_name_input.send_keys(user_name)
time.sleep(1)

password_input = driver.find_element(By.XPATH, "//input[@id='inputPassword']")
password_input.send_keys(password)
time.sleep(1)

# click the login button

click_btn = driver.find_element(By.XPATH, "//a[@tabindex='5']")
click_btn.click()
time.sleep(5)

Don't forget to change the credentials.
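
If the rest of the scraping should happen outside the browser, the cookies from the logged-in Selenium session can be handed over to another HTTP client. A minimal sketch using requests; this is just one possible hand-off, and nothing in it is specific to partslink24:

import requests

# Copy the authenticated cookies out of the Selenium driver.
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie["name"], cookie["value"], domain=cookie.get("domain"))

# Subsequent requests reuse the logged-in session.
resp = session.get("https://www.partslink24.com/partslink24/")
print(resp.status_code)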
