Scrapy: why am I logged out after a basic request following login? [duplicate]

Asked by ki1q1bka on 2022-11-09

This question already has an answer here:

Scrapy-Splash Session Handling (1 answer)

Closed three months ago.
I really don't understand why a basic request after logging in logs me out when I use Scrapy. I have asked this in several places (Reddit, GitHub, Stack Overflow) but none of them gave a straightforward answer. I can easily achieve this with Selenium without any problem, yet reproducing the same flow with Scrapy seems impossible, and I have tried more than 50 different SO solutions. I just need to know why I get logged out as soon as I spawn another request after logging in.
Below are the basic Selenium and Scrapy scripts, with dummy account details for the login.

from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By  # needed for the By.* locators below

# define our URL

url = 'https://www.oddsportal.com/login/'
username = 'chuky'
password = 'A151515a'
path = r'C:\Users\Glodaris\OneDrive\Desktop\Repo\Scraper\chromedriver.exe'
webdriver_service = Service(path)
options = ChromeOptions()
browser = Chrome(service=webdriver_service, options=options)

browser.get(url)
browser.implicitly_wait(2)
browser.find_element(By.ID, 'onetrust-accept-btn-handler').click()
browser.find_element(By.ID,'login-username1').send_keys(username)
browser.find_element(By.ID,'login-password1').send_keys(password)
browser.implicitly_wait(10)
browser.find_element(By.XPATH, '//*[@id="col-content"]//button[@class="inline-btn-2"]').click()

print('successful login')
browser.implicitly_wait(10)
browser.get('https://www.oddsportal.com/results/')
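One way to rule out a cookie problem (an idea of mine, not something the thread confirms) is to log in once with Selenium and hand the resulting session cookies to Scrapy. Selenium's `get_cookies()` returns a list of dicts, while `scrapy.Request(cookies=...)` accepts a flat `{name: value}` mapping, so a small converter bridges the two:

```python
def selenium_cookies_to_dict(cookies):
    """Flatten Selenium's get_cookies() output (a list of dicts with at
    least 'name' and 'value' keys) into the {name: value} mapping that
    scrapy.Request(cookies=...) accepts."""
    return {c["name"]: c["value"] for c in cookies}

# After the Selenium login above, the hand-off would look like:
#   cookie_dict = selenium_cookies_to_dict(browser.get_cookies())
#   yield scrapy.Request('https://www.oddsportal.com/results/',
#                        cookies=cookie_dict, callback=self.parse_item)
```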

Scrapy

import logging

import scrapy
from scrapy.http import FormRequest
from scrapy.spiders import CrawlSpider
from scrapy.utils.response import open_in_browser

logger = logging.getLogger(__name__)

class OddsportalSpider(CrawlSpider):
    name = 'oddsportal'
    allowed_domains = ['oddsportal.com']  
    # start_urls = ['http://oddsportal.com/results/']
    login_page = 'https://www.oddsportal.com/login/'

    def start_requests(self):
        """Called before crawling starts; try to log in first."""
        yield scrapy.Request(
            url=self.login_page,
            callback=self.login,
            dont_filter=True,
        )
    def login(self, response):
        """Generate a login request from the login form."""
        yield FormRequest.from_response(
            response=response,
            formdata={
                'login-username': 'chuky',
                'login-password': 'A151515a',
                'login-submit': '',
            },
            callback=self.after_login,
            dont_filter=True,
        )
    # Check whether the login succeeded, then spawn another request to /results/.
    def after_login(self, response):
        if b"Wrong username or password" in response.body:
            logger.warning("LOGIN ATTEMPT FAILED")
            return
        logger.info("LOGIN ATTEMPT SUCCESSFUL")
        url = 'https://www.oddsportal.com/results/'
        return scrapy.Request(url=url, callback=self.parse_item, dont_filter=True)
    def parse_item(self, response):
        print('Thissssssssss----------------------', response.url)
        open_in_browser(response)

I get logged out as soon as I yield a request to /results/ after a successful login. Supposedly Scrapy handles cookies by default, and I have also tried sending the cookies and headers explicitly with every request, but nothing worked. Please, I need someone to try this from their end and tell me the reason, because my response shows that I am logged in, yet sending one more request logs me out.
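Before changing anything else, it can help to watch what Scrapy's CookiesMiddleware is actually doing. A debugging sketch for `settings.py` (`COOKIES_ENABLED` and `COOKIES_DEBUG` are real Scrapy settings; this only adds logging, it is not a fix by itself):

```python
# settings.py — debugging aid to see whether the session cookie survives
COOKIES_ENABLED = True   # the default; listed here for clarity
COOKIES_DEBUG = True     # log every Cookie header sent and Set-Cookie received
```

With `COOKIES_DEBUG` on, the log shows whether the cookie set during login is re-sent on the /results/ request, which narrows the problem down to either Scrapy dropping the cookie or the server invalidating the session.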
Steps to reproduce my Scrapy setup:
1. scrapy startproject
1. Target site: oddsportal.com
1. Set the user agent to the default Scrapy one: USER_AGENT = 'oddsportal_website(+http:www.yourdomain.com)'
1. Run the spider: scrapy crawl oddsportal
Log output

{'BOT_NAME': 'oddsportal_website',
 'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
 'HTTPCACHE_STORAGE': 'scrapy_splash.SplashAwareFSCacheStorage',
 'NEWSPIDER_MODULE': 'oddsportal_website.spiders',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_MODULES': ['oddsportal_website.spiders']}
2022-08-15 09:47:48 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2022-08-15 09:47:48 [scrapy.extensions.telnet] INFO: Telnet Password: 66aa39ca3b133f3d
2022-08-15 09:47:48 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2022-08-15 09:47:48 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'oddsportal_website.middlewares.UserAgentRotatorMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy_splash.SplashCookiesMiddleware',
 'scrapy_splash.SplashMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2022-08-15 09:47:48 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy_splash.SplashDeduplicateArgsMiddleware',
 'log_count/DEBUG': 9,
 'log_count/INFO': 11,
 'request_depth_max': 2,
 'response_received_count': 4,
 'robotstxt/request_count': 1,
 'robotstxt/response_count': 1,
 'robotstxt/response_status_count/200': 1,
 'scheduler/dequeued': 3,
 'scheduler/dequeued/memory': 3,
 'scheduler/enqueued': 3,
 'scheduler/enqueued/memory': 3,
 'start_time': datetime.datetime(2022, 8, 15, 8, 47, 48, 449490)}

55ooxyrt1#

You ARE logged in! It's just that the username is not part of the response itself: it is loaded via an API call or with JavaScript and cookies (view the page source on the results page and search for "chuky" — you won't find it). Since Scrapy only loads the raw response from the URL you set (no JS execution or follow-up API calls), the username never appears. A good way to confirm you are logged in is to go to https://www.oddsportal.com/settings/, which does include the username in the HTML.
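Following the answer's suggestion, a sketch of that check: request https://www.oddsportal.com/settings/ from `after_login` and search the raw HTML for the username. The helper below is hypothetical (not from the thread), but the substring test is exactly what the answer describes:

```python
def logged_in(body: bytes, username: str = "chuky") -> bool:
    """Heuristic login check: the /settings/ page embeds the username
    directly in its HTML, so a plain substring search on the raw body
    is enough — no JavaScript execution needed."""
    return username.encode() in body

# Inside the spider this could be wired up as (sketch):
#   yield scrapy.Request('https://www.oddsportal.com/settings/',
#                        callback=lambda r: logger.info(logged_in(r.body)))
```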
