Scrapy: error when running the .exe built from a Scrapy project

q3qa4bjr · asked on 2023-02-08

I'm working on a Scrapy project that runs perfectly well. I've converted it into an executable with PyInstaller. I was expecting some trouble with importing modules, since I've read that a lot of people run into that. But for some reason I never even got that far: as soon as I run the main.exe file, the console opens and shows the following message:
Traceback (most recent call last): File "rascraper\main.py", line 1,
This is the corresponding main.py file:

from rascraper.spiders.spiderone import PostsSpider
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def main():

    process = CrawlerProcess(get_project_settings())
    process.crawl(PostsSpider)
    process.start()


if __name__ == '__main__':
    main()

And this is my spider class:

import scrapy

class PostsSpider(scrapy.Spider):
    name = 'posts'

    # artist = input(f'Artist Name:')
    # filter = input(f'filter on Country? (y/n):')
    #
    # if filter == 'y':
    #     country = input(f'Country:')
    #     start_urls = [
    #         f'https://ra.co/dj/{artist}/past-events?country={country}'
    #     ]
    #
    # elif filter == 'n':
    #     start_urls = [
    #         f'https://ra.co/dj/{artist}/past-events'
    #     ]

    HEADERS = {
        'accept': '*/*',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7,fr;q=0.6',
        'authorization': 'df67dacc9c704696b908a618dd4f59be',
        'cache-control': 'max-age=0',
        'content-type': 'application/json',
        'origin': 'https://ra.co',
        'referer': 'https://ra.co/',
        'sec-ch-ua': '"Not_A Brand";v="99", "Google Chrome";v="109", "Chromium";v="109"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': 'Windows',
        'sec-fetch-dest': 'empty',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'same-site',
    }

    def parse(self, response):

        for post in response.css('li.Column-sc-18hsrnn-0.inVJeD'):

            date = post.css('.Text-sc-1t0gn2o-0.jmZufm::text').get()
            event = post.css('.Text-sc-1t0gn2o-0.Link__StyledLink-k7o46r-0.dXQVFW::text').get()
            location = post.css('.Text-sc-1t0gn2o-0.Link__StyledLink-k7o46r-0.echVma::text').get()
            venue = post.css('.Text-sc-1t0gn2o-0.Link__StyledLink-k7o46r-0.dxNiKF::text').get()
            acts = post.css('.Text-sc-1t0gn2o-0.bYvpkM::text').get()

            item = {}
            item['Date'] = date
            item['Event'] = event
            item['Location'] = location
            item['Venue'] = venue
            item['Acts'] = acts

            yield item
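
(Side note: HEADERS is defined above but never attached to a request, and the start_urls block is commented out, so the spider as posted issues no requests at all. Below is a minimal start_requests sketch that would actually send those headers; the hard-coded artist slug is a placeholder standing in for the input()-driven value in the commented-out code:)

    def start_requests(self):
        # Placeholder URL following the commented-out pattern above;
        # 'example-artist' stands in for the input()-driven artist slug.
        url = 'https://ra.co/dj/example-artist/past-events'
        yield scrapy.Request(url, headers=self.HEADERS, callback=self.parse)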

Where does this error come from, and how can I fix it?

5ktev3wc1#

**Generating a standalone executable from a Scrapy project with PyInstaller**

To create a single executable, you need to go through the following steps:
1. Add this to all of your spiders (source):

import scrapy.utils.misc
import scrapy.core.scraper

# No-op stand-in for Scrapy's generator-return-value warning helper
def warn_on_generator_with_return_value_stub(spider, callable):
    pass

# Patch both modules that reference the helper
scrapy.utils.misc.warn_on_generator_with_return_value = warn_on_generator_with_return_value_stub
scrapy.core.scraper.warn_on_generator_with_return_value = warn_on_generator_with_return_value_stub
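
For background (the original answer does not spell this out): warn_on_generator_with_return_value inspects a callback's source code to warn about generators that contain return statements. Inside a PyInstaller bundle the .py sources are not shipped, so that inspection fails at startup; replacing the function with a no-op stub sidesteps the check.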

In my example, spider.py looks like this:

import scrapy
import scrapy.utils.misc
import scrapy.core.scraper

def warn_on_generator_with_return_value_stub(spider, callable):
    pass

scrapy.utils.misc.warn_on_generator_with_return_value = warn_on_generator_with_return_value_stub
scrapy.core.scraper.warn_on_generator_with_return_value = warn_on_generator_with_return_value_stub

class ExampleSpider(scrapy.Spider):
    name = 'example_spider'
    allowed_domains = ['scrapingclub.com']
    start_urls = ['https://scrapingclub.com/exercise/detail_basic/']

    def parse(self, response):
        item = dict()
        item['title'] = response.xpath('//h3/text()').get()
        item['price'] = response.xpath('//div[@class="card-body"]/h4/text()').get()
        yield item

2. Add this code to main.py (without it, you will get an error whenever you try to run the executable from any directory other than the project directory):

import os

os.environ.setdefault('SCRAPY_SETTINGS_MODULE', PATH_TO_SETTINGS)
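
If you prefer to be explicit about where the bundled settings live, a sketch along these lines can be used instead. sys.frozen and sys._MEIPASS are set by PyInstaller itself; treat the rest as an assumption about the layout produced by the datas list below:

import os
import sys

# Inside a PyInstaller onefile bundle, data files are unpacked into a
# temporary directory exposed as sys._MEIPASS; ensure it is on sys.path
# so that 'settings' resolves to the bundled settings.py.
if getattr(sys, 'frozen', False):
    sys.path.insert(0, sys._MEIPASS)
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'settings')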

Here is main.py for this example:

import os
from rascraper.spiders.spider import ExampleSpider
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def main():
    os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'settings')
    process = CrawlerProcess(get_project_settings())
    process.crawl(ExampleSpider)
    process.start()

if __name__ == '__main__':
    main()

3. Run PyInstaller to generate the spec file: python -m PyInstaller --onefile --name example_exe main.py.
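(--onefile bundles everything into a single self-extracting executable. Running this command also writes example_exe.spec next to main.py; that spec file is what the next step edits.)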
4. Edit the spec file: add all of the project's files to the datas list.
Before:

# -*- mode: python ; coding: utf-8 -*-

block_cipher = None

a = Analysis(['main.py'],
             pathex=[],
             binaries=[],
             datas=[],
             hiddenimports=[],
             hookspath=[],
             hooksconfig={},
             runtime_hooks=[],
             excludes=[],
             win_no_prefer_redirects=False,
             win_private_assemblies=False,
             cipher=block_cipher,
             noarchive=False)
pyz = PYZ(a.pure, a.zipped_data,
             cipher=block_cipher)

exe = EXE(pyz,
          a.scripts,
          a.binaries,
          a.zipfiles,
          a.datas,  
          [],
          name='example_exe',
          debug=False,
          bootloader_ignore_signals=False,
          strip=False,
          upx=True,
          upx_exclude=[],
          runtime_tmpdir=None,
          console=True,
          disable_windowed_traceback=False,
          target_arch=None,
          codesign_identity=None,
          entitlements_file=None )

After:

# -*- mode: python ; coding: utf-8 -*-

block_cipher = None

a = Analysis(['main.py'],
             pathex=[],
             binaries=[],
             datas=[('items.py','.'),
                    ('middlewares.py','.'),
                    ('pipelines.py','.'),
                    ('settings.py','.'),
                    ('spiders','spiders'),
                    ('..\\scrapy.cfg', '.')],
             hiddenimports=[],
             hookspath=[],
             hooksconfig={},
             runtime_hooks=[],
             excludes=[],
             win_no_prefer_redirects=False,
             win_private_assemblies=False,
             cipher=block_cipher,
             noarchive=False)
pyz = PYZ(a.pure, a.zipped_data,
             cipher=block_cipher)

exe = EXE(pyz,
          a.scripts,
          a.binaries,
          a.zipfiles,
          a.datas,  
          [],
          name='example_exe',
          debug=False,
          bootloader_ignore_signals=False,
          strip=False,
          upx=True,
          upx_exclude=[],
          runtime_tmpdir=None,
          console=True,
          disable_windowed_traceback=False,
          target_arch=None,
          codesign_identity=None,
          entitlements_file=None )
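
Each datas entry is a (source, destination) pair: the source path is resolved relative to the spec file, and the destination is the folder inside the bundle where the file is placed, with '.' meaning the bundle root (exposed as sys._MEIPASS at run time). ('spiders', 'spiders') therefore copies the whole spiders package into a spiders folder inside the bundle.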

5. Build from the spec file: python -m PyInstaller example_exe.spec
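
If the rebuilt executable still cannot find its settings or spiders, a throwaway helper like this (hypothetical, not part of the original answer) can be called at the top of main() to confirm the data files actually made it into the bundle:

import os
import sys

def check_bundle():
    # sys._MEIPASS only exists inside a PyInstaller onefile bundle;
    # fall back to the script's own directory when running from source.
    base = getattr(sys, '_MEIPASS', os.path.dirname(os.path.abspath(__file__)))
    for name in ('settings.py', 'items.py', 'pipelines.py', 'scrapy.cfg'):
        status = 'found' if os.path.exists(os.path.join(base, name)) else 'MISSING'
        print(f'{name}: {status}')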

**Result:**

There should now be a standalone executable that you can run from any directory:

C:\Users\MY_USER\Desktop>example_exe.exe

...
...
[scrapy.core.engine] INFO: Spider opened
[scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
[scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
[scrapy.core.engine] DEBUG: Crawled (404) <GET https://scrapingclub.com/robots.txt> (referer: None)
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://scrapingclub.com/exercise/detail_basic/> (referer: None)
[scrapy.core.scraper] DEBUG: Scraped from <200 https://scrapingclub.com/exercise/detail_basic/>
{'title': 'Long-sleeved Jersey Top', 'price': '$12.99'}
[scrapy.core.engine] INFO: Closing spider (finished)
[scrapy.statscollectors] INFO: Dumping Scrapy stats:
...
...
**Specific to the OP's project:**

The project tree looks like this:

C:.
│   main.py
│   scrapy.cfg
│
└───rascraper
    │   items.py
    │   middlewares.py
    │   pipelines.py
    │   settings.py
    │   __init__.py
    │
    ├───spiders
    │   │   spiderone.py
    │   │   __init__.py
    │   │
    │   └───__pycache__
    │           spiderone.cpython-310.pyc
    │           __init__.cpython-310.pyc
    │
    └───__pycache__
            middlewares.cpython-310.pyc
            pipelines.cpython-310.pyc
            settings.cpython-310.pyc
            __init__.cpython-310.pyc

Since the spec file is generated at the project root next to main.py, the datas list should be:

datas=[('rascraper\\items.py', '.'),
       ('rascraper\\middlewares.py', '.'),
       ('rascraper\\pipelines.py', '.'),
       ('rascraper\\settings.py', '.'),
       ('rascraper\\spiders', 'spiders'),
       ('scrapy.cfg', '.')],
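
(Alternatively, the same files can be passed on the command line with PyInstaller's --add-data option instead of editing the spec by hand; on Windows the source and destination are separated by a semicolon, e.g. --add-data "rascraper\settings.py;.". The original answer edits the spec directly.)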
**Correction:** in main.py it should be just os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'settings'), since the bundled settings.py ends up in the bundle root and is importable as the top-level module settings.
