Scrapy spider does not execute its close method in a Docker container

Posted by siotufzp on 2023-01-05 in Docker

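I have a Flask application that runs a Scrapy spider. The application works fine on my development machine, but when I run it in a container, the spider's close method is not executed.
Here is the spider code: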

# -*- coding: utf-8 -*-
import scrapy
from bs4 import BeautifulSoup
from scrapy.exceptions import CloseSpider

class ToScrapeCSSSpider(scrapy.Spider):
    name = "toscrape-css"
    start_urls = [
        'http://quotes.toscrape.com/',
    ]

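    def parse(self, response):
        page_text = response.text
        # raise CloseSpider("Blocked")

        soup = BeautifulSoup(page_text, "lxml")
        # Treat the response as a sitemap if "xml" appears near the start of the body
        if "xml" in page_text[:20].lower():
            sitemap = True
            # Follow every <loc> entry listed in the sitemap
            links = soup.find_all("loc")
            for link in links:
                yield scrapy.Request(url=link.text, callback=self.parse)

        else:
            raise CloseSpider("I want to close it")

    # Scrapy's base Spider.close is a static method that Scrapy connects to the
    # spider_closed signal; it receives the spider and the close reason
    @staticmethod
    def close(spider, reason):
        print("Closing spider")
        # spider.pbar.clear()
        # spider.pbar.write('Closing {} spider'.format(spider.name))
        print("Spider closed")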

Here is my Flask application, main.py:

import crochet
crochet.setup()     # initialize crochet (runs the Twisted reactor in a background thread)

import json
import pandas as pd
from flask import render_template, jsonify, Flask, redirect, url_for, request, flash
from scrapy.crawler import CrawlerRunner, CrawlerProcess
import time
from datetime import datetime, timedelta
import grequests
from app2.articles_finder.spiders.test_spider import ToScrapeCSSSpider
from app2 import app2

# crawl_runner is used below but its definition is not shown in the question;
# assumed here to be a plain CrawlerRunner created at module level
crawl_runner = CrawlerRunner()


@app2.route("/test_docker")
def test_docker():
    scrap_docker()
    return "Ok", 200


@crochet.run_in_reactor
def scrap_docker():
    # Schedule the crawl in the reactor thread; crawl() returns a Deferred
    eventual = crawl_runner.crawl(ToScrapeCSSSpider)
    eventual.addCallback(finished_docker)


def finished_docker(_result):
    print("Scraping is over in docker container")

Finally, here is my Dockerfile:

FROM phusion/baseimage:0.9.19

# Use baseimage-docker's init system.
CMD ["/sbin/my_init"]

ENV TERM=xterm-256color
ENV SCRAPPER_HOME=/app/links_finder
ENV PYTHON_VERSION="3.6.5"
# No spaces around "=": "ENV KEY = value" would set KEY to "= value"
ENV FRONT_ADDRESS=blabla


# Set the locale
RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

# Install necessary packages

RUN apt-get update && apt-get install -y \
    build-essential
#RUN apt-get update && apt-get install -y \
#    build-essential \

# Install core packages
#RUN apt-get update
RUN apt-get install -y build-essential checkinstall software-properties-common llvm cmake wget git nano nasm yasm zip unzip pkg-config \
    libreadline-gplv2-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev

# Install Python 3.6.5
RUN wget https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tar.xz \
    && tar xvf Python-${PYTHON_VERSION}.tar.xz \
    && rm Python-${PYTHON_VERSION}.tar.xz \
    && cd Python-${PYTHON_VERSION} \
    && ./configure \
    && make altinstall \
    && cd / \
    && rm -rf Python-${PYTHON_VERSION}

RUN apt-get install -y python3-pip

WORKDIR ${SCRAPPER_HOME}
COPY . ${SCRAPPER_HOME}
RUN ls

#COPY  run_gunicorn_app_2.py ${SCRAPPER_HOME}

RUN pip3 install -r requirements2.txt


RUN chmod 777 -R *

# Clean up
RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
#ENTRYPOINT python3 ${SCRAPPER_HOME}/run_gunicorn_app_2.py

EXPOSE 3456

ENTRYPOINT python3 run_gunicorn_app_2.py
#ENTRYPOINT python3 ${SCRAPPER_HOME}/run_gunicorn_app_2.py

The requirements2.txt file:

tqdm==4.19.4
APScheduler==3.6.1
Flask==1.0.2
Flask-Admin==1.3.0
Flask-Bcrypt==0.7.1
Flask-DebugToolbar==0.10.0
Flask-Login==0.3.2
Flask-Mail==0.9.1
Flask-Script==2.0.5
Flask-SQLAlchemy==2.1
Flask-WTF==0.12
Flask-redis==0.4.0
gunicorn==19.4.5
itsdangerous==0.24
pytz==2016.10
structlog==16.1.0
termcolor==1.1.0
WTForms==2.1
scrapy==1.6.0
grequests==0.4.0
#pandas==0.24
crochet==1.10.0
redis==3.3.8
beautifulsoup4==4.7.1
publicsuffixlist==0.7.1
PyMySQL==0.9.3

When I run the Docker container, I get the following output:

Clearly, the close method is not executed at all. Any hints? I have been stuck on this problem for quite a while, so any clues are very welcome. Thanks!

Answer 1, by ef1yzkbh:

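After a lot of debugging, it turned out that nothing was actually wrong: the close method was running, but its print output never reached the container logs. When stdout is not attached to a terminal, as is typical for a container's main process, Python block-buffers it, so short messages can sit in the buffer indefinitely. I just needed to add -u after python3, which makes stdout and stderr unbuffered: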

ENTRYPOINT python3 -u run_gunicorn_app_2.py
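To see why -u makes the messages appear, here is a minimal, self-contained sketch using only the standard library (the names run_child, CHILD and CHILD_FLUSH are hypothetical). It starts a child interpreter with stdout connected to a pipe, which is roughly the situation inside a container, and ends the child with os._exit(), which skips the normal shutdown and therefore never flushes stdio buffers. That abrupt exit is an assumption standing in for "the buffer has not been flushed yet"; it is not literally what gunicorn does. The sketch also tries two commonly used alternatives to -u: the PYTHONUNBUFFERED environment variable (which could be set in the Dockerfile instead of changing the ENTRYPOINT) and print(..., flush=True).

```python
import os
import subprocess
import sys

# Child program: print a line, then exit via os._exit(), which bypasses
# interpreter shutdown and so never flushes sys.stdout's buffer.
CHILD = "import os; print('Closing spider'); os._exit(0)"
# Same, but the print call flushes explicitly.
CHILD_FLUSH = "import os; print('Closing spider', flush=True); os._exit(0)"


def run_child(code, extra_args=(), extra_env=None):
    """Run `code` in a fresh interpreter with stdout piped (not a TTY)
    and return whatever actually reached the pipe."""
    # Drop any inherited PYTHONUNBUFFERED so the baseline really is buffered.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    if extra_env:
        env.update(extra_env)
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", code],
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    return result.stdout.decode()


if __name__ == "__main__":
    print("buffered:          ", repr(run_child(CHILD)))
    print("python -u:         ", repr(run_child(CHILD, ["-u"])))
    print("PYTHONUNBUFFERED=1:", repr(run_child(CHILD, extra_env={"PYTHONUNBUFFERED": "1"})))
    print("flush=True:        ", repr(run_child(CHILD_FLUSH)))
```

On a typical CPython 3 install, the buffered case should come back empty while the other three return the printed line. In a long-running server the output is usually not lost for good, just held until the buffer fills (a few kilobytes) or the process exits normally, which is why it looked as if close never ran.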
