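I have a Flask application that runs a Scrapy spider. The application works fine on my development machine, but when I run it in a container, the spider's close method is not executed.
Here is the spider's code: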
# -*- coding: utf-8 -*-
import scrapy
from bs4 import BeautifulSoup
from scrapy.exceptions import CloseSpider

class ToScrapeCSSSpider(scrapy.Spider):
    name = "toscrape-css"
    start_urls = [
        'http://quotes.toscrape.com/',
    ]

    def parse(self, response):
        page_text = response.text
        # raise CloseSpider("Blocked")
        soup = BeautifulSoup(page_text, "lxml")
        if "xml" in str.lower(page_text[:20]):
            sitemap = True
            links = soup.findAll("loc")
            for link in links:
                yield scrapy.Request(url=link.text, callback=self.parse)
        else:
            raise CloseSpider("I want to close it")

    def close(spider, reason):
        print("Closing spider")
        # self.pbar.clear()
        # self.pbar.write('Closing {} spider'.format(spider.name))
        print("Spider closed")
And here is my Flask application, main.py:
import crochet
crochet.setup()  # initialize crochet
import json
import pandas as pd
from flask import redirect, url_for, request
from scrapy.crawler import CrawlerRunner, CrawlerProcess
import time
from datetime import datetime, timedelta
import grequests
from flask import render_template, jsonify, Flask, redirect, url_for, request, flash
from app2.articles_finder.spiders.test_spider import ToScrapeCSSSpider
from app2 import app2

# Assumption: crawl_runner is not defined in the posted snippet;
# it is presumably created somewhere like this.
crawl_runner = CrawlerRunner()

@app2.route("/test_docker")
def test_docker():
    scrap_docker()
    return "Ok", 200

@crochet.run_in_reactor
def scrap_docker():
    eventual = crawl_runner.crawl(ToScrapeCSSSpider)
    eventual.addCallback(finished_docker)

def finished_docker(null):
    print("Scrapping is over in docker container")
Finally, here is my Dockerfile:
FROM phusion/baseimage:0.9.19
# Use baseimage-docker's init system.
CMD ["/sbin/my_init"]
ENV TERM=xterm-256color
ENV SCRAPPER_HOME=/app/links_finder
ENV PYTHON_VERSION="3.6.5"
ENV FRONT_ADDRESS=blabla
# Set the locale
RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
# Install necessary packages
RUN apt-get update && apt-get install -y \
    build-essential
#RUN apt-get update && apt-get install -y \
# build-essential \
# Install core packages
#RUN apt-get update
RUN apt-get install -y build-essential checkinstall software-properties-common llvm cmake wget git nano nasm yasm zip unzip pkg-config \
    libreadline-gplv2-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev
# Install Python 3.6.5
RUN wget https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tar.xz \
    && tar xvf Python-${PYTHON_VERSION}.tar.xz \
    && rm Python-${PYTHON_VERSION}.tar.xz \
    && cd Python-${PYTHON_VERSION} \
    && ./configure \
    && make altinstall \
    && cd / \
    && rm -rf Python-${PYTHON_VERSION}
RUN apt-get install -y python3-pip
WORKDIR ${SCRAPPER_HOME}
COPY . ${SCRAPPER_HOME}
RUN ls
#COPY run_gunicorn_app_2.py ${SCRAPPER_HOME}
RUN pip3 install -r requirements2.txt
RUN chmod 777 -R *
# Clean up
RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
#ENTRYPOINT python3 ${SCRAPPER_HOME}/run_gunicorn_app_2.py
EXPOSE 3456
ENTRYPOINT python3 run_gunicorn_app_2.py
#ENTRYPOINT python3 ${SCRAPPER_HOME}/run_gunicorn_app_2.py
The requirements2.txt file:
tqdm==4.19.4
APScheduler==3.6.1
Flask==1.0.2
Flask-Admin==1.3.0
Flask-Bcrypt==0.7.1
Flask-DebugToolbar==0.10.0
Flask-Login==0.3.2
Flask-Mail==0.9.1
Flask-Script==2.0.5
Flask-SQLAlchemy==2.1
Flask-WTF==0.12
Flask-redis==0.4.0
gunicorn==19.4.5
itsdangerous==0.24
pytz==2016.10
structlog==16.1.0
termcolor==1.1.0
WTForms==2.1
scrapy==1.6.0
grequests==0.4.0
#pandas==0.24
crochet==1.10.0
redis==3.3.8
beautifulsoup4==4.7.1
publicsuffixlist==0.7.1
PyMySQL==0.9.3
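When I run the Docker container, this is the output I get:
As you can see, the close method is never executed. Any hints? I have been stuck on this for quite a while, so any clues are very welcome. Thanks!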
1 Answer
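After a lot of debugging, it turned out that nothing was actually wrong: the close method was running, but its print output never showed up in the container logs. I just needed to add -u after python3 in the ENTRYPOINT (python3 -u run_gunicorn_app_2.py). The -u flag makes Python's stdout and stderr unbuffered, so the messages are written as soon as they are printed.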
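To illustrate why -u makes the difference, here is a minimal sketch that is not taken from the original application. It assumes the container's stdout is a pipe rather than a terminal, in which case Python block-buffers print output. The child program and the run_child helper are hypothetical names used only for illustration, and os._exit merely stands in for a long-running server process whose buffer has not been flushed yet.

```python
import os
import subprocess
import sys

# Hypothetical child program: prints one line, then exits without flushing
# stdio buffers. os._exit stands in for a server whose buffer is never flushed.
CHILD = 'import os; print("Closing spider"); os._exit(0)'


def run_child(*flags):
    """Run CHILD in a fresh interpreter with stdout attached to a pipe."""
    # Drop PYTHONUNBUFFERED so the comparison is not skewed by the environment.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    result = subprocess.run(
        [sys.executable, *flags, "-c", CHILD],
        stdout=subprocess.PIPE,
        env=env,
    )
    return result.stdout.decode()


buffered = run_child()        # default: output sits in the buffer and is lost
unbuffered = run_child("-u")  # -u: output is written as soon as it is printed

print("without -u:", repr(buffered))
print("with -u:   ", repr(unbuffered))
```

In this sketch the first call should return an empty string, because the buffered text is never written, while the -u call returns the printed line. Setting the PYTHONUNBUFFERED environment variable in the image, or calling print(..., flush=True) in the spider, should have a similar effect if changing the ENTRYPOINT is not convenient.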