So I'm working on a project that scrapes different websites using multiple spiders. I want to make it so that the spiders run again when the user answers "yes" to a prompt asking whether to continue.
from scrapy.crawler import CrawlerProcess

keyword = input("enter keyword: ")
page_range = input("enter page range: ")

flag = True
while flag:
    process = CrawlerProcess()
    process.crawl(crawler1, keyword, page_range)
    process.crawl(crawler2, keyword, page_range)
    process.crawl(crawler3, keyword, page_range)
    process.start()

    isContinue = input("Do you want to continue? (y/n): ")
    if isContinue == 'n':
        flag = False
But I get an error saying the reactor is not restartable:
Traceback (most recent call last):
File "/Users/user/Desktop/programs/eshopSpider/eshopSpider.py", line 47, in <module>
process.start()
File "/Users/user/opt/anaconda3/lib/python3.8/site-packages/scrapy/crawler.py", line 327, in start
reactor.run(installSignalHandlers=False) # blocking call
File "/Users/user/opt/anaconda3/lib/python3.8/site-packages/twisted/internet/base.py", line 1317, in run
self.startRunning(installSignalHandlers=installSignalHandlers)
File "/Users/user/opt/anaconda3/lib/python3.8/site-packages/twisted/internet/base.py", line 1299, in startRunning
ReactorBase.startRunning(cast(ReactorBase, self))
File "/Users/user/opt/anaconda3/lib/python3.8/site-packages/twisted/internet/base.py", line 843, in startRunning
raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable
So I guess a while loop won't work. I don't even know where to start...
3 Answers

Answer 1:
Method 1:

Scrapy creates a Reactor, which can't be reused after stop(). But if you run the crawler in a separate process, then the new process will have to create a new Reactor.

This will not work if you use threading instead of multiprocessing, because threads share variables, so a new thread would use the same Reactor as the previous thread.

Minimal working code (tested on Linux).
Method 2:

Found in Google: Restarting a Twisted Reactor. It is an old post which uses del to remove the module twisted from memory and later imports it again.

Minimal working code (tested on Linux).
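A sketch of that unloading trick. This relies on CPython re-executing module code on re-import once the cached entries are dropped from sys.modules; the fake cache entry below is only there so the demonstration runs without Twisted installed:

```python
import sys


def unload_twisted():
    # Drop every cached twisted submodule so the next `import twisted...`
    # re-executes the module code and installs a brand-new reactor.
    for name in list(sys.modules):
        if name == "twisted" or name.startswith("twisted."):
            del sys.modules[name]


# Demonstration with a fake cached entry (twisted need not be installed):
sys.modules["twisted.internet.reactor"] = object()
unload_twisted()
print("twisted.internet.reactor" in sys.modules)  # → False
```

This is a hack rather than a supported API: any other code holding a reference to the old reactor object will keep using it, which is why the multiprocessing approach is usually safer.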
Method 3:

It seems you could use CrawlerRunner instead of CrawlerProcess, but I didn't test it yet.

Based on the last example in the docs for Running multiple spiders in the same process, I created code which runs the while-loop inside the reactor (so the reactor never has to stop). It first starts one spider, then runs the second spider, then asks for continuation, and then runs the first spider again, the second again, and so on. It doesn't run both spiders at the same time, but maybe that could be changed.

EDIT: The same, but now all crawlers run at the same time.
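Twisted's inlineCallbacks style maps closely onto async/await, so the control flow above can be sketched with stdlib asyncio; the `crawl` coroutine is a stand-in for yielding `runner.crawl(SpiderClass)`, and `answers` replaces the interactive prompt so the sketch does not block:

```python
import asyncio


async def crawl(name):
    # Stand-in for `yield runner.crawl(SpiderClass)`; the real call returns
    # a Deferred that fires when the spider closes.
    await asyncio.sleep(0)
    return f"{name} done"


async def main(answers):
    log = []
    answers = iter(answers)
    while True:
        log.append(await crawl("crawler1"))  # spiders run one after another
        log.append(await crawl("crawler2"))
        if next(answers, "n") == "n":  # input("continue? (y/n): ") in real code
            break
    return log


log = asyncio.run(main(["y", "n"]))
print(log)
```

The key point is the same in both frameworks: the loop is a coroutine driven by the event loop, so the reactor keeps running across iterations and never needs a restart.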
Answer 2:

You can remove the while loop and use callbacks instead.

EDIT: Added an example:
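The answer's original snippet is not preserved in this scrape; below is a minimal stdlib sketch of the callback idea, with `crawl_once` and `ask` as illustrative stand-ins (in Scrapy/Twisted the re-scheduling would be done with `Deferred.addCallback` so the reactor keeps running):

```python
def crawl_once(on_finished):
    # Stand-in for kicking off the spiders; in Twisted this would be
    # runner.crawl(...).addCallback(lambda _: on_finished()).
    on_finished()


def start(ask, log):
    def on_finished():
        log.append("crawl finished")
        if ask() == "y":  # re-schedule the next crawl instead of looping
            crawl_once(on_finished)

    crawl_once(on_finished)


answers = iter(["y", "n"])
log = []
start(lambda: next(answers), log)
print(log)  # two crawls: "y" re-schedules once, then "n" stops
```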
Answer 3: